Statistics API
Statistical analysis functions and models.
Table of Contents
- Overview
- Distributions
- Hypothesis Testing
- Generalized Linear Models
- Formula Syntax
- Model Comparison
- Common Patterns
- See Also
Overview
The ds.stats module provides statistical functions for:
- Hypothesis testing (t-tests, ANOVA, chi-square)
- Generalized Linear Models (GLM)
- Mixed-effects models (GLMM)
- Probability distributions
- Model comparison and selection
Distributions
normal
Normal (Gaussian) distribution functions.
ds.stats.normal.pdf(x, { mean, sd })
ds.stats.normal.cdf(x, { mean, sd })
ds.stats.normal.quantile(p, { mean, sd })
Parameters:
- x (number): Value to evaluate
- mean (number): Mean (default: 0)
- sd (number): Standard deviation (default: 1)
Example:
ds.stats.normal.pdf(0, { mean: 0, sd: 1 });
ds.stats.normal.cdf(1.96, { mean: 0, sd: 1 }); // ~0.975
ds.stats.normal.quantile(0.975, { mean: 0, sd: 1 }); // ~1.96
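To make the quantities behind these calls concrete, here is a minimal standalone sketch of the normal density and cumulative probability. It is not the library's implementation: the helper names (normalPdf, normalCdf, erfApprox) are hypothetical, and the CDF relies on the Abramowitz and Stegun 7.1.26 approximation of the error function, which is accurate to roughly 1.5e-7.

```javascript
// Density of a normal distribution with the given mean and standard deviation.
function normalPdf(x, { mean = 0, sd = 1 } = {}) {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Error function via the Abramowitz & Stegun 7.1.26 polynomial approximation.
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Cumulative probability P(X <= x), expressed through the error function.
function normalCdf(x, { mean = 0, sd = 1 } = {}) {
  return 0.5 * (1 + erfApprox((x - mean) / (sd * Math.SQRT2)));
}

console.log(normalPdf(0));     // about 0.3989
console.log(normalCdf(1.96));  // about 0.975
```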
uniform
Uniform distribution functions.
ds.stats.uniform.pdf(x, { min, max })
ds.stats.uniform.cdf(x, { min, max })
ds.stats.uniform.quantile(p, { min, max })
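The uniform distribution has simple closed forms, sketched below for reference. The function names are hypothetical, and the defaults of min = 0 and max = 1 are an assumption; the documentation above does not state the library's defaults.

```javascript
// Density: constant 1 / (max - min) inside the interval, zero outside it.
function uniformPdf(x, { min = 0, max = 1 } = {}) {
  return x >= min && x <= max ? 1 / (max - min) : 0;
}

// Cumulative probability: fraction of the interval lying at or below x.
function uniformCdf(x, { min = 0, max = 1 } = {}) {
  if (x <= min) return 0;
  if (x >= max) return 1;
  return (x - min) / (max - min);
}

// Quantile: inverse of the CDF for p in [0, 1].
function uniformQuantile(p, { min = 0, max = 1 } = {}) {
  return min + p * (max - min);
}

console.log(uniformCdf(3, { min: 2, max: 6 }));        // 0.25
console.log(uniformQuantile(0.5, { min: 2, max: 6 })); // 4
```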
gamma
Gamma distribution functions.
ds.stats.gamma.pdf(x, { shape, scale })
ds.stats.gamma.cdf(x, { shape, scale })
ds.stats.gamma.quantile(p, { shape, scale })
beta
Beta distribution functions.
ds.stats.beta.pdf(x, { alpha, beta })
ds.stats.beta.cdf(x, { alpha, beta })
ds.stats.beta.quantile(p, { alpha, beta })
Hypothesis Testing
Functional API
Convenience functions that return result objects directly.
oneSampleTTest
ds.stats.hypothesis.oneSampleTTest(data, { mu0 })
Parameters:
- data (Array<number>): Sample data
- mu0 (number): Hypothesized mean (default: 0)
Returns: Object
{
statistic: number, // t-statistic
pValue: number, // p-value
df: number, // degrees of freedom
mean: number, // sample mean
alternative: string
}
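As a rough guide to how the statistic, df, and mean fields relate, the following standalone sketch computes the textbook one-sample t-statistic. The function name is hypothetical and the p-value is omitted because it needs the t-distribution CDF; this is an illustration, not the library's code.

```javascript
// Textbook one-sample t-statistic: (sample mean - mu0) / (s / sqrt(n)).
function oneSampleTStatistic(data, mu0 = 0) {
  const n = data.length;
  const mean = data.reduce((sum, v) => sum + v, 0) / n;
  // Unbiased sample variance (divides by n - 1).
  const variance = data.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  return { statistic: (mean - mu0) / standardError, df: n - 1, mean };
}

console.log(oneSampleTStatistic([1, 2, 3, 4, 5], 2)); // statistic about 1.414, df 4, mean 3
```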
twoSampleTTest
ds.stats.hypothesis.twoSampleTTest(sample1, sample2, options)
Parameters:
- sample1 (Array<number>): First sample
- sample2 (Array<number>): Second sample
- options (Object, optional):
  - alternative (string): 'two-sided', 'less', or 'greater' (default: 'two-sided')
  - mu (number): Hypothesized difference in means (default: 0)
Returns: Object
{
statistic: number,
pValue: number,
df: number,
mean1: number,
mean2: number,
pooledSE: number,
alternative: string
}
Example:
const groupA = [5.1, 4.9, 4.7, 4.6, 5.0];
const groupB = [7.0, 6.4, 6.9, 6.5, 6.3];
const result = ds.stats.hypothesis.twoSampleTTest(groupA, groupB);
console.log(`t = ${result.statistic}, p = ${result.pValue}`);
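The pooledSE field in the result suggests an equal-variance (pooled) test. Under that assumption, which the documentation does not confirm, the statistic can be sketched as follows. The helper names are hypothetical; a Welch-style test would give a different standard error and df.

```javascript
// Sum of squared deviations from a given mean.
function sumSquaredDeviations(values, mean) {
  return values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
}

// Pooled-variance two-sample t-statistic (assumes equal group variances).
function pooledTStatistic(sample1, sample2, mu = 0) {
  const n1 = sample1.length;
  const n2 = sample2.length;
  const mean1 = sample1.reduce((s, v) => s + v, 0) / n1;
  const mean2 = sample2.reduce((s, v) => s + v, 0) / n2;
  const df = n1 + n2 - 2;
  const pooledVariance =
    (sumSquaredDeviations(sample1, mean1) + sumSquaredDeviations(sample2, mean2)) / df;
  const pooledSE = Math.sqrt(pooledVariance * (1 / n1 + 1 / n2));
  return { statistic: (mean1 - mean2 - mu) / pooledSE, df, mean1, mean2, pooledSE };
}

const groupA = [5.1, 4.9, 4.7, 4.6, 5.0];
const groupB = [7.0, 6.4, 6.9, 6.5, 6.3];
console.log(pooledTStatistic(groupA, groupB)); // statistic about -10.52, df 8
```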
chiSquareTest
ds.stats.hypothesis.chiSquareTest(observed, expected)
Parameters:
- observed (Array<number>): Observed frequencies
- expected (Array<number>): Expected frequencies
Returns: Object with statistic, pValue, df
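For orientation, a goodness-of-fit chi-square statistic sums the squared gaps between observed and expected counts, each scaled by the expected count. The sketch below uses a hypothetical function name and assumes df = k - 1 for k categories; the library may compute df differently.

```javascript
// Chi-square goodness-of-fit statistic: sum over categories of (O - E)^2 / E.
function chiSquareStatistic(observed, expected) {
  if (observed.length !== expected.length) {
    throw new Error('observed and expected must have the same length');
  }
  let statistic = 0;
  for (let i = 0; i < observed.length; i++) {
    statistic += (observed[i] - expected[i]) ** 2 / expected[i];
  }
  // Assumption: no parameters were estimated from the data, so df = k - 1.
  return { statistic, df: observed.length - 1 };
}

console.log(chiSquareStatistic([10, 20, 30], [20, 20, 20])); // { statistic: 10, df: 2 }
```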
oneWayAnova
ds.stats.hypothesis.oneWayAnova(groups)
Parameters:
groups(Array<Array<number>>): Array of groups to compare
Returns: Object
{
statistic: number, // F-statistic
pValue: number,
dfBetween: number,
dfWithin: number,
ssBetween: number,
ssWithin: number
}
Example:
const groupA = [5.1, 4.9, 4.7];
const groupB = [6.0, 5.8, 6.2];
const groupC = [7.0, 6.9, 7.1];
const result = ds.stats.hypothesis.oneWayAnova([groupA, groupB, groupC]);
console.log(`F = ${result.statistic}, p = ${result.pValue}`);
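The fields in the result object map onto the classic ANOVA table. The sketch below, with a hypothetical function name, shows how the sums of squares, degrees of freedom, and F-statistic fit together; it omits the p-value, which requires the F-distribution CDF.

```javascript
// One-way ANOVA decomposition into between-group and within-group variation.
function oneWayAnovaTable(groups) {
  const all = groups.flat();
  const grandMean = all.reduce((s, v) => s + v, 0) / all.length;
  let ssBetween = 0;
  let ssWithin = 0;
  for (const group of groups) {
    const groupMean = group.reduce((s, v) => s + v, 0) / group.length;
    ssBetween += group.length * (groupMean - grandMean) ** 2;
    for (const v of group) ssWithin += (v - groupMean) ** 2;
  }
  const dfBetween = groups.length - 1;
  const dfWithin = all.length - groups.length;
  // F is the ratio of the between-group to the within-group mean square.
  const statistic = ssBetween / dfBetween / (ssWithin / dfWithin);
  return { statistic, dfBetween, dfWithin, ssBetween, ssWithin };
}

const table = oneWayAnovaTable([
  [5.1, 4.9, 4.7],
  [6.0, 5.8, 6.2],
  [7.0, 6.9, 7.1],
]);
console.log(table.statistic); // about 110.3
```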
pairedTTest
Paired samples t-test for dependent samples.
ds.stats.pairedTTest(sample1, sample2, options)
Parameters:
- sample1 (Array<number>): First sample (must match length of sample2)
- sample2 (Array<number>): Second sample
- options.alternative (string): 'two-sided', 'less', or 'greater'
Returns: Object with statistic, pValue, df, meanDiff
mannWhitneyU
Non-parametric test for comparing two independent samples.
ds.stats.mannWhitneyU(sample1, sample2)
Returns: Object with statistic (U), pValue
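The U statistic counts, over all cross-sample pairs, how often a value from one sample exceeds a value from the other, with ties counted as one half. The sketch below is illustrative only: the function name is hypothetical, and reporting the smaller of the two U values is an assumption, since conventions differ between implementations.

```javascript
// Mann-Whitney U by direct pair counting (O(n1 * n2), fine for small samples).
function mannWhitneyUStatistic(sample1, sample2) {
  let u1 = 0;
  for (const a of sample1) {
    for (const b of sample2) {
      if (a > b) u1 += 1;
      else if (a === b) u1 += 0.5; // ties contribute one half
    }
  }
  const u2 = sample1.length * sample2.length - u1;
  // Assumption: report the smaller U, as many textbooks do.
  return { u1, u2, statistic: Math.min(u1, u2) };
}

console.log(mannWhitneyUStatistic([1, 2, 3], [4, 5, 6])); // { u1: 0, u2: 9, statistic: 0 }
```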
kruskalWallis
Non-parametric alternative to one-way ANOVA for comparing multiple groups.
ds.stats.kruskalWallis(groups)
Parameters:
groups(Array<Array<number>>): Array of groups to compare
Returns: Object with statistic (H), pValue, df
tukeyHSD
Post-hoc pairwise comparisons after ANOVA.
ds.stats.hypothesis.tukeyHSD(groups, options)
Parameters:
- groups (Array<Array<number>> | Object): Groups to compare, or an object in { data, y, group } format
- options.names (Array<string>): Optional group names
Returns: Object with comparisons array containing pairwise results
Example:
const result = ds.stats.hypothesis.tukeyHSD([groupA, groupB, groupC], {
names: ['Control', 'Treatment1', 'Treatment2']
});
// result.comparisons: [{comparison: 'Control - Treatment1', diff, pValue, ci}, ...]
fisherExactTest
Fisher’s exact test for 2x2 contingency tables.
ds.stats.fisherExactTest(table)
Parameters:
- table (Array<Array<number>>): 2x2 contingency table [[a, b], [c, d]]
Returns: Object with pValue, oddsRatio
leveneTest
Test for homogeneity of variances across groups.
ds.stats.leveneTest(groups)
Parameters:
groups(Array<Array<number>>): Array of groups
Returns: Object with statistic, pValue, df1, df2
Correlation Tests
pearsonCorrelation
Pearson correlation coefficient with significance test.
ds.stats.pearsonCorrelation(x, y)
Returns: Object
{
r: number, // Correlation coefficient (-1 to 1)
pValue: number, // Two-tailed p-value
n: number // Sample size
}
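The coefficient r is the covariance of x and y divided by the product of their standard deviations. The sketch below, with a hypothetical function name, computes r and the t-statistic on n - 2 degrees of freedom that a significance test is usually based on; converting t to a p-value is left out.

```javascript
// Pearson correlation from centered sums of products.
function pearsonR(x, y) {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
    syy += (y[i] - meanY) ** 2;
  }
  const r = sxy / Math.sqrt(sxx * syy);
  // Test statistic for H0: no linear correlation (t-distributed with n - 2 df).
  const t = r * Math.sqrt((n - 2) / (1 - r * r));
  return { r, n, t };
}

console.log(pearsonR([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])); // r about 0.8
```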
spearmanCorrelation
Spearman rank correlation coefficient.
ds.stats.spearmanCorrelation(x, y)
Returns: Object with rho, pValue, n
Effect Sizes
cohensD
Cohen’s d for standardized mean difference between two groups.
ds.stats.cohensD(sample1, sample2)
Returns: number (effect size)
Interpretation:
- |d| < 0.2: Negligible
- 0.2 <= |d| < 0.5: Small
- 0.5 <= |d| < 0.8: Medium
- |d| >= 0.8: Large
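A common definition of Cohen's d divides the mean difference by the pooled standard deviation. The sketch below assumes that definition (the library may use another denominator) and applies the thresholds listed above to the magnitude of d. Both function names are hypothetical.

```javascript
// Cohen's d using the pooled standard deviation of the two samples.
function cohensDSketch(sample1, sample2) {
  const mean = values => values.reduce((s, v) => s + v, 0) / values.length;
  const ss = (values, m) => values.reduce((s, v) => s + (v - m) ** 2, 0);
  const m1 = mean(sample1);
  const m2 = mean(sample2);
  const pooledVariance =
    (ss(sample1, m1) + ss(sample2, m2)) / (sample1.length + sample2.length - 2);
  return (m1 - m2) / Math.sqrt(pooledVariance);
}

// Map the magnitude of d onto the conventional labels.
function interpretCohensD(d) {
  const size = Math.abs(d);
  if (size < 0.2) return 'Negligible';
  if (size < 0.5) return 'Small';
  if (size < 0.8) return 'Medium';
  return 'Large';
}

const d = cohensDSketch([5.1, 4.9, 4.7, 4.6, 5.0], [7.0, 6.4, 6.9, 6.5, 6.3]);
console.log(d, interpretCohensD(d)); // about -6.65, 'Large'
```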
etaSquared
Eta-squared effect size for ANOVA.
ds.stats.etaSquared(groups)
Returns: number (proportion of variance explained, 0 to 1)
omegaSquared
Omega-squared effect size for ANOVA (less biased than eta-squared).
ds.stats.omegaSquared(groups)
Returns: number
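Both effect sizes derive from the ANOVA sums of squares. The sketch below uses the standard textbook formulas, eta-squared = SS_between / SS_total and omega-squared = (SS_between - df_between * MS_within) / (SS_total + MS_within); the function name is hypothetical and the library's exact formulas are not confirmed here.

```javascript
// Eta-squared and omega-squared from a one-way ANOVA decomposition.
function anovaEffectSizes(groups) {
  const all = groups.flat();
  const grandMean = all.reduce((s, v) => s + v, 0) / all.length;
  let ssBetween = 0;
  let ssWithin = 0;
  for (const group of groups) {
    const groupMean = group.reduce((s, v) => s + v, 0) / group.length;
    ssBetween += group.length * (groupMean - grandMean) ** 2;
    for (const v of group) ssWithin += (v - groupMean) ** 2;
  }
  const ssTotal = ssBetween + ssWithin;
  const dfBetween = groups.length - 1;
  const msWithin = ssWithin / (all.length - groups.length);
  return {
    etaSquared: ssBetween / ssTotal,
    // Omega-squared subtracts the variation expected from noise alone.
    omegaSquared: (ssBetween - dfBetween * msWithin) / (ssTotal + msWithin),
  };
}

console.log(anovaEffectSizes([[5.1, 4.9, 4.7], [6.0, 5.8, 6.2], [7.0, 6.9, 7.1]]));
// etaSquared about 0.974, omegaSquared about 0.960
```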
Multiple Testing Corrections
bonferroni
Bonferroni correction for multiple comparisons.
ds.stats.bonferroni(pValues, alpha)
Parameters:
- pValues (Array<number>): Array of p-values
- alpha (number): Significance level (default: 0.05)
Returns: Object with adjusted (corrected p-values), significant (boolean array)
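The Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. A minimal sketch follows; the function name is hypothetical, and treating "adjusted p at or below alpha" as significant is an assumption about the boundary case.

```javascript
// Bonferroni: adjusted p = min(1, p * m) for m simultaneous tests.
function bonferroniSketch(pValues, alpha = 0.05) {
  const m = pValues.length;
  const adjusted = pValues.map(p => Math.min(1, p * m));
  // Assumption: a test is significant when its adjusted p-value is <= alpha.
  const significant = adjusted.map(p => p <= alpha);
  return { adjusted, significant };
}

console.log(bonferroniSketch([0.001, 0.02, 0.5]));
// adjusted about [0.003, 0.06, 1], significant [true, false, false]
```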
holmBonferroni
Holm-Bonferroni step-down correction (less conservative than Bonferroni).
ds.stats.holmBonferroni(pValues, alpha)
Returns: Object with adjusted, significant
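The step-down procedure sorts the p-values, multiplies the smallest by m, the next by m - 1, and so on, then enforces that adjusted values never decrease along the sorted order. The sketch below follows that textbook recipe under a hypothetical function name; it is not taken from the library.

```javascript
// Holm-Bonferroni adjusted p-values, returned in the original input order.
function holmSketch(pValues, alpha = 0.05) {
  const m = pValues.length;
  // Indices of the p-values sorted from smallest to largest.
  const order = pValues.map((_, i) => i).sort((a, b) => pValues[a] - pValues[b]);
  const adjusted = new Array(m);
  let runningMax = 0;
  order.forEach((index, rank) => {
    const scaled = Math.min(1, (m - rank) * pValues[index]);
    runningMax = Math.max(runningMax, scaled); // keep the sequence monotone
    adjusted[index] = runningMax;
  });
  return { adjusted, significant: adjusted.map(p => p <= alpha) };
}

console.log(holmSketch([0.01, 0.03, 0.05, 0.15, 0.2]).adjusted);
// about [0.05, 0.12, 0.15, 0.3, 0.3]
```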
fdr
Benjamini-Hochberg False Discovery Rate correction.
ds.stats.fdr(pValues, alpha)
Returns: Object with adjusted, significant
Example:
const pValues = [0.01, 0.03, 0.05, 0.15, 0.20];
const result = ds.stats.fdr(pValues, 0.05);
console.log(result.adjusted); // Adjusted p-values
console.log(result.significant); // Boolean flags, one per input p-value
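For reference, the textbook Benjamini-Hochberg adjustment scales the p-value of rank i (out of m, sorted ascending) by m / i, then takes a running minimum from the largest rank downward. The sketch below implements that recipe under a hypothetical function name; the library's handling of ties and of the boundary at alpha may differ.

```javascript
// Benjamini-Hochberg adjusted p-values, returned in the original input order.
function benjaminiHochbergSketch(pValues, alpha = 0.05) {
  const m = pValues.length;
  const order = pValues.map((_, i) => i).sort((a, b) => pValues[a] - pValues[b]);
  const adjusted = new Array(m);
  let runningMin = 1;
  // Walk from the largest p-value down so each value is capped by those above it.
  for (let rank = m; rank >= 1; rank--) {
    const index = order[rank - 1];
    runningMin = Math.min(runningMin, (pValues[index] * m) / rank);
    adjusted[index] = runningMin;
  }
  return { adjusted, significant: adjusted.map(p => p <= alpha) };
}

console.log(benjaminiHochbergSketch([0.01, 0.03, 0.05, 0.15, 0.2]).adjusted);
// about [0.05, 0.075, 0.0833, 0.1875, 0.2]
```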
Class-based API
Estimator-style classes that follow the fit/summary pattern.
OneSampleTTest
new ds.stats.OneSampleTTest(options)
Methods:
- .fit(data) - Perform the test
- .summary() - Formatted summary
TwoSampleTTest
new ds.stats.TwoSampleTTest(options)
ChiSquareTest
new ds.stats.ChiSquareTest()
OneWayAnova
new ds.stats.OneWayAnova()
TukeyHSD
new ds.stats.TukeyHSD()
Post-hoc pairwise comparisons with Tukey’s Honest Significant Difference.
const tukey = new ds.stats.TukeyHSD();
tukey.fit({ data: myData, y: 'value', group: 'treatment' });
console.log(tukey.summary());
Generalized Linear Models
GLM
Fit generalized linear models including linear regression, logistic regression, and more.
new ds.stats.GLM(options)
Constructor Options
{
family: string, // 'gaussian', 'binomial', 'poisson', 'gamma'
link: string, // 'identity', 'log', 'logit', 'probit', etc.
randomEffects: Object, // For mixed-effects models (optional)
multiclass: string // 'ovr' (one-vs-rest) for multiclass (optional)
}
Common Configurations:
| Model Type | Family | Link | Use For |
|---|---|---|---|
| Linear Regression | gaussian | identity | Continuous outcomes |
| Logistic Regression | binomial | logit | Binary outcomes (0/1) |
| Poisson Regression | poisson | log | Count data |
| Gamma Regression | gamma | inverse | Positive continuous, skewed |
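Each configuration pairs a family with a link function g that connects the expected outcome to the linear predictor: g(mean) = intercept + sum of coefficient times feature. The sketch below lists the four links named in the table together with their inverses. The object layout and helper name are hypothetical and only illustrate the math, not how the library stores links.

```javascript
// Link functions (mean -> linear predictor) and their inverses.
const linkFunctions = {
  identity: { link: mu => mu, inverse: eta => eta },
  logit: { link: mu => Math.log(mu / (1 - mu)), inverse: eta => 1 / (1 + Math.exp(-eta)) },
  log: { link: mu => Math.log(mu), inverse: eta => Math.exp(eta) },
  inverse: { link: mu => 1 / mu, inverse: eta => 1 / eta },
};

// Convert a linear predictor value back to the scale of the outcome.
function meanFromLinearPredictor(linkName, eta) {
  return linkFunctions[linkName].inverse(eta);
}

console.log(meanFromLinearPredictor('logit', 0)); // 0.5
console.log(meanFromLinearPredictor('log', 0));   // 1
```

Predictions on the outcome scale come from applying the inverse link, which is why a binomial model with a logit link yields values between 0 and 1.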
Regularization:
GLM supports L2 (Ridge) regularization via the alpha and l1_ratio parameters:
new ds.stats.GLM({
family: 'gaussian',
alpha: 0.1, // Regularization strength
l1_ratio: 0.0 // 0 = pure L2 (Ridge), 1 = pure L1 (Lasso)
})
Known Limitation: L1 (Lasso) regularization is not yet implemented. Setting l1_ratio > 0 will fall back to L2-only regularization with a console warning. Elastic Net and Lasso require coordinate descent optimization, which is planned for a future release.
Methods
.fit()
Fit the model to data. Supports three input styles:
Array API:
model.fit(X, y)
Table API:
model.fit({
data: myData,
X: ['feature1', 'feature2'],
y: 'outcome'
})
Formula API:
model.fit({
formula: 'outcome ~ feature1 + feature2',
data: myData
})
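The Table API and the Array API describe the same inputs in two shapes. The sketch below shows one way to turn a table-style argument into the X matrix and y vector of the array style. It assumes that data is an array of row objects keyed by column name, which the documentation does not state explicitly, and the function name is hypothetical.

```javascript
// Convert { data, X, y } into a numeric design matrix and response vector.
// Assumption: `data` is an array of plain objects, one per row.
function tableToArrays({ data, X, y }) {
  return {
    X: data.map(row => X.map(column => row[column])),
    y: data.map(row => row[y]),
  };
}

const rows = [
  { feature1: 1.0, feature2: 2.0, outcome: 3.5 },
  { feature1: 2.0, feature2: 0.5, outcome: 4.1 },
];
console.log(tableToArrays({ data: rows, X: ['feature1', 'feature2'], y: 'outcome' }));
// { X: [[1, 2], [2, 0.5]], y: [3.5, 4.1] }
```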
.predict()
Make predictions.
Array API:
const predictions = model.predict(XNew)
Table API:
const predictions = model.predict({
data: newData,
X: ['feature1', 'feature2']
})
.summary()
Get model summary statistics.
const summary = model.summary()
console.log(summary)
Returns a formatted string with:
- Coefficient estimates
- Standard errors
- z-values
- Confidence intervals
- Model fit statistics (AIC, BIC, R-squared)
Properties
model.coefficients // { '(Intercept)': 2.5, 'feature1': 0.8, ... }
model.fitted // [2.1, 3.4, 2.8, ...]
model.residuals // [-0.1, 0.2, -0.05, ...]
model.deviance // Model deviance
model.nullDeviance // Null deviance
model.aic // Akaike Information Criterion
model.bic // Bayesian Information Criterion
model.pseudoR2 // Pseudo R-squared (for GLMs)
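The fit statistics above have standard definitions: AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of estimated parameters, n the number of observations, and log L the maximized log-likelihood. One common pseudo R-squared is 1 - deviance / nullDeviance. The sketch below encodes these formulas under hypothetical function names; which pseudo R-squared variant the library reports is not confirmed here.

```javascript
// Information criteria from a maximized log-likelihood.
function informationCriteria(logLikelihood, numParams, numObs) {
  return {
    aic: 2 * numParams - 2 * logLikelihood,
    bic: numParams * Math.log(numObs) - 2 * logLikelihood,
  };
}

// Deviance-based pseudo R-squared (one of several definitions in use).
function devianceBasedPseudoR2(deviance, nullDeviance) {
  return 1 - deviance / nullDeviance;
}

console.log(informationCriteria(-100, 3, 50)); // aic 206, bic about 211.74
console.log(devianceBasedPseudoR2(40, 100));   // 0.6
```

Lower AIC or BIC indicates a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily once n exceeds about 7.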
Examples
Linear Regression:
const lm = new ds.stats.GLM({ family: 'gaussian' });
lm.fit({
X: ['height', 'weight'],
y: 'blood_pressure',
data: healthData
});
console.log(lm.summary());
const predictions = lm.predict({
data: newPatients,
X: ['height', 'weight']
});
Logistic Regression:
const logit = new ds.stats.GLM({ family: 'binomial', link: 'logit' });
logit.fit({
X: ['beak_length', 'flipper_length'],
y: 'is_adelie',
data: penguins
});
const probabilities = logit.predict({
data: newPenguins,
X: ['beak_length', 'flipper_length']
});
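With a logit link, a predicted probability is the logistic function applied to the linear predictor. The sketch below reproduces that calculation from a coefficient object shaped like model.coefficients (with an '(Intercept)' key). The coefficient values are made up for illustration and the function name is hypothetical.

```javascript
// Probability for one row: sigmoid(intercept + sum of beta_j * x_j).
function predictProbability(coefficients, row) {
  let eta = coefficients['(Intercept)'] ?? 0;
  for (const [name, beta] of Object.entries(coefficients)) {
    if (name !== '(Intercept)') eta += beta * row[name];
  }
  return 1 / (1 + Math.exp(-eta));
}

// Hypothetical coefficients, not the result of a real fit.
const exampleCoefficients = { '(Intercept)': -1, beak_length: 0.5, flipper_length: 0 };
console.log(predictProbability(exampleCoefficients, { beak_length: 2, flipper_length: 190 })); // 0.5
```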
Multiclass Logistic Regression:
const multiclass = new ds.stats.GLM({
family: 'binomial',
multiclass: 'ovr'
});
multiclass.fit({
X: ['feature1', 'feature2'],
y: 'species',
data: myData
});
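In a one-vs-rest scheme, a multiclass problem is split into one binary problem per class, and the final label is the class whose binary model is most confident. The sketch below illustrates both steps with hypothetical helper names; how the library organizes this internally is not documented above.

```javascript
// Build one 0/1 target vector per distinct class label.
function oneVsRestTargets(labels) {
  const classes = [...new Set(labels)];
  return classes.map(cls => ({
    cls,
    target: labels.map(label => (label === cls ? 1 : 0)),
  }));
}

// Choose the class with the highest per-class probability.
function pickClass(probabilityByClass) {
  let best = null;
  for (const [cls, p] of Object.entries(probabilityByClass)) {
    if (best === null || p > best.p) best = { cls, p };
  }
  return best.cls;
}

console.log(oneVsRestTargets(['Adelie', 'Gentoo', 'Adelie', 'Chinstrap'])[0]);
// { cls: 'Adelie', target: [1, 0, 1, 0] }
console.log(pickClass({ Adelie: 0.2, Gentoo: 0.7, Chinstrap: 0.4 })); // 'Gentoo'
```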
Mixed-Effects Model (GLMM):
const lme = new ds.stats.GLM({
family: 'gaussian',
randomEffects: {
intercept: nestedData.map(d => d.site) // one grouping label per row
}
});
lme.fit({
X: ['treatment', 'age'],
y: 'response',
data: nestedData
});
console.log(lme.summary());
// Shows fixed effects and random effects variance
Formula Syntax
GLM supports R-style formula syntax:
// Simple linear model
formula: 'y ~ x'
// Multiple predictors
formula: 'y ~ x1 + x2 + x3'
// Interactions
formula: 'y ~ x1 * x2' // x1 + x2 + x1:x2
// Transformations
formula: 'y ~ log(x1) + sqrt(x2)'
// Polynomials
formula: 'y ~ poly(x, 3)'
// Mixed effects
formula: 'y ~ x1 + (1 | group)'
formula: 'y ~ x1 + (1 + time | subject)'
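To illustrate how the interaction shorthand expands, the sketch below parses a small subset of this syntax: a response, terms joined by +, and two-way * interactions. It is a toy parser under a hypothetical name, not the library's. It deliberately does not handle random-effects terms such as (1 | group), and it leaves function calls like log(x1) and products of three or more factors untouched.

```javascript
// Expand 'y ~ a * b + c' into a response name and a list of model terms.
function expandFormula(formula) {
  const [lhs, rhs] = formula.split('~').map(part => part.trim());
  const terms = [];
  const addTerm = term => {
    if (!terms.includes(term)) terms.push(term);
  };
  for (const piece of rhs.split('+').map(p => p.trim())) {
    const factors = piece.split('*').map(f => f.trim());
    if (factors.length === 2) {
      // a * b is shorthand for both main effects plus their interaction a:b.
      addTerm(factors[0]);
      addTerm(factors[1]);
      addTerm(`${factors[0]}:${factors[1]}`);
    } else {
      addTerm(piece);
    }
  }
  return { response: lhs, terms };
}

console.log(expandFormula('y ~ x1 * x2'));
// { response: 'y', terms: ['x1', 'x2', 'x1:x2'] }
```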
Model Comparison
compareModels
Compare multiple models with AIC/BIC.
const comparison = ds.stats.compareModels([model1, model2, model3])
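The shape of the comparison result is not documented here. As a sketch of the underlying idea, models can be ranked by AIC and annotated with their distance from the best model. The function below works on plain objects with an aic property, as listed under Properties; the name field and the function name are hypothetical.

```javascript
// Rank models by AIC and report each model's gap to the best one.
function rankByAic(models) {
  const bestAic = Math.min(...models.map(m => m.aic));
  return models
    .map(m => ({ name: m.name, aic: m.aic, deltaAic: m.aic - bestAic }))
    .sort((a, b) => a.aic - b.aic);
}

// Hypothetical AIC values for illustration.
console.log(rankByAic([
  { name: 'full', aic: 210 },
  { name: 'reduced', aic: 206 },
  { name: 'null', aic: 240 },
]));
// reduced (delta 0), full (delta 4), null (delta 34)
```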
likelihoodRatioTest
Likelihood ratio test for nested models.
const lrt = ds.stats.likelihoodRatioTest(model1, model2)
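For nested models whose dispersion is fixed (for example binomial or Poisson families), the likelihood ratio statistic equals the drop in deviance, and its degrees of freedom equal the number of added coefficients. The sketch below computes both from the deviance and coefficients properties listed earlier. The function name is hypothetical, the input values are invented, and the p-value (a chi-square tail probability) is omitted.

```javascript
// Likelihood ratio statistic for a reduced model nested inside a full model.
// Assumption: dispersion is fixed, so the deviance difference is the statistic.
function likelihoodRatioStatistic(reduced, full) {
  const statistic = reduced.deviance - full.deviance;
  const df =
    Object.keys(full.coefficients).length - Object.keys(reduced.coefficients).length;
  return { statistic, df };
}

// Hypothetical fitted-model summaries.
const reducedModel = { deviance: 120, coefficients: { '(Intercept)': 0.2, x1: 1.1 } };
const fullModel = { deviance: 110, coefficients: { '(Intercept)': 0.1, x1: 1.0, x2: -0.4 } };
console.log(likelihoodRatioStatistic(reducedModel, fullModel)); // { statistic: 10, df: 1 }
```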
pairwiseLRT
Pairwise likelihood ratio tests across all model pairs.
const results = ds.stats.pairwiseLRT([model1, model2, model3])
crossValidate / crossValidateModels
Cross-validate one or more models.
ds.stats.crossValidate(model, data, options)
ds.stats.crossValidateModels([model1, model2], data, options)
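The options accepted by these functions are not listed here. As background, k-fold cross-validation partitions the row indices into k folds and uses each fold once as the held-out test set. The sketch below builds such splits deterministically, without shuffling; the function name is hypothetical and the library may split differently.

```javascript
// Build k train/test index splits over n rows (round-robin assignment).
function kFoldIndices(n, k) {
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) folds[i % k].push(i);
  return folds.map((test, foldIndex) => ({
    test,
    // Training set: every index that is not in the current test fold.
    train: folds.filter((_, j) => j !== foldIndex).flat(),
  }));
}

console.log(kFoldIndices(6, 3)[0]); // { test: [0, 3], train: [1, 4, 2, 5] }
```

In practice the rows are usually shuffled first so that any ordering in the data does not leak into the folds.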
Common Patterns
Test for Group Differences
// t-test for 2 groups
const tResult = ds.stats.hypothesis.twoSampleTTest(groupA, groupB);
// ANOVA for 3+ groups
const anovaResult = ds.stats.hypothesis.oneWayAnova([groupA, groupB, groupC]);
Predict Continuous Outcome
const model = new ds.stats.GLM({ family: 'gaussian' });
model.fit({ X: features, y: 'price', data: myData });
const predictions = model.predict({ data: newData, X: features });
Predict Binary Outcome
const model = new ds.stats.GLM({ family: 'binomial' });
model.fit({ X: features, y: 'is_fraud', data: transactions });
const probabilities = model.predict({ data: newTransactions, X: features });
Account for Hierarchical Data
const model = new ds.stats.GLM({
family: 'gaussian',
randomEffects: { intercept: nestedData.map(d => d.groupId) }
});
model.fit({ X: features, y: 'outcome', data: nestedData });
See Also
- Machine Learning API - Supervised learning models
- Visualization API - Diagnostic plots for GLMs