Statistics API

Statistical analysis functions and models.

Table of Contents

  1. Overview
  2. Distributions
    1. normal
    2. uniform
    3. gamma
    4. beta
  3. Hypothesis Testing
    1. Functional API
      1. oneSampleTTest
      2. twoSampleTTest
      3. chiSquareTest
      4. oneWayAnova
      5. pairedTTest
      6. mannWhitneyU
      7. kruskalWallis
      8. tukeyHSD
      9. fisherExactTest
      10. leveneTest
    2. Correlation Tests
      1. pearsonCorrelation
      2. spearmanCorrelation
    3. Effect Sizes
      1. cohensD
      2. etaSquared
      3. omegaSquared
    4. Multiple Testing Corrections
      1. bonferroni
      2. holmBonferroni
      3. fdr
    5. Class-based API
      1. OneSampleTTest
      2. TwoSampleTTest
      3. ChiSquareTest
      4. OneWayAnova
      5. TukeyHSD
  4. Generalized Linear Models
    1. GLM
      1. Constructor Options
      2. Methods
        1. .fit()
        2. .predict()
        3. .summary()
      3. Properties
      4. Examples
  5. Formula Syntax
  6. Model Comparison
    1. compareModels
    2. likelihoodRatioTest
    3. pairwiseLRT
    4. crossValidate / crossValidateModels
  7. Common Patterns
    1. Test for Group Differences
    2. Predict Continuous Outcome
    3. Predict Binary Outcome
    4. Account for Hierarchical Data
  8. See Also

Overview

The ds.stats module provides statistical functions for:

  • Hypothesis testing (t-tests, ANOVA, chi-square)
  • Generalized Linear Models (GLM)
  • Mixed-effects models (GLMM)
  • Probability distributions
  • Model comparison and selection

Distributions

normal

Normal (Gaussian) distribution functions.

ds.stats.normal.pdf(x, { mean, sd })
ds.stats.normal.cdf(x, { mean, sd })
ds.stats.normal.quantile(p, { mean, sd })

Parameters:

  • x (number): Value at which to evaluate pdf or cdf
  • p (number): Probability in [0, 1], for quantile
  • mean (number): Mean (default: 0)
  • sd (number): Standard deviation (default: 1)

Example:

ds.stats.normal.pdf(0, { mean: 0, sd: 1 });
ds.stats.normal.cdf(1.96, { mean: 0, sd: 1 }); // ~0.975
ds.stats.normal.quantile(0.975, { mean: 0, sd: 1 }); // ~1.96

uniform

Uniform distribution functions.

ds.stats.uniform.pdf(x, { min, max })
ds.stats.uniform.cdf(x, { min, max })
ds.stats.uniform.quantile(p, { min, max })
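The closed forms behind these three functions are short enough to write out. The sketch below is a from-scratch illustration, not the library's implementation: the helper names uniformPdf, uniformCdf, and uniformQuantile are hypothetical, and the defaults min = 0, max = 1 are an assumption.

```javascript
// Hypothetical stand-alone helpers; not part of ds.stats.
function uniformPdf(x, { min = 0, max = 1 } = {}) {
  // Constant density inside [min, max], zero outside.
  return x >= min && x <= max ? 1 / (max - min) : 0;
}

function uniformCdf(x, { min = 0, max = 1 } = {}) {
  if (x <= min) return 0;
  if (x >= max) return 1;
  return (x - min) / (max - min);
}

function uniformQuantile(p, { min = 0, max = 1 } = {}) {
  // Inverse of the cdf on [min, max].
  return min + p * (max - min);
}

const u = { min: 2, max: 6 };
console.log(uniformPdf(3, u), uniformCdf(3, u), uniformQuantile(0.25, u));
```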

gamma

Gamma distribution functions.

ds.stats.gamma.pdf(x, { shape, scale })
ds.stats.gamma.cdf(x, { shape, scale })
ds.stats.gamma.quantile(p, { shape, scale })

beta

Beta distribution functions.

ds.stats.beta.pdf(x, { alpha, beta })
ds.stats.beta.cdf(x, { alpha, beta })
ds.stats.beta.quantile(p, { alpha, beta })

Hypothesis Testing

Functional API

Convenience functions that return result objects directly.

oneSampleTTest

ds.stats.hypothesis.oneSampleTTest(data, { mu0 })

Parameters:

  • data (Array<number>): Sample data
  • mu0 (number): Hypothesized mean (default: 0)

Returns: Object

{
  statistic: number,     // t-statistic
  pValue: number,        // p-value
  df: number,            // degrees of freedom
  mean: number,          // sample mean
  alternative: string
}
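To make the returned fields concrete, here is a minimal from-scratch sketch of the textbook one-sample t statistic. The function name oneSampleT is hypothetical and this is not the library's code; the p-value is omitted because it requires the Student t CDF.

```javascript
// Hypothetical helper: t = (mean - mu0) / (s / sqrt(n)), df = n - 1.
function oneSampleT(data, mu0 = 0) {
  const n = data.length;
  const mean = data.reduce((s, v) => s + v, 0) / n;
  // Unbiased sample variance (divides by n - 1).
  const variance = data.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const se = Math.sqrt(variance / n);
  return { statistic: (mean - mu0) / se, df: n - 1, mean };
}

const t1 = oneSampleT([5.1, 4.9, 4.7, 4.6, 5.0], 5);
console.log(t1.statistic.toFixed(3), t1.df, t1.mean.toFixed(2));
```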

twoSampleTTest

ds.stats.hypothesis.twoSampleTTest(sample1, sample2, options)

Parameters:

  • sample1 (Array<number>): First sample
  • sample2 (Array<number>): Second sample
  • options (Object, optional):
    • alternative (string): 'two-sided', 'less', or 'greater' (default: 'two-sided')
    • mu (number): Hypothesized difference in means (default: 0)

Returns: Object

{
  statistic: number,
  pValue: number,
  df: number,
  mean1: number,
  mean2: number,
  pooledSE: number,
  alternative: string
}

Example:

const groupA = [5.1, 4.9, 4.7, 4.6, 5.0];
const groupB = [7.0, 6.4, 6.9, 6.5, 6.3];

const result = ds.stats.hypothesis.twoSampleTTest(groupA, groupB);
console.log(`t = ${result.statistic}, p = ${result.pValue}`);
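The pooledSE field suggests an equal-variance (Student) test, although that is an inference from the field name; the library may use a different variant such as Welch's. Under that assumption, the statistic can be sketched by hand as follows (pooledT is a hypothetical name):

```javascript
// Hypothetical helper: equal-variance two-sample t statistic.
function pooledT(a, b) {
  const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;
  const sumSq = (xs, m) => xs.reduce((s, v) => s + (v - m) ** 2, 0);
  const mean1 = mean(a);
  const mean2 = mean(b);
  const df = a.length + b.length - 2;
  // Pooled variance combines both groups' sums of squares.
  const pooledVar = (sumSq(a, mean1) + sumSq(b, mean2)) / df;
  const pooledSE = Math.sqrt(pooledVar * (1 / a.length + 1 / b.length));
  return { statistic: (mean1 - mean2) / pooledSE, df, mean1, mean2, pooledSE };
}

const t2 = pooledT([5.1, 4.9, 4.7, 4.6, 5.0], [7.0, 6.4, 6.9, 6.5, 6.3]);
console.log(t2.statistic.toFixed(2), t2.df, t2.pooledSE.toFixed(4));
```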

chiSquareTest

ds.stats.hypothesis.chiSquareTest(observed, expected)

Parameters:

  • observed (Array<number>): Observed frequencies
  • expected (Array<number>): Expected frequencies

Returns: Object with statistic, pValue, df
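As an illustration of what the statistic measures, the sketch below computes the Pearson chi-square goodness-of-fit statistic by hand. The name chiSquareStatistic is hypothetical, and df = k - 1 assumes no parameters were estimated from the data.

```javascript
// Hypothetical helper: sum over cells of (observed - expected)^2 / expected.
function chiSquareStatistic(observed, expected) {
  if (observed.length !== expected.length) {
    throw new Error('observed and expected must have the same length');
  }
  const statistic = observed.reduce(
    (s, o, i) => s + (o - expected[i]) ** 2 / expected[i],
    0
  );
  return { statistic, df: observed.length - 1 };
}

const chi = chiSquareStatistic([18, 22, 20], [20, 20, 20]);
console.log(chi.statistic.toFixed(2), chi.df);
```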


oneWayAnova

ds.stats.hypothesis.oneWayAnova(groups)

Parameters:

  • groups (Array<Array<number>>): Array of groups to compare

Returns: Object

{
  statistic: number,     // F-statistic
  pValue: number,
  dfBetween: number,
  dfWithin: number,
  ssBetween: number,
  ssWithin: number
}

Example:

const groupA = [5.1, 4.9, 4.7];
const groupB = [6.0, 5.8, 6.2];
const groupC = [7.0, 6.9, 7.1];

const result = ds.stats.hypothesis.oneWayAnova([groupA, groupB, groupC]);
console.log(`F = ${result.statistic}, p = ${result.pValue}`);
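The returned sums of squares and degrees of freedom relate to F as sketched below. This is a from-scratch illustration under the standard one-way ANOVA definitions, not the library's code; anovaF is a hypothetical name.

```javascript
// Hypothetical helper: F = (ssBetween / dfBetween) / (ssWithin / dfWithin).
function anovaF(groups) {
  const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;
  const all = groups.flat();
  const grand = mean(all);
  let ssBetween = 0;
  let ssWithin = 0;
  for (const g of groups) {
    const m = mean(g);
    ssBetween += g.length * (m - grand) ** 2;
    ssWithin += g.reduce((s, v) => s + (v - m) ** 2, 0);
  }
  const dfBetween = groups.length - 1;
  const dfWithin = all.length - groups.length;
  const statistic = ssBetween / dfBetween / (ssWithin / dfWithin);
  return { statistic, dfBetween, dfWithin, ssBetween, ssWithin };
}

const f = anovaF([[5.1, 4.9, 4.7], [6.0, 5.8, 6.2], [7.0, 6.9, 7.1]]);
console.log(f.statistic.toFixed(1), f.dfBetween, f.dfWithin);
```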

pairedTTest

Paired samples t-test for dependent samples.

ds.stats.pairedTTest(sample1, sample2, options)

Parameters:

  • sample1 (Array<number>): First sample (must match length of sample2)
  • sample2 (Array<number>): Second sample
  • options.alternative (string): 'two-sided', 'less', or 'greater'

Returns: Object with statistic, pValue, df, meanDiff
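A paired t-test is a one-sample t-test on the element-wise differences. The sketch below shows that reduction; pairedT is a hypothetical name, and taking differences as sample1 minus sample2 is an assumption about the sign convention.

```javascript
// Hypothetical helper: one-sample t-test on the paired differences.
function pairedT(sample1, sample2) {
  if (sample1.length !== sample2.length) {
    throw new Error('samples must have equal length');
  }
  const diffs = sample1.map((v, i) => v - sample2[i]);
  const n = diffs.length;
  const meanDiff = diffs.reduce((s, v) => s + v, 0) / n;
  const variance = diffs.reduce((s, v) => s + (v - meanDiff) ** 2, 0) / (n - 1);
  return { statistic: meanDiff / Math.sqrt(variance / n), df: n - 1, meanDiff };
}

const paired = pairedT([10, 12, 9, 11], [8, 11, 7, 10]);
console.log(paired.statistic.toFixed(3), paired.df, paired.meanDiff);
```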


mannWhitneyU

Non-parametric test for comparing two independent samples.

ds.stats.mannWhitneyU(sample1, sample2)

Returns: Object with statistic (U), pValue
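The U statistic can be understood as a count of cross-sample pairs. The sketch below is an illustration only (mannWhitneyUStat is a hypothetical name); reporting the smaller of the two U values is one common convention and may not match what the library returns.

```javascript
// Hypothetical helper: U1 counts pairs where a value from `a` exceeds one from `b`;
// ties count as half a pair.
function mannWhitneyUStat(a, b) {
  let u1 = 0;
  for (const x of a) {
    for (const y of b) {
      if (x > y) u1 += 1;
      else if (x === y) u1 += 0.5;
    }
  }
  const u2 = a.length * b.length - u1;
  return { u1, u2, statistic: Math.min(u1, u2) };
}

const mw = mannWhitneyUStat([1.1, 2.3, 3.0, 4.2], [2.9, 5.1, 6.4]);
console.log(mw.u1, mw.u2, mw.statistic);
```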


kruskalWallis

Non-parametric alternative to one-way ANOVA for comparing multiple groups.

ds.stats.kruskalWallis(groups)

Parameters:

  • groups (Array<Array<number>>): Array of groups to compare

Returns: Object with statistic (H), pValue, df
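For intuition, H can be computed from the rank sums of the pooled data. The sketch below uses average ranks for ties but applies no tie correction to H, which a full implementation would normally add; both helper names are hypothetical.

```javascript
// Hypothetical helper: 1-based ranks, with tied values sharing their average rank.
function averageRanks(values) {
  const order = values.map((v, i) => [v, i]).sort((p, q) => p[0] - q[0]);
  const ranks = new Array(values.length);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const avg = (i + j) / 2 + 1; // mean of ranks i+1 .. j+1
    for (let k = i; k <= j; k++) ranks[order[k][1]] = avg;
    i = j + 1;
  }
  return ranks;
}

// H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction.
function kruskalWallisH(groups) {
  const ranks = averageRanks(groups.flat());
  const n = ranks.length;
  let offset = 0;
  let sum = 0;
  for (const g of groups) {
    const r = ranks.slice(offset, offset + g.length).reduce((s, v) => s + v, 0);
    sum += (r * r) / g.length;
    offset += g.length;
  }
  return { statistic: (12 / (n * (n + 1))) * sum - 3 * (n + 1), df: groups.length - 1 };
}

const kw = kruskalWallisH([[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
console.log(kw.statistic.toFixed(2), kw.df);
```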


tukeyHSD

Post-hoc pairwise comparisons after ANOVA.

ds.stats.hypothesis.tukeyHSD(groups, options)

Parameters:

  • groups (Array<Array<number>> | Object): Groups to compare, or { data, y, group } format
  • options.names (Array<string>): Optional group names

Returns: Object with comparisons array containing pairwise results

Example:

const result = ds.stats.hypothesis.tukeyHSD([groupA, groupB, groupC], {
  names: ['Control', 'Treatment1', 'Treatment2']
});
// result.comparisons: [{comparison: 'Control - Treatment1', diff, pValue, ci}, ...]

fisherExactTest

Fisher’s exact test for 2x2 contingency tables.

ds.stats.fisherExactTest(table)

Parameters:

  • table (Array<Array<number>>): 2x2 contingency table [[a, b], [c, d]]

Returns: Object with pValue, oddsRatio
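The following sketch shows one common way to compute these two values for a 2x2 table: the sample odds ratio (a·d)/(b·c), and a two-sided p-value that sums the hypergeometric probabilities of all tables no more likely than the observed one. The function names are hypothetical, and the library may define the two-sided p-value or the odds ratio differently.

```javascript
// Hypothetical helper: binomial coefficient C(n, k).
function choose(n, k) {
  let c = 1;
  for (let i = 1; i <= k; i++) c = (c * (n - k + i)) / i;
  return c;
}

// Hypothetical helper: two-sided Fisher exact test for [[a, b], [c, d]].
function fisherExact([[a, b], [c, d]]) {
  const r1 = a + b;
  const r2 = c + d;
  const c1 = a + c;
  const n = r1 + r2;
  // Hypergeometric probability that the top-left cell equals k.
  const prob = k => (choose(r1, k) * choose(r2, c1 - k)) / choose(n, c1);
  const pObs = prob(a);
  let pValue = 0;
  for (let k = Math.max(0, c1 - r2); k <= Math.min(r1, c1); k++) {
    const p = prob(k);
    if (p <= pObs * (1 + 1e-7)) pValue += p; // small tolerance for rounding
  }
  return { pValue: Math.min(1, pValue), oddsRatio: (a * d) / (b * c) };
}

const fe = fisherExact([[3, 1], [1, 3]]);
console.log(fe.pValue.toFixed(4), fe.oddsRatio);
```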


leveneTest

Test for homogeneity of variances across groups.

ds.stats.leveneTest(groups)

Parameters:

  • groups (Array<Array<number>>): Array of groups

Returns: Object with statistic, pValue, df1, df2


Correlation Tests

pearsonCorrelation

Pearson correlation coefficient with significance test.

ds.stats.pearsonCorrelation(x, y)

Returns: Object

{
  r: number,        // Correlation coefficient (-1 to 1)
  pValue: number,   // Two-tailed p-value
  n: number         // Sample size
}
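A from-scratch sketch of r and the t statistic usually used for its significance test (t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom) is shown below. The name pearsonR and the tStatistic field are hypothetical; whether the library derives its p-value this way is an assumption.

```javascript
// Hypothetical helper: Pearson r from centered cross-products.
function pearsonR(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  const r = sxy / Math.sqrt(sxx * syy);
  return { r, n, tStatistic: r * Math.sqrt((n - 2) / (1 - r * r)) };
}

const pc = pearsonR([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
console.log(pc.r.toFixed(4), pc.tStatistic.toFixed(3));
```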

spearmanCorrelation

Spearman rank correlation coefficient.

ds.stats.spearmanCorrelation(x, y)

Returns: Object with rho, pValue, n


Effect Sizes

cohensD

Cohen’s d for standardized mean difference between two groups.

ds.stats.cohensD(sample1, sample2)

Returns: number (effect size)

Interpretation (conventional thresholds, applied to the absolute value |d|):

  • |d| < 0.2: Negligible
  • 0.2 <= |d| < 0.5: Small
  • 0.5 <= |d| < 0.8: Medium
  • |d| >= 0.8: Large
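For reference, the usual definition divides the mean difference by the pooled standard deviation. The sketch below assumes that definition (the library may use a variant) and the hypothetical name cohensDSketch.

```javascript
// Hypothetical helper: d = (mean1 - mean2) / pooled standard deviation.
function cohensDSketch(a, b) {
  const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;
  const sumSq = (xs, m) => xs.reduce((s, v) => s + (v - m) ** 2, 0);
  const m1 = mean(a);
  const m2 = mean(b);
  const pooledVar = (sumSq(a, m1) + sumSq(b, m2)) / (a.length + b.length - 2);
  return (m1 - m2) / Math.sqrt(pooledVar);
}

const d = cohensDSketch([5.1, 4.9, 4.7, 4.6, 5.0], [7.0, 6.4, 6.9, 6.5, 6.3]);
console.log(d.toFixed(2)); // negative because the first group has the lower mean
```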

etaSquared

Eta-squared effect size for ANOVA.

ds.stats.etaSquared(groups)

Returns: number (proportion of variance explained, 0 to 1)
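Eta-squared is the between-group sum of squares divided by the total sum of squares. A minimal sketch under that definition (etaSquaredSketch is a hypothetical name):

```javascript
// Hypothetical helper: eta^2 = ssBetween / (ssBetween + ssWithin).
function etaSquaredSketch(groups) {
  const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;
  const grand = mean(groups.flat());
  let ssBetween = 0;
  let ssWithin = 0;
  for (const g of groups) {
    const m = mean(g);
    ssBetween += g.length * (m - grand) ** 2;
    ssWithin += g.reduce((s, v) => s + (v - m) ** 2, 0);
  }
  return ssBetween / (ssBetween + ssWithin);
}

const eta2 = etaSquaredSketch([[5.1, 4.9, 4.7], [6.0, 5.8, 6.2], [7.0, 6.9, 7.1]]);
console.log(eta2.toFixed(4));
```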


omegaSquared

Omega-squared effect size for ANOVA (less biased than eta-squared).

ds.stats.omegaSquared(groups)

Returns: number


Multiple Testing Corrections

bonferroni

Bonferroni correction for multiple comparisons.

ds.stats.bonferroni(pValues, alpha)

Parameters:

  • pValues (Array<number>): Array of p-values
  • alpha (number): Significance level (default: 0.05)

Returns: Object with adjusted (corrected p-values), significant (boolean array)
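The adjustment itself is a one-liner: multiply each p-value by the number of tests and cap at 1. The sketch below mirrors the documented return shape; bonferroniAdjust is a hypothetical name, and comparing with <= alpha is an assumption.

```javascript
// Hypothetical helper: Bonferroni-adjusted p-values and significance flags.
function bonferroniAdjust(pValues, alpha = 0.05) {
  const m = pValues.length;
  const adjusted = pValues.map(p => Math.min(1, p * m));
  return { adjusted, significant: adjusted.map(p => p <= alpha) };
}

const bonf = bonferroniAdjust([0.004, 0.03, 0.05, 0.15, 0.20]);
console.log(bonf.adjusted, bonf.significant);
```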


holmBonferroni

Holm-Bonferroni step-down correction (less conservative than Bonferroni).

ds.stats.holmBonferroni(pValues, alpha)

Returns: Object with adjusted, significant
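The step-down procedure can be sketched as follows: sort the p-values, multiply the i-th smallest by (m - i + 1), then enforce monotonicity with a running maximum. This is an illustration with the hypothetical name holmAdjust, not the library's code.

```javascript
// Hypothetical helper: Holm-Bonferroni adjusted p-values, returned in input order.
function holmAdjust(pValues, alpha = 0.05) {
  const m = pValues.length;
  // Indices sorted by ascending p-value.
  const order = pValues.map((p, i) => i).sort((i, j) => pValues[i] - pValues[j]);
  const adjusted = new Array(m);
  let running = 0;
  order.forEach((idx, rank) => {
    // rank is 0-based, so the multiplier is m, m - 1, ..., 1.
    running = Math.max(running, Math.min(1, (m - rank) * pValues[idx]));
    adjusted[idx] = running;
  });
  return { adjusted, significant: adjusted.map(p => p <= alpha) };
}

const holm = holmAdjust([0.03, 0.004, 0.20, 0.05, 0.15]);
console.log(holm.adjusted, holm.significant);
```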


fdr

Benjamini-Hochberg False Discovery Rate correction.

ds.stats.fdr(pValues, alpha)

Returns: Object with adjusted, significant

Example:

const pValues = [0.01, 0.03, 0.05, 0.15, 0.20];
const result = ds.stats.fdr(pValues, 0.05);
console.log(result.adjusted);    // Adjusted p-values
console.log(result.significant); // Standard BH at alpha = 0.05 flags only the first: [true, false, false, false, false]
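To see where the adjusted values come from, the standard Benjamini-Hochberg adjustment can be written out by hand: scale the p-value of ascending rank k by m/k, then take a running minimum from the largest p-value downward. The sketch below uses the hypothetical name bhAdjust and is not the library's implementation.

```javascript
// Hypothetical helper: Benjamini-Hochberg adjusted p-values, in input order.
function bhAdjust(pValues) {
  const m = pValues.length;
  // Indices sorted by descending p-value.
  const order = pValues.map((p, i) => i).sort((i, j) => pValues[j] - pValues[i]);
  const adjusted = new Array(m);
  let running = 1;
  order.forEach((idx, k) => {
    const rank = m - k; // 1-based rank in ascending order
    running = Math.min(running, (pValues[idx] * m) / rank);
    adjusted[idx] = running;
  });
  return adjusted;
}

const bh = bhAdjust([0.01, 0.03, 0.05, 0.15, 0.20]);
console.log(bh.map(p => p.toFixed(4)));
```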

Class-based API

Estimator-style classes: construct with options, call .fit() to run the test, then read the result with .summary().

OneSampleTTest

new ds.stats.OneSampleTTest(options)

Methods:

  • .fit(data) - Perform the test
  • .summary() - Formatted summary

TwoSampleTTest

new ds.stats.TwoSampleTTest(options)

ChiSquareTest

new ds.stats.ChiSquareTest()

OneWayAnova

new ds.stats.OneWayAnova()

TukeyHSD

new ds.stats.TukeyHSD()

Post-hoc pairwise comparisons with Tukey’s Honest Significant Difference.

const tukey = new ds.stats.TukeyHSD();
tukey.fit({ data: myData, y: 'value', group: 'treatment' });
console.log(tukey.summary());

Generalized Linear Models

GLM

Fit generalized linear models including linear regression, logistic regression, and more.

new ds.stats.GLM(options)

Constructor Options

{
  family: string,          // 'gaussian', 'binomial', 'poisson', 'gamma'
  link: string,            // 'identity', 'log', 'logit', 'probit', etc.
  randomEffects: Object,   // For mixed-effects models (optional)
  multiclass: string       // 'ovr' (one-vs-rest) for multiclass (optional)
}

Common Configurations:

Model Type           Family    Link      Use For
Linear Regression    gaussian  identity  Continuous outcomes
Logistic Regression  binomial  logit     Binary outcomes (0/1)
Poisson Regression   poisson   log       Count data
Gamma Regression     gamma     inverse   Positive continuous, skewed
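Each link maps the mean of the outcome to the linear predictor, and its inverse maps predictions back to the outcome scale. The sketch below writes out the four links from the table using their standard definitions; the links object is a hypothetical illustration, not a ds.stats export.

```javascript
// Hypothetical lookup of link functions and their inverses.
const links = {
  identity: { link: mu => mu, inverse: eta => eta },
  log: { link: mu => Math.log(mu), inverse: eta => Math.exp(eta) },
  logit: {
    link: mu => Math.log(mu / (1 - mu)),
    inverse: eta => 1 / (1 + Math.exp(-eta)),
  },
  inverse: { link: mu => 1 / mu, inverse: eta => 1 / eta },
};

// A linear predictor of 0 corresponds to probability 0.5 under the logit link.
console.log(links.logit.inverse(0));
// Round trip: inverse(link(mu)) recovers mu.
console.log(links.log.inverse(links.log.link(3)));
```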

Regularization:

GLM supports L2 (Ridge) regularization via the alpha and l1_ratio parameters:

new ds.stats.GLM({
  family: 'gaussian',
  alpha: 0.1,      // Regularization strength
  l1_ratio: 0.0    // 0 = pure L2 (Ridge), 1 = pure L1 (Lasso)
})

Known Limitation: L1 (Lasso) regularization is not yet implemented. Setting l1_ratio > 0 falls back to L2-only regularization and logs a console warning. Elastic Net and Lasso require coordinate descent optimization, which is planned for a future release.

Methods

.fit()

Fit the model to data. Supports three input styles:

Array API:

model.fit(X, y)

Table API:

model.fit({
  data: myData,
  X: ['feature1', 'feature2'],
  y: 'outcome'
})

Formula API:

model.fit({
  formula: 'outcome ~ feature1 + feature2',
  data: myData
})

.predict()

Make predictions.

Array API:

const predictions = model.predict(XNew)

Table API:

const predictions = model.predict({
  data: newData,
  X: ['feature1', 'feature2']
})

.summary()

Get model summary statistics.

const summary = model.summary()
console.log(summary)

Returns a formatted string containing:

  • Coefficient estimates
  • Standard errors
  • z-values
  • Confidence intervals
  • Model fit statistics (AIC, BIC, R-squared)

Properties

model.coefficients   // { '(Intercept)': 2.5, 'feature1': 0.8, ... }
model.fitted         // [2.1, 3.4, 2.8, ...]
model.residuals      // [-0.1, 0.2, -0.05, ...]
model.deviance       // Model deviance
model.nullDeviance   // Null deviance
model.aic            // Akaike Information Criterion
model.bic            // Bayesian Information Criterion
model.pseudoR2       // Pseudo R-squared (for GLMs)
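For orientation, the standard definitions of the two information criteria are AIC = 2k - 2·ln(L) and BIC = k·ln(n) - 2·ln(L), where k is the number of estimated parameters, n the number of observations, and ln(L) the maximized log-likelihood. The sketch below uses hypothetical helper names and made-up inputs; how the library counts parameters is not stated here.

```javascript
// Hypothetical helpers implementing the textbook formulas.
function aic(logLik, k) {
  return 2 * k - 2 * logLik;
}

function bic(logLik, k, n) {
  return k * Math.log(n) - 2 * logLik;
}

// Made-up values for illustration: log-likelihood -10, 3 parameters, 100 rows.
console.log(aic(-10, 3), bic(-10, 3, 100).toFixed(3));
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily once n exceeds about 7, since ln(n) is then greater than 2.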

Examples

Linear Regression:

const lm = new ds.stats.GLM({ family: 'gaussian' });

lm.fit({
  X: ['height', 'weight'],
  y: 'blood_pressure',
  data: healthData
});

console.log(lm.summary());

const predictions = lm.predict({
  data: newPatients,
  X: ['height', 'weight']
});

Logistic Regression:

const logit = new ds.stats.GLM({ family: 'binomial', link: 'logit' });

logit.fit({
  X: ['beak_length', 'flipper_length'],
  y: 'is_adelie',
  data: penguins
});

const probabilities = logit.predict({
  data: newPenguins,
  X: ['beak_length', 'flipper_length']
});

Multiclass Logistic Regression:

const multiclass = new ds.stats.GLM({
  family: 'binomial',
  multiclass: 'ovr'
});

multiclass.fit({
  X: ['feature1', 'feature2'],
  y: 'species',
  data: myData
});

Mixed-Effects Model (GLMM):

const lme = new ds.stats.GLM({
  family: 'gaussian',
  randomEffects: {
    intercept: nestedData.map(d => d.site)
  }
});

lme.fit({
  X: ['treatment', 'age'],
  y: 'response',
  data: nestedData
});

console.log(lme.summary());
// Shows fixed effects and random effects variance

Formula Syntax

GLM supports R-style formula syntax:

// Simple linear model
formula: 'y ~ x'

// Multiple predictors
formula: 'y ~ x1 + x2 + x3'

// Interactions
formula: 'y ~ x1 * x2'  // x1 + x2 + x1:x2

// Transformations
formula: 'y ~ log(x1) + sqrt(x2)'

// Polynomials
formula: 'y ~ poly(x, 3)'

// Mixed effects
formula: 'y ~ x1 + (1 | group)'
formula: 'y ~ x1 + (1 + time | subject)'

Model Comparison

compareModels

Compare multiple models with AIC/BIC.

const comparison = ds.stats.compareModels([model1, model2, model3])
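The shape of the comparison result is not documented here. As a rough idea of what an AIC-based comparison involves, the sketch below ranks plain objects by AIC and reports each model's difference from the best one. The function rankByAic, the name field, and the AIC values are all hypothetical.

```javascript
// Hypothetical helper: sort by AIC and add the gap to the best (lowest) AIC.
function rankByAic(models) {
  const best = Math.min(...models.map(m => m.aic));
  return models
    .map(m => ({ name: m.name, aic: m.aic, deltaAic: m.aic - best }))
    .sort((a, b) => a.aic - b.aic);
}

// Made-up AIC values for illustration only.
const ranked = rankByAic([
  { name: 'full', aic: 210.4 },
  { name: 'reduced', aic: 205.1 },
  { name: 'null', aic: 240.9 },
]);
console.log(ranked.map(m => `${m.name}: dAIC = ${m.deltaAic.toFixed(1)}`));
```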

likelihoodRatioTest

Likelihood ratio test for nested models.

const lrt = ds.stats.likelihoodRatioTest(model1, model2)
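For nested GLMs fitted to the same data, the likelihood ratio statistic equals the drop in deviance, and it is compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below shows only that arithmetic; lrtStatistic, the nParams field, and the numbers are hypothetical, and the argument order of the library function is not implied.

```javascript
// Hypothetical helper: deviance difference and parameter-count difference.
function lrtStatistic(reduced, full) {
  return {
    statistic: reduced.deviance - full.deviance,
    df: full.nParams - reduced.nParams,
  };
}

// Made-up deviances for illustration only.
const lr = lrtStatistic(
  { deviance: 120.4, nParams: 2 },
  { deviance: 112.1, nParams: 4 }
);
console.log(lr.statistic.toFixed(1), lr.df);
```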

pairwiseLRT

Pairwise likelihood ratio tests across all model pairs.

const results = ds.stats.pairwiseLRT([model1, model2, model3])

crossValidate / crossValidateModels

Cross-validate one or more models.

ds.stats.crossValidate(model, data, options)
ds.stats.crossValidateModels([model1, model2], data, options)
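The options accepted by these functions are not listed here. To illustrate the underlying idea, the sketch below builds k-fold train/test index splits by hand. kFoldIndices is a hypothetical helper; it assigns rows to folds round-robin without shuffling, which a real implementation would usually do first.

```javascript
// Hypothetical helper: split row indices 0..n-1 into k train/test partitions.
function kFoldIndices(n, k) {
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) folds[i % k].push(i);
  return folds.map((test, f) => ({
    test,
    // Training set is every index not in the current test fold.
    train: folds.filter((_, g) => g !== f).flat(),
  }));
}

const splits = kFoldIndices(10, 3);
console.log(splits.map(s => `${s.train.length} train / ${s.test.length} test`));
```

Each row appears in exactly one test fold, so averaging the per-fold scores uses every observation for evaluation once.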

Common Patterns

Test for Group Differences

// t-test for 2 groups
const tResult = ds.stats.hypothesis.twoSampleTTest(groupA, groupB);

// ANOVA for 3+ groups
const anovaResult = ds.stats.hypothesis.oneWayAnova([groupA, groupB, groupC]);

Predict Continuous Outcome

const model = new ds.stats.GLM({ family: 'gaussian' });
model.fit({ X: features, y: 'price', data: myData });
const predictions = model.predict({ data: newData, X: features });

Predict Binary Outcome

const model = new ds.stats.GLM({ family: 'binomial' });
model.fit({ X: features, y: 'is_fraud', data: transactions });
const probabilities = model.predict({ data: newTransactions, X: features });

Account for Hierarchical Data

const model = new ds.stats.GLM({
  family: 'gaussian',
  randomEffects: { intercept: nestedData.map(d => d.groupId) }
});
model.fit({ X: features, y: 'outcome', data: nestedData });

See Also

