Statistics API

Statistical analysis functions and models.

Table of Contents

  1. Overview
  2. Basic Statistics
    1. mean
    2. median
    3. stddev
    4. quantile
  3. Hypothesis Testing
    1. ttest
    2. anova
  4. Generalized Linear Models
    1. GLM
      1. Constructor Options
      2. Methods
        1. .fit()
        2. .predict()
        3. .summary()
      3. Examples
  5. Formula Syntax
  6. Model Diagnostics
    1. Coefficients
    2. Fitted Values
    3. Residuals
    4. Model Metrics
  7. Common Patterns
    1. Test for Group Differences
    2. Predict Continuous Outcome
    3. Predict Binary Outcome
    4. Account for Hierarchical Data
  8. See Also

Overview

The ds.stats module provides statistical functions for basic descriptive statistics, hypothesis testing, and generalized linear models.

Basic Statistics

mean

Calculate the arithmetic mean.

ds.stats.mean(values)

Parameters: values (array of numbers)

Returns: number

Example:

const avg = ds.stats.mean([1, 2, 3, 4, 5]);
// 3

median

Calculate the median value.

ds.stats.median(values)

Parameters: values (array of numbers)

Returns: number

Example:

const mid = ds.stats.median([1, 2, 3, 4, 5]);
// 3
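
For reference, the median can be computed by hand as in the sketch below. The helper `median` is illustrative and not part of ds.stats; averaging the two middle values for even-length input is the usual convention and is assumed here, not confirmed for this library.

```javascript
// Illustrative helper, not the ds.stats implementation.
function median(values) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd length: the middle element. Even length: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

const oddMedian = median([5, 1, 3, 2, 4]); // 3
const evenMedian = median([4, 1, 3, 2]);   // 2.5
```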

stddev

Calculate the standard deviation.

ds.stats.stddev(values, options)

Parameters: values (array of numbers), options (optional)

Returns: number

Example:

const sd = ds.stats.stddev([1, 2, 3, 4, 5]);
// ≈ 1.58 (sample standard deviation)
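
The difference between the sample and population standard deviation is only the divisor (n - 1 versus n). The sketch below makes that explicit; `standardDeviation` and its `sample` flag are hypothetical names for illustration and do not describe the options object that ds.stats.stddev accepts.

```javascript
// Illustrative helper, not the ds.stats implementation.
function standardDeviation(values, sample = true) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSquares = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  // Sample version divides by n - 1 (Bessel's correction); population version by n.
  return Math.sqrt(sumSquares / (sample ? n - 1 : n));
}

const sampleSd = standardDeviation([1, 2, 3, 4, 5]);            // sqrt(10 / 4)
const populationSd = standardDeviation([1, 2, 3, 4, 5], false); // sqrt(10 / 5)
```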

quantile

Calculate quantiles/percentiles.

ds.stats.quantile(values, q)

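Parameters: values (array of numbers), q (a number between 0 and 1, or an array of such numbers)

Returns: number | Array (a number when q is a number, an array when q is an array)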

Example:

// Single quantile
const q25 = ds.stats.quantile(data, 0.25);

// Multiple quantiles
const [lower, middle, upper] = ds.stats.quantile(data, [0.25, 0.5, 0.75]);
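
Several quantile definitions are in common use, and the one ds.stats applies is not stated here. As a point of comparison, the sketch below uses linear interpolation between order statistics, which is one widespread choice; the helper `quantile` is illustrative only.

```javascript
// Illustrative helper using linear interpolation; ds.stats may use another definition.
function quantile(values, q) {
  const sorted = [...values].sort((a, b) => a - b);
  const at = (p) => {
    // Fractional position of the requested quantile in the sorted array.
    const pos = (sorted.length - 1) * p;
    const lower = Math.floor(pos);
    const upper = Math.ceil(pos);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
  };
  // Mirror the documented behaviour: a number in gives a number out, an array gives an array.
  return Array.isArray(q) ? q.map(at) : at(q);
}

const sampleData = [1, 2, 3, 4, 5];
const firstQuartile = quantile(sampleData, 0.25);          // 2
const quartiles = quantile(sampleData, [0.25, 0.5, 0.75]); // [2, 3, 4]
```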

Hypothesis Testing

ttest

Perform Student’s t-test for two independent samples.

ds.stats.ttest(sample1, sample2, options)

Parameters: sample1, sample2 (arrays of numbers), options (optional)

Returns: Object

{
  statistic: number,     // t-statistic
  pValue: number,        // p-value
  degreesOfFreedom: number,
  mean1: number,         // Mean of sample1
  mean2: number,         // Mean of sample2
  alternative: string
}

Example:

const groupA = [5.1, 4.9, 4.7, 4.6, 5.0];
const groupB = [7.0, 6.4, 6.9, 6.5, 6.3];

const result = ds.stats.ttest(groupA, groupB);
console.log(`t = ${result.statistic}, p = ${result.pValue}`);
// If p < 0.05, reject the null hypothesis that the two group means are equal
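
To see where the statistic comes from, the sketch below computes a pooled-variance (Student's) t-statistic by hand for the same two groups. It assumes equal variances, which is what "Student's" usually implies; whether ds.stats.ttest pools variances is not confirmed here. The helper `pooledTStatistic` is illustrative, and the p-value is omitted because it needs the t-distribution CDF.

```javascript
// Illustrative helper, not the ds.stats implementation.
function pooledTStatistic(sample1, sample2) {
  const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const sumSquares = (xs, m) => xs.reduce((s, x) => s + (x - m) ** 2, 0);

  const mean1 = mean(sample1);
  const mean2 = mean(sample2);
  const degreesOfFreedom = sample1.length + sample2.length - 2;
  // Pooled variance: combined squared deviations over combined degrees of freedom.
  const pooledVariance =
    (sumSquares(sample1, mean1) + sumSquares(sample2, mean2)) / degreesOfFreedom;
  const standardError = Math.sqrt(
    pooledVariance * (1 / sample1.length + 1 / sample2.length)
  );
  return { statistic: (mean1 - mean2) / standardError, degreesOfFreedom, mean1, mean2 };
}

const tSketch = pooledTStatistic(
  [5.1, 4.9, 4.7, 4.6, 5.0],
  [7.0, 6.4, 6.9, 6.5, 6.3]
);
// tSketch.statistic is about -10.52 with 8 degrees of freedom
```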

anova

Perform one-way ANOVA.

ds.stats.anova(groups)

Parameters: groups (array of arrays of numbers, one array per group)

Returns: Object

{
  statistic: number,     // F-statistic
  pValue: number,        // p-value
  dfBetween: number,     // Degrees of freedom between groups
  dfWithin: number,      // Degrees of freedom within groups
  ssBetween: number,     // Sum of squares between groups
  ssWithin: number       // Sum of squares within groups
}

Example:

const groupA = [5.1, 4.9, 4.7];
const groupB = [6.0, 5.8, 6.2];
const groupC = [7.0, 6.9, 7.1];

const result = ds.stats.anova([groupA, groupB, groupC]);
console.log(`F = ${result.statistic}, p = ${result.pValue}`);
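
The fields of the result object follow the standard one-way ANOVA decomposition. The sketch below computes them by hand for the same three groups; `oneWayAnova` is an illustrative helper rather than the ds.stats implementation, and the p-value is left out because it needs the F-distribution CDF.

```javascript
// Illustrative helper, not the ds.stats implementation.
function oneWayAnova(groups) {
  const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const all = groups.flat();
  const grandMean = mean(all);

  let ssBetween = 0;
  let ssWithin = 0;
  for (const group of groups) {
    const groupMean = mean(group);
    // Between-group variation: group mean versus grand mean, weighted by group size.
    ssBetween += group.length * (groupMean - grandMean) ** 2;
    // Within-group variation: each value versus its own group mean.
    ssWithin += group.reduce((s, x) => s + (x - groupMean) ** 2, 0);
  }

  const dfBetween = groups.length - 1;
  const dfWithin = all.length - groups.length;
  const statistic = (ssBetween / dfBetween) / (ssWithin / dfWithin);
  return { statistic, dfBetween, dfWithin, ssBetween, ssWithin };
}

const anovaSketch = oneWayAnova([
  [5.1, 4.9, 4.7],
  [6.0, 5.8, 6.2],
  [7.0, 6.9, 7.1],
]);
// F is about 110.33 with dfBetween = 2 and dfWithin = 6
```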

Generalized Linear Models

GLM

Fit generalized linear models including linear regression, logistic regression, and more.

new ds.stats.GLM(options)

Constructor Options

{
  family: string,         // 'gaussian', 'binomial', 'poisson'
  link: string,           // 'identity', 'log', 'logit', 'probit', etc.
  randomEffects: Object,  // For mixed-effects models (optional)
  multiclass: string      // 'ovr' (one-vs-rest) for multiclass (optional)
}

Common Configurations:

Model Type           Family    Link      Use For
Linear Regression    gaussian  identity  Continuous outcomes
Logistic Regression  binomial  logit     Binary outcomes (0/1)
Poisson Regression   poisson   log       Count data
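
A link function maps the expected outcome onto the scale of the linear predictor, and its inverse maps predictions back. The sketch below writes out the three links used in the common configurations; the `links` object is purely illustrative and is not how ds.stats exposes them.

```javascript
// Illustrative definitions of the identity, log, and logit links and their inverses.
const links = {
  identity: { link: (mu) => mu, inverse: (eta) => eta },
  log: { link: (mu) => Math.log(mu), inverse: (eta) => Math.exp(eta) },
  logit: {
    link: (mu) => Math.log(mu / (1 - mu)),
    inverse: (eta) => 1 / (1 + Math.exp(-eta)),
  },
};

// A linear predictor of 0 corresponds to a probability of 0.5 under the logit link.
const probability = links.logit.inverse(0); // 0.5
// The log link guarantees a positive predicted mean for any linear predictor.
const expectedCount = links.log.inverse(-2);
```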

Methods

.fit()

Fit the model to data.

Array API:

model.fit(X, y)

Table API:

model.fit({
  data: myData,
  X: ['feature1', 'feature2'],
  y: 'outcome'
})

Formula API:

model.fit({
  formula: 'outcome ~ feature1 + feature2',
  data: myData
})
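
The Table API and the Array API describe the same data in two shapes. The sketch below shows one way the conversion could look, assuming `data` is an array of row objects keyed by column name; `toDesignMatrix` and the sample rows are hypothetical and only illustrate the relationship between the two call styles.

```javascript
// Illustrative helper: turn row objects plus column names into X and y arrays.
function toDesignMatrix(rows, featureNames, outcomeName) {
  // One inner array per row, with features in the order requested.
  const X = rows.map((row) => featureNames.map((name) => row[name]));
  const y = rows.map((row) => row[outcomeName]);
  return { X, y };
}

// Made-up rows for illustration only.
const sampleRows = [
  { feature1: 1.0, feature2: 10, outcome: 3.2 },
  { feature1: 2.0, feature2: 20, outcome: 5.1 },
];

const { X, y } = toDesignMatrix(sampleRows, ['feature1', 'feature2'], 'outcome');
// X is [[1, 10], [2, 20]] and y is [3.2, 5.1]
```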

.predict()

Make predictions.

Array API:

const predictions = model.predict(XNew)

Table API:

const predictions = model.predict({
  data: newData,
  X: ['feature1', 'feature2']
})

.summary()

Get model summary statistics.

const summary = model.summary()
console.log(summary)

Returns a formatted string describing the fitted model.

Examples

Linear Regression:

const lm = new ds.stats.GLM({ family: 'gaussian' });

lm.fit({
  X: ['height', 'weight'],
  y: 'blood_pressure',
  data: healthData
});

console.log(lm.summary());

const predictions = lm.predict({
  data: newPatients,
  X: ['height', 'weight']
});

Logistic Regression:

const logit = new ds.stats.GLM({ family: 'binomial', link: 'logit' });

logit.fit({
  X: ['beak_length', 'flipper_length'],
  y: 'is_adelie',  // Binary: 0 or 1
  data: penguins
});

const probabilities = logit.predict({
  data: newPenguins,
  X: ['beak_length', 'flipper_length']
});

Multiclass Logistic Regression:

const multiclass = new ds.stats.GLM({
  family: 'binomial',
  multiclass: 'ovr'  // One-vs-rest strategy
});

multiclass.fit({
  X: ['feature1', 'feature2'],
  y: 'species',  // 3+ categories
  data: myData
});

// Fits one binary model per class (3 models for 3 classes)
// Predicts the class whose model gives the highest probability
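
The one-vs-rest prediction step amounts to scoring a row with every per-class model and taking the best. The sketch below illustrates this with hand-written scoring functions; `predictOneVsRest`, the class labels, and the coefficients are all hypothetical stand-ins, not fitted ds.stats models.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Hypothetical per-class binary models: each returns P(class vs. rest) for a row.
const classModels = {
  classA: (row) => sigmoid(1 - row.feature1),
  classB: (row) => sigmoid(row.feature1 - row.feature2),
  classC: (row) => sigmoid(row.feature2 - 1),
};

// Score the row with every model and keep the label with the highest probability.
function predictOneVsRest(models, row) {
  let best = null;
  for (const [label, score] of Object.entries(models)) {
    const probability = score(row);
    if (best === null || probability > best.probability) {
      best = { label, probability };
    }
  }
  return best;
}

const ovrPrediction = predictOneVsRest(classModels, { feature1: 2, feature2: 1 });
// ovrPrediction.label is 'classB'
```

Note that the per-class probabilities come from separate models and need not sum to 1.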

Mixed-Effects Model:

const lme = new ds.stats.GLM({
  family: 'gaussian',
  randomEffects: {
    intercept: nestedData.map(d => d.site)  // Random intercepts by site
  }
});

lme.fit({
  X: ['treatment', 'age'],
  y: 'response',
  data: nestedData
});

console.log(lme.summary());
// Shows fixed effects and random effects variance

Formula Syntax

GLM supports R-style formula syntax:

// Simple linear model
formula: 'y ~ x'

// Multiple predictors
formula: 'y ~ x1 + x2 + x3'

// Interactions
formula: 'y ~ x1 * x2'  // x1 + x2 + x1:x2

// Categorical variables (auto-encoded)
formula: 'price ~ carat + cut + color'
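
The `*` operator is shorthand for main effects plus their interaction. The sketch below expands a formula string into its response and term list to make that concrete. `expandFormula` is an illustrative helper that handles only `+` and `*`, and it says nothing about how ds.stats parses formulas internally.

```javascript
// Illustrative helper: expand 'y ~ a * b + c' into a response and a list of terms.
function expandFormula(formula) {
  const [response, rhs] = formula.split('~').map((part) => part.trim());
  const terms = [];
  for (const chunk of rhs.split('+')) {
    const factors = chunk.split('*').map((f) => f.trim());
    // Every non-empty subset of the factors becomes a term.
    const subsets = [];
    for (let mask = 1; mask < (1 << factors.length); mask++) {
      subsets.push(factors.filter((_, i) => mask & (1 << i)));
    }
    // Main effects first, then interactions of increasing order.
    subsets.sort((a, b) => a.length - b.length);
    for (const subset of subsets) {
      const term = subset.join(':');
      if (!terms.includes(term)) terms.push(term);
    }
  }
  return { response, terms };
}

const expanded = expandFormula('y ~ x1 * x2');
// { response: 'y', terms: ['x1', 'x2', 'x1:x2'] }
```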

Model Diagnostics

Coefficients

model.coefficients
// { '(Intercept)': 2.5, 'feature1': 0.8, 'feature2': -0.3 }

Fitted Values

model.fitted
// [2.1, 3.4, 2.8, ...]

Residuals

model.residuals
// [-0.1, 0.2, -0.05, ...]
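
Fitted values and residuals are linked by a simple identity when residuals are on the response scale: residual = observed - fitted. The sketch below assumes that definition; GLM software sometimes reports deviance or Pearson residuals instead, and which kind `model.residuals` holds is not stated here. The observed values are made up for illustration.

```javascript
// Made-up observed outcomes alongside fitted values from a model.
const observed = [2.0, 3.6, 2.75];
const fitted = [2.1, 3.4, 2.8];

// Response residuals: how far each observation lies from its fitted value.
const responseResiduals = observed.map((value, i) => value - fitted[i]);

// Residual sum of squares, a common summary of overall misfit.
const residualSumSquares = responseResiduals.reduce((sum, r) => sum + r ** 2, 0);
```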

Model Metrics

model.deviance        // Model deviance
model.nullDeviance    // Null deviance
model.aic             // Akaike Information Criterion
model.bic             // Bayesian Information Criterion
model.pseudoR2        // Pseudo R² (for GLMs)
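
These metrics have standard textbook definitions, sketched below for orientation. The helpers `informationCriteria` and `deviancePseudoR2` are illustrative; in particular, several pseudo R² variants exist, and the deviance-based one shown here is an assumption rather than a statement of what `model.pseudoR2` returns.

```javascript
// AIC and BIC from a log-likelihood, parameter count, and sample size.
function informationCriteria(logLikelihood, numParams, numObs) {
  return {
    aic: 2 * numParams - 2 * logLikelihood,
    bic: numParams * Math.log(numObs) - 2 * logLikelihood,
  };
}

// Deviance-based pseudo R²: the share of null deviance explained by the model.
function deviancePseudoR2(deviance, nullDeviance) {
  return 1 - deviance / nullDeviance;
}

// Made-up inputs for illustration.
const criteria = informationCriteria(-50, 3, 100); // aic is 106
const explained = deviancePseudoR2(40, 100);       // 0.6
```

Lower AIC and BIC indicate a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily once the sample size exceeds about 8 observations.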

Common Patterns

Test for Group Differences

// t-test for 2 groups
const tResult = ds.stats.ttest(groupA, groupB);

// ANOVA for 3+ groups
const fResult = ds.stats.anova([groupA, groupB, groupC]);

Predict Continuous Outcome

const model = new ds.stats.GLM({ family: 'gaussian' });
model.fit({ X: features, y: 'price', data: myData });
const predictions = model.predict({ data: newData, X: features });

Predict Binary Outcome

const model = new ds.stats.GLM({ family: 'binomial' });
model.fit({ X: features, y: 'is_fraud', data: transactions });
const probabilities = model.predict({ data: newTransactions, X: features });

Account for Hierarchical Data

const model = new ds.stats.GLM({
  family: 'gaussian',
  randomEffects: { intercept: nestedData.map(d => d.groupId) }
});
model.fit({ X: features, y: 'outcome', data: nestedData });

See Also