Multivariate Analysis API

Ordination, canonical correlation, and clustering utilities.

Table of Contents

  1. Overview
    1. Supported Estimators
  2. Data Inputs
    1. Array API
    2. Declarative Table API
  3. Principal Component Analysis (PCA)
    1. Constructor
    2. Methods
    3. Examples
  4. Linear Discriminant Analysis (LDA)
    1. Constructor
    2. Methods
    3. Example
  5. Redundancy Analysis (RDA)
    1. Constructor
    2. Methods
    3. Example
  6. Canonical Correlation Analysis (CCA)
    1. Constructor
    2. Methods
    3. Example
  7. Hierarchical Clustering (HCA)
    1. Constructor
    2. Methods
    3. Example
  8. K-Means Clustering
    1. Constructor
    2. Methods
    3. Example
  9. Workflow Tips

Overview

The ds.mva namespace covers techniques that explain structure in multiple variables at once; the supported estimators are summarized in the table below.

These estimators understand both the numeric Array API (Array<Array<number>>) and the declarative Table API ({ data, X, y }). Use ds.plot.ordiplot(model).show(Plot) to render biplots from any fitted ordination.

Supported Estimators

Estimator     Type         Goal                                                     Works With
ds.mva.PCA    Transformer  Maximize variance w^T S w subject to ||w|| = 1           Array + Table
ds.mva.LDA    Classifier   Maximize Fisher ratio |S_B| / |S_W| for labeled groups   Array + Table
ds.mva.RDA    Transformer  Explain responses Y via predictors X before ordination   Array + Table
ds.mva.CCA    Transformer  Maximize correlation between X and Y scores              Array + Table
ds.ml.HCA     Estimator    Agglomerative clustering with linkage strategies         Array + Table
ds.ml.KMeans  Estimator    Partition data into k centroids minimizing SSE           Array + Table

Data Inputs

Array API

Pass numeric matrices directly:

const scores = pca.fit(X).transform(Xnew);

Declarative Table API

Provide structured data with column selectors:

pca.fit({
  data: penguins,
  columns: ['bill_length_mm', 'bill_depth_mm'],
  center: true,
  scale: true
});

Principal Component Analysis (PCA)

PCA finds orthogonal directions w that maximize variance w^T S w subject to ||w|| = 1. Components are eigenvectors of the covariance (or correlation) matrix S.
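To make the objective concrete, here is a minimal standalone sketch (not the ds.mva.PCA implementation) that recovers the first component by power iteration on the sample covariance matrix; the helper names `covariance` and `firstComponent` are illustrative only:

```javascript
// Sketch: first principal component via power iteration on the
// sample covariance matrix S (illustrative; not ds.mva.PCA).

function covariance(X) {
  const n = X.length, d = X[0].length;
  const mean = Array(d).fill(0);
  for (const row of X) row.forEach((v, j) => (mean[j] += v / n));
  const S = Array.from({ length: d }, () => Array(d).fill(0));
  for (const row of X)
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++)
        S[i][j] += ((row[i] - mean[i]) * (row[j] - mean[j])) / (n - 1);
  return S;
}

function firstComponent(X, iters = 200) {
  const S = covariance(X);
  let w = Array(S.length).fill(1);
  for (let t = 0; t < iters; t++) {
    // Multiply S * w, then renormalize to enforce ||w|| = 1.
    const Sw = S.map(row => row.reduce((acc, s, j) => acc + s * w[j], 0));
    const norm = Math.hypot(...Sw);
    w = Sw.map(v => v / norm);
  }
  return w;
}

// Points lying nearly on the line y = x, so w should approach
// the 45-degree direction [0.707, 0.707].
const Xdemo = [[0, 0], [1, 1.1], [2, 1.9], [3, 3.2], [4, 4.0]];
const w1 = firstComponent(Xdemo);
```

Repeated multiplication by S amplifies the dominant eigenvector, which is exactly the direction maximizing w^T S w under the unit-norm constraint.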

Constructor

const pca = new ds.mva.PCA({
  center: true,
  scale: false,
  columns: null,
  omit_missing: true,
  scaling: 2
});

Methods

Examples

// Declarative workflow
const pca = new ds.mva.PCA({ center: true, scale: true });
pca.fit({
  data: penguins,
  columns: ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm']
});

const scores = pca.getScores('sites');
const loadings = pca.getScores('variables');
ds.plot.ordiplot(pca.model).show(Plot);

// Numeric matrices
const X = [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]];
pca.fit(X);
const projected = pca.transform(X);

Linear Discriminant Analysis (LDA)

LDA finds projection vectors w that maximize the Fisher criterion w^T S_B w / w^T S_W w (between-class scatter relative to within-class scatter). It provides both dimensionality reduction and supervised classification.
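The criterion itself is easy to evaluate directly. The sketch below (illustrative only, not the ds.mva.LDA implementation; `fisherRatio` is a hypothetical helper) computes J(w) for a candidate projection of two labeled groups and shows that a direction along the class separation scores far higher than one across it:

```javascript
// Sketch: evaluate the Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w)
// for a candidate direction w, two classes (not ds.mva.LDA itself).

function fisherRatio(classA, classB, w) {
  const project = row => row.reduce((acc, v, j) => acc + v * w[j], 0);
  const a = classA.map(project), b = classB.map(project);
  const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;
  const ma = mean(a), mb = mean(b);
  // Within-class scatter: squared deviations summed inside each group.
  const scatter = (xs, m) => xs.reduce((s, v) => s + (v - m) ** 2, 0);
  const sw = scatter(a, ma) + scatter(b, mb);
  // Between-class scatter: squared distance between projected means.
  const sb = (ma - mb) ** 2;
  return sb / sw;
}

// Two groups separated along the first axis only.
const groupA = [[0, 0], [0.2, 1], [0.1, -1]];
const groupB = [[5, 0], [5.2, 1], [4.9, -1]];
const alongSeparation = fisherRatio(groupA, groupB, [1, 0]);
const acrossSeparation = fisherRatio(groupA, groupB, [0, 1]);
```

LDA solves for the w that maximizes this ratio rather than testing candidates, but the objective being compared is the same.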

Constructor

const lda = new ds.mva.LDA({ scale: false, scaling: 2 });

Methods

Example

const lda = new ds.mva.LDA({ scale: true });
lda.fit({
  data: iris,
  X: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
  y: 'species'
});

const preds = lda.predict({
  data: iris,
  X: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
});
const axes = lda.getScores('variables');

Redundancy Analysis (RDA)

RDA combines multivariate regression with PCA: regress responses Y onto predictors X, then perform PCA on fitted values. Constrained axes maximize variance explained by X.
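The regression half of that pipeline can be sketched on its own. Assuming a single predictor for simplicity (so ordinary least squares reduces to slope/intercept formulas), this illustrative helper `fittedValues` (not the ds.mva.RDA implementation) produces the fitted response matrix that the subsequent PCA step would decompose:

```javascript
// Sketch: the regression step of RDA with one predictor x — fit each
// response column by OLS and keep the fitted values (not ds.mva.RDA).

function fittedValues(x, Y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const sxx = x.reduce((s, v) => s + (v - mx) ** 2, 0);
  const nCols = Y[0].length;
  const fitted = Y.map(() => Array(nCols).fill(0));
  for (let j = 0; j < nCols; j++) {
    const yj = Y.map(row => row[j]);
    const my = yj.reduce((s, v) => s + v, 0) / n;
    const sxy = x.reduce((s, v, i) => s + (v - mx) * (yj[i] - my), 0);
    const slope = sxy / sxx;
    const intercept = my - slope * mx;
    for (let i = 0; i < n; i++) fitted[i][j] = intercept + slope * x[i];
  }
  return fitted;
}

// Two response columns constructed to be exactly linear in x,
// so the fitted values reproduce the responses.
const xPred = [1, 2, 3, 4];
const responses = [[2, 10], [4, 8], [6, 6], [8, 4]];
const Yhat = fittedValues(xPred, responses);
```

Running PCA on Yhat instead of Y is what makes the resulting axes "constrained": they can only span variation that the predictors explain.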

Constructor

const rda = new ds.mva.RDA({
  scale: false,
  omit_missing: true,
  scaling: 2,
  constrained: true
});

Methods

Example

const rda = new ds.mva.RDA({ scale: true });
rda.fit({
  data: drought,
  response: ['soil_moisture', 'soil_temp'],
  predictors: ['precip', 'evap', 'wind']
});

const siteScores = rda.getScores('sites');
const envVectors = rda.getScores('constraints');

Canonical Correlation Analysis (CCA)

CCA finds weight vectors w_x, w_y that maximize the correlation between projected blocks X w_x and Y w_y. Multiple pairs of canonical variates are extracted sequentially.
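A degenerate case makes the objective tangible: when each block has a single column, the weights drop out and the first canonical correlation is just the absolute Pearson correlation between the two columns. The sketch below (illustrative only; not the ds.mva.CCA implementation) uses data constructed to be perfectly anti-correlated:

```javascript
// Sketch: with one variable per block, the first canonical correlation
// reduces to |corr(x, y)| (illustration of the objective; not ds.mva.CCA).

function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Constructed so the columns are perfectly anti-correlated (r = -1).
const xBlock = [120, 130, 125, 140, 135];
const yBlock = [50, 44, 47, 38, 41];
const firstCanonicalCorr = Math.abs(pearson(xBlock, yBlock));
```

With multi-column blocks, CCA generalizes this by searching over w_x and w_y; later variate pairs repeat the search under orthogonality constraints to the earlier ones.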

Constructor

const cca = new ds.mva.CCA({
  center: true,
  scale: false
});

Pass any numeric hyperparameters needed by the functional API (for most uses, centering/scaling flags are sufficient).

Methods

Example

const cca = new ds.mva.CCA();
cca.fit({
  data: study,
  X: ['blood_pressure', 'cholesterol'],
  Y: ['vo2max', 'time_to_exhaustion']
});

const serumScores = cca.transformX({
  data: study,
  X: ['blood_pressure', 'cholesterol']
});
const fitnessScores = cca.transformY({
  data: study,
  Y: ['vo2max', 'time_to_exhaustion']
});

Hierarchical Clustering (HCA)

ds.ml.HCA wraps the agglomerative clustering routines (single, complete, average, ward). It builds a dendrogram by iteratively merging the closest clusters until a single cluster remains.
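The merge loop itself is short. Here is a minimal single-linkage sketch on 1-D points (illustrative only, not the ds.ml.HCA implementation; `singleLinkage` is a hypothetical helper), stopping when the requested number of clusters remains rather than building the full dendrogram:

```javascript
// Sketch: agglomerative clustering with single linkage on 1-D points,
// repeatedly merging the two closest clusters (not ds.ml.HCA itself).

function singleLinkage(points, k) {
  let clusters = points.map(p => [p]);
  // Single linkage: distance between the closest pair of members.
  const dist = (a, b) => {
    let best = Infinity;
    for (const x of a) for (const y of b) best = Math.min(best, Math.abs(x - y));
    return best;
  };
  while (clusters.length > k) {
    let bi = 0, bj = 1, best = Infinity;
    for (let i = 0; i < clusters.length; i++)
      for (let j = i + 1; j < clusters.length; j++) {
        const d = dist(clusters[i], clusters[j]);
        if (d < best) { best = d; bi = i; bj = j; }
      }
    clusters[bi] = clusters[bi].concat(clusters[bj]);  // merge closest pair
    clusters.splice(bj, 1);
  }
  return clusters;
}

// Three points near 0 and two near 10 separate cleanly at k = 2.
const hcaClusters = singleLinkage([0, 0.1, 0.2, 10, 10.1], 2);
```

Complete, average, and Ward linkage differ only in the `dist` function: farthest pair, mean pairwise distance, and increase in within-cluster variance, respectively.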

Constructor

const hca = new ds.ml.HCA({
  linkage: 'average',   // 'single' | 'complete' | 'average' | 'ward'
  omit_missing: true
});

Methods

Example

const hca = new ds.ml.HCA({ linkage: 'ward' });
hca.fit({
  data: penguins,
  columns: ['bill_length_mm', 'flipper_length_mm', 'body_mass_g']
});

const labels = hca.cut(3);

K-Means Clustering

ds.ml.KMeans partitions observations into k clusters by minimizing the within-cluster sum of squares:

J = Sigma_i Sigma_{x in C_i} ||x - mu_i||^2

where mu_i is the centroid (mean) of cluster C_i.
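The standard way to minimize this objective is Lloyd's algorithm: alternate assigning each point to its nearest centroid and recomputing each centroid as its cluster mean. A minimal 1-D sketch (illustrative only; not the ds.ml.KMeans implementation, and `kmeans1d` is a hypothetical helper):

```javascript
// Sketch: Lloyd's algorithm on 1-D data (not ds.ml.KMeans itself).

function kmeans1d(xs, initialCentroids, maxIter = 100) {
  let mu = initialCentroids.slice();
  let labels = [];
  for (let t = 0; t < maxIter; t++) {
    // Assignment step: each point joins its nearest centroid.
    labels = xs.map(x => {
      let best = 0;
      for (let j = 1; j < mu.length; j++)
        if (Math.abs(x - mu[j]) < Math.abs(x - mu[best])) best = j;
      return best;
    });
    // Update step: each centroid becomes the mean of its members.
    const next = mu.map((m, j) => {
      const members = xs.filter((_, i) => labels[i] === j);
      return members.length
        ? members.reduce((s, v) => s + v, 0) / members.length
        : m;  // keep an empty cluster's centroid in place
    });
    if (next.every((m, j) => m === mu[j])) break;  // converged
    mu = next;
  }
  return { centroids: mu, labels };
}

// Two well-separated groups near 1 and 9.
const { centroids, labels } = kmeans1d([1, 1.2, 0.8, 9, 9.3, 8.7], [0, 10]);
```

Both steps can only decrease J, which is why the loop converges; the `seed` option in the constructor above controls how the real initial centroids are chosen, since the result depends on initialization.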

Constructor

const kmeans = new ds.ml.KMeans({
  k: 3,
  maxIter: 300,
  tol: 1e-4,
  seed: 42
});

Methods

Example

const km = new ds.ml.KMeans({ k: 4, seed: 10 });
km.fit({
  data: iris,
  columns: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
});

const clusterIds = km.predict({
  data: iris,
  columns: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
});

console.log(km.summary());

Workflow Tips