Dimensionality Measures

`t2`(X, y)	Calculates the Average number of features per point (T2) metric.
`t3`(X, y)	Calculates the Average number of PCA dimensions per point (T3) metric.
`t4`(X, y)	Calculates the Ratio of the PCA dimension to the original dimension (T4) metric.

problexity.classification.t2(X, y)

Calculates the Average number of features per point (T2) metric.

To obtain this measure, the number of dimensions (features) describing the dataset, m, is divided by the number of instances, n.

\[T2=\frac{m}{n}\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

T2 score
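
The T2 formula can be checked by hand with a few lines of NumPy. The block below is a minimal sketch, not the library's implementation: `t2_sketch` is a hypothetical helper name, and it takes only X because the formula does not involve the labels.

```python
import numpy as np

def t2_sketch(X):
    """Hypothetical helper: features-per-instance ratio m / n."""
    X = np.asarray(X)
    n, m = X.shape  # n instances (rows), m features (columns)
    return m / n

# Toy dataset: 200 instances described by 10 features.
X = np.zeros((200, 10))
t2_value = t2_sketch(X)
print(t2_value)  # 10 / 200 = 0.05
```

A value far below 1 indicates many instances per feature. The result of `problexity.classification.t2(X, y)` on the same data can be compared against this ratio.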

problexity.classification.t3(X, y)

Calculates the Average number of PCA dimensions per point (T3) metric.

To obtain this measure, the number of PCA components needed to represent 95% of the data variability, m', is calculated first. This value is then divided by the number of instances in the dataset, n.

\[T3=\frac{m'}{n}\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

T3 score
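
The two steps described for T3 can be sketched with plain NumPy. This is an illustrative sketch under stated assumptions, not the library's code: the helper names `pca_components_for_variance` and `t3_sketch` are hypothetical, the component count is derived from the singular values of the centered data, and a component is counted as soon as the cumulative explained-variance ratio reaches 0.95.

```python
import numpy as np

def pca_components_for_variance(X, threshold=0.95):
    """Hypothetical helper: smallest number of principal components whose
    cumulative explained-variance ratio reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to
    # the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative_ratio = np.cumsum(variance) / variance.sum()
    # Index of the first ratio >= threshold, converted to a count.
    return int(np.searchsorted(cumulative_ratio, threshold) + 1)

def t3_sketch(X):
    """Hypothetical helper: T3 = m' / n."""
    n = np.asarray(X).shape[0]
    return pca_components_for_variance(X) / n

# Toy dataset: 100 instances, 5 features built from 2 orthogonal signals,
# so only 2 principal components carry any variance (70% and 30%).
a = np.tile([1.0, -1.0, 1.0, -1.0], 25)
b = np.tile([1.0, 1.0, -1.0, -1.0], 25)
X = np.column_stack([a, b, a + b, a - b, 2 * a])

m_prime = pca_components_for_variance(X)
t3_value = t3_sketch(X)
print(m_prime, t3_value)  # 2 components, so T3 = 2 / 100 = 0.02
```

The sketch assumes the data is not constant; with zero total variance the ratio is undefined.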

problexity.classification.t4(X, y)

Calculates the Ratio of the PCA dimension to the original dimension (T4) metric.

To obtain this measure, the number of PCA components needed to represent 95% of the data variability, m', is divided by the original number of dimensions, m. This measure describes the proportion of relevant dimensions in the dataset.

\[T4=\frac{m'}{m}\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

T4 score
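
As a rough illustration of the T4 ratio, the sketch below counts the components from the eigenvalues of the feature covariance matrix and divides by the number of features. It is a sketch under assumptions rather than the library's implementation: `t4_sketch` is a hypothetical name, and the rule for reaching the 95% threshold (first cumulative ratio at or above 0.95) is an assumption.

```python
import numpy as np

def t4_sketch(X, threshold=0.95):
    """Hypothetical helper: T4 = m' / m."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    # Eigenvalues of the covariance matrix are the per-component variances.
    covariance = np.atleast_2d(np.cov(X, rowvar=False))
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]  # descending order
    eigenvalues = np.clip(eigenvalues, 0.0, None)       # drop tiny negatives
    cumulative_ratio = np.cumsum(eigenvalues) / eigenvalues.sum()
    m_prime = int(np.searchsorted(cumulative_ratio, threshold) + 1)
    return m_prime / m

# Toy dataset: 5 features that are all linear combinations of 2 signals.
a = np.tile([1.0, -1.0, 1.0, -1.0], 25)
b = np.tile([1.0, 1.0, -1.0, -1.0], 25)
X = np.column_stack([a, b, a + b, a - b, 2 * a])

t4_value = t4_sketch(X)
print(t4_value)  # 2 relevant components out of 5 features: 0.4
```

Values close to 1 suggest that nearly every original feature contributes distinct variance, while small values point to strong redundancy among features.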