Dimensionality Measures

t2(X, y)

Calculates the average number of features per point (T2) metric.

t3(X, y)

Calculates the average number of PCA dimensions per point (T3) metric.

t4(X, y)

Calculates the ratio of the PCA dimension to the original dimension (T4) metric.

problexity.classification.t2(X, y)

Calculates the average number of features per point (T2) metric.

To obtain this measure, the number of dimensions describing the dataset (m) is divided by the number of instances (n).

\[T2=\frac{m}{n}\]
Parameters:
  • X (array-like, shape (n_samples, n_features)) – Dataset

  • y (array-like, shape (n_samples)) – Labels

Return type:

float

Returns:

T2 score
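
As a rough illustration of the formula above, the following sketch computes m / n directly from the shape of the data. It is not the library's implementation; the name `t2_sketch` and the toy dataset are assumptions made for this example, and `y` is accepted only to mirror the documented signature.

```python
def t2_sketch(X, y=None):
    """Hypothetical helper: features per point, m / n."""
    n = len(X)       # number of instances (rows)
    m = len(X[0])    # number of features (columns)
    return m / n

# Toy dataset: 10 instances, 4 features each.
X = [[0.0, 1.0, 2.0, 3.0] for _ in range(10)]
t2_score = t2_sketch(X)
print(t2_score)  # 0.4
```

Lower values indicate that many instances are available relative to the number of features.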

problexity.classification.t3(X, y)

Calculates the average number of PCA dimensions per point (T3) metric.

To obtain this measure, the number of PCA components needed to represent 95% of the data variability (m') is calculated first. That number is then divided by the number of instances in the dataset (n).

\[T3=\frac{m'}{n}\]
Parameters:
  • X (array-like, shape (n_samples, n_features)) – Dataset

  • y (array-like, shape (n_samples)) – Labels

Return type:

float

Returns:

T3 score
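
The two steps described above can be sketched with NumPy as follows. This is a minimal illustration rather than the library's code: the helper `pca_components_for_variance`, the "at least 95%" cut-off rule, and the toy dataset are assumptions made for this example.

```python
import numpy as np

def pca_components_for_variance(X, threshold=0.95):
    """Hypothetical helper: smallest number of principal components whose
    cumulative explained variance reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to
    # the variance explained by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)

def t3_sketch(X, y=None):
    """PCA dimensions per point, m' / n (y is unused here)."""
    n = np.asarray(X).shape[0]
    return pca_components_for_variance(X) / n

# Toy dataset: 20 instances, 3 features, where the third feature is the
# sum of the first two, so only two directions carry variance.
a = np.arange(20.0)
b = np.where(np.arange(20) % 2 == 0, 10.0, -10.0)
X = np.column_stack([a, b, a + b])

m_prime = pca_components_for_variance(X)
t3_score = t3_sketch(X)
print(m_prime, t3_score)
```

In this toy case two components are retained, so the sketch yields 2 / 20.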

problexity.classification.t4(X, y)

Calculates the ratio of the PCA dimension to the original dimension (T4) metric.

To obtain this measure, the number of PCA components needed to represent 95% of the data variability (m') is divided by the original number of dimensions (m). This measure describes the proportion of relevant dimensions in the dataset.

\[T4=\frac{m'}{m}\]
Parameters:
  • X (array-like, shape (n_samples, n_features)) – Dataset

  • y (array-like, shape (n_samples)) – Labels

Return type:

float

Returns:

T4 score
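
The ratio above can be sketched in the same spirit. Again, this is an illustration under stated assumptions, not the library's implementation: `t4_sketch`, the inline variance computation, and the toy dataset are hypothetical, and the 95% cut-off is taken to mean "at least 95%".

```python
import numpy as np

def t4_sketch(X, y=None, threshold=0.95):
    """Hypothetical helper: PCA dimensions over original dimensions, m' / m."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    centered = X - X.mean(axis=0)
    # Variance explained per principal component, via singular values.
    explained = np.linalg.svd(centered, compute_uv=False) ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    m_prime = int(np.searchsorted(cumulative, threshold) + 1)
    return m_prime / m

# Toy dataset: the third feature is redundant (sum of the first two),
# so two of the three original dimensions are relevant.
a = np.arange(20.0)
b = np.where(np.arange(20) % 2 == 0, 10.0, -10.0)
X = np.column_stack([a, b, a + b])

t4_score = t4_sketch(X)
print(t4_score)
```

A value close to 1 suggests that nearly all original features carry distinct information, while a small value points to redundancy among them.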