Smoothness Measures

s1(X, y[, normalize])

Calculates the output distribution (S1) measure.

s2(X, y[, normalize])

Calculates the input distribution (S2) measure.

s3(X, y[, normalize])

Calculates the error of nearest neighbor regressor (S3) measure.

problexity.regression.s1(X, y, normalize=True)

Calculates the output distribution (S1) measure.

Calculates complexity based on the similarity of instances that are adjacent in the minimum spanning tree (MST). Returns the sum of absolute differences between the labels (y) of samples connected by an MST edge, divided by the number of samples n. By default, 0-1 interval normalization is performed.

\[S1=\frac{1}{n}\sum_{i,j \in MST}|y_i - y_j|\]
Parameters:
  • X (array-like, shape (n_samples, n_features)) – Dataset

  • y (array-like, shape (n_samples)) – Labels

  • normalize (bool, default True) – Whether to apply 0-1 interval normalization

Return type:

float

Returns:

S1 score
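
To make the S1 formula concrete, here is a minimal pure-Python sketch. It is not the problexity implementation: the helper name `s1_sketch` is hypothetical, it assumes the MST is built over Euclidean distances between the rows of X (using Prim's algorithm), and it skips the 0-1 normalization step entirely.

```python
import math


def s1_sketch(X, y):
    """Sum of |y_i - y_j| over MST edges, divided by n (no normalization).

    Hypothetical helper for illustration only. Assumes the MST is built
    over Euclidean distances between the rows of X.
    """
    n = len(X)
    if n < 2:
        return 0.0

    in_tree = [False] * n
    in_tree[0] = True
    # best[k] = (distance from k to its closest tree vertex, that vertex)
    best = [(math.dist(X[0], X[k]), 0) for k in range(n)]

    total = 0.0
    for _ in range(n - 1):
        # Prim's step: attach the cheapest vertex that is not in the tree yet.
        j = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best[k][0])
        in_tree[j] = True
        total += abs(y[j] - y[best[j][1]])

        # Update the cheapest connection of the remaining vertices.
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(X[j], X[k])
                if d < best[k][0]:
                    best[k] = (d, j)

    return total / n
```

For three one-dimensional samples at 0, 1 and 3 with labels 0, 1 and 5, the MST edges are (0, 1) and (1, 2), so the sketch returns (1 + 4) / 3.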

problexity.regression.s2(X, y, normalize=True)

Calculates the input distribution (S2) measure.

Calculates complexity based on the similarity of the features (X) of instances with close output values (y). Samples are sorted by output value, and the measure is the average Euclidean norm of the difference between the input vectors of consecutive samples. By default, 0-1 interval normalization is performed.

\[S2=\frac{1}{n}\sum_{i=2}^{n}||x_i-x_{i-1}||_2\]
Parameters:
  • X (array-like, shape (n_samples, n_features)) – Dataset

  • y (array-like, shape (n_samples)) – Labels

  • normalize (bool, default True) – Whether to apply 0-1 interval normalization

Return type:

float

Returns:

S2 score
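
The S2 formula can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the library's code: `s2_sketch` is a hypothetical name, and the 0-1 normalization step is omitted.

```python
import math


def s2_sketch(X, y):
    """Sum of ||x_i - x_{i-1}||_2 over samples sorted by y, divided by n.

    Hypothetical helper for illustration only; no normalization is applied.
    """
    n = len(X)
    if n == 0:
        return 0.0

    # Indices of the samples ordered by increasing output value.
    order = sorted(range(n), key=lambda i: y[i])

    # Euclidean distance between the inputs of consecutive samples.
    total = sum(math.dist(X[order[k]], X[order[k - 1]])
                for k in range(1, n))
    return total / n
```

With inputs (0, 0), (3, 4), (3, 0) and outputs 2, 0, 1, the sorted order is (3, 4), (3, 0), (0, 0); the consecutive distances are 4 and 3, so the sketch returns 7 / 3.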

problexity.regression.s3(X, y, normalize=True)

Calculates the error of nearest neighbor regressor (S3) measure.

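Returns the mean squared error of a 1-nearest-neighbor regressor, estimated with a leave-one-out procedure. In the formula, NN(x_i) denotes the output of the nearest neighbor of x_i when x_i itself is left out. By default, the data is normalized with 0-1 interval normalization.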

\[S3=\frac{1}{n}\sum_{i=1}^{n}(NN(x_i)-y_i)^2\]
Parameters:
  • X (array-like, shape (n_samples, n_features)) – Dataset

  • y (array-like, shape (n_samples)) – Labels

  • normalize (bool, default True) – Whether to apply 0-1 interval normalization

Return type:

float

Returns:

S3 score
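
The leave-one-out procedure behind S3 can be sketched as follows. The helper name `s3_sketch` is hypothetical, the nearest neighbor is assumed to be chosen by Euclidean distance in the input space, and the 0-1 normalization step is omitted, so this is an illustration of the formula rather than the problexity implementation.

```python
import math


def s3_sketch(X, y):
    """Leave-one-out mean squared error of a 1-NN regressor.

    Hypothetical helper for illustration only. Assumes Euclidean distance
    and at least two samples; no normalization is applied.
    """
    n = len(X)
    if n < 2:
        raise ValueError("at least two samples are required")

    total = 0.0
    for i in range(n):
        # Nearest neighbor of sample i among all other samples.
        nn = min((j for j in range(n) if j != i),
                 key=lambda j: math.dist(X[i], X[j]))
        # The 1-NN prediction is the neighbor's output value.
        total += (y[nn] - y[i]) ** 2

    return total / n
```

For one-dimensional samples at 0, 1 and 3 with outputs 0, 1 and 5, the squared errors are 1, 1 and 16, so the sketch returns 6.0.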