Neighborhood Measures

`n1`(X, y)	Calculates the Fraction of borderline points (N1) metric.
`n2`(X, y)	Calculates the Ratio of intra/extra class NN distance (N2) metric.
`n3`(X, y)	Calculates the Error rate of NN classifier (N3) metric.
`n4`(X, y)	Calculates the Nonlinearity of NN classifier (N4) metric.
`t1`(X, y)	Calculates the Fraction of hyperspheres covering data (T1) metric.
`lsc`(X, y)	Calculates the Local set average cardinality (LSC) metric.

problexity.classification.lsc(X, y)

Calculates the Local set average cardinality (LSC) metric.

The measure depends on the distances between instances and on each instance's distance to its nearest enemy, i.e. the nearest sample of the opposite class. For each sample, the calculation counts the instances that lie closer to it than its nearest enemy; this set is the local set LS(x_i).

\[LSC=1-\frac{1}{n^2}\sum^{n}_{i=1} |LS(x_i)|\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

LSC score
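
The local-set idea described above can be illustrated with a short pure-Python sketch. It is not the problexity implementation: the name `lsc_sketch` is hypothetical, plain Euclidean distance on unscaled features is assumed, and each sample is assumed to count as a member of its own local set.

```python
import math


def lsc_sketch(X, y):
    """Illustrative LSC: 1 - (sum of local-set sizes) / n^2.

    Assumptions (not taken from the library): Euclidean distance,
    no feature scaling, and x_i belongs to its own local set.
    """
    n = len(X)
    total = 0
    for i in range(n):
        # Distance from x_i to its nearest enemy (closest sample of another class).
        enemy_dist = min(
            math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i]
        )
        # Local set: samples strictly closer to x_i than its nearest enemy.
        total += sum(
            1 for j in range(n) if math.dist(X[i], X[j]) < enemy_dist
        )
    return 1 - total / n**2


X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print(lsc_sketch(X, y))
```

Larger local sets mean that samples are surrounded by their own class, so the score decreases as the classes become better separated.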

problexity.classification.n1(X, y)

Calculates the Fraction of borderline points (N1) metric.

A Minimum Spanning Tree (MST) is built over the input instances. The measure is computed as the number of MST edges that connect instances of different classes, divided by the total number of samples.

\[N1=\frac{1}{n} \sum^{n}_{i=1}I((x_i, x_j) \in MST \wedge y_i \neq y_j)\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

N1 score
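
A minimal sketch of the MST-based computation, assuming Euclidean distance on unscaled features and using Prim's algorithm from the standard library only. The name `n1_sketch` is hypothetical and this is not the library's code. Following the summation over instances in the formula, the sketch counts the instances that are joined by an MST edge to an instance of a different class; counting the mixed-class edges themselves would give a different value.

```python
import math


def n1_sketch(X, y):
    """Illustrative N1: share of instances touching a mixed-class MST edge."""
    n = len(X)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    # Prim's algorithm on the complete Euclidean graph.
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Collect the endpoints of every edge that joins two different classes.
    borderline = set()
    for u, v in edges:
        if y[u] != y[v]:
            borderline.update((u, v))
    return len(borderline) / n


X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print(n1_sketch(X, y))
```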

problexity.classification.n2(X, y)

Calculates the Ratio of intra/extra class NN distance (N2) metric.

The measure depends on the distance from each instance to its nearest neighbor of the same class and on the distance to its nearest neighbor of a different class. The term intra_extra in the formula is the ratio of the sum of the intra-class distances to the sum of the extra-class distances.

\[N2=\frac{intra\_extra}{1+intra\_extra}\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

N2 score
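
The ratio can be sketched in a few lines of pure Python. The name `n2_sketch` is hypothetical, Euclidean distance on unscaled features is assumed, and the sketch expects every class to contain at least two samples; it is an illustration of the formula rather than the library's implementation.

```python
import math


def n2_sketch(X, y):
    """Illustrative N2: intra_extra / (1 + intra_extra)."""
    n = len(X)
    intra = 0.0
    extra = 0.0
    for i in range(n):
        # Nearest neighbor of the same class, excluding the sample itself.
        intra += min(
            math.dist(X[i], X[j])
            for j in range(n)
            if j != i and y[j] == y[i]
        )
        # Nearest neighbor of a different class.
        extra += min(
            math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i]
        )
    ratio = intra / extra
    return ratio / (1 + ratio)


X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print(n2_sketch(X, y))
```

When same-class neighbors are much closer than other-class neighbors, the ratio is small and the score approaches 0.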

problexity.classification.n3(X, y)

Calculates the Error rate of NN classifier (N3) metric.

The measure is the error rate of the 1-Nearest Neighbor classifier under the leave-one-out evaluation protocol.

\[N3=\frac{\sum^{n}_{i=1}I(NN(x_i) \neq y_i)}{n}\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

N3 score
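
The leave-one-out protocol can be sketched as follows, assuming Euclidean distance on unscaled features. The name `n3_sketch` is hypothetical and the code only illustrates the formula.

```python
import math


def n3_sketch(X, y):
    """Illustrative N3: leave-one-out error rate of a 1-NN classifier."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Leave x_i out and find its nearest neighbor among the other samples.
        nn = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        # The prediction for x_i is the label of that neighbor.
        errors += y[nn] != y[i]
    return errors / n


X = [[0.0], [1.0], [10.0], [11.0], [1.5]]
y = [0, 0, 1, 1, 1]
print(n3_sketch(X, y))
```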

problexity.classification.n4(X, y)

Calculates the Nonlinearity of NN classifier (N4) metric.

The measure is the error rate of a k-Nearest Neighbor classifier on synthetic points generated by linearly interpolating between original instances. The classifier is fitted on the original points and evaluated on the synthetic instances.

\[N4=\frac{1}{l}\sum^{l}_{i=1}I(NN_T(x'_i) \neq y'_i)\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

N4 score
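
A rough sketch of the interpolation procedure, under several assumptions that are not taken from the library: each synthetic point is a random convex combination of two instances of the same class and inherits that class as its label, the classifier is a 1-NN with Euclidean distance, and the name `n4_sketch` with its `n_synthetic` and `seed` parameters is hypothetical.

```python
import math
import random


def n4_sketch(X, y, n_synthetic=100, seed=0):
    """Illustrative N4: 1-NN error rate on interpolated same-class points."""
    rng = random.Random(seed)
    n = len(X)
    # Group sample indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    labels = list(by_class)
    errors = 0
    for _ in range(n_synthetic):
        label = rng.choice(labels)
        a = rng.choice(by_class[label])
        b = rng.choice(by_class[label])
        t = rng.random()
        # Linear interpolation between two instances of the same class.
        point = [pa + t * (pb - pa) for pa, pb in zip(X[a], X[b])]
        # Classify the synthetic point with 1-NN fitted on the original data.
        nn = min(range(n), key=lambda j: math.dist(point, X[j]))
        errors += y[nn] != label
    return errors / n_synthetic


X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print(n4_sketch(X, y))
```

For well-separated classes every interpolated point stays next to its own class, so the error rate is zero; the score grows when one class lies between instances of the other.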

problexity.classification.t1(X, y)

Calculates the Fraction of hyperspheres covering data (T1) metric.

The measure is the number of hyperspheres needed to cover the data divided by the number of instances. First, a hypersphere is generated for each sample, with the sample at its center and a radius that depends on the distance to the nearest instance of another class. A hypersphere is eliminated if a different hypersphere already covers its center instance. Elimination proceeds from the hyperspheres with the largest radii to those with the smallest. Only the hyperspheres that remain are counted in the complexity calculation.

\[T1=\frac{\#Hyperspheres(T)}{n}\]

Parameters:

X (array-like, shape (n_samples, n_features)) – Dataset
y (array-like, shape (n_samples)) – Labels of binary classification task ([0,1])

Return type:

float

Returns:

T1 score
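
The covering procedure can be approximated with the simplified sketch below. It makes assumptions that may differ from the library: the radius of each hypersphere is taken as half the Euclidean distance to the nearest sample of another class, and spheres are visited from largest to smallest, discarding any sphere whose center already lies inside a kept sphere. The name `t1_sketch` is hypothetical.

```python
import math


def t1_sketch(X, y):
    """Illustrative T1: fraction of hyperspheres left after elimination."""
    n = len(X)
    # Assumed radius: half the distance to the nearest sample of another class.
    radius = [
        0.5 * min(math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i])
        for i in range(n)
    ]
    kept = []
    # Visit spheres from the largest radius to the smallest.
    for i in sorted(range(n), key=lambda i: radius[i], reverse=True):
        # Discard the sphere if an already kept sphere covers its center.
        covered = any(math.dist(X[i], X[k]) <= radius[k] for k in kept)
        if not covered:
            kept.append(i)
    return len(kept) / n


X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print(t1_sketch(X, y))
```

With this radius choice a sphere never reaches a sample of another class, so only same-class spheres can absorb each other.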