Fisher Scores for Feature Selection

GenoLearn offers the Fisher Score feature selection method, which computes a score for each feature and selects the \(k\) highest-scoring features. The Fisher Score, as given in Aggarwal 2014 [1], is computed by the following equation

\[S_i = \frac{\sum_j n_j(\mu_{ij} - \mu_i)^2}{\sum_j n_j\sigma_{ij}^2}\]

where

  • \(n_j\) is the number of observations belonging to the \(j\)-th class

  • \(\mu_i\) is the global mean of the \(i\)-th feature

  • \(\mu_{ij}\) is the mean of the \(i\)-th feature over the observations belonging to the \(j\)-th class

  • \(\sigma_{ij}^2\) is the variance of the \(i\)-th feature over the observations belonging to the \(j\)-th class

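The above can be vectorized as \(\mathbf{D}\mathbf{n}\ /\ \Sigma\mathbf{n}\), where the division is element-wise, \(\mathbf{n}\) is the vector of class counts, the \(ij\)-th element of \(\mathbf{D}\) is \((\mu_{ij} - \mu_i)^2\) and the \(ij\)-th element of \(\Sigma\) is \(\sigma_{ij}^2\). In the example below, \(\mathbf{D}\) and \(\Sigma\) are displayed with one row per class, i.e. transposed relative to this indexing. Intuitively, a feature receives a higher Fisher Score when its class means lie further from its global mean relative to the variance within each class.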
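The vectorized computation can be sketched in NumPy as follows. This is a minimal illustration, not GenoLearn's actual implementation: the function name `fisher_scores` is hypothetical, the variance is assumed to be the population variance (`ddof=0`), and the matrices are stored with one row per class, so the products are written `n @ D` and `n @ Sigma`. The data are the \(\mathbf{X}\) and \(\mathbf{y}\) of the example on this page.

```python
import numpy as np


def fisher_scores(X, y):
    """Return one Fisher Score per column of X.

    Hypothetical helper for illustration; not part of GenoLearn's API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # n[j] is the number of observations in the j-th class
    classes, n = np.unique(y, return_counts=True)

    # Global mean of every feature
    mu = X.mean(axis=0)

    # Per-class means and population variances: one row per class,
    # one column per feature
    mu_c = np.stack([X[y == c].mean(axis=0) for c in classes])
    Sigma = np.stack([X[y == c].var(axis=0) for c in classes])

    # Squared distance between each class mean and the global mean
    D = (mu_c - mu) ** 2

    # Class-count-weighted sums, divided element-wise
    return (n @ D) / (n @ Sigma)


X = np.array([
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 3, 0, 1, 1],
    [1, 3, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 3],
    [2, 2, 4, 2, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 2],
])
y = np.array([0, 1, 0, 1, 1, 2, 0, 2, 0, 0])

scores = fisher_scores(X, y)
print(scores)
```

A feature whose within-class variance is zero in every class makes the denominator zero. This sketch does not guard against that case, so such a feature would come out as `inf` or `nan`.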

Example

\[\begin{split}\begin{align*} & \quad \overbrace {\begin{bmatrix} 0 & 0 & 1 & 0 & 0\\ 1 & 1 & 1 & 0 & 1\\ 0 & 1 & 0 & 1 & 0\\ 0 & 3 & 0 & 1 & 1\\ 1 & 3 & 1 & 0 & 1\\ 0 & 1 & 1 & 1 & 0\\ 1 & 0 & 0 & 1 & 3\\ 2 & 2 & 4 & 2 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 1 & 0 & 1 & 2 \end{bmatrix}}^\mathbf{X} \hspace{14em} \overbrace {\begin{bmatrix} 0\\ 1\\ 0\\ 1\\ 1\\ 2\\ 0\\ 2\\ 0\\ 0 \end{bmatrix}}^\mathbf{y}\\\\ \mathbf{n} &= \begin{bmatrix}5 & 3 & 2\end{bmatrix} && \begin{bmatrix}\text{count of } 0\text{s} & \text{count of } 1\text{s} & \text{count of } 2\text{s}\end{bmatrix}\\\\ \mathbf{D} &= \begin{bmatrix} 0.09 & 0.64 & 0.25 & 0. & 0.04 \\ 0.0277778 & 1.2844444 & 0.0544444 & 0.2177778 & 0.04 \\ 0.25 & 0.09 & 2.56 & 0.49 & 0.64 \end{bmatrix} && \begin{matrix}(\text{mean when } \mathbf{y} = 0 \text{ minus global mean squared})\\ (\text{mean when } \mathbf{y} = 1 \text{ minus global mean squared})\\ (\text{mean when } \mathbf{y} = 2 \text{ minus global mean squared})\end{matrix}\\\\ \Sigma &= \begin{bmatrix} 0.16 & 0.24 & 0.24 & 0.16 & 1.6 \\ 0.2222222 & 0.8888889 & 0.2222222 & 0.2222222 & 0. \\ 1. & 0.25 & 2.25 & 0.25 & 0. \end{bmatrix} && \begin{matrix}(\text{variance when } \mathbf{y} = 0)\\ (\text{variance when } \mathbf{y} = 1)\\ (\text{variance when } \mathbf{y} = 2)\end{matrix} \end{align*}\end{split}\]

resulting in

\[\mathbf{S} = \begin{bmatrix} 0.2980769 & 1.6564885 & 1.026178 & 0.8305085 & 0.2\end{bmatrix}\]
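As a check on one entry, the score of the fifth feature can be worked out by hand from the fifth columns of \(\mathbf{D}\) and \(\Sigma\) together with \(\mathbf{n}\):

\[S_5 = \frac{5(0.04) + 3(0.04) + 2(0.64)}{5(1.6) + 3(0) + 2(0)} = \frac{1.6}{8} = 0.2\]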

For this example, ordering the features from highest to lowest Fisher Score gives \([2, 3, 4, 1, 5]\), i.e. the second feature is the most important and the fifth feature is the least important.
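The ordering and the selection of the \(k\) highest-scoring features can be sketched with NumPy's `argsort`. This is an illustration only: the variable names and the choice \(k = 2\) are assumptions, and the scores are copied from \(\mathbf{S}\) above.

```python
import numpy as np

# Fisher Scores from the example (one per feature)
S = np.array([0.2980769, 1.6564885, 1.026178, 0.8305085, 0.2])

# Feature indices sorted from highest to lowest score (0-based)
order = np.argsort(-S)

# Convert to the 1-based feature numbers used in the text
ranking = order + 1
print(ranking)  # [2 3 4 1 5]

# Keep the k highest-scoring features (k = 2 is an arbitrary choice here)
k = 2
top_k = order[:k]
print(top_k)  # [1 2]
```

The 0-based indices in `top_k` can then be used to select columns of the data matrix, e.g. `X[:, top_k]`.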