Monday, October 22, 2018

Shrinkage estimators for stable covariance matrices

Obtaining reliable error bars is often the most difficult aspect of a data analysis project. In cosmology, we usually derive a covariance matrix from simulations as
\[
C = \frac{1}{N-1}\sum^N_{ij}(X_i - \overline{X}_i)(X_j - \overline{X}_j),
\]where $X$ represents any statistic you want to use in the analysis and $\overline{X}$ is the mean over all simulations.

The covariance matrix as estimated above carries noise due to the finite number of simulations. The level of noise depends on the number of bins $n$ in the data vector (which determines the dimension of the matrix) and the number of simulations used $N$. If $N \gg n$ the matrix is usually fine, while with $N \approx n$ the noise can play a significant role and in the case of $N < n$ the matrix cannot even be inverted. In the last two cases, we can use shrinkage estimators to obtain a covariance matrix which avoids the propagation of the noise due to the small $N$ into the parameter estimate.

The shrinkage estimator $C'$ for a covariance matrix $C$ is usually given as a linear combination of the matrix itself and a target matrix $T$
\[
C' = \lambda T + (1 - \lambda)C,
\]where $T$ could, for example, be the diagonal matrix $T = \text{diag}(C)\mathcal{I}$ with $\mathcal{I}$ being the identity matrix.

The question now is how to obtain $\lambda$. For $N \gg n$ we would like to obtain the covariance matrix $C$, since in this case it should have little noise and should be reliable. For $N \approx n$ we would like to obtain $\lambda \approx 1$, so that we get something similar to $T$, avoiding the noise in $C$.

Using $T$ instead of $C$, of course, has consequences since it will ignore the correlation between bins in the data vector, which can significantly underestimate the parameter uncertainties. We therefore can't obtain the parameter error from the likelihood analysis itself but need to use an alternative. For example, we could perform the same analysis on many mock realizations and obtain the parameter error from the variance amongst the mock realizations.

So how can we obtain $\lambda$ in practice? One common approach to estimate $\lambda$ is by using cross-validation. Using the Python package sklearn we can get the inverse of the covariance matrix (also known as the precision matrix) with
from sklearn.covariance import LedoitWolf

N = len(list_of_mock_pk)
X_train = list_of_mock_pk[:int(N*0.75)]
X_test = list_of_mock_pk[int(N*0.75):]

lw = LedoitWolf()
loglik_lw = lw.fit(X_train).score(X_test)
print("loglik_lw = ", loglik_lw)
print("lw.shrinkage_ = ", lw.shrinkage_)
Cinv = lw.get_precision()
Figure 1 shows an example taken from arXiv:0711.2509, where a shrinkage estimator is used to measure $\sigma_8$ from a galaxy power spectrum. The upper panel shows the distribution of $\sigma_8$ from a set of mock realizations, while the lower panel shows the estimated error on $\sigma_8$. The black solid line shows the true result while the blue dashed line shows the result with the original covariance matrix $C$, which carries a lot of noise since $N\approx n$. Using the shrinkage estimator they get the red dashed line, which now agrees very well with the reference distribution. However, the error estimate in the lower panel is far away from the reference case, but so is the original covariance matrix (blue dashed line). So while we still can't trust the error from the analysis, at least the distribution of estimated $\sigma_8$ is now correct and we can estimate the error on $\sigma_8$ from that distribution.
Figure 1: This Figure shows the distribution of $\sigma_8$ measurements in mock realizations using a
covariance matrix based on a finite set of simulations (blue dashed line), and using the shrinkage estimator
(red dashed line). The shrinkage estimator gives the correct distribution of $\sigma_8$ but
provides the wrong error estimate (lower panel). Reference arXiv:0711.2509
So shrinkage estimators usually perform better than $C$ alone and performing a shrinkage estimation is a good approach to get an idea of how reliable the covariance matrix is. If the shrinkage estimate results in a large $\lambda$, one should be cautious about any result based on this covariance matrix.

Let me know if you have any comments/questions below.
best
Florian