Wednesday, October 11, 2017

Study the Universe with Python tutorial, part 3 -- covariance matrix



In the last two blog posts, we discussed how to download the galaxy catalogues of the Baryon Oscillation Spectroscopic Survey (BOSS) and calculate the power spectrum of this dataset. In this blog post, we will calculate uncertainties on the power spectrum measurement.

Uncertainties can be obtained in many different ways. Here we will use simulations of the BOSS dataset. Members of the BOSS collaboration have used supercomputers to simulate what a Universe could look like and they have generated thousands of such simulated BOSS datasets. All we have to do is to calculate the power spectrum of these thousands of simulations and measure the variance between them.

We don't just calculate the uncertainty; we also calculate the correlation between different data points. All the information we need is contained in the covariance matrix, which is defined as
\[C(k_i,k_j) = \begin{pmatrix} \sigma^2_{k_1} & cor(k_1,k_2)\sigma_{k_1}\sigma_{k_2} & \dots & cor(k_1,k_n)\sigma_{k_1}\sigma_{k_n} \\
cor(k_2,k_1)\sigma_{k_2}\sigma_{k_1} & \sigma^2_{k_2} & \dots & cor(k_2,k_n)\sigma_{k_2}\sigma_{k_n} \\
\vdots & \vdots & \ddots & \vdots \\ cor(k_n,k_1)\sigma_{k_n}\sigma_{k_1} & cor(k_n,k_2)\sigma_{k_n}\sigma_{k_2} & \dots & \sigma^2_{k_n} \end{pmatrix}\]Here $\sigma_{k_i}$ is the standard deviation of the power spectrum $P(k_i)$ at wavelength $k_i$. The off-diagonal elements of this matrix contain the uncertainties on a data point contributed by neighbouring data points, which are proportional to the correlation $cor(k_i,k_j)$ between the two data points.

First, we download the simulated BOSS catalogues together with the corresponding random catalogue. Since there are so many of them, this is quite a large file
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-DR12SGC-COMPSAM_V6C.tar.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-DR12NGC-COMPSAM_V6C.tar.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-Randoms-DR12SGC-COMPSAM_V6C_x50.tar.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-Randoms-DR12NGC-COMPSAM_V6C_x50.tar.gz -P path/to/folder/
All 2048 data catalogues can be paired with one random catalogue to measure the power spectrum of the simulated catalogues. Note that you can download the power spectrum from my website if you don't want to calculate it yourself or if you run into disk capacity problems. To read these files you can use the read_power() function provided here.

Using these power spectra, we calculate the covariance matrix using the calc_cov() function, which simply reads the power spectra, subtracts the shot noise, limits the range to $< k_{max} = 0.3h/$Mpc and makes use of the numpy.cov() function.
def calc_cov(N, tag='NGC'):
    ''' Read simulation power spectra and return covariance matrix '''
    list_of_pks = []
    for i in range(1, N):
        pk_file = "./Patchy_V6C_BOSS_DR12_%s_z3_1/ps1D_patchy_%s_z3_COMPnbar_TSC_V6C_%d_0.5_0.75_700_700_700_400_renorm.dat" % (tag, tag, i)
        pk = read_power(pk_file)
        P = pk['pk0'] - pk['SN(data+ran)']
        # Limit the k range
        k = pk['k']
        krange_mask = (k > 0.01) & (k < 0.3)
        P = P[krange_mask]
        list_of_pks.append(P)
    return np.cov(np.vstack(list_of_pks).T)
We can turn the covariance matrix into a correlation matrix
\[
R = \frac{C_{ij}}{\sqrt{C_{ij}C_{ji}}}
\] and plot it as
from statsmodels.stats.moment_helpers import cov2corr
def plot_cov(matrix):
    ''' Plot the correlation matrix derived from the covariance matrix '''
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    plt.imshow(cov2corr(matrix))
    plt.colorbar()
    plt.show()
    return 
which should look like this
Figure 1: Correlation matrix, derived from the covariance
matrix of the northern part of BOSS using 1000 simulated
catalogues.
Figure 1 shows that there is some correlation between the power spectrum bins close to each other, while some bins which are far apart are anti-correlated.

The diagonal elements of the covariance matrix represent the variance in the power spectrum if the different data points would not be correlated
\[
\sigma_{P(k_i)} = \sqrt{C(k_i,k_i)}.
\]
Using these uncertainties, we can plot the BOSS measurements with error bars
Figure 2: The BOSS power spectrum measurements including
the diagonal elements of the covariance matrix as error bars.

With this, we now have the power spectra and covariance matrix, which is all we need to start using this dataset to constrain cosmological parameters. We will start with that in the next blog post.
You can find the code for this project on GitHub. Let me know if you have any questions/comments below.
cheers
Florian

No comments:

Post a Comment