My statistics blog: cosmology research

Showing posts with label cosmology research. Show all posts

Saturday, March 31, 2018

About the assumption of a Gaussian likelihood in a data model comparison

This is a blog post about a recent publication led by ChangHoon Hahn, which I co-authored (Hahn et al. 2018). The paper looks at the impact of non-Gaussian likelihoods for cosmological parameter estimation. This blog post attempts to explain the content of this paper for a general audience.

When finding the best model to describe a given dataset, most studies employ the $\chi^2$ statistic, which is a measure of how well the model and the data agree
\[
\chi^2 = \sum^{d} (data - model)^TC^{-1}(data - model),
\]where $d$ is the number of data points in the dataset, and the covariance matrix $C$ describes the error of the data points. This statistic is directly related to the so-called likelihood, which can be obtained as
\[
p(x|\theta) = \frac{1}{(2\pi)^{d/2}\sqrt{det(C)}} \exp{\left(-0.5\chi^2\right)}.
\]This likelihood describes the probability that one could have observed the data, given that the underlying truth is described by the model with the parameters $\theta$. So whenever somebody observes some data, the final analysis step is usually to calculate this likelihood and find the parameters $\theta$ which have the largest likelihood.

However, the likelihood above has a Gaussian shape, and it is not guaranteed that the probability distribution has that form. So the question we are addressing in this publication is, how close is the likelihood to a Gaussian form and if there is a deviation, how does it impact the analysis.

The first question to answer is, why do we assume a Gaussian likelihood in the first place?

The Gaussian distribution has a central role in statistics because it has been shown that when collecting many different measurements of the same event, these measurements often follow a Gaussian distribution. For example, if you throw a coin 100 times and calculate the mean, this mean is probably somewhere close to 0.5.

If you make that experiment many times, meaning you throw the coin 100 times, calculate the mean of the 100 throws and repeat that many times. The distribution of means will peak close to 0.5 and will show a Gaussian shape (see Figure 1). This is quite surprising because the probability of throwing heads or tails is a top-hat distribution, which looks not at all like a Gaussian. So adding many top-hat distributions leads somehow to a Gaussian distribution.

Figure 1: Probability distribution for the mean of many throws of a coin. The number of throws is shown in the top gray
box. With an increasing number of throws, the distribution looks more and more Gaussian (source: Wikipedia).

Figure 1 shows the probability distribution when looking at the mean of many coin flips. The distribution looks more and more Gaussian if the number of coin flips is increased.

In fact, that happens quite often in nature, the likelihood distribution of a large number of events often looks like a Gaussian, even if the likelihood distribution for a single event does not. This is known in statistics as the central limit theorem.

However, given the example above you can already see that the assumption of a Gaussian likelihood only works for large numbers. Figure 1 shows clearly that the probability distribution is far from Gaussian if one calculates the mean from only a small number of coin flips.

Going back to our paper. We looked at the power spectrum analysis of the Baryon Oscillation Spectroscopic Survey published in Beutler et al. (2016) and the group multiplicity function analysis of Sinha et al (2017).

Let's start with the power spectrum analysis. I discussed this analysis in several blog posts (e.g. here) already so I will skip the details about it. Here we do not care too much about the details of the measurement anyway, we just want to test whether the assumption of a Gaussian likelihood is justified.

In the power spectrum case one calculates the mean of many Fourier modes and the question is whether we have enough modes to make the likelihood distribution Gaussian. In general, there are many modes on small scales (large wavevector k) but only a small number of modes on large scales (small wavevector k). E.g. at $k=0.01h/$Mpc only a few dozen modes contribute to the value of the power spectrum, while at k=0.2h/Mpc many thousand modes contribute. Whether this matters or not depends on how this impacts the parameters of the model, which is not straightforward.

There is a very simple way to check whether the likelihood is Gaussian. We have produced mock catalogs for the BOSS survey, which mimic the BOSS dataset. Each of these mock catalogs represents a different possible realization of the measurement of BOSS. Therefore we just have to measure the power spectrum of all these mock catalogs and see whether those power spectra scatter around their mean in the shape of a Gaussian.

Whatever the distribution is going to be, we then use that distribution rather than assuming a Gaussian when searching for the model with the maximum likelihood.

Figure 2. This plot shows the likelihood distribution for the different cosmological parameters derived from
the BOSS power spectrum. The blue distribution assumes a Gaussian likelihood, which the orange
distribution assumes a likelihood distribution seen in the mock catalogs. You see there is overall good
agreement except for a small shift in $f\sigma_8$.

Figure 2 shows the likelihood distribution for each cosmological parameter. Note this is a different likelihood than what we talked about before. The likelihood we talked about before describes the probability of observing the data vector given that the underlying model is the true model, while the likelihood in Figure 2 describes the probability of observing the data given that parameter value.

The blue distribution assumes a Gaussian likelihood, while in orange we use the likelihood from the mocks. The two likelihoods seem to agree well, except for the parameter $f\sigma_8$, where we found a small shift to larger values.

Next the group multiplicity function analysis of Sinha et al (2017). The likelihood distributions are again shown in blue and orange in Figure 3. For this analysis, you can see that the two likelihoods agree less well, especially in terms of how peaked the distributions are. Removing the assumption of a Gaussian likelihood seems to increase the error on almost all derived parameters.

Figure 3: Comparison between the original group multiplicity function analysis of Sinha et al (2017) with the new analysis dropping the assumption of a Gaussian likelihood. The two distributions show significant
deviations for most parameters.

In conclusion, the assumption of the form of the likelihood can impact cosmological parameter constraints and should be accounted for. With larger datasets, the statistics usually gets better and the issue discussed here will get smaller. However, observables like primordial non-Gaussianity are located in the regime with small mode statistics and hence such studies must check such biases carefully.

I hope this was useful. Let me know if you have any comments/questions in the comment section below.
best
Florian

Thursday, March 29, 2018

Measuring Neutrinos with the distribution of galaxies in the Universe

In this blog post, I will summarise the paper Baumann et al. (2018) which my collaborators and I just published. The paper reports a measurement of the phase shift in the Baryon Acoustic Oscillation signal due to free streaming particles (neutrinos) in the early universe. So what does that mean?

1. What is the Baryon Acoustic Oscillation (BAO) scale? The BAO scale is a special scale in the distribution of galaxies. If you look at the separation of galaxies in the Universe, you will find that at around 150 Megaparsecs (500 000 000 light years) there is a sudden increase in the number of galaxy pairs. This special scale in the distribution of galaxies was imprinted when the Universe was just 300 000 years old at the so-called decoupling era. If you measure the correlation function of a galaxy distribution, meaning you count the number of galaxy pairs as a function of scale, the BAO signal will show up as a peak at 150 Megaparsecs. If you look at the Fourier transform of the correlation function, the power spectrum, it will show up as an oscillation signal. This signal is what we use in this analysis.

2. What is the decoupling era? In the early Universe photons and baryons were coupled together due to the high pressure. 300 000 years after the Big Bang the temperature and pressure dropped enough so that photons and baryons decoupled. From that point forward most of the photons in the Universe just traveled through the Universe without any interactions with anything. For that reason, we can look at these Cosmic Microwave Background (CMB) photons and basically see a snapshot of the Universe as it was 300 000 years after the Big Bang.

3. What is the effective number of relativistic species? This number describes all particles which had a relativistic velocity at the decoupling era. Currently, this number is assumed to be three, given the three known species of neutrinos ($\nu_{e}$, $\nu_{\mu}$ and $\nu_{\tau}$). We know that these particles must have been produced during the Big Bang and since they are very light, they should have had relativistic velocities at decoupling.

Figure 1 shows the impact of the effective number of free-streaming relativistic species ($N_{\rm eff}$) on the BAO signal and you can see that it leads to a shift in the phase.

Figure 1: This figure shows the impact of the effective number of relativistic species ($N_{\rm eff}$)
on the Baryon Acoustic Oscillation (BAO) scale in the matter power spectrum. It is the shift in the
phase of the BAO which we measured for the first time in this analysis (reference: This plot is from
Baumann et al. 2017).

This phase shift is very small, but we now managed to constrain the BAO scale to 1% using the Baryon Oscillation Spectroscopic Survey (BOSS), which makes us sensitive to such small effects.

So why are we interested in this effect?

Well, it could be that there are additional particles which propagated in the early Universe, but which we have not seen yet in particle accelerators. In this sense, we are using the Big Bang as the ultimate high energy collider and test whether the remains of the Big Bang show any evidence for new particles.

Besides that, the neutrinos generated by the Big Bang are also a very interesting probe of the early Universe. As mentioned before we can use the Cosmic Microwave Background (CMB) to get a snapshot of the Universe when it was only 300 000 years old. But we cannot actually look any closer to the Big Bang. If we could see the distribution of neutrinos produced by the Big Bang, we would actually see the Universe when it was only 1 second old, because neutrinos stopped interacting with other components of the Universe much earlier than photons, 1 second after the Big Bang.

So if we want to investigate the Universe at its very early age, the neutrino background produced by the Big Bang would be one way to do that. Another would be the gravitational wave background produced by the Big Bang. However, right now we do not have experiments sensitive enough to detect the neutrino or gravitational wave background. The detection of the phase shift in the BAO is an indirect detection of the neutrino background.

The shift in the BAO signal is not quite scale independent, hence the first step is we have to create a template for the shift, which we include in the standard BAO fitting pipeline, which I developed for the BOSS analysis back in 2017 (Beutler et al. 2017). The template for the phase shift is shown in Figure 2.

Figure 2: Template describing the shift in the BAO phase as a function of wavenumber. We
include this template in our analysis, but put a free parameter as its amplitude.

The figure shows that the shift caused by neutrinos $f(k)$ is larger at larger wavenumber $k$.

The amplitude of this template is a free parameter in the fit, which we call $\beta$. We then use the BOSS dataset to constrain this parameter.

Figure 3: Constraints on the amplitude of the phase shift ($\beta$) obtained from the BOSS dataset
in two redshift bins ($z_1$ and $z_3$).

Figure 3 shows the constraints on the amplitude of the template ($\beta$) obtained from the BOSS dataset. In BOSS we have two redshift bins, $z_1$ ($0.2 < z < 0.5$) and $z_3$ ($0.5 < z < 0.75$). We use both redshift bins for our constraint, which yields $\beta = 1.2\pm 1.8$. This constraint can then be converted into a constraint on the effective number of relativistic species. $\beta = 0$ would correspond to $N_{\rm eff} = 0$, while $\beta = 2.45$ would correspond to $N_{\rm eff} = \infty$.

You can see that our constraint just from the BOSS data can't really constrain $N_{\rm eff}$. However, Figure 3 also shows that there is a strong degeneracy between the parameter $\beta$ and the BAO scale parameter $\alpha$. If we fix the cosmological model, we can get strong priors on the $\alpha$ parameters from the Cosmic Microwave Background. The constraints including these priors are also included in Figure 3 (red and orange contours on the left and red solid line on the right) resulting in $\beta = 2.05\pm0.81$. This excludes $\beta=0$ (corresponding to $N_{\rm eff} = 0$) with more than $2\sigma$. Note that the expected value of $\beta$ in a standard cosmological model with $N_{\rm eff} = 3.046$ is $\beta = 1$, while our data seem to prefer larger values. However, $\beta = 1$ is well within our likelihood distribution (red line in Figure 3, right) and hence we do not see any significant deviation from the standard model.

A similar measurement of the phase shift in the photon distribution (CMB) using the Planck data has been reported in Follin et al. 2015. They measured $N_{\rm eff} = 2.3^{+1.1}_{-0.4}$, which is consistent with our measurement.

In summary, we, for the first time, measured the phase shift in the BAO signal in the galaxy power spectrum, representing an (indirect) detection of the cosmic neutrino background produced by the Big Bang. Our measurement confirms the current standard model with three neutrino species. While the detection significant of our measurement is fairly low, future galaxy surveys like DESI and Euclid will improve these measurements significantly.

Our paper about this measurement was published in Nature (see https://www.nature.com/articles/s41567-019-0435-6)

Let me know if you have any comments or questions below.

best

Florian

Wednesday, March 7, 2018

Measuring the galaxy bispectrum of the large scale structure of the Universe

This blog post summarizes the publication Naonori, Saito, Beutler & Seo (2018), which my collaborators and I published recently. It describes a new way to measure the galaxy bispectrum. Even though I have attempted here to explain the content of this paper for a general audience, the subject is very technical and some knowledge of higher mathematics is required to follow this post.

What is the Bispectrum?

The density field of the Universe, meaning the distribution of dark matter and baryonic matter, can be expressed as the density at point x relative to the mean density of the Universe
\[
\delta(\vec{x}) = \frac{n(\vec{x})}{\overline{n}} - 1,
\]which is also known as the overdensity field. This density field is an important cosmological probe since e.g. a different amount of dark energy would change the density field. So we can use the density field to study dark energy.

Beside dark energy, the density field also contains information about dark matter, the neutrino masses and the inflationary period of the Universe. We can also use it to test general relativity or modified gravity theories.

If the distribution of $\delta(\vec{x})$ at different positions $\vec{x}$ follows a Gaussian distribution, all information contained in it can be captured by the 2-point function, which in configuration-space is called the correlation function, $\xi(r)$, and in Fourier-space is called the power spectrum, $P(k)$. The 2-point function is the ensemble average of correlations between all possible pairs within the density field.
\begin{align}
\xi(\vec{r}) &= \langle \delta(\vec{x} + \vec{r})\delta(\vec{x})\rangle\\
P(\vec{k}) &= \langle \delta(\vec{k})^2\rangle,
\end{align}where $\delta(\vec{k})$ is the Fourier transform of the density field $\delta(\vec{x})$.

It turns out that the density field of the Universe is close to a Gaussian field on large scales, but on smaller scales ($\sim 10 Mpc$) it has non-Gaussian properties. To capture all information in a density field which does not behave like a Gaussian random field, we need to use all possible n-point functions, not just the 2-point function.

Including the 3-point correlation, which in Fourier-space is called the bispectrum, will increase the signal-to-noise ratio, and improve cosmological parameter constraints. Figure 1 shows how much information the bispectrum can add as a function of the smallest scale included ($k_{\rm max}$).

Figure 1: Signal-to-noise ratio as a function of $k_{max}$. Adding the bispectrum allows getting
closer to the Gaussian limit, which represents the maximum amount of information which can be
retrieved from this clustering dataset.

The bispectrum is defined as the ensemble average of the product of the density field at three points
\[
B_{123} = \langle \delta(\vec{k}_1)\delta(\vec{k}_2)\delta(\vec{k}_3)\rangle.
\]This, however, is a 9-dimensional quantity, which is very difficult to analyze. The question is, how can we compress the information contained in this quantity into some lower dimensional space without losing information?

First, we have to figure out how many parameters does one need to describe this three-point configuration?

The bispectrum essentially identifies all triangle configurations in the density field and averages them. Any triangle can be described by 3 parameters. However, with galaxy surveys, we are also interested in the orientation of the triangle with respect to the line-of-sight, which then requires 2 additional parameters, so 5 in total. Just as a comparison, in the power spectrum, this can be done with only 2 parameters, the amplitude $k$ and the angle to the line-of-sight $\mu$.

Other estimators

One important aspect of the bispectrum is its computational complexity. Since one needs to identify all possible triangles in the density field, the complexity of such a procedure is naively given by $\mathcal{O}(N^3)$. However, in 2015 we learned that the bispectrum can actually be estimated in $\mathcal{O}(N\ln N)$ using Fast Fourier Transforms (FFTs). For the rest of this post I will only consider FFT based estimators for the bispectrum.

The first FFT-based bispectrum estimator has been proposed by Roman Scoccimarro (Roman Scoccimarro 2015) and it looks like this
\[
B^m_{\ell}(k_1, k_2, k_3) = \frac{(2\ell + 1)}{N^T_{123}} \prod^3_{i=1} \int_{k_i} d^3 q_i \delta_{123}\delta_{\ell}(q_1)\delta_0(q_2)\delta_0(q_3).
\]This estimator is based on the approach suggested in Scoccimarro (2015) and Bianchi et al. (2015), which calculates the $\delta_{\ell}(q_1)$ using FFTs.

However, there is a fundamental issue with this estimator. Any Fourier-space quantity like the power spectrum or bispectrum needs to account for the survey window function. For example, the power spectrum we measure in a galaxy survey is given by

\[
P^{\rm conv}(\vec{k}) = \int d\vec{k}P^{\rm true}(\vec{k})W^2(\vec{k} - \vec{k}'),
\]where $W(\vec{k})$ represents the survey window function. The window function basically refers to the power spectrum one would measure with a completely random distribution of points within a certain geometric configuration. Even if all galaxies in our galaxy survey were at completely random positions, we would still measure a power spectrum. This power spectrum of random positions is $W(\vec{k})$. So in principle, we know the window function very well, but accounting for it in a power spectrum analysis can be quite tricky.

The approach we take in the power spectrum analysis goes through the following steps:

(1) Fourier transform the power spectrum model
(2) Multiply it by the window function
(3) Fourier transform back into Fourier-space

After we have convolved the power spectrum model with the window function in this way, we can compare it to the power spectrum measurement from the data.

To do the same with the bispectrum we need to define a Hankel transform for the bispectrum. A Hankel transform is a Fourier transform of the ensemble average quantity (bispectrum/power spectrum).

However, a Hankel transform for Roman's bispectrum has not been found yet (and might not exist). This does not mean that there is no way to include the window function in Roman's estimator, it just means that so far no formalism has been published and we found it very hard to find such a formalism. For that reason, we developed a new bispectrum estimator, which has a Hankel relation to its configuration-space equivalent and therefore can include the survey window function effect in the same way as described above for the power spectrum.

Our new estimator

We use the 3D bispectrum in the form $B(\vec{k}_1, \vec{k}_2, \hat{n})$, where we chose the $\vec{k}_3$ axis as the line-of-sight $\hat{n}$. We now use spherical harmonic expansion in all three vectors
\begin{align}
B(\vec{k}_1,\vec{k}_2,\hat{n}) &=& \sum_{\ell_1\ell_2 L} \sum_{m_1m_2M}B_{\ell_1\ell_2L}^{m_1m_2M}(k_1,k_2) y_{\ell_1}^{m_1}(\hat{k}_1) y_{\ell_2}^{m_2}(\hat{k}_2) y_{L}^{M}(\hat{n}),
\end{align}with the coefficients
\begin{align}
B_{\ell_1\ell_2L}^{m_1m_2M}(k_1,k_2) &=
N_{\ell_1\ell_2L} \int \frac{d^2\hat{k}_1}{4\pi}\int \frac{d^2\hat{k}_2}{4\pi}\int \frac{d^2\hat{n}}{4\pi} \\
&\times y^{m_1*}_{\ell_1}(\hat{k}_1) y^{m_2*}_{\ell_2}(\hat{k}_2)y^{M*}_{L}(\hat{n}) B(\vec{k}_1,\vec{k}_2,\hat{n}),
\end{align}and the normalized spherical harmonics $y_{\ell}^m = \sqrt{4\pi/(2\ell+1)}\, Y_{\ell}^m$. The step above mapped the 3D bispectrum into the 8 dimensional coefficients $B_{\ell_1\ell_2L}^{m_1m_2M}(k_1,k_2)$. However, as we established above, the bispectrum can be captured with only 5 parameters. We found that the 8 dimensional quantity above can be further compressed into 5 parameters using
\begin{align}
B_{\ell_1\ell_2L}(k_1,k_2) &=
H_{\ell_1\ell_2L} \sum_{m_1m_2M} \left( \begin{smallmatrix} \ell_1 & \ell_2 & L \\ m_1 & m_2 & M \end{smallmatrix} \right)
B_{\ell_1\ell_2L}^{m_1m_2M}(k_1,k_2),
\end{align}where $H_{\ell_1\ell_2L}=\left( \begin{smallmatrix} \ell_1 & \ell_2 & L \\ 0 & 0 & 0 \end{smallmatrix} \right)$. This last step assumes statistical isotropy and parity-symmetry as well as homogeneity of the density field.

Properties of our estimator

We found a few interesting properties for this particular decomposition:

1. All anisotropies with respect to the line of sight are compressed into one multipole, labeled $L$. These anisotropies are caused by redshift-space distortions and the Alcock-Paczynski effect, which represent two of the most important observables in cosmology.

2. The shot noise term associated with this decomposition is k dependent. This is rather surprising since for the power spectrum we only have to deal with a constant shot noise (not considering halo exclusion). The non-constant shot noise has very interesting implications, e.g. it does not vanish for the configuration-space quantity (the three-point function).

But the most important point is that for this estimator we can write down a Hankel transform
\begin{align}
B_{\ell_1\ell_2L}(k_1,k_2) &= (-i)^{\ell_1+\ell_2}(4\pi)^2 \int dr_1 r_1^2 \int dr_2 r_2^2 \\
&\times j_{\ell_1}(k_1r_1) j_{\ell_2}(k_2r_2) \zeta_{\ell_1\ell_2L}(r_1,r_2) \\
\zeta_{\ell_1\ell_2L}(r_1,r_2) &=
i^{\ell_1+\ell_2}\int \frac{dk_1k_1^2}{2\pi^2} \int \frac{dk_2k_2^2}{2\pi^2} \\
&\times j_{\ell_1}(r_1k_1)j_{\ell_2}(r_2k_2)B_{\ell_1\ell_2L}(k_1,k_2),
\end{align}which allows us to apply the window function to any bispectrum model before comparing it to the measurement, just as outlined above for the power spectrum. The window function effects are shown in Figure 2.

Figure 2: This plot shows the impact of the survey window function onto the bispectrum multipoles. Compare the black dashed line, which shows the measurement of the bispectrum in periodic box simulations with the grey shaded region, which shows the measurement of the bispectrum in lightcones, following the geometry of the Baryon Oscillation Spectroscopic Survey, North Galactic Cap. You can see that there is a suppression of power on almost all scales, due to the window function. The red dashed line and red solid line show the impact of the window function onto a second order perturbation theory model using our window function formalism. You can see that we can reproduce the effect measured in the simulations.

Finally, we used this analysis pipeline to measure the highest order anisotropic bispectrum component $B_{202}$ which we detect with $14\sigma$ significance. This represents the first detection of the anisotropic bispectrum signal (to our knowledge).

To summarize, the new bispectrum decomposition outlined in this paper allows us to write down a clear analysis pipeline which self-consistently accounts for the survey window function. The next steps will be to apply this analysis to galaxy survey datasets.

I hope this summary was useful and please let me know if you have any comments/questions in the comment section below.
cheers
Florian

Wednesday, January 3, 2018

Plotting MCMC chains in Python using getdist

This is a quick introduction to the getdist package by Antony Lewis, which allows visualizing MCMC chains.

from getdist import plots, MCSamples
import numpy as np

def main():
    mean = [0, 0, 0]
    cov = [[1, 0, 0], [0, 100, 0], [0, 0, 8]]
    x1, x2, x3 = np.random.multivariate_normal(mean, cov, 5000).T
    s1 = np.c_[x1, x2, x3]

    mean = [0.2, 0.5, 2.]
    cov = [[0.7, 0.3, 0.1], [0.3, 10, 0.25], [0.1, 0.25, 7]]
    x1, x2, x3 = np.random.multivariate_normal(mean, cov, 5000).T
    s2 = np.c_[x1, x2, x3]

    names = ['x_1', 'x_2', 'x_3']
    samples1 = MCSamples(samples=s1, labels=names, label='sample1')
    samples2 = MCSamples(samples=s2, labels=names, label='sample2')

    g = plots.getSubplotPlotter()
    g.triangle_plot([samples1, samples2], filled=True)
    g.export('triangle_plot.png')
    g.export('triangle_plot.pdf')
    return 

if __name__ == "__main__":
    main()

which results in

We can also plot the constraints on the individual parameters

def get_constraints(samples):
    for i, mean in enumerate(samples.getMeans()):
        upper = samples.confidence(i, upper=True, limfrac=0.05)
        print("\nupper limit 95 C.L. = %f" % upper)
        lower = samples.confidence(i, upper=False, limfrac=0.05)
        print("lower limit 95 C.L. = %f" % lower)
        print("%s = %f +/- %f +/- %f" % (samples.parLabel(i),\

        mean, mean - samples.confidence(i, limfrac=0.16),\
        mean - samples.confidence(i, limfrac=0.025)) )
    return

which in our case looks like this

upper limit 95 C.L. = 1.653261
lower limit 95 C.L. = -1.637202
x_1 = 0.000167 +/- 0.996626 +/- 1.958270

upper limit 95 C.L. = 16.365135
lower limit 95 C.L. = -16.484502
x_2 = -0.010400 +/- 9.874866 +/- 19.682727

upper limit 95 C.L. = 4.637981
lower limit 95 C.L. = -4.656869
x_3 = -0.009280 +/- 2.827152 +/- 5.571246

While in our case we set the variance and covariance of the parameters manually at the beginning to generate the samples, usually one gets an MCMC chain and wants to determine these parameters. To get the covariance matrix between two parameters we have to use the cov() function and provide the index of the parameters

print("cov(x_1, x_3) = ", samples2.cov(pars=[0,2]))
print("cov(x_1, x_2) = ", samples2.cov(pars=[0,1]))

which in our case just reproduces the input values

cov(x_1, x_3) =  [[ 0.69364079  0.10129875]
 [ 0.10129875  7.13148423]]
cov(x_1, x_2) =  [[  0.69364079   0.30451831]
 [  0.30451831  10.05805092]]

I posted the example code above on GitHub
best
Florian

Thursday, October 12, 2017

Study the Universe with Python tutorial, part 5 -- Monte Carlo Markov Chain

This is the fifth blog post in this series, which discusses the Baryon Oscillation Spectroscopic dataset (BOSS). In the last 4 posts, we downloaded the data, calculated the power spectrum and covariance matrix and isolated the BAO feature. Now we want to apply this procedure to the data and extract cosmological information.

In the last post, we used the scipy minimize function to find the best-fitting model. However, the best-fitting model is not very useful if we don't know the uncertainties attached to it.

What we really need is a probability for each possible model which could describe our Universe. The probability for model $Y$ with parameters x should reflect the probability that we would observe the BOSS dataset if the Universe were described by $Y(x)$. Such a probability distribution is called a likelihood and a very powerful way to get such a likelihood is a Monte Carlo Markov Chain (MCMC).

What is an MCMC? An MCMC is a chain of parameter sets which has the nice property that the appearance of each parameter in the chain also represents its likelihood distribution. This means one can take the chain, produce a histogram of parameter x, which directly provides the (marginalised) likelihood for parameter x (marginalised just means that you integrate over the likelihood of all other parameters).

There are many algorithms which can produce such a chains, but the most basic and well-known is the Metropolis-Hastings algorithm. Here we will just focus on the Python implementation of such a procedure, but if you have never encountered MCMC it is certainly worth learning about it.

We will use the PocoMC package. First, we have to define a likelihood function, which can make use of the calc_chi2() function we defined earlier

def get_loglike(parameters, data, templates, func):
    return -0.5*calc_chi2(parameters, data, templates, func)

Now we define a function to produce an MCMC chain

import pickle
def runMCMC(boss_data):

    labels = ['b', 'A1', 'A2', 'A3', 'A4', 'A5', 'alpha', 'sigmaNL']

    if os.path.isfile("samples.pickle"):
        with open('samples.pickle', 'rb') as handle:
            samples = pickle.load(handle)
        return samples, labels
    
    k, pk, pk_nw = get_Pk(0.3)
    pk_nw_interp = interp1d(k, pk_nw)
    krange = np.arange(0.001, 0.5, 0.001)
    os_model = get_oscillation(krange, k, pk, pk_nw)

    prior = pc.Prior([
        uniform(loc=0.01, scale=4.0),           # [0.01, 4]
        uniform(loc=-10.0, scale=20.0),         # [-10, 10]
        uniform(loc=-100.0, scale=200.0),       # [-100, 100]
        uniform(loc=-10000.0, scale=20000.0),   # [-10000, 10000]
        uniform(loc=-100000.0, scale=200000.0), # [-100000, 100000]
        uniform(loc=-100000.0, scale=200000.0), # [-100000, 100000]
        norm(loc=1.0, scale=0.2),               # Normal(1.0, 0.2)
        uniform(loc=0.1, scale=20.0),           # [0.1, 20]
    ])

    # Extra safety: ensure prior.n_dim is a Python int
    if hasattr(prior, "n_dim"):
        prior.n_dim = int(prior.n_dim)

    # Initialise sampler
    sampler = pc.Sampler(
        prior=prior,
        likelihood=get_loglike,
        likelihood_args=(boss_data, {'noBAO': pk_nw_interp, 'os_model': os_model}, get_shifted_model),
        random_state=0,
        n_effective=1024,
        n_active=512,
    )

    # Start sampling
    sampler.run()

    samples, logl, logp = sampler.posterior(resample=True)

    with open('samples.pickle', 'wb') as handle:
        pickle.dump(samples, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return samples, labels

This code does save the samples as a pickle file. Note that if you want to produce a new file, you need to delete the file otherwise, the code will just use the existing file. We can now plot the chain in a triangle plot using

import corner
def plot_chain(samples, labels=None, filename="corner.png"):
    samples = np.asarray(samples)
    if samples.ndim != 2:
        raise ValueError("samples must have shape (n_samples, n_dim)")

    if labels is None:
        labels = [f"p{i}" for i in range(samples.shape[1])]

    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)

    fig = corner.corner(
        samples,
        labels=labels,
        quantiles=[0.16, 0.5, 0.84],
        show_titles=True,
        title_fmt=".3f",
    )
    fig.savefig(filename, dpi=150, bbox_inches="tight")
    plt.show()
    return fig

which gives

Figure 2: 2D likelihood distributions for the 7 parameters in our fit.

The 1D marginalised likelihood is given on the right together with the

$68\%$ confidence levels.

Now what does that mean in terms of cosmological parameters? Well as mentioned before, the only parameter of interest here is the scaling parameter $\alpha$. This parameter is defined as
\[
\alpha = \frac{D_V}{D^{\rm fid}_V}
\]where the the distance $D_V$ is defined as
\[
D_V = \left[(1+z)^2D^2_A\frac{cz}{H(z)}\right]^{1/3}
\]The superscribed $^{\rm fid}$ just means that this is the same quantity in the fiducial cosmology, which is the one we defined in the cosmo class in nbodykit.

The distance $D_V(z)$ is an average of the angular diameter distance $D_A(z)$ and the Hubble parameter $H(z)$
\[
\begin{align}
D_A(z) &= \frac{c}{1+z}\int^z_0\frac{dz'}{H(z')}\\
H(z) &= H_0\left[(\Omega_cdm+\Omega_b)(1+z)^3 + \Omega_{\Lambda}\right]^{1/2}
\end{align}
\]With this we can now turn the constraint on $\alpha$ into a constraint on $D_V$

    from astropy.cosmology import Planck15
    import astropy
    Hz = Planck15.H(zmean)
    DC = (1.+zmean)*Planck15.angular_diameter_distance(zmean)
    c_km_s = astropy.constants.c/1000.
    DVfid = ( DC**2*(zmean*c_km_s/Hz) )**(1./3.)
    print("DV = ", mean[6]*DVfid, "+/-", std[6]*DVfid)

Which should show something like
>> DV = 2154.4702196107396 +/- 56.825103044911536
The next step would be to combine this constraint with measurements from the Cosmic Microwave Background (CMB) to get constraints on actual cosmological parameters. But for now, I will leave it here.

Thanks a lot for making it to the end of this series. If you are interested in going further, you should consider a few extensions to this analysis:

Have a look at density field reconstruction
Consider an anisotropic analysis to constrain $D_A(z)$ and $H(z)$ independently
Do you think the polynomial model we use is the best choice? Can you come up with a better model?
Use the simulated BOSS catalogues to test whether our procedure is robust

The final code for this analysis is available on GitHub.

cheers

Florian

Study the Universe with Python tutorial, part 4 -- minimizing $\chi^2$

This is the fourth blog post about how to use the Baryon Oscillation Spectroscopic (BOSS) dataset to constrain cosmology. In the first three posts, we downloaded the data, calculated the power spectrum and obtained a covariance matrix.

The power spectrum we measured carries a huge amount of information about cosmological parameters, but most of that information cannot be accessed easily. The reason is that we cannot calculate reliable models for the power spectrum, to compare to the data. In a previous post, we calculated models using the class code, but these models are only accurate on very large scales (small wavenumbers).

For that reason, we will focus here on one particular signal, which can be described with a very simple model. This signal is called Baryon Acoustic Oscillations (BAO). BAO in the galaxy distribution were detected for the first time in 2005 and since that first detection, BAO have become one of the most powerful cosmological probes. In fact, the BAO measurement is one of the main reasons why the BOSS dataset has been assembled.

The BAO signal shows up as an oscillation signal on top of the power spectrum. In a previous plot, we saw model power spectra for different amounts of dark matter

Figure 1: Model power spectrum for different amounts
of dark matter.

You can see the BAO signal as an oscillation. This is the signal we are after. The BAO signal is much more robust (easier to model) compared to the rest of the power spectrum and, therefore, we will use a model for the power spectrum where we get rid of all but the oscillation signal.

We can do that by fitting a power spectrum without a BAO signal to a power spectrum with BAO signal. Fitting just means that we test many different power spectra and pick the one which matches the best.

The nbodykit package provides power spectra without BAO signal, all that we have to do is call the modelling function we used earlier but include 'numerical_nowiggle': 'yes', to the parameter file. We can than extract the power spectrum without wiggles as

Pno_wiggle = np.array([cosmo.pk_numerical_nw(ki, z) for ki in k])

And if we plot such power spectrum with the models we had earlier we get

Figure 2: Same power spectra as in Figure 1, but calculated without
BAO signal (dashed lines).

Comparing the dashed and solid lines in Figure 2 shows that these power spectra are the same except for the BAO wiggles, which have been removed.

With this we can now get a model for the BAO signal only

from scipy.interpolate import interp1d
def get_oscillation(krange, k, pk, pk_nw):
    ''' Get an oscillation only power spectrum '''
    pk_interp = lambda x: interp1d(k, pk)(x)
    pk_nw_interp = lambda x: interp1d(k, pk_nw)(x)
    yolin = pk_interp(krange)/pk_nw_interp(krange)
    return yolin

Here is a potential output of the get_oscillation() function

Figure 3: The galaxy power spectrum divided by the best fitting
no-BAO power spectrum. With this technique, we can isolate the
BAO feature.

Our procedure removed the entire broadband power spectrum and isolated the oscillation signature. Before we apply this to the data, we have to extend the model

def get_shifted_model(parameters, x, templates):
    ''' Calculate a model including a scaling parameter '''
    model = get_smooth_model(parameters, x, templates)
    return model*(1. + (templates['os_model'](x/parameters[6]) - 1.)*np.exp(-0.5*x**2*parameters[7]**2))

with

def get_smooth_model(parameters, x, templates):
    ''' Combine a noBAO model with polynomials and linear bias '''
    polynomials = parameters[1]/x**3 + parameters[2]/x**2 + parameters[3]/x + parameters[4] + parameters[5]*x
    return parameters[0]*parameters[0]*templates['noBAO'](x) + polynomials

Here we get the smooth model which we use to marginalise over the broadband power spectrum. We also introduce a smoothing scale (parameters[7]) and a scaling parameter (parameters[6]). The scaling parameter is the parameter we are mainly interested in, since it carries the wavelength of the oscillation signature and is the only parameter here which we will exploit for cosmological constraints.

Now let's apply this to the data. For that we need a metric which determined the goodness of fit. Here we use the $\chi^2$

def calc_chi2(parameters, data, templates, func):
    ''' Compares the model with the data '''
    if within_priors(parameters):
        chi2 = 0.
        # Loop over all datasets which are fit together
        for dataset in data:
            model = func(parameters, dataset['k'], templates)
            diff = (model - dataset['pk'])
            chi2 += np.dot(diff,np.dot(dataset['cov_inv'],diff))
        return chi2
    else:
        return 100000.

After storing the BOSS data as

boss_data = [{ 'k': pk_south['k'][krange_mask], 'pk': P_south[krange_mask],\

'label': 'BOSS south', 'cov': C_south, 'cov_inv': C_south_inv},\

{ 'k': pk_north['k'][krange_mask], 'pk': P_north[krange_mask],\

'label': 'BOSS north', 'cov': C_north, 'cov_inv': C_north_inv }]

we can call the following function

import scipy.optimize as op
from scipy.interpolate import interp1d
def fit(boss_data):
    '''
    Find the best fitting model
    '''
    k, pk, pk_nw = get_Pk(0.3)
    pk_nw_interp = interp1d(k, pk_nw)
    krange = np.arange(0.001, 0.5, 0.001)
    os_model = get_oscillation(krange, k, pk, pk_nw)
    start = [3.13, 0.005, 11., -1384, 7681., -12612., 1., 10.]
    result = op.minimize(calc_chi2, start, args=(boss_data, { 'noBAO': pk_nw_interp, 'os_model': os_model }, get_shifted_model))
    print("result['x'] = ", result['x'])
    return result

which returns the best fitting parameters. If you plot the best fitting model against the data you should get something like

Figure 4: BOSS measurements (data points) together with the

best fitting model.

And the best fitting parameters are

result['x'] = [ 3.13331263e+00 5.02073064e-03 1.12619733e+01 -1.38476373e+03

7.68158635e+03 -1.26123063e+04 9.95567388e-01 1.07156063e+01]

The problem is that we have no uncertainties attached to these parameters; we only have the best fitting values. Without uncertainties, we cannot use this to obtain cosmological constraints. So the next blog post will extend this by not just finding the best fitting parameters, but also getting the uncertainties.

cheers
Florian

Wednesday, October 11, 2017

Study the Universe with Python tutorial, part 3 -- covariance matrix

In the last two blog posts, we discussed how to download the galaxy catalogues of the Baryon Oscillation Spectroscopic Survey (BOSS) and calculate the power spectrum of this dataset. In this blog post, we will calculate uncertainties on the power spectrum measurement.

Uncertainties can be obtained in many different ways. Here we will use simulations of the BOSS dataset. Members of the BOSS collaboration have used supercomputers to simulate what a Universe could look like and they have generated thousands of such simulated BOSS datasets. All we have to do is to calculate the power spectrum of these thousands of simulations and measure the variance between them.

We don't just calculate the uncertainty; we also calculate the correlation between different data points. All the information we need is contained in the covariance matrix, which is defined as
\[C(k_i,k_j) = \begin{pmatrix} \sigma^2_{k_1} & cor(k_1,k_2)\sigma_{k_1}\sigma_{k_2} & \dots & cor(k_1,k_n)\sigma_{k_1}\sigma_{k_n} \\
cor(k_2,k_1)\sigma_{k_2}\sigma_{k_1} & \sigma^2_{k_2} & \dots & cor(k_2,k_n)\sigma_{k_2}\sigma_{k_n} \\
\vdots & \vdots & \ddots & \vdots \\ cor(k_n,k_1)\sigma_{k_n}\sigma_{k_1} & cor(k_n,k_2)\sigma_{k_n}\sigma_{k_2} & \dots & \sigma^2_{k_n} \end{pmatrix}\]Here $\sigma_{k_i}$ is the standard deviation of the power spectrum $P(k_i)$ at wavelength $k_i$. The off-diagonal elements of this matrix contain the uncertainties on a data point contributed by neighbouring data points, which are proportional to the correlation $cor(k_i,k_j)$ between the two data points.

First, we download the simulated BOSS catalogues together with the corresponding random catalogue. Since there are so many of them, this is quite a large file

wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-DR12SGC-COMPSAM_V6C.tar.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-DR12NGC-COMPSAM_V6C.tar.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-Randoms-DR12SGC-COMPSAM_V6C_x50.tar.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/dr12_multidark_patchy_mocks/Patchy-Mocks-Randoms-DR12NGC-COMPSAM_V6C_x50.tar.gz -P path/to/folder/

All 2048 data catalogues can be paired with one random catalogue to measure the power spectrum of the simulated catalogues. Note that you can download the power spectrum from my website if you don't want to calculate it yourself or if you run into disk capacity problems. To read these files you can use the read_power() function provided here.

Using these power spectra, we calculate the covariance matrix using the calc_cov() function, which simply reads the power spectra, subtracts the shot noise, limits the range to $< k_{max} = 0.3h/$Mpc and makes use of the numpy.cov() function.

def calc_cov(N, tag='NGC'):
    ''' Read simulation power spectra and return covariance matrix '''
    list_of_pks = []
    for i in range(1, N):
        pk_file = "./Patchy_V6C_BOSS_DR12_%s_z3_1/ps1D_patchy_%s_z3_COMPnbar_TSC_V6C_%d_0.5_0.75_700_700_700_400_renorm.dat" % (tag, tag, i)
        pk = read_power(pk_file)
        P = pk['pk0'] - pk['SN(data+ran)']
        # Limit the k range
        k = pk['k']
        krange_mask = (k > 0.01) & (k < 0.3)
        P = P[krange_mask]
        list_of_pks.append(P)
    return np.cov(np.vstack(list_of_pks).T)

We can turn the covariance matrix into a correlation matrix
\[
R = \frac{C_{ij}}{\sqrt{C_{ii}C_{jj}}}
\] and plot it as

from statsmodels.stats.moment_helpers import cov2corr

def plot_cov(matrix):
    ''' Plot the correlation matrix derived from the covariance matrix '''
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    plt.imshow(cov2corr(matrix))
    plt.colorbar()
    plt.show()
    return

which should look like this

Figure 1: Correlation matrix, derived from the covariance

matrix of the northern part of BOSS using 1000 simulated

catalogues.

Figure 1 shows that there is some correlation between the power spectrum bins close to each other, while some bins which are far apart are anti-correlated.

The diagonal elements of the covariance matrix represent the variance in the power spectrum if the different data points would not be correlated
\[
\sigma_{P(k_i)} = \sqrt{C(k_i,k_i)}.
\]
Using these uncertainties, we can plot the BOSS measurements with error bars

Figure 2: The BOSS power spectrum measurements including
the diagonal elements of the covariance matrix as error bars.

With this, we now have the power spectra and covariance matrix, which is all we need to start using this dataset to constrain cosmological parameters. We will start with that in the next blog post.
You can find the code for this project on GitHub. Let me know if you have any questions/comments below.
cheers
Florian