When searching for the model that best describes a given dataset, most studies employ the $\chi^2$ statistic, which measures how well the model and the data agree:
\[
\chi^2 = (\mathrm{data} - \mathrm{model})^T C^{-1} (\mathrm{data} - \mathrm{model}),
\]where the covariance matrix $C$ describes the errors of the data points and the correlations between them. This statistic is directly related to the so-called likelihood, which can be written as
\[
p(x|\theta) = \frac{1}{(2\pi)^{d/2}\sqrt{\det C}} \exp\left(-\frac{1}{2}\chi^2\right),
\]where $d$ is the number of data points in the dataset. This likelihood describes the probability of observing the data, given that the underlying truth is described by the model with parameters $\theta$. So whenever somebody observes some data, the final analysis step is usually to calculate this likelihood and find the parameters $\theta$ that maximize it.
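As a toy numerical check, the $\chi^2$ and the Gaussian likelihood above can be evaluated with a few lines of NumPy. The data, model, and covariance here are made up purely for illustration:

```python
import numpy as np

# Toy example: the data, model, and covariance below are invented.
rng = np.random.default_rng(42)
d = 5
model = np.zeros(d)                 # model prediction for some parameters theta
C = 0.5 * np.eye(d)                 # assumed covariance of the data points
data = model + rng.normal(scale=np.sqrt(0.5), size=d)

resid = data - model
chi2 = resid @ np.linalg.solve(C, resid)   # (data-model)^T C^{-1} (data-model)

# Gaussian likelihood p(x|theta) with the normalization from the formula above
norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(C))
likelihood = np.exp(-0.5 * chi2) / norm
```

Note the use of `np.linalg.solve` instead of explicitly inverting $C$, which is both faster and numerically more stable.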
However, the likelihood above has a Gaussian shape, and it is not guaranteed that the true probability distribution has that form. So the questions we address in this publication are: how close is the likelihood to a Gaussian, and if there is a deviation, how does it impact the analysis?
The first question to answer is, why do we assume a Gaussian likelihood in the first place?
The Gaussian distribution plays a central role in statistics because, when collecting many measurements of the same quantity, these measurements often follow a Gaussian distribution. For example, if you throw a coin 100 times, count heads as 1 and tails as 0, and calculate the mean, this mean is probably somewhere close to 0.5.
If you repeat that experiment many times, meaning you throw the coin 100 times, calculate the mean of the 100 throws, and do this over and over, the distribution of means will peak close to 0.5 and have a Gaussian shape (see Figure 1). This is quite surprising, because the probability distribution for a single throw is a top-hat distribution, which looks nothing like a Gaussian. So adding up many top-hat distributions somehow leads to a Gaussian distribution.
Figure 1: Probability distribution for the mean of many throws of a coin. The number of throws is shown in the top gray box. With an increasing number of throws, the distribution looks more and more Gaussian (source: Wikipedia). |
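This coin-flip experiment is easy to simulate. The sketch below (the number of repetitions is made up) reproduces the behavior shown in Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n_throws = 100         # coin flips per experiment (heads = 1, tails = 0)
n_experiments = 20_000 # how often the experiment is repeated (illustrative)

# Each row is one experiment; the mean of each row is one entry
# in a histogram like Figure 1.
means = rng.integers(0, 2, size=(n_experiments, n_throws)).mean(axis=1)

# The means cluster around 0.5 with a scatter of about 0.5/sqrt(100) = 0.05.
mean_of_means = means.mean()
std_of_means = means.std()
```

Plotting a histogram of `means` (e.g. with `matplotlib.pyplot.hist`) shows the Gaussian bell shape emerging, and shrinking `n_throws` makes the distribution visibly non-Gaussian.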
In fact, this happens quite often in nature: the distribution of the mean of a large number of events often looks Gaussian, even if the distribution for a single event does not. This is known in statistics as the central limit theorem.
However, given the example above, you can already see that the assumption of a Gaussian likelihood only works for large numbers. Figure 1 clearly shows that the probability distribution is far from Gaussian if one calculates the mean from only a small number of coin flips.
Going back to our paper: we looked at the power spectrum analysis of the Baryon Oscillation Spectroscopic Survey (BOSS) published in Beutler et al. (2016) and the group multiplicity function analysis of Sinha et al. (2017).
Let's start with the power spectrum analysis. I have already discussed this analysis in several blog posts (e.g. here), so I will skip the details. Here we do not care too much about the details of the measurement anyway; we just want to test whether the assumption of a Gaussian likelihood is justified.
In the power spectrum case one calculates the mean of many Fourier modes, and the question is whether we have enough modes to make the likelihood distribution Gaussian. In general, there are many modes on small scales (large wavevector $k$) but only a small number of modes on large scales (small wavevector $k$). E.g. at $k=0.01h/$Mpc only a few dozen modes contribute to the value of the power spectrum, while at $k=0.2h/$Mpc many thousand modes contribute. Whether this matters or not depends on how it impacts the parameters of the model, which is not straightforward to predict.
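As a rough sanity check of these numbers, one can count the independent Fourier modes in a shell of width $\Delta k$: a survey of volume $V$ contains about $4\pi k^2 \Delta k \, V/(2\pi)^3$ modes per shell, halved because the density field is real. The volume and bin width below are illustrative, not the actual BOSS values:

```python
import numpy as np

# Hypothetical survey volume and k-bin width; not the actual BOSS values.
V = 3.0e9    # survey volume in (Mpc/h)^3
dk = 0.005   # width of the k-shell in h/Mpc

def n_modes(k, dk=dk, V=V):
    """Approximate number of independent Fourier modes in a shell at k.

    Shell volume 4*pi*k^2*dk times the mode density V/(2*pi)^3, halved
    because the density field is real (k and -k carry the same information).
    """
    return 4 * np.pi * k**2 * dk * V / (2 * np.pi) ** 3 / 2

few_modes = n_modes(0.01)    # only a few dozen modes on large scales
many_modes = n_modes(0.2)    # more than ten thousand on small scales
```

With these assumed numbers one indeed gets a few dozen modes at $k=0.01h/$Mpc and over ten thousand at $k=0.2h/$Mpc, which is why large scales are the most likely place for the Gaussian assumption to break down.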
There is a very simple way to check whether the likelihood is Gaussian. We have produced mock catalogs for the BOSS survey, which mimic the BOSS dataset. Each of these mock catalogs represents a different possible realization of the measurement of BOSS. Therefore we just have to measure the power spectrum of all these mock catalogs and see whether those power spectra scatter around their mean in the shape of a Gaussian.
Whatever the distribution is going to be, we then use that distribution rather than assuming a Gaussian when searching for the model with the maximum likelihood.
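The idea of estimating the likelihood shape from many realizations can be illustrated with a toy model: treat the power in each mode as exponentially distributed (as it is for a Gaussian random field), average over the modes in a bin, and check how Gaussian the scatter of that band power is. The mock count and mode numbers below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n_mocks = 2048   # number of mock catalogs (illustrative)

skews = {}
for n_modes in (10, 1000):
    # Power per mode of a Gaussian random field is exponentially
    # distributed; the band power is the mean over the modes in the bin.
    band_power = rng.exponential(scale=1.0, size=(n_mocks, n_modes)).mean(axis=1)
    # Sample skewness of the scatter around the mean: 0 for a Gaussian.
    dev = band_power - band_power.mean()
    skews[n_modes] = (dev**3).mean() / dev.std() ** 3
```

With only 10 modes per bin the scatter is visibly skewed, while with 1000 modes it is already close to Gaussian, mirroring the large-scale versus small-scale behavior discussed above.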
Figure 2 shows the resulting parameter constraints: the blue distribution assumes a Gaussian likelihood, while in orange we use the likelihood estimated from the mocks. The two likelihoods agree well, except for the parameter $f\sigma_8$, where we found a small shift to larger values.
Next, the group multiplicity function analysis of Sinha et al. (2017). The likelihood distributions are again shown in blue and orange in Figure 3. For this analysis the two likelihoods agree less well, especially in how peaked the distributions are. Removing the assumption of a Gaussian likelihood increases the error on almost all derived parameters.
I hope this was useful. Let me know if you have any comments/questions in the comment section below.
best
Florian