When searching for the model that best describes a given dataset, most studies employ the $\chi^2$ statistic, which measures how well the model and the data agree:
\[
\chi^2 = (\mathrm{data} - \mathrm{model})^T C^{-1} (\mathrm{data} - \mathrm{model}),
\]where the covariance matrix $C$ describes the errors of the data points and the correlations between them. This statistic is directly related to the so-called likelihood, which can be written as
\[
p(x|\theta) = \frac{1}{(2\pi)^{d/2}\sqrt{\det C}} \exp\left(-\frac{1}{2}\chi^2\right),
\]where $d$ is the number of data points in the dataset. This likelihood describes the probability of observing the data, given that the underlying truth is described by the model with parameters $\theta$. So whenever somebody observes some data, the final analysis step is usually to calculate this likelihood and find the parameters $\theta$ that maximize it.
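As a toy numerical check, the $\chi^2$ and the Gaussian likelihood above can be evaluated with a few lines of NumPy. The data, model, and covariance here are made up purely for illustration:

```python
import numpy as np

# Toy example: the data, model, and covariance below are invented.
rng = np.random.default_rng(42)
d = 5
model = np.zeros(d)                 # model prediction for some parameters theta
C = 0.5 * np.eye(d)                 # assumed covariance of the data points
data = model + rng.normal(scale=np.sqrt(0.5), size=d)

resid = data - model
chi2 = resid @ np.linalg.solve(C, resid)   # (data-model)^T C^{-1} (data-model)

# Gaussian likelihood p(x|theta) with the normalization from the formula above
norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(C))
likelihood = np.exp(-0.5 * chi2) / norm
```

Note the use of `np.linalg.solve` instead of explicitly inverting $C$, which is both faster and numerically more stable.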
However, the likelihood above has a Gaussian shape, and it is not guaranteed that the true probability distribution has that form. So the questions we address in this publication are: how close is the likelihood to a Gaussian, and if there is a deviation, how does it impact the analysis?
The first question to answer is, why do we assume a Gaussian likelihood in the first place?
The Gaussian distribution plays a central role in statistics because, when collecting many measurements of the same quantity, these measurements often follow a Gaussian distribution. For example, if you throw a coin 100 times, count heads as 1 and tails as 0, and calculate the mean, this mean is probably somewhere close to 0.5.
If you repeat that experiment many times, meaning you throw the coin 100 times, calculate the mean of the 100 throws, and do this over and over, the distribution of means will peak close to 0.5 and have a Gaussian shape (see Figure 1). This is quite surprising, because the probability distribution for a single throw is a top-hat distribution, which looks nothing like a Gaussian. So adding up many top-hat distributions somehow leads to a Gaussian distribution.
Figure 1: Probability distribution for the mean of many throws of a coin. The number of throws is shown in the top gray box. With an increasing number of throws, the distribution looks more and more Gaussian (source: Wikipedia). |
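This coin-flip experiment is easy to simulate. The sketch below (the number of repetitions is made up) reproduces the behavior shown in Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n_throws = 100         # coin flips per experiment (heads = 1, tails = 0)
n_experiments = 20_000 # how often the experiment is repeated (illustrative)

# Each row is one experiment; the mean of each row is one entry
# in a histogram like Figure 1.
means = rng.integers(0, 2, size=(n_experiments, n_throws)).mean(axis=1)

# The means cluster around 0.5 with a scatter of about 0.5/sqrt(100) = 0.05.
mean_of_means = means.mean()
std_of_means = means.std()
```

Plotting a histogram of `means` (e.g. with `matplotlib.pyplot.hist`) shows the Gaussian bell shape emerging, and shrinking `n_throws` makes the distribution visibly non-Gaussian.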
In fact, this happens quite often in nature: the distribution of the mean of a large number of events often looks Gaussian, even if the distribution for a single event does not. This is known in statistics as the central limit theorem.
However, given the example above, you can already see that the assumption of a Gaussian likelihood only works for large numbers. Figure 1 clearly shows that the probability distribution is far from Gaussian if one calculates the mean from only a small number of coin flips.
Going back to our paper: we looked at the power spectrum analysis of the Baryon Oscillation Spectroscopic Survey (BOSS) published in Beutler et al. (2016) and the group multiplicity function analysis of Sinha et al. (2017).
Let's start with the power spectrum analysis. I have already discussed this analysis in several blog posts (e.g. here), so I will skip the details. Here we do not care too much about the details of the measurement anyway; we just want to test whether the assumption of a Gaussian likelihood is justified.
In the power spectrum case one calculates the mean of many Fourier modes, and the question is whether we have enough modes to make the likelihood distribution Gaussian. In general, there are many modes on small scales (large wavevector $k$) but only a small number of modes on large scales (small wavevector $k$). E.g. at $k=0.01h/$Mpc only a few dozen modes contribute to the value of the power spectrum, while at $k=0.2h/$Mpc many thousand modes contribute. Whether this matters or not depends on how it impacts the parameters of the model, which is not straightforward to predict.
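As a rough sanity check of these numbers, one can count the independent Fourier modes in a shell of width $\Delta k$: a survey of volume $V$ contains about $4\pi k^2 \Delta k \, V/(2\pi)^3$ modes per shell, halved because the density field is real. The volume and bin width below are illustrative, not the actual BOSS values:

```python
import numpy as np

# Hypothetical survey volume and k-bin width; not the actual BOSS values.
V = 3.0e9    # survey volume in (Mpc/h)^3
dk = 0.005   # width of the k-shell in h/Mpc

def n_modes(k, dk=dk, V=V):
    """Approximate number of independent Fourier modes in a shell at k.

    Shell volume 4*pi*k^2*dk times the mode density V/(2*pi)^3, halved
    because the density field is real (k and -k carry the same information).
    """
    return 4 * np.pi * k**2 * dk * V / (2 * np.pi) ** 3 / 2

few_modes = n_modes(0.01)    # only a few dozen modes on large scales
many_modes = n_modes(0.2)    # more than ten thousand on small scales
```

With these assumed numbers one indeed gets a few dozen modes at $k=0.01h/$Mpc and over ten thousand at $k=0.2h/$Mpc, which is why large scales are the most likely place for the Gaussian assumption to break down.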
There is a very simple way to check whether the likelihood is Gaussian. We have produced mock catalogs for the BOSS survey, which mimic the BOSS dataset. Each of these mock catalogs represents a different possible realization of the measurement of BOSS. Therefore we just have to measure the power spectrum of all these mock catalogs and see whether those power spectra scatter around their mean in the shape of a Gaussian.
Whatever the distribution is going to be, we then use that distribution rather than assuming a Gaussian when searching for the model with the maximum likelihood.
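The idea of estimating the likelihood shape from many realizations can be illustrated with a toy model: treat the power in each mode as exponentially distributed (as it is for a Gaussian random field), average over the modes in a bin, and check how Gaussian the scatter of that band power is. The mock count and mode numbers below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n_mocks = 2048   # number of mock catalogs (illustrative)

skews = {}
for n_modes in (10, 1000):
    # Power per mode of a Gaussian random field is exponentially
    # distributed; the band power is the mean over the modes in the bin.
    band_power = rng.exponential(scale=1.0, size=(n_mocks, n_modes)).mean(axis=1)
    # Sample skewness of the scatter around the mean: 0 for a Gaussian.
    dev = band_power - band_power.mean()
    skews[n_modes] = (dev**3).mean() / dev.std() ** 3
```

With only 10 modes per bin the scatter is visibly skewed, while with 1000 modes it is already close to Gaussian, mirroring the large-scale versus small-scale behavior discussed above.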
Figure 2 shows the resulting parameter constraints: the blue distribution assumes a Gaussian likelihood, while in orange we use the likelihood estimated from the mocks. The two likelihoods agree well, except for the parameter $f\sigma_8$, where we found a small shift to larger values.
Next, the group multiplicity function analysis of Sinha et al. (2017). The likelihood distributions are again shown in blue and orange in Figure 3. For this analysis the two likelihoods agree less well, especially in how peaked the distributions are. Removing the assumption of a Gaussian likelihood increases the error on almost all derived parameters.
I hope this was useful. Let me know if you have any comments/questions in the comment section below.
best
Florian