My statistics blog: Study the Universe with Python tutorial, part 4 -- minimizing $\chi^2$

This is the fourth blog post about how to use the Baryon Oscillation Spectroscopic (BOSS) dataset to constrain cosmology. In the first three posts, we downloaded the data, calculated the power spectrum and obtained a covariance matrix.

The power spectrum we measured carries a huge amount of information about cosmological parameters, but most of that information cannot be accessed easily. The reason is that we cannot calculate reliable models for the power spectrum, to compare to the data. In a previous post, we calculated models using the class code, but these models are only accurate on very large scales (small wavenumbers).

For that reason, we will focus here on one particular signal, which can be described with a very simple model. This signal is called Baryon Acoustic Oscillations (BAO). BAO in the galaxy distribution were detected for the first time in 2005 and since that first detection, BAO have become one of the most powerful cosmological probes. In fact, the BAO measurement is one of the main reasons why the BOSS dataset has been assembled.

The BAO signal shows up as an oscillation signal on top of the power spectrum. In a previous plot, we saw model power spectra for different amounts of dark matter

Figure 1: Model power spectrum for different amounts
of dark matter.

You can see the BAO signal in both power spectra as an oscillation. This is the signal we are after. The BAO signal is much more robust (easier to model) compared to the rest of the power spectrum and therefore we will use a model for the power spectrum where we get rid of all but the oscillation signal.

We can do that by fitting a power spectrum without a BAO signal to a power spectrum with BAO signal. Fitting just means that we test many different power spectra and pick the one which matches the best.

The nbodykit package provides power spectra without BAO signal, all that we have to do is call the LinearPower() function with the transfer parameter 'NoWiggleEisensteinHu'.

Pk = nbodykit.lab.cosmology.power.linear.LinearPower(cosmo, redshift=0, transfer='NoWiggleEisensteinHu')

And if we plot such a power spectrum we get

Figure 2: Same power spectra as in Figure 1, but calculated
with the transfer parameter 'NoWiggleEisensteinHu', which
calculates the power spectrum without BAO signal. You can see
that the BAO wiggles have been removed.

Comparing the power spectra in Figure 1 and Figure 2 shows that these power spectra are the same except for the BAO wiggles, which have been removed.

Our model uses the noBAO power spectrum together with 5 additional polynomial terms. These parameters have no physical motivation, but their whole purpose is to make this fit independent of the broadband and focus on the oscillation signature.
\[
P^{model}(k) = p^2_0P^{noBAO}(k) + \frac{p_1}{k^3} + \frac{p_2}{k^2}+ \frac{p_3}{k} + p_4 + p_5k.
\]
and in Python

def get_smooth_model(parameters, x, templates):
    ''' Combine a noBAO model with polynomials and linear bias '''
    polynomials = parameters[1]/x**3 + parameters[2]/x**2 + parameters[3]/x + parameters[4] + parameters[5]*x
    return parameters[0]*parameters[0]*templates['noBAO'](x) + polynomials

With this, we can now extract the oscillation signature from any power spectrum model

def get_oscillation(krange, Pk_class, Pk_without_BAO):
    ''' Get an oscillation only power spectrum '''
    cov_inv = np.identity(len(krange))
    start = [1., 0., 0., 0., 0., 0.]
    result = op.minimize(calc_chi2, start, args=( [{ 'k': krange, 'pk': Pk_class(krange), 'cov_inv': cov_inv }],\
     { 'noBAO': Pk_without_BAO }, get_smooth_model))
    yolin = Pk_class(krange)/get_smooth_model(result["x"], krange, { 'noBAO': Pk_without_BAO })
    return interp.interp1d(krange, yolin)

Here we use the minimize() function from the scipy.optimize package

import scipy.optimize as op

We also need to define a metric, meaning we need to specify how we distinguish good models from bad models. Here we use the $\chi^2$, which we can calculate as

def calc_chi2(parameters, data, templates, func):
    ''' Compares the model with the data '''
    if within_priors(parameters):
        chi2 = 0.
        # Loop over all datasets which are fit together
        for dataset in data:
            model = func(parameters, dataset['k'], templates)
            diff = (model - dataset['pk'])
            chi2 += np.dot(diff,np.dot(dataset['cov_inv'],diff))
        return chi2
    else:
        return 100000.

The within_priors() function tests whether the parameters are within a certain range (prior), otherwise it returns a very large $\chi^2$.

Here is a potential output of the get_oscillation() function

Figure 3: The galaxy power spectrum divided by the best fitting
no-BAO power spectrum. With this technique, we can isolate the
BAO feature.

Our procedure removed the entire broadband power spectrum and isolated the oscillation signature. Before we apply this to the data, we have to extend the model

def get_shifted_model(parameters, x, templates):
    ''' Calculate a model including a scaling parameter '''
    model = get_smooth_model(parameters, x, templates)
    return model*(1. + (templates['os_model'](x/parameters[6]) - 1.)*np.exp(-0.5*x**2*parameters[7]**2))

Here we get the smooth model we discussed above, but modify it by introducing a smoothing scale (parameters[7]) and a scaling parameter (parameters[6]). The scaling parameter is the parameter we are actually interested in, since it carries the wavelength of the oscillation signature and is the only parameter here which we will exploit for cosmological constraints.

Now let's apply this to the data.

Pk_without_BAO = nbodykit.lab.cosmology.power.linear.LinearPower(cosmo, redshift=0, transfer='NoWiggleEisensteinHu')
Pk_class = nbodykit.lab.cosmology.power.linear.LinearPower(cosmo, redshift=0, transfer='CLASS')
krange = np.arange(0.001, 0.5, 0.001)
os_model = get_oscillation(krange, Pk_class, Pk_without_BAO)
start = [2.37, -0.076, 38., -3547., 15760., -22622., 1., 9.41]
result = op.minimize(calc_chi2, start, args=(boss_data, { 'noBAO': Pk_without_BAO, 'os_model': os_model },\
get_shifted_model))
print("result['x'] = ", result['x'])

And with this, you should get something like this

Figure 4: BOSS measurements (data points) together with the

best fitting model.

And the best fitting parameters are
>> result['x'] = [ 2.35228409e+00 -1.40192579e-01 4.30619066e+01 -3.57546008e+03

1.58788229e+04 -2.25314741e+04 9.96151333e-01 9.47806970e+00]

The problem is that we have no uncertainties attached to these parameters; we only have the best fitting values. Without uncertainties, we cannot use this to obtain cosmological constraints. So the next blog post will extend this by not just finding the best fitting parameters, but also getting the uncertainties.

The code for this project is available on GitHub. Let me know if you have any questions/comments below.
cheers
Florian

My statistics blog

Thursday, October 12, 2017

Study the Universe with Python tutorial, part 4 -- minimizing $\chi^2$

No comments:

Post a Comment