My statistics blog: Study the Universe with Python tutorial, part 4 -- minimizing $\chi^2$

This is the fourth blog post about how to use the Baryon Oscillation Spectroscopic (BOSS) dataset to constrain cosmology. In the first three posts, we downloaded the data, calculated the power spectrum and obtained a covariance matrix.

The power spectrum we measured carries a huge amount of information about cosmological parameters, but most of that information cannot be accessed easily. The reason is that we cannot calculate reliable models for the power spectrum, to compare to the data. In a previous post, we calculated models using the class code, but these models are only accurate on very large scales (small wavenumbers).

For that reason, we will focus here on one particular signal, which can be described with a very simple model. This signal is called Baryon Acoustic Oscillations (BAO). BAO in the galaxy distribution were detected for the first time in 2005 and since that first detection, BAO have become one of the most powerful cosmological probes. In fact, the BAO measurement is one of the main reasons why the BOSS dataset has been assembled.

The BAO signal shows up as an oscillation signal on top of the power spectrum. In a previous plot, we saw model power spectra for different amounts of dark matter

Figure 1: Model power spectrum for different amounts
of dark matter.

You can see the BAO signal as an oscillation. This is the signal we are after. The BAO signal is much more robust (easier to model) compared to the rest of the power spectrum and, therefore, we will use a model for the power spectrum where we get rid of all but the oscillation signal.

We can do that by fitting a power spectrum without a BAO signal to a power spectrum with BAO signal. Fitting just means that we test many different power spectra and pick the one which matches the best.

The nbodykit package provides power spectra without BAO signal, all that we have to do is call the modelling function we used earlier but include 'numerical_nowiggle': 'yes', to the parameter file. We can than extract the power spectrum without wiggles as

Pno_wiggle = np.array([cosmo.pk_numerical_nw(ki, z) for ki in k])

And if we plot such power spectrum with the models we had earlier we get

Figure 2: Same power spectra as in Figure 1, but calculated without
BAO signal (dashed lines).

Comparing the dashed and solid lines in Figure 2 shows that these power spectra are the same except for the BAO wiggles, which have been removed.

With this we can now get a model for the BAO signal only

from scipy.interpolate import interp1d
def get_oscillation(krange, k, pk, pk_nw):
    ''' Get an oscillation only power spectrum '''
    pk_interp = lambda x: interp1d(k, pk)(x)
    pk_nw_interp = lambda x: interp1d(k, pk_nw)(x)
    yolin = pk_interp(krange)/pk_nw_interp(krange)
    return yolin

Here is a potential output of the get_oscillation() function

Figure 3: The galaxy power spectrum divided by the best fitting
no-BAO power spectrum. With this technique, we can isolate the
BAO feature.

Our procedure removed the entire broadband power spectrum and isolated the oscillation signature. Before we apply this to the data, we have to extend the model

def get_shifted_model(parameters, x, templates):
    ''' Calculate a model including a scaling parameter '''
    model = get_smooth_model(parameters, x, templates)
    return model*(1. + (templates['os_model'](x/parameters[6]) - 1.)*np.exp(-0.5*x**2*parameters[7]**2))

with

def get_smooth_model(parameters, x, templates):
    ''' Combine a noBAO model with polynomials and linear bias '''
    polynomials = parameters[1]/x**3 + parameters[2]/x**2 + parameters[3]/x + parameters[4] + parameters[5]*x
    return parameters[0]*parameters[0]*templates['noBAO'](x) + polynomials

Here we get the smooth model which we use to marginalise over the broadband power spectrum. We also introduce a smoothing scale (parameters[7]) and a scaling parameter (parameters[6]). The scaling parameter is the parameter we are mainly interested in, since it carries the wavelength of the oscillation signature and is the only parameter here which we will exploit for cosmological constraints.

Now let's apply this to the data. For that we need a metric which determined the goodness of fit. Here we use the $\chi^2$

def calc_chi2(parameters, data, templates, func):
    ''' Compares the model with the data '''
    if within_priors(parameters):
        chi2 = 0.
        # Loop over all datasets which are fit together
        for dataset in data:
            model = func(parameters, dataset['k'], templates)
            diff = (model - dataset['pk'])
            chi2 += np.dot(diff,np.dot(dataset['cov_inv'],diff))
        return chi2
    else:
        return 100000.

After storing the BOSS data as

boss_data = [{ 'k': pk_south['k'][krange_mask], 'pk': P_south[krange_mask],\

'label': 'BOSS south', 'cov': C_south, 'cov_inv': C_south_inv},\

{ 'k': pk_north['k'][krange_mask], 'pk': P_north[krange_mask],\

'label': 'BOSS north', 'cov': C_north, 'cov_inv': C_north_inv }]

we can call the following function

import scipy.optimize as op
from scipy.interpolate import interp1d
def fit(boss_data):
    '''
    Find the best fitting model
    '''
    k, pk, pk_nw = get_Pk(0.3)
    pk_nw_interp = interp1d(k, pk_nw)
    krange = np.arange(0.001, 0.5, 0.001)
    os_model = get_oscillation(krange, k, pk, pk_nw)
    start = [3.13, 0.005, 11., -1384, 7681., -12612., 1., 10.]
    result = op.minimize(calc_chi2, start, args=(boss_data, { 'noBAO': pk_nw_interp, 'os_model': os_model }, get_shifted_model))
    print("result['x'] = ", result['x'])
    return result

which returns the best fitting parameters. If you plot the best fitting model against the data you should get something like

Figure 4: BOSS measurements (data points) together with the

best fitting model.

And the best fitting parameters are

result['x'] = [ 3.13331263e+00 5.02073064e-03 1.12619733e+01 -1.38476373e+03

7.68158635e+03 -1.26123063e+04 9.95567388e-01 1.07156063e+01]

The problem is that we have no uncertainties attached to these parameters; we only have the best fitting values. Without uncertainties, we cannot use this to obtain cosmological constraints. So the next blog post will extend this by not just finding the best fitting parameters, but also getting the uncertainties.

cheers
Florian

My statistics blog

Thursday, October 12, 2017

Study the Universe with Python tutorial, part 4 -- minimizing $\chi^2$

No comments:

Post a Comment