Thursday, October 12, 2017

Study the Universe with Python tutorial, part 5 -- Markov Chain Monte Carlo

This is the fifth blog post in this series, which uses the dataset of the Baryon Oscillation Spectroscopic Survey (BOSS). In the last four posts we downloaded the data, calculated the power spectrum and covariance matrix, and isolated the BAO feature. Now we want to apply this procedure to the data and extract cosmological information.

In the last post we used the scipy minimize function to find the best-fitting model. However, the best-fitting model is not very useful if we don't know the uncertainties attached to it.

What we really need is a probability for each possible model which could describe our Universe. The probability for model $Y$ with parameters $x$ should reflect the probability that we would observe the BOSS dataset if the Universe were described by $Y(x)$. Such a probability distribution is called a likelihood, and a very powerful way to obtain it is Markov Chain Monte Carlo (MCMC).

What is an MCMC? An MCMC is a chain of parameter sets with the nice property that the frequency with which a parameter value appears in the chain reflects its likelihood. This means we can take the chain and produce a histogram of a parameter $x$, which directly provides the (marginalised) likelihood for $x$ (marginalised just means that we integrate over the likelihood of all other parameters).
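For example, once we have such a chain as a NumPy array with one row per sample and one column per parameter (we will produce one below), the marginalised likelihood of a single parameter is simply a histogram of the corresponding column. A quick illustration with a fake chain:
import numpy as np
import matplotlib.pyplot as plt

# stand-in for a real chain: 10000 samples of 2 parameters
chain = np.random.randn(10000, 2)
# the histogram of one column is the marginalised likelihood of that parameter
counts, edges = np.histogram(chain[:, 0], bins=50, density=True)
plt.plot(0.5*(edges[1:] + edges[:-1]), counts)
plt.xlabel('parameter value')
plt.ylabel('marginalised likelihood')
plt.show()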

There are many algorithms which can produce such chains, but the most basic and best known is the Metropolis-Hastings algorithm. Here we will just focus on the Python implementation of such a procedure, but if you have never encountered MCMC before, it is certainly worth learning about it in more detail.
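We will not implement Metropolis-Hastings ourselves (the emcee package used below relies on a different, ensemble-based algorithm), but to make the idea concrete, a bare-bones version for a generic log-likelihood could look like the following sketch:
import numpy as np

def metropolis_hastings(lnlike, start, stepsize, nsteps):
    ''' Bare-bones Metropolis-Hastings sampler (illustration only) '''
    current = np.array(start, dtype=float)
    current_lnlike = lnlike(current)
    chain = np.empty((nsteps, len(current)))
    for i in range(nsteps):
        # symmetric Gaussian proposal around the current point
        proposal = current + stepsize*np.random.randn(len(current))
        proposal_lnlike = lnlike(proposal)
        # accept with probability min(1, L(proposal)/L(current))
        if np.log(np.random.rand()) < proposal_lnlike - current_lnlike:
            current, current_lnlike = proposal, proposal_lnlike
        chain[i] = current
    return chain

# toy example: sample a one-dimensional unit Gaussian
toy_chain = metropolis_hastings(lambda x: -0.5*np.sum(x**2), [0.], 0.5, 10000)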

We will use the emcee MCMC package, which can be installed with
>> pip install emcee
and included with
import emcee
We can call this MCMC sampler like
nwalkers = 20
start = result['x']    # best-fitting parameters from the minimisation in the last post
dim = len(start)
expected_error = [0.1, 1., 1., 1., 1., 1., 0.05, 0.1]
sampler = emcee.EnsembleSampler(nwalkers=nwalkers, dim=dim, lnpostfn=get_loglike,\
 args=(boss_data, { 'noBAO': Pk_without_BAO, 'os_model': os_model }, get_shifted_model))
start_positions = [start + (2.*np.random.random_sample((dim,)) - 1.)*expected_error for i in range(nwalkers)]
sampler.run_mcmc(start_positions, 1000)
Here we have to specify the number of walkers (nwalkers), since this MCMC package produces several chains in parallel (each walker produces one chain). The separate chains produced by the walkers are not fully independent, since they communicate with each other (Goodman & Weare, 2010). We also need to specify the starting values for the different walkers. To do that we take the result of the minimisation procedure discussed in the last post and add a small random scatter around it. Unlike the minimisation function in the last blog post, which searches for the smallest $\chi^2$, the MCMC sampler expects a log-likelihood, where larger values indicate a better match. Therefore we write a small function which turns our $\chi^2$ into a log-likelihood:
def get_loglike(parameters, data, templates, func):
    ''' Turn the chi^2 from the last post into a (Gaussian) log-likelihood '''
    return -0.5*calc_chi2(parameters, data, templates, func)
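This is nothing more than the statement that, for Gaussian errors, the likelihood is related to $\chi^2$ via
\[
\mathcal{L} \propto \exp\left(-\frac{\chi^2}{2}\right) \quad\Leftrightarrow\quad \ln\mathcal{L} = -\frac{\chi^2}{2} + {\rm const},
\]so a small $\chi^2$ corresponds to a large log-likelihood.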
The run_mcmc() method advances each walker by 1000 steps, so the resulting chain has 1000 rows times the number of walkers.
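The samples themselves can be inspected through the sampler's chain and flatchain attributes, e.g.
print(sampler.chain.shape)      # (nwalkers, 1000, dim): one block of steps per walker
print(sampler.flatchain.shape)  # (nwalkers*1000, dim): all walkers merged into one array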

The problem with MCMC is that it needs a certain time to converge. This means the number of appearances of a parameter value only reflects its likelihood once the chain has converged. Determining convergence is very tricky, and there is no procedure which is guaranteed to detect it correctly. Here we will use the Gelman & Rubin criterion.

The Gelman & Rubin criterion compares several independent chains and tests whether the variances within these chains are similar. We will not discuss the details of the implementation here, but you can read more about it in this interesting blog post.

To use the Gelman & Rubin criterion we need several MCMC chains which we can compare. The chains are independent and hence usually run in parallel to save time. This means we have to re-write the MCMC procedure
def runMCMC(start, data, templates):
    ''' Perform MCMC '''
    dim = len(start)
    Nchains = 4
    nwalkers = 20
    ichaincheck = 400
    minlength = 2000
    epsilon = 0.01

    labels = ['b', 'A1', 'A2', 'A3', 'A4', 'A5', 'alpha', 'sigmaNL']
    expected_error = [0.1, 1., 1., 1., 1., 1., 0.05, 0.1]
    # Set up the sampler.
    pos=[]
    list_of_samplers=[]
    for jj in range(0, Nchains):
        pos.append([start + (2.*np.random.random_sample((dim,)) - 1.)*expected_error for i in range(nwalkers)])
        list_of_samplers.append(emcee.EnsembleSampler(nwalkers=nwalkers, dim=dim, lnpostfn=get_loglike,\
         args=(data, templates, get_shifted_model)))

    # Start MCMC
    print("Running MCMC... ")
    within_chain_var = np.zeros((Nchains, dim))
    mean_chain = np.zeros((Nchains, dim))
    scalereduction = np.full(dim, 2.)

    itercounter = 0
    chainstep = minlength
    while any(abs(1. - scalereduction) > epsilon):
        itercounter += chainstep
        for jj in range(0, Nchains):
            for result in list_of_samplers[jj].sample(pos[jj], iterations=chainstep, storechain=True):
                pos[jj] = result[0]
            # we do the convergence test on the second half of the current chain (itercounter/2)
            within_chain_var[jj], mean_chain[jj] = prep_gelman_rubin(list_of_samplers[jj])
        scalereduction = gelman_rubin_convergence(within_chain_var, mean_chain, int(itercounter/2))
        print("scalereduction = ", scalereduction)
        chainstep = ichaincheck
    # Investigate the chain and print out some metrics
    inspect_chain(list_of_samplers, labels)
    return list_of_samplers
This function does essentially the same thing as the code we had before, but it now runs within a loop which is only exited once the Gelman & Rubin parameters in the scalereduction array are within epsilon of unity; it then returns the list of samplers so that we can process the chains further.
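The two helper functions prep_gelman_rubin() and gelman_rubin_convergence() are not shown in this post; a minimal sketch consistent with the way they are called above could look like this (an illustration, not necessarily the exact implementation in the final code):
def prep_gelman_rubin(sampler):
    ''' Within-chain variance and mean of each parameter,
        using the second half of the chain (sketch) '''
    nwalkers, nsteps, dim = sampler.chain.shape
    chainsamples = sampler.chain[:, nsteps//2:, :].reshape((-1, dim))
    within_chain_var = np.var(chainsamples, axis=0)
    mean_chain = np.mean(chainsamples, axis=0)
    return within_chain_var, mean_chain

def gelman_rubin_convergence(within_chain_var, mean_chain, chain_length):
    ''' Gelman & Rubin scale reduction parameter for each dimension (sketch) '''
    W = np.mean(within_chain_var, axis=0)      # mean of the within-chain variances
    B = np.var(mean_chain, axis=0, ddof=1)     # variance of the chain means
    vartheta = (1. - 1./chain_length)*W + B    # pooled estimate of the true variance
    return np.sqrt(vartheta/W)                 # approaches 1 for converged chains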

The inspect_chain() function provides several diagnostics with which we can further test whether the chain has actually converged. For example, we can plot the sampling range of each parameter as a function of the chain step
fig, axes = plt.subplots(dim, 1, sharex=True, figsize=(8, 9))
for i in range(0, dim):
    for jj in range(0, Nchains):
        axes[i].plot(list_of_samplers[jj].chain[:, :, i].T, alpha=0.4)
    #axes[i].yaxis.set_major_locator(MaxNLocator(5))
    axes[i].set_ylabel(labels[i])
fig.tight_layout(h_pad=0.0)
fig.savefig("%stime_series.png" % output_path)
which gives
Figure 1: Variance in the different parameters. Convergence has been reached if the variance is constant with time. In the early stages of the chain the variance changes, indicating that convergence has not been reached.
This figure shows that the sampling range varies at the beginning, which indicates that the chain had not yet converged, while the second half of the chain has an almost constant variance, which is a good indication of convergence.

Once we have convinced ourselves that the chain has indeed converged, we can plot the likelihood. Here we use corner.py, which can be installed with
>> pip install corner
and included with
import corner
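corner expects one flat array of samples, so we first merge the walkers of all chains into a single array, discarding the first half of each chain as burn-in. A minimal sketch, using the list of samplers returned by runMCMC() with the same arguments as before (the burn-in choice here is just a reasonable default):
list_of_samplers = runMCMC(result['x'], boss_data, { 'noBAO': Pk_without_BAO, 'os_model': os_model })
burnin = list_of_samplers[0].chain.shape[1]//2
mergedsamples = np.concatenate([sampler.chain[:, burnin:, :].reshape((-1, dim))
                                for sampler in list_of_samplers])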
We can then feed the merged chain into this plotting package as
fig = corner.corner(mergedsamples, quantiles=[0.16, 0.5, 0.84], plot_density=False,\
    show_titles=True, title_fmt=".3f", labels=labels)
fig.savefig("%scorner.png" % output_path)
which gives
Figure 2: 2D likelihood distributions for the eight parameters in our fit. The 1D marginalised likelihoods are shown on the diagonal together with the $68\%$ confidence levels.
Finally we can also print out the constraints with
def get_percentiles(chain, labels):
    ''' Calculate constraints and uncertainties from MCMC chain '''
    # median plus the (approximate) 1 sigma and 2 sigma percentiles
    per = np.percentile(chain, [50., 15.86555, 84.13445, 2.2775, 97.7225], axis=0)
    per = np.array([per[0], per[0]-per[1], per[2]-per[0], per[0]-per[3], per[4]-per[0]])
    for i in range(0, len(per[0])):
        print("%s = %f +%f -%f +%f -%f" % (labels[i], per[0][i], per[1][i], per[2][i], per[3][i], per[4][i]))
    return per
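Calling this on the merged chain from above (labels is the list of parameter names used earlier),
per = get_percentiles(mergedsamples, labels)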
and with the BOSS data we should get something like
>> b = 2.295306 +0.314054 -0.312351 +0.584276 -0.610636
A1 = -0.151418 +0.111546 -0.111835 +0.223292 -0.232080
A2 = 41.676085 +16.889086 -17.505383 +30.571086 -37.981033
A3 = -3306.605503 +1642.375097 -1467.782597 +3487.705208 -2538.709703
A4 = 14835.170765 +5912.070481 -6608.236429 +10122.305967 -14237.721237
A5 = -21141.535952 +9150.798840 -8221.692166 +20105.713945 -14085.031087
alpha = 0.990088 +0.032634 -0.027241 +0.077669 -0.059837
sigmaNL = 11.378695 +2.975023 -4.083891 +5.036540 -7.682274
Now what does that mean in terms of cosmological parameters? Well, as mentioned before, the only parameter of interest here is the scaling parameter $\alpha$. This parameter is defined as
\[
\alpha = \frac{D_V}{D^{\rm fid}_V}
]where the distance $D_V$ is defined as
\[
D_V = \left[(1+z)^2D^2_A\frac{cz}{H(z)}\right]^{1/3}
]The superscript $^{\rm fid}$ just means that this is the same quantity in the fiducial cosmology, which is the one we defined in the cosmo class in nbodykit.
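If you are jumping in at this post, something along these lines will do as a stand-in for the cosmo object used below (the actual fiducial parameter values were set earlier in the series, so treat this as a placeholder):
from nbodykit import cosmology

# placeholder fiducial cosmology; the series defines its own parameter values
cosmo = cosmology.Planck15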

The distance $D_V(z)$ is a combination of the angular diameter distance $D_A(z)$ and the Hubble parameter $H(z)$
\[
\begin{align}
D_A(z) &= \frac{c}{1+z}\int^z_0\frac{dz'}{H(z')}\\
H(z) &= H_0\left[(\Omega_{\rm cdm}+\Omega_b)(1+z)^3 + \Omega_{\Lambda}\right]^{1/2}
\end{align}
\]With this we can now turn the constraint on $\alpha$ into a constraint on $D_V$
# Hubble parameter and comoving distance at the effective redshift zmean of the sample
Hz = cosmo.efunc(zmean)*cosmo.H0*cosmo.h
DC = (1.+zmean)*cosmo.angular_diameter_distance(zmean)/cosmo.h
c_km_s = (speed_of_light/1000.)   # speed of light in km/s (speed_of_light is in m/s)
DVfid = ( DC**2*(zmean*c_km_s/Hz) )**(1./3.)
# scale the fiducial D_V with the constraint on alpha (7th parameter, index 6)
DV = per[:,6]*DVfid
print("DV = %f +%f -%f +%f -%f\n" % (DV[0], DV[1], DV[2], DV[3], DV[4]))
This should show something like
>> DV = 2152.179924 +69.092697 -57.551602 +164.343396 -121.202709
The next step would be to combine this constraint with measurements from the Cosmic Microwave Background (CMB), to get constraints on actual cosmological parameters. But for now I will leave it here.

Thanks a lot for making it to the end of this series. If you are interested in going further, here are a few possible extensions to this analysis:
  • Have a look at density field reconstruction
  • Consider an anisotropic analysis to constrain $D_A(z)$ and $H(z)$ independently
  • Do you think the polynomial model we use is the best choice? Can you come up with a better model?
  • Use the simulated BOSS catalogues to test whether our procedure is robust
The final code for this analysis is available on GitHub.
cheers
Florian
