In the last post, we used the scipy minimize function to find the best-fitting model. However, the best-fitting model is not very useful if we don't know the uncertainties attached to it.
What we really need is a probability for each possible model which could describe our Universe. The probability for model $Y$ with parameters x should reflect the probability that we would observe the BOSS dataset if the Universe were described by $Y(x)$. Such a probability distribution is called a likelihood and a very powerful way to get such a likelihood is a Monte Carlo Markov Chain (MCMC).
What is an MCMC? An MCMC is a chain of parameter sets which has the nice property that the appearance of each parameter in the chain also represents its likelihood distribution. This means one can take the chain, produce a histogram of parameter x which directly provides the (marginalised) likelihood for parameter x (marginalised just means that you integrate over the likelihood of all other parameters).
There are many algorithms which can produce such a chains, but the most basic and well-known is the Metropolis-Hastings algorithm. Here we will just focus on the Python implementation of such a procedure, but if you have never encountered MCMC it is certainly worth learning about it.
We will use the MCMC emcee package which can be installed with
>> pip install emcee
and included with
import emcee
nwalkers = 20 dim = len(start) expected_error = [0.1, 1., 1., 1., 1., 1., 0.05, 0.1] sampler = emcee.EnsembleSampler(nwalkers=nwalkers, dim=dim, lnpostfn=get_loglike,\ args=(boss_data, { 'noBAO': Pk_without_BAO, 'os_model': os_model }, get_shifted_model)) start_positions = [(result['x'] + (2.*np.random.random_sample((dim,)) - 1)*expected_error) for i in range(nwalkers)] sampler.run_mcmc(start_positions, 1000)
def get_loglike(parameters, data, templates, func): return -0.5*calc_chi2(parameters, data, templates, func)
The problem with MCMC is that it requires a certain time to converge. This means the number of appearances of a parameter only reflects the likelihood if the chain has converged. Determining the convergence is very tricky, and there is no procedure which is guaranteed to detect the correct convergence. Here we will use the Gelman & Rubin criterium.
The Gelman & Rubin criterium compares several independent chains and tries to determine whether the variances within these chains are similar. Here we will not discuss the details of the implementation, but you can read more about it in this interesting blog post.
To use the Gelman & Rubin criterium we need to use several MCMC chains and compare them. The chains are independent, thus usually ran in parallel to save time. This means we have to re-write the MCMC procedure
def runMCMC(start, data, templates): ''' Perform MCMC ''' dim = len(start) Nchains = 4 nwalkers = 20 ichaincheck = 400 minlength = 2000 epsilon = 0.01 labels = ['b', 'A1', 'A2', 'A3', 'A4', 'A5', 'alpha', 'sigmaNL'] expected_error = [0.1, 1., 1., 1., 1., 1., 0.05, 0.1] # Set up the sampler. pos=[] list_of_samplers=[] for jj in range(0, Nchains): pos.append([start + (2.*np.random.random_sample((dim,)) - 1.)*expected_error for i in range(nwalkers)]) list_of_samplers.append(emcee.EnsembleSampler(nwalkers=nwalkers, dim=dim, lnpostfn=get_loglike,\ args=(data, templates, get_shifted_model))) # Start MCMC print("Running MCMC... ") within_chain_var = np.zeros((Nchains, dim)) mean_chain = np.zeros((Nchains, dim)) scalereduction = np.arange(dim, dtype=np.float) scalereduction.fill(2.) itercounter = 0 chainstep = minlength while any(abs(1. - scalereduction) > epsilon): itercounter += chainstep for jj in range(0, Nchains): for result in list_of_samplers[jj].sample(pos[jj], iterations=chainstep, storechain=True): pos[jj] = result[0] # we do the convergence test on the second half of the current chain (itercounter/2) within_chain_var[jj], mean_chain[jj] = prep_gelman_rubin(list_of_samplers[jj]) scalereduction = gelman_rubin_convergence(within_chain_var, mean_chain, int(itercounter/2)) print("scalereduction = ", scalereduction) chainstep = ichaincheck # Investigate the chain and print out some metrics inspect_chain(list_of_samplers, labels) return
The inspect_chain() function provides several diagnostics, with which we can further test whether the chain has actually converged. For example, we can print out the timeline of the sampling range for each parameter
fig, axes = plt.subplots(dim, 1, sharex=True, figsize=(8, 9)) for i in range(0, dim): for jj in range(0, Nchains): axes[i].plot(list_of_samplers[jj].chain[:, :, i].T, alpha=0.4) #axes[i].yaxis.set_major_locator(MaxNLocator(5)) axes[i].set_ylabel(labels[i]) fig.tight_layout(h_pad=0.0) fig.savefig("%stime_series.png" % output_path)
Figure 1: Variance in the different parameters. The convergence
has been reached if the variance is constant with time. In the early
stages of the chain the variance changes, indicating that the
convergence has not been reached.
This figure shows that the sampling range varies in the beginning, which indicates that the chain has not converged yet. The second half of the chain has an almost constant variance which is a good indication of convergence.If we convince ourselves that the chain has indeed converged we can plot the likelihood. Here we use corner.py which can be installed with
>> pip install corner
and included with
import corner
fig = corner.corner(mergedsamples, quantiles=[0.16, 0.5, 0.84], plot_density=False,\ show_titles=True, title_fmt=".3f", labels=labels) fig.savefig("%scorner.png" % output_path)
Figure 2: 2D likelihood distributions for the 7 parameters in our fit.
The 1D marginalised likelihood is given on the right together with the
$68\%$ confidence levels.
Finally, we can also print out the constraints with
def get_percentiles(chain, labels): ''' Calculate constraints and uncertainties from MCMC chain ''' per = np.percentile(chain, [50., 15.86555, 84.13445, 2.2775, 97.7225], axis=0) per = [per[0], per[0]-per[1], per[2]-per[0], per[0]-per[3], per[4]-per[0]] for i in range(0, len(per[0])): print("%s = %f +%f -%f +%f -%f" % (labels[i], per[0][i], per[1][i], per[2][i], per[3][i], per[4][i])) return per
>> b = 2.295306 +0.314054 -0.312351 +0.584276 -0.610636
A1 = -0.151418 +0.111546 -0.111835 +0.223292 -0.232080
A2 = 41.676085 +16.889086 -17.505383 +30.571086 -37.981033
A3 = -3306.605503 +1642.375097 -1467.782597 +3487.705208 -2538.709703
A4 = 14835.170765 +5912.070481 -6608.236429 +10122.305967 -14237.721237
A5 = -21141.535952 +9150.798840 -8221.692166 +20105.713945 -14085.031087
alpha = 0.990088 +0.032634 -0.027241 +0.077669 -0.059837
sigmaNL = 11.378695 +2.975023 -4.083891 +5.036540 -7.682274
Now what does that mean in terms of cosmological parameters? Well as mentioned before, the only parameter of interest here is the scaling parameter $\alpha$. This parameter is defined as\[
\alpha = \frac{D_V}{D^{\rm fid}_V}
\]where the the distance $D_V$ is defined as
\[
D_V = \left[(1+z)^2D^2_A\frac{cz}{H(z)}\right]^{1/3}
\]The superscribed $^{\rm fid}$ just means that this is the same quantity in the fiducial cosmology, which is the one we defined in the cosmo class in nbodykit.
The distance $D_V(z)$ is an average of the angular diameter distance $D_A(z)$ and the Hubble parameter $H(z)$
\[
\begin{align}
D_A(z) &= \frac{c}{1+z}\int^z_0\frac{dz'}{H(z')}\\
H(z) &= H_0\left[(\Omega_cdm+\Omega_b)(1+z)^3 + \Omega_{\Lambda}\right]^{1/2}
\end{align}
\]With this we can now turn the constraint on $\alpha$ into a constraint on $D_V$
Hz = cosmo.efunc(zmean)*cosmo.H0*cosmo.h DC = (1.+zmean)*cosmo.angular_diameter_distance(zmean)/cosmo.h c_km_s = (speed_of_light/1000.) DVfid = ( DC**2*(zmean*c_km_s/Hz) )**(1./3.) DV = per[:,6]*DVfid print("DV = %f +%f -%f +%f -%f\n" % (DV[0], DV[1], DV[2], DV[3], DV[4]))
>> DV = 2152.179924 +69.092697 -57.551602 +164.343396 -121.202709
The next step would be to combine this constraint with measurements from the Cosmic Microwave Background (CMB), to get constraints on actual cosmological parameters. But for now, I will leave it here.
Thanks a lot for making it to the end of this series. If you are interested in going further, you should consider a few extensions to this analysis:
- Have a look at density field reconstruction
- Consider an anisotropic analysis to constrain $D_A(z)$ and $H(z)$ independently
- Do you think the polynomial model we use is the best choice? Can you come up with a better model?
- Use the simulated BOSS catalogues to test whether our procedure is robust
The final code for this analysis is available on GitHub.
cheers
Florian