First, we need to download the data. We are going to use the dataset produced by the Baryon Oscillation Spectroscopic Survey (BOSS) collaboration, which is the largest dataset of its kind to date. You can download all the necessary data with the following two wget commands; just provide the destination path at the end. This will download a bit less than 3 GB of compressed data, which expands to about 8 GB after you unzip it.
```shell
wget -N https://data.sdss.org/sas/dr12/boss/lss/galaxy_DR12v5_CMASSLOWZTOT_North.fits.gz -P path/to/folder/
wget -N https://data.sdss.org/sas/dr12/boss/lss/galaxy_DR12v5_CMASSLOWZTOT_South.fits.gz -P path/to/folder/
```
Now we are ready to go.
We start by reading in the data catalogue using the astropy package:
```python
import numpy as np
from astropy.io import fits
from astropy.table import Table


def read_data(filename):
    """
    Read a FITS catalogue and:
    - convert all columns to native-endian (NumPy 2.x safe)
    - make RA, DEC, Z, NZ, and WEIGHT_* columns float64
    - ensure columns are contiguous
    - apply the redshift mask 0.5 < Z < 0.75

    Returns:
        filtered_table (Astropy Table), col_names (list)
    """
    with fits.open(filename, memmap=False) as hdul:
        data = hdul[1].data  # FITS_rec
    t = Table(data)  # work in an Astropy Table for easy column ops

    # 1) Make all columns native-endian and contiguous
    for name in t.colnames:
        col = np.asanyarray(t[name])
        dt = col.dtype
        if dt.byteorder not in ('=', '|'):
            # Convert big-endian to native-endian (NumPy 2.x compatible)
            native_dt = dt.newbyteorder('=')
            col = col.byteswap().view(native_dt)
        col = np.ascontiguousarray(col)
        t[name] = col

    # 2) Promote the relevant columns to float64 ("double")
    to_float64 = ['RA', 'DEC', 'Z', 'NZ', 'WEIGHT_SYSTOT',
                  'WEIGHT_NOZ', 'WEIGHT_CP', 'WEIGHT_FKP']
    for name in to_float64:
        if name in t.colnames:
            t[name] = np.ascontiguousarray(np.asarray(t[name], dtype=np.float64))

    # 3) Apply the redshift mask
    if 'Z' not in t.colnames:
        raise KeyError("Column 'Z' not found in file.")
    redshift_mask = (t['Z'] > 0.5) & (t['Z'] < 0.75)
    t = t[redshift_mask]

    print("col_names =", t.colnames)
    print(len(t['Z']))
    return t, t.colnames
```
col_names = ['RA', 'DEC', 'RUN', 'RERUN', 'CAMCOL', 'FIELD', 'ID', 'ICHUNK', 'IPOLY', 'ISECT', 'FRACPSF', 'EXPFLUX', 'DEVFLUX', 'PSFFLUX', 'MODELFLUX', 'FIBER2FLUX', 'R_DEV', 'EXTINCTION', 'PSF_FWHM', 'AIRMASS', 'SKYFLUX', 'EB_MINUS_V', 'IMAGE_DEPTH', 'IMATCH', 'Z', 'WEIGHT_FKP', 'WEIGHT_CP', 'WEIGHT_NOZ', 'WEIGHT_STAR', 'WEIGHT_SEEING', 'WEIGHT_SYSTOT', 'NZ', 'COMP', 'PLATE', 'FIBERID', 'MJD', 'FINALN', 'TILE', 'SPECTILE', 'ICOLLIDED', 'INGROUP', 'MULTGROUP']
In this tutorial, we will use only a subset of the columns given here. Namely, we will only use the positions of the galaxies given by the angles on the sky in 'RA' (right ascension) and 'DEC' (declination) as well as the redshift 'Z'. The redshift describes how much the light spectrum of the galaxy has been shifted in wavelength due to the expansion of the Universe. Most of the galaxies in this dataset are several billion light-years away from us, meaning the light travelled a significant fraction of the existence of the Universe before reaching us. Galaxies which are further away have a larger redshift and we will use the redshift later, together with the right ascension and declination, to get a 3D position for all galaxies.
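As a quick illustration of what the 'Z' column encodes (this is not part of the BOSS pipeline, and the observed wavelength below is made up for the example): the redshift relates the observed wavelength of a spectral line to its rest-frame wavelength via $1 + z = \lambda_{obs}/\lambda_{emit}$.

```python
# Rest-frame wavelength of the H-alpha line, in nm
lam_emit = 656.3
# Hypothetical observed wavelength of the same line, in nm
lam_obs = 1050.0

# Redshift: fractional shift of the wavelength
z = lam_obs / lam_emit - 1.0
print(f"z = {z:.3f}")  # prints z = 0.600
```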
Let's explore the dataset a bit before we go any further. We can read in the data
```python
filename = 'galaxy_DR12v5_CMASSLOWZTOT_South.fits'
data_south, _ = read_data(filename)
filename = 'galaxy_DR12v5_CMASSLOWZTOT_North.fits'
data_north, _ = read_data(filename)
angular_plot(data_south, data_north)
```
```python
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt


def angular_plot(cat1, cat2):
    ''' Plot the angular positions of the two catalogues '''
    cat1['RA'] -= 180
    c1 = down_sample([cat1['RA'], cat1['DEC']], 2000)
    cat2['RA'] -= 180
    c2 = down_sample([cat2['RA'], cat2['DEC']], 2000)
    plt.clf()
    plt.subplot(projection="aitoff")
    plt.title("BOSS DR12 survey footprint", y=1.1)
    plt.plot(np.radians(c1[0]), np.radians(c1[1]), '.')
    plt.plot(np.radians(c2[0]), np.radians(c2[1]), '.')
    plt.grid(True)
    plt.show()
    return
```
```python
import random


def down_sample(in_list, N):
    n = len(in_list[0])
    if N <= n:
        idx = random.sample(range(n), N)     # without replacement
    else:
        idx = random.choices(range(n), k=N)  # with replacement
    out_list = []
    for cat in in_list:
        out_list.append([cat[i] for i in idx])
    return out_list
```
Figure 1: The sky distribution of galaxies in the BOSS galaxy survey in an Aitoff projection. Only 2000 randomly selected points for each region are plotted.
The northern part has about twice as many galaxies as the southern part, and in total this dataset covers about a quarter of the sky.
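A rough sanity check of the quarter-of-the-sky figure, assuming a combined BOSS DR12 footprint of roughly 9,300 square degrees (an approximate published value, not computed from the files here):

```python
import numpy as np

# Total solid angle of the sky: 4*pi steradians, converted to square degrees
full_sky_deg2 = 4.0 * np.pi * (180.0 / np.pi) ** 2  # about 41,253 deg^2

boss_footprint_deg2 = 9300.0  # approximate combined BOSS DR12 footprint
fraction = boss_footprint_deg2 / full_sky_deg2
print(f"BOSS covers about {fraction:.0%} of the sky")
```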
We can also plot the redshift distribution
```python
def redshift_plot(cat1, cat2):
    ''' Plot the redshift distributions of the two catalogues '''
    bins = np.arange(0., 0.9, 0.01)
    weight = cat1['WEIGHT_SYSTOT'] * (cat1['WEIGHT_NOZ'] + cat1['WEIGHT_CP'] - 1.0)
    plt.hist(cat1['Z'], weights=weight, bins=bins, alpha=0.4, label='BOSS South')
    weight = cat2['WEIGHT_SYSTOT'] * (cat2['WEIGHT_NOZ'] + cat2['WEIGHT_CP'] - 1.0)
    plt.hist(cat2['Z'], weights=weight, bins=bins, alpha=0.4, label='BOSS North')
    plt.xlabel("redshift")
    plt.legend(loc=0)
    plt.show()
    return
```
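The weighting used in the histograms above follows the standard BOSS convention, $w_{tot} = w_{systot}\,(w_{noz} + w_{cp} - 1)$: the redshift-failure and fibre-collision corrections are combined additively before multiplying by the systematics weight. A minimal sketch with made-up weights:

```python
import numpy as np

# Hypothetical per-galaxy weights (in the real catalogue these are columns)
w_systot = np.array([1.0, 0.98, 1.02])
w_noz = np.array([1.0, 1.0, 2.0])  # redshift-failure correction
w_cp = np.array([1.0, 2.0, 1.0])   # fibre-collision (close-pair) correction

# Standard BOSS total weight
w_tot = w_systot * (w_noz + w_cp - 1.0)
print(w_tot)  # [1.   1.96 2.04]
```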
Figure 2: Redshift distribution of the BOSS galaxy sample. Redshift 0.7 corresponds to a light travel time of about 6 billion years. So when the light from these galaxies was emitted, the Universe was only about 7 billion years old, while now it is about 13.8 billion years old.
The next step is to turn the (RA, DEC, Z) coordinates into Cartesian coordinates. However, this transformation depends on the cosmological model. Using the current standard model of cosmology (also called $\Lambda$CDM), we can set up a cosmological framework by fixing a small set of parameters:
(1) The amount of baryonic matter ($\Omega_b$)
(2) The amount of cold dark matter ($\Omega_{cdm}$)
(3) The Hubble parameter $h$ (or, more precisely, the Hubble constant $H_0$ divided by 100, $h = H_0/100$)
In principle, there are more free parameters, but these are the ones that matter to this analysis. Instead of fixing those parameters individually, we will use the Planck cosmology published in 2015
The $\Lambda$CDM cosmology here assumes a flat universe with no curvature, which means that the dark energy parameter is automatically fixed through
```python
from mpl_toolkits import mplot3d
from astropy import units as u
from astropy.cosmology import Planck15


def threeD_plot(cat1, cat2):
    # Get Cartesian coordinates using the Planck 2015 cosmology
    c1 = down_sample([cat1['Z'], cat1['RA'], cat1['DEC']], 2000)
    dist1 = Planck15.comoving_distance(c1[0]).to_value(u.Mpc)
    dist1_h = dist1 * Planck15.h  # now in Mpc/h
    catalog1 = {}
    catalog1['x'] = dist1_h * np.cos(np.radians(c1[2])) * np.cos(np.radians(c1[1]))
    catalog1['y'] = dist1_h * np.cos(np.radians(c1[2])) * np.sin(np.radians(c1[1]))
    catalog1['z'] = dist1_h * np.sin(np.radians(c1[2]))

    c2 = down_sample([cat2['Z'], cat2['RA'], cat2['DEC']], 2000)
    dist2 = Planck15.comoving_distance(c2[0]).to_value(u.Mpc)
    dist2_h = dist2 * Planck15.h  # now in Mpc/h
    catalog2 = {}
    catalog2['x'] = dist2_h * np.cos(np.radians(c2[2])) * np.cos(np.radians(c2[1]))
    catalog2['y'] = dist2_h * np.cos(np.radians(c2[2])) * np.sin(np.radians(c2[1]))
    catalog2['z'] = dist2_h * np.sin(np.radians(c2[2]))

    fig = plt.figure()
    ax = plt.axes(projection='3d')
    ax.scatter(catalog1['x'], catalog1['y'], catalog1['z'])
    ax.scatter(catalog2['x'], catalog2['y'], catalog2['z'])
    plt.show()
    return
```
\[
\Omega_{\Lambda} = 1 - \Omega_{cdm} - \Omega_b.
\]
These parameters constrain the expansion history of the Universe, including its size and age. The dataset we just read with our Python script is powerful enough to constrain these parameters and, therefore, teach us something about dark matter and dark energy. We will get to that in a later post.
The code above converts the angles (RA, DEC) together with the redshift to a 3D position for the galaxies producing a plot which should look like this
Figure 3: The 3D distribution of BOSS galaxies. Only 2000 randomly selected galaxies out of more than 1 million in this dataset are included in this plot. The Milky Way is located in the middle, and the galaxies in the northern (yellow) and southern (blue) parts are observed in two opposite directions.
And this concludes our first look at the BOSS dataset. This dataset was released in 2016 and represents cutting-edge cosmological research. How to use it to constrain dark matter and dark energy will be the subject of the next blog posts.
The entire Python code for this project is available on GitHub.
cheers
Florian


