.. _halo_catalog_analysis_tutorial1: Halo Catalog Analysis Example: halo properties as a function of host halo mass ============================================================================== In this example, we'll show how to start from a subhalo catalog and calculate how various properties scale with host halo mass. As a specific example, we'll calculate the average abundance of subhalos as a function of mass, :math:`\langle N_{\rm sub}\vert M_{\rm halo}\rangle`. There is also an IPython Notebook in the following location that can be used as a companion to the material in this section of the tutorial: **halotools/docs/notebooks/halocat_analysis/basic_examples/halo_catalog_analysis_tutorial1.ipynb** By following this tutorial together with this notebook, you can play around with your own variations of the calculation as you learn the basic syntax. Retrieve the default halo catalog --------------------------------- .. code:: python from halotools.sim_manager import CachedHaloCatalog halocat = CachedHaloCatalog() print(halocat.halo_table[0:9]) .. parsed-literal:: halo_vmax_firstacc halo_dmvir_dt_tdyn ... halo_hostid halo_mvir_host_halo ------------------ ------------------ ... ----------- ------------------- 67.3 -5.505 ... 3058439856 2.031e+10 99.91 -9.513 ... 3058439861 4.443e+10 87.86 0.8171 ... 3058439904 9.882e+10 78.43 -1.356 ... 3058439906 3.108e+10 89.69 1.495 ... 3058439907 4.266e+10 118.89 -6.333 ... 3058439910 1.728e+11 123.38 4.487 ... 3058439952 1.867e+11 109.28 -15.28 ... 3058439956 6.897e+10 84.17 -0.2037 ... 3058439985 3.339e+10 The first time you load the halo catalog into memory is slow because the halo table is sorted into a convenient order and a large number of self-consistency checks are performed. Subsequent calls to extract the ``halo_table`` are fast as the catalog is now available in RAM. .. code:: python mask = halocat.halo_table['halo_mpeak'] > 1000*halocat.particle_mass halos = halocat.halo_table[mask] Calculate total number of subhalos :math:`N_{\rm subs}` in each halo -------------------------------------------------------------------- To calculate the total number of subhalos in each host halo, we'll use the `halotools.utils.group_member_generator`. You can read more about the details of the generator in its documentation, here we'll just demo some basic usage. Briefly, this generator can be used to iterate over your halo population on a host-by-host basis, so that you can perform group-wise calculations with each step of the iteration. In this case, at each step of the iteration we'll sum up the total number of subhalos associated with each host. As described in :ref:`rockstar_subhalo_nomenclature`, ``halo_hostid`` is a natural grouping key for a subhalo catalog. So we'll sort our subhalo catalog on this column and notify the `~halotools.utils.group_member_generator` that we have done so by passing in ``halo_hostid`` as the *grouping_key* keyword argument. For this calculation, all we care about is the number of objects within each grouping, so there is no need to request the `~halotools.utils.group_member_generator` to yield any particular data about the subhalos, so we can simply set the *requested_columns* argument to be an empty list. After calling the generator with these arguments, we will then proceed to loop over the generator, calculate the number of subhalos in each host, and at the end store the result in a new column of the halo table. .. code:: python from halotools.utils import group_member_generator halos.sort('halo_hostid') grouping_key = 'halo_hostid' requested_columns = [] group_gen = group_member_generator(halos, grouping_key, requested_columns) nsub = np.zeros(len(halos)) for first, last, member_props in group_gen: nsub[first:last] = last - first - 1 halos['num_subhalos'] = nsub Our ``halos`` table now has a ``num_subhalos`` column. Calculate :math:`\langle N_{\rm sub}\rangle` vs. :math:`M_{\rm halo}` --------------------------------------------------------------------- Now we'll exploit our previous calculations to compute the mean number of subhalos in bins of halo mass. For this calculation, the `~halotools.mock_observables.mean_y_vs_x` provides useful wrapper behavior around `scipy.stats.binned_statistic` and `numpy.histogram`. Note that `~halotools.mock_observables.mean_y_vs_x` is really just a convenience function used for quick exploratory work. For results going into science publications, be sure to check how your findings depend on bin width, sampling, etc. .. code:: python from halotools.mock_observables import mean_y_vs_x import numpy as np hostmask = halos['halo_upid'] == -1 hosts = halos[hostmask] bins = np.logspace(12.5, 14.5, 25) result = mean_y_vs_x(hosts['halo_mvir_host_halo'], hosts['num_subhalos'], bins = bins, error_estimator = 'variance') host_mass, mean_richness, richness_variance = result Plot the result ~~~~~~~~~~~~~~~ .. code:: python from seaborn import plt plt.errorbar(host_mass, mean_richness, yerr=richness_variance, fmt = "none", ecolor='gray') plt.plot(host_mass, mean_richness, 'D', color='k') plt.loglog() plt.xticks(size=18) plt.yticks(size=18) plt.xlabel(r'$M_{\rm halo}/M_{\odot}$', fontsize=25) plt.ylabel(r'$\langle N_{\rm sub}\rangle$', fontsize=25) plt.ylim(ymin = 0.5, ymax=100) plt.xlim(xmin = 1e12, xmax=5e14) .. image:: nsub_vs_hostmass.png