.. _galaxy_catalog_analysis_tutorial1: Galaxy Catalog Analysis Example: Galaxy properties as a function of halo mass ======================================================================================== In this example, we'll show how to start from a sample of mock galaxies and calculate how various galaxies properties scale with halo mass. In particular, we'll calculate the average total stellar mass, :math:`\langle M_{\ast}^{\rm tot}\vert M_{\rm halo}\rangle`, and also the average quiescent fraction for centrals and satellites, :math:`\langle F_{\rm cen}^{\rm q}\vert M_{\rm halo}\rangle` and :math:`\langle F_{\rm sat}^{\rm q}\vert M_{\rm halo}\rangle`. There is also an IPython Notebook in the following location that can be used as a companion to the material in this section of the tutorial: **halotools/docs/notebooks/galcat_analysis/basic_examples/galaxy_catalog_analysis_tutorial1.ipynb** By following this tutorial together with this notebook, you can play around with your own variations of the calculation as you learn the basic syntax. Generate a mock galaxy catalog --------------------------------- Let's start out by generating a mock galaxy catalog into an N-body simulation in the usual way. Here we'll assume you have the *z=0* rockstar halos for the bolshoi simulation, as this is the default halo catalog. .. code:: python from halotools.empirical_models import PrebuiltSubhaloModelFactory model = PrebuiltSubhaloModelFactory('smhm_binary_sfr') from halotools.sim_manager import CachedHaloCatalog halocat = CachedHaloCatalog(simname = 'bolshoi', redshift = 0, halo_finder = 'rockstar') model.populate_mock(halocat) Now suppose the data we are interested in is complete for :math:`M_{\ast} > 10^{10}M_{\odot},` so we will make a cut on the mock. Our mock galaxies are stored in the ``galaxy_table`` of ``model.mock`` in the form of an Astropy `~astropy.table.Table`. .. code:: python sample_mask = model.mock.galaxy_table['stellar_mass'] > 1e10 gals = model.mock.galaxy_table[sample_mask] Calculate total stellar mass :math:`M_{\ast}^{\rm tot}` in each halo ------------------------------------------------------------------------------ To calculate the total stellar mass of galaxies in each halo, we'll use the `halotools.utils.group_member_generator`. You can read more about the details of that function in its documentation, here we'll just demo some basic usage. Briefly, this generator can be used to iterate over your galaxy population on a group-by-group basis, so that you can perform group-wise calculations with each step of the iteration. In this case, at each step of the iteration we'll sum up the stellar masses of each group's members. The ``halo_hostid`` is a natural grouping key for a galaxy catalog whose host halos are known. So we'll sort our galaxy catalog on this column, notify the `~halotools.utils.group_member_generator` that we have done so by passing in ``halo_hostid`` as the *grouping_key*, and then request that the generator yield the data stored in the ``stellar_mass`` column at each iteration. The we'll loop over the generator, calculate the sum of the yielded stellar mass column, and broadcast the result to each group member. .. code:: python from halotools.utils import group_member_generator gals.sort('halo_hostid') grouping_key = 'halo_hostid' requested_columns = ['stellar_mass'] group_gen = group_member_generator(gals, grouping_key, requested_columns) total_stellar_mass = np.zeros(len(gals)) for first, last, member_props in group_gen: stellar_mass_of_members = member_props[0] total_stellar_mass[first:last] = sum(stellar_mass_of_members) gals['halo_total_stellar_mass'] = total_stellar_mass Our ``gals`` table now has a ``halo_total_stellar_mass`` column. Calculate host halo mass :math:`M_{\rm host}` of each galaxy ------------------------------------------------------------ Now we'll do a very similar calculation, but instead broadcasting the host halo mass to each halo's members. Here we'll perform a two-property sort. By sorting first on ``halo_hostid`` and then on ``halo_upid``, what we accomplish is that galaxies sharing a common halo are grouped together, and then within each grouping the true central galaxy appears first in the sequence. This means that when the `~halotools.utils.group_member_generator` yields arrays to us, we can retrieve the property of the central galaxy via the first element in each yielded array. In this next calculation, we'll exploit that information to broadcast the host halo virial mass to all members of the halo. .. code:: python gals.sort(['halo_hostid', 'halo_upid']) grouping_key = 'halo_hostid' requested_columns = ['halo_mvir'] group_gen = group_member_generator(gals, grouping_key, requested_columns) host_mass = np.zeros(len(gals)) for first, last, member_props in group_gen: mvir_members = member_props[0] mvir_host = mvir_members[0] host_mass[first:last] = mvir_host gals['halo_mhost'] = host_mass Our ``gals`` table now has a ``halo_mhost`` column. Calculate :math:`\langle M_{\ast}^{\rm tot}\rangle` vs. :math:`M_{\rm halo}` ------------------------------------------------------------------------------------------------- Now we'll exploit our previous calculations to compute the mean total stellar mass in bins of halo mass. For this calculation, the `~halotools.mock_observables.mean_y_vs_x` provides useful wrapper behavior around `scipy.stats.binned_statistic` and `numpy.histogram`. Note that `~halotools.mock_observables.mean_y_vs_x` is really just a convenience function used for quick exploratory work. For results going into science publications, be sure to check how your findings depend on bin width, sampling, etc. .. code:: python from halotools.mock_observables import mean_y_vs_x import numpy as np bins = np.logspace(12, 15, 25) result = mean_y_vs_x(gals['halo_mhost'].data, gals['halo_total_stellar_mass'].data, bins = bins, error_estimator = 'variance') host_mass, mean_stellar_mass, mean_stellar_mass_err = result Plot the result ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: python from seaborn import plt plt.errorbar(host_mass, mean_stellar_mass, yerr=mean_stellar_mass_err, fmt = "none", ecolor='gray') plt.plot(host_mass, mean_stellar_mass, 'D', color='k') plt.loglog() plt.xticks(size=18) plt.yticks(size=18) plt.xlabel(r'$M_{\rm halo}/M_{\odot}$', fontsize=25) plt.ylabel(r'$\langle M_{\ast}^{\rm tot}/M_{\odot}\rangle$', fontsize=25) plt.ylim(ymax=5e12) .. image:: output_18_1.png Quiescent fraction of centrals and satellites ---------------------------------------------- In this section we'll perform a very similar calculation to the above, only here we'll compute the average quiescent fraction of centrals and satellites. Calculate :math:`\langle F_{\rm q}^{\rm cen}\vert M_{\rm halo} \rangle` and :math:`\langle F_{\rm q}^{\rm sat} \vert M_{\rm halo}\rangle` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In the above calculation, we needed to create new columns for our galaxy catalog, :math:`M_{\rm host}` and :math:`M_{\ast}^{\rm tot}`. Here we'll reuse the :math:`M_{\rm host}` column, and our model already created a boolean-valued ``quiescent`` column for our galaxies. So no group iteration is necessary; all we need to do is calculate the average trends as a function of halo mass. .. code:: python cens_mask = gals['halo_upid'] == -1 cens = gals[cens_mask] sats = gals[~cens_mask] bins = np.logspace(12, 14.5, 15) # centrals result = mean_y_vs_x(cens['halo_mhost'].data, cens['quiescent'].data, bins = bins) host_mass, fq_cens, fq_cens_err_on_mean = result # satellites result = mean_y_vs_x(sats['halo_mhost'].data, sats['quiescent'].data, bins = bins) host_mass, fq_sats, fq_sats_err_on_mean = result Plot the result and compare it to the underlying analytical relation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: python plt.errorbar(host_mass, fq_cens, yerr=fq_cens_err_on_mean, color='seagreen', fmt = "none") plt.plot(host_mass, fq_cens, 'D', color='seagreen', label = 'galaxy population') analytic_result_mhost_bins = np.logspace(10, 15.5, 100) analytic_result_mean_quiescent_fraction = model.mean_quiescent_fraction(prim_haloprop = analytic_result_mhost_bins) plt.plot(analytic_result_mhost_bins, analytic_result_mean_quiescent_fraction, color='blue', label = 'analytical model') plt.xscale('log') plt.xticks(size=22) plt.yticks(size=18) plt.xlabel(r'$M_{\rm halo}/M_{\odot}$', fontsize=25) plt.ylabel('quiescent fraction', fontsize=20) plt.xlim(xmin = 1e12, xmax = 1e15) plt.ylim(ymin = 0.2, ymax=0.8) plt.legend(frameon=False, loc='best', fontsize=20) plt.title('Central galaxy quenching: model vs. mock', fontsize=17) .. image:: output_23_1.png This tutorial continues with :ref:`galaxy_catalog_analysis_tutorial2`.