# Galaxy Catalog Analysis Example: Galaxy properties as a function of halo mass¶

In this example, we’ll show how to start from a sample of mock galaxies and calculate how various galaxies properties scale with halo mass. In particular, we’ll calculate the average total stellar mass, $$\langle M_{\ast}^{\rm tot}\vert M_{\rm halo}\rangle$$, and also the average quiescent fraction for centrals and satellites, $$\langle F_{\rm cen}^{\rm q}\vert M_{\rm halo}\rangle$$ and $$\langle F_{\rm sat}^{\rm q}\vert M_{\rm halo}\rangle$$.

There is also an IPython Notebook in the following location that can be used as a companion to the material in this section of the tutorial:

halotools/docs/notebooks/galcat_analysis/basic_examples/galaxy_catalog_analysis_tutorial1.ipynb

By following this tutorial together with this notebook, you can play around with your own variations of the calculation as you learn the basic syntax.

## Generate a mock galaxy catalog¶

Let’s start out by generating a mock galaxy catalog into an N-body simulation in the usual way. Here we’ll assume you have the z=0 rockstar halos for the bolshoi simulation, as this is the default halo catalog.

from halotools.empirical_models import PrebuiltSubhaloModelFactory
model = PrebuiltSubhaloModelFactory('smhm_binary_sfr')
from halotools.sim_manager import CachedHaloCatalog
halocat = CachedHaloCatalog(simname = 'bolshoi', redshift = 0, halo_finder = 'rockstar')
model.populate_mock(halocat)


Now suppose the data we are interested in is complete for $$M_{\ast} > 10^{10}M_{\odot},$$ so we will make a cut on the mock. Our mock galaxies are stored in the galaxy_table of model.mock in the form of an Astropy Table.

sample_mask = model.mock.galaxy_table['stellar_mass'] > 1e10


## Calculate total stellar mass $$M_{\ast}^{\rm tot}$$ in each halo¶

To calculate the total stellar mass of galaxies in each halo, we’ll use the halotools.utils.group_member_generator. You can read more about the details of that function in its documentation, here we’ll just demo some basic usage. Briefly, this generator can be used to iterate over your galaxy population on a group-by-group basis, so that you can perform group-wise calculations with each step of the iteration. In this case, at each step of the iteration we’ll sum up the stellar masses of each group’s members.

The halo_hostid is a natural grouping key for a galaxy catalog whose host halos are known. So we’ll sort our galaxy catalog on this column, notify the group_member_generator that we have done so by passing in halo_hostid as the grouping_key, and then request that the generator yield the data stored in the stellar_mass column at each iteration. The we’ll loop over the generator, calculate the sum of the yielded stellar mass column, and broadcast the result to each group member.

from halotools.utils import group_member_generator

gals.sort('halo_hostid')
grouping_key = 'halo_hostid'
requested_columns = ['stellar_mass']
group_gen = group_member_generator(gals, grouping_key, requested_columns)

total_stellar_mass = np.zeros(len(gals))
for first, last, member_props in group_gen:
stellar_mass_of_members = member_props
total_stellar_mass[first:last] = sum(stellar_mass_of_members)

gals['halo_total_stellar_mass'] = total_stellar_mass


Our gals table now has a halo_total_stellar_mass column.

## Calculate host halo mass $$M_{\rm host}$$ of each galaxy¶

Now we’ll do a very similar calculation, but instead broadcasting the host halo mass to each halo’s members. Here we’ll perform a two-property sort. By sorting first on halo_hostid and then on halo_upid, what we accomplish is that galaxies sharing a common halo are grouped together, and then within each grouping the true central galaxy appears first in the sequence. This means that when the group_member_generator yields arrays to us, we can retrieve the property of the central galaxy via the first element in each yielded array. In this next calculation, we’ll exploit that information to broadcast the host halo virial mass to all members of the halo.

gals.sort(['halo_hostid', 'halo_upid'])
grouping_key = 'halo_hostid'
requested_columns = ['halo_mvir']
group_gen = group_member_generator(gals, grouping_key, requested_columns)

host_mass = np.zeros(len(gals))
for first, last, member_props in group_gen:
mvir_members = member_props
mvir_host = mvir_members
host_mass[first:last] = mvir_host

gals['halo_mhost'] = host_mass


Our gals table now has a halo_mhost column.

## Calculate $$\langle M_{\ast}^{\rm tot}\rangle$$ vs. $$M_{\rm halo}$$¶

Now we’ll exploit our previous calculations to compute the mean total stellar mass in bins of halo mass. For this calculation, the mean_y_vs_x provides useful wrapper behavior around scipy.stats.binned_statistic and numpy.histogram. Note that mean_y_vs_x is really just a convenience function used for quick exploratory work. For results going into science publications, be sure to check how your findings depend on bin width, sampling, etc.

from halotools.mock_observables import mean_y_vs_x
import numpy as np

bins = np.logspace(12, 15, 25)
result = mean_y_vs_x(gals['halo_mhost'].data,
gals['halo_total_stellar_mass'].data,
bins = bins,
error_estimator = 'variance')

host_mass, mean_stellar_mass, mean_stellar_mass_err = result


### Plot the result¶

from seaborn import plt

plt.errorbar(host_mass, mean_stellar_mass, yerr=mean_stellar_mass_err,
fmt = "none", ecolor='gray')
plt.plot(host_mass, mean_stellar_mass, 'D', color='k')

plt.loglog()
plt.xticks(size=18)
plt.yticks(size=18)
plt.xlabel(r'$M_{\rm halo}/M_{\odot}$', fontsize=25)
plt.ylabel(r'$\langle M_{\ast}^{\rm tot}/M_{\odot}\rangle$', fontsize=25)
plt.ylim(ymax=5e12) ## Quiescent fraction of centrals and satellites¶

In this section we’ll perform a very similar calculation to the above, only here we’ll compute the average quiescent fraction of centrals and satellites.

### Calculate $$\langle F_{\rm q}^{\rm cen}\vert M_{\rm halo} \rangle$$ and $$\langle F_{\rm q}^{\rm sat} \vert M_{\rm halo}\rangle$$¶

In the above calculation, we needed to create new columns for our galaxy catalog, $$M_{\rm host}$$ and $$M_{\ast}^{\rm tot}$$. Here we’ll reuse the $$M_{\rm host}$$ column, and our model already created a boolean-valued quiescent column for our galaxies. So no group iteration is necessary; all we need to do is calculate the average trends as a function of halo mass.

cens_mask = gals['halo_upid'] == -1

bins = np.logspace(12, 14.5, 15)

# centrals
result = mean_y_vs_x(cens['halo_mhost'].data, cens['quiescent'].data,
bins = bins)
host_mass, fq_cens, fq_cens_err_on_mean = result

# satellites
result = mean_y_vs_x(sats['halo_mhost'].data, sats['quiescent'].data,
bins = bins)
host_mass, fq_sats, fq_sats_err_on_mean = result


### Plot the result and compare it to the underlying analytical relation¶

plt.errorbar(host_mass, fq_cens, yerr=fq_cens_err_on_mean,
color='seagreen', fmt = "none")
plt.plot(host_mass, fq_cens, 'D', color='seagreen',
label = 'galaxy population')

analytic_result_mhost_bins = np.logspace(10, 15.5, 100)
analytic_result_mean_quiescent_fraction = model.mean_quiescent_fraction(prim_haloprop = analytic_result_mhost_bins)
plt.plot(analytic_result_mhost_bins,
analytic_result_mean_quiescent_fraction,
color='blue', label = 'analytical model')

plt.xscale('log')
plt.xticks(size=22)
plt.yticks(size=18)
plt.xlabel(r'$M_{\rm halo}/M_{\odot}$', fontsize=25)
plt.ylabel('quiescent fraction', fontsize=20)
plt.xlim(xmin = 1e12, xmax = 1e15)
plt.ylim(ymin = 0.2, ymax=0.8)
plt.legend(frameon=False, loc='best', fontsize=20)
plt.title('Central galaxy quenching: model vs. mock', fontsize=17) This tutorial continues with Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d.