Tutorial on Modeling HOD Satellites Using Subhalo Positions

This tutorial gives a detailed explanation of the SubhaloPhaseSpace class. This class can be used to model the distribution of HOD satellites within their halos by placing the satellites onto the locations of subhalos.

In any implementation of the HOD, there is no direct connection between the number of dark matter subhalos in the halo and the number of satellite galaxies in the halo. Instead, the number of satellites is determined by some parameterized analytical model for \(P(N_{\rm sat} | x_{\rm halo}, y_{\rm halo}, \dots, z_{\rm halo}).\) The distribution \(P(N_{\rm sat})\) is commonly assumed to be a Poisson distribution with a first moment determined by halo mass alone, \(\langle N_{\rm sat} | M_{\rm vir} \rangle.\)

Similarly, the spatial position and velocities of satellites within their halos must also be specified by the HOD. The two most common choices for the intra-halo distributions are

  1. An NFW profile with concentration either determined by
    • some analytical mean relation \(\bar{c}(M_{\rm vir})\)

    • the actual concentration of the simulated halo.

  2. A randomly selected dark matter particle within the simulated halo

The SubhaloPhaseSpace class in Halotools allows a third option: in this class, satellites reside at the centers of subhalos, an empirical modeling feature typically limited to abundance matching models. This tutorial covers the many options associated with this class.

General comment about using subhalos for satellite positions

As shown in Nagai and Kravtsov (2005) and subsequent papers, subhalos selected according to different subhalo properties exhibit different radial distributions. Thus, depending on which subhalos you choose to place your satellites in, you will get different radial distributions of satellites, and also different velocity distributions. Most of the freedom built into the SubhaloPhaseSpace class is designed around allowing you to explore different choices for what we will loosely refer to as the “subhalo selection function”.

Choosing which subhalos are viable options for satellites

Most Halotools-provided halo catalogs come with subhalos in addition to host halos (see Rockstar halo and subhalo nomenclature conventions for details). All the Halotools catalogs with version_name = halotools_v0p4 were made using a cut on \(M_{\rm peak}\) at 300 particles: any subhalo which never had greater than 300 particles in its entire history is thrown out of the catalog. You can always place additional cuts by manipulating the halo_table of a halo catalog prior to populating a mock. When using the SubhaloPhaseSpace class, you should be aware of any cut that was placed on the halo catalog you are using to populate satellites.

When throwing out subhalos based on any criteria, you change the subhalo selection function and influence the profiles of your satellite galaxies. If your science target is sensitive to the phase space occupied by satellite galaxies, it is generally a good idea to test whether your results are sensitive to any cuts that may have been placed on the subhalo catalog you used.

Choosing which subhalos are preferentially populated

In any HOD model, the number of satellites in a given halo is determined by a Monte Carlo realization of \(P(N_{\rm sat} | {\rm halo}),\) which has no necessary connection to the number of subhalos in the halo. Thus for models using SubhaloPhaseSpace, it will be possible for host halos to have empty subhalos. The SubhaloPhaseSpace class gives you freedom to choose which subhalos in each halo are populated first. This is accomplished in a pre-processing phase of the mock population. Within each host, you have the option to sort subhalos according to one of their properties; once \(N_{\rm sat}\) is determined by the halo occupation model, the first \(N_{\rm sat}\) subhalos in the list will be selected to host the satellites.

The order in which subhalos are selected is determined by a combination of the intra_halo_sorting_key and reverse_intra_halo_order keyword arguments to the SubhaloPhaseSpace class. The default option is to have intra_halo_sorting_key = halo_mpeak with reverse_intra_halo_order = True; for this choice, subhalos with the largest peak masses will be preferentially selected to host satellite galaxies. If the reverse_intra_halo_order were instead False, then subhalos with the smallest peak masses will be selected first. The intra_halo_sorting_key can be used with any column appearing in the halo catalog.

Choosing how to deal with not having enough subhalos

Because \(N_{\rm sat}\) and \(N_{\rm sub}\) are disconnected, it is possible that in some halos, your HOD model will require more satellite galaxies than the number of subhalos in the halo. For halo catalogs with Bolshoi-like resolution and galaxy samples with SDSS-like number densities, this does not happen often, roughly 1-2% of the time. To deal with this situation, the SubhaloPhaseSpace class randomly selects a subhalo in some other host of a similar mass, and uses the host-centric distance of that subhalo for the host-centric distance of the satellite.

In a little more detail, during the pre-processing phase of mock population, host halos are initially binned according to the binning_key column of the halo catalog; by default, this binning is done on the halo_mvir_host_halo property. The bins themselves are defined by the host_haloprop_bins argument. Care must be taken by the user to ensure that 1. no bins are empty, 2. the bins encompass the range of the property spanned by the host halos in the catalog, and 3. the bins are sufficiently narrow so that the true host mass-dependence of the radial profiles is captured. Halotools will raise an exception if an unacceptable choice is made.

Inheriting additional subhalo properties besides position and velocity

For each satellite galaxy, once its parent subhalo is selected, the position and velocity of that subhalo are assigned to the satellite. As an additional modeling feature, you may choose to inherit additional subhalo properties besides just the phase space coordinates. For example, you may wish to also inherit the subhalo formation time to implement an age matching-like model, or the subhalo spin for morphology modeling.

The additional properties you choose to inherit are specified by the inherited_subhalo_props_dict argument. Each key of the inherited_subhalo_props_dict dictionary gives the name of a column in the subhalo_table that you wish to inherit. The value bound to each key is a tuple of two strings. The first string specifies the name you would like to give the inherited property in the galaxy_table. The second string specifies the data type of the column, e.g., ‘f4’ or ‘i8’.

This is most easily understood by looking at a specific example. The default dictionary is as follows:

default_inherited_subhalo_props_dict = (
    {'halo_id': ('halo_id', 'i8'),
    'halo_x': ('x', 'f8'),
    'halo_y': ('y', 'f8'),
    'halo_z': ('z', 'f8'),
    'halo_vx': ('vx', 'f8'),
    'halo_vy': ('vy', 'f8'),
    'halo_vz': ('vz', 'f8'),
    'halo_mpeak': ('halo_mpeak', 'f8')})

You can import this dictionary directly from the empirical_models sub-package:

>>> from halotools.empirical_models import default_inherited_subhalo_props_dict

This way, you can supplement default_inherited_subhalo_props_dict with whatever additional properties you are interested in, and pass the result to inherited_subhalo_props_dict. For example, if you wanted to inherit the spin of your subhalos:

>>> from halotools.empirical_models import default_inherited_subhalo_props_dict
>>> default_inherited_subhalo_props_dict['halo_spin'] = ('halo_spin', 'f8')

A worked example

Let’s suppose we’d like to start with the leauthaud11 HOD model as our baseline; this model assumes satellites are distributed according to an NFW profile. What we’ll do next is show how to build alternative models that are identical in every respect to leauthaud11, except that satellites are placed on subhalos instead of tracing an NFW profile.

from halotools.empirical_models import PrebuiltHodModelFactory
orig_model = PrebuiltHodModelFactory('leauthaud11')

from halotools.empirical_models import SubhaloPhaseSpace
alt_profile_model = SubhaloPhaseSpace('satellites', np.logspace(10.5, 15.2, 15))

from halotools.empirical_models import HodModelFactory

new_model_dictionary = orig_model.model_dictionary
new_model_dictionary['satellites_profile'] = alt_profile_model
new_model = HodModelFactory(**new_model_dictionary)

The new_model can be used to populate mock catalogs just like any Halotools composite model:

from halotools.sim_manager import CachedHaloCatalog
halocat = CachedHaloCatalog(simname='bolplanck')
new_model.populate_mock(halocat, seed=43)
orig_model.populate_mock(halocat, seed=43)

Since we called populate_mock by passing the seed keyword, all Monte Carlo functions have deterministic behavior and so the occupation statistics of these two realizations are identical:

print(len(new_model.mock.galaxy_table), len(orig_model.mock.galaxy_table))
(49098, 49098)

In addition to the usual halo_table column names, when using SubhaloPhaseSpace there is an an additional column called real_subhalo. For satellites that sit on a subhalo that is within their actual host halo, this column is True. Otherwise, this column is False (n.b., a peculiarity of the implementation is that the real_subhalo column is also False for centrals). For reference, in this particular realization, only ~1.3% of satellites have this column equal to True.