Writing your own HOD occupation component

As discussed in Example 1: Building a simple HOD-style model, the component model governing the abundance of galaxies in a halo is a required component in all HOD-style models. This section of the documentation describes how you can write your own component model governing HOD-style occupation statistics in a way that will interface properly with the HodMockFactory.

Preliminary comments

Unlike writing component models for other kinds of features of the galaxy population, when writing an occupation component you must subclass from the OccupationComponent class. The reason for this requirement is simple. The occupation component is plays a special role in the Halotools implementation of HOD-style mock making because the result returned by this component determine how much memory should be allocated for the galaxy population being modeled. This is unlike typical component model features, which simply add new columns to an existing galaxy_table whose length has already been determined. Thus there are a few special methods and attributes that the HodMockFactory assumes will be present of any occupation component.

The way Halotools enforces the presence of these special methods and attributes is as follows. When building an HOD-style model, for every gal_type in the composite model you are required to pass in a keyword of the form gal_type_occupation. Any time the HodModelFactory detects a keyword that concludes in _occupation, the model factory requires that the object bound to this keyword be an instance of a sub-class of OccupationComponent. If you are able to write your occupation component model and successfully instantiate it, then the internals of the OccupationComponent guarantee that you have successfully met the necessary requirements.

In the next section, we’ll dive into the source code of a minimal example of source code for a working occupation component model and dissect each necessary ingredient one by one.

Source code for a minimal example of an occupation component model

from halotools.empirical_models import OccupationComponent

class MyCentralOccupation(OccupationComponent):

        def __init__(self, gal_type, threshold):

                        gal_type = gal_type,
                        threshold = threshold,
                        upper_occupation_bound = 1)

        def mean_occupation(self, **kwargs):
                table = kwargs['table']
                return np.zeros(len(table)) + 0.1

Unpacking the source code

The MyCentralOccupation class does just three things:

  1. Sub-classes from OccupationComponent

  2. Calls __init__ of the OccupationComponent parent class from within its own constructor.

  3. Defines a mean_occupation method.

So long as your occupation component model does these three things, then your model will work with the HodMockFactory.

Defining the mean_occupation method

As described in the The “physics function” of your component model section of the Example 3: An HOD-style model with a feature of your own creation documentation page, your mean_occupation method must accept a table keyword argument. The value it returns should be an array of the same length as the input table, and it should be bounded by the upper_occupation_bound (more about this later in this page). So long as your mean_occupation function conforms to these requirements, you have written an acceptable function.

Defining the __init__ constructor

You are free to write your own __init__ function however you like, so long as you call the __init__ function of the OccupationComponent class somewhere within it. The OccupationComponent constructor has the following required keyword arguments: gal_type, threshold and upper_occupation_bound. The gal_type can be any string, though this should be chosen to match the name of the population you are modeling in the composite model. The threshold argument refers to the minimum value of the primary galaxy property, e.g., -20 if the galaxy sample were selected to be brighter than some limiting r-band absolute magnitude, or 10.5 for \({\rm log}_{10}M_{\ast}\). The value you choose for threshold need not necessarily have any impact on the behavior of your model, it just needs to be present.

Setting the value of the upper_occupation_bound

The value you choose for upper_occupation_bound has important significance for the behavior of your model. When defining an occupation component, the actual function called during the mock population sequence is mc_occupation, which is defined in the OccupationComponent super-class. If you set upper_occupation_bound to 1, then the distribution of galaxy abundances in your model will be assumed to be a nearest-integer distribution, as is the case for a central galaxy population. If you set upper_occupation_bound to np.inf, then your distribution will be assumed to be a Poisson, as is the case for typical satellite galaxy occupation models. If you choose some other value, then you must provide your own implementation of the mc_occupation method that over-rides the method in the super-class.

Source code for an occupation component model with a custom mc_occupation method

In case you wish to study occupation statistics that deviate from nearest-integer or Poisson statistics, the example below shows how to model a satellite population with a physically-silly-but-distinct mc_occupation method in which a halo has either 0 or 5 satellites. By matching this pattern you can write our own occupation component and have total control over the occupation statistics in your model.

from halotools.empirical_models import OccupationComponent

class MySatelliteOccupation(OccupationComponent):

        def __init__(self, threshold):

                        gal_type = 'satellites',
                        threshold = threshold,
                        upper_occupation_bound = 5)

        def mean_occupation(self, **kwargs):
                table = kwargs['table']
                return np.zeros(len(table)) + 2.5

        def mc_occupation(self, **kwargs):
                table = kwargs['table']
                meanocc = self.mean_occupation(**kwargs)
                result = np.where(meanocc < 2.5, 0, 5)
                table['halo_num_satellites'] = result
                return result