Example 4: A more complex HOD component model

This section of the Tutorial on building an HOD-style model, illustrates a more complex example of a component model that that you have written yourself. What follows is basically just a more full-featured version of the previous Example 3: An HOD-style model with a feature of your own creation that illustrates a few more tricks.

There is also an IPython Notebook in the following location that can be used as a companion to the material in this section of the tutorial:

halotools/docs/notebooks/hod_modeling/hod_modeling_tutorial4.ipynb

By following this tutorial together with this notebook, you can play around with your own variations of the models we’ll build as you learn the basic syntax. The notebook also covers supplementary material that you may find clarifying, so we recommend that you read the notebook side by side with this documentation.

Overview of the Example 4 HOD model

The model we’ll build will be based on the zheng07 HOD, and we will use the baseline_model_instance feature described in the Model-building syntax candy: the baseline_model_instance mechanism section of the documentation. In addition to the basic zheng07 features, we’ll add a component model that governs galaxy shape. Again, our model will not be physically motivated, but we will introduce some implementation complexity to teach you how to build models with sophisticated features.

In this simple model, galaxy shape is characterized by two properties: an axis_ratio, and whether or not the galaxy is disrupted. Galaxies living in halos above some critical mass are never disrupted; galaxies living in halos below this mass have a random chance of being disrupted. Disrupted galaxies are assigned a random axis_ratio; non-disrupted galaxies all have axis_ratio = 0.3. Both the critical mass and the random disruption chance are continuously variable parameters of the model. The model applies to both centrals and satellites, for which these parameters are independently specified.

Source code for the new model

class Shape(object):

    def __init__(self, gal_type, prim_haloprop_key):

        self.gal_type = gal_type
        self._mock_generation_calling_sequence = (
            ['assign_disrupted', 'assign_axis_ratio'])
        self._galprop_dtypes_to_allocate = np.dtype(
            [('axis_ratio', 'f4'), ('disrupted', bool)])
        self.list_of_haloprops_needed = ['halo_spin']

        self.prim_haloprop_key = prim_haloprop_key
        self._methods_to_inherit = (
            ['assign_disrupted', 'assign_axis_ratio',
            'disrupted_fraction_vs_halo_mass'])
        self.param_dict = ({
            'max_disruption_mass_'+self.gal_type: 1e12,
            'disrupted_fraction_'+self.gal_type: 0.25})

    def assign_disrupted(self, **kwargs):
        if 'table' in kwargs.keys():
            table = kwargs['table']
            halo_mass = table[self.prim_haloprop_key]
        else:
            halo_mass = kwargs['prim_haloprop']

        disrupted_fraction = self.disrupted_fraction_vs_halo_mass(halo_mass)
        randomizer = np.random.uniform(0, 1, len(halo_mass))
        is_disrupted = randomizer < disrupted_fraction

        if 'table' in kwargs.keys():
            table['disrupted'][:] = is_disrupted
        else:
            return is_disrupted

    def assign_axis_ratio(self, **kwargs):
        table = kwargs['table']
        mask = table['disrupted'] == True
        num_disrupted = len(table['disrupted'][mask])
        table['axis_ratio'][mask] = np.random.random(num_disrupted)
        table['axis_ratio'][~mask] = 0.3

    def disrupted_fraction_vs_halo_mass(self, mass):
        bool_mask = mass > self.param_dict['max_disruption_mass_'+self.gal_type]
        val = self.param_dict['disrupted_fraction_'+self.gal_type]
        return np.where(bool_mask == True, 0, val)

You incorporate this new component into a composite model in the same way as before:

cen_shape = Shape('centrals', 'halo_mvir')
sat_shape = Shape('satellites', 'halo_m200b')
from halotools.empirical_models import PrebuiltHodModelFactory, HodModelFactory
zheng_model = PrebuiltHodModelFactory('zheng07')
new_model = HodModelFactory(baseline_model_instance = zheng_model,
    centrals_shape = cen_shape, satellites_shape = sat_shape)

The __init__ method of the component model

The first four lines in the __init__ method should be familiar from Example 3: An HOD-style model with a feature of your own creation. In this example, there will be two different physics functions called and two different galaxy properties assigned. The assign_disrupted method must be called before the assign_axis_ratio method, because the axis_ratio has an explicit dependence upon disruption designation, as described in Overview of the Example 4 HOD model.

The role of prim_haloprop_key

Next, by allowing you to build a model by passing in prim_haloprop_key, you open up the possibility of building models based on different choices for the halo mass definition. This trick is used ubiquitously throughout the Halotools code base. When the HodModelFactory detects that a prim_haloprop_key attribute is present in a component model, the string bound to that attribute is automatically added to the list_of_haloprops_needed.

The role of _methods_to_inherit

The next new feature that appears in the __init__ method is the _methods_to_inherit attribute. This list controls what methods the composite model will inherit from the component model. If you do not specify this list (we did not specify it in the previous example), then the HodModelFactory will assume that the only methods you want your composite model to inherit are the methods appearing in _mock_generation_calling_sequence. However, our Shape model has an interesting ancillary function disrupted_fraction_vs_halo_mass that we may wish to study on its own, even if only to make plots (see the accompanying IPython Notebook for a demonstration). This is enabled by adding this method name to the _methods_to_inherit list. Note that if you do choose to define this list inside __init__, it is required that every method name appearing in the _mock_generation_calling_sequence also appears in _methods_to_inherit, or Halotools will raise an exception.

The role of param_dict

As described in the The param_dict mechanism section of the Composite Model Bookkeeping Mechanisms documentation page, the param_dict mechanism allows you to control the behavior of your model with tunable parameters, as is done, for example, in an MCMC-type likelihood analysis. By defining our physics functions to depend on the values stored in the component model param_dict, we can modify the behavior of our component model instance by changing the values stored in this dictionary.

One detail to pay special attention to is how the keys of this dictionary are defined in this example. Note that each key contains a substring that is set by the gal_type. In this example, we do this for a specific reason. The shape of both centrals and satellites are controlled by instances of the Shape component model. And the composite model param_dict is built simply by concatenating the key:value pairs of each component model param_dict. So when the composite model is built, if there is not some way to differentiate between the parameter names belonging to the shape component of centrals vs. satellites, then there is no way to independently modify one set of parameters vs. the other. By defining names for the keys of the component model param_dict that vary with the gal_type, as we have done here, this issue is resolved because the composite model will have four parameters with unambiguous interpretations: max_disruption_mass_centrals, max_disruption_mass_satellites, disrupted_fraction_centrals and disrupted_fraction_satellites. If we were using entirely separate classes with entirely dissimilar parameter names to define shapes of centrals and satellites, this precaution would be unnecessary.

The “physics functions” of the component model

The physics functions in the Shape class differ from the one covered in the Size class of the previous example in one important respect: the disrupted column of the galaxy_table must be assigned before the assign_axis_ratio method is called in order to get sensible results. This is guaranteed by proper use of the _mock_generation_calling_sequence, as described above.

There is another, less important difference to notice: the assign_disrupted method accepts either a table keyword argument or a prim_haloprop argument. As discussed in the previous example, it is compulsory for any physics function to accept a table keyword argument, and for the appropriate columns of the input table to be overwritten as necessary. That is because the HodMockFactory always passes a galaxy_table to each physics function via the table keyword argument. However, nothing stops you from adding functionality to your physics functions so that they will behave differently in different circumstances. In this example, your composite model will have the freedom to pass in an array of halo masses and the assign_disrupted method will return the array of values it would have assigned had the method been called during mock generation. This functionality can be useful for making plots, tracking down bugs, or just generally studying the behavior of your component model. All Halotools-provided component models support passing in arrays of the relevant quantities for exactly this purpose. In Halotools, the convention is that these arrays are passed in via the prim_haloprop keyword, though you need not follow this convention if you find some other syntax more convenient or intuitive.

This tutorial continues with Example 5: An HOD model with cross-component dependencies.