Example 4: A more complex HOD component model¶
This section of the Tutorial on building an HOD-style model, illustrates a more complex example of a component model that that you have written yourself. What follows is basically just a more full-featured version of the previous Example 3: An HOD-style model with a feature of your own creation that illustrates a few more tricks.
There is also an IPython Notebook in the following location that can be used as a companion to the material in this section of the tutorial:
halotools/docs/notebooks/hod_modeling/hod_modeling_tutorial4.ipynb
By following this tutorial together with this notebook, you can play around with your own variations of the models we’ll build as you learn the basic syntax. The notebook also covers supplementary material that you may find clarifying, so we recommend that you read the notebook side by side with this documentation.
Overview of the Example 4 HOD model¶
The model we’ll build will be based on the zheng07
HOD,
and we will use the baseline_model_instance
feature
described in the Model-building syntax candy: the baseline_model_instance mechanism
section of the documentation.
In addition to the basic zheng07
features, we’ll add
a component model that governs galaxy shape.
Again, our model will not be physically motivated, but we will
introduce some implementation complexity to teach you how to
build models with sophisticated features.
In this simple model, galaxy shape is characterized by two properties: an axis_ratio, and whether or not the galaxy is disrupted. Galaxies living in halos above some critical mass are never disrupted; galaxies living in halos below this mass have a random chance of being disrupted. Disrupted galaxies are assigned a random axis_ratio; non-disrupted galaxies all have axis_ratio = 0.3. Both the critical mass and the random disruption chance are continuously variable parameters of the model. The model applies to both centrals and satellites, for which these parameters are independently specified.
Source code for the new model¶
class Shape(object):
def __init__(self, gal_type, prim_haloprop_key):
self.gal_type = gal_type
self._mock_generation_calling_sequence = (
['assign_disrupted', 'assign_axis_ratio'])
self._galprop_dtypes_to_allocate = np.dtype(
[('axis_ratio', 'f4'), ('disrupted', bool)])
self.list_of_haloprops_needed = ['halo_spin']
self.prim_haloprop_key = prim_haloprop_key
self._methods_to_inherit = (
['assign_disrupted', 'assign_axis_ratio',
'disrupted_fraction_vs_halo_mass'])
self.param_dict = ({
'max_disruption_mass_'+self.gal_type: 1e12,
'disrupted_fraction_'+self.gal_type: 0.25})
def assign_disrupted(self, **kwargs):
if 'table' in kwargs.keys():
table = kwargs['table']
halo_mass = table[self.prim_haloprop_key]
else:
halo_mass = kwargs['prim_haloprop']
disrupted_fraction = self.disrupted_fraction_vs_halo_mass(halo_mass)
randomizer = np.random.uniform(0, 1, len(halo_mass))
is_disrupted = randomizer < disrupted_fraction
if 'table' in kwargs.keys():
table['disrupted'][:] = is_disrupted
else:
return is_disrupted
def assign_axis_ratio(self, **kwargs):
table = kwargs['table']
mask = table['disrupted'] == True
num_disrupted = len(table['disrupted'][mask])
table['axis_ratio'][mask] = np.random.random(num_disrupted)
table['axis_ratio'][~mask] = 0.3
def disrupted_fraction_vs_halo_mass(self, mass):
bool_mask = mass > self.param_dict['max_disruption_mass_'+self.gal_type]
val = self.param_dict['disrupted_fraction_'+self.gal_type]
return np.where(bool_mask == True, 0, val)
You incorporate this new component into a composite model in the same way as before:
cen_shape = Shape('centrals', 'halo_mvir')
sat_shape = Shape('satellites', 'halo_m200b')
from halotools.empirical_models import PrebuiltHodModelFactory, HodModelFactory
zheng_model = PrebuiltHodModelFactory('zheng07')
new_model = HodModelFactory(baseline_model_instance = zheng_model,
centrals_shape = cen_shape, satellites_shape = sat_shape)
The __init__ method of the component model¶
The first four lines in the __init__ method should be familiar from Example 3: An HOD-style model with a feature of your own creation. In this example, there will be two different physics functions called and two different galaxy properties assigned. The assign_disrupted method must be called before the assign_axis_ratio method, because the axis_ratio has an explicit dependence upon disruption designation, as described in Overview of the Example 4 HOD model.
The role of prim_haloprop_key
¶
Next, by allowing you to build a model by passing in prim_haloprop_key,
you open up the possibility of building models based on different
choices for the halo mass definition. This trick is used ubiquitously throughout the
Halotools code base. When the HodModelFactory
detects that a
prim_haloprop_key attribute is present in a component model, the string
bound to that attribute is automatically added to the list_of_haloprops_needed
.
The role of _methods_to_inherit
¶
The next new feature that appears in the __init__ method is the
_methods_to_inherit
attribute. This list controls what methods the composite
model will inherit from the component model. If you do not specify this list
(we did not specify it in the previous example), then the HodModelFactory
will assume that the only methods you want your composite model to inherit
are the methods appearing in _mock_generation_calling_sequence
.
However, our Shape model has an interesting ancillary function
disrupted_fraction_vs_halo_mass that we may wish to study on its own,
even if only to make plots (see the accompanying IPython Notebook for a demonstration).
This is enabled by adding this method name to the
_methods_to_inherit
list. Note that if you do choose to define this list
inside __init__, it is required that every method name appearing in
the _mock_generation_calling_sequence
also appears in _methods_to_inherit
,
or Halotools will raise an exception.
The role of param_dict
¶
As described in the The param_dict mechanism section of the
Composite Model Bookkeeping Mechanisms documentation page,
the param_dict
mechanism allows you to control the behavior
of your model with tunable parameters, as is done, for example,
in an MCMC-type likelihood analysis. By defining our physics functions
to depend on the values stored in the component model param_dict
,
we can modify the behavior of our component model instance by
changing the values stored in this dictionary.
One detail to pay special attention to is how the keys of this dictionary
are defined in this example. Note that each key contains a substring that
is set by the gal_type
. In this example, we do this for a specific reason.
The shape of both centrals and satellites are controlled by instances of the
Shape component model. And the composite model param_dict
is built simply
by concatenating the key:value pairs of each component model param_dict
.
So when the composite model is built, if there is not some way to differentiate
between the parameter names belonging to the shape component of centrals vs.
satellites, then there is no way to independently modify one set of parameters
vs. the other. By defining names for the keys of the component model param_dict
that vary with the gal_type
, as we have done here, this issue is resolved
because the composite model will have four parameters with unambiguous interpretations:
max_disruption_mass_centrals
, max_disruption_mass_satellites
,
disrupted_fraction_centrals
and disrupted_fraction_satellites
.
If we were using entirely separate classes
with entirely dissimilar parameter names
to define shapes of centrals and satellites,
this precaution would be unnecessary.
The “physics functions” of the component model¶
The physics functions in the Shape class differ from the one covered in the
Size class of the previous example in one important respect:
the disrupted column of the galaxy_table
must be assigned before
the assign_axis_ratio method is called in order to get sensible results.
This is guaranteed by proper use of the _mock_generation_calling_sequence
,
as described above.
There is another, less important difference to notice: the
assign_disrupted method accepts either a table
keyword argument
or a prim_haloprop
argument. As discussed in the previous example,
it is compulsory for any physics function to accept a table
keyword argument,
and for the appropriate columns of the input table
to be overwritten as necessary.
That is because the HodMockFactory
always passes a galaxy_table
to each physics
function via the table
keyword argument. However, nothing stops you from
adding functionality to your physics functions so that they will behave differently
in different circumstances. In this example, your composite model will have the freedom
to pass in an array of halo masses and the assign_disrupted method will
return the array of values it would have assigned had the method been called during
mock generation. This functionality can be useful for making plots, tracking down bugs,
or just generally studying the behavior of your component model.
All Halotools-provided component models support passing in arrays of the relevant
quantities for exactly this purpose. In Halotools, the convention is that these arrays
are passed in via the prim_haloprop
keyword, though you need not follow
this convention if you find some other syntax more convenient or intuitive.
This tutorial continues with Example 5: An HOD model with cross-component dependencies.