Composite Model Bookkeeping Mechanisms
The param_dict mechanism
The component model classes in Halotools determine the functional form of the aspect of the galaxy-halo connection being modeled. In many cases, though, models have a handful of parameters that let you tune the behavior of these functions. For example, the Zheng07Cens class dictates that the average number of central galaxies as a function of halo mass, \(\langle N_{\rm cen}\vert M_{\rm halo}\rangle\), is governed by an erf function. The speed of the erf function's transition between 0 and 1 can be varied by changing the \(\sigma_{\log M}\) parameter.
In all such cases, parameters such as \(\sigma_{\log M}\) are elements of param_dict, a python dictionary bound to the component model instance. Changing the values bound to the parameters in the param_dict changes the behavior of the model.
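Here is a minimal sketch of the idea using plain Python. It is not the Zheng07Cens source. ToyCens is a hypothetical class, and the parameter names logMmin and sigma_logM and their defaults are illustrative assumptions. Editing a value in param_dict changes what the erf-shaped mean occupation returns.

```python
import math


class ToyCens:
    """Hypothetical central-occupation component (not Zheng07Cens itself)."""

    def __init__(self):
        # Parameters that tune the functional form live in param_dict.
        # Names and default values here are illustrative assumptions.
        self.param_dict = {"logMmin": 12.0, "sigma_logM": 0.25}

    def mean_occupation(self, halo_mass):
        """<N_cen | M_halo> = 0.5 * (1 + erf((log10 M - logMmin) / sigma_logM))."""
        logM = math.log10(halo_mass)
        x = (logM - self.param_dict["logMmin"]) / self.param_dict["sigma_logM"]
        return 0.5 * (1.0 + math.erf(x))


cens = ToyCens()
sharp = cens.mean_occupation(10**12.1)
cens.param_dict["sigma_logM"] = 1.0  # widen the erf transition
broad = cens.mean_occupation(10**12.1)
```

A larger sigma_logM flattens the transition, so the same halo mass slightly above logMmin gets a mean occupation closer to 0.5.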
Propagating param_dict from component to composite
While creating a composite model from a set of component models, the factory classes SubhaloModelFactory and HodModelFactory collect every parameter that appears in each component model param_dict. They then create a new composite param_dict that is bound to the composite model instance. Composite model methods are written so that changing the behavior of the composite model only requires changing the values of the parameters in the param_dict bound to the composite model. Those changes then propagate down to the component model that defines the behavior.
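The collection step can be sketched with plain Python objects. This is a minimal sketch, not the factory source. ToyComponent and collect_param_dict are hypothetical names, and in this sketch a later component silently wins when two components share a key.

```python
class ToyComponent:
    """Hypothetical component model carrying its own param_dict."""

    def __init__(self, **params):
        self.param_dict = dict(params)


def collect_param_dict(model_dictionary):
    """Return a new dict holding every parameter of every component."""
    composite_param_dict = {}
    for component in model_dictionary.values():
        # If two components share a key, the later one overwrites the earlier one;
        # repeated keys are exactly what the repeated-parameter warning guards against.
        composite_param_dict.update(component.param_dict)
    return composite_param_dict


model_dictionary = {
    "centrals_occupation": ToyComponent(logMmin=12.0, sigma_logM=0.25),
    "satellites_occupation": ToyComponent(logM1=13.3, alpha=1.0),
}
composite_param_dict = collect_param_dict(model_dictionary)
```

The composite dictionary is a new object, which is why a separate mechanism is needed to push edits back down to the components.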
In most cases this propagation process is unambiguous and is accomplished by the mechanism described in The update_param_dict_decorator mechanism. However, if two or more component models have a parameter with exactly the same name, then care is required.
As an example, consider the composite model dictionary built by the leauthaud11_model_dictionary function. In this composite model, there are two populations of galaxies, centrals and satellites, whose occupation statistics are governed by Leauthaud11Cens and Leauthaud11Sats, respectively. Both of these classes derive much of their behavior from the underlying stellar-to-halo-mass relation of Behroozi et al. (2010). As a result, all the parameters in the param_dict of Behroozi10SmHm appear in both the param_dict of Leauthaud11Cens and the param_dict of Leauthaud11Sats.
In this example, the repeated appearance of the stellar-to-halo-mass parameters is harmless, because these really are the same parameters that just happen to appear twice. However, Halotools users are free to define their own model components and combine any collection of components. The same name could therefore be given inadvertently to parameters in different components that control entirely distinct behavior. In such a case, when that parameter is modified in the composite model param_dict, it is ambiguous how to propagate the change down into the appropriate component model.
Suppressing harmless warnings
To protect against this ambiguity, a warning is issued whenever a repeated parameter is detected while building the composite model param_dict. It is up to the user to decide whether the repetition is harmless or whether one of the component model parameter names needs to be changed to disambiguate. Such a warning can be annoying for commonly-used models in which the repetition is harmless. You can suppress it by adding a _suppress_repeated_param_warning attribute to any of the components in the composite model and setting this attribute to True. You can see this mechanism at work in the source code of the leauthaud11_model_dictionary function.
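Below is a hedged sketch of repeated-name detection with an opt-out flag, using the standard warnings module. ToySmHmComponent and build_composite_param_dict are hypothetical, and the parameter name smhm_m1 is illustrative. Only the attribute name _suppress_repeated_param_warning comes from the text above.

```python
import warnings


class ToySmHmComponent:
    """Hypothetical component sharing a stellar-to-halo-mass parameter."""

    def __init__(self, suppress=False):
        self.param_dict = {"smhm_m1": 12.35}  # illustrative name and value
        if suppress:
            self._suppress_repeated_param_warning = True


def build_composite_param_dict(model_dictionary):
    """Merge component param_dicts, warning on repeated names unless suppressed."""
    suppress = any(
        getattr(component, "_suppress_repeated_param_warning", False)
        for component in model_dictionary.values()
    )
    composite_param_dict = {}
    for component_name, component in model_dictionary.items():
        for key, value in component.param_dict.items():
            if key in composite_param_dict and not suppress:
                warnings.warn(
                    f"Parameter {key!r} in {component_name!r} repeats a name "
                    "from another component; check that this is harmless."
                )
            composite_param_dict[key] = value
    return composite_param_dict
```

Passing two ToySmHmComponent() instances triggers one warning. Constructing either one with suppress=True silences it, mirroring the "any of the components" rule described above.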
The new_haloprop_func_dict mechanism
The basic job of any component model is to provide some mapping between a halo catalog and some property (or set of properties) of the galaxies that live in those halos. There are many cases where the underlying halo property that is the independent variable in the mapping is not a pre-existing column in your halo catalog. For example, assigning positions to satellite galaxies may depend on an analytical model for the NFW concentration-mass relation. If this particular definition of the concentration is not already in the halo catalog, this quantity would need to be computed for every halo before the satellite positions could be assigned.
There are several possible solutions to this problem. First, the composite model could simply compute the desired halo property on-the-fly as part of the galaxy property assignment. However, if the calculation is expensive then this needlessly adds runtime to mock-population with any composite model that uses this component. Second, you could always add the desired column to the halo table and then over-write the existing halo catalog data file with an updated one that includes the new column. However, for the sake of reproducibility it is best to minimize the number of times a halo catalog is over-written, as keeping track of each over-write quickly becomes a headache and mistakes in that bookkeeping can lead to hidden buggy behavior.
The new_haloprop_func_dict mechanism helps address this problem. When model factories build composite models, each component model is examined for the possible presence of a new_haloprop_func_dict attribute, to which a python dictionary must be bound. The keys of this dictionary serve as the names of the new columns that will be added to the halo catalog in a pre-processing phase of the mock-population algorithm. The values bound to these keys are python function objects. Each function object must accept a length-Nhalos Astropy Table or Numpy structured array as input and must return a length-Nhalos array as output. The returned array will be the data in the newly created column of the halo catalog.
To take advantage of this mechanism in your component model, you only need to create a new_haloprop_func_dict attribute somewhere in the __init__ constructor of your component model and make sure that the dictionary bound to this attribute conforms to the above specifications. After doing this, you can safely assume that the halo catalog column needed by your component model will be present in any halo catalog used to populate mock galaxies with a composite model that uses your component.
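The following is a minimal sketch of both sides of this contract: a component declaring new_haloprop_func_dict, and a toy pre-processing loop that adds the requested columns. A dict of lists stands in for the Astropy Table. The column name toy_conc, the power-law form in toy_concentration, and preprocess_halo_table are all hypothetical. The relation is invented for illustration and is not a published concentration-mass model.

```python
def toy_concentration(table):
    """Hypothetical concentration-mass relation, for illustration only.

    Accepts a table-like object supporting table["halo_mvir"] and returns
    one value per halo, as the new_haloprop_func_dict contract requires.
    """
    return [10.0 * (m / 1e12) ** -0.1 for m in table["halo_mvir"]]


class ToySatsProfile:
    """Hypothetical component that needs a column the catalog lacks."""

    def __init__(self):
        # Keys name the new halo-catalog columns; values compute them.
        self.new_haloprop_func_dict = {"toy_conc": toy_concentration}


def preprocess_halo_table(halo_table, components):
    """Sketch of the pre-processing phase: add every requested column."""
    for component in components:
        func_dict = getattr(component, "new_haloprop_func_dict", {})
        for colname, func in func_dict.items():
            halo_table[colname] = func(halo_table)
    return halo_table
```

Because the column is computed in memory during pre-processing, the halo catalog file on disk never needs to be overwritten.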
The galprop_dtypes_to_allocate mechanism
Whenever a component model is used during mock population, the mock factory passes a table keyword argument to the methods of the component. It is important that the table passed to these methods has the columns they assume.
Every component model assigns some property or set of properties to the mock population of galaxies. During mock population, the synthetic galaxy population is stored in the galaxy_table bound to the mock object. The galaxy_table is an Astropy Table object with columns storing every galaxy property assigned by the composite model. The _galprop_dtypes_to_allocate mechanism is responsible for creating the necessary columns of the galaxy_table and making sure they are appropriately formatted.
If you are writing your own model component of any kind, the model factories require that instances of your model have a _galprop_dtypes_to_allocate attribute. You can meet this requirement by assigning any numpy.dtype object to the _galprop_dtypes_to_allocate attribute in the __init__ constructor of your component model, even if the dtype is empty. See Tutorial on designing your own model of the galaxy-halo connection for many examples.
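A hedged sketch of how per-component dtypes could be merged and allocated. A NumPy structured array stands in for the Astropy Table galaxy_table, and the component classes, the column name halo_num_centrals, and allocate_galaxy_table are hypothetical. It shows that a component with an empty dtype still satisfies the requirement while contributing no columns.

```python
import numpy as np


class ToyOccupation:
    def __init__(self):
        # Hypothetical column this component would fill in the galaxy_table.
        self._galprop_dtypes_to_allocate = np.dtype([("halo_num_centrals", "i4")])


class ToyPositions:
    def __init__(self):
        self._galprop_dtypes_to_allocate = np.dtype([("x", "f8"), ("y", "f8"), ("z", "f8")])


class ToyNoNewColumns:
    def __init__(self):
        # Allocates nothing, but still defines the (empty) attribute.
        self._galprop_dtypes_to_allocate = np.dtype([])


def allocate_galaxy_table(components, ngals):
    """Merge every component's dtype and allocate zero-filled columns."""
    fields = {}
    for component in components:
        dt = component._galprop_dtypes_to_allocate
        for name in dt.names or ():  # an empty dtype contributes no fields
            fields[name] = dt.fields[name][0]
    return np.zeros(ngals, dtype=np.dtype(list(fields.items())))


galaxy_table = allocate_galaxy_table(
    [ToyOccupation(), ToyPositions(), ToyNoNewColumns()], ngals=5
)
```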
The model_feature_calling_sequence mechanism
When the mock factories create a synthetic galaxy population, a sequence of methods of the composite model is called in the order determined by the _mock_generation_calling_sequence list attribute bound to the composite model. For subhalo-based models, this list is determined by SubhaloModelFactory.set_calling_sequence. For HOD-style models, it is determined by HodModelFactory.set_calling_sequence.
As described in The mock_generation_calling_sequence mechanism, each component model also has a _mock_generation_calling_sequence attribute. The composite model sequence is built up as a succession of the component model sequences. The order of component models in this succession is determined by the _model_feature_calling_sequence attribute, which is set by the build_model_feature_calling_sequence factory method. Thus the composite model _mock_generation_calling_sequence is determined according to the following schematic:
composite_model._mock_generation_calling_sequence = []
for component_model_name in composite_model._model_feature_calling_sequence:
    component_model = composite_model.model_dictionary[component_model_name]
    for method_name in component_model._mock_generation_calling_sequence:
        composite_model._mock_generation_calling_sequence.append(method_name)
Thus each component model's methods are always called consecutively. The order in which the component models are called upon is determined by the _model_feature_calling_sequence attribute. You can specify this sequence explicitly through the model_feature_calling_sequence keyword argument passed to the factory constructor. This is useful when the model for one galaxy property explicitly depends on another galaxy property defined in an independent model component. If the model_feature_calling_sequence keyword is not passed, you should assume that the order in which the component models are called is random.
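The schematic above can be turned into a small runnable sketch with toy stand-ins for the component models. ToyFeature, build_composite_calling_sequence, the feature names, and the quiescence method names are hypothetical. The ordering list plays the role of an explicitly passed model_feature_calling_sequence.

```python
class ToyFeature:
    """Hypothetical component declaring its own method ordering."""

    def __init__(self, *method_names):
        self._mock_generation_calling_sequence = list(method_names)


def build_composite_calling_sequence(model_dictionary, model_feature_calling_sequence):
    """Concatenate component sequences in the order given by the feature sequence."""
    sequence = []
    for feature_name in model_feature_calling_sequence:
        component = model_dictionary[feature_name]
        # Each component's methods land consecutively in the composite sequence.
        sequence.extend(component._mock_generation_calling_sequence)
    return sequence


model_dictionary = {
    "centrals_occupation": ToyFeature("mc_occupation"),
    "centrals_profile": ToyFeature("assign_phase_space"),
    "quiescence": ToyFeature("mean_quiescent_fraction", "mc_quiescent"),
}
# Hypothetical explicit ordering: quiescence is modeled before positions.
order = ["centrals_occupation", "quiescence", "centrals_profile"]
sequence = build_composite_calling_sequence(model_dictionary, order)
```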
The mock_generation_calling_sequence mechanism
Each component model has a _mock_generation_calling_sequence attribute storing a list of strings. Each string is the name of a method bound to the component model instance. The order in which these names appear determines the order in which the methods will be called during mock population. This mechanism works together with The model_feature_calling_sequence mechanism to determine the entire sequence of functions that are called when populating a mock.
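A minimal sketch of driving a single component through its declared sequence, using getattr and passing only a table keyword. ToyQuiescence, its methods, the fq and quiescent columns, and run_calling_sequence are hypothetical, and a dict of lists stands in for the galaxy_table. The second method reads a column written by the first, which is why the declared order matters.

```python
import random


class ToyQuiescence:
    """Hypothetical component whose methods must run in a declared order."""

    def __init__(self):
        self._mock_generation_calling_sequence = ["mean_quiescent_fraction", "mc_quiescent"]

    def mean_quiescent_fraction(self, table):
        # Illustrative rule (assumption): more massive halos are more often quiescent.
        table["fq"] = [0.2 if m < 1e13 else 0.8 for m in table["halo_mvir"]]

    def mc_quiescent(self, table, seed=43):
        # Reads the column written by the previous method, so order matters.
        rng = random.Random(seed)
        table["quiescent"] = [rng.random() < f for f in table["fq"]]


def run_calling_sequence(component, table):
    """Call each named method in order, passing only the table keyword."""
    for method_name in component._mock_generation_calling_sequence:
        getattr(component, method_name)(table=table)
    return table
```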
The update_param_dict_decorator mechanism
As described in The param_dict mechanism, the composite model param_dict is simply a collection of the parameters in the param_dict of all the component models. While this collection process is simple, it creates a problem. The component and composite param_dict are separate dictionary objects: even though they share keys, the values bound to those keys are stored independently. If the user changes the value bound to a key in the param_dict of the composite model, this does nothing to the value bound to the corresponding key in the component model. Yet the behavior is entirely governed by the component model. Unless some action propagates the change from the composite param_dict to the component param_dict, the composite model will not change behavior when its param_dict is changed.
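A short illustration with plain Python dictionaries, not Halotools objects, shows why an edit to the composite dictionary does not reach the component. The key sigma_logM is only an example.

```python
component_param_dict = {"sigma_logM": 0.25}
# Building the composite dictionary copies the values into a new dict object.
composite_param_dict = dict(component_param_dict)

composite_param_dict["sigma_logM"] = 0.5
# The component is untouched: the two dictionaries are independent objects.
```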
The ModelFactory.update_param_dict_decorator addresses this problem. When the model factories inherit the methods of the component models, they actually inherit modified versions of these methods. The modification comes from decorating the inherited methods with the update_param_dict_decorator, whose source code appears below:
def update_param_dict_decorator(self, component_model, func_name):
    def decorated_func(*args, **kwargs):
        # Update the param_dict as necessary
        for key in self.param_dict.keys():
            if key in component_model.param_dict:
                component_model.param_dict[key] = self.param_dict[key]
        func = getattr(component_model, func_name)
        return func(*args, **kwargs)
    return decorated_func
The behavior of decorated_func is identical in every way to the input function, with one exception. Before calling the input function, decorated_func first opens up the component model param_dict and updates the value of any key that also appears in the composite model param_dict.
Note that this mechanism does not automatically and immediately propagate changes in the composite model param_dict to the component model param_dict. If you manually change values in the composite model param_dict, that action alone does nothing to the component model. The role of the update_param_dict_decorator is to accomplish this propagation when it counts: when you call the component model methods that the composite model needs.
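The deferred propagation can be demonstrated with a toy composite that reuses the decorator logic quoted above. ToyComponent, ToyComposite, and current_sigma are hypothetical and do not reproduce how the real factories bind methods. The point is that the component only sees the new value once a decorated method is actually called.

```python
class ToyComponent:
    def __init__(self):
        self.param_dict = {"sigma_logM": 0.25}

    def current_sigma(self, **kwargs):
        return self.param_dict["sigma_logM"]


class ToyComposite:
    def __init__(self, component):
        self.param_dict = dict(component.param_dict)
        # Bind a decorated version of the component method to the composite.
        self.current_sigma = self.update_param_dict_decorator(component, "current_sigma")

    def update_param_dict_decorator(self, component_model, func_name):
        # Same logic as the decorator source quoted above.
        def decorated_func(*args, **kwargs):
            for key in self.param_dict.keys():
                if key in component_model.param_dict:
                    component_model.param_dict[key] = self.param_dict[key]
            func = getattr(component_model, func_name)
            return func(*args, **kwargs)
        return decorated_func


component = ToyComponent()
composite = ToyComposite(component)
composite.param_dict["sigma_logM"] = 0.5
before_call = component.param_dict["sigma_logM"]  # still 0.25: nothing propagated yet
result = composite.current_sigma()                # the decorated call syncs, then runs
after_call = component.param_dict["sigma_logM"]   # now 0.5
```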
The list_of_haloprops_needed mechanism
When the MockFactory calls upon the component model methods, the only thing passed to each method is a table keyword argument. In almost all cases, the table bound to this keyword is the galaxy_table that is in the process of being generated. The only exception to this rule is described in the Galaxy properties assigned prior to the mc_occupation methods section of the Tutorial on the algorithm for HOD-based mock-making documentation page.
The galaxy_table differs from the halo_table in several respects. In subhalo-based models, the two tables have the same length, but in HOD-style models they generally have different lengths. The galaxy_table also has columns associated with mock galaxy properties that the halo_table generally does not.
For the purpose of this discussion, the most important difference is this: the galaxy_table only inherits the columns of the halo_table that the composite model tells it to inherit. The list_of_haloprops_needed is the mechanism the composite model uses to tell the MockFactory which halo_table columns the galaxy_table should inherit.
All component models have the option to define a list_of_haloprops_needed attribute, which is a list of halo_table column names. The model factory collects all these lists and forms their union. Any halo_table column name in this union will be inherited by the mock galaxy population. Component models are not required to define a list_of_haloprops_needed attribute. For example, when multiple component models require the same halo property, only one component needs to declare it. Multiple requests for the same column are always harmless. However, if you include a component model that does not define a list_of_haloprops_needed attribute, the model factory will always issue a (possibly harmless) warning.
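A minimal sketch of the union-and-warn behavior described above, using a set and the standard warnings module. collect_haloprops_needed and the Toy* classes are hypothetical. The column names halo_mvir and halo_rvir are used only as examples, and the warning text is not the real factory message.

```python
import warnings


class ToyCens:
    list_of_haloprops_needed = ["halo_mvir"]


class ToySats:
    # Repeating halo_mvir is harmless: the union removes duplicates.
    list_of_haloprops_needed = ["halo_mvir", "halo_rvir"]


class ToyQuenching:
    """Declares no list_of_haloprops_needed attribute."""


def collect_haloprops_needed(components):
    """Union of every component's list_of_haloprops_needed, warning on absence."""
    needed = set()
    for component in components:
        haloprops = getattr(component, "list_of_haloprops_needed", None)
        if haloprops is None:
            warnings.warn(
                f"{type(component).__name__} defines no list_of_haloprops_needed; "
                "this may be harmless"
            )
            continue
        needed.update(haloprops)
    return needed
```

Only the columns in the returned set would be carried over from the halo_table to the galaxy_table.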