Composite Model Bookkeeping Mechanisms

The param_dict mechanism

The component model classes in Halotools determine the functional form of the aspect of the galaxy-halo connection being modeled, but in many cases a model has a handful of parameters that allow you to tune the behavior of those functions. For example, the Zheng07Cens class dictates that the average number of central galaxies as a function of halo mass, \(\langle N_{\rm cen}\vert M_{\rm halo}\rangle\), is governed by an erf function, but the speed of the transition of the erf function between 0 and 1 can be varied by changing the \(\sigma_{\log M}\) parameter.

In all such cases, parameters such as \(\sigma_{\log M}\) are elements of param_dict, a python dictionary bound to the component model instance. By changing the values bound to the parameters in the param_dict, you change the behavior of the model.
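As a minimal sketch of this idea, the toy class below stores its parameters in a param_dict and reads them back on every call. ToyCentralOccupation is a hypothetical stand-in written for this example, not the Zheng07Cens source code, and the default parameter values are arbitrary.

```python
from math import erf


class ToyCentralOccupation:
    """Hypothetical stand-in for a component model such as Zheng07Cens."""

    def __init__(self, logMmin=12.0, sigma_logM=0.25):
        # Tunable parameters live in a dictionary bound to the instance.
        self.param_dict = {"logMmin": logMmin, "sigma_logM": sigma_logM}

    def mean_occupation(self, log_halo_mass):
        # Parameters are looked up in param_dict on every call, so edits to
        # the dictionary change the behavior of the model.
        logMmin = self.param_dict["logMmin"]
        sigma = self.param_dict["sigma_logM"]
        return 0.5 * (1.0 + erf((log_halo_mass - logMmin) / sigma))


model = ToyCentralOccupation()
sharp = model.mean_occupation(12.1)

# Widen the transition: the occupation at fixed mass moves closer to 0.5.
model.param_dict["sigma_logM"] = 1.0
smooth = model.mean_occupation(12.1)
```

Here the functional form (an erf) is fixed by the class, while sigma_logM controls how quickly the function rises from 0 to 1.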

Propagating param_dict from component to composite

While creating a composite model from a set of component models, the factory classes SubhaloModelFactory and HodModelFactory collect every parameter that appears in any component model param_dict, and create a new composite param_dict that is bound to the composite model instance. Composite model methods are written so that, to change the behavior of the composite model, all you need to do is change the values of the parameters in the param_dict bound to the composite model; the changes then propagate down to the component model that defines the behavior.

In most cases this propagation process is unambiguous and straightforwardly accomplished with The update_param_dict_decorator mechanism. However, if two or more component models have a parameter with the exact same name, then care is required.

As an example, consider the composite model dictionary built by the leauthaud11_model_dictionary function. In this composite model, there are two populations of galaxies, centrals and satellites, whose occupation statistics are governed by Leauthaud11Cens and Leauthaud11Sats, respectively. Both of these classes derive much of their behavior from the underlying stellar-to-halo-mass relation of Behroozi et al. (2010), and so all the parameters in the param_dict of Behroozi10SmHm appear in both the param_dict of Leauthaud11Cens and the param_dict of Leauthaud11Sats.

In this example, the repeated appearance of the stellar-to-halo-mass parameters is harmless because these really are the same parameters that just so happen to appear twice. But since Halotools users are free to define their own model components and compose any arbitrary collection of components together, it is possible for the same name to be inadvertently given to parameters in different components that control entirely distinct behavior. In such a case, when that parameter is modified in the composite model param_dict, it is ambiguous how to propagate the change down to the appropriate component model.

Suppressing harmless warnings

To protect against this ambiguity, whenever a repeated parameter is detected during the building of the composite model param_dict, a warning is issued to the user. It is up to the user to determine whether the repetition is harmless or whether one of the component model parameter names needs to be changed to disambiguate. Because such a warning can be annoying for commonly-used models in which the repetition is harmless, it is possible to suppress it by adding a _suppress_repeated_param_warning attribute to any of the components in the composite model, and setting this attribute to True. You can see this mechanism at work in the source code of the leauthaud11_model_dictionary function.
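The collection step and the warning can be sketched as follows. This is an illustrative toy under stated assumptions, not the factory code: build_composite_param_dict and the parameter names are hypothetical, and only the _suppress_repeated_param_warning attribute name comes from the text above.

```python
import warnings
from types import SimpleNamespace


def build_composite_param_dict(component_models):
    """Toy collection of component param_dicts into one composite dictionary."""
    # The warning is silenced if any component carries the suppression flag.
    suppress = any(
        getattr(component, "_suppress_repeated_param_warning", False)
        for component in component_models
    )
    composite = {}
    for component in component_models:
        for name, value in component.param_dict.items():
            if name in composite and not suppress:
                warnings.warn("Parameter %r appears in more than one component" % name)
            composite[name] = value
    return composite


# Two hypothetical components that share one parameter name.
cens = SimpleNamespace(param_dict={"shared_param": 12.35, "cens_only": 0.2})
sats = SimpleNamespace(param_dict={"shared_param": 12.35, "sats_only": 1.0})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    composite = build_composite_param_dict([cens, sats])
n_warnings_default = len(caught)

# Declare the repetition harmless on one of the components.
sats._suppress_repeated_param_warning = True
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    build_composite_param_dict([cens, sats])
n_warnings_suppressed = len(caught)
```

In this sketch the first build emits one warning for shared_param, and the second build emits none.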

The new_haloprop_func_dict mechanism

The basic job of any component model is to provide some mapping between a halo catalog and some property (or set of properties) of the galaxies that live in those halos. There are many cases where the underlying halo property that is the independent variable in the mapping is not a pre-existing column in your halo catalog. For example, assigning positions to satellite galaxies may depend on an analytical model for the NFW concentration-mass relation. If this particular definition of the concentration is not already in the halo catalog, this quantity would need to be computed for every halo before the satellite positions could be assigned.

There are several possible solutions to this problem. First, the composite model could simply compute the desired halo property on-the-fly as part of the galaxy property assignment. However, if the calculation is expensive then this needlessly adds runtime to mock-population with any composite model that uses this component. Second, you could always add the desired column to the halo table and then over-write the existing halo catalog data file with an updated one that includes the new column. However, for the sake of reproducibility it is best to minimize the number of times a halo catalog is over-written, as keeping track of each over-write quickly becomes a headache and mistakes in that bookkeeping can lead to hidden buggy behavior.

The new_haloprop_func_dict mechanism helps address this problem. When model factories build composite models, each component model is examined for the possible presence of a new_haloprop_func_dict attribute, to which a python dictionary must be bound. The keys of this dictionary serve as the names of the new columns that will be added to the halo catalog in a pre-processing phase of the mock-population algorithm. The values bound to these keys are python function objects; each function object must accept a length-Nhalos Astropy Table or Numpy structured array as input, and it must return a length-Nhalos array as output; the returned array will be the data in the newly created column of the halo catalog.

To take advantage of this mechanism in your component model, the only thing you need to do is create a new_haloprop_func_dict attribute somewhere in the __init__ constructor of your component model, and make sure that the dictionary bound to this attribute conforms to the above specifications. After doing this, you can safely assume that the halo catalog column needed by your component model will be in any halo catalog used to populate mock galaxies with a composite model using your component.
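A minimal sketch of a conforming component is shown below, together with a toy version of the pre-processing step. The names ToySatelliteProfile, toy_concentration, halo_toy_conc and add_new_haloprops are hypothetical, the power law is made up rather than a published concentration-mass relation, and the halo catalog is represented by a Numpy structured array.

```python
import numpy as np


def toy_concentration(table):
    """Return one value per halo; the power law is made up for illustration."""
    return 10.0 * (table["halo_mvir"] / 1e12) ** -0.1


class ToySatelliteProfile:
    """Hypothetical component needing a column that the catalog lacks."""

    def __init__(self):
        # Key: name of the new column. Value: function of the halo table.
        self.new_haloprop_func_dict = {"halo_toy_conc": toy_concentration}


def add_new_haloprops(halos, component_models):
    """Toy pre-processing step returning a copy of halos with the new columns."""
    new_columns = {}
    for component in component_models:
        func_dict = getattr(component, "new_haloprop_func_dict", {})
        for name, func in func_dict.items():
            if name not in halos.dtype.names:
                new_columns[name] = np.asarray(func(halos))

    # Build a wider structured array and copy old and new columns into it.
    fields = [(name, halos.dtype[name]) for name in halos.dtype.names]
    fields += [(name, column.dtype) for name, column in new_columns.items()]
    augmented = np.empty(len(halos), dtype=fields)
    for name in halos.dtype.names:
        augmented[name] = halos[name]
    for name, column in new_columns.items():
        augmented[name] = column
    return augmented


halos = np.zeros(3, dtype=[("halo_id", "i8"), ("halo_mvir", "f8")])
halos["halo_id"] = [0, 1, 2]
halos["halo_mvir"] = [1e11, 1e12, 1e13]
augmented = add_new_haloprops(halos, [ToySatelliteProfile()])
```

The catalog on disk is never touched in this sketch; the new column exists only in the in-memory copy used for mock population.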

The galprop_dtypes_to_allocate mechanism

Whenever a component model is used during mock population, the mock factory passes a table keyword argument to the methods of the component. It is important that the table passed to the function has the necessary columns assumed by the function.

Every component model assigns some property or set of properties to the mock population of galaxies. In mock population, the synthetic galaxy population is stored in the galaxy_table bound to the mock object. The galaxy_table is an Astropy Table object, with columns storing every galaxy property assigned by the composite model. The _galprop_dtypes_to_allocate mechanism is responsible for creating the necessary columns of the galaxy_table and making sure they are appropriately formatted.

If you are writing your own model component of any kind, the model factories require that instances of your model have a _galprop_dtypes_to_allocate attribute. You can meet this specification by assigning any numpy.dtype object to the _galprop_dtypes_to_allocate attribute in the __init__ constructor of your component model (even if the dtype is empty). See Tutorial on designing your own model of the galaxy-halo connection for many examples.
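The sketch below shows three hypothetical components declaring their dtypes, and a toy allocate_galaxy_columns helper that merges them. The helper is an assumption made for illustration: it builds a Numpy structured array, whereas the real galaxy_table is an Astropy Table created by the mock factory.

```python
import numpy as np


class ToyStellarMass:
    def __init__(self):
        self._galprop_dtypes_to_allocate = np.dtype([("stellar_mass", "f4")])


class ToyQuiescence:
    def __init__(self):
        self._galprop_dtypes_to_allocate = np.dtype([("quiescent", bool)])


class ToyNoNewColumns:
    def __init__(self):
        # A component that adds no galaxy property still declares an empty dtype.
        self._galprop_dtypes_to_allocate = np.dtype([])


def allocate_galaxy_columns(component_models, num_galaxies):
    """Toy allocation of one zero-filled column per declared galaxy property."""
    fields = []
    for component in component_models:
        dt = component._galprop_dtypes_to_allocate
        for name in dt.names or ():
            # Skip names that an earlier component already declared.
            if name not in [field[0] for field in fields]:
                fields.append((name, dt[name]))
    return np.zeros(num_galaxies, dtype=fields)


components = [ToyStellarMass(), ToyQuiescence(), ToyNoNewColumns()]
galaxies = allocate_galaxy_columns(components, num_galaxies=4)
```

Declaring the dtype up front lets the columns be created with the right format before any component method writes to them.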

The model_feature_calling_sequence mechanism

When the mock factories create a synthetic galaxy population, the methods of the composite model are called in the order determined by the _mock_generation_calling_sequence list attribute bound to the composite model. For subhalo-based models, this list is determined by SubhaloModelFactory.set_calling_sequence, whereas for HOD-style models this list is determined by HodModelFactory.set_calling_sequence.

As described in The mock_generation_calling_sequence mechanism, each component model also has a _mock_generation_calling_sequence attribute. The composite model sequence is built up as a succession of the component model sequences. The sequential ordering of component models in this succession is determined by the _model_feature_calling_sequence attribute, which is set by the build_model_feature_calling_sequence factory method. Thus the composite model _mock_generation_calling_sequence is determined according to the following schematic:

composite_model._mock_generation_calling_sequence = []
for component_model_name in composite_model._model_feature_calling_sequence:
    component_model = composite_model.model_dictionary[component_model_name]
    for method_name in component_model._mock_generation_calling_sequence:
        composite_model._mock_generation_calling_sequence.append(method_name)

Thus each component model’s methods are always called one right after the other. The order in which the component models are called upon is determined by the _model_feature_calling_sequence attribute. The user is free to explicitly specify this sequence via the model_feature_calling_sequence keyword argument passed to the factory constructor. This may be useful for cases where the model for one galaxy property has an explicit dependence on another galaxy property defined in an independent model component. If the model_feature_calling_sequence keyword is not passed, the order in which the component models are called should be assumed to be random.
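A runnable toy version of this bookkeeping is sketched below. The component names and method names are hypothetical, and SimpleNamespace objects stand in for real component and composite models; only the attribute names come from the text above.

```python
from types import SimpleNamespace

# Hypothetical components, each with its own ordered list of method names.
stellar_mass = SimpleNamespace(
    _mock_generation_calling_sequence=["mc_stellar_mass"])
quiescence = SimpleNamespace(
    _mock_generation_calling_sequence=["mc_quiescent_fraction", "mc_sfr"])

composite_model = SimpleNamespace(
    model_dictionary={"quiescence": quiescence, "stellar_mass": stellar_mass},
    # Explicit ordering: stellar mass is assigned before star formation.
    _model_feature_calling_sequence=["stellar_mass", "quiescence"],
)

composite_model._mock_generation_calling_sequence = []
for feature_name in composite_model._model_feature_calling_sequence:
    component = composite_model.model_dictionary[feature_name]
    composite_model._mock_generation_calling_sequence.extend(
        component._mock_generation_calling_sequence)
```

Because the feature sequence lists stellar_mass first, its method precedes both quiescence methods in the composite list, regardless of the order of the keys in model_dictionary.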

The mock_generation_calling_sequence mechanism

Each component model has a _mock_generation_calling_sequence attribute storing a list of strings. Each string is the name of a method bound to the component model instance. The order in which these names appear determines the order in which the methods will be called during mock population. This mechanism works together with The model_feature_calling_sequence mechanism to determine the entire sequence of functions that are called when populating a mock.
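As an illustration of why the order matters, the hypothetical component below has a second method that reads a column written by the first. The run_component helper is an assumption made for this sketch, and a plain dictionary of lists stands in for the table.

```python
class ToyStarFormation:
    """Hypothetical component whose second method depends on its first."""

    def __init__(self):
        self._mock_generation_calling_sequence = ["mc_quiescent", "mc_sfr"]

    def mc_quiescent(self, table):
        table["quiescent"] = [m > 10.5 for m in table["log_stellar_mass"]]

    def mc_sfr(self, table):
        # Reads the column written by mc_quiescent, so it must run second.
        table["sfr"] = [0.0 if q else 1.0 for q in table["quiescent"]]


def run_component(component, table):
    """Call the listed methods by name, in order, passing the table keyword."""
    for method_name in component._mock_generation_calling_sequence:
        getattr(component, method_name)(table=table)


table = {"log_stellar_mass": [10.0, 11.0]}
run_component(ToyStarFormation(), table)
```

Reversing the two names in the list would make mc_sfr fail with a KeyError, since the quiescent column would not exist yet.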

The update_param_dict_decorator mechanism

As described in The param_dict mechanism, the composite model param_dict is simply a collection of the parameters in the param_dict of all the component models. While this collection process is simple, it creates the following problem. The component and composite param_dict are separate dictionaries, so even though they share keys in common, the values bound to those keys are stored independently. If the user changes the value bound to a key in the param_dict of the composite model, this change does nothing at all to the value bound to the corresponding key in the component model. And yet the behavior is entirely governed by the component model, so unless some action is taken to propagate the change from the composite param_dict to the component param_dict, the composite model will not change behavior when its param_dict is changed.

The ModelFactory.update_param_dict_decorator addresses this problem. When the model factories inherit the methods of the component models, they actually inherit modified versions of the methods, where the modification comes from decorating the inherited methods with the update_param_dict_decorator, whose source code appears below:

def update_param_dict_decorator(self, component_model, func_name):

    def decorated_func(*args, **kwargs):

        # Update the param_dict as necessary
        for key in self.param_dict.keys():
            if key in component_model.param_dict:
                component_model.param_dict[key] = self.param_dict[key]

        func = getattr(component_model, func_name)
        return func(*args, **kwargs)

    return decorated_func

The behavior of decorated_func is identical in every way to that of the input function, except that before calling the input function, decorated_func first opens up the component model param_dict and updates the value of any key that also appears in the composite model param_dict.

Note that this mechanism does not automatically and immediately propagate changes in the composite model param_dict to the component model param_dict. If you manually change values in the composite model param_dict, nothing happens to the component model by that action alone. The role of the update_param_dict_decorator is to accomplish this propagation when it counts: when you call the component model methods that the composite model needs.
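The deferred propagation can be seen in the toy example below. It reuses the logic of the decorator shown above, rewritten as a plain function that takes the composite model as its first argument; ToyComponent and the amplitude parameter are hypothetical, and a SimpleNamespace stands in for the composite model.

```python
from types import SimpleNamespace


class ToyComponent:
    def __init__(self):
        self.param_dict = {"amplitude": 1.0}

    def mean_value(self, x):
        return self.param_dict["amplitude"] * x


def update_param_dict_decorator(composite, component_model, func_name):
    # Same logic as the method shown above, written as a plain function.
    def decorated_func(*args, **kwargs):
        for key in composite.param_dict:
            if key in component_model.param_dict:
                component_model.param_dict[key] = composite.param_dict[key]
        func = getattr(component_model, func_name)
        return func(*args, **kwargs)
    return decorated_func


component = ToyComponent()
composite = SimpleNamespace(param_dict=dict(component.param_dict))
composite.mean_value = update_param_dict_decorator(
    composite, component, "mean_value")

composite.param_dict["amplitude"] = 3.0
stale = component.param_dict["amplitude"]    # component not yet updated
result = composite.mean_value(2.0)           # the call triggers the sync
synced = component.param_dict["amplitude"]   # component now matches
```

In this sketch stale is 1.0, result is 6.0 and synced is 3.0: the component param_dict only catches up at the moment the decorated method is called.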

The list_of_haloprops_needed mechanism

When the MockFactory calls upon the component model methods, the only thing that gets passed to each method is a table keyword argument. In almost all cases, the table bound to this keyword is the galaxy_table that is in the process of being generated (see the Galaxy properties assigned prior to the mc_occupation methods section of the Tutorial on the algorithm for HOD-based mock-making documentation page for the only exception to this rule).

The galaxy_table differs from the halo_table in several respects. In subhalo-based models, they will have the same length, but in HOD-style models they will generally have different lengths. The galaxy_table will have columns associated with mock galaxy properties that the halo_table generally will not.

For the purpose of this discussion, the most important difference is this: the galaxy_table only inherits the columns of the halo_table that the composite model tells it to inherit. The list_of_haloprops_needed is the mechanism that the composite model exploits to inform the MockFactory which halo_table columns should be inherited by the galaxy_table.

All component models have the option to define a list_of_haloprops_needed attribute, a list of strings of halo_table column names. The model factory collects together all these lists and forms their union. Any halo_table column name in this union will be inherited by the mock galaxy population. Component models need not necessarily define a list_of_haloprops_needed attribute. For example, in cases where multiple component models require the same halo property, only one component need declare a need for this property. Multiple requests for the same column are always harmless, but if you ever choose to include a component model that does not define a list_of_haloprops_needed attribute, the model factory will always raise a (possibly harmless) warning.
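The union and the warning can be sketched as follows. The collect_haloprops_needed helper, the warning text and the toy components are hypothetical, and plain dictionaries stand in for the halo_table and galaxy_table.

```python
import warnings
from types import SimpleNamespace


def collect_haloprops_needed(component_models):
    """Toy union of the halo_table column names requested by the components."""
    needed = set()
    for component in component_models:
        if hasattr(component, "list_of_haloprops_needed"):
            needed.update(component.list_of_haloprops_needed)
        else:
            warnings.warn("A component defines no list_of_haloprops_needed")
    return sorted(needed)


profile = SimpleNamespace(list_of_haloprops_needed=["halo_mvir", "halo_rvir"])
occupation = SimpleNamespace(list_of_haloprops_needed=["halo_mvir"])
no_request = SimpleNamespace()  # relies on the columns requested by the others

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    haloprops = collect_haloprops_needed([profile, occupation, no_request])

# Only the requested columns are carried over to the galaxy table.
halo_table = {"halo_mvir": [1e12], "halo_rvir": [0.2], "halo_spin": [0.03]}
galaxy_table = {name: halo_table[name] for name in haloprops}
```

The repeated request for halo_mvir collapses in the union, the component without the attribute triggers one warning, and halo_spin is not inherited because no component asked for it.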