HodModelFactory

class halotools.empirical_models.HodModelFactory(**kwargs)

Bases: ModelFactory

Class used to build HOD-style models of the galaxy-halo connection.

See Tutorial on building an HOD-style model for an in-depth description of how to build HOD models, demonstrated by a sequence of increasingly complex examples. If you do not wish to build your own model but want to use one provided by Halotools, instead see PrebuiltHodModelFactory.

All HOD-style composite models can directly populate catalogs of dark matter halos. For an in-depth description of how Halotools implements this mock-generation, see Tutorial on the algorithm for HOD-based mock-making.

The arguments passed to the HodModelFactory constructor determine the features of the model returned by the factory. This works in one of two ways, both of which are demonstrated with explicit examples below.

  1. Building a new model from scratch.

You can build a model from scratch by passing in a sequence of model_features, each of which is an instance of a component model. The factory then composes these independently defined components into a composite model.

  2. Building a new model from an existing model.

It is also possible to add/swap new features to a previously built composite model instance, allowing you to create new models from existing ones. To do this, you pass in a baseline_model_instance and any set of model_features. Any model_feature keyword that matches a feature name of the baseline_model_instance will replace that feature in the baseline_model_instance; all other model_features that you pass in will augment the baseline_model_instance with new behavior.
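The replace-or-augment rule described above can be pictured as a dictionary merge. The sketch below is only an illustration: plain strings stand in for component model instances, and the helper name merge_features is hypothetical and not part of Halotools.

```python
def merge_features(baseline_dictionary, **model_features):
    """Return a new feature dictionary built from a baseline.

    Keys already present in the baseline are replaced; new keys are added.
    Hypothetical helper for illustration only, not a Halotools function.
    """
    merged = dict(baseline_dictionary)  # copy so the baseline is left untouched
    merged.update(model_features)
    return merged


# Strings stand in for component model instances.
baseline = {
    "centrals_occupation": "Zheng07Cens",
    "centrals_profile": "TrivialPhaseSpace",
}
new_model = merge_features(
    baseline,
    centrals_occupation="AssembiasZheng07Cens",  # matches a baseline key: replaced
    satellites_occupation="Zheng07Sats",         # new key: augments the model
)
```

The baseline dictionary is copied rather than modified, which mirrors the idea that the original model remains usable after a new one is derived from it.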

Regardless of which set of features you use to build your model, the returned object can be used to directly populate a halo catalog with mock galaxies using the populate_mock method, as shown in the example below.

Parameters:
**model_features: sequence of keyword arguments, optional

Each keyword you use will be interpreted as the name of a feature in the composite model, e.g. ‘stellar_mass’ or ‘star_formation_rate’; the value bound to each keyword must be an instance of a component model governing the behavior of that feature. See the examples section below.

baseline_model_instance: HodModelFactory instance, optional

If passed to the constructor, the model_dictionary bound to the baseline_model_instance will be treated as the baseline dictionary. Any additional keyword arguments passed to the constructor that appear in the baseline dictionary will be treated as model features that replace the corresponding component model in the baseline dictionary. Any model features passed to the constructor that do not appear in the baseline dictionary will be treated as new features that augment the baseline model with new behavior. See the examples section below.

model_feature_calling_sequence: list, optional

Determines the order in which your component features will be called during mock population.

Some component models may have explicit dependence upon the value of some other galaxy property being modeled. In such a case, you must pass a model_feature_calling_sequence list, ordered in the desired calling sequence.

A classic example is if the stellar-to-halo-mass relation has explicit dependence on the star formation rate of the galaxy (active or quiescent). For this example, the model_feature_calling_sequence would be model_feature_calling_sequence = [‘sfr_designation’, ‘stellar_mass’, …].

Default behavior is to assume that no model feature has explicit dependence upon any other, in which case the component models appearing in the model_features keyword arguments will be called in random order, giving primacy to the potential presence of stellar_mass and/or luminosity features.

gal_type_list: list, optional

List of strings providing the names of the galaxy types in the composite model. This is only necessary to provide if you have a gal_type in your model that is neither centrals nor satellites.

For example, if you have entirely separate models for red_satellites and blue_satellites, then your gal_type_list might be gal_type_list = [‘centrals’, ‘red_satellites’, ‘blue_satellites’]. Another possible example would be gal_type_list = [‘centrals’, ‘satellites’, ‘orphans’].

redshift: float, optional

Redshift of the model galaxies. Must be compatible with the redshift of all component models, and with the redshift of the snapshot of the simulation used to populate mocks. Default is None.

halo_selection_func: function object, optional

Function object used to place a cut on the input table. If the halo_selection_func keyword argument is passed, the input to the function must be a single positional argument storing a length-N structured numpy array or Astropy table; the function output must be a length-N boolean array that will be used as a mask. Halos that are masked will be entirely neglected during mock population.
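As a rough illustration of the halo_selection_func contract, the sketch below defines a function that accepts a length-N structured array and returns a length-N boolean array. The function name, the column name halo_mvir, and the mass threshold are assumptions chosen for the example, and the toy array stands in for a real halo table. The sketch also assumes the usual NumPy convention that True marks the rows to keep.

```python
import numpy as np


def massive_halos_only(table):
    """Return a boolean array that is True for halos above 1e12.

    The column name 'halo_mvir' and the threshold are illustrative assumptions.
    """
    return table["halo_mvir"] > 1e12


# Toy structured array standing in for a halo table.
toy_halos = np.array(
    [(1, 5.0e11), (2, 2.0e12), (3, 8.0e13)],
    dtype=[("halo_id", "i8"), ("halo_mvir", "f8")],
)
mask = massive_halos_only(toy_halos)
selected = toy_halos[mask]  # assumes True marks halos to keep
```

A function of this shape would be passed to the constructor through the halo_selection_func keyword argument.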

Examples

As described above, there are two different ways to build models using the HodModelFactory. Here we give demonstrations of each in turn.

In the first example we’ll show how to build a model from scratch using the model_features option. For illustration purposes, we’ll pick a particularly simple HOD-style model based on Zheng et al. (2007). As described in zheng07_model_dictionary, in this model there are two galaxy populations, ‘centrals’ and ‘satellites’; centrals sit at the center of dark matter halos, and satellites follow an NFW profile.

We’ll start with the features for the population of centrals:

>>> from halotools.empirical_models import HodModelFactory
>>> from halotools.empirical_models import TrivialPhaseSpace, Zheng07Cens
>>> cens_occ_model = Zheng07Cens()
>>> cens_prof_model = TrivialPhaseSpace()

Now for the satellites:

>>> from halotools.empirical_models import NFWPhaseSpace, Zheng07Sats
>>> sats_occ_model = Zheng07Sats()
>>> sats_prof_model = NFWPhaseSpace()

At this point we have our component model instances. The following call to the factory uses the model_features option described above:

>>> model_instance = HodModelFactory(
...     centrals_occupation=cens_occ_model,
...     centrals_profile=cens_prof_model,
...     satellites_occupation=sats_occ_model,
...     satellites_profile=sats_prof_model)

The feature names we have chosen are ‘centrals_occupation’ and ‘centrals_profile’, ‘satellites_occupation’ and ‘satellites_profile’. The first substring of each feature name informs the factory of the name of the galaxy population, and the second substring identifies the type of feature. To each feature we have attached a component model instance.
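To make the naming convention concrete, here is a minimal sketch of how a feature name could be split into a galaxy type and a feature type. The helper split_feature_name is hypothetical and is not the Halotools implementation; it simply matches the feature name against a list of known galaxy types, which is one way to handle galaxy types that themselves contain underscores.

```python
def split_feature_name(feature_name, gal_type_list):
    """Split a feature name into (gal_type, feature type).

    Hypothetical helper for illustration; not the Halotools implementation.
    Longer gal_type names are tried first so that a name such as
    'red_satellites' wins over any shorter name sharing the same prefix.
    """
    for gal_type in sorted(gal_type_list, key=len, reverse=True):
        prefix = gal_type + "_"
        if feature_name.startswith(prefix):
            return gal_type, feature_name[len(prefix):]
    raise ValueError("no known gal_type in feature name: " + feature_name)


gal_types = ["centrals", "red_satellites", "blue_satellites"]
parsed = [
    split_feature_name(name, gal_types)
    for name in ("centrals_occupation", "red_satellites_profile")
]
```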

Whatever features your composite model has, you can use the populate_mock method to create a Monte Carlo realization of the model by populating any dark matter halo catalog in your cache directory:

>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = 'bolshoi', redshift = 0.5) 
>>> model_instance.populate_mock(halocat) 

Your model_instance now has a mock attribute storing a synthetic galaxy population. See the populate_mock docstring for details.

There are also convenience functions for estimating the clustering signal predicted by the model. For example, the following method repeatedly populates the Bolshoi simulation with galaxies, computes the 3-d galaxy clustering signal of each mock, computes the median clustering signal in each bin, and returns the result:

>>> r, xi = model_instance.compute_average_galaxy_clustering(num_iterations = 5, simname = 'bolshoi', redshift = 0.5) 

In this next example we’ll show how to build a new model from an existing one using the baseline_model_instance option. We will start from the composite model built in the first example above. Here we’ll build a new model that is identical to the model_instance above, except that we use the AssembiasZheng07Cens class to introduce assembly bias into the occupation statistics of central galaxies.

>>> from halotools.empirical_models import AssembiasZheng07Cens
>>> new_cen_occ_model = AssembiasZheng07Cens()
>>> new_model_instance = HodModelFactory(baseline_model_instance = model_instance, centrals_occupation = new_cen_occ_model)

The new_model_instance and the original model_instance are identical in every respect except for the assembly bias of central galaxy occupation.

Methods Summary

build_dtype_list()

Create the _galprop_dtypes_to_allocate attribute that determines the name and data type of every galaxy property that will appear in the mock galaxy_table.

build_init_param_dict()

Create the param_dict attribute of the instance.

build_lookup_tables()

Method to compute and load lookup tables for each of the phase space component models.

build_model_feature_calling_sequence(...)

Method uses the model_feature_calling_sequence passed to __init__, if available.

build_new_haloprop_func_dict()

Method used to build a dictionary of functions, new_haloprop_func_dict, that create new halo catalog columns during a pre-processing phase of mock population.

build_prim_sec_haloprop_list()

Method builds the _haloprop_list of strings.

build_prof_param_keys()

build_publication_list()

populate_mock(halocat, **kwargs)

Method used to populate a simulation with a Monte Carlo realization of a model.

restore_init_param_dict()

Reset all values of the current param_dict to the values the class was instantiated with.

set_calling_sequence()

Method used to determine the sequence of function calls that will be made during mock population.

set_gal_types()

Private method binding the gal_types list attribute.

set_inherited_methods()

Each component model should have a _mock_generation_calling_sequence attribute that provides the sequence of method names to call during mock population.

set_model_redshift()

set_primary_behaviors()

Creates names and behaviors for the primary methods of HodModelFactory that will be used by the outside world.

set_warning_suppressions()

Method used to determine whether a warning should be issued if the build_init_param_dict method detects the presence of multiple appearances of the same parameter name.

update_param_dict_decorator(component_model, ...)

Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.

Methods Documentation

build_dtype_list()

Create the _galprop_dtypes_to_allocate attribute that determines the name and data type of every galaxy property that will appear in the mock galaxy_table.

This attribute is determined by examining the _galprop_dtypes_to_allocate attribute of every component model, and building a composite set of all these dtypes, enforcing self-consistency in cases where the same galaxy property appears more than once.
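The composite-building step can be sketched in plain Python. In this illustration, (name, type string) tuples stand in for dtype entries, the property names are made up for the example, and raising an error on a mismatch is an assumption about what enforcing self-consistency could look like. The helper combine_dtype_lists is hypothetical and not a Halotools function.

```python
def combine_dtype_lists(component_dtype_lists):
    """Merge per-component (name, type) lists, rejecting conflicting types.

    Hypothetical helper for illustration only.
    """
    combined = {}
    for dtype_list in component_dtype_lists:
        for name, type_string in dtype_list:
            # setdefault returns the stored type if the name was seen before.
            if combined.setdefault(name, type_string) != type_string:
                raise ValueError(f"conflicting dtypes for {name!r}")
    return list(combined.items())


# Illustrative property names; each inner list belongs to one component model.
component_dtypes = [
    [("x", "f4"), ("halo_num_centrals", "i4")],
    [("x", "f4"), ("conc_NFWmodel", "f4")],
]
combined = combine_dtype_lists(component_dtypes)
```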

build_init_param_dict()

Create the param_dict attribute of the instance. The param_dict is a dictionary storing the full collection of parameters controlling the behavior of the composite model.

The param_dict dictionary is determined by examining the param_dict attribute of every component model, and building up a composite dictionary from them. It is permissible for the same parameter name to appear more than once amongst a set of component models, but a warning will be issued in such cases.

Notes

In MCMC applications, the items of param_dict define the parameter set explored by the likelihood engine. Changing the values of the parameters in param_dict will propagate to the behavior of the component models when the relevant methods are called.
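A minimal sketch of how a composite parameter dictionary could be assembled from component dictionaries, with a warning when a name repeats. The helper build_composite_param_dict is hypothetical, the parameter values are illustrative, and letting the later component overwrite the earlier value is an assumption of this sketch rather than documented Halotools behavior.

```python
import warnings


def build_composite_param_dict(component_param_dicts):
    """Merge component parameter dictionaries, warning on repeated names.

    Hypothetical helper for illustration only.
    """
    composite = {}
    for param_dict in component_param_dicts:
        for name, value in param_dict.items():
            if name in composite:
                warnings.warn(f"parameter {name!r} appears in more than one component")
            composite[name] = value  # assumption: the later component wins
    return composite


# Illustrative parameter values.
cens_params = {"logMmin": 12.02, "sigma_logM": 0.26}
sats_params = {"logM0": 11.38, "logM1": 13.31, "alpha": 1.06}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    composite = build_composite_param_dict([cens_params, sats_params, {"alpha": 1.0}])
```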

build_lookup_tables()

Method to compute and load lookup tables for each of the phase space component models.

build_model_feature_calling_sequence(supplementary_kwargs)

Method uses the model_feature_calling_sequence passed to __init__, if available. If no such argument was passed, the default sequence will be to first call occupation features, then call all other features in a random order, always calling features associated with a centrals population first (if present).

Parameters:
supplementary_kwargs: dict

Dictionary storing all keyword arguments passed to the __init__ constructor that were not part of the input model dictionary.

Returns:
model_feature_calling_sequence: list

List of strings specifying the order in which the component models will be called upon during mock population to execute their methods.
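One possible reading of the default ordering is sketched below: occupation features come first, and within each group the features of the centrals population come first. The helper default_calling_sequence is hypothetical. Where the description above says the remaining features are called in a random order, this sketch keeps the input order instead so that the result is reproducible.

```python
def default_calling_sequence(feature_names):
    """Order features: occupation first, centrals first within each group.

    Hypothetical helper for illustration only. Ties keep their input order
    because sorted() is stable.
    """
    def sort_key(name):
        is_occupation = name.endswith("_occupation")
        is_central = name.startswith("centrals_")
        # False sorts before True, so negate the flags that should come first.
        return (not is_occupation, not is_central)

    return sorted(feature_names, key=sort_key)


features = [
    "satellites_profile",
    "satellites_occupation",
    "centrals_profile",
    "centrals_occupation",
]
sequence = default_calling_sequence(features)
```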

build_new_haloprop_func_dict()

Method used to build a dictionary of functions, new_haloprop_func_dict, that create new halo catalog columns during a pre-processing phase of mock population.

build_prim_sec_haloprop_list()

Method builds the _haloprop_list of strings.

This list stores the names of all halo catalog columns that appear as either prim_haloprop_key or sec_haloprop_key of any component model. For all strings appearing in _haloprop_list, the mock galaxy_table will have a corresponding column storing the halo property inherited by the mock galaxy.
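The collection step can be sketched as a loop over component models that gathers each distinct key once. The helper collect_haloprop_keys is hypothetical, SimpleNamespace objects stand in for component models, and the column names are illustrative.

```python
from types import SimpleNamespace


def collect_haloprop_keys(component_models):
    """Gather distinct prim_haloprop_key and sec_haloprop_key values.

    Hypothetical helper for illustration only.
    """
    names = []
    for model in component_models:
        for attr in ("prim_haloprop_key", "sec_haloprop_key"):
            key = getattr(model, attr, None)  # components may lack either key
            if key is not None and key not in names:
                names.append(key)
    return names


# Stand-ins for component models; the column names are illustrative.
components = [
    SimpleNamespace(prim_haloprop_key="halo_mvir"),
    SimpleNamespace(prim_haloprop_key="halo_mvir", sec_haloprop_key="halo_nfw_conc"),
    SimpleNamespace(),
]
haloprop_list = collect_haloprop_keys(components)
```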

build_prof_param_keys()

build_publication_list()

populate_mock(halocat, **kwargs)

Method used to populate a simulation with a Monte Carlo realization of a model.

After calling this method, the model instance will have a new mock attribute. You can then access the galaxy population via model.mock.galaxy_table, an Astropy Table.

See Tutorial on the algorithm for HOD-based mock-making for an in-depth tutorial on the mock-making algorithm.

Parameters:
halocat: object

Either an instance of CachedHaloCatalog or UserSuppliedHaloCatalog.

Num_ptcl_requirement: int, optional

Requirement on the number of dark matter particles in the halo. The column defined by the halo_mass_column_key string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default value is set in Num_ptcl_requirement. Currently only supported for instances of HodModelFactory.

halo_mass_column_key: string, optional

This string must be the name of a column of the input halo catalog. The column defined by this string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default is ‘halo_mvir’. Currently only supported for instances of HodModelFactory.

masking_function: function, optional

Function object used to place a mask on the halo table prior to calling the mock generating functions. Calling signature of the function should be to accept a single positional argument storing a table, and returning a boolean numpy array that will be used as a fancy indexing mask. All masked halos will be ignored during mock population. Default is None.

enforce_PBC: bool, optional

If set to True, after galaxy positions are assigned the model_helpers.enforce_periodicity_of_box will re-map satellite galaxies whose positions spilled over the edge of the periodic box. Default is True. This variable should only ever be set to False when using the masking_function to populate a specific spatial subvolume, as in that case PBCs no longer apply. Currently only supported for instances of HodModelFactory.
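The particle-number cut described for Num_ptcl_requirement and halo_mass_column_key amounts to a simple mass threshold. The sketch below works through the arithmetic with plain Python lists standing in for a halo table column. The helper mass_cut_mask is hypothetical and the numbers are illustrative, not the defaults of any real simulation.

```python
def mass_cut_mask(halo_masses, particle_mass, num_ptcl_requirement):
    """Return True for halos that survive the particle-number cut.

    Halos with mass < num_ptcl_requirement * particle_mass are thrown out.
    Hypothetical helper for illustration only.
    """
    threshold = num_ptcl_requirement * particle_mass
    return [mass >= threshold for mass in halo_masses]


# Illustrative numbers only.
masses = [1.0e10, 4.0e10, 2.0e12]
keep = mass_cut_mask(masses, particle_mass=1.0e8, num_ptcl_requirement=300)
surviving = [mass for mass, kept in zip(masses, keep) if kept]
```

With a particle mass of 1e8 and a requirement of 300 particles, the threshold is 3e10, so only the first halo is removed.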

Notes

Note the difference between the halotools.empirical_models.HodMockFactory.populate method and the closely related method halotools.empirical_models.HodModelFactory.populate_mock. The populate_mock method is bound to a composite model instance and is called the first time a composite model is used to generate a mock. Calling the populate_mock method creates the HodMockFactory instance and binds it to the composite model. From then on, if you want to repopulate a new Universe with the same composite model, you should instead call the populate method bound to model.mock. The reason for this distinction is that calling populate_mock triggers a large number of relatively expensive pre-processing steps and self-consistency checks that need only be carried out once. See the Examples section below for an explicit demonstration.

In particular, if you are running an MCMC type analysis, you will choose your halo catalog and completeness cuts, and call populate_mock with the appropriate arguments. Thereafter, you can explore parameter space by changing the values stored in the param_dict dictionary attached to the model, and then calling the populate method bound to model.mock. Any changes to the param_dict of the model will automatically propagate into the behavior of the populate method.

Examples

Here we’ll use a pre-built model to demonstrate basic usage. The syntax shown below is the same for all composite models, whether they are pre-built by Halotools or built by you with HodModelFactory.

>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model_instance = PrebuiltHodModelFactory('zheng07')

Here we will use a fake simulation, but you can populate mocks using any instance of CachedHaloCatalog or UserSuppliedHaloCatalog.

>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> model_instance.populate_mock(halocat)

Your model_instance now has a mock attribute bound to it. You can call the populate method bound to the mock, which will repopulate the halo catalog with a new Monte Carlo realization of the model.

>>> model_instance.mock.populate()

If you want to change the behavior of your model, just change the values stored in the param_dict. Differences in the parameter values will change the behavior of the mock-population.

>>> model_instance.param_dict['logMmin'] = 12.1
>>> model_instance.mock.populate()

restore_init_param_dict()

Reset all values of the current param_dict to the values the class was instantiated with.

Primary behaviors are reset as well, as this is how the inherited behaviors get bound to the values in param_dict.

set_calling_sequence()

Method used to determine the sequence of function calls that will be made during mock population. The methods of each component model will be called one after the other; the order in which the component models are called upon is determined by _model_feature_calling_sequence. When each component model is called, the sequence of methods that are called for that component is determined by the _mock_generation_calling_sequence attribute bound to the component model instance. See The model_feature_calling_sequence mechanism for further details.
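The two-level ordering described above can be sketched as a nested loop: the outer loop walks the features in calling-sequence order, and the inner loop walks each component's own method list. The helper flatten_calling_sequence is hypothetical, SimpleNamespace objects stand in for component models, and the method names are illustrative.

```python
from types import SimpleNamespace


def flatten_calling_sequence(model_dictionary, feature_sequence):
    """List (feature name, method name) pairs in the order they would be called.

    Hypothetical helper for illustration only.
    """
    calls = []
    for feature_name in feature_sequence:
        component = model_dictionary[feature_name]
        for method_name in component._mock_generation_calling_sequence:
            calls.append((feature_name, method_name))
    return calls


# Stand-ins for component models; the method names are illustrative.
toy_model_dictionary = {
    "centrals_profile": SimpleNamespace(
        _mock_generation_calling_sequence=["assign_phase_space"]),
    "centrals_occupation": SimpleNamespace(
        _mock_generation_calling_sequence=["mc_occupation"]),
}
calls = flatten_calling_sequence(
    toy_model_dictionary, ["centrals_occupation", "centrals_profile"])
```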

set_gal_types()

Private method binding the gal_types list attribute. If there are both centrals and satellites, method ensures that centrals will always be built first, out of consideration for satellite model components with explicit dependence on the central population.

set_inherited_methods()

Each component model should have a _mock_generation_calling_sequence attribute that provides the sequence of method names to call during mock population. Additionally, each component should have a _methods_to_inherit attribute that determines which methods will be inherited by the composite model. The _mock_generation_calling_sequence list should be a subset of _methods_to_inherit. If any of the above conditions fail, no exception will be raised during the construction of the composite model. Instead, an empty list will be forcibly attached to each component model for which these lists may have been missing. Also, for each component model, if there are any elements of _mock_generation_calling_sequence that were missing from _methods_to_inherit, all such elements will be forcibly added to that component model’s _methods_to_inherit.

Finally, each component model should have an _attrs_to_inherit attribute that determines which attributes will be inherited by the composite model. If any component models did not implement the _attrs_to_inherit, an empty list is forcibly added to the component model.

After calling the set_inherited_methods method, it will therefore be entirely safe to run a for loop over each component model’s _methods_to_inherit and _attrs_to_inherit, even if these lists were forgotten or irrelevant to that particular component.
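The defaulting rules above can be sketched as follows. The helper ensure_inheritance_lists is hypothetical and not the Halotools implementation. It attaches an empty list for each missing attribute and then adds any method from _mock_generation_calling_sequence that is absent from _methods_to_inherit. A SimpleNamespace object stands in for a component model, and the method name is illustrative.

```python
from types import SimpleNamespace


def ensure_inheritance_lists(component):
    """Attach missing inheritance lists and reconcile them.

    Hypothetical helper for illustration only.
    """
    for attr in ("_mock_generation_calling_sequence",
                 "_methods_to_inherit",
                 "_attrs_to_inherit"):
        if not hasattr(component, attr):
            setattr(component, attr, [])  # no exception, just an empty list
    # Every method called during mock population must also be inherited.
    for method_name in component._mock_generation_calling_sequence:
        if method_name not in component._methods_to_inherit:
            component._methods_to_inherit.append(method_name)
    return component


# A component that only declared its calling sequence.
toy_component = ensure_inheritance_lists(
    SimpleNamespace(_mock_generation_calling_sequence=["mc_occupation"]))
```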

set_model_redshift()

set_primary_behaviors()

Creates names and behaviors for the primary methods of HodModelFactory that will be used by the outside world.

Notes

The new methods created here are given standardized names, for consistent communication with the rest of the package. This consistency is particularly important for mock-making, so that the HodMockFactory can always call the same functions regardless of the complexity of the model.

The behaviors of the methods created here are defined elsewhere; set_primary_behaviors just creates a symbolic link to those external behaviors.

set_warning_suppressions()

Method used to determine whether a warning should be issued if the build_init_param_dict method detects the presence of multiple appearances of the same parameter name.

If any of the component model instances have a _suppress_repeated_param_warning attribute that is set to the boolean True value, then no warning will be issued even if there are multiple appearances of the same parameter name. This allows the user to not be bothered with warning messages for cases where it is understood that there will be no conflicting behavior.

update_param_dict_decorator(component_model, func_name)

Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.

The methods bound to the composite model are decorated versions of the methods defined in the component models. The decoration is done with update_param_dict_decorator. For each function that gets bound to the composite model, this decorator searches the param_dict of the component_model associated with the function and updates all matching keys in that param_dict with the values from the param_dict of the composite. This way, all the user needs to do is make changes to the composite model param_dict. Then, when calling any method of the composite model, the changed values of the param_dict automatically propagate down to the component model before calling upon its behavior. This allows the composite_model to control the behavior of functions that it does not define.

Parameters:
component_model: obj

Instance of the component model in which the behavior of the function is defined.

func_name: string

Name of the method in the component model whose behavior is being decorated.

Returns:
decorated_func: function

Function object whose behavior is identical to the behavior of the function in the component model, except that the component model param_dict is first updated with any possible changes to corresponding parameters in the composite model param_dict.
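The propagation mechanism can be illustrated with a self-contained toy. The classes ToyComponent and ToyComposite below are hypothetical stand-ins, not Halotools classes, and the sketch is not the library's implementation. It only shows the general pattern of syncing matching keys from the composite param_dict into the component param_dict just before the wrapped method runs.

```python
import functools


class ToyComponent:
    """Stand-in for a component model with one parameter and one method."""

    def __init__(self):
        self.param_dict = {"logMmin": 12.0}

    def current_logMmin(self):
        return self.param_dict["logMmin"]


class ToyComposite:
    """Stand-in for a composite model that inherits a decorated method."""

    def __init__(self, component):
        self.param_dict = dict(component.param_dict)
        self.current_logMmin = self.update_param_dict_decorator(
            component, "current_logMmin")

    def update_param_dict_decorator(self, component_model, func_name):
        func = getattr(component_model, func_name)

        @functools.wraps(func)
        def decorated_func(*args, **kwargs):
            # Copy matching keys from the composite down to the component.
            for key in component_model.param_dict:
                if key in self.param_dict:
                    component_model.param_dict[key] = self.param_dict[key]
            return func(*args, **kwargs)

        return decorated_func


component = ToyComponent()
composite = ToyComposite(component)
composite.param_dict["logMmin"] = 12.5  # change only the composite
value = composite.current_logMmin()     # component sees the new value
```

The user only ever edits the composite dictionary; the component dictionary is brought up to date lazily, at call time.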