HodModelFactory
- class halotools.empirical_models.HodModelFactory(**kwargs)
Bases: ModelFactory
Class used to build HOD-style models of the galaxy-halo connection.
See Tutorial on building an HOD-style model for an in-depth description of how to build HOD models, demonstrated by a sequence of increasingly complex examples. If you do not wish to build your own model but want to use one provided by Halotools, see PrebuiltHodModelFactory instead.
All HOD-style composite models can directly populate catalogs of dark matter halos. For an in-depth description of how Halotools implements this mock generation, see Tutorial on the algorithm for HOD-based mock-making.
The arguments passed to the HodModelFactory constructor determine the features of the model returned by the factory. This works in one of two ways, both of which are demonstrated in the Examples section below.
Building a new model from scratch.
You can build a model from scratch by passing in a sequence of model_features, each of which is an instance of a component model. The factory then composes these independently defined components into a composite model.
Building a new model from an existing model.
It is also possible to add features to, or swap features in, a previously built composite model instance, allowing you to create new models from existing ones. To do this, pass in a baseline_model_instance and any set of model_features. Any model_feature keyword that matches a feature name of the baseline_model_instance will replace that feature in the baseline_model_instance; all other model_features that you pass in will augment the baseline_model_instance with new behavior.
Regardless of which set of features you use to build your model, the returned object can be used to directly populate a halo catalog with mock galaxies using the populate_mock method, as shown in the examples below.
- Parameters:
- *model_features : sequence of keyword arguments, optional
Each keyword you use will be interpreted as the name of a feature in the composite model, e.g. ‘stellar_mass’ or ‘star_formation_rate’; the value bound to each keyword must be an instance of a component model governing the behavior of that feature. See the examples section below.
- baseline_model_instance : HodModelFactory instance, optional
If passed to the constructor, the model_dictionary bound to the baseline_model_instance will be treated as the baseline dictionary. Any additional keyword arguments passed to the constructor that appear in the baseline dictionary will be treated as model features that replace the corresponding component model in the baseline dictionary. Any model features passed to the constructor that do not appear in the baseline dictionary will be treated as new features that augment the baseline model with new behavior. See the examples section below.
- model_feature_calling_sequence : list, optional
Determines the order in which your component features will be called during mock population.
Some component models may depend explicitly on the value of another galaxy property being modeled. In such a case, you must pass a model_feature_calling_sequence list, ordered in the desired calling sequence.
A classic example is a stellar-to-halo-mass relation that depends explicitly on the star formation rate of the galaxy (active or quiescent). For this example, the model_feature_calling_sequence would be model_feature_calling_sequence = ['sfr_designation', 'stellar_mass', ...].
Default behavior is to assume that no model feature has explicit dependence upon any other, in which case the component models appearing in the model_features keyword arguments will be called in random order, giving primacy to the potential presence of stellar_mass and/or luminosity features.
- gal_type_list : list, optional
List of strings providing the names of the galaxy types in the composite model. This is only necessary if you have a gal_type in your model that is neither centrals nor satellites.
For example, if you have entirely separate models for red_satellites and blue_satellites, then your gal_type_list might be gal_type_list = ['centrals', 'red_satellites', 'blue_satellites']. Another possible example would be gal_type_list = ['centrals', 'satellites', 'orphans'].
- redshift : float, optional
Redshift of the model galaxies. Must be compatible with the redshift of all component models, and with the redshift of the snapshot of the simulation used to populate mocks. Default is None.
- halo_selection_func : function object, optional
Function object used to place a cut on the input table. If the halo_selection_func keyword argument is passed, the input to the function must be a single positional argument storing a length-N structured numpy array or Astropy table; the function output must be a length-N boolean array that will be used as a mask. Halos that are masked will be entirely neglected during mock population.
Examples
As described above, there are two different ways to build models using the HodModelFactory. Here we demonstrate each in turn.
In the first example we’ll show how to build a model from scratch using the model_features option. For illustration purposes, we’ll pick a particularly simple HOD-style model based on Zheng et al. (2007). As described in zheng07_model_dictionary, in this model there are two galaxy populations, ‘centrals’ and ‘satellites’; centrals sit at the center of dark matter halos, and satellites follow an NFW profile.
We’ll start with the features for the population of centrals:
>>> from halotools.empirical_models import TrivialPhaseSpace, Zheng07Cens
>>> cens_occ_model = Zheng07Cens()
>>> cens_prof_model = TrivialPhaseSpace()
Now for the satellites:
>>> from halotools.empirical_models import NFWPhaseSpace, Zheng07Sats
>>> sats_occ_model = Zheng07Sats()
>>> sats_prof_model = NFWPhaseSpace()
At this point we have our component model instances. The following call to the factory uses the model_features option described above:
>>> from halotools.empirical_models import HodModelFactory
>>> model_instance = HodModelFactory(centrals_occupation = cens_occ_model, centrals_profile = cens_prof_model, satellites_occupation = sats_occ_model, satellites_profile = sats_prof_model)
The feature names we have chosen are ‘centrals_occupation’ and ‘centrals_profile’, ‘satellites_occupation’ and ‘satellites_profile’. The first substring of each feature name informs the factory of the name of the galaxy population, the second substring identifies the type of feature; to each feature we have attached a component model instance.
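The naming convention can be illustrated with a short, self-contained sketch. The helper split_feature_name below is hypothetical and is not part of Halotools; it only mimics the rule that a feature name is a galaxy-population name followed by a feature type, and it assumes the population names are known in advance (as they would be from gal_type_list).

```python
def split_feature_name(feature_name, gal_types=("centrals", "satellites")):
    """Split 'centrals_occupation' into ('centrals', 'occupation').

    Hypothetical helper for illustration only; not a Halotools function.
    """
    # Try longer population names first so that a name such as
    # 'red_satellites' is matched before any shorter name that is a prefix of it.
    for gal_type in sorted(gal_types, key=len, reverse=True):
        prefix = gal_type + "_"
        if feature_name.startswith(prefix) and len(feature_name) > len(prefix):
            return gal_type, feature_name[len(prefix):]
    raise ValueError(f"{feature_name!r} does not start with a known galaxy type")


features = ["centrals_occupation", "centrals_profile",
            "satellites_occupation", "satellites_profile"]
parsed = [split_feature_name(name) for name in features]
```

With a custom population list, the same rule would apply to names such as 'red_satellites_profile', provided 'red_satellites' is among the known galaxy types.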
Whatever features your composite model has, you can use the populate_mock method to create a Monte Carlo realization of the model by populating any dark matter halo catalog in your cache directory:
>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = 'bolshoi', redshift = 0.5)
>>> model_instance.populate_mock(halocat)
Your model_instance now has a mock attribute storing a synthetic galaxy population. See the populate_mock docstring for details.
There are also convenience functions for estimating the clustering signal predicted by the model. For example, the following method repeatedly populates the Bolshoi simulation with galaxies, computes the 3-d galaxy clustering signal of each mock, computes the median clustering signal in each bin, and returns the result:
>>> r, xi = model_instance.compute_average_galaxy_clustering(num_iterations = 5, simname = 'bolshoi', redshift = 0.5)
In this next example we’ll show how to build a new model from an existing one using the baseline_model_instance option. We will start from the composite model built in the first example above. Here we’ll build a new model that is identical to the model_instance above, except that we use the AssembiasZheng07Cens class to introduce assembly bias into the occupation statistics of central galaxies.
>>> from halotools.empirical_models import AssembiasZheng07Cens
>>> new_cen_occ_model = AssembiasZheng07Cens()
>>> new_model_instance = HodModelFactory(baseline_model_instance = model_instance, centrals_occupation = new_cen_occ_model)
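The replace-or-augment behavior of baseline_model_instance can be pictured with plain dictionaries. This is a minimal sketch under stated assumptions: merge_model_features is a hypothetical helper, strings stand in for component model instances, and 'centrals_quenching' is an invented feature name used only to show augmentation.

```python
def merge_model_features(baseline_dictionary, **model_features):
    """Return a new feature dictionary built from a baseline plus overrides.

    Hypothetical helper for illustration only; not a Halotools function.
    """
    merged = dict(baseline_dictionary)  # copy, so the baseline is untouched
    replaced = sorted(k for k in model_features if k in baseline_dictionary)
    added = sorted(k for k in model_features if k not in baseline_dictionary)
    merged.update(model_features)
    return merged, replaced, added


# Strings stand in for component model instances.
baseline = {
    "centrals_occupation": "Zheng07Cens",
    "centrals_profile": "TrivialPhaseSpace",
    "satellites_occupation": "Zheng07Sats",
    "satellites_profile": "NFWPhaseSpace",
}
merged, replaced, added = merge_model_features(
    baseline,
    centrals_occupation="AssembiasZheng07Cens",  # matches a baseline feature: replaces it
    centrals_quenching="SomeQuenchingModel",     # hypothetical new feature: augments
)
```

Features that are not mentioned in the call, such as 'centrals_profile', carry over from the baseline unchanged.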
The new_model_instance and the original model_instance are identical in every respect except for the assembly bias of central galaxy occupation.
Methods Summary
- build_dtype_list(): Create the _galprop_dtypes_to_allocate attribute that determines the name and data type of every galaxy property that will appear in the mock galaxy_table.
- build_init_param_dict(): Create the param_dict attribute of the instance.
- build_lookup_tables(): Method to compute and load lookup tables for each of the phase space component models.
- build_model_feature_calling_sequence(supplementary_kwargs): Method uses the model_feature_calling_sequence passed to __init__, if available.
- build_new_haloprop_func_dict(): Method used to build a dictionary of functions, new_haloprop_func_dict, that create new halo catalog columns during a pre-processing phase of mock population.
- build_prim_sec_haloprop_list(): Method builds the _haloprop_list of strings.
- populate_mock(halocat, **kwargs): Method used to populate a simulation with a Monte Carlo realization of a model.
- restore_init_param_dict(): Reset all values of the current param_dict to the values the class was instantiated with.
- set_calling_sequence(): Method used to determine the sequence of function calls that will be made during mock population.
- set_gal_types(): Private method binding the gal_types list attribute.
- set_inherited_methods(): Each component model should have a _mock_generation_calling_sequence attribute that provides the sequence of method names to call during mock population.
- set_primary_behaviors(): Creates names and behaviors for the primary methods of HodModelFactory that will be used by the outside world.
- set_warning_suppressions(): Method used to determine whether a warning should be issued if the build_init_param_dict method detects the presence of multiple appearances of the same parameter name.
- update_param_dict_decorator(component_model, ...): Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.
Methods Documentation
- build_dtype_list()
Create the _galprop_dtypes_to_allocate attribute that determines the name and data type of every galaxy property that will appear in the mock galaxy_table.
This attribute is determined by examining the _galprop_dtypes_to_allocate attribute of every component model, and building a composite set of all these dtypes, enforcing self-consistency in cases where the same galaxy property appears more than once.
- build_init_param_dict()
Create the param_dict attribute of the instance. The param_dict is a dictionary storing the full collection of parameters controlling the behavior of the composite model.
The param_dict dictionary is determined by examining the param_dict attribute of every component model, and building up a composite dictionary from them. It is permissible for the same parameter name to appear more than once amongst a set of component models, but a warning will be issued in such cases.
Notes
In MCMC applications, the items of param_dict define the possible parameter set explored by the likelihood engine. Changing the values of the parameters in param_dict will propagate to the behavior of the component models when the relevant methods are called.
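The composition rule described above can be sketched in a few lines. This is an illustration under assumptions, not the Halotools implementation: build_composite_param_dict is a hypothetical helper, the parameter values are illustrative, and the choice that a later component overwrites an earlier one on a name collision is an assumption made only for this sketch.

```python
import warnings


def build_composite_param_dict(component_param_dicts, suppress_warning=False):
    """Merge component param_dicts, warning when a name appears twice.

    Hypothetical helper for illustration only; not a Halotools function.
    """
    composite = {}
    for param_dict in component_param_dicts:
        for name, value in param_dict.items():
            if name in composite and not suppress_warning:
                warnings.warn(f"Parameter name {name!r} appears more than once")
            # Assumption of this sketch: the later component wins on a collision.
            composite[name] = value
    return composite


# Illustrative parameter values for a centrals and a satellites component.
cens = {"logMmin": 12.02, "sigma_logM": 0.26}
sats = {"logM0": 11.38, "logM1": 13.31, "alpha": 1.06}
param_dict = build_composite_param_dict([cens, sats])
```

Because the two components here share no parameter names, no warning is issued and the composite dictionary simply holds all five parameters.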
- build_lookup_tables()
Method to compute and load lookup tables for each of the phase space component models.
- build_model_feature_calling_sequence(supplementary_kwargs)
Method uses the model_feature_calling_sequence passed to __init__, if available. If no such argument was passed, the default sequence is to first call occupation features, then call all other features in a random order, always calling features associated with a centrals population first (if present).
- Parameters:
- supplementary_kwargs : dict
Dictionary storing all keyword arguments passed to the __init__ constructor that were not part of the input model dictionary.
- Returns:
- model_feature_calling_sequence : list
List of strings specifying the order in which the component models will be called upon during mock population to execute their methods.
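The default ordering rule can be sketched as a stable sort. The helper default_calling_sequence is hypothetical and not a Halotools function. One simplification is made deliberately: where the documentation describes the order of non-occupation features as random, this sketch keeps their input order within each group so that the result is reproducible.

```python
def default_calling_sequence(feature_names):
    """Order features: occupation first, centrals before other populations.

    Hypothetical helper for illustration only; not a Halotools function.
    """
    def sort_key(name):
        is_occupation = name.endswith("_occupation")
        is_central = name.startswith("centrals_")
        # False sorts before True, so negate the flags that should come first.
        return (not is_occupation, not is_central)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(feature_names, key=sort_key)


sequence = default_calling_sequence([
    "satellites_profile", "satellites_occupation",
    "centrals_profile", "centrals_occupation",
])
```

Passing an explicit model_feature_calling_sequence to the constructor bypasses any default of this kind.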
- build_new_haloprop_func_dict()
Method used to build a dictionary of functions, new_haloprop_func_dict, that create new halo catalog columns during a pre-processing phase of mock population.
- build_prim_sec_haloprop_list()
Method builds the _haloprop_list of strings.
This list stores the names of all halo catalog columns that appear as either prim_haloprop_key or sec_haloprop_key of any component model. For all strings appearing in _haloprop_list, the mock galaxy_table will have a corresponding column storing the halo property inherited by the mock galaxy.
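As a rough picture of how such a list could be assembled, the sketch below gathers the unique prim_haloprop_key and sec_haloprop_key values from a set of stand-in components. ToyComponent and collect_haloprop_list are hypothetical names used only for illustration, and the column names are example values.

```python
class ToyComponent:
    """Stand-in for a component model; not a Halotools class."""

    def __init__(self, prim_haloprop_key=None, sec_haloprop_key=None):
        # Only set the attributes that this component actually uses.
        if prim_haloprop_key is not None:
            self.prim_haloprop_key = prim_haloprop_key
        if sec_haloprop_key is not None:
            self.sec_haloprop_key = sec_haloprop_key


def collect_haloprop_list(components):
    """Return the unique halo property names used by any component."""
    haloprops = []
    for component in components:
        for attr in ("prim_haloprop_key", "sec_haloprop_key"):
            key = getattr(component, attr, None)
            if key is not None and key not in haloprops:
                haloprops.append(key)
    return haloprops


components = [
    ToyComponent(prim_haloprop_key="halo_mvir"),
    ToyComponent(prim_haloprop_key="halo_mvir", sec_haloprop_key="halo_nfw_conc"),
    ToyComponent(),  # a component with no halo-property dependence
]
haloprop_list = collect_haloprop_list(components)
```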
- populate_mock(halocat, **kwargs)
Method used to populate a simulation with a Monte Carlo realization of a model.
After calling this method, the model instance will have a new mock attribute. You can then access the galaxy population via model.mock.galaxy_table, an Astropy Table.
See Tutorial on the algorithm for HOD-based mock-making for an in-depth tutorial on the mock-making algorithm.
- Parameters:
- halocat : object
Either an instance of CachedHaloCatalog or UserSuppliedHaloCatalog.
- Num_ptcl_requirement : int, optional
Requirement on the number of dark matter particles in the halo. The column defined by the halo_mass_column_key string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default value is set in Num_ptcl_requirement. Currently only supported for instances of HodModelFactory.
- halo_mass_column_key : string, optional
This string must be a column of the input halo catalog. The column defined by this string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default is ‘halo_mvir’. Currently only supported for instances of HodModelFactory.
- masking_function : function, optional
Function object used to place a mask on the halo table prior to calling the mock generating functions. The function should accept a single positional argument storing a table and return a boolean numpy array that will be used as a fancy indexing mask. All masked halos will be ignored during mock population. Default is None.
- enforce_PBC : bool, optional
If set to True, after galaxy positions are assigned, model_helpers.enforce_periodicity_of_box will re-map satellite galaxies whose positions spilled over the edge of the periodic box. Default is True. This variable should only ever be set to False when using the masking_function to populate a specific spatial subvolume, as in that case PBCs no longer apply. Currently only supported for instances of HodModelFactory.
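The particle-number cut described for Num_ptcl_requirement and halo_mass_column_key amounts to a simple threshold on halo mass. The sketch below works through the arithmetic with plain Python lists to stay dependency-free; in practice the mask would be a boolean numpy array. The function name and all numerical values are illustrative assumptions, not Halotools defaults.

```python
def particle_number_mask(halo_masses, particle_mass, num_ptcl_requirement):
    """Return True for halos that survive the particle-number cut."""
    threshold = num_ptcl_requirement * particle_mass
    # Halos with mass below the threshold are thrown out,
    # so a halo is kept when its mass is at or above the threshold.
    return [mass >= threshold for mass in halo_masses]


# Illustrative values only, not Halotools defaults.
particle_mass = 1.0e8
num_ptcl_requirement = 300
halo_mvir = [1.0e10, 3.0e10, 5.0e10, 2.0e12]

mask = particle_number_mask(halo_mvir, particle_mass, num_ptcl_requirement)
surviving = [m for m, keep in zip(halo_mvir, mask) if keep]
```

Here the threshold is 300 times the particle mass, so only the first halo in the list falls below it and is dropped.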
Notes
Note the difference between the halotools.empirical_models.HodMockFactory.populate method and the closely related method halotools.empirical_models.HodModelFactory.populate_mock. The populate_mock method is bound to a composite model instance and is called the first time a composite model is used to generate a mock. Calling the populate_mock method creates the HodMockFactory instance and binds it to the composite model. From then on, if you want to repopulate a new Universe with the same composite model, you should instead call the populate method bound to model.mock. The reason for this distinction is that calling populate_mock triggers a large number of relatively expensive pre-processing steps and self-consistency checks that need only be carried out once. See the Examples section below for an explicit demonstration.
In particular, if you are running an MCMC-type analysis, you will choose your halo catalog and completeness cuts, and call populate_mock with the appropriate arguments. Thereafter, you can explore parameter space by changing the values stored in the param_dict dictionary attached to the model, and then calling the populate method bound to model.mock. Any changes to the param_dict of the model will automatically propagate into the behavior of the populate method.
Examples
Here we’ll use a pre-built model to demonstrate basic usage. The syntax shown below is the same for all composite models, whether they are pre-built by Halotools or built by you with HodModelFactory.
>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model_instance = PrebuiltHodModelFactory('zheng07')
Here we will use a fake simulation, but you can populate mocks using any instance of CachedHaloCatalog or UserSuppliedHaloCatalog.
>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> model_instance.populate_mock(halocat)
Your model_instance now has a mock attribute bound to it. You can call the populate method bound to the mock, which will repopulate the halo catalog with a new Monte Carlo realization of the model.
>>> model_instance.mock.populate()
If you want to change the behavior of your model, just change the values stored in the param_dict. Differences in the parameter values will change the behavior of the mock population.
>>> model_instance.param_dict['logMmin'] = 12.1
>>> model_instance.mock.populate()
- restore_init_param_dict()
Reset all values of the current param_dict to the values the class was instantiated with.
Primary behaviors are reset as well, as this is how the inherited behaviors get bound to the values in param_dict.
- set_calling_sequence()
Method used to determine the sequence of function calls that will be made during mock population. The methods of each component model will be called one after the other; the order in which the component models are called upon is determined by _model_feature_calling_sequence. When each component model is called, the sequence of methods that are called for that component is determined by the _mock_generation_calling_sequence attribute bound to the component model instance. See The model_feature_calling_sequence mechanism for further details.
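The two-level ordering can be pictured as a nested loop: the outer loop walks the features in calling order, and the inner loop walks each component's own method list. This is a sketch under assumptions; the toy classes and flatten_calling_sequence are hypothetical, and the method names are used only as stand-ins.

```python
class ToyOccupation:
    """Stand-in component model; not a Halotools class."""
    _mock_generation_calling_sequence = ["mc_occupation"]


class ToyProfile:
    """Stand-in component model; not a Halotools class."""
    _mock_generation_calling_sequence = ["assign_phase_space"]


def flatten_calling_sequence(model_dictionary, feature_sequence):
    """List (feature, method name) pairs in the order they would be called."""
    calls = []
    for feature in feature_sequence:  # outer loop: component models
        component = model_dictionary[feature]
        # inner loop: the methods of this component, in its declared order
        for method_name in component._mock_generation_calling_sequence:
            calls.append((feature, method_name))
    return calls


model_dictionary = {
    "centrals_occupation": ToyOccupation(),
    "centrals_profile": ToyProfile(),
}
calls = flatten_calling_sequence(
    model_dictionary, ["centrals_occupation", "centrals_profile"]
)
```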
- set_gal_types()
Private method binding the gal_types list attribute. If there are both centrals and satellites, the method ensures that centrals will always be built first, out of consideration for satellite model components with explicit dependence on the central population.
- set_inherited_methods()
Each component model should have a _mock_generation_calling_sequence attribute that provides the sequence of method names to call during mock population. Additionally, each component should have a _methods_to_inherit attribute that determines which methods will be inherited by the composite model. The _mock_generation_calling_sequence list should be a subset of _methods_to_inherit. If any of the above conditions fails, no exception will be raised during the construction of the composite model. Instead, an empty list will be forcibly attached to each component model for which these lists may have been missing. Also, for each component model, if there are any elements of _mock_generation_calling_sequence that were missing from _methods_to_inherit, all such elements will be forcibly added to that component model’s _methods_to_inherit.
Finally, each component model should have an _attrs_to_inherit attribute that determines which attributes will be inherited by the composite model. If any component model did not implement _attrs_to_inherit, an empty list is forcibly added to the component model.
After calling the set_inherited_methods method, it will therefore be entirely safe to run a for loop over each component model’s _methods_to_inherit and _attrs_to_inherit, even if these lists were forgotten or irrelevant to that particular component.
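The normalization described above can be sketched as follows. This is an illustration, not the Halotools source: normalize_inheritance_lists and ToyComponent are hypothetical names, and the sketch only shows how missing lists might be filled in and how the calling sequence is folded into _methods_to_inherit.

```python
def normalize_inheritance_lists(component):
    """Make the three bookkeeping lists exist and be mutually consistent."""
    for attr in ("_mock_generation_calling_sequence",
                 "_methods_to_inherit",
                 "_attrs_to_inherit"):
        if not hasattr(component, attr):
            setattr(component, attr, [])  # attach an empty list if missing

    # Every method in the calling sequence must also be inherited.
    for name in component._mock_generation_calling_sequence:
        if name not in component._methods_to_inherit:
            component._methods_to_inherit.append(name)
    return component


class ToyComponent:
    """Stand-in component model; not a Halotools class."""

    def __init__(self):
        self._mock_generation_calling_sequence = ["mc_occupation"]
        # _methods_to_inherit and _attrs_to_inherit were "forgotten"


component = normalize_inheritance_lists(ToyComponent())
```

After this step, looping over component._methods_to_inherit and component._attrs_to_inherit cannot raise an AttributeError, which is the guarantee the text describes.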
- set_primary_behaviors()
Creates names and behaviors for the primary methods of HodModelFactory that will be used by the outside world.
Notes
The new methods created here are given standardized names, for consistent communication with the rest of the package. This consistency is particularly important for mock-making, so that the HodMockFactory can always call the same functions regardless of the complexity of the model.
The behaviors of the methods created here are defined elsewhere; set_primary_behaviors just creates a symbolic link to those external behaviors.
- set_warning_suppressions()
Method used to determine whether a warning should be issued if the build_init_param_dict method detects the presence of multiple appearances of the same parameter name.
If any of the component model instances have a _suppress_repeated_param_warning attribute that is set to the boolean True value, then no warning will be issued even if there are multiple appearances of the same parameter name. This allows the user to not be bothered with warning messages for cases where it is understood that there will be no conflicting behavior.
- update_param_dict_decorator(component_model, func_name)
Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.
The methods bound to the composite model are decorated versions of the methods defined in the component models. The decoration is done with update_param_dict_decorator. For each function that gets bound to the composite model, the decorator searches the param_dict of the component_model associated with the function and updates all matching keys in that param_dict with the values in the param_dict of the composite. This way, all the user needs to do is make changes to the composite model param_dict. Then, when calling any method of the composite model, the changed values of the param_dict automatically propagate down to the component model before calling upon its behavior. This allows the composite_model to control the behavior of functions that it does not define.
- Parameters:
- component_model : obj
Instance of the component model in which the behavior of the function is defined.
- func_name : string
Name of the method in the component model whose behavior is being decorated.
- Returns:
- decorated_func : function
Function object whose behavior is identical to the behavior of the function in the component model, except that the component model param_dict is first updated with any possible changes to corresponding parameters in the composite model param_dict.
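The propagation mechanism can be illustrated with a small stand-alone sketch. It is a simplified picture under assumptions, not the Halotools source: make_param_syncing_method, ToyComponent, and ToyComposite are hypothetical names, and unlike the real method (which is bound to the composite model) the helper here takes the composite as an explicit argument.

```python
import functools


def make_param_syncing_method(composite, component, func_name):
    """Wrap component.func_name so it first syncs matching parameters.

    Hypothetical helper for illustration only; not a Halotools function.
    """
    func = getattr(component, func_name)

    @functools.wraps(func)
    def decorated_func(*args, **kwargs):
        # Copy only the keys the component already knows about.
        for key in component.param_dict:
            if key in composite.param_dict:
                component.param_dict[key] = composite.param_dict[key]
        return func(*args, **kwargs)

    return decorated_func


class ToyComponent:
    """Stand-in component model; not a Halotools class."""

    def __init__(self):
        self.param_dict = {"logMmin": 12.0}

    def threshold(self):
        return self.param_dict["logMmin"]


class ToyComposite:
    """Stand-in composite model; not a Halotools class."""

    def __init__(self, component):
        self.param_dict = dict(component.param_dict)
        self.threshold = make_param_syncing_method(self, component, "threshold")


component = ToyComponent()
composite = ToyComposite(component)
composite.param_dict["logMmin"] = 12.5  # the user edits only the composite
result = composite.threshold()          # the component sees the new value
```

The design point is that the user never touches the component's own param_dict; the wrapper refreshes it on every call, so the composite dictionary remains the single place where parameters are changed.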