ModelFactory
- class halotools.empirical_models.ModelFactory(input_model_dictionary, **kwargs)
Bases: object
Abstract container class used to build any composite model of the galaxy-halo connection.
See SubhaloModelFactory for subhalo-based models, and HodModelFactory for HOD-style models.
- Parameters:
- input_model_dictionary : dict
Dictionary providing instructions for how to build the composite model from a set of components.
- galaxy_selection_func : function object, optional
Function object that imposes a cut on the mock galaxies. The function should take a length-k Astropy table as a single positional argument and return a length-k numpy boolean array that will be treated as a mask over the rows of the table. If not None, the mask defined by galaxy_selection_func will be applied to the galaxy_table after the table is generated by the populate_mock method. Default is None.
- halo_selection_func : function object, optional
Function object used to place a cut on the input table. If the halo_selection_func keyword argument is passed, the input to the function must be a single positional argument storing a length-N structured numpy array or Astropy table; the function output must be a length-N boolean array that will be used as a mask. Halos that are masked will be entirely neglected during mock population.
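As a minimal sketch of the calling convention described above, the function below takes a table-like object and returns a boolean array of the same length. The mass threshold and the toy structured array are hypothetical, and treating True as "keep this row" is an assumption of this sketch rather than something the text above states.

```python
import numpy as np

def halo_selection_func(table):
    """Return a length-N boolean mask over the rows of `table`."""
    # Hypothetical cut for illustration: flag halos above 1e12 in 'halo_mvir'.
    return table['halo_mvir'] > 1e12

# Toy stand-in for a halo table: a structured numpy array with one column.
halos = np.array([(5e11,), (2e12,), (8e12,)], dtype=[('halo_mvir', 'f8')])

mask = halo_selection_func(halos)   # boolean array, one entry per halo
selected = halos[mask]              # rows where the mask is True
```

A galaxy_selection_func would follow the same pattern, operating on the galaxy_table instead of the halo table.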
Methods Summary
- compute_average_galaxy_clustering([num_iterations, ...])
Method repeatedly populates a simulation with a mock galaxy catalog, computes the clustering signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of clustering measurements.
- compute_average_galaxy_matter_cross_clustering([num_iterations, ...])
Method repeatedly populates a simulation with a mock galaxy catalog, computes the galaxy-matter cross-correlation signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of repeated measurements.
- populate_mock(halocat[, Num_ptcl_requirement])
Method used to populate a simulation with a Monte Carlo realization of a model.
- update_param_dict_decorator(component_model, ...)
Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.
Methods Documentation
- compute_average_galaxy_clustering(num_iterations=5, summary_statistic='median', **kwargs)
Method repeatedly populates a simulation with a mock galaxy catalog, computes the clustering signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of clustering measurements.
The compute_average_galaxy_clustering method is simply a convenience function and is not intended for use in performance-critical applications such as MCMCs. In an MCMC, there is no need to repeatedly populate the same snapshot with the same set of model parameters; the primary purpose of this repetition is to smooth out numerical noise when making plots and doing exploratory work. If you wish to use the 3d correlation function in a performance-critical application, see Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d for a demonstration of how to call the tpcf function once, directly on the mock galaxy catalog.
- Parameters:
- num_iterations : int, optional
Number of Monte Carlo realizations to use to estimate the clustering signal. Default is 5.
- summary_statistic : string, optional
String specifying the method used to estimate the clustering signal from the collection of Monte Carlo realizations. Options are median and mean. Default is median.
- simname : string, optional
Nickname of the simulation into which mock galaxies will be populated. Currently supported simulations are Bolshoi (simname = bolshoi), Consuelo (simname = consuelo), MultiDark (simname = multidark), and Bolshoi-Planck (simname = bolplanck). Default is set in sim_defaults.
- halo_finder : string, optional
Nickname of the halo-finder of the halocat into which mock galaxies will be populated, e.g., rockstar or bdm. Default is set in sim_defaults.
- redshift : float, optional
Redshift of the desired halocat into which mock galaxies will be populated. Default is set in sim_defaults.
- variable_galaxy_mask : scalar, optional
Any value used to construct a mask to select a sub-population of mock galaxies. See examples below.
- mask_function : function, optional
Function object returning a masking array when operating on the galaxy_table. More flexible than the simpler variable_galaxy_mask option because mask_function allows for the possibility of multiple simultaneous cuts. See examples below.
- include_crosscorr : bool, optional
Only for simultaneous use with a variable_galaxy_mask-determined mask. If include_crosscorr is set to False (the default option), the method will return the auto-correlation function of the subsample of galaxies determined by the input variable_galaxy_mask. If include_crosscorr is True, the method will return the auto-correlation of the subsample, the cross-correlation of the subsample and the complementary subsample, and the auto-correlation of the complementary subsample, in that order. See examples below.
- rbins : array, optional
Bins in which the correlation function will be calculated. Default is set in the model_defaults module.
- Returns:
- rbin_centers : array
Midpoint of the bins used in the correlation function calculation.
- correlation_func : array
If not using any mask (the default option), the method returns the correlation function of the full mock galaxy catalog.
If using a mask, and if include_crosscorr is False (the default option), the method returns the correlation function of the subsample of galaxies determined by the input mask.
If using a mask, and if include_crosscorr is True, the method will return the auto-correlation of the subsample, the cross-correlation of the subsample and the complementary subsample, and the auto-correlation of the complementary subsample, in that order. See the example below.
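The role of summary_statistic can be illustrated with a small sketch: given one clustering measurement per Monte Carlo realization, the statistic is taken across realizations, leaving one value per radial bin. The numbers and the summarize helper below are hypothetical and are not part of Halotools; the sketch only assumes that the collapse happens bin by bin.

```python
import numpy as np

# Hypothetical measurements: 3 Monte Carlo realizations x 4 radial bins.
realizations = np.array([
    [10.0, 4.0, 1.5, 0.4],
    [12.0, 5.0, 1.0, 0.5],
    [17.0, 3.0, 2.0, 0.6],
])

def summarize(measurements, summary_statistic='median'):
    """Collapse the realization axis, leaving one value per radial bin."""
    if summary_statistic == 'median':
        return np.median(measurements, axis=0)
    if summary_statistic == 'mean':
        return np.mean(measurements, axis=0)
    raise ValueError("summary_statistic must be 'median' or 'mean'")

median_clustering = summarize(realizations)
mean_clustering = summarize(realizations, 'mean')
```

In the first bin the median is less sensitive than the mean to the one outlying realization, which is one reason a median is a reasonable default for noisy exploratory plots.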
Notes
The compute_average_galaxy_clustering method bound to mock instances is just a convenience wrapper around the tpcf function. If you wish for greater control over how your galaxy clustering signal is estimated, see the tpcf documentation.
Note that there can be no guarantees that the compute_average_galaxy_clustering method bound to your model will terminate in a reasonable amount of time. For example, if you use a subhalo-based model that populates every subhalo in the catalog with a mock galaxy, then calling compute_average_galaxy_clustering on this model will attempt to compute a correlation function on hundreds of millions of points. In such cases, you are better off calling the populate_mock method and then calling tpcf after placing a cut on the galaxy_table, as demonstrated in Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d.
Examples
The simplest use-case of the compute_average_galaxy_clustering function is just to call the function with no arguments. This will generate a sequence of Monte Carlo realizations of your model into the default halocat, calculate the two-point correlation function of all galaxies in your mock, and return the median clustering strength in each radial bin:
>>> model = Leauthaud11()
>>> r, clustering = model.compute_average_galaxy_clustering()
To control which simulation is used, use the same syntax you use to load a CachedHaloCatalog into memory from your cache directory:
>>> r, clustering = model.compute_average_galaxy_clustering(simname='multidark', redshift=1)
You can control the number of mock catalogs that are generated via:
>>> r, clustering = model.compute_average_galaxy_clustering(num_iterations=10)
You may wish to focus on the clustering signal for a specific subpopulation. To do this, you have two options. First, you can use the variable_galaxy_mask mechanism:
>>> r, clustering = model.compute_average_galaxy_clustering(gal_type='centrals')
With the variable_galaxy_mask mechanism, you are free to use any column of your galaxy_table as a keyword argument. If you couple this function call with the include_crosscorr keyword argument, the function will also return all auto- and cross-correlations of the subset and its complement:
>>> r, cen_cen, cen_sat, sat_sat = model.compute_average_galaxy_clustering(gal_type='centrals', include_crosscorr=True)
Second, you can pass the mask_function keyword argument described in the Parameters section above, which allows several cuts to be applied at once.
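As a sketch of the mask_function option, the function below combines two cuts into a single boolean array. The structured array is a toy stand-in for model.mock.galaxy_table, the mass threshold is hypothetical, and the presence of a halo_mvir column in the galaxy table is an assumption of this example.

```python
import numpy as np

def mask_function(galaxy_table):
    """Select centrals in halos above a mass threshold (two simultaneous cuts)."""
    is_central = galaxy_table['gal_type'] == 'centrals'
    is_massive = galaxy_table['halo_mvir'] > 1e12  # hypothetical threshold
    return is_central & is_massive

# Toy stand-in for model.mock.galaxy_table.
galaxy_table = np.array(
    [('centrals', 5e11), ('centrals', 3e12), ('satellites', 3e12)],
    dtype=[('gal_type', 'U10'), ('halo_mvir', 'f8')],
)
mask = mask_function(galaxy_table)

# With a model in hand, the function would be passed by keyword, e.g.
# r, clustering = model.compute_average_galaxy_clustering(mask_function=mask_function)
```

Only the second row passes both cuts, which a single variable_galaxy_mask keyword could not express.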
- compute_average_galaxy_matter_cross_clustering(num_iterations=5, summary_statistic='median', **kwargs)
Method repeatedly populates a simulation with a mock galaxy catalog, computes the galaxy-matter cross-correlation signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of repeated measurements.
The compute_average_galaxy_matter_cross_clustering method is simply a convenience function and is not intended for use in performance-critical applications such as MCMCs. In an MCMC, there is no need to repeatedly populate the same snapshot with the same set of model parameters; the primary purpose of this repetition is to smooth out numerical noise when making plots and doing exploratory work. If you wish to use the 3d cross-correlation function in a performance-critical application, see Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d for a demonstration of how to call the tpcf function once, directly on the mock galaxy catalog, and then refer to the tpcf docstring for how to use the cross-correlation feature.
- Parameters:
- num_iterations : int, optional
Number of Monte Carlo realizations to use to estimate the clustering signal. Default is 5.
- summary_statistic : string, optional
String specifying the method used to estimate the clustering signal from the collection of Monte Carlo realizations. Options are median and mean. Default is median.
- simname : string, optional
Nickname of the simulation into which mock galaxies will be populated. Currently supported simulations are Bolshoi (simname = bolshoi), Consuelo (simname = consuelo), MultiDark (simname = multidark), and Bolshoi-Planck (simname = bolplanck). Default is set in sim_defaults.
- halo_finder : string, optional
Nickname of the halo-finder of the halocat into which mock galaxies will be populated, e.g., rockstar or bdm. Default is set in sim_defaults.
- redshift : float, optional
Redshift of the desired halocat into which mock galaxies will be populated. Default is set in sim_defaults.
- variable_galaxy_mask : scalar, optional
Any value used to construct a mask to select a sub-population of mock galaxies. See examples below.
- mask_function : function, optional
Function object returning a masking array when operating on the galaxy_table. More flexible than the simpler variable_galaxy_mask option because mask_function allows for the possibility of multiple simultaneous cuts. See examples below.
- include_complement : bool, optional
Only for simultaneous use with a variable_galaxy_mask-determined mask. If include_complement is set to False (the default option), the method will return the cross-correlation function between a random downsampling of dark matter particles and the subsample of galaxies determined by the input variable_galaxy_mask. If include_complement is True, the method will also return the cross-correlation between the dark matter particles and the complementary subsample. See examples below.
- rbins : array, optional
Bins in which the correlation function will be calculated. Default is set in the model_defaults module.
- Returns:
- rbin_centers : array
Midpoint of the bins used in the correlation function calculation.
- correlation_func : array
If not using any mask (the default option), the method returns the cross-correlation function between the dark matter particles and the full mock galaxy catalog.
If using a mask, and if include_complement is False (the default option), the method returns the cross-correlation function between the dark matter particles and the subsample of galaxies determined by the input mask.
If using a mask, and if include_complement is True, the method will also return the cross-correlation between the dark matter particles and the complementary subsample. See the example below.
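For orientation, the relationship between rbins and rbin_centers can be sketched as follows. This assumes the midpoint is the arithmetic mean of adjacent bin edges, which the text above does not state explicitly; the bin edges and the bin_midpoints helper are hypothetical.

```python
def bin_midpoints(rbins):
    """Arithmetic midpoint of each pair of adjacent bin edges."""
    return [(lo + hi) / 2.0 for lo, hi in zip(rbins[:-1], rbins[1:])]

rbins = [0.1, 1.0, 10.0]             # hypothetical bin edges
rbin_centers = bin_midpoints(rbins)  # one center per bin, so one fewer than the edges
```

Whatever the exact definition, N+1 bin edges yield N centers, so the returned rbin_centers and correlation_func arrays have matching lengths.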
Notes
The compute_average_galaxy_matter_cross_clustering method bound to mock instances is just a convenience wrapper around the tpcf function. If you wish for greater control over how your galaxy clustering signal is estimated, see the tpcf documentation.
Note that there can be no guarantees that the compute_average_galaxy_matter_cross_clustering method bound to your model will terminate in a reasonable amount of time. For example, if you use a subhalo-based model that populates every subhalo in the catalog with a mock galaxy, then calling compute_average_galaxy_matter_cross_clustering on this model will attempt to compute a correlation function on hundreds of millions of points. In such cases, you are better off calling the populate_mock method and then calling tpcf after placing a cut on the galaxy_table, as demonstrated in Galaxy Catalog Analysis Example: Galaxy-galaxy lensing. The only difference between this use-case and the one demonstrated in the tutorial is that here you will use tpcf to calculate the cross-correlation between dark matter particles and galaxies, rather than calling the mean_delta_sigma function.
Examples
The simplest use-case of the compute_average_galaxy_matter_cross_clustering function is just to call the function with no arguments. This will generate a sequence of Monte Carlo realizations of your model into the default halocat, calculate the cross-correlation function between dark matter and all galaxies in your mock, and return the median clustering strength in each radial bin:
>>> model = Leauthaud11()
>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering()
To control which simulation is used, use the same syntax you use to load a CachedHaloCatalog into memory from your cache directory:
>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(simname='multidark', redshift=1)
You can control the number of mock catalogs that are generated via:
>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(num_iterations=10)
You may wish to focus on the clustering signal for a specific subpopulation. To do this, you have two options. First, you can use the variable_galaxy_mask mechanism:
>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(gal_type='centrals')
With the variable_galaxy_mask mechanism, you are free to use any column of your galaxy_table as a keyword argument. If you couple this function call with the include_complement keyword argument, the function will also return the correlation function of the complementary subset:
>>> r, cen_clustering, sat_clustering = model.compute_average_galaxy_matter_cross_clustering(gal_type='centrals', include_complement=True)
Second, you can pass the mask_function keyword argument described in the Parameters section above, which allows several cuts to be applied at once.
- populate_mock(halocat, Num_ptcl_requirement=300, **kwargs)
Method used to populate a simulation with a Monte Carlo realization of a model.
After calling this method, the model instance will have a new mock attribute. You can then access the galaxy population via model.mock.galaxy_table, an Astropy Table.
For documentation specific to the populate_mock method of subhalo-based models, see halotools.empirical_models.SubhaloModelFactory.populate_mock; for HOD-style models see halotools.empirical_models.HodModelFactory.populate_mock.
See the Tutorial on the mock-making algorithms section of the documentation for an in-depth description of the Halotools source-code implementation of mock galaxy population.
- Parameters:
- halocat : object
Either an instance of CachedHaloCatalog or UserSuppliedHaloCatalog.
- Num_ptcl_requirement : int, optional
Requirement on the number of dark matter particles in the halo. The column defined by the halo_mass_column_key string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default value is set in Num_ptcl_requirement (300 in the signature above). Currently only supported for instances of HodModelFactory.
- halo_mass_column_key : string, optional
This string must be a column of the input halo catalog. The column defined by this string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default is 'halo_mvir'. Currently only supported for instances of HodModelFactory.
- masking_function : function, optional
Function object used to place a mask on the halo table prior to calling the mock generating functions. The function should accept a single positional argument storing a table and return a boolean numpy array that will be used as a fancy indexing mask. All masked halos will be ignored during mock population. Default is None.
- enforce_PBC : bool, optional
If set to True, after galaxy positions are assigned, the model_helpers.enforce_periodicity_of_box function will re-map satellite galaxies whose positions spilled over the edge of the periodic box. Default is True. This variable should only ever be set to False when using the masking_function to populate a specific spatial subvolume, as in that case PBCs no longer apply. Currently only supported for instances of HodModelFactory.
- seed : int, optional
Random number seed used in the Monte Carlo realization. Default is None, which will produce stochastic results.
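The particle-number cut described for Num_ptcl_requirement amounts to a simple mass threshold. The sketch below works through the arithmetic with a hypothetical particle mass and a toy array of halo masses; it is not the Halotools implementation, only an illustration of the inequality quoted above.

```python
import numpy as np

Num_ptcl_requirement = 300
particle_mass = 1.0e8          # hypothetical particle mass, for illustration only
halo_mvir = np.array([1.0e10, 3.0e10, 5.0e10, 2.0e12])

# Halos strictly below this mass are discarded before the mock is populated.
mass_threshold = Num_ptcl_requirement * particle_mass
keep = halo_mvir >= mass_threshold
resolved_halos = halo_mvir[keep]
```

Because the documented cut uses a strict less-than, a halo sitting exactly at the threshold is retained in this sketch.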
Notes
Note the difference between the halotools.empirical_models.MockFactory.populate method and the closely related method halotools.empirical_models.ModelFactory.populate_mock. The populate_mock method is bound to a composite model instance and is called the first time a composite model is used to generate a mock. Calling the populate_mock method creates the MockFactory instance and binds it to the composite model. From then on, if you want to repopulate a new Universe with the same composite model, you should instead call the populate method bound to model.mock. The reason for this distinction is that calling populate_mock triggers a large number of relatively expensive pre-processing steps and self-consistency checks that need only be carried out once. See the Examples section below for an explicit demonstration.
In particular, if you are running an MCMC-type analysis, you will choose your halo catalog and completeness cuts, and call populate_mock with the appropriate arguments. Thereafter, you can explore parameter space by changing the values stored in the param_dict dictionary attached to the model, and then calling the populate method bound to model.mock. Any changes to the param_dict of the model will automatically propagate into the behavior of the populate method.
Examples
We’ll use a pre-built HOD-style model to demonstrate basic usage. The same syntax applies to subhalo-based models.
>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model_instance = PrebuiltHodModelFactory('zheng07')
Here we will use a fake simulation, but you can populate mocks using any instance of CachedHaloCatalog or UserSuppliedHaloCatalog.
>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> model_instance.populate_mock(halocat)
Your model_instance now has a mock attribute bound to it. You can call the populate method bound to the mock, which will repopulate the halo catalog with a new Monte Carlo realization of the model.
>>> model_instance.mock.populate()
If you want to change the behavior of your model, just change the values stored in the param_dict. Differences in the parameter values will change the behavior of the mock-population.
>>> model_instance.param_dict['logMmin'] = 12.1
>>> model_instance.mock.populate()
- update_param_dict_decorator(component_model, func_name)
Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.
The methods bound to the composite model are decorated versions of the methods defined in the component models. The decoration is done with update_param_dict_decorator. For each function that gets bound to the composite model, this decorator searches the param_dict of the component_model associated with the function and updates all matching keys in that param_dict with the values in the param_dict of the composite. This way, all the user needs to do is make changes to the composite model param_dict. Then, when calling any method of the composite model, the changed values of the param_dict automatically propagate down to the component model before calling upon its behavior. This allows the composite model to control the behavior of functions that it does not define.
- Parameters:
- component_model : obj
Instance of the component model in which the behavior of the function is defined.
- func_name : string
Name of the method in the component model whose behavior is being decorated.
- Returns:
- decorated_func : function
Function object whose behavior is identical to the behavior of the function in the component model, except that the component model param_dict is first updated with any possible changes to corresponding parameters in the composite model param_dict.
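The propagation mechanism described above can be sketched in a few lines of plain Python. The Component and Composite classes and the mean_occupation method are hypothetical stand-ins, not Halotools classes; the sketch only mirrors the documented behavior of copying matching keys from the composite param_dict into the component param_dict before the component method runs.

```python
from functools import wraps

class Component:
    """Hypothetical component model with its own param_dict."""
    def __init__(self):
        self.param_dict = {'logMmin': 12.0}

    def mean_occupation(self):
        return self.param_dict['logMmin']

class Composite:
    """Hypothetical composite model that owns the user-facing param_dict."""
    def __init__(self, component):
        self.param_dict = {'logMmin': 12.0, 'unrelated': 1.0}
        # Bind a decorated version of the component method to the composite.
        self.mean_occupation = self.update_param_dict_decorator(
            component, 'mean_occupation')

    def update_param_dict_decorator(self, component_model, func_name):
        func = getattr(component_model, func_name)

        @wraps(func)
        def decorated_func(*args, **kwargs):
            # Copy only the keys the component already knows about.
            for key in component_model.param_dict:
                if key in self.param_dict:
                    component_model.param_dict[key] = self.param_dict[key]
            return func(*args, **kwargs)
        return decorated_func

component = Component()
model = Composite(component)
model.param_dict['logMmin'] = 12.5   # the user edits only the composite
result = model.mean_occupation()     # the component sees the new value
```

Updating only matching keys keeps parameters that belong to other components out of this component's param_dict.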