ModelFactory

class halotools.empirical_models.ModelFactory(input_model_dictionary, **kwargs)

Bases: object

Abstract container class used to build any composite model of the galaxy-halo connection.

See SubhaloModelFactory for subhalo-based models, and HodModelFactory for HOD-style models.

Parameters:

input_model_dictionary : dict

Dictionary providing instructions for how to build the composite model from a set of components.

galaxy_selection_func : function object, optional

Function object that imposes a cut on the mock galaxies. Function should take a length-k Astropy table as a single positional argument, and return a length-k numpy boolean array that will be treated as a mask over the rows of the table. If not None, the mask defined by galaxy_selection_func will be applied to the galaxy_table after the table is generated by the populate_mock method. Default is None.

halo_selection_func : function object, optional

Function object used to place a cut on the input halo table. If the halo_selection_func keyword argument is passed, the function must accept a single positional argument storing a length-N structured numpy array or Astropy table, and must return a length-N boolean array that will be used as a mask. Halos excluded by the mask will be entirely neglected during mock population.
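The contract shared by galaxy_selection_func and halo_selection_func (one table in, one boolean per row out) can be illustrated without Halotools. The following is a minimal stdlib-only sketch, assuming a table is represented as a dict mapping column names to equal-length lists, a hypothetical stand-in for an Astropy Table or structured numpy array. The helper apply_selection, the thresholds, and the sample values are all illustrative assumptions.

```python
def apply_selection(table, selection_func):
    """Return a new table keeping only the rows where selection_func is True."""
    # All columns share one length, so read it from any column.
    n_rows = len(next(iter(table.values())))
    mask = selection_func(table)
    # Documented contract: exactly one boolean per row.
    if len(mask) != n_rows:
        raise ValueError("selection function must return one boolean per row")
    return {key: [value for value, keep in zip(column, mask) if keep]
            for key, column in table.items()}


def galaxy_selection_func(table):
    # Hypothetical cut: keep galaxies above a stellar-mass threshold.
    return [m > 1e10 for m in table["stellar_mass"]]


def halo_selection_func(table):
    # Hypothetical cut: keep halos above a virial-mass threshold.
    return [m > 1e12 for m in table["halo_mvir"]]


galaxies = {"stellar_mass": [5e9, 2e10, 3e11],
            "gal_type": ["satellites", "centrals", "centrals"]}
halos = {"halo_mvir": [5e11, 2e12, 1e14], "halo_id": [1, 2, 3]}

selected_galaxies = apply_selection(galaxies, galaxy_selection_func)
selected_halos = apply_selection(halos, halo_selection_func)
print(selected_galaxies["gal_type"])  # ['centrals', 'centrals']
print(selected_halos["halo_id"])      # [2, 3]
```

With real Halotools inputs, the same functions would operate on Astropy Table columns and would more naturally return a vectorized numpy comparison such as `table['halo_mvir'] > 1e12`.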

Methods Summary

compute_average_galaxy_clustering([…]) Method repeatedly populates a simulation with a mock galaxy catalog, computes the clustering signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of clustering measurements.
compute_average_galaxy_matter_cross_clustering([…]) Method repeatedly populates a simulation with a mock galaxy catalog, computes the galaxy-matter cross-correlation signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of repeated measurements.
populate_mock(halocat[, Num_ptcl_requirement]) Method used to populate a simulation with a Monte Carlo realization of a model.
update_param_dict_decorator(component_model, …) Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.

Methods Documentation

compute_average_galaxy_clustering(num_iterations=5, summary_statistic='median', **kwargs)

Method repeatedly populates a simulation with a mock galaxy catalog, computes the clustering signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of clustering measurements.

The compute_average_galaxy_clustering method is simply a convenience function and is not intended for use in performance-critical applications such as MCMCs. In an MCMC, there is no need to repeatedly populate the same snapshot with the same set of model parameters; the primary purpose of this repetition is to smooth out numerical noise when making plots and doing exploratory work. If you wish to use the 3d correlation function in a performance-critical application, see Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d for a demonstration of how to call the tpcf function once, directly on the mock galaxy catalog.
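The summary step of this repeat-and-average procedure can be sketched in a few lines. This is a minimal illustration, not the Halotools source: it assumes each realization's correlation function is already available as a list of values, one per radial bin, and reduces them bin by bin with the median or the mean, mirroring the summary_statistic options. The function name summarize_realizations and the sample numbers are hypothetical.

```python
import statistics


def summarize_realizations(xi_realizations, summary_statistic="median"):
    """Combine per-realization correlation functions bin by bin.

    xi_realizations: list of equal-length lists, one per Monte Carlo realization.
    """
    reducers = {"median": statistics.median, "mean": statistics.mean}
    if summary_statistic not in reducers:
        raise ValueError("summary_statistic must be 'median' or 'mean'")
    reduce = reducers[summary_statistic]
    # zip(*...) regroups the values so that each tuple holds one radial bin
    # across all realizations.
    return [reduce(bin_values) for bin_values in zip(*xi_realizations)]


# Three hypothetical realizations of xi(r), each measured in two radial bins.
realizations = [[1.0, 2.0], [3.0, 4.0], [2.0, 10.0]]
print(summarize_realizations(realizations, "median"))  # [2.0, 4.0]
print(summarize_realizations(realizations, "mean"))
```

In the second bin, the single noisy value 10.0 pulls the mean well above the median, which is one reason a median is a reasonable default for smoothing Monte Carlo noise.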

Parameters:

num_iterations : int, optional

Number of Monte Carlo realizations to use to estimate the clustering signal. Default is 5.

summary_statistic : string, optional

String specifying the method used to estimate the clustering signal from the collection of Monte Carlo realizations. Options are 'median' and 'mean'. Default is 'median'.

simname : string, optional

Nickname of the simulation into which mock galaxies will be populated. Currently supported simulations are Bolshoi (simname = bolshoi), Consuelo (simname = consuelo), MultiDark (simname = multidark), and Bolshoi-Planck (simname = bolplanck). Default is set in sim_defaults.

halo_finder : string, optional

Nickname of the halo-finder of the halocat into which mock galaxies will be populated, e.g., rockstar or bdm. Default is set in sim_defaults.

redshift : float, optional

Redshift of the desired halocat into which mock galaxies will be populated. Default is set in sim_defaults.

variable_galaxy_mask : scalar, optional

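Not a literal keyword: pass a keyword argument named after any column of the galaxy_table (e.g., gal_type='centrals'). Its value is used to construct a mask that selects a sub-population of mock galaxies. See examples below.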

mask_function : function, optional

Function object returning a masking array when operating on the galaxy_table. More flexible than the simpler variable_galaxy_mask option because mask_function allows for the possibility of multiple simultaneous cuts. See examples below.

include_crosscorr : bool, optional

Only for simultaneous use with a variable_galaxy_mask-determined mask. If include_crosscorr is set to False (the default option), method will return the auto-correlation function of the subsample of galaxies determined by the input variable_galaxy_mask. If include_crosscorr is True, method will return the auto-correlation of the subsample, the cross-correlation of the subsample and the complementary subsample, and the auto-correlation of the complementary subsample, in that order. See examples below.

rbins : array, optional

Bins in which the correlation function will be calculated. Default is set in model_defaults module.

Returns:

rbin_centers : array

Midpoint of the bins used in the correlation function calculation.

correlation_func : array

If not using any mask (the default option), method returns the correlation function of the full mock galaxy catalog.

If using a mask, and if include_crosscorr is False (the default option), method returns the correlation function of the subsample of galaxies determined by the input mask.

If using a mask, and if include_crosscorr is True, method will return the auto-correlation of the subsample, the cross-correlation of the subsample and the complementary subsample, and the auto-correlation of the complementary subsample, in that order. See the example below.
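The bookkeeping behind a variable_galaxy_mask-style call with include_crosscorr=True can be sketched as follows. This is a hypothetical illustration, assuming the mask selects the rows whose column value equals the keyword's value (as in the gal_type='centrals' examples); it only reproduces the split into subsample and complement and the documented order of the returned measurements, and does not compute any correlation function. The helper names are invented for the sketch.

```python
def split_by_column(table, column, value):
    """Return row indices of the subsample (column == value) and its complement."""
    subsample = [i for i, v in enumerate(table[column]) if v == value]
    complement = [i for i, v in enumerate(table[column]) if v != value]
    return subsample, complement


def clustering_requests(table, include_crosscorr=False, **column_value):
    """List the (sample1, sample2) index pairs to correlate, in the documented order."""
    if len(column_value) != 1:
        raise ValueError("pass exactly one galaxy_table column as a keyword")
    (column, value), = column_value.items()
    sub, comp = split_by_column(table, column, value)
    if not include_crosscorr:
        return [(sub, sub)]
    # Auto of subsample, cross of subsample with complement, auto of complement.
    return [(sub, sub), (sub, comp), (comp, comp)]


galaxy_table = {"gal_type": ["centrals", "satellites", "centrals", "satellites"]}
requests = clustering_requests(galaxy_table, include_crosscorr=True, gal_type="centrals")
for sample1, sample2 in requests:
    print(sample1, sample2)
# [0, 2] [0, 2]
# [0, 2] [1, 3]
# [1, 3] [1, 3]
```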

Notes

The compute_average_galaxy_clustering method bound to mock instances is just a convenience wrapper around the tpcf function. If you wish for greater control over how your galaxy clustering signal is estimated, see the tpcf documentation.

There is no guarantee that the compute_average_galaxy_clustering method bound to your model will terminate in a reasonable amount of time. For example, if you use a subhalo-based model that populates every subhalo in the catalog with a mock galaxy, then calling compute_average_galaxy_clustering on this model will attempt to compute a correlation function on hundreds of millions of points. In such cases, you are better off calling the populate_mock method and then calling tpcf after placing a cut on the galaxy_table, as demonstrated in Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d.
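For intuition about what the wrapped tpcf call estimates, the sketch below computes a 3d correlation function in a periodic box by brute-force pair counting with the natural estimator, xi(r) = DD(r)/RR(r) - 1, where RR is computed analytically from the shell volume for a uniform field. This is a toy O(N^2) illustration under those assumptions, not the tpcf implementation; its quadratic cost also shows why hundreds of millions of points are out of reach for this kind of calculation. All function names are hypothetical.

```python
import itertools
import math


def periodic_distance(p, q, box_size):
    """Minimum-image Euclidean distance between two points in a periodic cube."""
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b) % box_size
        d = min(d, box_size - d)  # take the shorter way around the box
        total += d * d
    return math.sqrt(total)


def natural_estimator_tpcf(positions, rbins, box_size):
    """Return (rbin_centers, xi) using DD/RR - 1 with analytic RR.

    Assumes rbins[-1] < box_size / 2 so the minimum-image distance is valid.
    """
    n = len(positions)
    n_bins = len(rbins) - 1
    dd = [0] * n_bins
    # Count each unique pair once: O(N^2) work.
    for p, q in itertools.combinations(positions, 2):
        r = periodic_distance(p, q, box_size)
        for i in range(n_bins):
            if rbins[i] <= r < rbins[i + 1]:
                dd[i] += 1
                break
    n_pairs = n * (n - 1) / 2
    xi = []
    for i in range(n_bins):
        shell_volume = 4.0 / 3.0 * math.pi * (rbins[i + 1] ** 3 - rbins[i] ** 3)
        rr = n_pairs * shell_volume / box_size ** 3  # expected uniform pairs
        xi.append(dd[i] / rr - 1.0)
    centers = [(lo + hi) / 2 for lo, hi in zip(rbins[:-1], rbins[1:])]
    return centers, xi


positions = [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)]
centers, xi = natural_estimator_tpcf(positions, [0.5, 1.5, 2.5], box_size=10.0)
print(centers)  # [1.0, 2.0]
print(xi)
```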

Examples

The simplest use-case of the compute_average_galaxy_clustering function is just to call the function with no arguments. This will generate a sequence of Monte Carlo realizations of your model into the default halocat, calculate the two-point correlation function of all galaxies in your mock, and return the median clustering strength in each radial bin:

>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model = PrebuiltHodModelFactory('leauthaud11')
>>> r, clustering = model.compute_average_galaxy_clustering()

To control which simulation is used, pass the same keyword arguments you would use to load a CachedHaloCatalog into memory from your cache directory:

>>> r, clustering = model.compute_average_galaxy_clustering(simname='multidark', redshift=1)

You can control the number of mock catalogs that are generated via:

>>> r, clustering = model.compute_average_galaxy_clustering(num_iterations=10)

You may wish to focus on the clustering signal for a specific subpopulation. To do this, you have two options. First, you can use the variable_galaxy_mask mechanism:

>>> r, clustering = model.compute_average_galaxy_clustering(gal_type='centrals')

With the variable_galaxy_mask mechanism, you are free to use any column of your galaxy_table as a keyword argument. If you couple this function call with the include_crosscorr keyword argument, the function will also return all auto- and cross-correlations of the subset and its complement:

>>> r, cen_cen, cen_sat, sat_sat = model.compute_average_galaxy_clustering(gal_type='centrals', include_crosscorr=True)

Your second option is to use the mask_function option. For example, suppose we wish to study the clustering of satellite galaxies residing in cluster-mass halos:

>>> def my_masking_function(table):
...     result = (table['halo_mvir'] > 1e14) & (table['gal_type'] == 'satellites')
...     return result
>>> r, cluster_sat_clustering = model.compute_average_galaxy_clustering(mask_function=my_masking_function)

compute_average_galaxy_matter_cross_clustering(num_iterations=5, summary_statistic='median', **kwargs)

Method repeatedly populates a simulation with a mock galaxy catalog, computes the galaxy-matter cross-correlation signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of repeated measurements.

The compute_average_galaxy_matter_cross_clustering method is simply a convenience function and is not intended for use in performance-critical applications such as MCMCs. In an MCMC, there is no need to repeatedly populate the same snapshot with the same set of model parameters; the primary purpose of this repetition is to smooth out numerical noise when making plots and doing exploratory work. If you wish to use the 3d cross-correlation function in a performance-critical application, see Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d for a demonstration of how to call the tpcf function once, directly on the mock galaxy catalog, and then refer to the tpcf docstring for how to use the cross-correlation feature.

Parameters:

num_iterations : int, optional

Number of Monte Carlo realizations to use to estimate the clustering signal. Default is 5.

summary_statistic : string, optional

String specifying the method used to estimate the clustering signal from the collection of Monte Carlo realizations. Options are 'median' and 'mean'. Default is 'median'.

simname : string, optional

Nickname of the simulation into which mock galaxies will be populated. Currently supported simulations are Bolshoi (simname = bolshoi), Consuelo (simname = consuelo), MultiDark (simname = multidark), and Bolshoi-Planck (simname = bolplanck). Default is set in sim_defaults.

halo_finder : string, optional

Nickname of the halo-finder of the halocat into which mock galaxies will be populated, e.g., rockstar or bdm. Default is set in sim_defaults.

redshift : float, optional

Redshift of the desired halocat into which mock galaxies will be populated. Default is set in sim_defaults.

variable_galaxy_mask : scalar, optional

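Not a literal keyword: pass a keyword argument named after any column of the galaxy_table (e.g., gal_type='centrals'). Its value is used to construct a mask that selects a sub-population of mock galaxies. See examples below.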

mask_function : function, optional

Function object returning a masking array when operating on the galaxy_table. More flexible than the simpler variable_galaxy_mask option because mask_function allows for the possibility of multiple simultaneous cuts. See examples below.

include_complement : bool, optional

Only for simultaneous use with a variable_galaxy_mask-determined mask. If include_complement is set to False (the default option), method will return the cross-correlation function between a random downsampling of dark matter particles and the subsample of galaxies determined by the input variable_galaxy_mask. If include_complement is True, method will also return the cross-correlation between the dark matter particles and the complementary subsample. See examples below.

rbins : array, optional

Bins in which the correlation function will be calculated. Default is set in model_defaults module.

Returns:

rbin_centers : array

Midpoint of the bins used in the correlation function calculation.

correlation_func : array

If not using any mask (the default option), method returns the galaxy-matter cross-correlation function of the full mock galaxy catalog.

If using a mask, and if include_complement is False (the default option), method returns the cross-correlation function between the dark matter particles and the subsample of galaxies determined by the input mask.

If using a mask, and if include_complement is True, method returns the cross-correlation of the dark matter particles with the subsample, followed by the cross-correlation of the dark matter particles with the complementary subsample. See the example below.

Notes

The compute_average_galaxy_matter_cross_clustering method bound to mock instances is just a convenience wrapper around the tpcf function. If you wish for greater control over how your galaxy clustering signal is estimated, see the tpcf documentation.

There is no guarantee that the compute_average_galaxy_matter_cross_clustering method bound to your model will terminate in a reasonable amount of time. For example, if you use a subhalo-based model that populates every subhalo in the catalog with a mock galaxy, then calling compute_average_galaxy_matter_cross_clustering on this model will attempt to compute a correlation function on hundreds of millions of points. In such cases, you are better off calling the populate_mock method and then calling tpcf after placing a cut on the galaxy_table, as demonstrated in Galaxy Catalog Analysis Example: Galaxy-galaxy lensing. The only difference from the tutorial is that here you would use tpcf to calculate the cross-correlation between dark matter particles and galaxies, rather than calling the delta_sigma function.
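As a hedged toy illustration of a galaxy-matter cross-correlation (not the Halotools implementation), the sketch below counts galaxy-particle pairs by brute force in a periodic box, optionally after randomly downsampling the particles, and normalizes by the pair count expected for uniformly distributed samples: xi_gm(r) = D_g D_m(r) / (N_g N_m V_shell / V_box) - 1. The function names and the num_ptcl and seed arguments are assumptions made for the sketch.

```python
import math
import random


def periodic_distance(p, q, box_size):
    """Minimum-image Euclidean distance between two points in a periodic cube."""
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b) % box_size
        d = min(d, box_size - d)
        total += d * d
    return math.sqrt(total)


def galaxy_matter_xi(galaxies, particles, rbins, box_size, num_ptcl=None, seed=None):
    """Toy galaxy-matter cross-correlation with analytic expected pair counts.

    If num_ptcl is given, a random subset of that many particles is used,
    mimicking a random downsampling of the dark matter particles.
    """
    if num_ptcl is not None:
        particles = random.Random(seed).sample(particles, num_ptcl)
    n_bins = len(rbins) - 1
    dd = [0] * n_bins
    for g in galaxies:
        for m in particles:
            r = periodic_distance(g, m, box_size)
            for i in range(n_bins):
                if rbins[i] <= r < rbins[i + 1]:
                    dd[i] += 1
                    break
    xi = []
    for i in range(n_bins):
        shell_volume = 4.0 / 3.0 * math.pi * (rbins[i + 1] ** 3 - rbins[i] ** 3)
        expected = len(galaxies) * len(particles) * shell_volume / box_size ** 3
        xi.append(dd[i] / expected - 1.0)
    return xi


galaxies = [(1.0, 1.0, 1.0)]
particles = [(2.0, 1.0, 1.0), (5.0, 5.0, 5.0), (8.0, 8.0, 8.0)]
print(galaxy_matter_xi(galaxies, particles, [0.5, 1.5], box_size=10.0))
print(galaxy_matter_xi(galaxies, particles, [0.5, 1.5], box_size=10.0, num_ptcl=2, seed=0))
```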

Examples

The simplest use-case of the compute_average_galaxy_matter_cross_clustering function is just to call the function with no arguments. This will generate a sequence of Monte Carlo realizations of your model into the default halocat, calculate the cross-correlation function between dark matter and all galaxies in your mock, and return the median clustering strength in each radial bin:

>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model = PrebuiltHodModelFactory('leauthaud11')
>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering()

To control which simulation is used, pass the same keyword arguments you would use to load a CachedHaloCatalog into memory from your cache directory:

>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(simname='multidark', redshift=1)

You can control the number of mock catalogs that are generated via:

>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(num_iterations=10)

You may wish to focus on the clustering signal for a specific subpopulation. To do this, you have two options. First, you can use the variable_galaxy_mask mechanism:

>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(gal_type='centrals')

With the variable_galaxy_mask mechanism, you are free to use any column of your galaxy_table as a keyword argument. If you couple this function call with the include_complement keyword argument, the function will also return the correlation function of the complementary subset.

>>> r, cen_clustering, sat_clustering = model.compute_average_galaxy_matter_cross_clustering(gal_type='centrals', include_complement=True)

Your second option is to use the mask_function option. For example, suppose we wish to study the galaxy-matter cross-correlation function of satellite galaxies residing in cluster-mass halos:

>>> def my_masking_function(table):
...     result = (table['halo_mvir'] > 1e14) & (table['gal_type'] == 'satellites')
...     return result
>>> r, cluster_sat_clustering = model.compute_average_galaxy_matter_cross_clustering(mask_function=my_masking_function)

populate_mock(halocat, Num_ptcl_requirement=300, **kwargs)

Method used to populate a simulation with a Monte Carlo realization of a model.

After calling this method, the model instance will have a new mock attribute. You can then access the galaxy population via model.mock.galaxy_table, an Astropy Table.

For documentation specific to the populate_mock method of subhalo-based models, see halotools.empirical_models.SubhaloModelFactory.populate_mock; for HOD-style models see halotools.empirical_models.HodModelFactory.populate_mock.

See the Tutorial on the mock-making algorithms section of the documentation for an in-depth description of the Halotools source-code implementation of mock galaxy population.

Parameters:

halocat : object

Either an instance of CachedHaloCatalog or UserSuppliedHaloCatalog.

Num_ptcl_requirement : int, optional

Requirement on the number of dark matter particles in the halo. The column defined by the halo_mass_column_key string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default value is set in Num_ptcl_requirement. Currently only supported for instances of HodModelFactory.

halo_mass_column_key : string, optional

This string must be the name of a column of the input halo catalog. The column defined by this string will have a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass will be thrown out immediately after reading the original halo catalog into memory. Default is 'halo_mvir'. Currently only supported for instances of HodModelFactory.

masking_function : function, optional

Function object used to place a mask on the halo table prior to calling the mock-generating functions. The function should accept a single positional argument storing a table and return a boolean numpy array that will be used as a fancy-indexing mask. Halos excluded by the mask will be ignored during mock population. Default is None.

enforce_PBC : bool, optional

If set to True, after galaxy positions are assigned, the model_helpers.enforce_periodicity_of_box function will re-map satellite galaxies whose positions spilled over the edge of the periodic box. Default is True. This variable should only ever be set to False when using the masking_function to populate a specific spatial subvolume, as in that case PBCs no longer apply. Currently only supported for instances of HodModelFactory.

seed : int, optional

Random number seed used in the Monte Carlo realization. Default is None, which will produce stochastic results.
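The pre-processing implied by Num_ptcl_requirement, halo_mass_column_key, masking_function and enforce_PBC can be sketched with plain Python lists. This is an assumption-laden illustration, not the Halotools code path: tables are dicts of columns, the helper names preprocess_halos and wrap_into_box are hypothetical, and the particle mass, box size and halo values are made up. Halos below Num_ptcl_requirement*particle_mass are dropped, halos excluded by an optional mask are dropped too, and positions past the box edge are wrapped back with a modulo.

```python
def preprocess_halos(halo_table, particle_mass, Num_ptcl_requirement=300,
                     halo_mass_column_key="halo_mvir", masking_function=None):
    """Return indices of halos that survive the mass cut and the optional mask."""
    # Halos below Num_ptcl_requirement * particle_mass are thrown out.
    mass_threshold = Num_ptcl_requirement * particle_mass
    keep = [m >= mass_threshold for m in halo_table[halo_mass_column_key]]
    if masking_function is not None:
        # Halos excluded by the user's mask are ignored as well.
        extra = masking_function(halo_table)
        keep = [k and e for k, e in zip(keep, extra)]
    return [i for i, k in enumerate(keep) if k]


def wrap_into_box(coords, box_size):
    """Map coordinates that spilled past the box edge back into [0, box_size)."""
    return [x % box_size for x in coords]


# Made-up halo catalog and simulation constants, for illustration only.
halo_table = {"halo_mvir": [1e10, 5e11, 2e12], "halo_x": [10.0, 200.0, 240.0]}
particle_mass = 1.35e8
box_size = 250.0

print(preprocess_halos(halo_table, particle_mass))  # [1, 2]
print(preprocess_halos(halo_table, particle_mass,
                       masking_function=lambda t: [x < 220.0 for x in t["halo_x"]]))  # [1]
print(wrap_into_box([-1.0, 251.0, 125.0], box_size))  # [249.0, 1.0, 125.0]
```

As the enforce_PBC entry notes, the wrapping step would be skipped when a masking function selects a spatial subvolume, since periodicity no longer applies there.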

Notes

Note the difference between the halotools.empirical_models.MockFactory.populate method and the closely related method halotools.empirical_models.ModelFactory.populate_mock. The populate_mock method is bound to a composite model instance and is called the first time a composite model is used to generate a mock. Calling the populate_mock method creates the MockFactory instance and binds it to the composite model. From then on, if you want to repopulate a new Universe with the same composite model, you should instead call the populate method bound to model.mock. The reason for this distinction is that calling populate_mock triggers a large number of relatively expensive pre-processing steps and self-consistency checks that need only be carried out once. See the Examples section below for an explicit demonstration.

In particular, if you are running an MCMC-type analysis, you will choose your halo catalog and completeness cuts, and call populate_mock with the appropriate arguments. Thereafter, you can explore parameter space by changing the values stored in the param_dict dictionary attached to the model, and then calling the populate method bound to model.mock. Any changes to the param_dict of the model will automatically propagate into the behavior of the populate method.
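The division of labor described above (expensive set-up once in populate_mock, then cheap repopulation through model.mock.populate, which reads param_dict at call time) can be illustrated with a toy pair of classes. ToyModel, ToyMock and their occupation rule are hypothetical stand-ins, not Halotools classes; only the calling pattern mirrors the description.

```python
import math
import random


class ToyMock:
    """Hypothetical stand-in for model.mock: cheap, repeatable repopulation."""

    def __init__(self, model, halo_masses):
        self.model = model
        self.halo_masses = halo_masses
        self.galaxy_halo_masses = []

    def populate(self, seed=None):
        rng = random.Random(seed)
        # param_dict is read at call time, so edits to model.param_dict
        # change the next realization without re-running any set-up.
        log_mmin = self.model.param_dict["logMmin"]
        # Toy occupation rule: a halo hosts one galaxy if log10(M) plus
        # Gaussian scatter of 0.1 dex exceeds logMmin.
        self.galaxy_halo_masses = [
            m for m in self.halo_masses
            if math.log10(m) + rng.gauss(0.0, 0.1) > log_mmin
        ]


class ToyModel:
    """Hypothetical stand-in for a composite model."""

    def __init__(self):
        self.param_dict = {"logMmin": 12.0}
        self.setup_calls = 0

    def populate_mock(self, halo_masses, seed=None):
        # Expensive one-time work (cuts, consistency checks) would go here.
        self.setup_calls += 1
        self.mock = ToyMock(self, halo_masses)
        self.mock.populate(seed=seed)


model = ToyModel()
halo_masses = [10.0 ** x for x in (11.0, 11.8, 12.2, 13.0, 14.0)]
model.populate_mock(halo_masses, seed=1)   # set-up runs once
n_before = len(model.mock.galaxy_halo_masses)

model.param_dict["logMmin"] = 13.5         # change a parameter...
model.mock.populate(seed=1)                # ...and repopulate cheaply
n_after = len(model.mock.galaxy_halo_masses)
print(model.setup_calls, n_before, n_after)
```

Reusing the same seed here makes the two realizations share their random draws, so raising logMmin can only remove galaxies; in an actual exploration you would normally leave the seed unset.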

Examples

We’ll use a pre-built HOD-style model to demonstrate basic usage. The same syntax applies to subhalo-based models.

>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model_instance = PrebuiltHodModelFactory('zheng07')

Here we will use a fake simulation, but you can populate mocks using any instance of CachedHaloCatalog or UserSuppliedHaloCatalog.

>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> model_instance.populate_mock(halocat)

Your model_instance now has a mock attribute bound to it. You can call the populate method bound to the mock, which will repopulate the halo catalog with a new Monte Carlo realization of the model.

>>> model_instance.mock.populate()

If you want to change the behavior of your model, just change the values stored in the param_dict. Differences in the parameter values will change the behavior of the mock-population.

>>> model_instance.param_dict['logMmin'] = 12.1
>>> model_instance.mock.populate()

update_param_dict_decorator(component_model, func_name)

Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.

The methods bound to the composite model are decorated versions of the methods defined in the component models; the decoration is done with update_param_dict_decorator. For each function bound to the composite model, the decorator searches the param_dict of the component_model associated with that function and updates every matching key with the corresponding value from the composite model param_dict. As a result, the user only needs to change the composite model param_dict: when any method of the composite model is called, the changed values automatically propagate down to the component model before its behavior is invoked. This allows the composite model to control the behavior of functions that it does not define.
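The propagation behavior described above can be sketched as a closure. This is a simplified illustration, not the Halotools source: because the documented decorator is a method of the composite model, its signature omits the composite, whereas this standalone sketch passes the composite explicitly. ToyOccupation and ToyComposite are hypothetical classes built only to exercise the decorator.

```python
import functools


def update_param_dict_decorator(composite_model, component_model, func_name):
    """Wrap component_model.func_name so it sees the composite param_dict."""
    func = getattr(component_model, func_name)

    @functools.wraps(func)
    def decorated_func(*args, **kwargs):
        # Copy every composite value whose key the component also uses.
        for key in component_model.param_dict:
            if key in composite_model.param_dict:
                component_model.param_dict[key] = composite_model.param_dict[key]
        return func(*args, **kwargs)

    return decorated_func


class ToyOccupation:
    """Hypothetical component model with its own param_dict."""

    def __init__(self):
        self.param_dict = {"logMmin": 12.0}

    def mean_occupation(self, log_mass):
        return 1.0 if log_mass > self.param_dict["logMmin"] else 0.0


class ToyComposite:
    """Hypothetical composite model that binds a decorated component method."""

    def __init__(self, component):
        self.param_dict = dict(component.param_dict)
        self.mean_occupation = update_param_dict_decorator(
            self, component, "mean_occupation")


component = ToyOccupation()
composite = ToyComposite(component)
print(composite.mean_occupation(12.5))   # 1.0
composite.param_dict["logMmin"] = 13.0   # edit only the composite
print(composite.mean_occupation(12.5))   # 0.0
print(component.param_dict["logMmin"])   # 13.0
```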

Parameters:

component_model : obj

Instance of the component model in which the behavior of the function is defined.

func_name : string

Name of the method in the component model whose behavior is being decorated.

Returns:

decorated_func : function

Function object whose behavior is identical to the behavior of the function in the component model, except that the component model param_dict is first updated with any possible changes to corresponding parameters in the composite model param_dict.