ModelFactory

class halotools.empirical_models.ModelFactory(input_model_dictionary, **kwargs)

Bases: object

Abstract container class used to build any composite model of the galaxy-halo connection.

See SubhaloModelFactory for subhalo-based models, and HodModelFactory for HOD-style models.

Parameters:

input_model_dictionary (dict): Dictionary providing instructions for how to build the composite model from a set of components.
galaxy_selection_func (function object, optional): Function object that imposes a cut on the mock galaxies. The function should take a length-k Astropy table as its single positional argument and return a length-k NumPy boolean array, which is treated as a mask over the rows of the table. If not None, the mask defined by galaxy_selection_func is applied to the galaxy_table after populate_mock generates it. Default is None.
halo_selection_func (function object, optional): Function object used to place a cut on the input halo table. The function must take a single positional argument storing a length-N structured NumPy array or Astropy table, and must return a length-N boolean array that is used as a mask. Halos excluded by the mask are entirely neglected during mock population.
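Both selection functions share one contract: a table goes in, a boolean array of the same length comes out. The sketch below illustrates that contract using a NumPy structured array as a stand-in for an Astropy galaxy_table; the column names stellar_mass and gal_type, the 1e10 threshold, and the function name massive_galaxy_selection are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical stand-in for a galaxy_table: a structured array with two columns.
galaxy_table = np.zeros(5, dtype=[("stellar_mass", "f8"), ("gal_type", "U10")])
galaxy_table["stellar_mass"] = [1e9, 5e10, 2e11, 3e10, 8e9]
galaxy_table["gal_type"] = ["centrals", "satellites", "centrals", "centrals", "satellites"]


def massive_galaxy_selection(table):
    """Hypothetical selection function: length-k table in, length-k bool mask out.

    True keeps the row; the threshold is an arbitrary illustrative value.
    """
    return np.asarray(table["stellar_mass"] > 1e10)


mask = massive_galaxy_selection(galaxy_table)
selected = galaxy_table[mask]  # keep only the rows where the mask is True
print(len(selected))
```

A callable with this shape is what the galaxy_selection_func keyword expects. A halo_selection_func follows the same pattern, applied to halo columns instead.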

Methods Summary

`compute_average_galaxy_clustering`([...])	Method repeatedly populates a simulation with a mock galaxy catalog, computes the clustering signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of clustering measurements.
`compute_average_galaxy_matter_cross_clustering`([...])	Method repeatedly populates a simulation with a mock galaxy catalog, computes the galaxy-matter cross-correlation signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of repeated measurements.
`populate_mock`(halocat[, Num_ptcl_requirement])	Method used to populate a simulation with a Monte Carlo realization of a model.
`update_param_dict_decorator`(component_model, ...)	Decorator used to propagate any possible changes in the composite model param_dict down to the appropriate component model param_dict.

Methods Documentation

compute_average_galaxy_clustering(num_iterations=5, summary_statistic='median', **kwargs)

Method repeatedly populates a simulation with a mock galaxy catalog, computes the clustering signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of clustering measurements.

compute_average_galaxy_clustering is a convenience function and is not intended for performance-critical applications such as MCMCs. In an MCMC there is no need to repeatedly populate the same snapshot with the same set of model parameters; the repetition exists primarily to smooth out numerical noise when making plots and doing exploratory work. If you need the 3d correlation function in a performance-critical application, see Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d, which demonstrates how to call the tpcf function once, directly on the mock galaxy catalog.
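The summary step at the end of the method can be sketched as follows. This assumes each Monte Carlo realization yields one correlation-function array over the same rbins. The helper name summarize_realizations is hypothetical and not part of the halotools API, and the numbers are made up. The sketch also suggests why the median is a sensible default: a single outlying realization (the 30.0 in the first bin) pulls the mean much further than the median.

```python
import numpy as np


def summarize_realizations(xi_realizations, summary_statistic="median"):
    """Hypothetical helper: collapse per-realization correlation functions.

    xi_realizations has shape (num_iterations, num_rbins); the summary is
    taken independently in each radial bin (axis=0).
    """
    xi = np.asarray(xi_realizations, dtype=float)
    if summary_statistic == "median":
        return np.median(xi, axis=0)
    if summary_statistic == "mean":
        return np.mean(xi, axis=0)
    raise ValueError("summary_statistic must be 'median' or 'mean'")


# Three made-up realizations of a correlation function in four radial bins.
xi_realizations = [
    [10.0, 4.0, 1.0, 0.2],
    [12.0, 5.0, 1.5, 0.1],
    [30.0, 4.5, 1.2, 0.3],
]
median_xi = summarize_realizations(xi_realizations)       # per bin: 12, 4.5, 1.2, 0.2
mean_xi = summarize_realizations(xi_realizations, "mean")
print(median_xi, mean_xi)
```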

Parameters:

num_iterations (int, optional): Number of Monte Carlo realizations used to estimate the clustering signal. Default is 5.
summary_statistic (string, optional): Method used to estimate the clustering signal from the collection of Monte Carlo realizations. Options are 'median' and 'mean'. Default is 'median'.
simname (string, optional): Nickname of the simulation into which mock galaxies will be populated. Currently supported simulations are Bolshoi (simname='bolshoi'), Consuelo (simname='consuelo'), MultiDark (simname='multidark'), and Bolshoi-Planck (simname='bolplanck'). Default is set in sim_defaults.
halo_finder (string, optional): Nickname of the halo finder of the halocat into which mock galaxies will be populated, e.g., 'rockstar' or 'bdm'. Default is set in sim_defaults.
redshift (float, optional): Redshift of the desired halocat into which mock galaxies will be populated. Default is set in sim_defaults.
variable_galaxy_mask (scalar, optional): Any value used to construct a mask that selects a sub-population of mock galaxies. It is passed as a keyword argument whose name is a column of the galaxy_table, e.g., gal_type='centrals'. See examples below.
mask_function (function object, optional): Function object that returns a masking array when operating on the galaxy_table. More flexible than the simpler variable_galaxy_mask option because mask_function allows multiple simultaneous cuts. See examples below.
include_crosscorr (bool, optional): Only for use together with a variable_galaxy_mask-determined mask. If include_crosscorr is False (the default), the method returns the auto-correlation function of the subsample of galaxies selected by variable_galaxy_mask. If include_crosscorr is True, the method returns the auto-correlation of the subsample, the cross-correlation of the subsample with the complementary subsample, and the auto-correlation of the complementary subsample, in that order. See examples below.
rbins (array, optional): Bin edges in which the correlation function will be calculated. Default is set in the model_defaults module.
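Both masking options ultimately reduce to a boolean array over the rows of the galaxy_table. The sketch below shows one way that reduction could work. The helper build_galaxy_mask is hypothetical and is not how halotools necessarily implements it; the structured array stands in for an Astropy galaxy_table, and the column values and 1e13 threshold are made up.

```python
import numpy as np


def build_galaxy_mask(galaxy_table, mask_function=None, **variable_galaxy_mask):
    """Hypothetical helper: turn either masking option into a boolean array.

    variable_galaxy_mask: one keyword such as gal_type='centrals'; the key names
    a column and rows whose value equals the given value are selected.
    mask_function: a callable returning a boolean array, allowing combined cuts.
    """
    if mask_function is not None and variable_galaxy_mask:
        raise ValueError("use either mask_function or a variable_galaxy_mask keyword")
    if mask_function is not None:
        return np.asarray(mask_function(galaxy_table), dtype=bool)
    if not variable_galaxy_mask:
        return np.ones(len(galaxy_table), dtype=bool)  # no mask: keep every galaxy
    if len(variable_galaxy_mask) > 1:
        raise ValueError("pass at most one variable_galaxy_mask keyword")
    ((column, value),) = variable_galaxy_mask.items()
    return np.asarray(galaxy_table[column] == value)


# Hypothetical stand-in for a galaxy_table.
galaxy_table = np.zeros(4, dtype=[("gal_type", "U10"), ("halo_mvir", "f8")])
galaxy_table["gal_type"] = ["centrals", "satellites", "satellites", "centrals"]
galaxy_table["halo_mvir"] = [1e12, 5e13, 2e12, 3e14]

centrals = build_galaxy_mask(galaxy_table, gal_type="centrals")  # rows 0 and 3
# Two simultaneous cuts: satellites whose host halo_mvir exceeds 1e13.
massive_host_sats = build_galaxy_mask(
    galaxy_table,
    mask_function=lambda t: (t["gal_type"] == "satellites") & (t["halo_mvir"] > 1e13),
)
print(centrals, massive_host_sats)
```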

Returns:

rbin_centers (array)

Midpoints of the bins used in the correlation function calculation.

correlation_func (array)

If no mask is used (the default), the method returns the correlation function of the full mock galaxy catalog.

If a mask is used and include_crosscorr is False (the default), the method returns the correlation function of the subsample of galaxies selected by the mask.

If a mask is used and include_crosscorr is True, the method returns the auto-correlation of the subsample, the cross-correlation of the subsample with the complementary subsample, and the auto-correlation of the complementary subsample, in that order. See the example below.
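With N+1 bin edges in rbins there are N bins, so rbin_centers and each returned correlation_func array have one element fewer than rbins. A minimal sketch follows, assuming "midpoint" means the arithmetic mean of adjacent edges (a geometric mean is a common alternative for logarithmic bins). The bin edges here are made up and are not the model_defaults values.

```python
import numpy as np

# Hypothetical logarithmically spaced bin edges: 5 edges define 4 bins.
rbins = np.logspace(-1, 1, 5)

# Arithmetic midpoint of each bin, one value per bin.
rbin_centers = 0.5 * (rbins[:-1] + rbins[1:])
print(len(rbins), len(rbin_centers))
```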

Notes

The compute_average_galaxy_clustering method bound to model instances is a convenience wrapper around the tpcf function. If you want greater control over how the galaxy clustering signal is estimated, see the tpcf documentation.

There is no guarantee that the compute_average_galaxy_clustering method bound to your model will terminate in a reasonable amount of time. For example, if you use a subhalo-based model that populates every subhalo in the catalog with a mock galaxy, calling compute_average_galaxy_clustering on that model will attempt to compute a correlation function on hundreds of millions of points. In such cases, you are better off calling the populate_mock method and then calling tpcf after placing a cut on the galaxy_table, as demonstrated in Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d.
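Cutting first pays off because the number of distinct pairs in an auto-correlation grows as N(N-1)/2. Optimized pair counters beat brute force, but their cost still rises steeply with N. The sketch below uses a made-up log-normal stellar-mass distribution and an arbitrary threshold to show how much a single cut on the galaxy_table shrinks that pair count. Neither the distribution nor the threshold is a halotools default, and the helper n_pairs is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for a populated galaxy_table column: 200,000 log
# stellar masses drawn from a made-up normal distribution.
log_stellar_mass = rng.normal(loc=9.5, scale=0.7, size=200_000)

# Cut to the sample you actually want to measure before any pair counting.
mask = log_stellar_mass > 10.5
n_all, n_cut = log_stellar_mass.size, int(mask.sum())


def n_pairs(n):
    """Number of distinct pairs a brute-force auto-correlation would examine."""
    return n * (n - 1) // 2


# How many galaxies survive the cut, and by what factor the pair count drops.
print(n_cut, n_pairs(n_all) / n_pairs(n_cut))
```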

Examples

The simplest use of compute_average_galaxy_clustering is to call it with no arguments. This generates a sequence of Monte Carlo realizations of your model in the default halocat, calculates the two-point correlation function of all galaxies in each mock, and returns the median clustering strength in each radial bin:

>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model = PrebuiltHodModelFactory('leauthaud11')
>>> r, clustering = model.compute_average_galaxy_clustering()

To control which simulation is used, use the same syntax you use to load a CachedHaloCatalog into memory from your cache directory:

>>> r, clustering = model.compute_average_galaxy_clustering(simname='multidark', redshift=1)

You can control the number of mock catalogs that are generated via:

>>> r, clustering = model.compute_average_galaxy_clustering(num_iterations=10)

To focus on the clustering signal of a specific subpopulation, you have two options: the variable_galaxy_mask mechanism or the mask_function keyword argument. The variable_galaxy_mask mechanism looks like this:

>>> r, clustering = model.compute_average_galaxy_clustering(gal_type='centrals')

With the variable_galaxy_mask mechanism, you can use any column of your galaxy_table as a keyword argument. If you also pass include_crosscorr=True, the method returns all auto- and cross-correlations of the subset and its complement:

>>> r, cen_cen, cen_sat, sat_sat = model.compute_average_galaxy_clustering(gal_type='centrals', include_crosscorr=True)
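To make the ordering of the three include_crosscorr arrays concrete, here is a brute-force sketch. It uses the natural estimator DD/RR - 1, with RR computed analytically for a periodic cube. This is a textbook estimator swapped in for illustration; it is not the tpcf implementation, which uses optimized pair counting. The uniformly random positions, box size, bins, and the 70/30 split standing in for gal_type are all assumptions, so every correlation should come out near zero.

```python
import numpy as np


def periodic_separations(a, b, box_size):
    """All pairwise distances between a (n, 3) and b (m, 3) in a periodic cube,
    using the minimum-image convention (valid for separations < box_size / 2)."""
    d = np.abs(a[:, None, :] - b[None, :, :])
    d = np.minimum(d, box_size - d)
    return np.sqrt((d ** 2).sum(axis=-1))


def xi_natural(a, b, rbins, box_size, auto):
    """Natural estimator DD / RR - 1 with analytic RR for a periodic box.

    Self-pairs (distance 0) fall outside the bins because rbins[0] > 0, so for
    an auto-correlation the expected ordered pair count uses N * (N - 1).
    """
    dist = periodic_separations(a, b, box_size)
    dd = np.histogram(dist.ravel(), bins=rbins)[0].astype(float)
    shell_volumes = 4.0 / 3.0 * np.pi * np.diff(rbins ** 3)
    n_b = len(b) - 1 if auto else len(b)
    rr = len(a) * n_b * shell_volumes / box_size ** 3
    return dd / rr - 1.0


rng = np.random.default_rng(0)
box_size = 100.0
positions = rng.uniform(0, box_size, size=(600, 3))  # hypothetical galaxies
is_central = rng.random(600) < 0.7                     # stands in for gal_type == 'centrals'
rbins = np.array([5.0, 10.0, 20.0, 40.0])              # rmax kept below box_size / 2

cen, sat = positions[is_central], positions[~is_central]
# Same order as include_crosscorr=True: subsample auto, cross, complement auto.
cen_cen = xi_natural(cen, cen, rbins, box_size, auto=True)
cen_sat = xi_natural(cen, sat, rbins, box_size, auto=False)
sat_sat = xi_natural(sat, sat, rbins, box_size, auto=True)
print(cen_cen, cen_sat, sat_sat)
```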

compute_average_galaxy_matter_cross_clustering(num_iterations=5, summary_statistic='median', **kwargs)

Method repeatedly populates a simulation with a mock galaxy catalog, computes the galaxy-matter cross-correlation signal of each Monte Carlo realization, and returns a summary statistic of the clustering such as the median computed from the collection of repeated measurements.

compute_average_galaxy_matter_cross_clustering is a convenience function and is not intended for performance-critical applications such as MCMCs. In an MCMC there is no need to repeatedly populate the same snapshot with the same set of model parameters; the repetition exists primarily to smooth out numerical noise when making plots and doing exploratory work. If you need the 3d cross-correlation function in a performance-critical application, see Galaxy Catalog Analysis Example: Calculating galaxy clustering in 3d, which demonstrates how to call the tpcf function once, directly on the mock galaxy catalog, and then refer to the tpcf docstring for how to use its cross-correlation feature.

Parameters:

num_iterations (int, optional): Number of Monte Carlo realizations used to estimate the clustering signal. Default is 5.
summary_statistic (string, optional): Method used to estimate the clustering signal from the collection of Monte Carlo realizations. Options are 'median' and 'mean'. Default is 'median'.
simname (string, optional): Nickname of the simulation into which mock galaxies will be populated. Currently supported simulations are Bolshoi (simname='bolshoi'), Consuelo (simname='consuelo'), MultiDark (simname='multidark'), and Bolshoi-Planck (simname='bolplanck'). Default is set in sim_defaults.
halo_finder (string, optional): Nickname of the halo finder of the halocat into which mock galaxies will be populated, e.g., 'rockstar' or 'bdm'. Default is set in sim_defaults.
redshift (float, optional): Redshift of the desired halocat into which mock galaxies will be populated. Default is set in sim_defaults.
variable_galaxy_mask (scalar, optional): Any value used to construct a mask that selects a sub-population of mock galaxies. It is passed as a keyword argument whose name is a column of the galaxy_table, e.g., gal_type='centrals'. See examples below.
mask_function (function object, optional): Function object that returns a masking array when operating on the galaxy_table. More flexible than the simpler variable_galaxy_mask option because mask_function allows multiple simultaneous cuts. See examples below.
include_complement (bool, optional): Only for use together with a variable_galaxy_mask-determined mask. If include_complement is False (the default), the method returns the cross-correlation function between a random downsampling of dark matter particles and the subsample of galaxies selected by variable_galaxy_mask. If include_complement is True, the method also returns the cross-correlation between the dark matter particles and the complementary subsample. See examples below.
rbins (array, optional): Bin edges in which the correlation function will be calculated. Default is set in the model_defaults module.
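The include_complement behavior can be sketched in two steps: randomly downsample the dark matter particles without replacement, then cross-correlate each galaxy subsample with that downsampled set. In the sketch below, the particle and galaxy positions are uniform random stand-ins, and the particle count, box size, bins, and 70/30 central fraction are arbitrary assumptions. The helper xi_cross_periodic is a hypothetical natural-estimator implementation, not halotools internals. Because the analytic RR uses the downsampled particle count, downsampling adds noise but does not change the expected value.

```python
import numpy as np


def xi_cross_periodic(gals, ptcls, rbins, box_size):
    """Hypothetical natural-estimator cross-correlation DD / RR - 1 in a periodic
    cube, with RR = N_gal * N_ptcl * V_shell / V_box computed analytically."""
    d = np.abs(gals[:, None, :] - ptcls[None, :, :])
    d = np.minimum(d, box_size - d)  # minimum-image convention
    dist = np.sqrt((d ** 2).sum(axis=-1))
    dd = np.histogram(dist.ravel(), bins=rbins)[0].astype(float)
    shell_volumes = 4.0 / 3.0 * np.pi * np.diff(rbins ** 3)
    rr = len(gals) * len(ptcls) * shell_volumes / box_size ** 3
    return dd / rr - 1.0


rng = np.random.default_rng(1)
box_size = 100.0
ptcl_positions = rng.uniform(0, box_size, size=(20_000, 3))  # stand-in particles
gal_positions = rng.uniform(0, box_size, size=(400, 3))       # stand-in galaxies
is_central = rng.random(400) < 0.7                             # stands in for gal_type

# Random downsampling of the particles, without replacement.
idx = rng.choice(len(ptcl_positions), size=1_000, replace=False)
ptcls = ptcl_positions[idx]

rbins = np.array([5.0, 10.0, 20.0, 40.0])
# Same order as include_complement=True: subsample first, then its complement.
cen_matter = xi_cross_periodic(gal_positions[is_central], ptcls, rbins, box_size)
sat_matter = xi_cross_periodic(gal_positions[~is_central], ptcls, rbins, box_size)
print(cen_matter, sat_matter)
```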

Returns:

rbin_centers (array)

Midpoints of the bins used in the correlation function calculation.

correlation_func (array)

If no mask is used (the default), the method returns the galaxy-matter cross-correlation function of the full mock galaxy catalog.

If a mask is used and include_complement is False (the default), the method returns the galaxy-matter cross-correlation function of the subsample of galaxies selected by the mask.

If a mask is used and include_complement is True, the method returns two arrays: the cross-correlation between dark matter and the subsample, followed by the cross-correlation between dark matter and the complementary subsample. See the example below.

Notes

The compute_average_galaxy_matter_cross_clustering method bound to model instances is a convenience wrapper around the tpcf function. If you want greater control over how the galaxy-matter cross-correlation signal is estimated, see the tpcf documentation.

There is no guarantee that the compute_average_galaxy_matter_cross_clustering method bound to your model will terminate in a reasonable amount of time. For example, if you use a subhalo-based model that populates every subhalo in the catalog with a mock galaxy, calling compute_average_galaxy_matter_cross_clustering on that model will attempt to compute a correlation function on hundreds of millions of points. In such cases, you are better off calling the populate_mock method and then calling tpcf after placing a cut on the galaxy_table, as demonstrated in Galaxy Catalog Analysis Example: Galaxy-galaxy lensing. The only difference between this use case and the one in that tutorial is that here you would use tpcf to calculate the cross-correlation between dark matter particles and galaxies, rather than calling the mean_delta_sigma function.

Examples

The simplest use of compute_average_galaxy_matter_cross_clustering is to call it with no arguments. This generates a sequence of Monte Carlo realizations of your model in the default halocat, calculates the cross-correlation function between dark matter and all galaxies in each mock, and returns the median clustering strength in each radial bin:

>>> from halotools.empirical_models import PrebuiltHodModelFactory
>>> model = PrebuiltHodModelFactory('leauthaud11')
>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering()

To control which simulation is used, use the same syntax you use to load a CachedHaloCatalog into memory from your cache directory:

>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(simname='multidark', redshift=1)

You can control the number of mock catalogs that are generated via:

>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(num_iterations=10)

To focus on the clustering signal of a specific subpopulation, you have two options: the variable_galaxy_mask mechanism or the mask_function keyword argument. The variable_galaxy_mask mechanism looks like this:

>>> r, clustering = model.compute_average_galaxy_matter_cross_clustering(gal_type='centrals')

With the variable_galaxy_mask mechanism, you can use any column of your galaxy_table as a keyword argument. If you also pass include_complement=True, the method additionally returns the cross-correlation function of the complementary subset:

>>> r, cen_clustering, sat_clustering = model.compute_average_galaxy_matter_cross_clustering(gal_type='centrals', include_complement=True)

populate_mock(halocat, Num_ptcl_requirement=300, **kwargs)

Method used to populate a simulation with a Monte Carlo realization of a model.

After calling this method, the model instance will have a new mock attribute. You can then access the galaxy population via model.mock.galaxy_table, an Astropy Table.

For documentation specific to the populate_mock method of subhalo-based models, see halotools.empirical_models.SubhaloModelFactory.populate_mock; for HOD-style models see halotools.empirical_models.HodModelFactory.populate_mock.

See the Tutorial on the mock-making algorithms section of the documentation for an in-depth description of the Halotools source-code implementation of mock galaxy population.

Parameters:

halocat (object): Either an instance of CachedHaloCatalog or UserSuppliedHaloCatalog.
Num_ptcl_requirement (int, optional): Requirement on the number of dark matter particles in the halo. The column defined by the halo_mass_column_key string has a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass are thrown out immediately after the original halo catalog is read into memory. Default is 300. Currently only supported for instances of HodModelFactory.
halo_mass_column_key (string, optional): Name of a column of the input halo catalog. The column defined by this string has a cut placed on it: all halos with halocat.halo_table[halo_mass_column_key] < Num_ptcl_requirement*halocat.particle_mass are thrown out immediately after the original halo catalog is read into memory. Default is 'halo_mvir'. Currently only supported for instances of HodModelFactory.
masking_function (function, optional): Function object used to place a mask on the halo table before the mock-generating functions are called. The function should accept a single positional argument storing a table and return a boolean NumPy array that will be used as a fancy-indexing mask. All masked halos are ignored during mock population. Default is None.
enforce_PBC (bool, optional): If True, after galaxy positions are assigned, model_helpers.enforce_periodicity_of_box re-maps satellite galaxies whose positions spilled over the edge of the periodic box back into the box. Default is True. Set this to False only when using masking_function to populate a specific spatial subvolume, since periodic boundary conditions no longer apply in that case. Currently only supported for instances of HodModelFactory.
seed (int, optional): Random number seed used in the Monte Carlo realization. Default is None, which produces stochastic results.
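The particle-number cut and the periodic re-mapping described under Num_ptcl_requirement and enforce_PBC both reduce to simple array operations. This sketch uses made-up halo masses, particle mass, and box size. It uses np.mod as a stand-in for model_helpers.enforce_periodicity_of_box, whose exact implementation may differ.

```python
import numpy as np

# Hypothetical halo catalog values (not from any real simulation).
particle_mass = 1.35e8
halo_mvir = np.array([2.0e10, 5.0e10, 1.0e12, 3.0e9, 4.1e10])
Num_ptcl_requirement = 300

# Halos with halo_mvir < Num_ptcl_requirement * particle_mass are thrown out.
keep = halo_mvir >= Num_ptcl_requirement * particle_mass
resolved_halo_mvir = halo_mvir[keep]

# Stand-in for periodic re-mapping: satellite x-coordinates that spilled past
# the edges of a box of side box_size are wrapped back into [0, box_size).
box_size = 250.0
x = np.array([-1.5, 10.0, 251.0, 249.9])
x_wrapped = np.mod(x, box_size)
print(resolved_halo_mvir, x_wrapped)
```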
