calculate_satellite_selection_mask
- halotools.empirical_models.calculate_satellite_selection_mask(subhalo_hostids, satellite_occupations, host_halo_ids, host_halo_bin_numbers, fill_remaining_satellites=True, seed=None, testing_mode=False, min_required_entries_per_bin=None)
Function driving the selection of subhalos during HOD mock population. Given a catalog of subhalos, host halos and a desired number of satellites in each host, the calculate_satellite_selection_mask function can be used to calculate the indices used to select subhalos to serve as satellites.

In the situation in which a host halo does not have as many subhalos as the desired number of satellites, a subhalo within the same bin as that host (e.g., a subhalo with a similar host mass) will be randomly selected to serve as the satellite; we will refer to such satellites as orphans. Here, the input host_halo_bin_numbers determines which host halos are grouped together into the same bin.

The returned array is a length-Nsats array storing the indices of the subhalo_hostids that were selected. In case special treatment of the orphan satellites is desired, the calculate_satellite_selection_mask function also returns a boolean array that can be used as a mask to identify the orphans.
- Parameters:
- subhalo_hostids : array
Integer array of length Nsubs storing the id of the associated host halo. subhalo_hostids may have repeated values and must be in ascending order.
- satellite_occupations : array
Integer array of length Nhosts storing the desired number of satellites in each host halo.
- host_halo_ids : array
Integer array of length Nhosts storing each host halo’s unique id, typically the halo_id column in a Halotools-formatted catalog.
- host_halo_bin_numbers : array
Integer array of length Nhosts storing the bin number of each host halo, e.g., the returned value of np.digitize(host_halo_masses, mass_bins).
- fill_remaining_satellites : bool, optional
To address cases where a host halo has fewer subhalos than the desired number of satellites, the indices of randomly selected subhalos from the same host mass bin will be selected provided that fill_remaining_satellites is set to True. If fill_remaining_satellites is instead set to False, then the value -1 will be returned for all such entries, permitting an alternative special treatment of such cases (such as drawing from an NFW profile). Default is True.
- seed : integer, optional
Random number seed used when drawing random numbers with numpy.random. Useful when deterministic results are desired, such as during unit-testing. Default is None, producing stochastic results.
- testing_mode : bool, optional
Boolean specifying whether input arrays will be tested to see if they satisfy the assumptions required by the algorithm. Setting testing_mode to True is useful for unit-testing purposes, while setting it to False improves performance. Default is False.
- Returns:
- satellite_selection_indices : array
Integer array of indices that can act as a mask to select subhalos. If fill_remaining_satellites is set to False, then some values of satellite_selection_indices may be -1.
- missing_subhalo_mask : array
Boolean array that can be used to select the indices corresponding to satellites with no true subhalo in the associated host halo. This situation occurs whenever an entry of satellite_occupations exceeds the number of subhalos in that host halo. Thus if fill_remaining_satellites is set to False, then all values of satellite_selection_indices[missing_subhalo_mask] will be equal to -1.
- min_required_entries_per_bin : int, optional
Optional input parameter (not a return value) setting the minimum required number of subhalos in each bin. Default is set by the random_indices_within_bin function.
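To make the meaning of the two returned arrays concrete, here is a minimal toy sketch of the behavior described for fill_remaining_satellites=False. It is not the Halotools implementation: the helper name toy_selection is hypothetical, and the choice to take each host's subhalos in catalog order and to emit results host by host is an assumption made for illustration.

```python
import numpy as np

def toy_selection(subhalo_hostids, satellite_occupations, host_halo_ids):
    """Hypothetical stand-in: take the first N subhalos of each host, -1 where none remain."""
    indices, missing = [], []
    for host_id, n_sats in zip(host_halo_ids, satellite_occupations):
        # Positions of this host's subhalos, in catalog order
        members = np.flatnonzero(subhalo_hostids == host_id)
        for i in range(n_sats):
            has_subhalo = i < len(members)
            indices.append(members[i] if has_subhalo else -1)
            missing.append(not has_subhalo)
    return np.array(indices, dtype=int), np.array(missing, dtype=bool)

# Made-up inputs: host 20 wants 3 satellites but has only 1 subhalo
subhalo_hostids = np.array([10, 10, 10, 20, 30, 30])
host_halo_ids = np.array([10, 20, 30])
satellite_occupations = np.array([2, 3, 0])

indices, missing = toy_selection(subhalo_hostids, satellite_occupations, host_halo_ids)
print(indices)   # one entry per requested satellite
print(missing)   # True where no true subhalo was available
```

In this toy, the output has one entry per requested satellite, and the entries flagged by the boolean mask are exactly those set to -1.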
Notes
Every bin of host halos must contain enough subhalos to draw from, or the function will raise an exception. If this occurs, you will need to choose wider bins and/or use a subhalo catalog that is more densely populated.
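One way to anticipate the exception described above is to count the subhalos available in each host bin before calling the function. The following is a minimal sketch on made-up data. The threshold hypothetical_minimum is an assumption chosen for illustration, since the actual default is set by random_indices_within_bin, and the lookup relies on host_halo_ids being sorted in this toy.

```python
import numpy as np

# Made-up host ids, host masses, bin edges, and subhalo-to-host assignments
host_halo_ids = np.array([1, 2, 3, 4])
host_halo_masses = np.array([1e11, 3e11, 2e13, 5e13])
mass_bins = np.logspace(10, 15, 3)  # edges at 1e10, 10**12.5, 1e15
subhalo_hostids = np.array([1, 1, 2, 3])

host_halo_bin_numbers = np.digitize(host_halo_masses, mass_bins)

# Each subhalo inherits the bin number of its host
# (host_halo_ids is sorted here, so searchsorted returns the host's position)
host_index = np.searchsorted(host_halo_ids, subhalo_hostids)
subhalo_bin_numbers = host_halo_bin_numbers[host_index]

# Number of subhalos available in each bin
subhalos_per_bin = np.bincount(subhalo_bin_numbers, minlength=len(mass_bins) + 1)

# Flag bins that contain hosts but few subhalos (threshold is hypothetical)
hypothetical_minimum = 2
occupied_bins = np.unique(host_halo_bin_numbers)
sparse_bins = occupied_bins[subhalos_per_bin[occupied_bins] < hypothetical_minimum]
print(subhalos_per_bin, sparse_bins)
```

Any bin reported as sparse is a candidate for merging with a neighbor by widening the bin edges.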
Examples
We’ll demonstrate basic usage here using a halo catalog taken from a FakeSim object, which means we’ll have to do a fair amount of work to arrange the memory layout into the required form. When calculate_satellite_selection_mask is used as part of a Halotools model, this organization is typically accomplished in an automated fashion during a pre-processing phase of mock population.
>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
The calculate_satellite_selection_mask algorithm requires that every entry of the input subhalo_hostids has a matching entry in the input host_halo_ids array. To address this, we will mask out those rare subhalos with no matching host halo (this situation occurs in <0.1% of cases for typical Rockstar catalogs).
>>> import numpy as np
>>> matched_mask = np.isin(halocat.halo_table['halo_hostid'], halocat.halo_table['halo_id'])
>>> halos = halocat.halo_table[matched_mask]
Now we will sort the catalog by the sorting_keys list.
>>> halos['negative_halo_vpeak'] = -halos['halo_vpeak']
>>> sorting_keys = ['halo_mvir_host_halo', 'halo_hostid', 'halo_upid', 'negative_halo_vpeak']
>>> halos.sort(sorting_keys)
Our halo catalog is now sorted in ascending order of halo_mvir_host_halo. Because the second entry of our sorting_keys is halo_hostid, within each bin of host halo mass, halos and subhalos with the same halo_hostid will be grouped together. Since halo_upid is -1 for host halos and a positive long integer for subhalos, choosing halo_upid as the third entry of sorting_keys ensures that within each host-sub system the host halo will appear first. Finally, the subhalos in each host will be arranged in descending order of halo_vpeak; this ensures that subhalos with particularly large halo_vpeak will be preferentially selected to serve as satellites.

Now we separate host halos from subhalos (preserving the above memory layout), and define the arrays we’ll use as inputs to the calculate_satellite_selection_mask function.
>>> host_halo_mask = halos['halo_upid'] == -1
>>> hosts = halos[host_halo_mask]
>>> subhalos = halos[~host_halo_mask]
>>> mass_bins = np.logspace(9.9, 16.1, 5)
>>> host_halo_bin_numbers = np.digitize(hosts['halo_mvir'].data, mass_bins)
>>> subhalo_hostids = subhalos['halo_hostid'].data
>>> satellite_occupations = np.random.randint(0, 5, len(hosts))
>>> host_halo_ids = hosts['halo_id'].data
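As a side check of the memory layout produced by the multi-key sort, the same ordering can be reproduced on a small made-up catalog. This sketch uses np.lexsort on plain arrays rather than the table sort used in the example, and all values below are invented for illustration.

```python
import numpy as np

# Toy catalog: two host-sub systems (host ids 1 and 2); upid == -1 marks a host
halo_id = np.array([11, 2, 12, 1, 21])
hostid = np.array([1, 2, 1, 1, 2])
upid = np.array([1, -1, 1, -1, 2])
vpeak = np.array([100.0, 400.0, 250.0, 500.0, 80.0])
host_mvir = np.array([1e13, 1e12, 1e13, 1e13, 1e12])

# np.lexsort treats its LAST key as the primary key, so the keys are listed
# in reverse: host mass first, then hostid, then upid, then descending vpeak
order = np.lexsort((-vpeak, upid, hostid, host_mvir))
sorted_ids = halo_id[order]
print(sorted_ids)
```

The lower-mass system comes first, each host precedes its own subhalos, and within a host the subhalo with the larger vpeak comes first.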
With our arrays so defined, we call the calculate_satellite_selection_mask function and demonstrate that it does indeed return a result that can serve as an indexing array providing us with the correct total number of satellites.
>>> from halotools.empirical_models import calculate_satellite_selection_mask
>>> result = calculate_satellite_selection_mask(
...     subhalo_hostids, satellite_occupations, host_halo_ids,
...     host_halo_bin_numbers, testing_mode=True)
>>> satellite_selection_indices, missing_subhalo_mask = result
>>> selected_subhalos = subhalos[satellite_selection_indices]
>>> assert len(selected_subhalos) == satellite_occupations.sum()
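When fill_remaining_satellites is set to False, the -1 entries should be masked out before indexing, because NumPy interprets -1 as the last element rather than as a missing value. The sketch below uses hand-written stand-ins for the two returned arrays and a hypothetical subhalo property subhalo_x. It illustrates one possible way to handle orphans and is not a prescribed workflow.

```python
import numpy as np

# Stand-in values; in practice these would come from
# calculate_satellite_selection_mask(..., fill_remaining_satellites=False)
satellite_selection_indices = np.array([0, 1, 3, -1, -1])
missing_subhalo_mask = np.array([False, False, False, True, True])
subhalo_x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical subhalo property

# Fill satellites that have a true subhalo; leave orphans as NaN for later treatment
satellite_x = np.full(len(satellite_selection_indices), np.nan)
has_subhalo = ~missing_subhalo_mask
satellite_x[has_subhalo] = subhalo_x[satellite_selection_indices[has_subhalo]]

num_orphans = int(missing_subhalo_mask.sum())
print(satellite_x, num_orphans)
```

The NaN entries mark the orphans, which can then be assigned values by an alternative method such as drawing from an NFW profile.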