calculate_satellite_selection_mask

halotools.empirical_models.calculate_satellite_selection_mask(subhalo_hostids, satellite_occupations, host_halo_ids, host_halo_bin_numbers, fill_remaining_satellites=True, seed=None, testing_mode=False, min_required_entries_per_bin=None)

Function driving the selection of subhalos during HOD mock population. Given a catalog of subhalos, a catalog of host halos, and the desired number of satellites in each host, calculate_satellite_selection_mask computes the indices used to select the subhalos that will serve as satellites.

If a host halo has fewer subhalos than its desired number of satellites and fill_remaining_satellites is True, each missing satellite is filled by a randomly selected subhalo from the same bin as that host (e.g., a subhalo whose host has a similar mass); we refer to such satellites as orphans. The input host_halo_bin_numbers determines which host halos are grouped together into the same bin.

The first returned array has length Nsats = satellite_occupations.sum() and stores the indices of the selected entries of subhalo_hostids. In case the orphan satellites require special treatment, calculate_satellite_selection_mask also returns a boolean array that can be used as a mask to identify the orphans.
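To make the selection rule concrete, here is a simplified pure-NumPy sketch of the behavior described above, assuming fill_remaining_satellites=True. The helper name sketch_satellite_selection is hypothetical: it is not the Halotools implementation, performs no input validation, and its output ordering may differ from that of calculate_satellite_selection_mask. It takes the first subhalos of each host in catalog order, then draws any missing satellites (with replacement) from all subhalos whose hosts share the same bin.

```python
import numpy as np


def sketch_satellite_selection(subhalo_hostids, satellite_occupations,
                               host_halo_ids, host_halo_bin_numbers, seed=None):
    """Hypothetical, simplified illustration; NOT the Halotools implementation.

    Assumes every entry of subhalo_hostids appears in host_halo_ids.
    Returns (satellite_selection_indices, orphan_mask).
    """
    rng = np.random.default_rng(seed)
    subhalo_hostids = np.asarray(subhalo_hostids)
    host_halo_ids = np.asarray(host_halo_ids)
    host_halo_bin_numbers = np.asarray(host_halo_bin_numbers)

    # Bin number of each subhalo's host, looked up through the host id
    bin_of_host = dict(zip(host_halo_ids.tolist(), host_halo_bin_numbers.tolist()))
    subhalo_bins = np.array([bin_of_host[h] for h in subhalo_hostids.tolist()])

    indices, orphans = [], []
    for hostid, nsat, hostbin in zip(host_halo_ids, satellite_occupations,
                                     host_halo_bin_numbers):
        # Subhalos belonging to this host, in catalog (memory) order
        own = np.flatnonzero(subhalo_hostids == hostid)
        nreal = min(int(nsat), own.size)
        indices.extend(own[:nreal].tolist())
        orphans.extend([False] * nreal)

        # Missing satellites become orphans drawn from the host's bin;
        # rng.choice raises ValueError if that bin has no subhalos at all
        nmissing = int(nsat) - nreal
        if nmissing > 0:
            pool = np.flatnonzero(subhalo_bins == hostbin)
            indices.extend(rng.choice(pool, size=nmissing).tolist())
            orphans.extend([True] * nmissing)

    return np.array(indices, dtype=int), np.array(orphans, dtype=bool)


# Toy inputs: three hosts in two bins
subhalo_hostids = np.array([1, 1, 2, 3, 3, 3])
host_halo_ids = np.array([1, 2, 3])
host_halo_bin_numbers = np.array([0, 0, 1])
satellite_occupations = np.array([1, 3, 2])

idx, orphan_mask = sketch_satellite_selection(
    subhalo_hostids, satellite_occupations, host_halo_ids,
    host_halo_bin_numbers, seed=42)
```

Here host 2 owns only one subhalo but wants three satellites, so two orphans are drawn from the subhalos whose hosts lie in bin 0 (indices 0, 1 or 2), while idx[~orphan_mask] is [0, 2, 3, 4].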

Parameters:
subhalo_hostids : array

Integer array of length Nsubs storing the id of each subhalo's host halo. subhalo_hostids may have repeated values and must be in ascending order.

satellite_occupations : array

Integer array of length Nhosts storing the desired number of satellites in each host halo.

host_halo_ids : array

Integer array of length Nhosts storing each host halo’s unique id, typically the halo_id column in a Halotools-formatted catalog.

host_halo_bin_numbers : array

Integer array of length Nhosts storing the bin number of each host halo, e.g., the returned value of np.digitize(host_halo_masses, mass_bins).

fill_remaining_satellites : bool, optional

If a host halo has fewer subhalos than its desired number of satellites and fill_remaining_satellites is True, the remaining satellites are assigned the indices of subhalos randomly selected from the same host mass bin. If fill_remaining_satellites is False, the value -1 is returned for all such entries instead, permitting an alternative treatment of these cases (such as drawing positions from an NFW profile). Default is True.

seed : integer, optional

Random number seed used when drawing random numbers with numpy.random. Useful when deterministic results are desired, such as during unit-testing. Default is None, producing stochastic results.

testing_mode : bool, optional

Boolean specifying whether input arrays will be tested to see if they satisfy the assumptions required by the algorithm. Setting testing_mode to True is useful for unit-testing purposes, while setting it to False improves performance. Default is False.

min_required_entries_per_bin : int, optional

Minimum number of subhalos required in each bin. Default is set by the random_indices_within_bin function.

Returns:
satellite_selection_indices : array

Integer array of length Nsats = satellite_occupations.sum() storing indices into subhalo_hostids; it can be used as an indexing array to select the satellite subhalos.

If fill_remaining_satellites is set to False, then some values of satellite_selection_indices may be -1.

missing_subhalo_mask : array

Boolean array that can be used to select the entries of satellite_selection_indices corresponding to satellites with no true subhalo in the associated host halo. This situation occurs whenever an entry of satellite_occupations exceeds the number of subhalos in that host halo. Thus if fill_remaining_satellites is set to False, then all values of satellite_selection_indices[missing_subhalo_mask] will be equal to -1.
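Because -1 is a valid NumPy index that refers to the last element, indexing a subhalo table directly with satellite_selection_indices when fill_remaining_satellites is False would silently return the last subhalo for every orphan. The following minimal sketch masks first; the two arrays are hand-made stand-ins for the returned values, and subhalo_vpeak is a hypothetical per-subhalo property.

```python
import numpy as np

# Hand-made stand-ins for the two returned arrays when
# fill_remaining_satellites=False (values chosen for illustration only)
satellite_selection_indices = np.array([0, 2, -1, 3, -1])
missing_subhalo_mask = np.array([False, False, True, False, True])

# Hypothetical per-subhalo property (one entry per subhalo)
subhalo_vpeak = np.array([310.0, 250.0, 180.0, 120.0])

# Pitfall: index -1 picks the LAST subhalo instead of flagging a missing one
naive = subhalo_vpeak[satellite_selection_indices]

# Keep only true subhalos, and count the orphans needing another treatment
real_satellites = subhalo_vpeak[satellite_selection_indices[~missing_subhalo_mask]]
num_orphans = int(missing_subhalo_mask.sum())
```

The naive selection returns 120.0 three times, while real_satellites holds only the three genuine subhalos and the two orphans are left for a separate treatment such as an NFW profile.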

Notes

Every bin of host halos must contain enough subhalos to draw from; otherwise the function raises an exception. If this occurs, you will need to choose wider bins, use a more densely populated subhalo catalog, or both.
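Before calling the function, it can help to check which bins would have nothing to draw from. The sketch below uses a hypothetical helper, count_subhalos_per_bin, and assumes (as required above) that every entry of subhalo_hostids appears in host_halo_ids. It also illustrates the np.digitize convention for host_halo_bin_numbers: masses below the first edge land in bin 0 and masses above the last edge in bin len(mass_bins), so out-of-range hosts form their own, often sparse, bins. The exact threshold is governed by min_required_entries_per_bin, so a nonzero count is necessary but may not be sufficient.

```python
import numpy as np


def count_subhalos_per_bin(subhalo_hostids, host_halo_ids, host_halo_bin_numbers):
    """Hypothetical helper: number of subhalos whose host lies in each bin.

    Assumes every entry of subhalo_hostids appears in host_halo_ids.
    """
    # Position of each subhalo's host within host_halo_ids
    order = np.argsort(host_halo_ids)
    pos = np.searchsorted(host_halo_ids, subhalo_hostids, sorter=order)
    subhalo_bins = host_halo_bin_numbers[order[pos]]
    return np.bincount(subhalo_bins, minlength=host_halo_bin_numbers.max() + 1)


mass_bins = np.logspace(9.9, 16.1, 5)
host_halo_ids = np.array([10, 20, 30, 40, 50])
host_masses = np.array([1e10, 1e12, 1e14, 2e12, 1e9])
host_halo_bin_numbers = np.digitize(host_masses, mass_bins)  # 1e9 < first edge -> bin 0

subhalo_hostids = np.array([10, 20, 20, 40])
counts = count_subhalos_per_bin(subhalo_hostids, host_halo_ids, host_halo_bin_numbers)

# Bins that contain hosts but no subhalos to draw orphans from
host_bins = np.unique(host_halo_bin_numbers)
empty_bins = host_bins[counts[host_bins] == 0]
```

Here hosts 30 and 50 sit in bins 3 and 0, which contain no subhalos, so any orphans they need could not be drawn; wider bins would merge them with populated ones.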

Examples

We’ll demonstrate basic usage here using a halo catalog taken from a FakeSim object, which means we’ll have to do a fair amount of work to arrange the memory layout into the required form. When calculate_satellite_selection_mask is used as part of a Halotools model, this organization is typically accomplished in an automated fashion during a pre-processing phase of mock population.

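>>> import numpy as np
>>> from halotools.sim_manager import FakeSim
>>> from halotools.empirical_models import calculate_satellite_selection_mask
>>> halocat = FakeSim()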

The calculate_satellite_selection_mask algorithm requires that every entry of the input subhalo_hostids have a matching entry in the input host_halo_ids array. To address this, we mask out the rare subhalos with no matching host halo (this situation occurs for fewer than 0.1% of objects in typical Rockstar catalogs).

>>> matched_mask = np.isin(halocat.halo_table['halo_hostid'], halocat.halo_table['halo_id'])
>>> halos = halocat.halo_table[matched_mask]

Now we will sort the catalog by the sorting_keys list.

>>> halos['negative_halo_vpeak'] = -halos['halo_vpeak']
>>> sorting_keys = ['halo_mvir_host_halo', 'halo_hostid', 'halo_upid', 'negative_halo_vpeak']
>>> halos.sort(sorting_keys)

Our halo catalog is now sorted in ascending order of halo_mvir_host_halo. Because the second entry of sorting_keys is halo_hostid, halos and subhalos sharing the same halo_hostid are grouped together. Because halo_upid is -1 for host halos and a positive long integer for subhalos, choosing halo_upid as the third entry ensures that the host halo appears first within each host-sub system. Finally, the subhalos in each host are arranged in descending order of halo_vpeak; this ensures that subhalos with particularly large halo_vpeak are preferentially selected to serve as satellites.
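The same memory layout can be reproduced on toy NumPy arrays with np.lexsort, which treats the last key it is given as the primary sort key. This is a standalone sketch with made-up values for two host-sub systems, independent of the FakeSim catalog used here.

```python
import numpy as np

# Toy catalog with two host-sub systems (values are made up)
halo_id = np.array([5, 1, 7, 2, 8, 9])
halo_hostid = np.array([1, 1, 2, 2, 1, 2])
halo_upid = np.array([1, -1, 2, -1, 1, 2])  # -1 marks host halos
halo_mvir_host_halo = np.array([1e12, 1e12, 1e13, 1e13, 1e12, 1e13])
halo_vpeak = np.array([150.0, 400.0, 90.0, 200.0, 210.0, 120.0])

# Same priority as sorting_keys; np.lexsort sorts by the LAST key first
order = np.lexsort((-halo_vpeak, halo_upid, halo_hostid, halo_mvir_host_halo))
sorted_ids = halo_id[order]

# Split hosts from subhalos while preserving this layout
is_host = halo_upid[order] == -1
subhalo_hostids = halo_hostid[order][~is_host]
```

Each host (ids 1 and 2) precedes its own subhalos, which follow in descending halo_vpeak (210 before 150, then 120 before 90).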

Now we separate host halos from subhalos (preserving the above memory layout), and define the arrays we’ll use as inputs to the calculate_satellite_selection_mask function.

>>> host_halo_mask = halos['halo_upid'] == -1
>>> hosts = halos[host_halo_mask]
>>> subhalos = halos[~host_halo_mask]
>>> mass_bins = np.logspace(9.9, 16.1, 5)
>>> host_halo_bin_numbers = np.digitize(hosts['halo_mvir'].data, mass_bins)
>>> subhalo_hostids = subhalos['halo_hostid'].data
>>> satellite_occupations = np.random.randint(0, 5, len(hosts))
>>> host_halo_ids = hosts['halo_id'].data

With our arrays so defined, we call the calculate_satellite_selection_mask function and demonstrate that it does indeed return a result that can serve as an indexing array providing us with the correct total number of satellites.

>>> result = calculate_satellite_selection_mask(subhalo_hostids, satellite_occupations, host_halo_ids, host_halo_bin_numbers, testing_mode=True)
>>> satellite_selection_indices, missing_subhalo_mask = result
>>> selected_subhalos = subhalos[satellite_selection_indices]
>>> assert len(selected_subhalos) == satellite_occupations.sum()