Calculating the Sum of Halo Progenitor Masses
This section of the documentation describes how to use the crossmatch and group_member_generator utility functions to analyze subhalo merger trees. Many more complicated merger-tree analyses can be built by following the basic patterns shown here, which can speed up naive algorithms by several orders of magnitude.
Halos gain mass through a combination of mergers and smooth accretion. If you have two catalogs of subhalos at successive snapshots, subhalos_z0 and subhalos_z1, and if the catalog at the earlier timestep, subhalos_z1, contains a column specifying the halo ID that each subhalo descends into, then there is sufficient information to compute the sum of the progenitor masses for every object in subhalos_z0. The naive algorithm for this calculation is a double for loop with a blind lookup at every step, which quickly becomes prohibitively slow for the subhalo catalogs of modern simulations. The crossmatch and group_member_generator utility functions speed up this calculation considerably, as demonstrated below.
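For reference, the naive double loop described above can be sketched in plain NumPy on a tiny toy catalog. The array names and values here are hypothetical and chosen only for illustration; they are not part of the halotools API.

```python
import numpy as np

# Toy catalogs (hypothetical values, for illustration only).
z0_halo_id = np.array([0, 1, 2, 3])
z1_desc_id = np.array([2, 0, 2, 7, 0, 2])   # desc_id 7 has no match in z0_halo_id
z1_halo_mass = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Naive approach: for every descendant, scan every candidate progenitor.
naive_sum = np.zeros(len(z0_halo_id))
for i, halo_id in enumerate(z0_halo_id):
    for j, desc_id in enumerate(z1_desc_id):
        if desc_id == halo_id:
            naive_sum[i] += z1_halo_mass[j]

print(naive_sum)
```

The work done by this loop scales as the product of the two catalog lengths, which is why it becomes impractical for catalogs with tens of thousands of entries or more.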
First we create some fake data for demonstration purposes. In the setup below, the subhalos_z1 catalog is from the snapshot immediately prior to the subhalos_z0 catalog. The desc_id column stores the halo_id of the subhalos_z0 object that each subhalos_z1 subhalo descends into; the same desc_id can appear multiple times in the subhalos_z1 catalog, and there need not be a matching halo_id in the subhalos_z0 catalog.
>>> from astropy.table import Table
>>> import numpy as np
>>> subhalos_z0 = Table()
>>> num_subhalos_z0 = 47893
>>> subhalos_z0['halo_id'] = np.arange(num_subhalos_z0).astype('i8')
>>> subhalos_z1 = Table()
>>> num_subhalos_z1 = 58105
>>> subhalos_z1['halo_id'] = np.arange(num_subhalos_z0, num_subhalos_z0+num_subhalos_z1).astype('i8')
>>> subhalos_z1['desc_id'] = np.random.randint(0, 2*num_subhalos_z0, num_subhalos_z1)
>>> subhalos_z1['halo_mass'] = np.random.uniform(1e10, 1e15, num_subhalos_z1)
Now sort the subhalos in the earlier snapshot so that subhalos_z1 entries with a common descendant are grouped together, and build the group_member_generator so that it yields the masses of the progenitor halos with each iteration.
>>> from halotools.utils import group_member_generator
>>> subhalos_z1.sort('desc_id')
>>> grouping_key = 'desc_id'
>>> requested_columns = ['halo_mass']
>>> group_gen = group_member_generator(subhalos_z1, grouping_key, requested_columns)
Now we iterate over the newly created generator:
>>> sum_of_coprogenitor_masses = np.zeros(num_subhalos_z1)
>>> for first, last, member_props in group_gen:
...     masses = member_props[0]
...     sum_of_coprogenitor_masses[first:last] = np.sum(masses)
>>> subhalos_z1['coprogenitor_mass_sum'] = sum_of_coprogenitor_masses
In the loop over group_gen above, there is one step for each unique desc_id that appears in subhalos_z1, and at each new step, all subhalos_z1 subhalos associated with that descendant are yielded (including the main progenitor mass).
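To make the grouping concrete, here is a minimal NumPy sketch of the same pattern on toy data. It is a stand-in that assumes the keys are already sorted; it is not how group_member_generator is implemented, and the array names are hypothetical.

```python
import numpy as np

# Toy progenitor catalog, already sorted by desc_id (hypothetical values).
desc_id = np.array([0, 0, 2, 2, 2, 7])
halo_mass = np.array([2.0, 5.0, 1.0, 3.0, 6.0, 4.0])

# For sorted keys, each unique value occupies one contiguous slice [first, last).
unique_ids, first_idx, counts = np.unique(
    desc_id, return_index=True, return_counts=True)
last_idx = first_idx + counts

group_sum = np.zeros(len(desc_id))
for first, last in zip(first_idx, last_idx):
    # Every member of the group receives the group's total mass.
    group_sum[first:last] = halo_mass[first:last].sum()

print(group_sum)
```

The loop runs once per unique desc_id rather than once per pair of subhalos, which is where the speedup over the double loop comes from.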
The array sum_of_coprogenitor_masses now stores the total mass of the descendant grouping associated with each subhalo in the earlier timestep. Now we use the crossmatch function to broadcast these results down into the descendant halos.
>>> from halotools.utils import crossmatch
>>> idxA, idxB = crossmatch(subhalos_z1['desc_id'], subhalos_z0['halo_id'])
>>> subhalos_z0['sum_of_progenitor_masses'] = 0.
>>> subhalos_z0['sum_of_progenitor_masses'][idxB] = subhalos_z1['coprogenitor_mass_sum'][idxA]
In the above calculation, because of the way we set up the fake data, not every subhalos_z1 halo has a descendant that appears in the subhalos_z0 catalog. We can verify this using the output of the crossmatch function as follows:
>>> subhalos_z1['has_match'] = False
>>> subhalos_z1['has_match'][idxA] = True
>>> assert not np.all(subhalos_z1['has_match'] == True)
That did not impact our final calculation because of the way crossmatch works: the indexing array idxA has no entries corresponding to subhalos_z1 subhalos with no matching descendant.
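The behavior described above can be illustrated with a small lookup built on np.searchsorted. The function toy_crossmatch below is a hypothetical stand-in written for this example. It is not the halotools implementation, and the ordering of its outputs may differ from what crossmatch returns. It assumes the second array is non-empty and contains unique values.

```python
import numpy as np

def toy_crossmatch(x, y):
    """Hypothetical stand-in for a crossmatch-style lookup.

    Returns (idx_x, idx_y) such that x[idx_x] == y[idx_y].
    Assumes y is non-empty and its entries are unique; x may contain repeats.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(y)
    y_sorted = y[order]
    # Candidate position of each x value within the sorted y array.
    pos = np.minimum(np.searchsorted(y_sorted, x), len(y) - 1)
    matched = y_sorted[pos] == x
    idx_x = np.flatnonzero(matched)   # entries of x that have a match in y
    idx_y = order[pos[matched]]       # where each match lives in the original y
    return idx_x, idx_y

desc_id = np.array([2, 0, 2, 7, 0, 2])   # 7 has no counterpart in halo_id
halo_id = np.array([3, 0, 2, 1])
idx_x, idx_y = toy_crossmatch(desc_id, halo_id)
```

Index 3 of desc_id (the value 7) never appears in idx_x, so an assignment of the form target[idx_y] = source[idx_x] silently skips progenitors whose descendant is missing.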
Now let’s ask a slightly more complicated question and exclude the main progenitor mass from the sum. This will tell us how much mass each subhalos_z0 subhalo gained by merging with distinct subhalos. We’ll do this by first sorting within each desc_id grouping by mass, and then excluding the final row, which corresponds to the most massive progenitor.
>>> subhalos_z1.sort(['desc_id', 'halo_mass'])
>>> grouping_key = 'desc_id'
>>> requested_columns = ['halo_mass']
>>> group_gen = group_member_generator(subhalos_z1, grouping_key, requested_columns)
Because of the two-variable sort, within each grouping the most massive progenitor (mmp) will appear last, which makes it easy to iterate over the generator and exclude the mmp from the sum:
>>> sum_of_merging_masses_no_mmp = np.zeros(num_subhalos_z1) - 1.
>>> for first, last, member_props in group_gen:
...     masses = member_props[0]
...     sum_of_merging_masses_no_mmp[first:last] = np.sum(masses[:-1])
>>> subhalos_z1['non_mmp_coprogenitor_mass_sum'] = sum_of_merging_masses_no_mmp
Just as before, we broadcast the newly-added column down to the descendant halos:
>>> idxA, idxB = crossmatch(subhalos_z1['desc_id'], subhalos_z0['halo_id'])
>>> subhalos_z0['mass_gain_from_mergers'] = 0.
>>> subhalos_z0['mass_gain_from_mergers'][idxB] = subhalos_z1['non_mmp_coprogenitor_mass_sum'][idxA]
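As a sanity check on the two calculations, the mass gained from mergers should equal the total progenitor mass minus the mass of the most massive progenitor. The sketch below verifies that identity on toy data using NumPy's reduceat; the arrays are hypothetical and assumed to be sorted by desc_id.

```python
import numpy as np

# Toy progenitor catalog sorted by desc_id (hypothetical values).
desc_id = np.array([0, 0, 2, 2, 2, 7])
halo_mass = np.array([2.0, 5.0, 1.0, 3.0, 6.0, 4.0])

# Start index of each contiguous desc_id group.
_, first_idx = np.unique(desc_id, return_index=True)

# Per-group reductions over the slices that begin at first_idx.
total_mass = np.add.reduceat(halo_mass, first_idx)
mmp_mass = np.maximum.reduceat(halo_mass, first_idx)
merger_mass = total_mass - mmp_mass

print(merger_mass)
```

A descendant with a single progenitor has zero mass gain from mergers, which matches the behavior of np.sum(masses[:-1]) on a one-element group.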
For further demonstrations of how to use group_member_generator, see Galaxy Catalog Analysis Example: Galaxy properties as a function of halo mass and Halo Catalog Analysis Example: halo properties as a function of host halo mass.