# Calculating the Sum of Halo Progenitor Masses

This section of the documentation describes how to use the `crossmatch` and `group_member_generator` utility functions to analyze subhalo merger trees. Many more complicated merger-tree analyses can be built by following the basic patterns shown here, which speed up naive algorithms by several orders of magnitude.

Halos gain mass through a combination of mergers and smooth accretion. If you have two catalogs of subhalos at successive snapshots, `subhalos_z0` and `subhalos_z1`, and if the catalog at the earlier timestep, `subhalos_z1`, contains a column specifying the ID of the halo that each subhalo descends into, then there is sufficient information to compute the sum of the progenitor masses for every object in `subhalos_z0`. The naive algorithm for this calculation is a double `for` loop with a blind lookup at every step, which quickly becomes prohibitively slow for the subhalo catalogs of modern simulations. The `crossmatch` and `group_member_generator` utility functions speed up this calculation considerably, as demonstrated below.
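For concreteness, the naive approach might look like the following sketch. The function name and the toy arrays are hypothetical and are not part of Halotools. With the catalog sizes used in this tutorial (47893 descendants and 58105 progenitors), the inner comparison would run roughly 2.8 billion times.

```python
import numpy as np


def naive_progenitor_mass_sum(halo_id_z0, desc_id_z1, halo_mass_z1):
    """Naive baseline: for each descendant, scan every progenitor.

    Hypothetical helper for illustration only; the cost scales as
    len(halo_id_z0) * len(desc_id_z1).
    """
    result = np.zeros(len(halo_id_z0))
    for i, hid in enumerate(halo_id_z0):
        for j, did in enumerate(desc_id_z1):
            if did == hid:  # blind lookup at every step
                result[i] += halo_mass_z1[j]
    return result


# Tiny hand-made catalogs (values chosen only for illustration)
halo_id_z0 = np.array([0, 1, 2])
desc_id_z1 = np.array([1, 0, 1, 7])  # descendant 7 is absent at the later snapshot
halo_mass_z1 = np.array([1.0, 2.0, 3.0, 4.0])

naive_sums = naive_progenitor_mass_sum(halo_id_z0, desc_id_z1, halo_mass_z1)
```

Here halo 1 receives the masses of its two progenitors, halo 2 has no progenitors and stays at zero, and the progenitor whose descendant is missing contributes nothing.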

First we create some fake data for demonstration purposes. In the setup below, the `subhalos_z1` catalog is from the snapshot immediately prior to the `subhalos_z0` catalog. The `desc_id` column stores the `halo_id` of the subhalo that each member of `subhalos_z1` descends into; the same `desc_id` can appear multiple times in the `subhalos_z1` catalog, and there need not be a matching `halo_id` in the `subhalos_z0` catalog.

```
>>> from astropy.table import Table
>>> import numpy as np
```

```
>>> subhalos_z0 = Table()
>>> num_subhalos_z0 = 47893
>>> subhalos_z0['halo_id'] = np.arange(num_subhalos_z0).astype('i8')
```

```
>>> subhalos_z1 = Table()
>>> num_subhalos_z1 = 58105
>>> subhalos_z1['halo_id'] = np.arange(num_subhalos_z0, num_subhalos_z0+num_subhalos_z1).astype('i8')
>>> # Drawing desc_id from twice the range of z0 halo_ids leaves some descendants unmatched
>>> subhalos_z1['desc_id'] = np.random.randint(0, 2*num_subhalos_z0, num_subhalos_z1)
>>> subhalos_z1['halo_mass'] = np.random.uniform(1e10, 1e15, num_subhalos_z1)
```

Now sort the subhalos in the earlier snapshot so that subhalos with a common descendant are grouped together, and build the `group_member_generator` so that each iteration yields the masses of the progenitors of a single descendant.

```
>>> from halotools.utils import group_member_generator
>>> subhalos_z1.sort('desc_id')
>>> grouping_key = 'desc_id'
>>> requested_columns = ['halo_mass']
>>> group_gen = group_member_generator(subhalos_z1, grouping_key, requested_columns)
```
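To see why the table must be sorted first, here is a NumPy-only sketch of the grouping pattern. It is an illustration under simplifying assumptions, not the Halotools implementation, and the arrays are hypothetical. Once equal keys are contiguous, every group is described by a `first` and `last` index, and its members are a plain slice.

```python
import numpy as np

# Hypothetical, already-sorted grouping key and a matching property column
desc_id = np.array([3, 3, 5, 8, 8, 8])
halo_mass = np.array([1.0, 2.0, 4.0, 1.0, 1.0, 3.0])

# Index of the first row of each group; the next group's start is this group's end
_, first_indices = np.unique(desc_id, return_index=True)
last_indices = np.append(first_indices[1:], len(desc_id))

group_sums = np.zeros(len(desc_id))
for first, last in zip(first_indices, last_indices):
    masses = halo_mass[first:last]         # all members of this group
    group_sums[first:last] = masses.sum()  # broadcast the sum to every member
```

The loop runs once per unique key rather than once per row, which is where the savings over the double loop come from.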

Now we iterate over the newly created generator:

```
>>> sum_of_coprogenitor_masses = np.zeros(num_subhalos_z1)
>>> for first, last, member_props in group_gen:
...    masses = member_props[0]
...    sum_of_coprogenitor_masses[first:last] = np.sum(masses)
>>> subhalos_z1['coprogenitor_mass_sum'] = sum_of_coprogenitor_masses
```

The loop above executes once for each unique `desc_id` that appears in `subhalos_z1`, and each iteration yields the masses of all `subhalos_z1` subhalos that share that descendant, including the main progenitor. The array `sum_of_coprogenitor_masses` now stores, for each subhalo in the earlier timestep, the total mass of all subhalos that share its descendant. Now we use the `crossmatch` function to broadcast these results down to the descendant halos.

```
>>> from halotools.utils import crossmatch
>>> idxA, idxB = crossmatch(subhalos_z1['desc_id'], subhalos_z0['halo_id'])
>>> subhalos_z0['sum_of_progenitor_masses'] = 0.
>>> subhalos_z0['sum_of_progenitor_masses'][idxB] = subhalos_z1['coprogenitor_mass_sum'][idxA]
```

Because of the way we set up the fake data, not every `subhalos_z1` halo in the above calculation has a descendant that appears in the `subhalos_z0` catalog. We can verify this using the indices returned by `crossmatch`:

```
>>> subhalos_z1['has_match'] = False
>>> subhalos_z1['has_match'][idxA] = True
>>> assert not np.all(subhalos_z1['has_match'])
```

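That did not impact our final calculation because of the way `crossmatch` works: the indexing array `idxA` has no entries corresponding to `subhalos_z1` subhalos with no matching descendant. Likewise, any `subhalos_z0` halo without a progenitor in `subhalos_z1` is absent from `idxB` and keeps its initial value of zero.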
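The matching behavior described above can be imitated with plain NumPy. The function below is a hypothetical stand-in written for illustration, not the Halotools source; it assumes the second array is non-empty and has no repeated entries, and the order of the indices returned by the real `crossmatch` may differ.

```python
import numpy as np


def crossmatch_sketch(x, y):
    """Return (idx_x, idx_y) such that x[idx_x] == y[idx_y].

    Hypothetical illustration: x may contain repeats and unmatched values,
    while y is assumed to be non-empty with unique entries.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(y)
    y_sorted = y[order]
    pos = np.searchsorted(y_sorted, x)
    pos = np.clip(pos, 0, len(y) - 1)   # guard against positions past the end
    matched = y_sorted[pos] == x        # False where x has no counterpart in y
    idx_x = np.flatnonzero(matched)
    idx_y = order[pos[matched]]
    return idx_x, idx_y


desc_id = np.array([1, 0, 1, 7])  # 7 has no counterpart in halo_id
halo_id = np.array([0, 1, 2])
idx_x, idx_y = crossmatch_sketch(desc_id, halo_id)
```

The unmatched row (index 3) never appears in `idx_x`, and halo 2, which has no progenitor, never appears in `idx_y`, so an assignment of the form `target[idx_y] = source[idx_x]` touches only matched pairs.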

Now let’s ask a slightly more complicated question, and exclude the main progenitor mass from the sum. This will tell us how much mass each `subhalos_z0` halo gained through mergers with other subhalos. We’ll do this by first sorting the subhalos within each `desc_id` grouping by mass, and then excluding the final row of each grouping, which corresponds to the most massive progenitor.

```
>>> subhalos_z1.sort(['desc_id', 'halo_mass'])
>>> grouping_key = 'desc_id'
>>> requested_columns = ['halo_mass']
>>> group_gen = group_member_generator(subhalos_z1, grouping_key, requested_columns)
```

Because of the two-variable sort, the most massive progenitor (mmp) appears last within each grouping, which makes it easy to iterate over the generator and exclude the mmp from the sum:

```
>>> sum_of_merging_masses_no_mmp = np.zeros(num_subhalos_z1) - 1.
>>> for first, last, member_props in group_gen:
...    masses = member_props[0]
...    sum_of_merging_masses_no_mmp[first:last] = np.sum(masses[:-1])
>>> subhalos_z1['non_mmp_coprogenitor_mass_sum'] = sum_of_merging_masses_no_mmp
```

Just as before, we broadcast the newly added column down to the descendant halos:

```
>>> idxA, idxB = crossmatch(subhalos_z1['desc_id'], subhalos_z0['halo_id'])
>>> subhalos_z0['mass_gain_from_mergers'] = 0.
>>> subhalos_z0['mass_gain_from_mergers'][idxB] = subhalos_z1['non_mmp_coprogenitor_mass_sum'][idxA]
```
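As a sanity check on the "exclude the last row" pattern, the following NumPy-only sketch works through a tiny hand-made example. All arrays are hypothetical, and `np.lexsort` stands in for the two-column table sort.

```python
import numpy as np

# Hypothetical progenitors: descendant 10 has three, 11 and 99 have one each
desc_id = np.array([10, 11, 10, 10, 99])
halo_mass = np.array([5.0, 7.0, 1.0, 2.0, 3.0])

# Sort by desc_id, then by mass within each group
# (np.lexsort treats the LAST key in the tuple as the primary sort key)
order = np.lexsort((halo_mass, desc_id))
desc_sorted, mass_sorted = desc_id[order], halo_mass[order]

unique_desc, first = np.unique(desc_sorted, return_index=True)
last = np.append(first[1:], len(desc_sorted))

# Sum every member except the final (most massive) row of each group
merger_mass = np.array([mass_sorted[f:l][:-1].sum() for f, l in zip(first, last)])
mass_gain = dict(zip(unique_desc.tolist(), merger_mass.tolist()))
```

Descendant 10 gains 3.0 from its two smaller progenitors, while descendants with a single progenitor get exactly zero because the sum over an empty slice is zero. Every row belongs to some group, so no entry is left at the initial value of -1 used in the loop above.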

For further demonstrations of how to use `group_member_generator`, see "Galaxy Catalog Analysis Example: Galaxy properties as a function of halo mass" and "Halo Catalog Analysis Example: halo properties as a function of host halo mass".