group_member_generator
- halotools.utils.group_member_generator(data, grouping_key, requested_columns)
Generator used to loop over grouped data and yield requested properties of members of a group. When running a for loop over group_member_generator, you are repeatedly sent arrays storing properties of the data entries that share a common grouping_key. This lets you perform any intra-group calculation you wish, once per group. The generator also sends you the indices of the input data corresponding to the yielded group members, allowing you to create new columns for your data table storing the results of your intra-group calculations.
Before calling group_member_generator, the input data must be sorted by the grouping_key so that data[grouping_key] is monotonic.
Common applications of group_member_generator include subhalo analysis (e.g., calculating host halo mass) and galaxy group analysis (e.g., calculating total stellar mass or group-centric position). The Examples section below shows basic usage. There are also three tutorials demonstrating common applications in more detail.
- Parameters:
- data: Structured Numpy ndarray or Astropy Table
- grouping_key: string
  Name of the column that defines how the input data are grouped, e.g., group_id or halo_hostid. The input data must be sorted such that the array stored in data[grouping_key] is monotonic.
- requested_columns: list of strings
  List of column names that will be yielded by the generator. As you loop over the generator, one array is yielded for every string entry in requested_columns. It is permissible for requested_columns to be an empty list, in which case the group_data_list yielded at each iteration will also be an empty list.
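The sortedness requirement on grouping_key can be checked before building the generator. The following is a minimal sketch on a toy structured array, using only Numpy. The helper name is_non_decreasing and the mass column are hypothetical and chosen for illustration; they are not part of Halotools.

```python
import numpy as np

# Toy structured array; "group_id" plays the role of grouping_key.
toy = np.array(
    [(30, 4.0), (10, 5.0), (20, 7.0), (10, 1.0)],
    dtype=[("group_id", "i8"), ("mass", "f8")],
)


def is_non_decreasing(values):
    """Return True if each entry is >= the entry before it."""
    values = np.asarray(values)
    return bool(np.all(values[1:] >= values[:-1]))


# The raw toy data are not sorted by group_id.
unsorted_ok = is_non_decreasing(toy["group_id"])

# Sorting a structured array by a named field makes that column monotonic.
toy_sorted = np.sort(toy, order="group_id")
sorted_ok = is_non_decreasing(toy_sorted["group_id"])
```

For an Astropy Table, the in-place sort call used in the Examples section serves the same purpose as np.sort here.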
- Returns:
- first_idx, last_idx: int
  These two integers provide the indices of the rows of the input data yielded at each iteration.
- group_data_list: list
  List of arrays storing the requested group member properties. There will be one element of group_data_list for every element of the input requested_columns. Each element is a Numpy ndarray with a length equal to the number of members of the group.
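To make the shape of the yielded values concrete, here is a rough stand-in that mimics the documented signature on a toy array. It is not the Halotools implementation: the name toy_group_generator and the mass column are hypothetical, and the sketch assumes that last_idx is exclusive, as the slice result[first:last] in the Examples suggests.

```python
import numpy as np


def toy_group_generator(data, grouping_key, requested_columns):
    """Hypothetical stand-in that yields (first_idx, last_idx, group_data_list).

    Assumes data[grouping_key] is already sorted, so that each group
    occupies one contiguous block of rows.
    """
    keys = np.asarray(data[grouping_key])
    # For sorted keys, return_index gives the first row of each group
    # and return_counts gives the number of members in each group.
    _, first_rows, counts = np.unique(keys, return_index=True, return_counts=True)
    for first, count in zip(first_rows, counts):
        last = first + count  # exclusive upper bound of the group's rows
        group_data_list = [data[col][first:last] for col in requested_columns]
        yield int(first), int(last), group_data_list


toy = np.array(
    [(10, 5.0), (10, 1.0), (10, 2.0), (20, 7.0), (30, 4.0), (30, 3.0)],
    dtype=[("group_id", "i8"), ("mass", "f8")],
)

# One (first, last, masses) entry per group.
chunks = [
    (first, last, props[0].tolist())
    for first, last, props in toy_group_generator(toy, "group_id", ["mass"])
]

# With no requested columns, each iteration yields an empty list.
empty_lists = [props for _, _, props in toy_group_generator(toy, "group_id", [])]
```

Each iteration corresponds to one group, and the two integers select that group's rows in any array aligned with the input data.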
Examples
First let’s retrieve a Halotools-formatted halo catalog storing some randomly generated data.
>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> halos = halocat.halo_table
As described in Rockstar halo and subhalo nomenclature conventions, halo_hostid is a natural grouping key for a halo table. Let’s use this key to calculate the host halo mass of all halos in the data table.
First we build the generator:
>>> from halotools.utils import group_member_generator
>>> halos.sort(['halo_hostid', 'halo_upid'])
>>> grouping_key = 'halo_hostid'
>>> requested_columns = ['halo_mvir']
>>> group_gen = group_member_generator(halos, grouping_key, requested_columns)
Then we loop over it:
>>> import numpy as np
>>> result = np.zeros(len(halos))
>>> for first, last, member_props in group_gen:
...     masses = member_props[0]
...     host_mass = masses[0]
...     result[first:last] = host_mass
>>> halos['halo_mvir_host_halo'] = result
Inside the scope of the loop, the first two yielded integers allow us to access the appropriate slice of the array being calculated. The member_props list stores only a single element, the masses array, which holds the value of halo_mvir for each member of the host + subhalo system. Because we have sorted the halos by both halo_hostid and halo_upid, the host system appears first within each halo_hostid grouping, since -1 is smaller than any value of halo_upid stored by a subhalo. Thus by selecting the first element of the masses array, we select the virial mass of the host halo.
We can also use the group_member_generator to compute more complicated quantities. For example, let’s calculate the mass-weighted mean spin of the members of each group. Note that our halo table is already sorted, so we save CPU time by not re-sorting it.
>>> grouping_key = 'halo_hostid'
>>> requested_columns = ['halo_mvir', 'halo_spin']
>>> group_gen = group_member_generator(halos, grouping_key, requested_columns)
>>> result = np.zeros(len(halos))
>>> for first, last, member_props in group_gen:
...     masses = member_props[0]
...     spins = member_props[1]
...     # Normalize by the total mass so the result is a mass-weighted mean
...     mass_weighted_avg_spin = np.sum(masses*spins)/np.sum(masses)
...     result[first:last] = mass_weighted_avg_spin
>>> halos['halo_mass_weighted_avg_spin'] = result
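A mass-weighted mean is conventionally normalized by the sum of the weights rather than by the number of members. The toy sketch below, with made-up masses and spins for a single group, compares the hand-written expression against np.average and against the unweighted mean.

```python
import numpy as np

# Made-up values for one group: a massive host and two light subhalos.
masses = np.array([4.0, 1.0, 1.0])
spins = np.array([0.10, 0.04, 0.02])

# Mass-weighted mean: sum of (mass * spin) divided by the total mass.
weighted_by_hand = np.sum(masses * spins) / np.sum(masses)

# np.average with weights computes the same quantity.
weighted_numpy = np.average(spins, weights=masses)

# The unweighted mean ignores mass and treats every member equally.
unweighted = np.mean(spins)
```

Because the most massive member here also has the largest spin, the weighted mean is pulled above the unweighted one.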