Creating value-added halo catalogs through cross-matching
All halo catalogs come with an integer ID column providing a unique
identifier of the (sub)halo in the catalog. This tutorial demonstrates
two examples of how you can use the crossmatch function to exploit this
column to create “value-added” versions of your halo catalogs.
In Example 1, we’ll show how to combine information from two partially
overlapping halo catalogs. In Example 2, we’ll show how to create new
columns for a subhalo catalog storing the properties of the host halo,
e.g., host mass \(M_{\rm vir}^{\rm host}\).
For a closely related tutorial, see Cross-matching galaxy and halo catalogs.
Example 1: Combining information from different halo catalogs
When analyzing halo catalogs, you will often have two versions of the same
catalog, one containing halo properties that you wish to transfer to the other.
In general, the two versions may only partially overlap,
as different cuts may have been applied to each.
We’ll demonstrate this scenario using the FakeSim
halo catalog, which is randomly generated on the fly, but the
same calculation applies equally well to real halo catalogs,
or more generally to any structured data table with an object ID.
>>> from halotools.sim_manager import FakeSim
>>> halocat1 = FakeSim()
>>> halo_table1 = halocat1.halo_table
>>> halocat2 = FakeSim()
>>> mask = halocat2.halo_table['halo_mvir'] > 1e11
>>> halo_table2 = halocat2.halo_table[mask]
Now let’s add a new column to halo_table2 and use the crossmatch function
to transfer this information to halo_table1. The function returns two index
arrays: the first selects the rows of halo_table1 that have a match in
halo_table2, and the second selects the corresponding matching rows of
halo_table2, in the same order.
>>> import numpy as np
>>> halo_table2['some_new_column'] = np.random.random(len(halo_table2))
The halo catalog column halo_id stores a long integer giving a unique identifier
to every halo and subhalo in the halo catalog, so we can use that column
to match one object to the other.
>>> from halotools.utils import crossmatch
>>> halo_table1['transferred_column'] = np.zeros(len(halo_table1), dtype=halo_table2['some_new_column'].dtype)
>>> idx_table1, idx_table2 = crossmatch(halo_table1['halo_id'], halo_table2['halo_id'])
>>> halo_table1['transferred_column'][idx_table1] = halo_table2['some_new_column'][idx_table2]
Now, for those objects in halo_table1 that are also in halo_table2,
the values from some_new_column are stored in transferred_column;
rows without a matching entry keep their initial value of zero.
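To make the index semantics concrete, here is a minimal NumPy-only sketch on toy ID arrays. The helper crossmatch_sketch is hypothetical and written only for illustration; it is not the halotools implementation, and it assumes the IDs in its second argument are unique. The sketch also builds a boolean mask, has_match (an illustrative name), because a zero in the transferred column cannot by itself distinguish an unmatched row from a matched value that happens to be zero.

```python
import numpy as np


def crossmatch_sketch(x, y):
    """Illustrative stand-in (not the halotools code).

    Returns (idx_x, idx_y) such that x[idx_x] == y[idx_y].
    Assumes the entries of y are unique; x may contain repeats.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if y.size == 0:
        empty = np.array([], dtype=int)
        return empty, empty
    order = np.argsort(y)                  # indices that sort y
    y_sorted = y[order]
    pos = np.searchsorted(y_sorted, x)     # candidate slot of each x in sorted y
    pos = np.clip(pos, 0, y.size - 1)      # keep candidate slots in bounds
    found = y_sorted[pos] == x             # True where x really occurs in y
    return np.flatnonzero(found), order[pos[found]]


# Toy data: only IDs 12 and 14 appear in both "catalogs".
ids1 = np.array([10, 11, 12, 13, 14])
ids2 = np.array([14, 12, 99])
some_new_column = np.array([0.4, 0.2, 0.9])

idx1, idx2 = crossmatch_sketch(ids1, ids2)

# Transfer the values, leaving unmatched rows at zero.
transferred = np.zeros(len(ids1))
transferred[idx1] = some_new_column[idx2]

# Record explicitly which rows of the first catalog were matched.
has_match = np.zeros(len(ids1), dtype=bool)
has_match[idx1] = True
```

Keeping a mask like has_match alongside the transferred column makes it straightforward to exclude unmatched rows from later statistics.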
Example 2: Transferring host halo properties to their subhalos
When analyzing catalogs that include subhalos, a very common calculation
is to group subhalos according to some property of the
host halo, such as host halo mass. Such calculations become easy when there is a
column in your data table storing the associated host halo property.
To create such a column, you need to cross-match the halo_hostid column
against the halo_id column.
As described in Rockstar halo and subhalo nomenclature conventions, for the case of subhalos,
the halo_hostid column points to the halo_id of the host halo.
So we use the crossmatch function to add new columns to
the halo catalog such that some property of the host halo is transferred onto
all of its subhalos.
>>> halocat = FakeSim()
>>> t = halocat.halo_table
>>> idx_table1, idx_table2 = crossmatch(t['halo_hostid'], t['halo_id'])
>>> t['host_halo_mvir'] = t['halo_mvir']  # initialize the new column with each halo's own mass
>>> t['host_halo_mvir'][idx_table1] = t['halo_mvir'][idx_table2]
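With the host mass stored as a column, grouping subhalos by host mass reduces to a mask and a histogram. The toy arrays below are made up for illustration, and the sketch assumes that host halos carry their own halo_id in halo_hostid, so that subhalos are the rows where the two columns differ. The dictionary lookup stands in for crossmatch only to keep the example free of dependencies beyond NumPy; the bin edges are likewise arbitrary.

```python
import numpy as np

# Toy catalog (made-up values): halos 1 and 3 are hosts, the rest are subhalos.
halo_id = np.array([1, 2, 3, 4, 5])
halo_hostid = np.array([1, 1, 3, 3, 3])
halo_mvir = np.array([1e13, 1e11, 5e14, 2e12, 3e11])

# Look up each row's host mass by ID; fall back to the row's own mass
# if the host is missing from the catalog.
mass_by_id = dict(zip(halo_id.tolist(), halo_mvir.tolist()))
host_halo_mvir = np.array([
    mass_by_id.get(hostid, own_mass)
    for hostid, own_mass in zip(halo_hostid.tolist(), halo_mvir.tolist())
])

# Assumption: a subhalo is any row whose host ID differs from its own ID.
is_subhalo = halo_hostid != halo_id

# Count subhalos in bins of host halo mass (arbitrary illustrative edges).
bin_edges = np.array([1e12, 1e14, 1e15])
subhalo_counts, _ = np.histogram(host_halo_mvir[is_subhalo], bins=bin_edges)
```

Here subhalo_counts holds the number of subhalos whose host falls in each mass bin; the same mask-and-bin pattern applies to any other transferred host property.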