crossmatch
- halotools.utils.crossmatch(x, y, skip_bounds_checking=False)
Finds where the elements of x appear in the array y, including repeats.
The elements in x may be repeated, but the elements in y must be unique. The arrays x and y may be only partially overlapping.
This function is used to cross-match two catalogs or data tables that share an object ID. For example, if you have a primary data table and a secondary data table containing supplementary information about (some of) the objects, the crossmatch function can be used to “value-add” the primary table with data from the second.
For another example, suppose you have a single data table with an object ID column and also a “host” ID column (e.g., halo_hostid in Halotools-provided catalogs). You can use the crossmatch function to create new columns storing properties of the associated host.
See Creating value-added halo catalogs through cross-matching and Cross-matching galaxy and halo catalogs for tutorials on common usages of this function with halo and galaxy catalogs.
- Parameters:
- x : integer array
Array of integers with possibly repeated entries.
- y : integer array
Array of unique integers.
- skip_bounds_checking : bool, optional
The first step in the crossmatch function is to test that the input arrays satisfy the assumptions of the algorithm (namely that x and y store integers, and that all values in y are unique). If skip_bounds_checking is set to True, this testing is bypassed and the function evaluates faster. Default is False.
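The assumptions tested by the bounds check can be illustrated with a small NumPy sketch. The helper name satisfies_crossmatch_assumptions is hypothetical and is not part of Halotools; it only mirrors the conditions described above (integer-valued x and y, unique y), and the library's own checks may differ in detail.

```python
import numpy as np


def satisfies_crossmatch_assumptions(x, y):
    """Hypothetical helper (not a Halotools function).

    Returns True if x and y are 1-d, integer-valued, and y has no repeats.
    """
    x = np.atleast_1d(x)
    y = np.atleast_1d(y)
    if x.ndim != 1 or y.ndim != 1:
        return False
    # A value is integer-valued if casting it to int64 leaves it unchanged
    x_is_integer = np.all(x.astype(np.int64) == x)
    y_is_integer = np.all(y.astype(np.int64) == y)
    # y has no repeats if removing duplicates does not shrink it
    y_is_unique = len(np.unique(y)) == len(y)
    return bool(x_is_integer and y_is_integer and y_is_unique)


print(satisfies_crossmatch_assumptions([3, 3, 5], [5, 3, 7]))  # True
print(satisfies_crossmatch_assumptions([1, 2], [4, 4]))        # False: y repeats
print(satisfies_crossmatch_assumptions([1.5, 2], [1, 2]))      # False: x not integer
```

Checks of this kind scan both arrays, which is why bypassing them can be worthwhile when the inputs are already known to be valid.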
- Returns:
- idx_x : integer array
Integer array of indices into x such that x[idx_x] == y[idx_y].
- idx_y : integer array
Integer array of indices into y such that x[idx_x] == y[idx_y].
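To make the relation x[idx_x] == y[idx_y] concrete, here is a minimal sort-based sketch written in plain NumPy. The function crossmatch_sketch is a hypothetical stand-in, not the Halotools implementation; it assumes integer inputs with unique y and skips all input validation.

```python
import numpy as np


def crossmatch_sketch(x, y):
    """Illustrative stand-in for a crossmatch-like function (an assumption,
    not the library's code). Returns index arrays (idx_x, idx_y)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(y) == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty
    # Work on sorted copies and remember how to undo the sorting
    order_x = np.argsort(x, kind="stable")
    order_y = np.argsort(y)
    x_sorted = x[order_x]
    y_sorted = y[order_y]
    # Candidate position in sorted y for every sorted x value
    pos = np.searchsorted(y_sorted, x_sorted)
    pos = np.minimum(pos, len(y_sorted) - 1)  # keep positions in bounds
    has_match = y_sorted[pos] == x_sorted
    # Map positions in the sorted arrays back to the original arrays
    return order_x[has_match], order_y[pos[has_match]]


x = np.array([5, 2, 7, 2, 9])   # repeats allowed
y = np.array([2, 5, 8])         # unique values
idx_x, idx_y = crossmatch_sketch(x, y)
print(x[idx_x])  # [2 2 5]
print(y[idx_y])  # [2 2 5]
```

The values 7 and 9 in x and the value 8 in y have no counterpart, so they do not appear in either index array, and the matched values come out in sorted order.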
Notes
The matching between x and y is done on the sorted arrays. A consequence of this is that x[idx_x] and y[idx_y] will generally be a subset of x and y in sorted order.
Examples
Let’s create some fake data to demonstrate basic usage of the function. First, let’s suppose we have two tables of objects, table1 and table2. There are no repeated elements in either table, but the tables only partially overlap. The example below demonstrates how to transfer column data from table2 into table1 for the subset of objects that appear in both tables.
>>> import numpy as np
>>> from astropy.table import Table
>>> from halotools.utils import crossmatch
>>> num_table1 = int(1e6)
>>> x = np.random.rand(num_table1)
>>> objid = np.arange(num_table1)
>>> table1 = Table({'x': x, 'objid': objid})
>>> num_table2 = int(1e6)
>>> objid = np.arange(5e5, num_table2+5e5)
>>> y = np.random.rand(num_table2)
>>> table2 = Table({'y': y, 'objid': objid})
Note that table1 and table2 only partially overlap. In the code below, we will initialize a new y column for table1, and for those rows with an objid that appears in both table1 and table2, we’ll transfer the values of y from table2 to table1.
>>> idx_table1, idx_table2 = crossmatch(table1['objid'].data, table2['objid'].data)
>>> table1['y'] = np.zeros(len(table1), dtype = table2['y'].dtype)
>>> table1['y'][idx_table1] = table2['y'][idx_table2]
Now we’ll consider a slightly more complicated example in which there are repeated entries in the input array x. Suppose in this case that our data x comes with a natural grouping, for example into those galaxies that occupy a common halo. If we have a separate table y that stores attributes of the group, we may wish to broadcast some group property such as total group mass among all the group members.
First create some new dummy data to demonstrate this application of the crossmatch function:
>>> num_galaxies = int(1e6)
>>> x = np.random.rand(num_galaxies)
>>> objid = np.arange(num_galaxies)
>>> num_groups = int(1e4)
>>> groupid = np.random.randint(0, num_groups, num_galaxies)
>>> galaxy_table = Table({'x': x, 'objid': objid, 'groupid': groupid})
>>> groupmass = np.random.rand(num_groups)
>>> groupid = np.arange(num_groups)
>>> group_table = Table({'groupmass': groupmass, 'groupid': groupid})
Now we use the crossmatch function to paint the appropriate value of groupmass onto each galaxy:
>>> idx_galaxies, idx_groups = crossmatch(galaxy_table['groupid'].data, group_table['groupid'].data)
>>> galaxy_table['groupmass'] = np.zeros(len(galaxy_table), dtype = group_table['groupmass'].dtype)
>>> galaxy_table['groupmass'][idx_galaxies] = group_table['groupmass'][idx_groups]
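In the examples above the new column is initialized to zero, so any row without a match keeps the value 0 and cannot be told apart from a genuine zero. One possible alternative, sketched below in plain NumPy on tiny hand-made arrays, is to initialize with NaN instead. The array names are illustrative assumptions, and the sketch relies on the group IDs already being sorted and unique so that np.searchsorted returns the matching row directly.

```python
import numpy as np

# Tiny illustrative data: the group with ID 8 is absent from the group arrays
galaxy_groupid = np.array([3, 1, 3, 8, 1, 4])
group_groupid = np.array([1, 3, 4])        # sorted, unique group IDs
group_mass = np.array([10.0, 30.0, 40.0])  # one mass per group

# Flag the galaxies whose group appears in the group arrays
has_group = np.isin(galaxy_groupid, group_groupid)

# Because group_groupid is sorted, searchsorted gives the row of each match
rows = np.searchsorted(group_groupid, galaxy_groupid[has_group])

# NaN marks galaxies with no matching group
galaxy_groupmass = np.full(len(galaxy_groupid), np.nan)
galaxy_groupmass[has_group] = group_mass[rows]
print(galaxy_groupmass)  # [30. 10. 30. nan 10. 40.]
```

Whether NaN or zero is the better placeholder depends on how the column is used downstream; NaN requires a floating-point column.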
See the tutorials for additional demonstrations of alternative uses of the crossmatch function.