crossmatch
- halotools.utils.crossmatch(x, y, skip_bounds_checking=False)
Finds where the elements of x appear in the array y, including repeats.
The elements in x may be repeated, but the elements in y must be unique. The arrays x and y may be only partially overlapping.
Applications of this function involve cross-matching two catalogs or data tables that share an object ID. For example, if you have a primary data table and a secondary data table containing supplementary information about (some of) the objects, the crossmatch function can be used to “value-add” the primary table with data from the secondary table.
As another example, suppose you have a single data table with an object ID column and also a “host” ID column (e.g., halo_hostid in Halotools-provided catalogs). You can then use the crossmatch function to create new columns storing properties of the associated host.
See the tutorials “Creating value-added halo catalogs through cross-matching” and “Cross-matching galaxy and halo catalogs” for common usages of this function with halo and galaxy catalogs.
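The contract described above can be illustrated without Halotools. The sketch below is a hypothetical pure-Python reference (`crossmatch_reference` is not part of Halotools, and it is not the library's algorithm). It returns index pairs satisfying x[idx_x] == y[idx_y]; the order in which the pairs are returned is an assumption of this sketch and may differ from the library's.

```python
def crossmatch_reference(x, y):
    """Hypothetical dict-based reference; not the Halotools implementation."""
    # Map each value of y to its position in y.
    position_in_y = {value: j for j, value in enumerate(y)}
    if len(position_in_y) != len(y):
        raise ValueError("y must contain unique integers")

    idx_x, idx_y = [], []
    for i, value in enumerate(x):
        j = position_in_y.get(value)
        if j is not None:  # this element of x also appears in y
            idx_x.append(i)
            idx_y.append(j)
    return idx_x, idx_y


x = [5, 3, 5, 9, 3]  # repeats are allowed in x
y = [3, 7, 5]        # unique values; 9 has no match and 7 is never used
idx_x, idx_y = crossmatch_reference(x, y)

matched_x = [x[i] for i in idx_x]
matched_y = [y[j] for j in idx_y]
print(matched_x == matched_y)
```

Every repeated entry of x that has a counterpart in y is matched, while entries without a counterpart are simply left out of both index lists.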
- Parameters:
- x : integer array
Array of integers with possibly repeated entries.
- y : integer array
Array of unique integers.
- skip_bounds_checking : bool, optional
The first step in the crossmatch function is to test that the input arrays satisfy the assumptions of the algorithm (namely, that x and y store integers and that all values in y are unique). If skip_bounds_checking is set to True, this testing is bypassed and the function evaluates faster. Default is False.
- Returns:
- idx_x : integer array
Integer array used to apply a mask to x such that x[idx_x] == y[idx_y]
- idx_y : integer array
Integer array used to apply a mask to y such that x[idx_x] == y[idx_y]
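The input assumptions listed under skip_bounds_checking (integer-valued x and y, unique y) can be sketched as an explicit check. The helper name `check_crossmatch_inputs` and its error messages are hypothetical and written only for illustration; the checks Halotools actually performs may differ in detail.

```python
import numpy as np


def check_crossmatch_inputs(x, y):
    """Hypothetical stand-in for the input checks described above."""
    x = np.atleast_1d(x)
    y = np.atleast_1d(y)
    # Integer-valued: casting to int64 must not change any value.
    if not np.all(x.astype(np.int64) == x):
        raise ValueError("x must store integers")
    if not np.all(y.astype(np.int64) == y):
        raise ValueError("y must store integers")
    # Unique: np.unique drops duplicates, so the lengths must agree.
    if len(np.unique(y)) != len(y):
        raise ValueError("all values in y must be unique")
    return x, y


x_checked, y_checked = check_crossmatch_inputs([1, 2, 2], [3, 4])
```

Checks of this kind scan both arrays, which is why bypassing them can be worthwhile when the inputs are already known to be valid.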
Notes
The matching between x and y is done on the sorted arrays. A consequence of this is that x[idx_x] and y[idx_y] will generally be a subset of x and y in sorted order.
Examples
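To make the ordering behavior described in the Notes concrete, here is a small sort-based matcher built on np.argsort and np.searchsorted. The function `crossmatch_sorted_sketch` is hypothetical and is not claimed to be the Halotools algorithm; it assumes y is non-empty and holds unique integers.

```python
import numpy as np


def crossmatch_sorted_sketch(x, y):
    """Hypothetical sort-based matcher; y must be non-empty and unique."""
    x = np.asarray(x)
    y = np.asarray(y)
    order_x = np.argsort(x, kind="stable")  # original positions of x, in sorted order
    order_y = np.argsort(y)
    x_sorted = x[order_x]
    y_sorted = y[order_y]

    # For each sorted x value, find where it would be inserted into sorted y.
    pos = np.searchsorted(y_sorted, x_sorted)
    pos = np.minimum(pos, len(y_sorted) - 1)  # keep indices in range
    has_match = y_sorted[pos] == x_sorted

    idx_x = order_x[has_match]
    idx_y = order_y[pos[has_match]]
    return idx_x, idx_y


x = np.array([9, 3, 5, 3, 1])
y = np.array([5, 2, 3])
idx_x, idx_y = crossmatch_sorted_sketch(x, y)
print(x[idx_x])  # [3 3 5]
print(y[idx_y])  # [3 3 5]
```

The matched values come out in ascending order rather than in the order they appear in x, which is the behavior the Notes describe.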
Let’s create some fake data to demonstrate basic usage of the function. First, suppose we have two tables of objects, table1 and table2. There are no repeated elements in either table, but the tables only partially overlap. The example below demonstrates how to transfer column data from table2 into table1 for the subset of objects that appear in both tables.
>>> import numpy as np
>>> from astropy.table import Table
>>> from halotools.utils import crossmatch
>>> num_table1 = int(1e6)
>>> x = np.random.rand(num_table1)
>>> objid = np.arange(num_table1)
>>> table1 = Table({'x': x, 'objid': objid})
>>> num_table2 = int(1e6)
>>> objid = np.arange(int(5e5), num_table2 + int(5e5))  # integer IDs offset by 5e5
>>> y = np.random.rand(num_table2)
>>> table2 = Table({'y': y, 'objid': objid})
Note that table1 and table2 only partially overlap. In the code below, we initialize a new y column for table1, and for those rows with an objid that appears in both table1 and table2, we transfer the values of y from table2 to table1.
>>> idx_table1, idx_table2 = crossmatch(table1['objid'].data, table2['objid'].data)
>>> table1['y'] = np.zeros(len(table1), dtype=table2['y'].dtype)
>>> table1['y'][idx_table1] = table2['y'][idx_table2]
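In the example above, rows of table1 with no counterpart in table2 keep the initial value of zero, which cannot be told apart from a genuine value of zero. One possible variant, sketched here with plain NumPy arrays standing in for the table columns, fills unmatched rows with NaN and records a boolean flag. Because both ID arrays are unique in this case, np.intersect1d with return_indices=True is used as a stand-in for crossmatch so that the sketch runs without Halotools; all variable names are illustrative.

```python
import numpy as np

objid1 = np.arange(0, 10)       # stand-in for table1['objid']
objid2 = np.arange(5, 15)       # stand-in for table2['objid']
y2 = np.linspace(0.0, 0.9, 10)  # stand-in for table2['y']

# Stand-in for crossmatch; valid here only because both ID arrays are unique.
_, idx_table1, idx_table2 = np.intersect1d(objid1, objid2, return_indices=True)

# NaN marks rows of table1 that have no counterpart in table2.
y1 = np.full(len(objid1), np.nan)
y1[idx_table1] = y2[idx_table2]

# Boolean flag recording which rows actually received a value.
has_match = np.zeros(len(objid1), dtype=bool)
has_match[idx_table1] = True
```

With the index arrays returned by crossmatch, the same two assignments apply unchanged; only the fill value and the extra flag column differ from the original example.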
Now we’ll consider a slightly more complicated example in which there are repeated entries in the input array x. Suppose in this case that our data x comes with a natural grouping, for example into those galaxies that occupy a common halo. If we have a separate table y that stores attributes of the group, we may wish to broadcast some group property, such as total group mass, among all the group members.
First create some new dummy data to demonstrate this application of the crossmatch function:
>>> num_galaxies = int(1e6)
>>> x = np.random.rand(num_galaxies)
>>> objid = np.arange(num_galaxies)
>>> num_groups = int(1e4)
>>> groupid = np.random.randint(0, num_groups, num_galaxies)
>>> galaxy_table = Table({'x': x, 'objid': objid, 'groupid': groupid})
>>> groupmass = np.random.rand(num_groups)
>>> groupid = np.arange(num_groups)
>>> group_table = Table({'groupmass': groupmass, 'groupid': groupid})
Now we use the crossmatch function to paint the appropriate value of groupmass onto each galaxy:
>>> idx_galaxies, idx_groups = crossmatch(galaxy_table['groupid'].data, group_table['groupid'].data)
>>> galaxy_table['groupmass'] = np.zeros(len(galaxy_table), dtype=group_table['groupmass'].dtype)
>>> galaxy_table['groupmass'][idx_galaxies] = group_table['groupmass'][idx_groups]
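A toy-sized version of the same broadcast makes the expected result easy to inspect by eye. The sketch below does not call crossmatch; it assumes the group IDs are sorted, unique, and cover every galaxy's groupid, so np.searchsorted can stand in for the matching step. All array names are illustrative.

```python
import numpy as np

galaxy_groupid = np.array([2, 0, 2, 1, 0])  # one entry per galaxy, with repeats
group_groupid = np.array([0, 1, 2])         # unique, sorted group IDs
groupmass = np.array([10.0, 20.0, 30.0])    # one mass per group

# Stand-in for crossmatch under the assumptions stated above:
# every galaxy has a group, so every galaxy index is matched.
idx_galaxies = np.arange(len(galaxy_groupid))
idx_groups = np.searchsorted(group_groupid, galaxy_groupid)

galaxy_groupmass = np.zeros(len(galaxy_groupid), dtype=groupmass.dtype)
galaxy_groupmass[idx_galaxies] = groupmass[idx_groups]
print(galaxy_groupmass)  # [30. 10. 30. 20. 10.]
```

Each galaxy receives the mass of its own group, and galaxies that share a group share the same value.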
See the tutorials for additional demonstrations of alternative uses of the crossmatch function.