Galaxy Catalog Analysis Example: Galaxy group identification

In this example, we’ll show how to start from a sample of mock galaxies and identify galaxy groups in an observationally realistic manner.

There is also an IPython Notebook in the following location that can be used as a companion to the material in this section of the tutorial:

halotools/docs/notebooks/galcat_analysis/basic_examples/galaxy_catalog_analysis_tutorial5.ipynb

By following this tutorial together with this notebook, you can play around with your own variations of the calculation as you learn the basic syntax.

Generate a mock galaxy catalog

Let’s start out by generating a mock galaxy catalog into an N-body simulation in the usual way. Here we’ll assume you have the z=0 rockstar halos for the bolshoi simulation, as this is the default halo catalog.

from halotools.empirical_models import PrebuiltSubhaloModelFactory
model = PrebuiltSubhaloModelFactory('behroozi10')
from halotools.sim_manager import CachedHaloCatalog
halocat = CachedHaloCatalog(simname = 'bolshoi', redshift = 0, halo_finder = 'rockstar')
model.populate_mock(halocat)

Our mock galaxies are stored in the galaxy_table of model.mock in the form of an Astropy Table.

Extract subsamples of galaxy positions

Galaxy group identification is conducted by the FoFGroups class. As described in Formatting your xyz coordinates for Mock Observables calculations, functions in the mock_observables package such FoFGroups take array inputs in a specific form: a (Npts, 3)-shape Numpy array. You can use the return_xyz_formatted_array convenience function for this purpose, which we will do after first throwing out galaxies below a stellar mass completeness cut of \(M_{\ast} > 10^{10.75}M_{\odot}\) for illustration purposes.

sample_mask = model.mock.galaxy_table['stellar_mass'] > 10**10.75
galaxy_sample = model.mock.galaxy_table[sample_mask]
x = galaxy_sample['x']
y = galaxy_sample['y']
z = galaxy_sample['z']
vz = galaxy_sample['vz']

from halotools.mock_observables import return_xyz_formatted_array

pos = return_xyz_formatted_array(x, y, z,
            velocity=vz, velocity_distortion_dimension = 'z')

Note that in the above code we have also applied redshift-space distortions to the positions, as this has an important effect on galaxy group identification.

Identify FoF galaxy groups

In order to identify FoF groups, it is necessary to choose a linking length in both the line-of-sight and transverse directions. These linking lengths are specified by \(b_{\rm para}\) and \(b_{\rm perp}\), respectively. We will choose values based on Berlind et al 2006, which were optimized to minimize bias in the group multiplicity function, but you can set these to any values you like.

from halotools.mock_observables import FoFGroups

b_para, b_perp = 0.7, 0.15
groups = FoFGroups(pos, b_perp, b_para,
                      Lbox = model.mock.Lbox, num_threads='max')

galaxy_sample['fof_group_id'] = groups.group_ids

The galaxy_sample storing the mock now has a column storing the group IDs as they would have been found in real observational data, including redshift-space distortion effects.

Determine group centrals and satellites

In the following calculation, we’ll use the group_member_generator in order to group our galaxy table together according to the FoF-determined group, and then calculate a few quantities based on group membership. You can read about the group_member_generator in its docstring. You may also find it useful to review Galaxy Catalog Analysis Example: Galaxy properties as a function of halo mass for usage similar to what follows, only with more exposition.

Here we perform another two-column sort. First, the galaxies are sorted by their FoF group ID, and then within each grouping, they are sorted by \(-M_{\ast}\), which will place the most massive galaxy first within each FoF group.

from halotools.utils import group_member_generator

galaxy_sample['negative_stellar_mass'] = -1*galaxy_sample['stellar_mass']
galaxy_sample.sort(['fof_group_id', 'negative_stellar_mass'])
grouping_key = 'fof_group_id'
requested_columns = []

group_gen = group_member_generator(galaxy_sample, grouping_key, requested_columns)

group_central = np.zeros(len(galaxy_sample), dtype=bool)
for first, last, member_props in group_gen:
    temp_result = [False for member in xrange(first, last)]
    temp_result[0] = True
    group_central[first:last] = temp_result

galaxy_sample['group_central'] = group_central

Let’s inspect our results

print(galaxy_sample[0:15])
halo_upid  halo_mpeak  halo_x ... stellar_mass fof_group_id group_central
---------- ---------- ------- ... ------------ ------------ -------------
        -1  2.549e+12 20.8524 ...  1.47289e+11            0          True
        -1  8.513e+11 20.6768 ...  8.08962e+10            0         False
        -1  1.237e+12 22.3349 ...  2.40317e+11            1          True
3058440575  3.237e+12 21.9039 ...  1.56945e+11            1         False
        -1  1.144e+14  21.812 ...  1.41576e+11            1         False
3058440575   1.39e+12 22.3782 ...  8.50839e+10            1         False
3058440575  1.163e+12 21.7744 ...  8.11016e+10            1         False
3058440575   2.56e+12 21.9585 ...  7.28933e+10            1         False
        -1  9.709e+13 26.1803 ...  4.35889e+11            2          True
        -1  1.869e+12 25.4072 ...   9.4756e+10            2         False
        -1  8.876e+11 25.4922 ...  9.16859e+10            2         False
        -1  1.373e+12  25.946 ...  9.06732e+10            2         False
3058441456  2.926e+13 25.6703 ...  1.56814e+11            3          True
3058441456  8.404e+11 26.3213 ...  8.19847e+10            4          True
        -1  4.076e+13 23.7934 ...  1.71193e+11            5          True

Calculating group richness \(N_{\rm group}\)

Now we’ll use the same generator to calculate the total number of members in each FoF group.

grouping_key = 'fof_group_id'
requested_columns = []

group_gen = group_member_generator(galaxy_sample, grouping_key, requested_columns)

group_richness = np.zeros(len(galaxy_sample), dtype=int)
for first, last, member_props in group_gen:
    group_richness[first:last] = last-first
galaxy_sample['group_richness'] = group_richness
print(galaxy_sample[0:15])
halo_upid  halo_mpeak  halo_x ... fof_group_id group_central group_richness
---------- ---------- ------- ... ------------ ------------- --------------
        -1  2.549e+12 20.8524 ...            0          True              2
        -1  8.513e+11 20.6768 ...            0         False              2
        -1  1.237e+12 22.3349 ...            1          True              6
3058440575  3.237e+12 21.9039 ...            1         False              6
        -1  1.144e+14  21.812 ...            1         False              6
3058440575   1.39e+12 22.3782 ...            1         False              6
3058440575  1.163e+12 21.7744 ...            1         False              6
3058440575   2.56e+12 21.9585 ...            1         False              6
        -1  9.709e+13 26.1803 ...            2          True              4
        -1  1.869e+12 25.4072 ...            2         False              4
        -1  8.876e+11 25.4922 ...            2         False              4
        -1  1.373e+12  25.946 ...            2         False              4
3058441456  2.926e+13 25.6703 ...            3          True              1
3058441456  8.404e+11 26.3213 ...            4          True              1
        -1  4.076e+13 23.7934 ...            5          True              1

Calculate true halo mass of group central \(M_{\rm cen}^{\rm true}\)

galaxy_sample.sort(['fof_group_id', 'negative_stellar_mass'])
grouping_key = 'fof_group_id'
requested_columns = ['halo_mvir_host_halo']
group_gen = group_member_generator(galaxy_sample, grouping_key, requested_columns)

group_central_true_mvir = np.zeros(len(galaxy_sample))
for first, last, member_props in group_gen:
    member_masses = member_props[0]
    true_mass = member_masses[0]
    group_central_true_mvir[first:last] = true_mass

galaxy_sample['group_central_true_mvir'] = group_central_true_mvir