Formatting your xyz coordinates for Mock Observables calculations

The mock_observables package adopts a specific convention for how its functions accept spatial coordinate inputs. If you have a collection of Npts coordinates for either Ndim=2 or Ndim=3, the convention is that you will pass a multi-dimensional Numpy array of shape (Npts, Ndim) storing the coordinates. All the mock_observables functions that operate on multi-dimensional data follow this convention. For example, tpcf, void_prob_func and mean_delta_sigma all accept data formatted as ndarray of shape (Npts, 3), while angular_tpcf accepts a ndarray of shape (Npts, 2).

Example of how to transform your coordinates

Suppose you have a collection of x, y, z arrays storing the spatial positions of halos or galaxies.

>>> Npts = int(1e5)
>>> Lbox = 250
>>> import numpy as np
>>> x = np.random.uniform(0, Lbox, Npts)
>>> y = np.random.uniform(0, Lbox, Npts)
>>> z = np.random.uniform(0, Lbox, Npts)

In order to bundle these arrays into the shape of the multi-dimensional array used by the mock_observables package:

>>> pos = np.vstack((x, y, z)).T

The pos array is now formatted in a form that can be directly passed, for example, to the tpcf function as the first positional argument.

If you had two-dimensional data instead:

>>> ra = np.random.uniform(0, 2*np.pi, Npts)
>>> dec = np.random.uniform(-np.pi/2., np.pi/2, Npts)
>>> angular_coords = np.vstack((ra, dec)).T

The angular_coords array is now formatted in a form that can be directly passed, for example, to the angular_tpcf function as the first positional argument.

Using the return_xyz_formatted_array convenience function

When using the mock_observables package, the above transformation is so commonly encountered that there is a convenience function dedicated to handling it:

>>> from halotools.mock_observables import return_xyz_formatted_array
>>> pos = return_xyz_formatted_array(x, y, z)

There is no difference between using return_xyz_formatted_array or numpy.vstack. However, the return_xyz_formatted_array function comes with two additional features that are worthy of special mention.

Applying redshift-space distortions

For some science targets, you may wish to apply redshift-space distortions to your coordinates before computing the observable statistic. For example, RSD has a very significant impact on galaxy group identification, and so most applications using the FoFGroups feature will want to account for this effect. To do, you can use the velocity_distortion_dimension keyword argument together with the velocity keyword storing an array with the peculiar velocity in whatever dimension you want to distort. In the code below, we’ll apply redshift-space distortions assuming the default cosmology and redshift:

>>> velz = np.random.normal(loc=0, scale=100, size=Npts)
>>> pos_zdist = return_xyz_formatted_array(x, y, z, velocity=velz, velocity_distortion_dimension='z')

Under the distant-observer approximation, the pos_zdist array includes the effect of redshift-space distortions, so that pos_zdist[:, 0] and pos_zdist[:,1] slices can serve as the directions perpendicular to the line-of-sight, and pos_zdist[:, 2] the direction parallel to the line-of-sight.

You may wish to use the return_xyz_formatted_array function to apply realistic z-space distortions for mock galaxy samples “observed” at higher redshift, and/or assuming a different cosmology. This can be handled using the redshift and/or cosmology keyword arguments:

>>> from astropy.cosmology import Planck15
>>> redshift = 0.45
>>> velz = np.random.normal(loc=0, scale=100, size=Npts)
>>> pos_zdist = return_xyz_formatted_array(x, y, z, velocity=velz, velocity_distortion_dimension='z', cosmology=Planck15, redshift=redshift)

Selecting subsamples

There is an additional feature of the return_xyz_formatted_array function that allows you to retrieve a specific subsample of your coordinates. Let’s see how this works in a realistic example: retrieving the spatial positions of quiescent and star-forming samples in a mock galaxy catalog.

>>> from halotools.empirical_models import PrebuiltSubhaloModelFactory
>>> model = PrebuiltSubhaloModelFactory('smhm_binary_sfr')
>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> model.populate_mock(halocat)

Our model now has a mock object attached to it with a galaxy_table storing the mock galaxies in the form of an Astropy Table.

>>> x = model.mock.galaxy_table['x']
>>> y = model.mock.galaxy_table['y']
>>> z = model.mock.galaxy_table['z']
>>> red_sample_mask = model.mock.galaxy_table['quiescent'] == True
>>> red_pos = return_xyz_formatted_array(x, y, z, mask = red_sample_mask)
>>> blue_pos = return_xyz_formatted_array(x, y, z, mask = ~red_sample_mask)