# Formatting your xyz coordinates for Mock Observables calculations

The `mock_observables` package adopts a specific convention for
how its functions accept spatial coordinate inputs.
If you have a collection of *Npts* coordinates in either *Ndim=2* or *Ndim=3* dimensions,
pass them as a multi-dimensional Numpy array of shape *(Npts, Ndim)*.
All the `mock_observables` functions that operate on multi-dimensional data
follow this convention. For example, `tpcf`, `void_prob_func` and `mean_delta_sigma`
all accept data formatted as an `ndarray` of shape *(Npts, 3)*,
while `angular_tpcf` accepts an `ndarray` of shape *(Npts, 2)*.
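
As a quick illustration of the convention, the sketch below checks array shapes with plain NumPy. The helper name `has_expected_shape` is hypothetical and is not part of `mock_observables`; it only demonstrates what a shape of *(Npts, Ndim)* means in practice.

```python
import numpy as np

def has_expected_shape(coords, ndim):
    """Hypothetical helper: True if coords is a 2-d array of shape (Npts, ndim)."""
    coords = np.asarray(coords)
    return coords.ndim == 2 and coords.shape[1] == ndim

# Five points in three dimensions: one row per point, one column per coordinate
points_3d = np.zeros((5, 3))
# Five points on the sky: one row per point, the columns hold the two angles
points_2d = np.zeros((5, 2))

ok_3d = has_expected_shape(points_3d, 3)
ok_2d = has_expected_shape(points_2d, 2)
# A (3, Npts) array stores each coordinate along a row, which is not the convention
transposed_ok = has_expected_shape(points_3d.T, 3)
```

Here `ok_3d` and `ok_2d` are true, while `transposed_ok` is false because the transposed array has shape *(3, 5)*.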

## Example of how to transform your coordinates

Suppose you have a collection of *x, y, z* arrays
storing the spatial positions of halos or galaxies.

```
>>> Npts = int(1e5)
>>> Lbox = 250
>>> import numpy as np
>>> x = np.random.uniform(0, Lbox, Npts)
>>> y = np.random.uniform(0, Lbox, Npts)
>>> z = np.random.uniform(0, Lbox, Npts)
```

To bundle these arrays into the multi-dimensional array shape
used by the `mock_observables` package, stack them and transpose the result:

```
>>> pos = np.vstack((x, y, z)).T
```

The `pos` array is now in a form that can be passed directly, for example,
to the `tpcf` function as the first positional argument.
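
To make the effect of the transpose concrete, here is a small sketch with four points. `np.column_stack` is a standard NumPy alternative that yields the same array; it appears here only as an illustration and is not something the `mock_observables` package requires.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
z = np.array([100.0, 200.0, 300.0, 400.0])

stacked = np.vstack((x, y, z))         # shape (3, Npts): one row per coordinate
pos = stacked.T                        # shape (Npts, 3): one row per point
pos_alt = np.column_stack((x, y, z))   # same array without the explicit transpose

first_point = pos[0]                   # the (x, y, z) triple of the first point
```

Each row of `pos` now holds the full position of a single point, which is the layout described above.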

If you had two-dimensional data instead:

```
>>> ra = np.random.uniform(0, 2*np.pi, Npts)
>>> dec = np.random.uniform(-np.pi/2., np.pi/2, Npts)
>>> angular_coords = np.vstack((ra, dec)).T
```

The `angular_coords` array is now in a form that can be passed directly, for example,
to the `angular_tpcf` function as the first positional argument.

## Using the `return_xyz_formatted_array` convenience function

When using the `mock_observables` package,
the above transformation is so common that there is a convenience function
dedicated to handling it:

```
>>> from halotools.mock_observables import return_xyz_formatted_array
>>> pos = return_xyz_formatted_array(x, y, z)
```

For this basic use, there is no difference between calling
`return_xyz_formatted_array` and using `numpy.vstack` as shown above.
However, the `return_xyz_formatted_array` function comes
with two additional features that are worth special mention.

### Applying redshift-space distortions

For some science targets, you may wish to apply redshift-space distortions (RSD) to your
coordinates before computing the observable statistic.
For example, RSD has a very significant impact on galaxy group identification,
so most applications using the `FoFGroups` feature
will want to account for this effect.
To do so, use the `velocity_distortion_dimension` keyword argument together
with the `velocity` keyword, which stores an array with
the peculiar velocity in the dimension you want to distort. In the code below,
we’ll apply redshift-space distortions assuming the default cosmology and redshift:

```
>>> velz = np.random.normal(loc=0, scale=100, size=Npts)
>>> pos_zdist = return_xyz_formatted_array(x, y, z, velocity=velz, velocity_distortion_dimension='z')
```

Under the distant-observer approximation,
the `pos_zdist` array includes the effect of redshift-space distortions,
so that `pos_zdist[:, 0]` and `pos_zdist[:, 1]`
can serve as the directions perpendicular to the line-of-sight,
and `pos_zdist[:, 2]` as the direction parallel to the line-of-sight.
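
For intuition about what the distortion does, the following is a minimal sketch of a distant-observer shift at redshift zero. It assumes positions in Mpc/h, velocities in km/s, and a Hubble constant of 100h km/s/Mpc, so the shift is the velocity divided by 100. The periodic wrapping with `np.mod` and the names `Lbox_example` and `z_distorted` are illustrative assumptions. This is not the library's implementation, and `return_xyz_formatted_array` may treat details such as box boundaries differently.

```python
import numpy as np

rng = np.random.default_rng(0)
npts = 1000
Lbox_example = 250.0  # assumed box size in Mpc/h

z_true = rng.uniform(0, Lbox_example, npts)       # real-space positions, Mpc/h
vel_z = rng.normal(loc=0, scale=100, size=npts)   # peculiar velocities, km/s

# Assumption: at redshift zero with H0 = 100h km/s/Mpc, a velocity of
# 100 km/s along the line of sight shifts the apparent position by 1 Mpc/h
shift = vel_z / 100.0
z_distorted = np.mod(z_true + shift, Lbox_example)  # wrap back into the box
```

Only the line-of-sight coordinate changes in this picture; the two perpendicular coordinates are left as they are.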

You may wish to use the `return_xyz_formatted_array` function to apply realistic z-space
distortions for mock galaxy samples “observed” at higher redshift, and/or assuming a different cosmology.
This can be handled using the `redshift` and/or `cosmology` keyword arguments:

```
>>> from astropy.cosmology import Planck15
>>> redshift = 0.45
>>> velz = np.random.normal(loc=0, scale=100, size=Npts)
>>> pos_zdist = return_xyz_formatted_array(x, y, z, velocity=velz, velocity_distortion_dimension='z', cosmology=Planck15, redshift=redshift)
```
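
The redshift and cosmology enter through the Hubble rate. A common form of the distant-observer shift in comoving coordinates is v (1 + z) / H(z). The sketch below evaluates it for a flat ΛCDM model using only NumPy. The function names and the matter density `omega_m = 0.3` are assumptions chosen for illustration, and the exact expression used inside `return_xyz_formatted_array` is not stated here.

```python
import numpy as np

def efunc_flat_lcdm(redshift, omega_m):
    """Dimensionless Hubble rate E(z) = H(z)/H0 for flat LCDM (matter + Lambda only)."""
    return np.sqrt(omega_m * (1.0 + redshift) ** 3 + (1.0 - omega_m))

def comoving_shift(vel, redshift, omega_m=0.3):
    """Hypothetical helper: comoving shift in Mpc/h for a velocity in km/s.

    Assumes H0 = 100h km/s/Mpc, so shift = vel * (1 + z) / (100 * E(z)).
    """
    return vel * (1.0 + redshift) / (100.0 * efunc_flat_lcdm(redshift, omega_m))

shift_z0 = comoving_shift(100.0, redshift=0.0)      # reduces to vel / 100
shift_z045 = comoving_shift(100.0, redshift=0.45)   # same velocity at higher redshift
```

Under these assumptions the same peculiar velocity produces a somewhat different comoving shift at `redshift = 0.45` than at redshift zero, which is why the two keywords matter for samples "observed" away from the present day.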

### Selecting subsamples

The `return_xyz_formatted_array` function has an additional feature
that allows you to retrieve a specific subsample of your coordinates.
Let’s see how this works in a realistic example:
retrieving the spatial positions of quiescent and star-forming samples
in a mock galaxy catalog.

```
>>> from halotools.empirical_models import PrebuiltSubhaloModelFactory
>>> model = PrebuiltSubhaloModelFactory('smhm_binary_sfr')
>>> from halotools.sim_manager import FakeSim
>>> halocat = FakeSim()
>>> model.populate_mock(halocat)
```

Our `model` now has a `mock` object attached to it with a `galaxy_table`
storing the mock galaxies in the form of an Astropy `Table`.

```
>>> x = model.mock.galaxy_table['x']
>>> y = model.mock.galaxy_table['y']
>>> z = model.mock.galaxy_table['z']
```

```
>>> red_sample_mask = model.mock.galaxy_table['quiescent'] == True
>>> red_pos = return_xyz_formatted_array(x, y, z, mask=red_sample_mask)
>>> blue_pos = return_xyz_formatted_array(x, y, z, mask=~red_sample_mask)
```
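
In terms of the resulting array, selecting with a mask amounts to keeping only the rows where the mask is true. The NumPy-only sketch below illustrates the idea with made-up data. It is an illustration of the expected outcome, not a description of the function's internals.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([10.0, 11.0, 12.0, 13.0])
z = np.array([20.0, 21.0, 22.0, 23.0])
quiescent = np.array([True, False, True, False])  # made-up flags

pos = np.vstack((x, y, z)).T
red_pos = pos[quiescent]     # rows for the quiescent galaxies
blue_pos = pos[~quiescent]   # rows for the complementary sample

# Every galaxy lands in exactly one of the two subsamples
n_total = len(red_pos) + len(blue_pos)
```

Both subsamples keep the *(Npts, 3)* layout, so either one can be passed on in the same way as the full `pos` array.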