marked_tpcf

halotools.mock_observables.marked_tpcf(sample1, rbins, sample2=None, marks1=None, marks2=None, period=None, do_auto=True, do_cross=True, num_threads=1, weight_func_id=1, normalize_by=u'random_marks', iterations=1, randomize_marks=None, seed=None)

Calculate the real space marked two-point correlation function, \(\mathcal{M}(r)\).

Example calls to this function appear in the Examples section below. See the “Formatting your xyz coordinates for Mock Observables calculations” documentation page for instructions on how to transform your coordinate position arrays into the format accepted by the sample1 and sample2 arguments.

Parameters:

sample1 : array_like

Npts1 x 3 numpy array containing 3-D positions of points. See the “Formatting your xyz coordinates for Mock Observables calculations” documentation page, or the Examples section below, for instructions on how to transform your coordinate position arrays into the format accepted by the sample1 and sample2 arguments. Length units are comoving and assumed to be in Mpc/h, here and throughout Halotools.

rbins : array_like

Array of boundaries defining the real space radial bins in which pairs are counted. Length units are comoving and assumed to be in Mpc/h, here and throughout Halotools.

sample2 : array_like, optional

Npts2 x 3 array containing 3-D positions of points. Passing sample2 as an input permits the calculation of the cross-correlation function. Default is None, in which case only the auto-correlation function will be calculated.

marks1 : array_like, optional

len(sample1) x N_marks array of marks. The supplied marks array must have the appropriate shape for the chosen weight_func_id (see Notes for requirements). If this parameter is not specified, it is set to numpy.ones((len(sample1), N_marks)).

marks2 : array_like, optional

len(sample2) x N_marks array of marks. The supplied marks array must have the appropriate shape for the chosen weight_func_id (see Notes for requirements). If this parameter is not specified, it is set to numpy.ones((len(sample2), N_marks)).

period : array_like, optional

Length-3 sequence defining the periodic boundary conditions in each dimension. If you instead provide a single scalar, Lbox, period is assumed to be the same in all Cartesian directions. If set to None (the default option), PBCs are set to infinity. Length units are comoving and assumed to be in Mpc/h, here and throughout Halotools.

do_auto : boolean, optional

Boolean determining whether the auto-correlation function will be calculated and returned. Default is True.

do_cross : boolean, optional

Boolean determining whether the cross-correlation function will be calculated and returned. Only relevant when sample2 is also provided. Default is True when sample2 is provided, otherwise False.

num_threads : int, optional

Number of threads to use in the calculation, where parallelization is performed using the python multiprocessing module. Default is 1 for a purely serial calculation, in which case a multiprocessing Pool object will never be instantiated. The string ‘max’ may be used to indicate that the pair counters should use all available cores on the machine.

weight_func_id : int, optional

Integer ID indicating which marking function should be used. See Notes for a list of available marking functions.

normalize_by : string, optional

A string indicating how to normalize the weighted pair counts in the marked correlation function calculation. Options are ‘random_marks’ or ‘number_counts’. See Notes for more detail.

iterations : int, optional

Integer indicating the number of times to compute the randomized weighted pair counts; the mean over all iterations is used. Only applicable if normalize_by is set to ‘random_marks’. See Notes for further explanation.

randomize_marks : array_like, optional

Boolean array of length N_marks indicating which elements should be randomized when calculating the random weighted pair counts. Default is [True]*N_marks. This parameter is only applicable if normalize_by is set to ‘random_marks’. See Notes for more detail.

seed : int, optional

Random number seed used to shuffle the marks and to randomly downsample data, if applicable. Default is None, in which case downsampling and shuffling will be stochastic.
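To make the interplay of randomize_marks and seed concrete, here is a minimal NumPy sketch of how a marks array could be shuffled among points using a boolean column mask. The helper name shuffle_marks is hypothetical, and applying one shared permutation to every selected column is an assumption of this sketch; the randomization inside Halotools may differ in detail.

```python
import numpy as np


def shuffle_marks(marks, randomize_marks=None, seed=None):
    """Return a copy of marks with the masked columns permuted among points.

    Hypothetical helper for illustration; not part of the Halotools API.
    """
    marks = np.asarray(marks, dtype=float)
    if marks.ndim == 1:
        # Treat a 1-D array as one mark per point (N_marks = 1).
        marks = marks[:, np.newaxis]
    if randomize_marks is None:
        # Documented default: randomize every mark column.
        randomize_marks = [True] * marks.shape[1]
    mask = np.asarray(randomize_marks, dtype=bool)
    rng = np.random.default_rng(seed)  # a fixed seed makes the shuffle reproducible
    # Assumption: one permutation of the points is shared by all selected columns.
    perm = rng.permutation(len(marks))
    shuffled = marks.copy()
    shuffled[:, mask] = marks[perm][:, mask]
    return shuffled


# Column 0 holds a fixed label, column 1 the weight to be randomized.
example = np.column_stack((np.arange(5.0), 10.0 * np.arange(5.0)))
shuffled = shuffle_marks(example, randomize_marks=[False, True], seed=42)
```

Column 0 is left untouched because its mask entry is False, while column 1 is reordered; calling the helper again with the same seed reproduces the same ordering.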

Returns:

marked_correlation_function(s) : numpy.array

len(rbins)-1 length array containing the marked correlation function \(\mathcal{M}(r)\) computed in each of the bins defined by rbins.

\[\mathcal{M}(r) \equiv \mathrm{WW}(r) / \mathrm{XX}(r),\]

where \(\mathrm{WW}(r)\) is the weighted number of pairs with separations in the bin labeled by \(r\), and \(\mathrm{XX}(r)\) depends on the choice of the normalize_by parameter. If normalize_by is ‘random_marks’, \(\mathrm{XX} \equiv \mathcal{RR}\), the weighted pair counts computed after the marks have been randomized. If normalize_by is ‘number_counts’, \(\mathrm{XX} \equiv \mathrm{DD}\), the unweighted pair counts. See Notes for more detail.

If sample2 is passed as input, three arrays of length len(rbins)-1 are returned:

\[\mathcal{M}_{11}(r), \ \mathcal{M}_{12}(r), \ \mathcal{M}_{22}(r),\]

the autocorrelation of sample1, the cross-correlation between sample1 and sample2, and the autocorrelation of sample2. If do_auto or do_cross is set to False, the corresponding result(s) are omitted from the output.

Notes

Pairs are counted using marked_npairs_3d.

If the period argument is passed in, the ith coordinate of all points must be between 0 and period[i].
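A minimal sketch of the minimum-image convention for a periodic box, including the coordinate range check described above. The function periodic_separation is hypothetical and is not how the Halotools pair counters are implemented; it only illustrates how a scalar period can be broadcast to all three dimensions and how separations wrap around the box edges.

```python
import numpy as np


def periodic_separation(p1, p2, period):
    """Minimum-image distance between points p1 and p2 in a periodic box.

    Hypothetical helper for illustration; a scalar period applies to all three axes.
    """
    p1 = np.atleast_2d(np.asarray(p1, dtype=float))
    p2 = np.atleast_2d(np.asarray(p2, dtype=float))
    period = np.broadcast_to(np.asarray(period, dtype=float), (3,))
    # The ith coordinate of every point must lie between 0 and period[i].
    for pts in (p1, p2):
        if np.any(pts < 0) or np.any(pts > period):
            raise ValueError("coordinates must lie within [0, period[i]]")
    delta = np.abs(p1 - p2)
    # Along each axis keep the shorter of the direct and the wrapped separation.
    delta = np.minimum(delta, period - delta)
    return np.sqrt(np.sum(delta**2, axis=-1))


# Two points near opposite faces of a unit box are only 0.1 apart once wrapped.
d = periodic_separation([0.05, 0.5, 0.5], [0.95, 0.5, 0.5], period=1.0)
```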

normalize_by indicates how to calculate \(\mathrm{XX}\). If normalize_by is ‘random_marks’, then \(\mathrm{XX} \equiv \mathcal{RR}\), and \(\mathcal{RR}\) is calculated by randomizing the marks among points according to the randomize_marks mask. This marked correlation function is then:

\[\mathcal{M}(r) \equiv \frac{\sum_{ij}f(m_i,m_j)}{\sum_{kl}f(m_k,m_l)}\]

where the sum in the numerator is over pairs \(i,j\) with separation \(r\) and marks \(m_i,m_j\), and \(f\) is the marking function selected by weight_func_id. The sum in the denominator is over an equal number of pairs \(k,l\) whose marks have been randomized. Because this sum is stochastic, it can be computed several times by setting the iterations parameter; the mean over all iterations is then used as \(\mathcal{RR}\).
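The random_marks normalization can be sketched with a brute-force O(N^2) loop over pairs. This sketch assumes no periodic boundaries, and all function names in it are hypothetical; it is not the Halotools implementation, which counts pairs with marked_npairs_3d. Here the denominator reuses the same spatial pairs with the marks shuffled among the points, repeated iterations times and averaged.

```python
import numpy as np


def pairs_in_bins(coords, rbins):
    """Return unique pairs (i < j) and the radial bin index of each pair."""
    i, j = np.triu_indices(len(coords), k=1)
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    # k is the bin index with rbins[k] <= r < rbins[k + 1].
    k = np.digitize(r, rbins) - 1
    keep = (k >= 0) & (k < len(rbins) - 1)
    return i[keep], j[keep], k[keep]


def multiplicative(w1, w2):
    """Marking function 1, vectorized over pairs: f = w1[0] * w2[0]."""
    return w1[:, 0] * w2[:, 0]


def marked_cf_random_marks(coords, marks, rbins, wfunc, iterations=1, seed=None):
    """Brute-force M(r) = WW / RR, with RR from shuffled marks (no periodic boundaries)."""
    coords = np.asarray(coords, dtype=float)
    marks = np.asarray(marks, dtype=float).reshape(len(coords), -1)
    rbins = np.asarray(rbins, dtype=float)
    i, j, k = pairs_in_bins(coords, rbins)
    nbins = len(rbins) - 1
    # Numerator: sum of f(m_i, m_j) over the data pairs in each bin.
    ww = np.bincount(k, weights=wfunc(marks[i], marks[j]), minlength=nbins)
    rng = np.random.default_rng(seed)
    rr = np.zeros(nbins)
    for _ in range(iterations):
        # Same spatial pairs, but with the marks shuffled among the points.
        shuffled = marks[rng.permutation(len(marks))]
        rr += np.bincount(k, weights=wfunc(shuffled[i], shuffled[j]), minlength=nbins)
    rr /= iterations  # mean of the randomized sums over all iterations
    return ww / rr


# With constant marks, shuffling changes nothing, so M(r) = 1 in every bin with pairs.
rng = np.random.default_rng(0)
points = rng.random((200, 3))
rbins = np.array([0.1, 0.2, 0.3])
const_marks = np.full((200, 1), 2.0)
mcf = marked_cf_random_marks(points, const_marks, rbins, multiplicative, iterations=3, seed=1)
```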

If normalize_by is ‘number_counts’, then \(\mathrm{XX} \equiv \mathrm{DD}\) is calculated by counting the total number of pairs using npairs_3d, so that:

\[\mathcal{M}(r) \equiv \frac{\sum_{ij}f(m_i,m_j)}{\sum_{ij} 1}.\]
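The number_counts normalization is simpler, since the denominator is just the unweighted pair count in each bin. The following is again a brute-force sketch without periodic boundaries, and marked_cf_number_counts is a hypothetical name. For marks that are uncorrelated with position and the multiplicative marking function (weight_func_id=1), the ratio approaches the squared mean mark.

```python
import numpy as np


def marked_cf_number_counts(coords, marks, rbins, wfunc):
    """Brute-force M(r) = sum_ij f(m_i, m_j) / sum_ij 1 (no periodic boundaries)."""
    coords = np.asarray(coords, dtype=float)
    marks = np.asarray(marks, dtype=float).reshape(len(coords), -1)
    i, j = np.triu_indices(len(coords), k=1)
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    # Bin index k satisfies rbins[k] <= r < rbins[k + 1]; drop pairs outside all bins.
    k = np.digitize(r, rbins) - 1
    keep = (k >= 0) & (k < len(rbins) - 1)
    i, j, k = i[keep], j[keep], k[keep]
    nbins = len(rbins) - 1
    ww = np.bincount(k, weights=wfunc(marks[i], marks[j]), minlength=nbins)
    dd = np.bincount(k, minlength=nbins)  # unweighted pair counts DD
    return ww / dd


# Usage with the multiplicative marking function (weight_func_id=1).
rng = np.random.default_rng(0)
points = rng.random((300, 3))
uniform_marks = rng.random(300)
mcf = marked_cf_number_counts(points, uniform_marks, [0.1, 0.2, 0.3],
                              lambda w1, w2: w1[:, 0] * w2[:, 0])
# Expect values close to np.mean(uniform_marks) ** 2, i.e. roughly 0.25.
```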

There are multiple marking functions available. In general, each requires a different number of marks per point, N_marks. For each pair, the marking function is passed two vectors, w1 and w2, of length N_marks, and returns a float. The available marking functions and their associated weight_func_id integers are:

  1. multiplicative weights (N_marks = 1)
    \[f(w_1,w_2) = w_1[0] \times w_2[0]\]
  2. summed weights (N_marks = 1)
    \[f(w_1,w_2) = w_1[0] + w_2[0]\]
  3. equality weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_1[0] = w_2[0] \\ 0.0 & : w_1[0] \neq w_2[0] \\ \end{array} \right.\end{split}\]
  4. inequality weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_1[0] \neq w_2[0] \\ 0.0 & : w_1[0] = w_2[0] \\ \end{array} \right.\end{split}\]
  5. greater than weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_2[0] > w_1[0] \\ 0.0 & : w_2[0] \leq w_1[0] \\ \end{array} \right.\end{split}\]
  6. less than weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_2[0] < w_1[0] \\ 0.0 & : w_2[0] \geq w_1[0] \\ \end{array} \right.\end{split}\]
  7. greater than tolerance weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : w_2[0]>(w_1[0]+w_1[1]) \\ 0.0 & : w_2[0] \leq (w_1[0]+w_1[1]) \\ \end{array} \right.\end{split}\]
  8. less than tolerance weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : w_2[0]<(w_1[0]+w_1[1]) \\ 0.0 & : w_2[0] \geq (w_1[0]+w_1[1]) \\ \end{array} \right.\end{split}\]
  9. tolerance weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : |w_1[0]-w_2[0]|<w_1[1] \\ 0.0 & : |w_1[0]-w_2[0]| \geq w_1[1] \\ \end{array} \right.\end{split}\]
  10. exclusion weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : |w_1[0]-w_2[0]|>w_1[1] \\ 0.0 & : |w_1[0]-w_2[0]| \leq w_1[1] \\ \end{array} \right.\end{split}\]
  11. ratio weights (N_marks = 2)
    \[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : w_2[0] > w_1[0]\times w_1[1] \\ 0.0 & : \mathrm{otherwise} \\ \end{array} \right.\end{split}\]
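For reference, the eleven marking functions listed above can be transcribed directly into plain Python, each acting on a single pair of mark vectors. The dictionary name MARKING_FUNCTIONS is hypothetical; in Halotools these functions are evaluated inside the pair counters rather than exposed like this. Writing them out makes one property easy to see: IDs 5 through 11 treat w1 and w2 differently, so f(w1, w2) need not equal f(w2, w1).

```python
# Hypothetical transcription of the marking functions, keyed by weight_func_id.
# Each takes two mark vectors w1, w2 of length N_marks and returns a float.
MARKING_FUNCTIONS = {
    1: lambda w1, w2: w1[0] * w2[0],                                   # multiplicative
    2: lambda w1, w2: w1[0] + w2[0],                                   # summed
    3: lambda w1, w2: w1[1] * w2[1] if w1[0] == w2[0] else 0.0,       # equality
    4: lambda w1, w2: w1[1] * w2[1] if w1[0] != w2[0] else 0.0,       # inequality
    5: lambda w1, w2: w1[1] * w2[1] if w2[0] > w1[0] else 0.0,        # greater than
    6: lambda w1, w2: w1[1] * w2[1] if w2[0] < w1[0] else 0.0,        # less than
    7: lambda w1, w2: w2[1] if w2[0] > w1[0] + w1[1] else 0.0,        # greater than tolerance
    8: lambda w1, w2: w2[1] if w2[0] < w1[0] + w1[1] else 0.0,        # less than tolerance
    9: lambda w1, w2: w2[1] if abs(w1[0] - w2[0]) < w1[1] else 0.0,   # tolerance
    10: lambda w1, w2: w2[1] if abs(w1[0] - w2[0]) > w1[1] else 0.0,  # exclusion
    11: lambda w1, w2: w2[1] if w2[0] > w1[0] * w1[1] else 0.0,       # ratio
}
```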

Examples

For demonstration purposes we create a randomly distributed set of points within a periodic unit cube.

>>> import numpy as np
>>> from halotools.mock_observables import marked_tpcf
>>> Npts = 1000
>>> Lbox = 1.0
>>> period = np.array([Lbox,Lbox,Lbox])
>>> x = np.random.random(Npts)
>>> y = np.random.random(Npts)
>>> z = np.random.random(Npts)

We transform our x, y, z points into the array shape used by the function by taking the transpose of the result of numpy.vstack. This boilerplate transformation is used throughout the mock_observables sub-package:

>>> coords = np.vstack((x,y,z)).T

Assign random floats in the range [0, 1) to the points to use as the marks:

>>> marks = np.random.random(Npts)

Use the multiplicative marking function:

>>> rbins = np.logspace(-2,-1,10)
>>> MCF = marked_tpcf(coords, rbins, marks1=marks, period=period, normalize_by='number_counts', weight_func_id=1)

The result should be consistent with \(\langle {\rm mark}\rangle^2\) at all r within the statistical errors; for marks drawn uniformly from [0, 1), this is approximately 0.25.