marked_tpcf
- halotools.mock_observables.marked_tpcf(sample1, rbins, sample2=None, marks1=None, marks2=None, period=None, do_auto=True, do_cross=True, num_threads=1, weight_func_id=1, normalize_by='random_marks', iterations=1, randomize_marks=None, seed=None)
Calculate the real space marked two-point correlation function, \(\mathcal{M}(r)\).
Example calls to this function appear in the documentation below. See the Formatting your xyz coordinates for Mock Observables calculations documentation page for instructions on how to transform your coordinate position arrays into the format accepted by the sample1 and sample2 arguments.
- Parameters:
- sample1 : array_like
Npts1 x 3 numpy array containing 3-D positions of points. See the Formatting your xyz coordinates for Mock Observables calculations documentation page, or the Examples section below, for instructions on how to transform your coordinate position arrays into the format accepted by the sample1 and sample2 arguments. Length units are comoving and assumed to be in Mpc/h, here and throughout Halotools.
- rbins : array_like
Array of boundaries defining the real-space radial bins in which pairs are counted. Length units are comoving and assumed to be in Mpc/h, here and throughout Halotools.
- sample2 : array_like, optional
Npts2 x 3 array containing 3-D positions of points. Passing sample2 as an input permits the calculation of the cross-correlation function. Default is None, in which case only the auto-correlation function will be calculated.
- marks1 : array_like, optional
len(sample1) x N_marks array of marks. The supplied marks array must have the appropriate shape for the chosen weight_func_id (see Notes for requirements). If this parameter is not specified, it is set to numpy.ones((len(sample1), N_marks)).
- marks2 : array_like, optional
len(sample2) x N_marks array of marks. The supplied marks array must have the appropriate shape for the chosen weight_func_id (see Notes for requirements). If this parameter is not specified, it is set to numpy.ones((len(sample2), N_marks)).
- period : array_like, optional
Length-3 sequence defining the periodic boundary conditions in each dimension. If you instead provide a single scalar, Lbox, period is assumed to be the same in all Cartesian directions. If set to None (the default option), PBCs are set to infinity. Length units are comoving and assumed to be in Mpc/h, here and throughout Halotools.
- do_auto : boolean, optional
Boolean that determines whether the auto-correlation function will be calculated and returned. Default is True.
- do_cross : boolean, optional
Boolean that determines whether the cross-correlation function will be calculated and returned. Only relevant when sample2 is also provided. Default is True for the case where sample2 is provided, otherwise False.
- num_threads : int, optional
Number of threads to use in calculation, where parallelization is performed using the python multiprocessing module. Default is 1 for a purely serial calculation, in which case a multiprocessing Pool object will never be instantiated. A string ‘max’ may be used to indicate that the pair counters should use all available cores on the machine.
- weight_func_id : int, optional
Integer ID indicating which marking function should be used. See Notes for a list of available marking functions.
- normalize_by : string, optional
A string indicating how to normalize the weighted pair counts in the marked correlation function calculation. Options are: ‘random_marks’ or ‘number_counts’. See Notes for more detail.
- iterations : int, optional
Integer indicating the number of times to calculate the random weighted pair counts; the mean of the outcomes is used. Only applicable if normalize_by is set to ‘random_marks’. See Notes for further explanation.
- randomize_marks : array_like, optional
Boolean array of length N_marks indicating which elements should be randomized when calculating the random weighted pair counts. Default is [True]*N_marks. This parameter is only applicable if normalize_by is set to ‘random_marks’. See Notes for more detail.
- seed : int, optional
Random number seed used to shuffle the marks and to randomly downsample data, if applicable. Default is None, in which case downsampling and shuffling will be stochastic.
- Returns:
- marked_correlation_function(s) : numpy.array
len(rbins)-1 length array containing the marked correlation function \(\mathcal{M}(r)\) computed in each of the bins defined by rbins.
\[\mathcal{M}(r) \equiv \mathrm{WW}(r) / \mathrm{XX}(r),\]
where \(\mathrm{WW}(r)\) is the weighted number of pairs with separations in the bin at \(r\), and \(\mathrm{XX}(r)\) depends on the choice of the normalize_by parameter. If normalize_by is ‘random_marks’, \(\mathrm{XX} \equiv \mathcal{RR}\), the weighted pair counts after the marks have been randomized among the points. If normalize_by is ‘number_counts’, \(\mathrm{XX} \equiv \mathrm{DD}\), the unweighted pair counts. See Notes for more detail.
If sample2 is passed as input, three arrays of length len(rbins)-1 are returned:
\[\mathcal{M}_{11}(r), \ \mathcal{M}_{12}(r), \ \mathcal{M}_{22}(r),\]
the autocorrelation of sample1, the cross-correlation between sample1 and sample2, and the autocorrelation of sample2. If do_auto or do_cross is set to False, the corresponding result(s) are not returned.
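To make the \(\mathrm{WW}/\mathrm{XX}\) definition concrete, here is a brute-force, pure-Python sketch of the auto-correlation case. It is not the halotools implementation: the helpers periodic_separation, weighted_pair_counts and marked_tpcf_sketch are hypothetical, the sketch assumes one scalar mark per point with the multiplicative marking function, and it assigns a pair to bin k when rbins[k] < r <= rbins[k+1], which is an assumed bin-edge convention. For ‘random_marks’ it shuffles the marks among the fixed positions and averages the resulting \(\mathcal{RR}\) over iterations, roughly mirroring the description in the Notes.

```python
import math
import random


def periodic_separation(p, q, period=None):
    """Distance between 3-D points p and q (hypothetical helper).

    period may be None (no wrapping), a scalar box size, or a length-3 sequence.
    """
    if period is None:
        period = (math.inf,) * 3
    elif isinstance(period, (int, float)):
        period = (float(period),) * 3
    total = 0.0
    for a, b, box in zip(p, q, period):
        d = abs(a - b)
        if math.isfinite(box):
            # Minimum-image convention; assumes 0 <= coordinate < box.
            d = min(d, box - d)
        total += d * d
    return math.sqrt(total)


def weighted_pair_counts(points, marks, rbins, period=None):
    """Return (WW, DD) per bin over distinct pairs i < j, with f(m_i, m_j) = m_i * m_j."""
    nbins = len(rbins) - 1
    ww = [0.0] * nbins
    dd = [0] * nbins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = periodic_separation(points[i], points[j], period)
            for k in range(nbins):
                if rbins[k] < r <= rbins[k + 1]:
                    ww[k] += marks[i] * marks[j]
                    dd[k] += 1
                    break
    return ww, dd


def marked_tpcf_sketch(points, marks, rbins, period=None,
                       normalize_by="number_counts", iterations=1, seed=None):
    """Hypothetical brute-force M(r) = WW(r) / XX(r) for a single sample."""
    ww, dd = weighted_pair_counts(points, marks, rbins, period)
    if normalize_by == "number_counts":
        xx = dd
    elif normalize_by == "random_marks":
        rng = random.Random(seed)
        xx = [0.0] * len(ww)
        for _ in range(iterations):
            shuffled = list(marks)
            rng.shuffle(shuffled)  # positions stay fixed; only the marks move
            rr, _ = weighted_pair_counts(points, shuffled, rbins, period)
            xx = [acc + val / iterations for acc, val in zip(xx, rr)]
    else:
        raise ValueError("normalize_by must be 'random_marks' or 'number_counts'")
    # Bins with no pairs (or zero RR) are reported as NaN.
    return [w / x if x else math.nan for w, x in zip(ww, xx)]


rng = random.Random(42)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
marks = [rng.random() for _ in points]
rbins = [0.05, 0.1, 0.2, 0.3]
print(marked_tpcf_sketch(points, marks, rbins, period=1.0))
print(marked_tpcf_sketch(points, marks, rbins, period=1.0,
                         normalize_by="random_marks", iterations=3, seed=1))
```

Because every pair is visited explicitly, this sketch scales as O(N²) and is only practical for small samples. Shuffling the single mark per point corresponds to the default randomize_marks behaviour; with N_marks > 1 one would shuffle only the columns flagged in the mask.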
Notes
Pairs are counted using marked_npairs_3d.
If the period argument is passed in, the ith coordinate of all points must be between 0 and period[i].
normalize_by indicates how to calculate \(\mathrm{XX}\). If normalize_by is ‘random_marks’, then \(\mathrm{XX} \equiv \mathcal{RR}\), and \(\mathcal{RR}\) is calculated by randomizing the marks among points according to the randomize_marks mask. This marked correlation function is then:
\[\mathcal{M}(r) \equiv \frac{\sum_{ij}f(m_i,m_j)}{\sum_{kl}f(m_k,m_l)},\]
where the sum in the numerator is over pairs \(i,j\) with separation \(r\) and marks \(m_i,m_j\), and \(f()\) is the marking function selected by weight_func_id. The sum in the denominator is over an equal number of random pairs \(k,l\). This sum can be calculated multiple times by setting the iterations parameter; the mean of the sum over all iterations is then used in the calculation.
If normalize_by is ‘number_counts’, then \(\mathrm{XX} \equiv \mathrm{DD}\) is calculated by counting the total number of pairs using npairs_3d. This is:
\[\mathcal{M}(r) \equiv \frac{\sum_{ij}f(m_i,m_j)}{\sum_{ij} 1}.\]
There are multiple marking functions available. In general, each requires a different number of marks per point, N_marks. The marking function is passed two vectors per pair, w1 and w2, each of length N_marks, and returns a float. The available marking functions and their associated integer weight_func_id values are:
1. multiplicative weights (N_marks = 1)
\[f(w_1,w_2) = w_1[0] \times w_2[0]\]
2. summed weights (N_marks = 1)
\[f(w_1,w_2) = w_1[0] + w_2[0]\]
3. equality weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_1[0] = w_2[0] \\ 0.0 & : w_1[0] \neq w_2[0] \\ \end{array} \right.\end{split}\]
4. inequality weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_1[0] \neq w_2[0] \\ 0.0 & : w_1[0] = w_2[0] \\ \end{array} \right.\end{split}\]
5. greater than weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_2[0] > w_1[0] \\ 0.0 & : w_2[0] \leq w_1[0] \\ \end{array} \right.\end{split}\]
6. less than weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_1[1]\times w_2[1] & : w_2[0] < w_1[0] \\ 0.0 & : w_2[0] \geq w_1[0] \\ \end{array} \right.\end{split}\]
7. greater than tolerance weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : w_2[0]>(w_1[0]+w_1[1]) \\ 0.0 & : w_2[0] \leq (w_1[0]+w_1[1]) \\ \end{array} \right.\end{split}\]
8. less than tolerance weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : w_2[0]<(w_1[0]+w_1[1]) \\ 0.0 & : w_2[0] \geq (w_1[0]+w_1[1]) \\ \end{array} \right.\end{split}\]
9. tolerance weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : |w_1[0]-w_2[0]|<w_1[1] \\ 0.0 & : |w_1[0]-w_2[0]| \geq w_1[1] \\ \end{array} \right.\end{split}\]
10. exclusion weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : |w_1[0]-w_2[0]|>w_1[1] \\ 0.0 & : |w_1[0]-w_2[0]| \leq w_1[1] \\ \end{array} \right.\end{split}\]
11. ratio weights (N_marks = 2)
\[\begin{split}f(w_1,w_2) = \left \{ \begin{array}{ll} w_2[1] & : w_2[0] > w_1[0] \times w_1[1] \\ 0.0 & : \text{otherwise} \\ \end{array} \right.\end{split}\]
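The piecewise definitions of the marking functions translate directly into small Python functions of two mark vectors. This is an illustrative transcription of the formulas, not halotools' compiled weighting code; the function names and the MARKING_FUNCTIONS dictionary are hypothetical, and its integer keys assume the IDs run from 1 (multiplicative, the default weight_func_id=1) to 11 in the order the functions are listed.

```python
# Hypothetical pure-Python transcription of the marking functions.
# w1 and w2 are sequences of marks for the two points of a pair.

def multiplicative(w1, w2):
    return w1[0] * w2[0]

def summed(w1, w2):
    return w1[0] + w2[0]

def equality(w1, w2):
    return w1[1] * w2[1] if w1[0] == w2[0] else 0.0

def inequality(w1, w2):
    return w1[1] * w2[1] if w1[0] != w2[0] else 0.0

def greater_than(w1, w2):
    return w1[1] * w2[1] if w2[0] > w1[0] else 0.0

def less_than(w1, w2):
    return w1[1] * w2[1] if w2[0] < w1[0] else 0.0

def greater_than_tolerance(w1, w2):
    return w2[1] if w2[0] > w1[0] + w1[1] else 0.0

def less_than_tolerance(w1, w2):
    return w2[1] if w2[0] < w1[0] + w1[1] else 0.0

def tolerance(w1, w2):
    return w2[1] if abs(w1[0] - w2[0]) < w1[1] else 0.0

def exclusion(w1, w2):
    return w2[1] if abs(w1[0] - w2[0]) > w1[1] else 0.0

def ratio(w1, w2):
    return w2[1] if w2[0] > w1[0] * w1[1] else 0.0

# Assumed mapping from weight_func_id to function, following list order.
MARKING_FUNCTIONS = {
    1: multiplicative, 2: summed, 3: equality, 4: inequality,
    5: greater_than, 6: less_than, 7: greater_than_tolerance,
    8: less_than_tolerance, 9: tolerance, 10: exclusion, 11: ratio,
}

print(MARKING_FUNCTIONS[5]((1.0, 2.0), (3.0, 4.0)))
print(MARKING_FUNCTIONS[5]((3.0, 4.0), (1.0, 2.0)))
```

Several of these functions are not symmetric in w1 and w2 (the two greater_than calls above give different results), so the order in which a pair's marks are passed matters for functions 5 through 11.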
Examples
For demonstration purposes we create a randomly distributed set of points within a periodic unit cube.
>>> import numpy as np
>>> from halotools.mock_observables import marked_tpcf
>>> Npts = 1000
>>> Lbox = 1.0
>>> period = np.array([Lbox, Lbox, Lbox])
>>> x = np.random.random(Npts)
>>> y = np.random.random(Npts)
>>> z = np.random.random(Npts)
We transform our x, y, z points into the array shape used by the function by taking the transpose of the result of numpy.vstack. This boilerplate transformation is used throughout the mock_observables sub-package:
>>> coords = np.vstack((x,y,z)).T
Assign random floats in the range [0, 1) to the points to use as the marks:
>>> marks = np.random.random(Npts)
Use the multiplicative marking function:
>>> rbins = np.logspace(-2,-1,10)
>>> MCF = marked_tpcf(coords, rbins, marks1=marks, period=period, normalize_by='number_counts', weight_func_id=1)
The result should be consistent with \(\langle {\rm mark}\rangle^2\) at all r within the statistical errors.
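One way to see why \(\langle {\rm mark}\rangle^2\) is the expected value: with multiplicative weights and normalize_by='number_counts', a single bin containing every distinct pair would give exactly \([(\sum m)^2 - \sum m^2] / [N(N-1)]\), because \((\sum m)^2 = \sum m^2 + 2\sum_{i<j} m_i m_j\) and there are \(N(N-1)/2\) pairs. For large N and marks uncorrelated with position this approaches \(\langle m\rangle^2\), about 0.25 for marks drawn uniformly from [0, 1). The pure-Python check below is only an illustration of that identity (the helper names are hypothetical and it does not call halotools); it compares a brute-force pair sum against the closed form.

```python
import itertools
import random


def all_pairs_marked_ratio(marks):
    """Brute force: sum of m_i * m_j over distinct pairs, divided by the pair count."""
    pairs = list(itertools.combinations(marks, 2))
    return sum(a * b for a, b in pairs) / len(pairs)


def closed_form(marks):
    """((sum m)^2 - sum m^2) / (N (N - 1)), the same quantity without a pair loop."""
    n = len(marks)
    s = sum(marks)
    s2 = sum(m * m for m in marks)
    return (s * s - s2) / (n * (n - 1))


rng = random.Random(0)
marks = [rng.random() for _ in range(1000)]
mean = sum(marks) / len(marks)
print(all_pairs_marked_ratio(marks), closed_form(marks), mean ** 2)
```

Restricting the sum to pairs within one radial bin changes which pairs enter, but if marks are assigned independently of position the expectation per bin is still \(\langle m\rangle^2\), which is why the example result should scatter around that value.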