distribution_matching_indices

halotools.utils.distribution_matching_indices(input_distribution, output_distribution, nselect, bins, seed=None)

Calculate a set of indices that will resample (with replacement) input_distribution so that it matches output_distribution.

This function is useful, for example, for comparing a pair of samples with matching stellar mass functions.

Parameters:
input_distribution : ndarray

Numpy array of shape (npts1, ) storing the distribution that requires modification

output_distribution : ndarray

Numpy array of shape (npts2, ) defining the desired output distribution

nselect : int

Number of points to select from input_distribution.

bins : ndarray

Binning used to estimate the PDFs. Default is 100 bins automatically determined by numpy.histogram.

seed : int, optional

Random number seed used to generate indices. Default is None for stochastic results.

Returns:
indices : ndarray

Numpy array of shape (nselect, ) storing indices in the range [0, npts1) such that input_distribution[indices] will have a PDF that matches the PDF of output_distribution.
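To make the meaning of the returned indices concrete, here is a minimal NumPy-only sketch of one way such a resampling could be done. It is not the halotools implementation: the function name sketch_matching_indices and everything inside it are hypothetical. Each input point is weighted by the ratio of output counts to input counts in its bin, and indices are then drawn with replacement in proportion to those weights.

```python
import numpy as np


def sketch_matching_indices(input_distribution, output_distribution,
                            nselect, bins, seed=None):
    """Hypothetical illustration only, not the halotools source."""
    rng = np.random.default_rng(seed)
    x = np.asarray(input_distribution)
    y = np.asarray(output_distribution)
    bins = np.asarray(bins)
    nbins = len(bins) - 1

    # Bin index of every point; values outside the bin range fall
    # at -1 or nbins and are masked out below.
    in_idx = np.digitize(x, bins) - 1
    out_idx = np.digitize(y, bins) - 1
    in_ok = (in_idx >= 0) & (in_idx < nbins)
    out_ok = (out_idx >= 0) & (out_idx < nbins)

    # Tabulate both samples in the shared bins.
    input_counts = np.bincount(in_idx[in_ok], minlength=nbins)
    output_counts = np.bincount(out_idx[out_ok], minlength=nbins)

    # Weight each in-range input point by (output count / input count)
    # of its bin; out-of-range points keep weight zero.
    weights = np.zeros(len(x))
    weights[in_ok] = (output_counts[in_idx[in_ok]]
                      / input_counts[in_idx[in_ok]])

    total = weights.sum()
    if total == 0:
        raise ValueError("The two samples share no populated bins.")

    # Draw indices into the input sample, with replacement.
    return rng.choice(len(x), size=nselect, replace=True, p=weights / total)


# Usage on synthetic data (all values here are illustrative).
data_rng = np.random.default_rng(42)
npts1, npts2 = int(1e5), int(1e4)
input_distribution = data_rng.normal(loc=0, scale=1, size=npts1)
output_distribution = data_rng.normal(loc=0.5, scale=0.5, size=npts2)
nselect = int(2e4)
bins = np.linspace(-2, 2, 50)
indices = sketch_matching_indices(
    input_distribution, output_distribution, nselect, bins, seed=43)
matched = input_distribution[indices]
```

In this sketch, input points that fall outside the bin range are never selected, and output bins with no input points cannot be reproduced at all.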

Notes

Pay careful attention that your bins are appropriate for your two distributions. The PDF of the returned result will match the output_distribution PDF only as tabulated in the input bins. Depending on the two distributions and your choice of bins, it may not be possible to construct matching PDFs if your sampling is too sparse or your bins are inappropriate.
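One way to check a choice of bins before resampling is to look for bins that the target populates but the input does not, since no resampling of the input can fill those. The helper below is a hypothetical sketch built only on numpy.histogram; the name uncovered_bins is not part of halotools.

```python
import numpy as np


def uncovered_bins(input_distribution, output_distribution, bins):
    """Return indices of bins where the target has points but the input has none."""
    input_counts, _ = np.histogram(input_distribution, bins=bins)
    output_counts, _ = np.histogram(output_distribution, bins=bins)
    return np.flatnonzero((output_counts > 0) & (input_counts == 0))


# Illustrative data: the input spans [-1, 1) but the target spans [-2, 2).
rng = np.random.default_rng(0)
narrow_input = rng.uniform(-1, 1, size=1000)
wide_target = rng.uniform(-2, 2, size=1000)
bins = np.linspace(-2, 2, 9)  # 8 bins of width 0.5

bad = uncovered_bins(narrow_input, wide_target, bins)
# The two outermost bins on each side have target points but no input points.
```

A non-empty result suggests narrowing the bin range, widening the bins, or using a larger input sample.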

Examples

>>> import numpy as np
>>> from halotools.utils import distribution_matching_indices
>>> npts1, npts2 = int(1e5), int(1e4)
>>> input_distribution = np.random.normal(loc=0, scale=1, size=npts1)
>>> output_distribution = np.random.normal(loc=.5, scale=0.5, size=npts2)
>>> nselect = int(2e4)
>>> bins = np.linspace(-2, 2, 50)
>>> indices = distribution_matching_indices(input_distribution, output_distribution, nselect, bins)
[Figure: matched_distributions.png]