distribution_matching_indices
- halotools.utils.distribution_matching_indices(input_distribution, output_distribution, nselect, bins, seed=None)
Calculate a set of indices that will resample (with replacement) input_distribution so that it matches output_distribution.
This function is useful, for example, for comparing a pair of samples with matching stellar mass functions.
- Parameters:
- input_distribution : ndarray
Numpy array of shape (npts1, ) storing the distribution that requires modification.
- output_distribution : ndarray
Numpy array of shape (npts2, ) defining the desired output distribution.
- nselect : int
Number of points to select from input_distribution.
- bins : ndarray
Binning used to estimate the PDFs. Default is 100 bins automatically determined by numpy.histogram.
- seed : int, optional
Random number seed used to generate indices. Default is None for stochastic results.
- Returns:
- indices : ndarray
Numpy array of shape (nselect, ) storing indices ranging from [0, npts1) such that input_distribution[indices] will have a PDF that matches the PDF of output_distribution.
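To make the idea of distribution-matching indices concrete, the following is a minimal sketch of one way such indices could be drawn. It is not the halotools implementation: the function name matching_indices_sketch and the weighting scheme (each input point is weighted by the target fraction in its bin divided by the number of input points in that bin) are assumptions chosen for illustration.

```python
import numpy as np


def matching_indices_sketch(input_distribution, output_distribution,
                            nselect, bins, seed=None):
    """Illustrative sketch only (hypothetical helper, not the library code)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(input_distribution, dtype=float)
    y = np.asarray(output_distribution, dtype=float)

    # Histogram both samples on the same bin edges.
    counts_in, edges = np.histogram(x, bins=bins)
    counts_out, _ = np.histogram(y, bins=edges)
    nbins = counts_in.size

    # Fraction of the (in-range) target sample that falls in each bin.
    frac_out = counts_out / counts_out.sum()

    # Weight per input point in each bin: target fraction shared equally
    # among the input points of that bin. Bins with no input points get
    # weight 0, so any target mass there cannot be reproduced.
    per_bin_weight = np.divide(frac_out, counts_in,
                               out=np.zeros(nbins), where=counts_in > 0)

    # Assign each input point to its bin; np.histogram treats the last
    # bin as closed on the right, so clip to mimic that behaviour.
    in_range = (x >= edges[0]) & (x <= edges[-1])
    bin_of_x = np.clip(np.digitize(x, edges) - 1, 0, nbins - 1)
    weights = np.where(in_range, per_bin_weight[bin_of_x], 0.0)

    total = weights.sum()
    if total == 0:
        raise ValueError("no overlap between the two samples in these bins")

    # Draw indices with replacement, with probability proportional to weight.
    return rng.choice(x.size, size=nselect, replace=True, p=weights / total)
```

In this sketch, points outside the bin range are never selected, and target bins that contain no input points are silently dropped, which is one reason the choice of bins matters.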
Notes
Pay careful attention that your bins are appropriate for your two distributions. The PDF of the returned result will only match the output_distribution PDF tabulated in the input bins. Depending on the two distributions and your choice of bins, it may not be possible to construct matching PDFs if your sampling is too sparse or your bins are inappropriate.
Examples
>>> import numpy as np
>>> from halotools.utils import distribution_matching_indices
>>> npts1, npts2 = int(1e5), int(1e4)
>>> input_distribution = np.random.normal(loc=0, scale=1, size=npts1)
>>> output_distribution = np.random.normal(loc=.5, scale=0.5, size=npts2)
>>> nselect = int(2e4)
>>> bins = np.linspace(-2, 2, 50)
>>> indices = distribution_matching_indices(input_distribution, output_distribution, nselect, bins)
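Because the match only holds for the PDF tabulated in the chosen bins, it can be worth checking the result after resampling. The helpers below are a hedged sketch of such a check; max_pdf_mismatch and empty_input_bins are hypothetical names introduced here and are not part of halotools.

```python
import numpy as np


def max_pdf_mismatch(resampled, target, bins):
    """Largest absolute difference between the two binned PDFs.

    Hypothetical diagnostic: both PDFs are estimated with np.histogram
    on the same bin edges, using density=True.
    """
    pdf_resampled, edges = np.histogram(resampled, bins=bins, density=True)
    pdf_target, _ = np.histogram(target, bins=edges, density=True)
    return float(np.max(np.abs(pdf_resampled - pdf_target)))


def empty_input_bins(input_distribution, target, bins):
    """Indices of bins where the target has points but the input has none.

    Hypothetical diagnostic: a non-empty result means no resampling of
    the input can reproduce the target PDF in those bins.
    """
    counts_in, edges = np.histogram(input_distribution, bins=bins)
    counts_target, _ = np.histogram(target, bins=edges)
    return np.flatnonzero((counts_in == 0) & (counts_target > 0))
```

With the arrays from the example above, max_pdf_mismatch(input_distribution[indices], output_distribution, bins) would be expected to be small relative to the peak of the target PDF, and empty_input_bins(input_distribution, output_distribution, bins) would be expected to return an empty array. A large mismatch or a non-empty array suggests that the bins are too fine or that the input sample is too sparse in part of the target range.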