Instructions for Reducing and Caching a Rockstar Catalog¶
This section of the documentation describes how to reduce
a Rockstar hlist ASCII file into an hdf5 file stored in Halotools cache.
If you just want to read an hlist file without caching the data,
you should instead use the TabularAsciiReader
class.
You are responsible for acquiring or generating your own catalog of Rockstar halos. The Simulations and Halo Catalogs Provided by Halotools section of the documentation provides links to the web locations of the original ASCII data upon which the Halotools-provided catalogs are based. Additional catalogs can also be found at The CosmoSim database.
Before reducing and caching your catalog with
please carefully read both this tutorial and the entire docstring of
the RockstarHlistReader
class.
Initializing the reader¶
To instantiate the RockstarHlistReader
class,
in addition to the path to the ASCII you must provide the following information:
the columns of data you want
metadata used to keep track of the simulation in cache
an output filename
We will comment on each of these three inputs in turn. For a description of how to additionally make on-the-fly row-cuts, see the section on Making on-the-fly row-cuts (optional) below.
Specifying the columns you want with the columns_to_keep_dict.¶
In order to use the RockstarHlistReader
class,
you must manually inspect the hlist file to determine what information you want,
and in what column the information is stored. This information can be determined
by inspecting the header of the ASCII file. See the docstring of the
RockstarHlistReader
class for instructions on exactly
how this dictionary is formatted.
This step of the reduction cannot be robustly automated because there is no universal standard form for Rockstar headers. Even if the header became standard, existing hlist files that are currently publicly available and in wide use would not conform to the new standard. With your labor in this step, you are providing the necessary standardization.
Specifing the simulation metadata¶
There are six required pieces of metadata that you must specify:
the simname
, halo_finder
, version_name
, redshift
,
Lbox
and particle_mass
.
(In most cases, specifying the halo_finder
is redundant,
though the RockstarHlistReader
class
can also be used to reduce and cache any halo catalog that is formatted in the
same way as a typical hlist file. In fact, that is how the Bolshoi-BDM catalogs
provided by Halotools were generated).
The first four of these pieces of metadata govern how the cache will
be used to keep track of this catalog. After caching the halos with the
read_halocat
method,
you can load the cached catalog into memory as follows:
>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = simname, halo_finder = halo_finder, version_name = version_name, redshift = redshift)
Each time you process a new halo catalog, we recommend that you choose a different version_name
,
especially if you make different cuts.
If you use one of the same simulations as those provided by Halotools,
it is recommended that you follow the simname
conventions laid out on the
Simulations and Halo Catalogs Provided by Halotools page.
Although not strictly necessary, make an effort to specify the redshift accurately to four decimals
as this is the string format used to store the redshift
metadata.
Lbox
should be given in Mpc/h and particle_mass
in Msun/h.
Although optional, it is strongly recommended that you also set the
processing_notes
argument to be some string giving a plain-language description of
the row-cuts you placed on the catalog (see Making on-the-fly row-cuts (optional) below).
Choosing your output_fname¶
By setting the output_fname
to be the absolute path to an hdf5 file,
you are free to store the halos in any location on disk that you like.
By setting output_fname
to the string std_cache_loc
,
Halotools will place the reduced catalog in the following location on disk:
$HOME/.astropy/cache/halotools/halo_catalogs/simname/halo_finder/input_fname.version_name.hdf5
Wherever you store the hdf5 file of halos, you should try to choose a reasonably permanent location to keep them. Moving halo catalogs around on disk is a common way to introduce buggy behavior into simulation analysis. If you decide you want to change the disk location of the hdf5 file you produced after storing it in cache, you will need to update the cache log with the new location. In that event, see the Relocating Simulation Data and Updating the Cache section of the documentation.
Making on-the-fly row-cuts (optional)¶
Halo catalogs typically occupy many Gb of disk space. Because of the
shape of the CDM mass function, most halos in any catalog are right at (or beyond)
the resolution limits of the simulation. Thus for many science targets
most of the halos in the catalog are irrelevant and so you should not waste
disk space storing them. For example, the Halotools-provided catalogs
only include halos and subhalos with a few hundred particles (as described
in the processing_notes
metadata bound to these catalogs).
The RockstarHlistReader
class allows you to
apply cuts on the rows of ASCII data as the file is being read, so that only
halos passing your desired cuts will be stored in the cached catalog.
This not only saves disk space, but because the cuts are applied on-the-fly,
this also allows you to reduce a halo catalog that is too large to fit into RAM.
With the RockstarHlistReader
class, only the final, reduced
catalog need fit into memory.
By default, no row-cuts are made, but the following four optional keyword arguments allow you to construct a highly customizable on-the-fly cut on the ASCII rows:
row_cut_min_dict, row_cut_max_dict, row_cut_eq_dict and row_cut_neq_dict.
See the notes in the RockstarHlistReader
docstring
for how to construct a cut of your liking with these arguments.
Running the reader¶
Once you have instantiated the RockstarHlistReader
class,
you can read the ASCII data by calling the
read_halocat
method.
As described in the the read_halocat
docstring,
this method does not return anything but instead binds the halo catalog to the
halo_table
attribute of the reader instance. If you call the
read_halocat
method with no arguments,
that is all that will happen: by default, Halotools will not write large amounts of
data to your disk.
However, in the majority of use-cases you should set both of these arguments to True,
in which case your reduced catalog will be saved on disk and stored in cache.
The end result¶
After calling the read_halocat
method,
your catalog is now stored in cache and you can load it into memory using
the CachedHaloCatalog
class as follows:
>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = simname, halo_finder = halo_finder, version_name = version_name, redshift = redshift)
When you load an instance of the CachedHaloCatalog
class,
the metadata of the hdf5 file you created is inspected and all its metadata gets bound to the
CachedHaloCatalog
as convenience-attributes. For example,
you can remind yourself of the cuts you placed on the catalog:
>>> print(halocat.processing_notes)
The RockstarHlistReader
automatically creates
some additional metadata to help with your bookkeeping. For example:
>>> print(halocat.orig_ascii_fname)
>>> print(halocat.time_of_catalog_production)
See the docstring of the CachedHaloCatalog
class for more information.