Instructions for Reducing and Caching a Rockstar Catalog

This section of the documentation describes how to reduce a Rockstar hlist ASCII file into an hdf5 file stored in Halotools cache. If you just want to read an hlist file without caching the data, you should instead use the TabularAsciiReader class.

You are responsible for acquiring or generating your own catalog of Rockstar halos. The Simulations and Halo Catalogs Provided by Halotools section of the documentation provides links to the web locations of the original ASCII data upon which the Halotools-provided catalogs are based. Additional catalogs can also be found at The CosmoSim database.

Before reducing and caching your catalog with the RockstarHlistReader class, please carefully read both this tutorial and the entire docstring of the class.

Initializing the reader

To instantiate the RockstarHlistReader class, in addition to the path to the ASCII file you must provide the following information:

  1. the columns of data you want

  2. metadata used to keep track of the simulation in cache

  3. an output filename

We will comment on each of these three inputs in turn. For a description of how to additionally make on-the-fly row-cuts, see the section on Making on-the-fly row-cuts (optional) below.

Specifying the columns you want with the columns_to_keep_dict

In order to use the RockstarHlistReader class, you must manually inspect the hlist file to determine what information you want, and in what column the information is stored. This information can be determined by inspecting the header of the ASCII file. See the docstring of the RockstarHlistReader class for instructions on exactly how this dictionary is formatted.
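For concreteness, here is a minimal sketch of such a dictionary. Each key is the name the column will have in the reduced catalog (following the Halotools convention that halo property names begin with halo_), and each value is a two-element tuple giving the column index (counting from zero) and the Numpy dtype of the data. The indices below are illustrative only; the correct values depend entirely on the header of your particular hlist file.

>>> columns_to_keep_dict = {'halo_id': (1, 'i8'), 'halo_upid': (6, 'i8'),
...     'halo_mvir': (10, 'f4'), 'halo_rvir': (11, 'f4'),
...     'halo_x': (17, 'f4'), 'halo_y': (18, 'f4'), 'halo_z': (19, 'f4')}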

This step of the reduction cannot be robustly automated because there is no universal standard form for Rockstar headers. Even if the header became standard, existing hlist files that are currently publicly available and in wide use would not conform to the new standard. With your labor in this step, you are providing the necessary standardization.

Specifying the simulation metadata

There are six required pieces of metadata that you must specify: the simname, halo_finder, version_name, redshift, Lbox and particle_mass. (In most cases, specifying the halo_finder is redundant, though the RockstarHlistReader class can also be used to reduce and cache any halo catalog that is formatted in the same way as a typical hlist file. In fact, that is how the Bolshoi-BDM catalogs provided by Halotools were generated).

The first four of these pieces of metadata govern how the cache will be used to keep track of this catalog. After caching the halos with the read_halocat method, you can load the cached catalog into memory as follows:

>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = simname, halo_finder = halo_finder, version_name = version_name, redshift = redshift) 

Each time you process a new halo catalog, we recommend that you choose a different version_name, especially if you make different cuts. If you use one of the same simulations as those provided by Halotools, it is recommended that you follow the simname conventions laid out on the Simulations and Halo Catalogs Provided by Halotools page. Although not strictly necessary, make an effort to specify the redshift accurately to four decimal places, as this is the precision of the string used to store the redshift metadata.

Lbox should be given in Mpc/h and particle_mass in Msun/h.

Although optional, it is strongly recommended that you also set the processing_notes argument to be some string giving a plain-language description of the row-cuts you placed on the catalog (see Making on-the-fly row-cuts (optional) below).
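As an illustration, the metadata might be defined as in the following sketch. Every value here is a placeholder that you should replace with the actual properties of your simulation and your chosen bookkeeping conventions:

>>> simname = 'my_custom_sim'      # choose a memorable name for your simulation
>>> halo_finder = 'rockstar'
>>> version_name = 'my_version_1'  # choose a new version_name for each new reduction
>>> redshift = 0.0
>>> Lbox = 250.          # box size in Mpc/h (placeholder value)
>>> particle_mass = 1e8  # particle mass in Msun/h (placeholder value)
>>> processing_notes = 'Catalog contains only (sub)halos above an illustrative mass cut'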

Choosing your output_fname

By setting output_fname to the absolute path of an hdf5 file, you are free to store the halos in any location on disk that you like. If you instead set output_fname to the string std_cache_loc, Halotools will place the reduced catalog in the following location on disk:

$HOME/.astropy/cache/halotools/halo_catalogs/simname/halo_finder/input_fname.version_name.hdf5

Wherever you store the hdf5 file of halos, you should try to choose a reasonably permanent location for it. Moving halo catalogs around on disk is a common way to introduce buggy behavior into simulation analysis. If you decide to change the disk location of the hdf5 file after storing it in cache, you will need to update the cache log with the new location. In that event, see the Relocating Simulation Data and Updating the Cache section of the documentation.
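Putting the above pieces together, the instantiation might look like the following sketch. The input_fname path is hypothetical, and the call passes everything as keyword arguments; consult the RockstarHlistReader docstring for the exact signature and for additional optional arguments.

>>> from halotools.sim_manager import RockstarHlistReader
>>> input_fname = '/path/to/hlist_1.00000.list'  # hypothetical path to your ASCII hlist file
>>> reader = RockstarHlistReader(input_fname=input_fname,
...     columns_to_keep_dict=columns_to_keep_dict, output_fname='std_cache_loc',
...     simname=simname, halo_finder=halo_finder, version_name=version_name,
...     redshift=redshift, Lbox=Lbox, particle_mass=particle_mass,
...     processing_notes=processing_notes)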

Making on-the-fly row-cuts (optional)

Halo catalogs typically occupy many GB of disk space. Because of the shape of the CDM mass function, most halos in any catalog lie right at (or beyond) the resolution limits of the simulation. Thus, for many science targets, most of the halos in the catalog are irrelevant, and there is no reason to waste disk space storing them. For example, the Halotools-provided catalogs only include halos and subhalos resolved with at least a few hundred particles (as described in the processing_notes metadata bound to these catalogs).

The RockstarHlistReader class allows you to apply cuts on the rows of ASCII data as the file is being read, so that only halos passing your desired cuts will be stored in the cached catalog. This not only saves disk space, but because the cuts are applied on-the-fly, this also allows you to reduce a halo catalog that is too large to fit into RAM. With the RockstarHlistReader class, only the final, reduced catalog need fit into memory.

By default, no row-cuts are made, but the following four optional keyword arguments allow you to construct a highly customizable on-the-fly cut on the ASCII rows:

row_cut_min_dict, row_cut_max_dict, row_cut_eq_dict and row_cut_neq_dict.

See the notes in the RockstarHlistReader docstring for how to construct a cut of your liking with these arguments.
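For example, the following sketch places an illustrative lower bound on the halo_mvir column and keeps only host halos. Each key should correspond to one of the column names appearing in your columns_to_keep_dict, and the resulting dictionaries are passed as keyword arguments when instantiating the reader.

>>> row_cut_min_dict = {'halo_mvir': 1e10}  # discard rows with halo_mvir below 1e10 Msun/h (illustrative cut)
>>> row_cut_eq_dict = {'halo_upid': -1}     # keep only host halos, for which upid == -1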

Running the reader

Once you have instantiated the RockstarHlistReader class, you can read the ASCII data by calling the read_halocat method. As described in the read_halocat docstring, this method does not return anything; instead it binds the halo catalog to the halo_table attribute of the reader instance. If you call the read_halocat method without setting any of its optional arguments, that is all that will happen: by default, Halotools will not write large amounts of data to your disk. However, in the majority of use-cases you should set both the write_to_disk and update_cache_log arguments to True, in which case your reduced catalog will be saved to disk and stored in cache.
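Continuing the sketch above, a typical call might look like the following. The columns_to_convert_from_kpc_to_mpc argument lists any length-like columns (here the halo_rvir column kept in the earlier sketch) whose hlist values are stored in kpc/h and should be converted to Mpc/h; see the read_halocat docstring for details on this and any other arguments.

>>> reader.read_halocat(columns_to_convert_from_kpc_to_mpc=['halo_rvir'],
...     write_to_disk=True, update_cache_log=True)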

The end result

After calling the read_halocat method, your catalog is now stored in cache and you can load it into memory using the CachedHaloCatalog class as follows:

>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = simname, halo_finder = halo_finder, version_name = version_name, redshift = redshift) 

When you load an instance of the CachedHaloCatalog class, the hdf5 file you created is inspected and all of its metadata is bound to the CachedHaloCatalog instance as convenience attributes. For example, you can remind yourself of the cuts you placed on the catalog:

>>> print(halocat.processing_notes) 

The RockstarHlistReader automatically creates some additional metadata to help with your bookkeeping. For example:

>>> print(halocat.orig_ascii_fname) 
>>> print(halocat.time_of_catalog_production) 

See the docstring of the CachedHaloCatalog class for more information.