class halotools.sim_manager.RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass, header_char='#', row_cut_min_dict={}, row_cut_max_dict={}, row_cut_eq_dict={}, row_cut_neq_dict={}, overwrite=False, ignore_nearby_redshifts=False, dz_tol=0.05, processing_notes=' ', **kwargs)[source]

The RockstarHlistReader reads Rockstar hlist ASCII files, stores them as hdf5 files in the Halotools cache, and updates the cache log.

It is important that you carefully read the Instructions for Reducing and Caching a Rockstar Catalog before using this class.

RockstarHlistReader is a subclass of TabularAsciiReader and supplements its behavior with the ability to read, update, and search the Halotools cache log.

If you are planning to use the Halotools cache manager to store and keep track of your halo catalogs, this is the class to use. For a stand-alone reader of Rockstar hlists or large ASCII files in general, you should instead use the TabularAsciiReader class.

Notes

When the row_cut_min_dict, row_cut_max_dict, row_cut_eq_dict and row_cut_neq_dict keyword arguments are used simultaneously, only rows passing all cuts will be kept.
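The way the cuts combine can be sketched with a boolean mask over a toy structured array. This is an illustration of the AND logic only; the array, column names, and inclusive comparisons below are assumptions for the sketch, not Halotools internals:

```python
import numpy as np

# Toy catalog: two columns, five rows (illustration only, not real data)
halos = np.array(
    [(1, 5e9), (2, 2e10), (3, 8e10), (4, 2e10), (5, 1e12)],
    dtype=[("halo_id", "i8"), ("halo_mvir", "f4")],
)

row_cut_min_dict = {"halo_mvir": 1e10}  # keep rows with halo_mvir >= 1e10
row_cut_max_dict = {"halo_mvir": 1e11}  # keep rows with halo_mvir <= 1e11

# Rows must pass ALL cuts simultaneously (logical AND across the dicts)
mask = np.ones(len(halos), dtype=bool)
for key, cut in row_cut_min_dict.items():
    mask &= halos[key] >= cut
for key, cut in row_cut_max_dict.items():
    mask &= halos[key] <= cut

kept = halos[mask]  # only the three rows with 1e10 <= halo_mvir <= 1e11
```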

Examples

Suppose you wish to reduce the ASCII data stored by input_fname into a data structure with the following columns: halo ID, virial mass, x, y, z position, and virial radius, where the data are stored in columns 1, 45, 17, 18, 19 and 36, respectively, and where the first column is indexed as 0. If you wish to keep all rows of the halo catalog:

>>> columns_to_keep_dict = {'halo_id': (1, 'i8'), 'halo_mvir': (45, 'f4'), 'halo_x': (17, 'f4'), 'halo_y': (18, 'f4'), 'halo_z': (19, 'f4'), 'halo_rvir': (36, 'f4')}
>>> simname = 'any_nickname'
>>> halo_finder = 'rockstar'
>>> version_name = 'rockstar_v1.53_no_cuts'
>>> redshift = 0.3478
>>> Lbox, particle_mass = 400, 3.5e8
>>> reader = RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass)
>>> reader.read_halocat(['halo_rvir'], write_to_disk=True, update_cache_log=True)

The halo catalog is now stored in cache and can be loaded into memory at any time using the CachedHaloCatalog class with the following syntax.

>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = 'any_nickname', halo_finder = 'rockstar', version_name = 'rockstar_v1.53_no_cuts', redshift = 0.3)

Note that once you have stored the catalog with the precise redshift, to load it back into memory you do not need to remember the exact redshift to four significant digits; you just need to be within dz_tol. You can always verify that you are working with the catalog you intended by inspecting the metadata:

>>> print(halocat.redshift)
>>> print(halocat.version_name)
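The redshift lookup described above reduces to a tolerance comparison. A minimal illustration of that check (the function below is hypothetical, not the actual cache-search code):

```python
# Sketch of how a requested redshift matches a cached catalog within dz_tol.
# This helper is hypothetical; Halotools performs an equivalent comparison
# internally when searching the cache log.
def redshift_matches(catalog_redshift, requested_redshift, dz_tol=0.05):
    """Return True if the requested redshift is within dz_tol of the cached value."""
    return abs(catalog_redshift - requested_redshift) <= dz_tol

match_close = redshift_matches(0.3478, 0.3)   # |0.0478| <= 0.05, so this matches
match_far = redshift_matches(0.3478, 0.25)    # |0.0978| > 0.05, so this does not
```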

Now suppose that for your science target of interest, subhalos in your simulation with $$M_{\rm vir} < 10^{10} M_{\odot}/h$$ are not properly resolved. In this case you can use the row_cut_min_dict keyword argument to discard such halos as the file is read.

>>> row_cut_min_dict = {'halo_mvir': 1e10}
>>> version_name = 'rockstar_v1.53_mvir_gt_100'
>>> processing_notes = 'All halos with halo_mvir < 1e10 were thrown out during the initial catalog reduction'
>>> reader = RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass, row_cut_min_dict=row_cut_min_dict, processing_notes=processing_notes)
>>> reader.read_halocat(['halo_rvir'], write_to_disk=True, update_cache_log=True)

Note the list we passed to the read_halocat method via the columns_to_convert_from_kpc_to_mpc argument. In common Rockstar catalogs, $$R_{\rm vir}$$ is stored in kpc/h units, while halo centers are stored in Mpc/h units, a potential source of buggy behavior. Take note of all units in your raw halo catalog before caching reductions of it.
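The conversion performed by columns_to_convert_from_kpc_to_mpc is a simple division by 1000. A self-contained sketch of the pattern, with a plain dictionary of numpy arrays standing in for the halo table (the values are illustrative, not taken from a real catalog):

```python
import numpy as np

# Stand-in for the in-memory halo table (illustrative values only)
halo_table = {
    "halo_x": np.array([25.1, 310.7]),       # already in Mpc/h
    "halo_rvir": np.array([250.0, 1200.0]),  # in kpc/h in raw Rockstar output
}

# Each named column is divided by 1000 to convert kpc/h -> Mpc/h
columns_to_convert_from_kpc_to_mpc = ["halo_rvir"]
for key in columns_to_convert_from_kpc_to_mpc:
    halo_table[key] = halo_table[key] / 1000.0
```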

After calling read_halocat, the halo catalog is also stored in cache, and we load it in the same way as before but now using a different version_name:

>>> halocat = CachedHaloCatalog(simname = 'any_nickname', halo_finder = 'rockstar', version_name = 'rockstar_v1.53_mvir_gt_100', redshift = 0.3)

Using the processing_notes argument is helpful in case you forgot exactly how the catalog was initially reduced. The processing_notes string you passed to the constructor is stored as metadata on the cached hdf5 file and is automatically bound to the CachedHaloCatalog instance:

>>> print(halocat.processing_notes)
All halos with halo_mvir < 1e10 were thrown out during the initial catalog reduction

Any cut you placed on the catalog during its initial reduction is automatically bound to the cached halo catalog as additional metadata. In this case, since we placed a lower bound on $$M_{\rm vir}$$:

>>> print(halocat.halo_mvir_row_cut_min)
1e+10

This metadata provides protection against typographical errors that may have been accidentally introduced in the hand-written processing_notes. Additional metadata that is automatically bound to all cached catalogs includes other sanity checks on our bookkeeping such as orig_ascii_fname and time_of_catalog_production.
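Conceptually, this metadata is a small key-value record attached to the cached hdf5 file. The dictionary below is a schematic stand-in: the attribute names come from the text above, but the values, the example path, and the plain-dict representation are illustrative (real catalogs store these as hdf5 file attributes):

```python
from datetime import datetime, timezone

# Schematic of the metadata bound to a cached catalog (simplified stand-in)
metadata = {
    "processing_notes": (
        "All halos with halo_mvir < 1e10 were thrown out "
        "during the initial catalog reduction"
    ),
    "halo_mvir_row_cut_min": 1e10,
    "orig_ascii_fname": "/path/to/hlist.list",  # hypothetical path
    "time_of_catalog_production": datetime.now(timezone.utc).isoformat(),
}

# Cross-check the hand-written notes against the machine-recorded cut,
# guarding against typos in the free-form processing_notes string
notes_mention_cut = "1e10" in metadata["processing_notes"]
cut_value = metadata["halo_mvir_row_cut_min"]
```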

Methods Summary

add_supplementary_halocat_columns()
    Add the halo_nfw_conc and halo_hostid columns.
read_halocat(columns_to_convert_from_kpc_to_mpc)
    Method reads the ascii data and binds the resulting catalog to self.halo_table.
update_cache_log()
    Method updates the cache log with the new catalog, provided that it is safe to add to the cache.
write_to_disk()
    Method writes self.halo_table to self.output_fname and also calls the self._write_metadata method to place the hdf5 file into standard form.

Methods Documentation

add_supplementary_halocat_columns()[source]

Add the halo_nfw_conc and halo_hostid columns. This implementation will eventually change in favor of something more flexible.

read_halocat(columns_to_convert_from_kpc_to_mpc, **kwargs)[source]

Method reads the ascii data and binds the resulting catalog to self.halo_table.

By default, the optional write_to_disk and update_cache_log arguments are set to False because Halotools will not write large amounts of data to disk without your explicit instructions to do so.

If you want to cache the catalog exactly as it comes out of this initial reduction, with no further modifications, set both of these arguments to True; your reduced catalog will then be saved on disk and stored in cache immediately.

However, you may wish to supplement your halo catalog with additional halo properties before storing it on disk. This can be accomplished by manually adding columns to the halo_table attribute of the RockstarHlistReader instance after reading in the data. In that case, set write_to_disk to False, add your new data, and then call the write_to_disk and update_cache_log methods in succession. It is good practice to make an explicit note of what you have done in the processing_notes attribute of the reader instance so that you have a permanent record of how you processed the catalog.
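This supplement-then-write workflow can be sketched as follows, with a plain dictionary of numpy arrays standing in for reader.halo_table so the example is self-contained; the column names and the added quantity are hypothetical:

```python
import numpy as np

# Stand-in for reader.halo_table after read_halocat(..., write_to_disk=False)
halo_table = {
    "halo_id": np.array([101, 102, 103]),
    "halo_mvir": np.array([2e10, 5e11, 8e12]),
}

# Step 1: add a custom column before anything is written to disk
halo_table["halo_log10_mvir"] = np.log10(halo_table["halo_mvir"])

# Step 2: record what was done, so the provenance survives in the cache
processing_notes = (
    "Added halo_log10_mvir = log10(halo_mvir) by hand "
    "between read_halocat and write_to_disk"
)

# Step 3 (on a real RockstarHlistReader instance, not runnable here):
#   reader.halo_table['halo_log10_mvir'] = ...
#   reader.write_to_disk()
#   reader.update_cache_log()
```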

Parameters:

columns_to_convert_from_kpc_to_mpc : list of strings
    List providing column names that should be divided by 1000 in order to convert from kpc/h to Mpc/h units. This is necessary with typical Rockstar catalogs for the halo_rvir, halo_rs and halo_xoff columns, which are stored in kpc/h, whereas halo centers are typically stored in Mpc/h. All strings appearing in columns_to_convert_from_kpc_to_mpc must also appear in columns_to_keep_dict. It is permissible for columns_to_convert_from_kpc_to_mpc to be an empty list. See Notes for further discussion. Note that this feature is only temporary; the API of this function will change when Halotools adopts Astropy Units.

write_to_disk : bool, optional
    If True, the write_to_disk method will be called automatically. Default is False, in which case you must call the write_to_disk method yourself to store the processed catalog. In that case, you will also need to manually call the update_cache_log method after writing to disk.

update_cache_log : bool, optional
    If True, the update_cache_log method will be called automatically. Default is False, in which case you must call the update_cache_log method yourself to add the processed catalog to the cache.

add_supplementary_halocat_columns : bool, optional
    Boolean determining whether the halo_table will have additional columns added to it, computed by the add_supplementary_halocat_columns method. Default is True. Note that this feature is rather bare-bones and is likely to significantly evolve and/or entirely vanish in future releases.

chunk_memory_size : int, optional
    Determines the approximate number of megabytes of memory that will be processed in chunks. This variable must be smaller than the amount of RAM on your machine; choosing larger values typically improves performance. Default is 500 MB.

Notes

Regarding the columns_to_convert_from_kpc_to_mpc argument, there could of course be other columns whose units you want to convert prior to caching the catalog, and simple division by 1000 may not be the appropriate unit conversion. To handle such cases, do the following. First, call the read_halocat method with the write_to_disk and update_cache_log arguments both set to False; this loads the catalog from disk into memory. You are then free to overwrite any column in the halo_table that you wish. When you have finished preparing the catalog, call the write_to_disk and update_cache_log methods (in that order). As you do so, be sure to include explicit notes of all manipulations you made on the halo_table between the time you called read_halocat and write_to_disk, and bind these notes to the processing_notes argument.
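As a concrete sketch of this procedure, suppose a hypothetical column needed a conversion other than division by 1000. Again a plain dictionary stands in for the in-memory halo_table; the column name, units, and values are invented for illustration:

```python
import numpy as np

# After read_halocat(write_to_disk=False, update_cache_log=False),
# any column can be overwritten with a custom unit conversion.
# Hypothetical example: a velocity-like column stored in cm/s
# that we want in km/s (division by 1e5, not the built-in factor of 1000).
halo_table = {"halo_vmax": np.array([2.0e7, 3.5e7])}  # cm/s (illustration)

halo_table["halo_vmax"] = halo_table["halo_vmax"] / 1.0e5  # -> km/s

# Record the manipulation so it ends up in the cached metadata
processing_notes = (
    "Converted halo_vmax from cm/s to km/s by hand between "
    "read_halocat and write_to_disk"
)
```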

update_cache_log()[source]

Method updates the cache log with the new catalog, provided that it is safe to add to the cache.

write_to_disk()[source]

Method writes self.halo_table to self.output_fname and also calls the self._write_metadata method to place the hdf5 file into standard form.

It is likely that you will want to call the update_cache_log method after calling write_to_disk so that you can take advantage of the convenient syntax provided by the CachedHaloCatalog class.