RockstarHlistReader
- class halotools.sim_manager.RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass, header_char='#', row_cut_min_dict={}, row_cut_max_dict={}, row_cut_eq_dict={}, row_cut_neq_dict={}, overwrite=False, ignore_nearby_redshifts=False, dz_tol=0.05, processing_notes=' ', **kwargs)
Bases: TabularAsciiReader

The RockstarHlistReader reads Rockstar hlist ASCII files, stores them as hdf5 files in the Halotools cache, and updates the cache log. It is important that you carefully read the Instructions for Reducing and Caching a Rockstar Catalog before using this class.

RockstarHlistReader is a subclass of TabularAsciiReader, and supplements that behavior with the ability to read, update, and search the Halotools cache log. If you are planning to use the Halotools cache manager to store and keep track of your halo catalogs, this is the class to use. For a stand-alone reader of Rockstar hlists, or of large ASCII files in general, you should instead use the TabularAsciiReader class.

Parameters:
- input_fname : string
  Absolute path of the file to be processed.
- columns_to_keep_dict : dict
  Dictionary used to define which columns of the tabular ASCII data will be kept.

  Each key of the dictionary will be the name of the column in the returned data table. The value bound to each key is a two-element tuple. The first tuple entry is an integer providing the index of the column to be kept, starting from 0. The second tuple entry is a string defining the Numpy dtype of the data in that column, e.g., 'f4' for a float, 'f8' for a double, or 'i8' for a long.

  Thus an example columns_to_keep_dict could be {'halo_mvir': (1, 'f4'), 'halo_id': (0, 'i8'), 'halo_spin': (45, 'f4')}.

  The columns of all halo tables stored in the Halotools cache must begin with the substring halo_. At a minimum, any halo table stored in cache must have the following columns: halo_id, halo_x, halo_y, halo_z, plus at least one additional column (typically storing a mass-like variable); see the sketch following this parameter list. These requirements must be met if you want to use the Halotools cache system, or if you want Halotools to populate your halo catalog with mock galaxies. If you do not want to conform to these conventions, just use the TabularAsciiReader and handle the file storage using your own preferred method.

- output_fname : string
  Absolute path to the location where the hdf5 file will be stored. The file extension must be '.hdf5'. If the file already exists, you must set the keyword argument overwrite to True.

  If output_fname is set to std_cache_loc, Halotools will place the catalog in the following location:

  $HOME/.astropy/cache/halotools/halo_catalogs/simname/halo_finder/input_fname.version_name.hdf5
- simname : string
  Nickname of the simulation used as a shorthand way to keep track of the halo catalogs in your cache. The simnames of the Halotools-provided catalogs are 'bolshoi', 'bolplanck', 'consuelo' and 'multidark'.
- halo_finder : string
  Nickname of the halo-finder used to generate the hlist file from particle data. Most likely this should be 'rockstar', though there are also publicly available hlists processed with the 'bdm' halo-finder.
- redshift : float
  Redshift of the halo catalog.
- version_name : string
  Nickname of the version of the halo catalog you produce using RockstarHlistReader. The version_name is used as a bookkeeping tool in the cache log.

  If you process your own halo catalog with the RockstarHlistReader, you should choose your own version name that differs from the version names of the Halotools-provided catalogs.
- Lbox : float
  Box size of the simulation in Mpc/h. Lbox will automatically be added to the supplementary_metadata_dict so that your hdf5 file will have the box size bound as metadata.

- particle_mass : float
  Mass of the dark matter particles of the simulation in Msun/h. particle_mass will automatically be added to the supplementary_metadata_dict so that your hdf5 file will have the particle mass bound as metadata.

- row_cut_min_dict : dict, optional
  Dictionary used to place a lower-bound cut on the rows of the tabular ASCII data, e.g., to ignore halos below some mass cut.

  Each key of the dictionary must also be a key of the input columns_to_keep_dict; for purposes of good bookkeeping, you are not permitted to place a cut on a column that you do not keep. The value bound to each key serves as the lower bound on the data stored in that column. A row with a smaller value than this lower bound for the corresponding column will not appear in the returned data table.

  For example, if row_cut_min_dict = {'mass': 1e10}, then all rows of the returned data table will have a mass greater than 1e10.
- row_cut_max_dict : dict, optional
  Dictionary used to place an upper-bound cut on the rows of the tabular ASCII data, e.g., to ignore halos not satisfying some relaxation criterion.

  Each key of the dictionary must also be a key of the input columns_to_keep_dict; for purposes of good bookkeeping, you are not permitted to place a cut on a column that you do not keep. The value bound to each key serves as the upper bound on the data stored in that column. A row with a larger value than this upper bound for the corresponding column will not appear in the returned data table.

  For example, if row_cut_max_dict = {'mass': 1e15}, then all rows of the returned data table will have a mass less than 1e15.
- row_cut_eq_dict : dict, optional
  Dictionary used to place an equality cut on the rows of the tabular ASCII data, e.g., to ignore subhalos.

  Each key of the dictionary must also be a key of the input columns_to_keep_dict; for purposes of good bookkeeping, you are not permitted to place a cut on a column that you do not keep. The value bound to each key serves as the required value for the data stored in that column. Only rows with a value equal to this required value for the corresponding column will appear in the returned data table.

  For example, if row_cut_eq_dict = {'upid': -1}, then all rows of the returned data table will have an upid of -1.
- row_cut_neq_dict : dict, optional
  Dictionary used to place an inequality cut on the rows of the tabular ASCII data.

  Each key of the dictionary must also be a key of the input columns_to_keep_dict; for purposes of good bookkeeping, you are not permitted to place a cut on a column that you do not keep. The value bound to each key serves as a forbidden value for the data stored in that column. Rows with a value equal to this forbidden value for the corresponding column will not appear in the returned data table.

  For example, if row_cut_neq_dict = {'upid': -1}, then no rows of the returned data table will have an upid of -1.
- header_char : str, optional
  String to be interpreted as a header line of the ascii hlist file. Default is '#'.
- overwrite : bool, optional
  If the chosen output_fname already exists, then you must set overwrite to True in order to write the file to disk. Default is False.

- ignore_nearby_redshifts : bool, optional
  Flag used to determine whether nearby redshifts in cache will be ignored. If there are existing halo catalogs in the Halotools cache with matching simname, halo_finder and version_name, and if one or more of those catalogs has a redshift within dz_tol, then the ignore_nearby_redshifts flag must be set to True in order for the new halo catalog to be stored in cache. Default is False.

- dz_tol : float, optional
  Tolerance determining when another halo catalog in cache is deemed nearby. Default is 0.05.
- processing_notes : string, optional
  String used to provide supplementary notes that will be attached to the hdf5 file storing your halo catalog.
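To illustrate the cache requirements on columns_to_keep_dict described above, here is a minimal sketch of a dictionary that satisfies them (the column indices are hypothetical and must be matched to the layout of your particular hlist file):

>>> columns_to_keep_dict = {'halo_id': (1, 'i8'), 'halo_x': (17, 'f4'), 'halo_y': (18, 'f4'), 'halo_z': (19, 'f4'), 'halo_mvir': (10, 'f4')}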
Notes
When the row_cut_min_dict, row_cut_max_dict, row_cut_eq_dict and row_cut_neq_dict keyword arguments are used simultaneously, only rows passing all cuts will be kept.
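For instance, here is a minimal sketch combining a lower-bound cut with an equality cut to keep only sufficiently massive host halos. The mass threshold is illustrative, and the sketch assumes halo_upid appears in your columns_to_keep_dict, since you may only cut on columns you keep:

>>> row_cut_min_dict = {'halo_mvir': 1e10}
>>> row_cut_eq_dict = {'halo_upid': -1}
>>> reader = RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass, row_cut_min_dict=row_cut_min_dict, row_cut_eq_dict=row_cut_eq_dict)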
Examples

Suppose you wish to reduce the ASCII data stored by input_fname into a data structure with the following columns: halo ID, virial mass, x, y, z position, and virial radius, where the data are stored in columns 1, 45, 17, 18, 19 and 36, respectively, with the first column having index 0. If you wish to keep all rows of the halo catalog:

>>> columns_to_keep_dict = {'halo_id': (1, 'i8'), 'halo_mvir': (45, 'f4'), 'halo_x': (17, 'f4'), 'halo_y': (18, 'f4'), 'halo_z': (19, 'f4'), 'halo_rvir': (36, 'f4')}
>>> simname = 'any_nickname'
>>> halo_finder = 'rockstar'
>>> version_name = 'rockstar_v1.53_no_cuts'
>>> redshift = 0.3478
>>> Lbox, particle_mass = 400, 3.5e8
>>> reader = RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass)
>>> reader.read_halocat(['halo_rvir'], write_to_disk = True, update_cache_log = True)
The halo catalog is now stored in cache and can be loaded into memory at any time using the CachedHaloCatalog class with the following syntax.

>>> from halotools.sim_manager import CachedHaloCatalog
>>> halocat = CachedHaloCatalog(simname = 'any_nickname', halo_finder = 'rockstar', version_name = 'rockstar_v1.53_no_cuts', redshift = 0.3)
Note that although you stored the catalog with a precise redshift, you do not need to remember the exact redshift to four significant digits in order to load it back into memory; you just need to be within dz_tol. You can always verify that you are working with the catalog you intended by inspecting the metadata:

>>> print(halocat.redshift)
>>> print(halocat.version_name)
Now suppose that for your science target of interest, subhalos in your simulation with \(M_{\rm vir} < 10^{10} M_{\odot}/h\) are not properly resolved. In this case you can use the row_cut_min_dict keyword argument to discard such halos as the file is read.

>>> row_cut_min_dict = {'halo_mvir': 1e10}
>>> version_name = 'rockstar_v1.53_mvir_gt_100'
>>> processing_notes = 'All halos with halo_mvir < 1e10 were thrown out during the initial catalog reduction'
>>> reader = RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass, row_cut_min_dict=row_cut_min_dict, processing_notes=processing_notes)
>>> reader.read_halocat(['halo_rvir'], write_to_disk = True, update_cache_log = True)
Note the list we passed to the read_halocat method via the columns_to_convert_from_kpc_to_mpc argument. In common Rockstar catalogs, \(R_{\rm vir}\) is stored in kpc/h units, while halo centers are stored in Mpc/h units, a potential source of buggy behavior. Take note of all units in your raw halo catalog before caching reductions of it.

After calling read_halocat, the halo catalog is also stored in cache, and we load it in the same way as before but now using a different version_name:

>>> halocat = CachedHaloCatalog(simname = 'any_nickname', halo_finder = 'rockstar', version_name = 'rockstar_v1.53_mvir_gt_100', redshift = 0.3)
Using the processing_notes argument is helpful in case you forget exactly how the catalog was initially reduced. The processing_notes string you passed to the constructor is stored as metadata on the cached hdf5 file and is automatically bound to the CachedHaloCatalog instance:

>>> print(halocat.processing_notes)
All halos with halo_mvir < 1e10 were thrown out during the initial catalog reduction
Any cut you placed on the catalog during its initial reduction is automatically bound to the cached halo catalog as additional metadata. In this case, since we placed a lower bound on \(M_{\rm vir}\):

>>> print(halocat.halo_mvir_row_cut_min)
1e+10
This metadata provides protection against typographical errors that may have been accidentally introduced in the hand-written processing_notes. Additional metadata that is automatically bound to all cached catalogs includes other sanity checks on our bookkeeping, such as orig_ascii_fname and time_of_catalog_production.
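For example, these can be inspected in the same way as the other metadata (a hedged sketch; the printed values will depend on your particular reduction):

>>> print(halocat.orig_ascii_fname)
>>> print(halocat.time_of_catalog_production)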
Methods Summary

- add_supplementary_halocat_columns(): Add the halo_nfw_conc and halo_hostid columns.
- read_halocat(columns_to_convert_from_kpc_to_mpc): Method reads the ascii data and binds the resulting catalog to self.halo_table.
- update_cache_log(): Method updates the cache log with the new catalog, provided that it is safe to add to the cache.
- write_to_disk(): Method writes self.halo_table to self.output_fname and also calls the self._write_metadata method to place the hdf5 file into standard form.

Methods Documentation
- add_supplementary_halocat_columns()
Add the halo_nfw_conc and halo_hostid columns. This implementation will eventually change in favor of something more flexible.
- read_halocat(columns_to_convert_from_kpc_to_mpc, write_to_disk=False, update_cache_log=False, add_supplementary_halocat_columns=True, **kwargs)
Method reads the ascii data and binds the resulting catalog to self.halo_table.

By default, the optional write_to_disk and update_cache_log arguments are set to False because Halotools will not write large amounts of data to disk without your explicit instructions to do so. If you want an untouched replica of the downloaded halo catalog on disk, then you should set both of these arguments to True, in which case your reduced catalog will be saved on disk and stored in cache immediately.
However, you may wish to supplement your halo catalog with additional halo properties before storing it on disk. This is easily accomplished by manually adding columns to the halo_table attribute of the RockstarHlistReader instance after reading in the data. In this case, set write_to_disk to False, add your new data, and then call the write_to_disk and update_cache_log methods in succession, as in the sketch below. In such a case, it is good practice to make an explicit note of what you have done in the processing_notes attribute of the reader instance so that you will have a permanent record of how you processed the catalog.
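For instance, here is a minimal sketch of that workflow; the compute_custom_prop function and the halo_custom_prop column are hypothetical placeholders for your own calculation:

>>> reader = RockstarHlistReader(input_fname, columns_to_keep_dict, output_fname, simname, halo_finder, redshift, version_name, Lbox, particle_mass)
>>> reader.read_halocat(['halo_rvir'], write_to_disk=False, update_cache_log=False)
>>> reader.halo_table['halo_custom_prop'] = compute_custom_prop(reader.halo_table)
>>> reader.processing_notes += ' Added halo_custom_prop column by hand after reading.'
>>> reader.write_to_disk()
>>> reader.update_cache_log()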
Parameters:

- columns_to_convert_from_kpc_to_mpc : list of strings
  List providing column names that should be divided by 1000 in order to convert from kpc/h to Mpc/h units. This is necessary with typical Rockstar catalogs for the halo_rvir, halo_rs and halo_xoff columns, which are stored in kpc/h, whereas halo centers are typically stored in Mpc/h. All strings appearing in columns_to_convert_from_kpc_to_mpc must also appear in the columns_to_keep_dict. It is permissible for columns_to_convert_from_kpc_to_mpc to be an empty list. See Notes for further discussion.

  Note that this feature is only temporary. The API of this function will change when Halotools adopts Astropy Units.
- write_to_disk : bool, optional
  If True, the write_to_disk method will be called automatically. Default is False, in which case you must call the write_to_disk method yourself to store the processed catalog. In that case, you will also need to manually call the update_cache_log method after writing to disk.

- update_cache_log : bool, optional
  If True, the update_cache_log method will be called automatically. Default is False, in which case you must call the update_cache_log method yourself to add the processed catalog to the cache.

- add_supplementary_halocat_columns : bool, optional
  Boolean determining whether the halo_table will have additional columns added to it, computed by the add_supplementary_halocat_columns method. Default is True.

  Note that this feature is rather bare-bones and is likely to significantly evolve and/or entirely vanish in future releases.
- chunk_memory_size : int, optional
  Determines the approximate number of megabytes of memory that will be processed in chunks. This variable must be smaller than the amount of RAM on your machine; choosing larger values typically improves performance. Default is 500 Mb. See the sketch after this parameter list.
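For example, a sketch of requesting larger chunks on a machine with ample RAM:

>>> reader.read_halocat(['halo_rvir'], write_to_disk=True, update_cache_log=True, chunk_memory_size=1000)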
Notes
Regarding the columns_to_convert_from_kpc_to_mpc argument: of course there could be other columns whose units you want to convert prior to caching the catalog, and simple division by 1000 may not be the appropriate unit conversion. To handle such cases, you should do the following. First, use the read_halocat method with the write_to_disk and update_cache_log arguments both set to False. This will load the catalog from disk into memory. Now you are free to overwrite any column in the halo_table that you wish. When you have finished preparing the catalog, call the write_to_disk and update_cache_log methods (in that order). As you do so, be sure to include explicit notes of all manipulations you made on the halo_table between the time you called read_halocat and write_to_disk, and bind these notes to the processing_notes argument.
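Here is a minimal sketch of that pattern; the halo_vmax column and conversion_factor are hypothetical placeholders, so substitute whatever column and conversion your raw catalog actually requires, and record what you did in processing_notes:

>>> reader.read_halocat([], write_to_disk=False, update_cache_log=False)
>>> reader.halo_table['halo_vmax'] = reader.halo_table['halo_vmax'] * conversion_factor
>>> reader.write_to_disk()
>>> reader.update_cache_log()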
- update_cache_log()
Method updates the cache log with the new catalog, provided that it is safe to add to the cache.
- write_to_disk()
Method writes self.halo_table to self.output_fname and also calls the self._write_metadata method to place the hdf5 file into standard form.

It is likely that you will want to call the update_cache_log method after calling write_to_disk so that you can take advantage of the convenient syntax provided by the CachedHaloCatalog class.