legendsimflow package ¶

legendsimflow.archive.create_plots_tarball(generated_dir, output, prefix)¶

Archive all plots/ directories under generated_dir into a .tar.xz.

Parameters:

generated_dir (Path) – The generated/ directory of the production cycle.
output (Path) – Path to write the .tar.xz tarball.
prefix (str) – Prefix directory name inside the archive (e.g. prod-v1-plots).
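
As a rough illustration of the behaviour described above, the following standard-library sketch collects every plots/ directory under a root and stores it under a prefix inside a .tar.xz. The name archive_plots is hypothetical, and the layout inside the archive (prefix followed by the path relative to generated_dir) is an assumption, not necessarily what create_plots_tarball() does.

```python
import tarfile
from pathlib import Path


def archive_plots(generated_dir, output, prefix):
    """Hypothetical stand-in: archive all plots/ directories into a .tar.xz."""
    generated_dir = Path(generated_dir)
    with tarfile.open(output, "w:xz") as tar:
        # rglob finds plots/ directories at any depth below generated_dir
        for plots_dir in sorted(generated_dir.rglob("plots")):
            if not plots_dir.is_dir():
                continue
            rel = plots_dir.relative_to(generated_dir)
            # assumed layout: <prefix>/<path relative to generated_dir>
            tar.add(plots_dir, arcname=(Path(prefix) / rel).as_posix())
```

Writing the tarball outside generated_dir avoids archiving the output file into itself.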

Return type:

legendsimflow.awkward module¶

legendsimflow.awkward.ak_isin(elements, test_elements, *, assume_unique=False)¶

legendsimflow.cli module¶

legendsimflow.cli._partition(xs, n)¶

legendsimflow.cli.snakemake_nersc_batch_cli()¶: Implementation of the snakemake-nersc-batch CLI.

legendsimflow.cli.snakemake_nersc_cli()¶: Implementation of the snakemake-nersc CLI.

legendsimflow.commands module¶

legendsimflow.commands._confine_by_volume(is_surface, volume, surface_max_intersections=100)¶

Helper function to generate confinement macro lines for a given volume.

Return type:: list[str]

legendsimflow.commands._get_full_name(node)¶

Get the name of the function being called, including the module path if it’s an attribute access.

Return type:: str

legendsimflow.commands.get_confinement_from_function(function_string, reg)¶

Get the confinement commands for a function defined in the GDML.

The function string must correspond to the following format:

module.function(<...>, arg=...)

where <...> will be replaced with the pyg4ometry.geant4.Registry instance for the geometry.

Parameters:

function_string – String describing the function to be used.
reg – The pyg4ometry registry containing the geometry information.

Returns:

list[str] – A list of remage confinement commands corresponding to the function definition.

Return type:

list[str]
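
The function-string format above can be pictured with a small parsing sketch based on the standard ast module. This is only an assumption about how such a string might be split into a dotted function name and keyword arguments; split_call, full_name and the __reg__ placeholder are hypothetical names, and keyword values are assumed to be Python literals.

```python
import ast


def full_name(node):
    """Dotted name of a call target, e.g. 'module.function'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{full_name(node.value)}.{node.attr}"
    msg = f"unsupported call target: {ast.dump(node)}"
    raise TypeError(msg)


def split_call(function_string, placeholder="<...>"):
    """Split 'module.function(<...>, arg=...)' into (name, kwargs)."""
    # swap the placeholder for a valid identifier so the string parses
    expr = function_string.replace(placeholder, "__reg__")
    call = ast.parse(expr, mode="eval").body
    if not isinstance(call, ast.Call):
        msg = "expected a function call expression"
        raise ValueError(msg)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return full_name(call.func), kwargs
```

In a real workflow the registry instance would then be passed in place of the placeholder when the named function is called.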

legendsimflow.commands.make_remage_macro(config, simid, tier='stp', geom=None)¶

Render the remage macro for a given simulation and write it to disk.

This function reads the simulation configuration for the provided tier/simid, assembles the macro substitutions (e.g. GENERATOR, CONFINEMENT) using values and references defined under config.metadata, renders the specified macro template, writes the final macro file to the canonical input path, and returns both the macro text and the output file path.

Parameters:

config (AttrsDict) – Mapping-like Snakemake configuration that supports attribute-style access (e.g. config.experiment, config.metadata, etc.). The following fields are used:
- experiment: name of the experiment to select tier-specific metadata.
- metadata.tier[tier][experiment].generators: generator definitions.
- metadata.tier[tier][experiment].confinement: confinement definitions.
simid (str) – Simulation identifier to select the simconfig.
tier (str) – Simulation tier (e.g. “stp”, “ver”, …). Default is “stp”.
geom (str | None) – Path to the geometry file.

Return type:

tuple[str, Path]

Returns:

A tuple with
- The rendered macro text.
- The path where the macro was written.

Notes

The macro template path is taken from the simconfig template field.
Supported substitutions currently include: GENERATOR and CONFINEMENT.
The user can provide arbitrary macro substitutions with the optional macro_substitutions field.
The macro is written to the canonical path returned by patterns.input_simjob_filename().
If config.nersc.dvs_ro is set, the vertices file will be read from the read-only filesystem mount /dvs_ro at NERSC.

legendsimflow.commands.remage_run(config, simid, *, jobid=None, tier='stp', geom='{input.geom}', procs=1, output='{output}', macro_free=False)¶

Build a remage CLI invocation string for a given simulation.

This constructs a shell-escaped command line for remage. When macro_free is True, the macro is rendered inline via make_remage_macro() and its content is passed directly on the CLI. When macro_free is False (default), the pre-existing macro file path is referenced on the CLI and substitutions are passed via --macro-substitutions; in that case the caller is responsible for generating the macro file beforehand (e.g. via the gen_remage_macro Snakemake rule).

Notes

Compatible with remage >= v0.16.
When macro_free is False (default), the command passes the macro file path and supplies macro substitutions via --macro-substitutions.
When macro_free is True, the rendered macro content is inlined on the CLI (comments and empty lines removed) and values are pre-substituted.
Two substitutions are always provided: N_EVENTS (from primaries_per_job or benchmark override) and SEED (a 32-bit integer).
SEED is meant to be used as remage seed. It is determined by converting output to a 32-bit integer hash. If provided, the user config.simflow_rng_seed integer is added as offset.
The JOBID substitution is also provided if the jobid argument is not None.
If config.runcmd.remage is set, it is used to determine the remage executable (split with shlex.split()), otherwise remage is used.
If config.nersc.dvs_ro is set, remage is set to read all inputs from the read-only filesystem mount /dvs_ro at NERSC.
If config.nersc.scratch is set, the command will write the output file on the scratch disk and move it to the final expected destination at the end.

Parameters:

config (AttrsDict) – Snakemake-like configuration mapping. Must include metadata required by make_remage_macro() and optional benchmark and runcmd sections.
simid (str) – Simulation identifier for which to construct the command.
jobid (str | None) – Job identifier for the simulation run (string holding a zero-padded integer). Used as remage CLI macro substitution in case the macro contains it (e.g. if a vertices file is used).
tier (str) – Simulation tier (e.g., "stp", "ver"). Default is "stp".
geom (str | Path) – Path (or Snakemake placeholder) to the GDML geometry file.
procs (int) – Number of threads to pass to remage (integer or Snakemake placeholder). Internally uses remage’s --procs.
output (str | Path) – Path (or Snakemake placeholder) to the output remage file.
macro_free (bool) – If True, inline the macro contents on the CLI; if False, reference the macro file and pass substitutions via --macro-substitutions.

Return type:

str

Returns:

A shell-escaped command line suitable for direct execution.
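
The notes state that SEED is obtained by hashing output into a 32-bit integer, optionally offset by config.simflow_rng_seed. The sketch below shows one way such a deterministic seed could be computed. The use of SHA-256, the wrap-around modulo 2**32 and the name seed_from_output are assumptions made for illustration, not the hash actually used by remage_run().

```python
import hashlib


def seed_from_output(output, offset=0):
    """Hypothetical deterministic 32-bit seed derived from an output path."""
    digest = hashlib.sha256(str(output).encode()).digest()
    # keep the first 4 bytes: an integer in [0, 2**32)
    base = int.from_bytes(digest[:4], "little")
    # assumed: the user offset wraps around so the result stays 32-bit
    return (base + offset) % 2**32
```

Because the seed depends only on the output path and the offset, re-running the same job reproduces the same random stream.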

legendsimflow.confine module¶

legendsimflow.confine._get_matching_volumes(volume_list, patterns)¶

Return volumes from volume_list whose names match patterns.

Wildcard patterns are supported via fnmatch.fnmatch().

Parameters:

volume_list (Iterable[str]) – List of volume names to search.
patterns (str | Sequence[str]) – Single wildcard pattern string or a list of patterns.

Return type:

list[str]
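
A minimal sketch of wildcard matching with fnmatch.fnmatch(), assuming a volume is kept when it matches at least one pattern. The name matching_volumes and the volume names in the usage note are hypothetical.

```python
from fnmatch import fnmatch


def matching_volumes(volume_list, patterns):
    """Return the volumes whose names match any of the wildcard patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern string
    return [v for v in volume_list if any(fnmatch(v, p) for p in patterns)]
```

For instance, matching_volumes(["minishroud_tube_1", "liquid_argon"], "minishroud_tube*") keeps only the first name.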

legendsimflow.confine.get_lar_minishroud_confine_commands(reg, pattern='minishroud_tube*', inside=True, lar_name='liquid_argon', outer_radius_in_mm=None, outer_height_in_mm=None)¶

Extract the commands for the LAr confinement inside/outside the NMS from the GDML.

Parameters:

reg (Registry) – The registry describing the geometry.
pattern (str | Sequence[str]) – The pattern used to search for physical volumes of minishrouds.
inside (bool) – If True, generate points inside the minishroud (NMS) volumes; if False, exclude the minishroud volumes from the generation region.
lar_name (str) – The name of the physical volume of the LAr.
outer_radius_in_mm (float | None) – If provided, gives an outer radius for the confinement. Only supported for outside confinement (inside=False).
outer_height_in_mm (float | None) – If provided, gives an outer height for the confinement. Only supported for outside confinement (inside=False).

Return type:

list[str]

Returns:

A list of confinement commands for remage.

legendsimflow.exceptions module¶

exception legendsimflow.exceptions.SimflowConfigError(message, block=None)¶

Bases: Exception

legendsimflow.hpge_pars module¶

legendsimflow.hpge_pars._iter_noise_waveforms(raw_files, hit_files, lh5_group, dsp_config, dsp_output, *, threshold=5, length=1000, energy_var='cuspEmax_cal')¶

Yield noise waveforms one at a time without accumulating them all in memory.

Parameters are the same as get_noise_maxima_and_sample().

legendsimflow.hpge_pars._lookup_generated_pars_file(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Return type:: tuple[Any, Any]

legendsimflow.hpge_pars._remove_outliers(data, sigma=5)¶

Remove elements more than sigma standard deviations from the mean.

Return type:: ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

legendsimflow.hpge_pars.build_aoe_res_func(function)¶

A/E resolution function builder.

Return type:: Callable

legendsimflow.hpge_pars.build_aoe_res_func_dict(l200data, metadata, runid, *, hit_tier_name='hit', aoe_res_pars=None)¶

Build A/E resolution functions for each HPGe detector in a LEGEND-200 run.

Return type:

dict[str, Callable]

Returns:

Mapping of HPGe name to A/E resolution as a function of energy, where
energy is expected in units of keV.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
aoe_res_pars (dict | AttrsDict | None) – from lookup_aoe_res_metadata().

legendsimflow.hpge_pars.build_aoe_res_func_from_entry(meta)¶

Build a bound A/E resolution callable from a single metadata entry.

Parameters:: meta (dict | AttrsDict) – A single detector’s A/E resolution metadata, with keys expression and parameters.
Return type:: Callable
Returns:: Callable that takes energy in keV and returns the A/E resolution (sigma).

legendsimflow.hpge_pars.build_energy_res_func(function)¶

Energy resolution function builder.

Return type:: Callable

legendsimflow.hpge_pars.build_energy_res_func_dict(l200data, metadata, runid, *, hit_tier_name='hit', energy_res_pars=None)¶

Build energy resolution functions for each HPGe detector in a LEGEND-200 run.

Return type:

dict[str, Callable]

Returns:

Mapping of HPGe name to energy resolution function (FWHM), where energy is
expected in units of keV.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
energy_res_pars (dict | AttrsDict | None) – from lookup_energy_res_metadata().

legendsimflow.hpge_pars.build_energy_res_func_from_entry(meta)¶

Build a bound energy resolution callable from a single metadata entry.

Parameters:: meta (dict | AttrsDict) – A single detector’s energy resolution metadata, with keys expression and parameters. Same format as one value from lookup_energy_res_metadata().
Return type:: Callable
Returns:: Callable that takes energy in keV and returns FWHM in keV.

legendsimflow.hpge_pars.estimate_mean_aoe(popt, energy=1593)¶

Estimate the maximum A/E from the parameters popt of the current_pulse_model.

Return type:: float

legendsimflow.hpge_pars.fit_currmod(times_list, current_list)¶

Fit the model to multiple raw HPGe current pulses simultaneously.

Normalises each waveform by its peak amplitude and uses iminuit.Minuit to minimise the summed RMS residual across all waveforms simultaneously. Fitting multiple waveforms provides a more robust estimate of the pulse-shape parameters than fitting a single event.

Parameters:

times_list (list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]) – list of timestep arrays, one per waveform.
current_list (list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]) – list of current-value arrays, one per waveform.

Return type:

Returns:

Tuple of the best-fit parameters (as a NumPy array), and arrays of the
best-fit model (time and current) evaluated around the peak.

legendsimflow.hpge_pars.fit_noise_gauss(data, bins, *, fit_range=None, sigma_range=None)¶

Fit the data to a Gaussian to extract the resolution.

Performs a binned maximum-likelihood fit using iminuit.

Parameters:

data (TypeAliasType) – an array of the data to fit.
bins (int) – The number of bins.
fit_result – The results of the iminuit fit.
fit_range (tuple | None) – The range to use for the fit; if None, it is determined from the data as +/- 5 standard deviations around the mean.
sigma_range (tuple | None) – The range of sigma values for the fit; if None, it is determined from the data.

Return type:

Minuit

Returns:

The minuit object holding the fit results.

legendsimflow.hpge_pars.get_current_pulse(raw_file, lh5_group, idx, dsp_config, dsp_output='curr_av', align='tp_aoe_max')¶

Extract the current pulse.

Parameters:

raw_file (Path | str) – path to the raw tier file.
lh5_group (str) – where to find the waveform table.
idx (int) – the index of the waveform to read.
dsp_config (str) – the dspeed configuration file defining the DSP processing chain to estimate the current pulse.
dsp_output (str) – the name of the DSP output corresponding to the current pulse.
align (str) – DSP value around which the pulses are aligned.

Return type:

legendsimflow.hpge_pars.get_current_pulses(raw_file_idx_pairs, lh5_group, dsp_config, dsp_output='curr_av', align='tp_aoe_max')¶

Extract current pulses for multiple events.

Calls get_current_pulse() for each (raw_file, idx) pair and returns the results as two parallel lists.

Parameters:

raw_file_idx_pairs (list[tuple[Path | str, int]]) – list of (raw_file, idx) pairs.
lh5_group (str) – where to find the waveform table.
dsp_config (str | None) – the dspeed configuration file defining the DSP processing chain to estimate the current pulse.
dsp_output (str) – the name of the DSP output corresponding to the current pulse.
align (str | None) – DSP value around which the pulses are aligned.

Return type:

tuple[list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]], list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]]

Returns:

times_list – list of timestep arrays.
current_list – list of current-value arrays.

legendsimflow.hpge_pars.get_noise_maxima_and_sample(raw_files, hit_files, lh5_group, dsp_config, dsp_output, template, *, norm=1, sample_size=100, threshold=5, maximum_number=None, energy_var='cuspEmax_cal')¶

Compute waveform maxima on-the-fly, keeping only a small sample in memory.

This avoids storing all noise waveforms at once. Instead, it iterates through waveforms, computes the maximum of waveform + template for each, and only retains the first sample_size waveforms for plotting.

Parameters:

raw_files (list) – List of paths to raw files.
hit_files (list) – List of paths to hit files.
lh5_group (str) – The name of the lh5_group to find the waveform table in.
dsp_config (str) – the dspeed configuration file defining the DSP processing chain to estimate the current pulse.
dsp_output (str) – the name of the DSP output corresponding to the current pulse.
template (TypeAliasType) – the current-pulse template waveform.
norm (float) – normalisation for the template.
sample_size (int) – number of waveforms to keep for plotting.
threshold (float) – energy threshold to apply to select the noise waveforms.
maximum_number (int | None) – maximum number of waveforms to process.
energy_var (str) – the name of the energy variable to use for thresholding.

Return type:

tuple[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

sample_wfs – 2D array of the first sample_size waveforms (for plotting).
a_max – 1D array of the maximum of waveform + template for each waveform.
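
The streaming strategy described above can be sketched with plain Python iterables. This is an illustration under stated assumptions: the template is assumed to be multiplied by norm before being added to each waveform, and maxima_and_sample is a hypothetical name that works on lists rather than LH5 data.

```python
from itertools import islice


def maxima_and_sample(waveforms, template, *, norm=1.0, sample_size=100,
                      maximum_number=None):
    """Stream over waveforms, keeping all maxima but only a small sample."""
    sample, a_max = [], []
    # islice stops after maximum_number waveforms (None means no limit)
    for wf in islice(waveforms, maximum_number):
        if len(wf) != len(template):
            msg = "waveform and template must have the same length"
            raise ValueError(msg)
        # maximum of (waveform + scaled template), computed on the fly
        a_max.append(max(w + norm * t for w, t in zip(wf, template)))
        if len(sample) < sample_size:
            sample.append(list(wf))  # retained for plotting only
    return sample, a_max
```

Passing a generator as waveforms keeps at most sample_size waveforms in memory at any time.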

legendsimflow.hpge_pars.get_waveform_maxima(template, noise_wfs, *, norm=1)¶

Extract the maximum of each waveform based on combining the template with each waveform in noise_wfs.

Note

The template must have the same length as each waveform in noise_wfs.

Parameters:

template (TypeAliasType) – The template of the waveform to use.
noise_wfs (TypeAliasType) – 2D array of each noise waveform.
norm (float) – The normalisation for the template.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

legendsimflow.hpge_pars.lookup_aoe_res_metadata(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Lookup the measured A/E resolution metadata from LEGEND-200 data.

The metadata refers to the following model:

\[\sigma_\text{A/E}(E) = \sqrt{a + (b/E)^c}\]

where $E$ is in keV.

Return type:

Returns:

Mapping of HPGe name to metadata dictionary.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
pars_db (TextDB | None) – optional existing non-lazy instance of TextDB(".../path/to/prod/generated/par_{hit_tier_name}").
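
The A/E resolution model above can be evaluated directly. The sketch below is a plain transcription of the formula; aoe_sigma is a hypothetical name and any parameter values passed to it are placeholders, not measured LEGEND-200 values.

```python
import math


def aoe_sigma(energy_kev, a, b, c):
    """sigma_A/E(E) = sqrt(a + (b / E)**c), with E in keV."""
    return math.sqrt(a + (b / energy_kev) ** c)
```

With c = 2 and a = 0 the model reduces to b / E, so the resolution improves as the energy increases.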

legendsimflow.hpge_pars.lookup_currmod_fit_data(hit_files, lh5_group, ewin_center=1593, ewin_width=10, max_waveforms=1, get_drift_time=True)¶

Extract the indices of the events to fit.

Considers events with abs(A/E) < 1.5 and finds up to max_waveforms events closest to the median drift time. Returns a list of (event_index, file_index) pairs, sorted from closest to farthest from the median, with at most max_waveforms entries, together with the full and selected drift-time arrays for diagnostic purposes.

Parameters:

hit_files (list[str | Path]) – tier-hit files used to determine the best indices.
lh5_group (str) – where the tier-hit data is found in the files.
ewin_center (float) – center of the energy window to use for the event search (same units as in data).
ewin_width (float) – width of the energy window to use for the event search (same units as in data).
max_waveforms (int) – maximum number of waveforms to return.
get_drift_time (bool) – Also read the drift time to select waveforms.

Return type:

tuple[list[tuple[int, int]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

pairs – list of (event_index, file_index) tuples, sorted by proximity to the median drift time.
all_dts – all drift-time values for events passing the energy and A/E cuts.
selected_dts – drift-time values for the selected subset of events.
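
The selection of events closest to the median drift time can be sketched as follows. This is a simplified illustration on a flat list of drift times, with the energy and A/E cuts assumed to have been applied already; closest_to_median is a hypothetical name and the sketch returns positions in the input list rather than (event_index, file_index) pairs.

```python
import statistics


def closest_to_median(drift_times, max_waveforms=1):
    """Indices of the events nearest the median drift time, closest first."""
    median = statistics.median(drift_times)
    # sorted() is stable, so ties keep their original order
    order = sorted(range(len(drift_times)),
                   key=lambda i: abs(drift_times[i] - median))
    return order[:max_waveforms]
```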

legendsimflow.hpge_pars.lookup_currmod_fit_inputs(l200data, metadata, runid, hpge, hit_tier_name='hit', max_waveforms=100)¶

Find raw files, event indices and the DSP configuration file.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hpge (str) – name of the HPGe detector
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
max_waveforms (int) – maximum number of waveforms to return.

Return type:

tuple[list[tuple[Path, int]], Path, ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

raw_wf_pairs – list of (raw_file, event_index) pairs, up to max_waveforms.
dsp_cfg_file – path to the DSP configuration file.
all_dts – all drift-time values for events passing the energy and A/E cuts.
selected_dts – drift-time values for the selected subset of events.

legendsimflow.hpge_pars.lookup_energy_res_metadata(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Lookup the measured HPGe energy resolution metadata from LEGEND-200 data.

The metadata refers to the following model:

\[\text{FWHM}(E) = \sqrt{a + bE}\]

where $E$ is in keV.

Return type:

Returns:

Mapping of HPGe name to metadata dictionary.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
pars_db (TextDB | None) – optional existing non-lazy instance of TextDB(".../path/to/prod/generated/par_{hit_tier_name}").
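
The energy resolution model above can be evaluated directly, as in this sketch. The function names and parameter values are placeholders chosen for illustration; the conversion from FWHM to sigma uses the standard Gaussian relation FWHM = 2 sqrt(2 ln 2) sigma.

```python
import math


def fwhm_kev(energy_kev, a, b):
    """FWHM(E) = sqrt(a + b * E), with E in keV."""
    return math.sqrt(a + b * energy_kev)


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
```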

legendsimflow.hpge_pars.lookup_file_paths(l200data, runid, hit_tier_name)¶

Lookup the paths to the hit and raw files.

Return type:: AttrsDict

legendsimflow.hpge_pars.lookup_psd_cut_values(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Lookup the measured PSD cut values from LEGEND-200 data.

Return type:

Returns:

Mapping of HPGe name to metadata dictionary.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
pars_db (TextDB | None) – optional existing non-lazy instance of TextDB(".../path/to/prod/generated/par_{hit_tier_name}").

legendsimflow.hpge_pars.plot_currmod_fit_result(t, A, model_t, model_A)¶

Plot the best fit results.

Return type:: tuple

legendsimflow.hpge_pars.plot_dt_selection(all_dts, selected_dts)¶

Plot the drift-time distribution and highlight the selected waveforms.

Draws a histogram of all drift-time values (passing the energy and A/E cuts) using the hist package and overlays a shaded band that spans the range of drift times of the events chosen for the current-pulse fit.

Parameters:

all_dts (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Drift-time values for every event that passes the energy and A/E cuts.
selected_dts (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Drift-time values for the subset of events selected for fitting.

Return type:

Returns:

fig – The matplotlib.figure.Figure.
ax – The matplotlib.axes.Axes.

legendsimflow.hpge_pars.plot_gauss_fit(data, fit_result, fit_range=None, bins=100, nominal_val=None)¶

Plot the result of the Gaussian fit.

Parameters:

data (TypeAliasType) – an array of the data to fit.
fit_result (Minuit) – the result of the Gaussian fit.
bins (int) – The number of bins.
fit_range (tuple | None) – The range to use for the fit; if None, it is determined from the data as +/- 5 standard deviations around the mean.
nominal_val (float | None) – The nominal mean to add as a line on the plot.

Return type:

legendsimflow.hpge_pars.plot_noise_waveforms(noise, temp, norm=1)¶

Plot the waveforms with noise and the noise alone.

Return type:: tuple

legendsimflow.metadata module¶

legendsimflow.metadata._get_lh5_table(metadata, fname, hpge, tier, runid)¶

Return the correct LH5 table path.

Determines the path to the table of HPGe detector hpge in tier tier.

Return type:: str

legendsimflow.metadata.decode_psd_usability(psd_usability_code)¶

Decode the PSD usability (see encode_psd_usability()).

Return type:: str

legendsimflow.metadata.decode_usability(usability_code)¶

Decode the HPGe usability (see encode_usability()).

Return type:: str

legendsimflow.metadata.encode_psd_usability(psd_usability)¶

Encode the PSD usability in an int.

Return type:: int

legendsimflow.metadata.encode_usability(usability)¶

Encode the HPGe usability in an int.

Return type:: int

legendsimflow.metadata.expand_runlist(metadata, runlist)¶

Expands a runlist as passed to the Simflow configuration.

A runlist is a list of:

runids in the form accepted by is_runid();
runlist DB queries in the form <tag>.<datatype>.<period> (see query_runlist_db()).

Return type:: list[str]

legendsimflow.metadata.extract_integer(file_path)¶

Read a single integer from a file, stripping surrounding whitespace.

Return type:: int

legendsimflow.metadata.get_runlist(config, simid)¶

Gets the runlist assigned to a simulation.

If not overridden in the hit-tier simconfig, returns the global runlist stored in config.runlist.

Return type:: list[str]

legendsimflow.metadata.get_sanitized_fccd(metadata, det_name)¶

Return the FCCD value for det_name, falling back to 1 mm if the FCCD field is absent.

Parameters:

metadata (LegendMetadata) – LEGEND metadata database.
det_name (str) – Detector name.

Return type:

float

legendsimflow.metadata.get_simconfig(config, tier, simid=None, field=None)¶

Return the simulation configuration for the given tier and simid.

Raise SimflowConfigError if any key is not found.

Parameters:

config (AttrsDict) – Simflow configuration object.
tier (str) – Tier name.
simid (str | None) – Simulation identifier.
field (str | None) – If not None, return the value of this key in the simconfig.

Return type:

legendsimflow.metadata.get_tier_settings(config, tier)¶

Return the settings block for tier and the current experiment.

Return type:: AttrsDict

legendsimflow.metadata.get_vtx_simconfig(config, simid)¶

Get the vertex generation configuration for a stp-tier simid.

Returns the vtx-tier generator requested by the stp-tier simulation with identifier simid.

Parameters:

config (AttrsDict) – Snakemake config.
simid (str) – simulation identifier.

Return type:

legendsimflow.metadata.is_runid(runid)¶

Whether a runid (run identifier) is correctly formatted.

It should be in the form <experiment>-<period>-<run>-<datatype>, i.e. XXX-pNN-rMMM-AAA, where XXX is any alphanumeric experiment identifier.

Return type:: bool

legendsimflow.metadata.is_simid(simid)¶

Whether a simid (simulation identifier) is correctly formatted.

A valid simid must consist entirely of word characters (letters, digits, underscores) and hyphens, matching the pattern [-\w]+. Dots and other special characters are not allowed; in particular, dots are forbidden because they are used as the delimiter in the simlist format <tier>.<simid>.

Return type:: bool
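
The rule above amounts to a full match against [-\w]+. A minimal sketch with the standard re module follows; looks_like_simid is a hypothetical name, and note that in Python 3 \w also matches non-ASCII word characters unless re.ASCII is used.

```python
import re


def looks_like_simid(simid):
    """True if simid contains only word characters and hyphens."""
    return re.fullmatch(r"[-\w]+", simid) is not None
```

A simlist entry such as <tier>.<simid> fails this check as a whole because of the dot, which is why it has to be split on the delimiter first.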

legendsimflow.metadata.parse_runid(runid)¶

Extract runid fields.

Returns the experiment, period, run and datatype as a tuple. Period and run are integers.

Return type:: (str, int, int, str)
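
A sketch of how a runid could be split into its fields with a regular expression. The exact pattern used by parse_runid() is not given here, so the expression below (alphanumeric experiment, pNN period, rMMM run, alphanumeric datatype) is an assumption, and split_runid is a hypothetical name.

```python
import re

# assumed layout: <experiment>-p<period>-r<run>-<datatype>
_RUNID = re.compile(r"([A-Za-z0-9]+)-p(\d+)-r(\d+)-([A-Za-z0-9]+)")


def split_runid(runid):
    """Return (experiment, period, run, datatype); period and run as ints."""
    m = _RUNID.fullmatch(runid)
    if m is None:
        msg = f"malformed runid: {runid!r}"
        raise ValueError(msg)
    return m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
```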

legendsimflow.metadata.query_runlist_db(metadata, query)¶

Query the runlist DB stored in legend-datasets.

Run expressions of the form r00n..r00m are automatically expanded into full run lists. If for example metadata.datasets.runlists.valid.phy.p02 == "r000..r002":

>>> query_runlist_db(metadata, "valid.phy.p02")
["l200-p02-r000-phy", "l200-p02-r001-phy", "l200-p02-r002-phy"]

Parameters:

metadata (LegendMetadata) – LEGEND metadata instance.
query (str) – expression in the form <tag>.<datatype>.<period> (see contents of runlists.yaml in legend-datasets).

Return type:

list[str]
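
The expansion of a run expression such as r000..r002 can be sketched as below. This is an illustrative stand-in that takes the experiment, period and datatype explicitly instead of reading them from the metadata; expand_run_range is a hypothetical name, and the inclusive upper bound follows the example above.

```python
import re


def expand_run_range(expr, experiment, period, datatype):
    """Expand 'r00n..r00m' into a list of full run identifiers."""
    m = re.fullmatch(r"r(\d+)\.\.r(\d+)", expr)
    if m is None:
        runs = [expr]  # assumed: a single run is passed through unchanged
    else:
        width = len(m.group(1))  # preserve the zero padding
        first, last = int(m.group(1)), int(m.group(2))
        runs = [f"r{i:0{width}d}" for i in range(first, last + 1)]
    return [f"{experiment}-{period}-{run}-{datatype}" for run in runs]
```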

legendsimflow.metadata.reference_cal_run(metadata, runid)¶

The reference calibration run for runid.

Warning

This function does not account for dataflow overrides (e.g. calibration back-applying)!

Return type:: str

legendsimflow.metadata.runinfo(metadata, runid)¶

Get the datasets.runinfo entry for a LEGEND run identifier.

Parameters:

metadata (LegendMetadata) – LEGEND metadata database.
runid (str) – a run identifier in the format <experiment>-<period>-<run>-<datatype>.

Return type:

str

legendsimflow.metadata.simpars(metadata, par, runid, experiment, default=<object object>)¶

Extract simflow parameters for a certain LEGEND run.

Queries the simflow parameters database stored under simprod.config.pars by experiment name experiment, parameter name par and LEGEND run identifier runid.

Parameters:

metadata (LegendMetadata) – LEGEND metadata database.
par (str) – name of directory under metadata.simprod.config.pars.{experiment}. Can be a nested property, as in e.g. geds.opv.value. Both . and / are allowed separators.
runid (str) – a run identifier in the format <experiment>-<period>-<run>-<datatype>.
experiment (str) – experiment identifier (e.g. l200cfg01, l1000dsg01). Selects the experiment-level subdirectory under simprod/config/pars/.
default (object) – value to return when the parameter directory is not found in the database or no validity entry matches runid. If not provided, such cases raise KeyError or LookupError. Other errors (e.g. malformed YAML) are always re-raised regardless of this argument.
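
The default=<object object> in the signature suggests a sentinel default, which makes it possible to distinguish "no default given" from an explicit default=None. The sketch below illustrates that general pattern on a plain dictionary; lookup and _MISSING are hypothetical names, not part of legendsimflow.

```python
_MISSING = object()  # unique sentinel: cannot collide with any user value


def lookup(db, key, default=_MISSING):
    """Return db[key]; fall back to default only if one was provided."""
    try:
        return db[key]
    except KeyError:
        if default is _MISSING:
            raise  # no default given: propagate the error
        return default
```

With this pattern, lookup(db, "x", None) returns None for a missing key, while lookup(db, "x") raises KeyError.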

Return type:

legendsimflow.metadata.smk_hash_simconfig(config, wildcards, field=None, ignore=None, **kwargs)¶

Get the dictionary hash for use in Snakemake rules.

Parameters:

config (AttrsDict) – Snakemake config.
wildcards (Wildcards) – Snakemake wildcards object.
field (str | None) – If not None, return the value of this key in the simconfig.
ignore (list | None) – Exclude these fields from the hash.
kwargs – provide a value for wildcards that might not be present in wildcards.

Return type:

str

legendsimflow.metadata.usability(metadata, det_name, runid, default=None)¶

Get the usability for analysis of det_name in run runid.

Looks for the analysis.usability metadata field in the channel map. By default, an error is thrown if no information is found. If default is set to a non-None value, it will be returned.

Return type:: str

legendsimflow.metadata.validate_simconfig_keys(simconfig, block=None)¶

Validate that all top-level keys of simconfig are valid simids.

Raises SimflowConfigError listing every invalid key if any are found.

Parameters:

simconfig (Mapping) – Dictionary whose top-level keys are expected to be simids (as loaded from a simconfig.yaml file).
block (str | None) – Optional config block label included in the error message for context.

Return type:

legendsimflow.nersc module¶

legendsimflow.nersc.dvs_ro(config, path)¶

Turn /global/... file paths to /dvs_ro/... on NERSC.

The input type is preserved.

Note

config must contain a nersc key mapped to a dictionary containing a dvs_ro: True key.

Return type:: str | Path | list[str | Path]
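
A type-preserving path rewrite of this kind can be sketched as follows. The sketch assumes that only the leading /global/ component is replaced and leaves out the check on config.nersc.dvs_ro; to_dvs_ro is a hypothetical name.

```python
from pathlib import Path


def to_dvs_ro(path):
    """Rewrite /global/... to /dvs_ro/..., preserving str, Path or list."""
    if isinstance(path, list):
        return [to_dvs_ro(p) for p in path]
    text = str(path)
    if text.startswith("/global/"):
        text = "/dvs_ro/" + text[len("/global/"):]
    # hand back the same type that was passed in
    return Path(text) if isinstance(path, Path) else text
```

Paths that do not start with /global/ are returned unchanged.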

legendsimflow.nersc.dvs_ro_snakemake(snakemake)¶

Swap the read-only filesystem path in all Snakemake input files.

This function is meant to be used in Snakemake scripts, where the Snakemake rule attributes (input, output, …) are accessible from the special object snakemake.

Warning

This function mutates the input snakemake object in place.

legendsimflow.partitioning module¶

legendsimflow.partitioning.partition_simstat(n_events, n_events_part, runlist)¶

Partition the simulation event statistics according to run livetime.

Returns the following dictionary:

job_000:
  l200-p03-r001-phy: [0, 300]  # interval includes its edges
  l200-p03-r002-phy: [301, 456]
job_001:
  l200-p03-r002-phy: [0, 200]
  l200-p03-r003-phy: [201, 156]
...

where the number of events of each job is partitioned in runs, such that the global event partitioning in n_events_part is respected.

Parameters:

n_events (Mapping[str, int]) –
mapping of simulation job to number of simulation events.
```
job_0000: 5000
job_0001: 7000
...
```
n_events_part (Mapping[str, int]) –
mapping of fraction of total number of simulation events (summed over all jobs) per considered run, with weights equal to the run livetime fraction.
```
l200-p03-r001-phy: 300
l200-p03-r002-phy: 456
...
l200-<...>: tot_n_events
```
runlist (Iterable[str]) – list of runs in the form <experiment>-<period>-<run>-<datatype>.

Return type:

dict[str, dict[str, list[int]]]

legendsimflow.patterns module¶

Prepare pattern strings to be used in Snakemake rules.

Extra keyword arguments are typically interpreted as variables to be substituted in the returned (structure of) strings. They are passed to snakemake.io.expand().

Definitions:

simid: string identifier for the simulation run
simjob: one job of a simulation run (corresponds to one macro file and one output file)
jobid: zero-padded integer (i.e., a string) used to label a simulation job

legendsimflow.patterns._expand(pattern, keep_list=False, **kwargs)¶

Expand a path pattern with Snakemake wildcards.

Returns a scalar unless keep_list is set.

Return type:: str | Path

legendsimflow.patterns.benchmark_dtmap_filename(config, **kwargs)¶

The benchmark file path for drift time map generation for a detector and voltage.

Return type:: Path

legendsimflow.patterns.benchmark_filename(config, **kwargs)¶

Formats a benchmark file path for a simid and jobid.

Return type:: Path

legendsimflow.patterns.benchmark_tier_cvt_filename(config, **kwargs)¶

The benchmark file path for the cvt tier build for a simid.

Return type:: Path

legendsimflow.patterns.benchmark_tier_pdf_filename(config, **kwargs)¶

The benchmark file path for the pdf tier build for a simid.

Return type:: Path

legendsimflow.patterns.geom_config_filename(config, **kwargs)¶

The path to the geometry configuration YAML file for a tier and simid.

Return type:: Path

legendsimflow.patterns.geom_gdml_filename(config, **kwargs)¶

The path to the GDML geometry file for a tier and simid.

Return type:: Path

legendsimflow.patterns.geom_log_filename(config, **kwargs)¶

The log file path for geometry generation for a tier and simid.

Return type:: str

legendsimflow.patterns.input_currmod_evt_idx_file(config, **kwargs)¶

The path to the event index file used to extract current pulse waveforms.

Return type:: Path

legendsimflow.patterns.input_simid_filenames(config, n_macros, **kwargs)¶

Returns the full path to n_macros input files for a simid.

Needed by the script that generates all macros for a simid.

Return type:: list[Path]

legendsimflow.patterns.input_simjob_filename(config, **kwargs)¶

Returns the full path to the input file for a simid, tier and job index.

Return type:: Path

legendsimflow.patterns.log_currmod_filename(config, **kwargs)¶

The log file path for current pulse model extraction for a detector and runid.

Return type:: Path

legendsimflow.patterns.log_dirname(config)¶

Directory where log files are stored.

Return type:: Path

legendsimflow.patterns.log_dtmap_filename(config, **kwargs)¶

The log file path for drift time map generation for a detector and voltage.

Return type:: Path

legendsimflow.patterns.log_eresmod_filename(config, **kwargs)¶

The log file path for HPGe observables model extraction for a runid.

Return type:: Path

legendsimflow.patterns.log_filename(config, **kwargs)¶

Formats a log file path for a simid and jobid.

Return type:: Path

legendsimflow.patterns.log_simstat_part_filename(config, **kwargs)¶

The log file path for simulation event statistics partitioning for a simid.

Return type:: Path

legendsimflow.patterns.log_tier_cvt_filename(config, **kwargs)¶

The log file path for the cvt tier build for a simid.

Return type:: Path

legendsimflow.patterns.log_tier_pdf_filename(config, **kwargs)¶

The log file path for the pdf tier build for a simid.

Return type:: Path

legendsimflow.patterns.output_aoeresmod_filename(config, **kwargs)¶

The path to the HPGe A/E resolution model parameter file for a runid.

Return type:: Path

legendsimflow.patterns.output_currmod_filename(config, **kwargs)¶

The path to the per-detector HPGe current pulse model parameter file.

Return type:: Path

legendsimflow.patterns.output_currmod_merged_filename(config, **kwargs)¶

The path to the merged HPGe current pulse model parameter file for a runid.

Return type:: Path

legendsimflow.patterns.output_dtmap_filename(config, **kwargs)¶

The path to the HPGe drift time map file for a detector and voltage.

Return type:: Path

legendsimflow.patterns.output_dtmap_merged_filename(config, **kwargs)¶

The path to the merged HPGe drift time map file for a runid.

Return type:: Path

legendsimflow.patterns.output_eresmod_filename(config, **kwargs)¶

The path to the HPGe energy resolution model parameter file for a runid.

Return type:: Path

legendsimflow.patterns.output_psdcuts_filename(config, **kwargs)¶

The path to the HPGe PSD cut values file for a runid.

Return type:: Path

legendsimflow.patterns.output_simid_filenames(config, n_macros, **kwargs)¶

Returns the full path to n_macros output files for a simid.

legendsimflow.patterns.output_simjob_filename(config, **kwargs)¶

Returns the full path to the output file for a simid, tier and job index.

Return type:: Path

legendsimflow.patterns.output_simjob_regex(config, **kwargs)¶

A glob-style regex matching all output files for a tier.

Return type:: str

legendsimflow.patterns.output_tier_cvt_filename(config, **kwargs)¶

The path to the merged cvt tier output file for a simid.

Return type:: Path

legendsimflow.patterns.output_tier_pdf_filename(config, **kwargs)¶

The path to the merged pdf tier output file for a simid.

Return type:: Path

legendsimflow.patterns.pdf_tarball_filename(config)¶

The path to the pdf tier archive tarball for the current production cycle.

The Simflow has no explicit knowledge of the production cycle name, so the name of the directory where the Simflow lives is used as a proxy.

Return type:: Path

legendsimflow.patterns.plot_currmod_filename(config, **kwargs)¶

The path to the current pulse model fit validation plot for a detector and runid.

Return type:: Path

legendsimflow.patterns.plot_dtmap_filename(config, **kwargs)¶

The path to the drift time map validation plot for a detector and voltage.

Return type:: Path

legendsimflow.patterns.plot_tier_cvt_observables_filename(config, **kwargs)¶

The path to the observable validation plot for a cvt simid.

Return type:: Path

legendsimflow.patterns.plot_tier_hit_observables_filename(config, **kwargs)¶

The path to the observable validation plot for a hit simid.

Return type:: Path

legendsimflow.patterns.plot_tier_opt_observables_filename(config, **kwargs)¶

The path to the observable validation plot for an opt simid.

Return type:: Path

legendsimflow.patterns.plot_tier_stp_vertices_filename(config, **kwargs)¶

The path to the primary vertex validation plot for a stp simid.

Return type:: Path

legendsimflow.patterns.plots_dirname(config, tier)¶

Returns the plots directory path for a tier.

Return type:: Path

legendsimflow.patterns.plots_tarball_filename(config)¶

The path to the plots archive tarball for the current production cycle.

The Simflow has no explicit knowledge of the production cycle name, so the name of the directory where the Simflow lives is used as a proxy.

Return type:: Path

legendsimflow.patterns.simjob_base_segment(config, **kwargs)¶

Formats a segment for a path including wildcards simid and jobid.

Return type:: str

legendsimflow.patterns.simstat_part_filename(config, **kwargs)¶

The path to the simulation event statistics partitioning file.

Return type:: Path

legendsimflow.patterns.tier_cvt_base_segment(config, **kwargs)¶

The base filename segment for cvt tier files for a simid.

Return type:: str

legendsimflow.patterns.tier_pdf_base_segment(config, **kwargs)¶

The base filename segment for pdf tier files for a simid.

Return type:: str

legendsimflow.patterns.vtx_filename_for_stp(config, simid, **kwargs)¶

Returns the vertices file required by the ‘stp’ tier job, if any.

Used as lambda function in the build_tier_stp Snakemake rule.

Return type:: Path | list

legendsimflow.plot module¶

legendsimflow.plot.decorate(fig)¶

legendsimflow.plot.n_nans(array)¶

legendsimflow.plot.plot_hist(h, ax, n_nans=None, **kwargs)¶

legendsimflow.plot.read_concat_wempty(files, table)¶

Return type:: Array | None

legendsimflow.plot.save_page(pdf, make_fig)¶

legendsimflow.plot.set_empty(ax)¶

legendsimflow.profile module¶

legendsimflow.profile._f(x)¶

Return type:: str

legendsimflow.profile._pct(x)¶

Return type:: str

legendsimflow.profile.make_profiler()¶

Return type:: tuple[Callable, Callable, Callable]

legendsimflow.psl module¶

legendsimflow.psl._check_pulse_shape_lib_keys(pulse_shape_lib)¶

Validate that the waveform map contains the required keys with correct types.

Return type:: None

legendsimflow.psl.align_waveforms_to_peak(wf_input, alignment_idx, nsamples_output_current_wfs)¶

Align an array of waveforms by shifting their maximum to a fixed index.

No normalization is performed; raw amplitudes are preserved.

Note

The output peak_indices is not the original drift time, because the current waveform inherits a baseline from the convolution.

Parameters:

wf_input (Array | ndarray) – Input array of current waveforms
alignment_idx (int) – The index in the output array where the peak will be placed
nsamples_output_current_wfs (int) – The total length of the resulting aligned current waveforms

Return type:

tuple[ndarray, ndarray]

Returns:

shifted_wfs – 2D array of shifted waveforms
peak_indices – 1D array containing the original peak index for each current waveform
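
As a rough illustration of the alignment described above, the following pure-Python sketch works on plain nested lists. It is not the package implementation: the helper name align_to_peak_sketch is hypothetical, and padding the output with zeros where no input sample lands is an assumption.

```python
def align_to_peak_sketch(waveforms, alignment_idx, nsamples_out):
    """Shift each waveform so that its maximum lands at alignment_idx."""
    shifted_wfs = []
    peak_indices = []
    for wf in waveforms:
        # index of the (first) maximum sample in the input waveform
        peak = max(range(len(wf)), key=wf.__getitem__)
        shift = alignment_idx - peak
        out = [0.0] * nsamples_out  # assumption: uncovered samples are zero
        for i, value in enumerate(wf):
            j = i + shift
            if 0 <= j < nsamples_out:  # samples shifted out of range are dropped
                out[j] = value
        shifted_wfs.append(out)
        peak_indices.append(peak)
    return shifted_wfs, peak_indices


wfs = [[0.0, 1.0, 5.0, 2.0], [3.0, 1.0, 0.0, 0.0]]
shifted, peaks = align_to_peak_sketch(wfs, alignment_idx=3, nsamples_out=6)
```

Amplitudes are copied unchanged, matching the statement that no normalization is performed.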

legendsimflow.psl.apply_electronics_response(wf_array, rf_kernel, batch_size=50000)¶

Vectorized convolution using FFT with batching to save memory.

Parameters:

wf_array (Array | ndarray) – Array of waveforms (all of same length)
rf_kernel (ndarray) – The response kernel (gaussian + exponential)
batch_size (int) – Number of waveforms to process at once (default is 50,000)

Return type:

Array

Returns:

convolved_wfs – The convolved waveforms as an Awkward Array

legendsimflow.psl.build_electronics_response_kernel(dt, mu_bandwidth, sigma_bandwidth, tau_rc, gaussian_only=False, *, kernel_length=600, kernel_start=-100)¶

Create the system response kernel (gaussian + exponential decay).

This is obtained by convolving a Gaussian (representing the digitizer bandwidth) with a causal exponential decay (representing the preamplifier response). The kernel is normalized to have a sum of 1.

Note

The ‘full’ mode of convolution results in a length of 2*kernel_length - 1. If gaussian_only is True, the kernel will have a length of kernel_length and will only contain the Gaussian component, since no convolution is performed.

Parameters:

dt (float) – The time step between samples in the waveform (in ns)
mu_bandwidth (float) – The mean of the Gaussian representing the digitizer bandwidth (in ns)
sigma_bandwidth (float) – The standard deviation of the Gaussian representing the digitizer bandwidth (in ns)
tau_rc (float) – The time constant of the exponential decay representing the preamplifier response (in ns)
gaussian_only (bool) – If True, only use the Gaussian component (default is False)
kernel_length (int) – The total length of the response kernel in samples (default is 600)
kernel_start (int) – The starting index of the kernel relative to the waveform (default is -100, meaning the kernel will cover from -100 to 500 samples)

Return type:

ndarray

Returns:

rf – The normalized response kernel
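
The construction described above (Gaussian convolved with a causal exponential, normalized to unit sum) can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the function name response_kernel_sketch is hypothetical, the time axis is assumed to be (kernel_start + i) * dt, and the real function most likely relies on array libraries rather than explicit loops.

```python
import math


def response_kernel_sketch(dt, mu, sigma, tau_rc, gaussian_only=False,
                           kernel_length=600, kernel_start=-100):
    """Gaussian convolved with a causal exponential decay, unit sum."""
    # assumed time axis in ns: kernel_start samples before zero
    times = [(kernel_start + i) * dt for i in range(kernel_length)]
    gauss = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in times]
    if gaussian_only:
        kernel = gauss  # no convolution: length stays kernel_length
    else:
        # causal decay: zero before t = 0
        decay = [math.exp(-t / tau_rc) if t >= 0 else 0.0 for t in times]
        # 'full' discrete convolution: length 2 * kernel_length - 1
        kernel = [0.0] * (2 * kernel_length - 1)
        for i, g in enumerate(gauss):
            for j, d in enumerate(decay):
                kernel[i + j] += g * d
    total = sum(kernel)
    return [k / total for k in kernel]


# small kernel to keep the pure-Python loops cheap
rf = response_kernel_sketch(1.0, 0.0, 2.0, 5.0, kernel_length=40, kernel_start=-10)
rf_gauss = response_kernel_sketch(1.0, 0.0, 2.0, 5.0, gaussian_only=True,
                                  kernel_length=40, kernel_start=-10)
```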

legendsimflow.psl.make_realistic_pulse_shape_lib(ideal_pulse_shape_lib_obj, rf_kernel, alignment_idx, nsamples_output_current_wfs, mw_pars, dt_data=1.0)¶

Apply the waveform post-processing chain to generate a realistic waveform map.

Starts from an ideal waveform map and performs the following steps:

Converts coordinates (m to mm)
Convolves with system response
Aligns by peak time
Calculates compensated drift time

Parameters:

ideal_pulse_shape_lib_obj (Mapping[str, Array | Scalar]) –
Mapping containing the ideal waveform map with coordinates and waveforms.

This should have the following format:
- r: 1D array of radial coordinates
- z: 1D array of axial coordinates
- dt: Time step between samples in the waveforms
- waveform_X: 3D array of ideal charge waveforms for angle X (shape: [n_z, n_r, n_samples])
rf_kernel (ndarray) – The system response kernel (from build_electronics_response_kernel())
alignment_idx (int) – The index in the output array where waveform peaks will be aligned
nsamples_output_current_wfs (int) – The total length of the resulting aligned current waveforms
mw_pars (dict[str, float | int]) –
Dictionary of parameters for the moving window average step, with keys:
- length: The length of the moving window (in samples)
- num_mw: The number of moving windows to use in the moving window average
- mw_type: The type of moving window to apply (see dspeed.processors.moving_window_multi for details)
dt_data (float) – The time step of the original data waveforms (in ns), used to scale the derivative.

Return type:

dict[str, Array | Scalar]

Returns:

realistic_pulse_shape_lib – Struct containing the processed realistic waveform map with the following keys:

r: 1D array of radial coordinates
z: 1D array of axial coordinates
t0: Global time offset applied to align waveforms
waveform_X: 3D array of processed current waveforms for angle X (shape: [n_r, n_z, nsamples_output_current_wfs], spatial axes reversed relative to Julia due to HDF5 column-/row-major conversion)
drift_time_X: 2D array of calculated drift times for angle X (shape: [n_r, n_z])

legendsimflow.reboost module¶

legendsimflow.reboost._cluster_photoelectrons_flat(offsets, t, a, thr)¶

Numba-accelerated clustering kernel for innermost list level.

Parameters:

offsets (ndarray) – 1D int64 array of list offsets (length = n_lists + 1).
t (ndarray) – 1D array of sorted times for all elements.
a (ndarray) – 1D array of amplitudes corresponding to times.
thr (float) – Maximum time span within a cluster.

Return type:

tuple[ndarray, ndarray, ndarray]

Returns:

out_t – Clustered times (first time in each cluster).
out_a – Clustered amplitudes (sum of amplitudes in each cluster).
counts – Number of clusters per original list.

legendsimflow.reboost._listoffset_chain(layout)¶

Extract the chain of offsets from nested ListOffsetArrays.

Parameters:

layout (Content) – An awkward array layout.

Return type:

tuple[list[ndarray], NumpyArray]

Returns:

offsets_chain – List of np.int64 arrays, one per nested list depth, from outermost to innermost.
content_numpy_layout – The final NumpyArray content node.

legendsimflow.reboost.cluster_photoelectrons(times, amps, thr)¶

Cluster photoelectrons within the instrument time resolution.

Clusters hits at axis=-1 (innermost lists) such that within each cluster the time span (last_time - first_time) does not exceed thr. This is useful for combining photoelectrons that arrive within the time resolution of the detector, treating them as a single detected event.

The output time is the first time of each cluster; the amplitude is the sum of all amplitudes in the cluster.

Parameters:

times (Array) – Awkward array of hit times. Must be sorted in ascending order within each innermost list. Sorting is the caller’s responsibility; unsorted input produces undefined behavior.
amps (Array) – Awkward array of amplitudes corresponding to times. Must have the same structure (nesting depth and list lengths) as times.
thr (float) – Maximum time span within a cluster (e.g., the detector time resolution).

Return type:

tuple[Array, Array]

Returns:

clustered_times – Awkward array with the same nesting structure, containing the first time of each cluster.
clustered_amps – Awkward array with the same nesting structure, containing the summed amplitude of each cluster.

Raises:

ValueError – If times and amps have different nesting depths or different numbers of elements.

Examples

>>> times = ak.Array([[0.0, 0.6, 1.1, 1.4, 2.3]])
>>> amps = ak.Array([[1.0, 2.0, 3.0, 4.0, 5.0]])
>>> t_out, a_out = cluster_photoelectrons(times, amps, thr=1.0)
>>> ak.to_list(t_out)
[[0.0, 1.1, 2.3]]
>>> ak.to_list(a_out)
[[3.0, 7.0, 5.0]]
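
The clustering rule can also be written out for a single, already sorted list in plain Python. This is a minimal sketch, not the Numba kernel used by the package; the helper name cluster_flat_sketch is hypothetical.

```python
def cluster_flat_sketch(times, amps, thr):
    """Cluster one sorted list of hit times; return (first times, summed amps)."""
    out_t, out_a = [], []
    for t, a in zip(times, amps):
        # open a new cluster when the span from the current cluster's
        # first time would exceed thr
        if not out_t or t - out_t[-1] > thr:
            out_t.append(t)
            out_a.append(a)
        else:
            out_a[-1] += a
    return out_t, out_a


t_out, a_out = cluster_flat_sketch(
    [0.0, 0.6, 1.1, 1.4, 2.3], [1.0, 2.0, 3.0, 4.0, 5.0], thr=1.0
)
```

Because each cluster is anchored to its first time rather than to the previous hit, a long chain of closely spaced hits is still split once the total span exceeds thr.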

legendsimflow.reboost.gauss_smear(arr_true, arr_reso)¶

Smear values with expected resolution.

Samples from a Gaussian distribution and sets negative values to a fixed, tiny positive value.

Return type:: Array
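
A minimal standard-library sketch of this smearing, for flat lists only. The name gauss_smear_sketch, the rng argument and the floor value of 1e-20 are assumptions made for illustration; the documentation does not state which tiny positive value the package uses.

```python
import random


def gauss_smear_sketch(values, resolutions, rng=None, floor=1e-20):
    """Sample a Gaussian around each value; replace negatives with floor."""
    rng = rng or random.Random()
    smeared = [rng.gauss(v, s) for v, s in zip(values, resolutions)]
    # assumption: negative samples are set to a fixed tiny positive value
    return [x if x >= 0 else floor for x in smeared]


out = gauss_smear_sketch([0.0] * 200, [1.0] * 200, rng=random.Random(0))
```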

legendsimflow.reboost.get_remage_detector_uids(h5file, *, lh5_table='stp')¶

Get mapping of detector names to UIDs from a remage output file.

The remage LH5 output files contain a link structure that lets the user access detector tables by UID. For example:

├── stp · struct{det1,det2,optdet1,optdet2,scint1,scint2}
└── __by_uid__ · struct{det001,det002,det011,det012,det101,det102}
    ├── det001 -> /stp/scint1
    ├── det002 -> /stp/scint2
    ├── det011 -> /stp/det1
    ├── det012 -> /stp/det2
    ├── det101 -> /stp/optdet1
    └── det102 -> /stp/optdet2

This function analyzes this structure and returns:

{1: 'scint1',
 2: 'scint2',
 11: 'det1',
 12: 'det2',
 101: 'optdet1',
 102: 'optdet2'}

Parameters:

h5file (str | Path) – Path to remage output file.
lh5_table (str) – Name of the LH5 table group to inspect.

Return type:

dict
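
To make the mapping concrete, the sketch below derives it from a plain dictionary that mimics the __by_uid__ links shown above. It does not open an HDF5 file, and the helper name uids_from_links_sketch is hypothetical; the real function inspects the link structure in the remage output file.

```python
def uids_from_links_sketch(links):
    """Map integer UIDs to detector names from {'detNNN': '/stp/<name>'}."""
    mapping = {}
    for link_name, target in links.items():
        uid = int(link_name.removeprefix("det"))  # 'det011' -> 11
        mapping[uid] = target.rsplit("/", 1)[-1]  # '/stp/det1' -> 'det1'
    return mapping


links = {
    "det001": "/stp/scint1",
    "det002": "/stp/scint2",
    "det011": "/stp/det1",
    "det012": "/stp/det2",
    "det101": "/stp/optdet1",
    "det102": "/stp/optdet2",
}
uid_map = uids_from_links_sketch(links)
```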

legendsimflow.reboost.get_remage_hit_range(tcm, det_name, uid, evt_idx_range)¶

Extract the range of remage output rows for an event range.

Queries the remage TCM (stored below /tcm in stp_file) with the input evt_idx_range = [i, j] to extract the first and last index of rows (hits) in the det_name detector table that correspond to the input event range. Returns the start index and number of rows to read after it as a tuple.

Parameters:

tcm (Array) – Time-coincidence map.
det_name (str) – name of the detector table in stp_file.
uid (int) – remage unique identifier for detector det_name.
evt_idx_range (list[int]) – [first, last] (i.e. first included, last included) index of events of interest present in the remage output file. Only positive indices are supported.

Return type:

tuple[int]

legendsimflow.reboost.get_senstables(geom, det_type=None)¶

Return type:: list[str]

legendsimflow.reboost.hpge_corrected_drift_time(chunk, dt_map, det_loc)¶

HPGe drift time heuristic corrected for crystal axis effects.

Note

This function will be moved to reboost.

Return type:: Array

legendsimflow.reboost.hpge_max_current(edep, drift_time, currmod_pars, **kwargs)¶

Calculate the maximum of the current pulse.

Parameters:

edep (Array) – energy deposited at each step.
drift_time (Array) – drift time of each energy deposit.
currmod_pars (Mapping) – dictionary storing the parameters of the current model (see reboost.hpge.psd.get_current_template())
kwargs – forwarded to reboost.hpge.psd.maximum_current().

Return type:

Array

legendsimflow.reboost.load_hpge_dtmaps(config, det_name, runid)¶

Load HPGe drift time maps from disk.

Automatically finds and loads drift time maps for crystal axes <100> and <110>. If no map is found, None is returned.

Parameters:

config (AttrsDict) – Simflow configuration object.
det_name (str) – HPGe detector name.
runid (str) – Run identifier.

Return type:

dict[str, HPGeRZField] | None

Note

This function will be moved to reboost.

legendsimflow.reboost.make_output_chunk(chunk)¶

Prepare output detector table chunk for the hit tier.

Note

This function will be moved to reboost.

Return type:: Table

legendsimflow.reboost.smear_photoelectrons(array, fwhm_in_pe, rng=None)¶

Smear photoelectron pulse amplitudes.

Returns an array of Gaussian-distributed single-photoelectron amplitudes with the same shape as the input array.

Return type:: Array

legendsimflow.reboost.write_chunk(chunk, objname, outfile, objuid)¶

Write detector table chunks for the hit tier to disk.

Note

This function will be moved to reboost.

Return type:: None

legendsimflow.spms_pars module¶

legendsimflow.spms_pars._next_rc_evt_file(evt_files, rc_file_state)¶

Return the next evt file, cycling through the list before repeating.

Parameters:

evt_files (Sequence[str | Path]) – Ordered sequence of evt file paths to cycle through.
rc_file_state (dict[str, Any]) – Mutable state dict shared across calls. On the first call it is populated with keys order (the file list), idx (current position, int), and completed_cycle (bool, set to True once the list has been exhausted once). Subsequent calls increment idx and wrap it when all files have been visited.

Return type:

str | Path

Returns:

str | Path – Path to the next evt file to process.
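
The cycling behavior described for rc_file_state can be sketched as below. This is an illustration only: next_file_sketch is a hypothetical name, and the exact moment at which completed_cycle is set is an assumption (here, when the index first wraps).

```python
def next_file_sketch(evt_files, state):
    """Return the next file, cycling through evt_files via a shared dict."""
    if "order" not in state:
        # first call: populate the shared state
        state["order"] = list(evt_files)
        state["idx"] = 0
        state["completed_cycle"] = False
    else:
        state["idx"] += 1
        if state["idx"] >= len(state["order"]):
            state["idx"] = 0  # wrap around after the last file
            state["completed_cycle"] = True
    return state["order"][state["idx"]]


state = {}
files = ["a.lh5", "b.lh5", "c.lh5"]
picked = [next_file_sketch(files, state) for _ in range(4)]
```

Keeping the state in a caller-owned dictionary lets successive chunks continue where the previous one stopped.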

legendsimflow.spms_pars._process_spms_windows(time, energy, win_ranges, time_domain_ns, min_sep_ns)¶

Process SiPM data within specified window ranges.

Each (start, end) range in win_ranges is tiled with non-overlapping windows of length time_domain_ns[1] - time_domain_ns[0], separated by min_sep_ns. PE hits falling inside each window are selected and their times are shifted so that the window start maps to time_domain_ns[0].

The function works on arrays of any rank. For N-D input (e.g. shape (n_events, n_channels, n_pe)), each extracted window produces one output block of the same shape along all but the innermost axis, with only the PE dimension filtered. Blocks from all windows are then concatenated along axis=0, so M source events processed through W windows yield W * M output entries.

Parameters:

time (Array) – PE hit times. Any shape; the innermost axis is the PE axis.
energy (Array) – PE energies, same shape as time.
win_ranges (Sequence[tuple[float, float]]) – List of (start, end) tuples defining the time ranges to tile, in nanoseconds.
time_domain_ns (tuple[float, float]) – Target time range (start, end) for output times in nanoseconds. The window length is end - start. E.g. (-1000, 5000) selects 6000 ns windows and maps their start to -1000 ns.
min_sep_ns (float) – Minimum gap between consecutive windows in nanoseconds.

Return type:

tuple[Array, Array]

Returns:

npe – PE energies extracted from all windows, concatenated along axis=0.
t0 – PE times relative to each window’s start (bounded by time_domain_ns), same shape as npe.
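
The tiling and time shift can be illustrated for flat lists with the sketch below. Both helper names are hypothetical. The sketch assumes that min_sep_ns is the gap between the end of one window and the start of the next, and that windows are half-open intervals; the package handles N-dimensional Awkward arrays and may differ in such details.

```python
def tile_windows_sketch(win_range, window_len, min_sep):
    """List the start times of the windows tiling one (start, end) range."""
    start, end = win_range
    starts = []
    t = start
    while t + window_len <= end:
        starts.append(t)
        t += window_len + min_sep  # assumption: min_sep is the gap between windows
    return starts


def select_hits_sketch(times, energies, win_start, time_domain):
    """Keep hits inside one window; shift times so win_start maps to time_domain[0]."""
    window_len = time_domain[1] - time_domain[0]
    kept = [(t, e) for t, e in zip(times, energies)
            if win_start <= t < win_start + window_len]
    npe = [e for _, e in kept]
    t0 = [t - win_start + time_domain[0] for t, _ in kept]
    return npe, t0


starts = tile_windows_sketch((1000, 44000), window_len=6000, min_sep=6000)
npe, t0 = select_hits_sketch([1500, 8000, 13000], [1, 2, 3], starts[0], (-1000, 5000))
```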

legendsimflow.spms_pars.build_rc_evt_index_lookup(rc_evt_files)¶

Build per-file trigger index lookup for RC extraction.

Parameters:

rc_evt_files (Sequence[str | Path]) – Evt-tier files to index.

Return type:

dict[str, dict[str, ndarray]]

Returns:

dict – Dictionary keyed by file path (as string) with entries:

forced_pulser: row indices of forced/pulser, non-muon events
geds: row indices of HPGe-triggered, non-muon events

legendsimflow.spms_pars.get_chunk_rc_data(rc_evt_files, rc_file_state, chunk_size, rc_index_lookup)¶

Assemble random-coincidence data for one chunk.

Parameters:

rc_evt_files (Sequence[str | Path]) – Ordered sequence of evt files that can provide random-coincidence data. Must not be empty.
rc_file_state (dict[str, Any]) – Mutable state for file cycling and carryover between chunks. Expected keys are created/updated internally (e.g. order, idx, counts, carryover).
chunk_size (int) – Number of random-coincidence events requested for the current chunk. Must be positive.
rc_index_lookup (dict[str, dict[str, ndarray]]) – Precomputed mapping from evt file to trigger-event indices, built with build_rc_evt_index_lookup().

Return type:

Array

Returns:

ak.Array – Random-coincidence data for one chunk with fields rawid (chunk_size, n_channels), npe (chunk_size, n_channels, n_pe) and t0 (same shape as npe).

legendsimflow.spms_pars.get_rc_evt_mask(evt_file)¶

Compute boolean event masks for random-coincidence extraction.

Parameters:

evt_file (str | Path) – Path to the evt-tier LH5 file.

Return type:

tuple[Array, Array]

Returns:

mask_forced_pulser – Boolean mask selecting forced-trigger and pulser events, excluding muon coincidences.
mask_geds – Boolean mask selecting HPGe-triggered events, excluding muon coincidences.

legendsimflow.spms_pars.get_rc_library(evt_file, rc_index_lookup, time_domain_ns=(-1000, 5000), min_sep_ns=6000, ext_trig_range_ns=((1000, 44000), (55000, 100000)), ge_trig_range_ns=((1000, 44000),))¶

Extract a library of random-coincidence (RC) events from an evt file.

To be used in correcting the SiPM photoelectrons with random coincidences.

For each qualifying trigger event, the SiPM waveform is divided into multiple non-overlapping time windows (see _process_spms_windows). Each window yields one independent RC event, so the total number of entries in the returned library is n_source_events x n_windows. The per-channel structure is preserved: npe and t0 have shape (n_rc_events, n_channels, n_pe) and rawid has shape (n_rc_events, n_channels), matching the spms/* layout of the evt tier.

Two trigger categories are processed with different window ranges to avoid contaminating RC events with physics signal:

Forced/pulser triggers: full waveform outside the central trigger window, ((1_000, 44_000), (55_000, 100_000)) ns by default.
HPGe/LAr triggers: first half only (before the trigger), ((1_000, 44_000),) ns by default.

Both categories are filtered to exclude muon coincidences.

Parameters:

evt_file (str | Path) – Event tier data file.
rc_index_lookup (dict[str, dict[str, ndarray]]) – Precomputed mapping from evt file to trigger-event indices, built with build_rc_evt_index_lookup.
time_domain_ns (tuple[float, float]) – Target time range (start, end) for output times in nanoseconds. E.g., (-1000, 5000) means output times will be in [-1000, 5000]. Default: (-1_000, 5_000).
min_sep_ns (float) – Minimal separation time between two windows in a trace, in nanoseconds. Default 6000.
ext_trig_range_ns (Sequence[tuple[float, float]]) – Window ranges for forced/pulser trigger events, as a sequence of (start, end) pairs in nanoseconds. Default: ((1_000, 44_000), (55_000, 100_000)).
ge_trig_range_ns (Sequence[tuple[float, float]]) – Window ranges for HPGe/LAr trigger events, as a sequence of (start, end) pairs in nanoseconds. Default: ((1_000, 44_000),).

Return type:

Array

Returns:

ak.Array – Record array with fields rawid (channel UIDs, shape (n_rc_events, n_channels)), npe (PE energies, shape (n_rc_events, n_channels, n_pe)), and t0 (times relative to each window start, same shape as npe). Channel ordering within each event matches the source spms/rawid ordering in evt_file.

legendsimflow.spms_pars.lookup_evt_files(l200data, runid, evt_tier_name)¶

Look up the evt tier file paths for a given run.

Parameters:

l200data (str | Path) – Root path to the LEGEND-200 data directory.
runid (str) – Run identifier string (e.g. "l200-p16-r008-phy").
evt_tier_name (str) – Name of the evt tier (e.g. "evt").

Return type:

list[Path]

Returns:

list[Path] – Matching evt-tier file paths for the given run.

legendsimflow.tcm module¶

legendsimflow.tcm.build_tcm(hit_files, out_file)¶

Re-create the TCM table from remage.

Use remage fields evtid and t0 (the latter is assumed to be in nanoseconds) to build coincidences. The settings are identical to the remage built-in TCM settings.

Return type:: None

legendsimflow.tcm.merge_stp_n_opt_tcms(tcm_stp, tcm_opt, *, scintillator_uid)¶

Merge tcm_opt rows into tcm_stp at the scintillator uid.

For each axis=0 row of tcm_stp, if tcm_stp.table_key contains scintillator_uid, replace that single element by splicing in the next row of tcm_opt.table_key. The same splice is applied to row_in_table using the corresponding tcm_opt.row_in_table, preserving alignment between table_key[i][j] and row_in_table[i][j].

Parameters:

tcm_stp – Awkward record array with fields table_key and row_in_table.
tcm_opt – Awkward record array with fields table_key and row_in_table.
scintillator_uid – Scalar value in tcm_stp.table_key marking where to splice in tcm_opt, i.e. the UID of the scintillator table.

Returns:

ak.Array – Record array with the same length as tcm_stp.
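
The splice can be illustrated on plain Python lists, where each TCM row is a (table_key, row_in_table) pair. This is a sketch of the described behavior, not the Awkward-based implementation; merge_tcm_rows_sketch is a hypothetical name.

```python
def merge_tcm_rows_sketch(stp_rows, opt_rows, scintillator_uid):
    """Splice OPT TCM rows into STP TCM rows at the scintillator UID."""
    opt_iter = iter(opt_rows)
    merged = []
    for table_key, row_in_table in stp_rows:
        if scintillator_uid in table_key:
            pos = table_key.index(scintillator_uid)
            opt_key, opt_row = next(opt_iter)  # consume the next OPT row
            # replace the single placeholder element in both fields,
            # keeping table_key[j] and row_in_table[j] aligned
            table_key = table_key[:pos] + opt_key + table_key[pos + 1:]
            row_in_table = row_in_table[:pos] + opt_row + row_in_table[pos + 1:]
        merged.append((table_key, row_in_table))
    return merged


stp = [([11, 1], [0, 0]), ([12], [1]), ([1, 11], [1, 2])]
opt = [([101, 102], [0, 0]), ([101], [1])]
merged = merge_tcm_rows_sketch(stp, opt, scintillator_uid=1)
```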

legendsimflow.tcm.merge_stp_n_opt_tcms_chunk(tcm_stp, tcm_opt, *, scintillator_uid)¶

Chunk-level implementation of merge_stp_n_opt_tcms().

This function assumes tcm_opt contains exactly as many rows as there are rows in tcm_stp that contain scintillator_uid, in the same order.

legendsimflow.tcm.merge_stp_n_opt_tcms_to_lh5(stp_file, opt_file, out_file, *, scintillator_uid, buffer_len='50*MB')¶

Stream-merge STP and OPT TCMs and write unified TCM to disk in chunks.

Iterates over stp_file:/tcm using LH5Iterator. For each chunk, reads only the required number of OPT TCM rows (those corresponding to STP rows containing the scintillator_uid placeholder) via lh5.read_as with explicit indices. The merged output is appended to out_file:/tcm.

Return type:: None

legendsimflow.utils module¶

legendsimflow.utils._curve_fit_popt_to_dict(popt)¶

Get the scipy.optimize.curve_fit() parameter results as a dictionary.

Return type:: dict

legendsimflow.utils._make_path(d)¶

legendsimflow.utils._merge_defaults(user, default)¶

Recursively merge default values into user configuration.

Merges values from default into user without overwriting existing user values. For nested dictionaries, performs recursive merge.

Parameters:

user (dict) – User configuration dictionary.
default (dict) – Default configuration dictionary.

Return type:

dict

Returns:

dict – Merged configuration dictionary with user values taking precedence.
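
A minimal sketch of such a recursive merge is shown below. The name merge_defaults_sketch is hypothetical, and returning a new dictionary rather than updating user in place is an assumption.

```python
def merge_defaults_sketch(user, default):
    """Fill missing keys in user from default, recursing into nested dicts."""
    merged = dict(user)
    for key, value in default.items():
        if key not in merged:
            merged[key] = value  # key absent: take the default
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_defaults_sketch(merged[key], value)
        # otherwise the user value takes precedence
    return merged


user_cfg = {"paths": {"pars": "/custom"}, "threads": 4}
default_cfg = {"paths": {"pars": "/default", "macros": "/default/macros"}, "threads": 1}
merged_cfg = merge_defaults_sketch(user_cfg, default_cfg)
```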

legendsimflow.utils.add_field_string(name, chunk, data)¶

Add a string to the output table.

This is done in an HDF5-friendly way by storing the value as a fixed-length string.

Return type:: None

legendsimflow.utils.apply_path_defaults(paths)¶

Set default values for optional path keys derived from paths.pars.

The following keys are optional in the Simflow configuration and, if absent, are derived from paths.pars:

geom: defaults to {paths.pars}/geom
dtmaps: defaults to {paths.pars}/hpge/dtmaps

Parameters:: paths (dict) – The paths section of the Simflow configuration, with all values already converted to pathlib.Path objects.
Return type:: None
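
The defaulting rule can be sketched with pathlib and dict.setdefault. The function name apply_path_defaults_sketch and the example paths are hypothetical; only the two derived locations are taken from the description above.

```python
from pathlib import Path


def apply_path_defaults_sketch(paths):
    """Fill optional keys from paths['pars'], in place."""
    pars = paths["pars"]
    paths.setdefault("geom", pars / "geom")
    paths.setdefault("dtmaps", pars / "hpge" / "dtmaps")


paths = {"pars": Path("/prod/pars"), "geom": Path("/elsewhere/geom")}
apply_path_defaults_sketch(paths)
```

Here the user-provided geom entry is kept, while dtmaps is derived from pars.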

legendsimflow.utils.check_nans_leq(array, name, less_than_frac=0.1, min_entries=100)¶

Raise an exception if the fraction of NaN values in array is above threshold.

Parameters:

array (TypeAliasType) – the array to analyze.
name (str) – array name for exception message.
less_than_frac (float) – raise exception if fraction of NaNs is above this threshold.
min_entries (int) – minimum number of entries required to apply the fraction check. With fewer entries, a warning is logged instead of raising an exception.

Return type:

legendsimflow.utils.get_dict_value(d, field, default=None)¶

Return a value from a nested dictionary using a dot-separated field path.

Parameters:

d (dict) – Dictionary to query.
field (str) – Dot-separated path (e.g. "a.b.c").
default (Any | None) – Value returned if the field is not found. Defaults to None.

Return type:

Any
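
A dot-separated lookup of this kind can be sketched as follows. The name get_dict_value_sketch is hypothetical, and the handling of non-dictionary intermediate values is an assumption.

```python
def get_dict_value_sketch(d, field, default=None):
    """Walk a nested dict along a dot-separated path."""
    current = d
    for key in field.split("."):
        # stop as soon as a level is missing or is not a dictionary
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


cfg = {"a": {"b": {"c": 42}}}
found = get_dict_value_sketch(cfg, "a.b.c")
missing = get_dict_value_sketch(cfg, "a.x.c", default="n/a")
```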

legendsimflow.utils.get_evt_tier_name(l200data)¶

Extract the name of the evt tier for this production cycle.

If the pet tier is present, it is used; otherwise the evt tier is used.

Parameters:: l200data (str) – Path to the production cycle of l200 data.
Return type:: str

legendsimflow.utils.get_hit_tier_name(l200data)¶

Extract the name of the hit tier for this production cycle.

If the pht tier is present, it is used; otherwise the hit tier is used.

Parameters:: l200data (str) – Path to the production cycle of l200 data.
Return type:: str

legendsimflow.utils.hash_dict(d)¶

Compute the hash of a Python dict.

Return type:: str
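
The documentation does not say how the hash is computed. One common approach, shown here purely as an assumption, is to serialize the dictionary to JSON with sorted keys and hash the resulting bytes; hash_dict_sketch is a hypothetical name and SHA-256 an arbitrary choice.

```python
import hashlib
import json


def hash_dict_sketch(d):
    """Hash a dict via its key-sorted JSON serialization (assumed scheme)."""
    # sort_keys makes the result independent of insertion order;
    # default=str covers values such as pathlib.Path
    payload = json.dumps(d, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


h1 = hash_dict_sketch({"a": 1, "b": {"c": 2}})
h2 = hash_dict_sketch({"b": {"c": 2}, "a": 1})
```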

legendsimflow.utils.init_generated_pars_db(l200data, tier=None, lazy=True)¶

Initializes the pars database from a LEGEND-200 data production.

Parameters:

l200data (str | Path) – path to LEGEND-200 data production cycle.
tier (str | None) – pars subfolder referring to a tier. If None, return the full par database.
lazy (bool) – see TextDB.

Return type:

TextDB

legendsimflow.utils.init_simflow_context(raw_config, workflow=None, logger=None)¶

Pre-process and sanitize the Simflow configuration.

Performs the following steps and returns a dictionary with useful objects to be used in the Simflow Snakefiles (i.e. the “context”):

set default configuration fields;
substitute $_ and environment variables;
convert to AttrsDict;
cast filesystem paths to pathlib.Path;
clone and configure legend-metadata;
attach a LegendMetadata instance to the Simflow configuration;
export important environment variables.

Parameters:

raw_config (dict | AttrsDict | str | Path) – Simflow configuration mapping or path to a configuration file.
workflow – Snakemake workflow instance. If None, occurrences of $_ in the configuration will be replaced with the path to the current working directory.
logger (Logger | None) – Logger to use for status messages (e.g. the Snakemake logger when called from a Snakefile). Defaults to the module logger.

Return type:

legendsimflow.utils.link_external_paths(config, workflow_basedir, *, logger=None)¶

Symlink user-overridden paths back into their default locations.

When the user has manually overridden a paths.<key> entry in simflow-config.yaml to point outside the current production cycle (e.g. reusing the hit tier from another production), this function creates a symlink at the canonical default location pointing to the override. Snakemake rules keep reading config.paths.<key> directly; the symlink only exists so the prod cycle’s own generated/ tree shows the external data in the standard layout.

The default locations are computed from the simflow’s own template at <workflow_basedir>/../templates/default.yaml, with $_ substituted to the current working directory (the prod cycle root in the standard Snakemake invocation). Created symlinks are relative to the destination’s parent, keeping the prod cycle portable.

For each supported key:

if config.paths.<key> resolves to the default location, no override is in effect; any stale symlink at that location is removed;
otherwise a symlink is created (or refreshed) at the default location pointing to config.paths.<key>.

Real directories at the default location are never touched. The call is a safe no-op when <workflow_basedir>/../templates/default.yaml does not exist.

Supported keys (relative to paths): every tier.<name>, pars, macros, geom and dtmaps. Default paths for geom and dtmaps fall back to <pars>/geom and <pars>/hpge/dtmaps when absent from the template (mirroring apply_path_defaults()).
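
The per-key behaviour described above (remove a stale link, leave real directories alone, create a relative link to the override) can be sketched as follows. This is a simplified illustration under assumptions: link_override and the demo directory layout are hypothetical and do not reproduce the package's implementation, which also handles template lookup and logging.

```python
import os
import tempfile
from pathlib import Path


def link_override(default: Path, override: Path) -> None:
    """Point `default` at `override` with a relative symlink (sketch)."""
    if default.is_symlink():
        default.unlink()  # drop a stale or outdated link
    elif default.exists():
        return  # a real directory or file: never touched
    if override.resolve() == default.resolve():
        return  # no override in effect
    default.parent.mkdir(parents=True, exist_ok=True)
    # A target relative to the link's parent keeps the tree portable.
    target = os.path.relpath(override, start=default.parent)
    default.symlink_to(target, target_is_directory=True)


# Demo in a throwaway directory: the hit tier lives in another production.
root = Path(tempfile.mkdtemp())
override = root / "other-prod" / "hit"
override.mkdir(parents=True)
default = root / "prod" / "generated" / "tier" / "hit"

link_override(default, override)
```

Because the stored link target is relative, moving or copying the whole parent tree keeps the link valid, provided the override moves along with it.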

Parameters:

config (AttrsDict) – Simflow configuration as returned by init_simflow_context().
workflow_basedir (str | Path) – Snakemake workflow basedir (workflow.basedir in a Snakefile). Used only to locate the simflow’s default template.
logger (Logger | None) – Logger to use for status messages (e.g. the Snakemake logger when called from a Snakefile). Defaults to the module logger.

Return type:

legendsimflow.utils.lookup_dataflow_config(l200data)¶

Finds and loads the dataflow configuration file.

Parameters:

l200data (Path | str) – The path to the L200 data production cycle.

Return type:

Returns:

The dataflow configuration file as a dictionary with substitutions performed.

legendsimflow.utils.sanitize_dict_with_defaults(read_dict, defaults)¶

Swap in defaults when values are illegal.

Return type:: dict

legendsimflow.utils.setup_logdir_link(config)¶

Set up the timestamp-tagged directory for the workflow log files.

Parameters:

config (AttrsDict) – Simflow configuration object.
proctime – Processing time identifier for the log directory.

Return type: