legendsimflow package ¶

legendsimflow.archive.create_plots_tarball(generated_dir, output, prefix)¶

Archive all plots/ directories under generated_dir into a .tar.xz.

Parameters:

generated_dir (Path) – The generated/ directory of the production cycle.
output (Path) – Path to write the .tar.xz tarball.
prefix (str) – Prefix directory name inside the archive (e.g. prod-v1-plots).

Return type:
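
As an illustration of the behaviour described above, the following minimal sketch uses only the standard library. The helper name archive_plots and the directory layout in the usage lines are hypothetical, and the actual implementation in legendsimflow may differ (for example in how nested plots/ directories are handled).

```python
import tarfile
import tempfile
from pathlib import Path


def archive_plots(generated_dir: Path, output: Path, prefix: str) -> list[str]:
    """Pack every ``plots/`` directory below *generated_dir* into a .tar.xz.

    Hypothetical helper, not the legendsimflow implementation.
    Returns the archive names of the directories that were added.
    """
    added = []
    with tarfile.open(output, "w:xz") as tar:
        for plots_dir in sorted(generated_dir.rglob("plots")):
            if not plots_dir.is_dir():
                continue
            # keep the layout relative to generated_dir, below the prefix
            rel = plots_dir.relative_to(generated_dir).as_posix()
            arcname = f"{prefix}/{rel}"
            tar.add(plots_dir, arcname=arcname)
            added.append(arcname)
    return added


# usage on a throw-away directory tree (made-up layout)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "generated"
    (root / "tier" / "stp" / "plots").mkdir(parents=True)
    (root / "tier" / "stp" / "plots" / "a.png").write_bytes(b"fake")
    out = Path(tmp) / "plots.tar.xz"
    members = archive_plots(root, out, "prod-v1-plots")
    with tarfile.open(out) as tar:
        names = tar.getnames()
```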

legendsimflow.awkward module¶

legendsimflow.awkward.ak_isin(elements, test_elements, *, assume_unique=False)¶

legendsimflow.cli module¶

legendsimflow.cli._partition(xs, n)¶

legendsimflow.cli.snakemake_nersc_batch_cli()¶: Implementation of the snakemake-nersc-batch CLI.

legendsimflow.cli.snakemake_nersc_cli()¶: Implementation of the snakemake-nersc CLI.

legendsimflow.commands module¶

legendsimflow.commands._confine_by_volume(is_surface, volume, surface_max_intersections=100)¶

Helper function to generate confinement macro lines for a given volume.

Return type:: list[str]

legendsimflow.commands._get_full_name(node)¶

Get the name of the function being called, including the module path if it’s an attribute access.

Return type:: str

legendsimflow.commands.get_confinement_from_function(function_string, reg)¶

Get the confinement commands for a function defined in the GDML.

The function string must correspond to the following format:

module.function(<...>, arg=...)

where <...> will be replaced with the pyg4ometry.geant4.Registry instance for the geometry.

Parameters:

function_string – String describing the function to be used.
reg – The pyg4ometry registry containing the geometry information.

Returns:

list[str] – A list of remage confinement commands corresponding to the function definition.

Return type:

list[str]
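
The function-string format above can be parsed with the standard ast module. The sketch below is an assumption about how such a string could be handled, not the library's code: the placeholder identifier _registry_ and the helper names are hypothetical, and only literal keyword arguments are supported.

```python
import ast


def full_name(node: ast.expr) -> str:
    """Dotted name of a call target, e.g. ``pkg.mod.func``."""
    if isinstance(node, ast.Attribute):
        return f"{full_name(node.value)}.{node.attr}"
    if isinstance(node, ast.Name):
        return node.id
    msg = f"unsupported call target: {ast.dump(node)}"
    raise ValueError(msg)


def parse_function_string(function_string: str) -> tuple[str, dict]:
    """Split ``module.function(<...>, arg=...)`` into name and keyword arguments."""
    # "<...>" is not valid Python, so swap it for a placeholder identifier first
    source = function_string.replace("<...>", "_registry_")
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        msg = f"not a function call: {function_string!r}"
        raise ValueError(msg)
    # only literal keyword values (numbers, strings, booleans, ...) are accepted
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return full_name(call.func), kwargs


name, kwargs = parse_function_string(
    "legendsimflow.confine.get_lar_minishroud_confine_commands(<...>, inside=False)"
)
```

Here name holds the dotted function path and kwargs the keyword arguments; a caller would then import the function and pass the registry in place of the placeholder.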

legendsimflow.commands.make_remage_macro(config, simid, tier='stp', geom=None)¶

Render the remage macro for a given simulation and write it to disk.

This function reads the simulation configuration for the provided tier/simid, assembles the macro substitutions (e.g. GENERATOR, CONFINEMENT) using values and references defined under config.metadata, renders the specified macro template, writes the final macro file to the canonical input path, and returns both the macro text and the output file path.

Parameters:

config (AttrsDict) – Mapping-like Snakemake configuration that supports attribute-style access (e.g. config.experiment, config.metadata, etc.). The following fields are used:
- experiment: name of the experiment to select tier-specific metadata.
- metadata.tier[tier][experiment].generators: generator definitions.
- metadata.tier[tier][experiment].confinement: confinement definitions.
simid (str) – Simulation identifier to select the simconfig.
tier (str) – Simulation tier (e.g. “stp”, “ver”, …). Default is “stp”.
geom (str | None) – Path to the geometry file.

Return type:

tuple[str, Path]

Returns:

A tuple with
- The rendered macro text.
- The path where the macro was written.

Notes

The macro template path is taken from the simconfig template field.
Supported substitutions currently include: GENERATOR and CONFINEMENT.
The user can provide arbitrary macro substitutions with the optional macro_substitutions field.
The macro is written to the canonical path returned by patterns.input_simjob_filename().
If config.nersc.dvs_ro is set, the vertices file will be read from the read-only filesystem mount /dvs_ro at NERSC.

legendsimflow.commands.remage_run(config, simid, *, jobid=None, tier='stp', geom='{input.geom}', procs=1, output='{output}', macro_free=False)¶

Build a remage CLI invocation string for a given simulation.

This constructs a shell-escaped command line for remage. When macro_free is True, the macro is rendered inline via make_remage_macro() and its content is passed directly on the CLI. When macro_free is False (default), the pre-existing macro file path is referenced on the CLI and substitutions are passed via --macro-substitutions; in that case the caller is responsible for generating the macro file beforehand (e.g. via the gen_remage_macro Snakemake rule).

Notes

Compatible with remage >= v0.16.
When macro_free is False (default), the command passes the macro file path and supplies macro substitutions via --macro-substitutions.
When macro_free is True, the rendered macro content is inlined on the CLI (comments and empty lines removed) and values are pre-substituted.
Two substitutions are always provided: N_EVENTS (from primaries_per_job or benchmark override) and SEED (a 32-bit integer derived from output).
SEED is meant to be used as the remage seed. It is determined by hashing output to a 32-bit integer. If the user sets the integer config.simflow_rng_seed, it is added as an offset.
The JOBID substitution is also provided if the jobid argument is not None.
If config.runcmd.remage is set, it is used to determine the remage executable (split with shlex.split()), otherwise remage is used.
If config.nersc.dvs_ro is set, remage is set to read all inputs from the read-only filesystem mount /dvs_ro at NERSC.
If config.nersc.scratch is set, the command will write the output file on the scratch disk and move it to the final expected destination at the end.
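
The notes above say that SEED is obtained by hashing output to a 32-bit integer and adding an optional offset. A minimal sketch of that idea follows. The choice of SHA-256, the truncation to the first four bytes, and the wrap-around after adding the offset are assumptions made for illustration; the source does not state which hash or range convention is used.

```python
import hashlib


def seed_from_output(output: str, offset: int = 0) -> int:
    """Map an output path to a reproducible unsigned 32-bit seed."""
    digest = hashlib.sha256(output.encode("utf-8")).digest()
    # keep the first 4 bytes (32 bits) of the digest
    base = int.from_bytes(digest[:4], "big")
    # wrap around so that the offset cannot push the seed out of range
    return (base + offset) % 2**32


seed_a = seed_from_output("stp/job_0000.lh5")  # made-up output path
seed_b = seed_from_output("stp/job_0000.lh5", offset=7)
```

Because the seed depends only on the output path and the offset, re-running the same job reproduces the same seed, while different output files get different seeds with high probability.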

Parameters:

config (AttrsDict) – Snakemake-like configuration mapping. Must include metadata required by make_remage_macro() and optional benchmark and runcmd sections.
simid (str) – Simulation identifier for which to construct the command.
jobid (str | None) – Job identifier for the simulation run (string holding a zero-padded integer). Used as remage CLI macro substitution in case the macro contains it (e.g. if a vertices file is used).
tier (str) – Simulation tier (e.g., "stp", "ver"). Default is "stp".
geom (str | Path) – Path (or Snakemake placeholder) to the GDML geometry file.
procs (int) – Number of threads to pass to remage (integer or Snakemake placeholder). Internally uses remage’s --procs.
output (str | Path) – Path (or Snakemake placeholder) to the output remage file.
macro_free (bool) – If True, inline the macro contents on the CLI; if False, reference the macro file and pass substitutions via --macro-substitutions.

Return type:

str

Returns:

A shell-escaped command line suitable for direct execution.
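
The handling of the executable and the shell escaping can be sketched with the standard shlex module. The options --procs and --macro-substitutions are taken from the description above; the KEY=VALUE form of the substitutions, the "--" separator, the macro file name and the container prefix are assumptions for illustration, and build_command is a hypothetical helper.

```python
import shlex


def build_command(args: list[str], runcmd: str = "") -> str:
    """Prefix *args* with the configured executable and shell-escape the result."""
    # a configured run command may itself contain several words
    executable = shlex.split(runcmd) if runcmd else ["remage"]
    return shlex.join([*executable, *args])


cmd = build_command(
    ["--procs", "4", "--macro-substitutions", "N_EVENTS=1000", "--", "my macro.mac"],
    runcmd="apptainer run image.sif remage",
)
default_cmd = build_command(["--procs", "1"])
```

shlex.join() quotes only the arguments that need it, so the file name containing a space survives a round trip through the shell as a single argument.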

legendsimflow.confine module¶

legendsimflow.confine._get_matching_volumes(volume_list, patterns)¶

Return volumes from volume_list whose names match patterns.

Wildcard patterns are supported via fnmatch.fnmatch().

Parameters:

volume_list (Iterable[str]) – List of volume names to search.
patterns (str | Sequence[str]) – Single wildcard pattern string or a list of patterns.

Return type:

list[str]
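
A minimal sketch of this kind of wildcard matching with fnmatch.fnmatch() is shown below. The helper name and the volume names are made up for the example, and the real function may differ in ordering or duplicate handling.

```python
from fnmatch import fnmatch


def matching_volumes(volume_list, patterns):
    """Return the names in *volume_list* that match at least one pattern."""
    if isinstance(patterns, str):
        # accept a single pattern as well as a sequence of patterns
        patterns = [patterns]
    return [v for v in volume_list if any(fnmatch(v, p) for p in patterns)]


volumes = ["minishroud_tube_01", "minishroud_cap_01", "liquid_argon"]
tubes = matching_volumes(volumes, "minishroud_tube*")
several = matching_volumes(volumes, ["*_cap_*", "liquid_*"])
```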

legendsimflow.confine.get_lar_minishroud_confine_commands(reg, pattern='minishroud_tube*', inside=True, lar_name='liquid_argon', outer_radius_in_mm=None, outer_height_in_mm=None)¶

Extract the commands for the LAr confinement inside/outside the NMS from the GDML.

Parameters:

reg (Registry) – The registry describing the geometry.
pattern (str | Sequence[str]) – The pattern used to search for physical volumes of minishrouds.
inside (bool) – If True, generate points inside the minishroud (NMS) volumes; if False, exclude the minishroud volumes from the generation region.
lar_name (str) – The name of the physical volume of the LAr.
outer_radius_in_mm (float | None) – If provided, gives an outer radius for the confinement. Only supported for outside confinement (inside=False).
outer_height_in_mm (float | None) – If provided, gives an outer height for the confinement. Only supported for outside confinement (inside=False).

Return type:

list[str]

Returns:

A list of confinement commands for remage.

legendsimflow.exceptions module¶

exception legendsimflow.exceptions.SimflowConfigError(message, block=None)¶

Bases: Exception

legendsimflow.geometry module¶

Helpers producing the stp geometry validation plots.

legendsimflow.geometry.load_vis_scene(config)¶

Return the geometry rendering scene for the current experiment.

Starts from DEFAULT_VIS_SCENE and merges (shallow, per top-level key) an optional per-experiment override read from <paths.config>/geom/<experiment>-vis-config.yaml in the metadata.

Return type:: dict

legendsimflow.geometry.make_hpge_mass_plot(config, geom_config, output)¶

Write the simulated-vs-measured HPGe mass comparison plot to output.

Compares the mass of each detector built by pygeoml200 to the measured mass in legend-metadata (or the public testdata masses for a public geometry).

Return type:: None

legendsimflow.geometry.render_geometry(config, geom_config, output)¶

Render the geometry off-screen to output (PNG) using load_vis_scene().

Rebuilds the geometry with the light-weight segmented fiber model (the simulation GDML uses the detailed one, far too heavy to render). Rendering goes through the software OSMesa backend, so no GPU or X server is needed.

Return type:: None

legendsimflow.hpge_electronics_tuning module¶

Tune the electronics response parameters of the simulation against data superpulses.

Fits the Gaussian sigma and exponential tau of the system response kernel by minimising the mean RMS between simulated and measured current superpulses across drift-time slices.

legendsimflow.hpge_electronics_tuning.build_cost_function(ideal_wfs_slice, data_superpulses, dt, alignment_idx, nsamples_output, comparison_window=None, weight_power=0.0)¶

Build the scalar cost function for the Minuit minimiser.

The returned function has signature cost(sigma, tau) -> float and computes the mean RMS across all drift-time slices.

Parameters:

ideal_wfs_slice (dict[Slice, ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]) – Preselected ideal waveforms per slice, {Slice: ideal_wfs_array}.
data_superpulses (dict[Slice, Superpulse]) – Data superpulses keyed by the same slices.
dt (float) – Time step of the ideal waveforms in ns.
alignment_idx (int) – Sample index where the current peak is placed after alignment.
nsamples_output (int) – Length of the output waveforms.
comparison_window (tuple[float, float] | None) – (t_min, t_max) in ns. Passed through to compute_rms_in_slice().
weight_power (float) – Data-amplitude weight exponent p (w = |data|**p) passed through to compute_rms_in_slice(). 0 (default) is the unweighted RMS.

Return type:

Callable

Returns:

cost – cost(sigma, tau) -> float.

legendsimflow.hpge_electronics_tuning.compute_rms_in_slice(sim_avg, sim_time, data_sp, comparison_window=None, weight_power=0.0)¶

RMS residual between a simulated and a data current superpulse.

The simulation is linearly interpolated onto the data time grid. Samples outside the sim time range are excluded from the comparison.

With weight_power > 0 the squared residuals are weighted by the data current amplitude, w = |data|**weight_power, before the RMS, so the high-amplitude samples around the current peak dominate the cost and the near-zero tail is down-weighted (a data-amplitude-weighted RMS). weight_power = 0 (the default) reproduces the plain equal-weight RMS.

Parameters:

sim_avg (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Simulated average current waveform, shape (n_sim,).
sim_time (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Time axis of the simulation in ns, shape (n_sim,).
data_sp (Superpulse) – Data superpulse. Only current_wf and current_time_axis are read.
comparison_window (tuple[float, float] | None) – (t_min, t_max) in ns relative to the current peak. If None, the full waveform overlap is used.
weight_power (float) – Exponent p of the data-amplitude weight w = |data|**p applied to the squared residuals. 0 (default) is the unweighted RMS; larger values concentrate the fit on the current peak and its flanks.

Return type:

float

Returns:

rms – (Optionally data-amplitude-weighted) root mean square of the residuals.
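
The weighting described above can be written down in a few lines. The following pure-Python sketch interpolates the simulation onto the data time grid, skips data samples outside the simulated range, and weights the squared residuals by |data|**weight_power. Normalising by the sum of the weights is an assumption, as is the omission of the comparison window; the function names are hypothetical.

```python
import math


def interp(x: float, xs: list[float], ys: list[float]) -> float:
    """Linear interpolation of (xs, ys) at x; xs must be strictly increasing."""
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])
    return ys[-1]


def weighted_rms(sim_time, sim_avg, data_time, data_wf, weight_power=0.0):
    """RMS of (sim - data) on the data grid, weighted by |data|**weight_power."""
    num = den = 0.0
    for t, d in zip(data_time, data_wf):
        if not sim_time[0] <= t <= sim_time[-1]:
            continue  # outside the simulated time range
        w = abs(d) ** weight_power
        num += w * (interp(t, sim_time, sim_avg) - d) ** 2
        den += w
    return math.sqrt(num / den)


# made-up waveforms: the only mismatch sits on the data peak at t = 10
sim_time, sim_avg = [0.0, 10.0, 20.0], [0.0, 1.0, 0.0]
data_time, data_wf = [-5.0, 5.0, 10.0, 15.0], [9.0, 0.5, 2.0, 0.5]
plain = weighted_rms(sim_time, sim_avg, data_time, data_wf)
peaked = weighted_rms(sim_time, sim_avg, data_time, data_wf, weight_power=2.0)
```

In this example the weighted value is larger than the plain one, because the residual at the high-amplitude sample dominates once the low-amplitude samples are down-weighted.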

legendsimflow.hpge_electronics_tuning.fit_electronics_parameters(ideal_wfs_slice, data_superpulses, dt, alignment_idx, nsamples_output, *, sigma_start, tau_start, sigma_limits, tau_limits, comparison_window=None, weight_power=0.0, max_calls=5000)¶

Fit the electronics response parameters sigma and tau.

Minimises the mean RMS between simulated and measured current superpulses across drift-time slices using Minuit (MIGRAD).

Parameters:

ideal_wfs_slice (dict[Slice, ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]) – Ideal charge waveforms per slice, as returned by get_ideal_wfs_all_slices().
data_superpulses (dict[Slice, Superpulse]) – Data superpulses keyed by the same slices.
dt (float) – Time step of the ideal waveforms in ns.
alignment_idx (int) – Sample index for current-peak alignment.
nsamples_output (int) – Length of the output waveforms.
sigma_start (float) – Initial value for the Gaussian sigma in ns.
tau_start (float) – Initial value for the exponential tau in ns.
sigma_limits (tuple[float, float]) – Hard bounds (lo, hi) for sigma in ns.
tau_limits (tuple[float, float]) – Hard bounds (lo, hi) for tau in ns.
comparison_window (tuple[float, float] | None) – (t_min, t_max) in ns relative to the current peak. If None, the full waveform overlap is used.
weight_power (float) – Data-amplitude weight exponent p for the cost (w = |data|**p); 0 (default) is the plain equal-weight RMS, larger values bias the fit toward the current peak. See compute_rms_in_slice().
max_calls (int) – Maximum number of Minuit function evaluations.

Return type:

dict

Returns:

dict – Keys: sigma, tau, best_rms, ideal_wfs_slice, dt, alignment_idx, nsamples_output, minuit, cost_history.

legendsimflow.hpge_electronics_tuning.get_ideal_wfs_all_slices(ideal_pulse_shape_lib, data_superpulses, angle='000')¶

Select ideal waveforms per drift-time slice.

Reads the ideal pulse-shape library, flattens the (r, z) grid, and selects waveforms whose drift time falls in each data superpulse slice.

Parameters:

ideal_pulse_shape_lib (Struct) – Ideal waveform map (LGDO Struct) as read from LH5. Must contain waveform_{angle}_deg (an Array) and dt (the time step, a Scalar).
data_superpulses (dict[Slice, Superpulse]) – Data superpulses keyed by slice.
angle (str) – Crystal axis angle tag, e.g. "000".

Return type:

dict

Returns:

dict – Keys:

ideal_wfs_slice : dict[Slice, NDArray]
dt : time step in ns
alignment_idx : sample index for current-peak alignment
nsamples_output : output waveform length (from data)

legendsimflow.hpge_electronics_tuning.plot_best_fit(result, data_superpulses, comparison_window=None, plot_window=None, plot_charge=False, detector_name=None)¶

Overlay data and best-fit simulated superpulses (current or charge).

One panel per drift-time slice, sorted by drift time. Each panel shows the data superpulse and the simulation at the best-fit (sigma, tau), with the per-slice RMS annotated.

Parameters:

result (dict) – Dictionary returned by fit_electronics_parameters(). Uses sigma, tau, best_rms, ideal_wfs_slice, dt, alignment_idx, and nsamples_output.
data_superpulses (dict[Slice, Superpulse]) – Data superpulses keyed by slice.
comparison_window (tuple[float, float] | None) – (t_min, t_max) in ns relative to the current peak. If given, the window is shaded on each panel.
plot_window (tuple[float, float] | None) – (t_min, t_max) in ns for the x-axis limits. Defaults to comparison_window if set, otherwise auto-scaled.
plot_charge (bool) – Plot the charge waveforms instead of the current waveforms.
detector_name (str | None) – Optional detector name to include in the figure title.

Return type:

tuple

Returns:

fig (matplotlib.figure.Figure)
axes (array of matplotlib.axes.Axes)
data_amax (the Amax value of the data in the highest drift-time slice)
mc_amax (the Amax value of the simulation in the highest drift-time slice)

legendsimflow.hpge_electronics_tuning.plot_convergence(result)¶

Convergence diagnostics for the Minuit optimisation.

Three panels: RMS vs function call, sigma trajectory, tau trajectory. The best-fit value is marked with a dashed red line in each parameter panel.

Parameters:

result (dict) – Dictionary returned by fit_electronics_parameters(). Uses cost_history, sigma, tau, and best_rms.

Return type:

tuple

Returns:

fig (matplotlib.figure.Figure)
axes (tuple of matplotlib.axes.Axes)

legendsimflow.hpge_electronics_tuning.select_ideal_wfs_in_slice(ideal_wfs, dt, sl)¶

Select ideal waveforms whose drift time falls in a slice.

Drift times are computed on the fly for the provided waveforms.

Parameters:

ideal_wfs (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Ideal charge waveforms, shape (n_wfs, n_samples).
dt (float) – Time step in ns.
sl (Slice) – Drift-time slice. Only sl.drift_time_range is used.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Returns:

selected – Waveforms in the slice, shape (n_selected, n_samples).

legendsimflow.hpge_pars module¶

legendsimflow.hpge_pars._iter_noise_waveforms(raw_files, hit_files, lh5_group, dsp_config, dsp_output, *, threshold=5, length=1000, energy_var='cuspEmax_cal')¶

Yield noise waveforms one at a time without accumulating them all in memory.

Parameters are the same as get_noise_maxima_and_sample().

legendsimflow.hpge_pars._lookup_generated_pars_file(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Return type:: tuple[Any, Any]

legendsimflow.hpge_pars._remove_outliers(data, sigma=5)¶

Remove elements more than sigma standard deviations from the mean.

Return type:: ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]
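
A single-pass version of this cut can be sketched with the standard statistics module. Using the population standard deviation and applying the cut once (rather than iterating until no outliers remain) are assumptions; the helper below works on plain lists instead of NumPy arrays.

```python
import statistics


def remove_outliers(data, sigma=5):
    """Keep the values within *sigma* standard deviations of the mean."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)  # population standard deviation
    return [x for x in data if abs(x - mean) <= sigma * std]


values = [10.0] * 50 + [1000.0]  # made-up data with one obvious outlier
cleaned = remove_outliers(values, sigma=5)
```

Note that the outlier itself inflates the mean and the standard deviation, so with few samples a single extreme value may not exceed the threshold.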

legendsimflow.hpge_pars.build_aoe_mean_func_dict(aoe_mean_pars=None, sim_type='single_template')¶

Build A/E mean functions for each HPGe detector in a LEGEND-200 run.

Return type:

dict[str, Callable]

Returns:

Mapping of HPGe name to A/E mean as a function of energy, where energy is expected in units of keV.

Parameters:

aoe_mean_pars (dict | AttrsDict | None) – Parameters of the A/E energy-dependence model. May contain a default entry, which the caller is expected to have already expanded across the relevant detectors.
sim_type (str) – Type of PSD simulation (single_template or psl)

legendsimflow.hpge_pars.build_aoe_mean_func_from_entry(meta, sim_type='single_template')¶

Build a bound A/E mean callable from a single detector’s correction entry.

Parameters:

meta (dict | AttrsDict) – A single detector’s A/E energy-dependence entry, with single_template and psl sub-blocks, each carrying expression and pars (a, b).
sim_type (str) – Type of PSD simulation (single_template or psl).

Return type:

Callable

Returns:

Callable that takes energy in keV and returns the mean A/E.

legendsimflow.hpge_pars.build_aoe_res_func(function)¶

A/E resolution function builder.

Return type:: Callable

legendsimflow.hpge_pars.build_aoe_res_func_dict(l200data, metadata, runid, *, hit_tier_name='hit', aoe_res_pars=None)¶

Build A/E resolution functions for each HPGe detector in a LEGEND-200 run.

Return type:

dict[str, Callable]

Returns:

Mapping of HPGe name to A/E resolution as a function of energy, where energy is expected in units of keV.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
aoe_res_pars (dict | AttrsDict | None) – from lookup_aoe_res_metadata().

legendsimflow.hpge_pars.build_aoe_res_func_from_entry(meta)¶

Build a bound A/E resolution callable from a single metadata entry.

Parameters:: meta (dict | AttrsDict) – A single detector’s A/E resolution metadata, with keys expression and parameters.
Return type:: Callable
Returns:: Callable that takes energy in keV and returns the A/E resolution (sigma).

legendsimflow.hpge_pars.build_energy_res_func(function)¶

Energy resolution function builder.

Return type:: Callable

legendsimflow.hpge_pars.build_energy_res_func_dict(l200data, metadata, runid, *, hit_tier_name='hit', energy_res_pars=None)¶

Build energy resolution functions for each HPGe detector in a LEGEND-200 run.

Return type:

dict[str, Callable]

Returns:

Mapping of HPGe name to energy resolution function (FWHM), where energy is expected in units of keV.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
energy_res_pars (dict | AttrsDict | None) – from lookup_energy_res_metadata().

legendsimflow.hpge_pars.build_energy_res_func_from_entry(meta)¶

Build a bound energy resolution callable from a single metadata entry.

Parameters:: meta (dict | AttrsDict) – A single detector’s energy resolution metadata, with keys expression and parameters. Same format as one value from lookup_energy_res_metadata().
Return type:: Callable
Returns:: Callable that takes energy in keV and returns FWHM in keV.
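
One way to turn such an entry into a callable is to compile the expression once and bind the parameters in a closure. The sketch below is only an illustration: the expression syntax, the variable name x for the energy, and the parameter values are assumptions, and eval() should only ever be used on trusted metadata.

```python
import math


def build_res_func(entry: dict):
    """Bind the parameters of *entry* to its expression string."""
    code = compile(entry["expression"], "<expression>", "eval")
    params = dict(entry["parameters"])

    def func(energy: float) -> float:
        # no builtins; only sqrt, the bound parameters and the energy "x"
        namespace = {"__builtins__": {}, "sqrt": math.sqrt, **params, "x": energy}
        return eval(code, namespace)

    return func


# hypothetical entry following FWHM(E) = sqrt(a + b*E)
entry = {"expression": "sqrt(a + b * x)", "parameters": {"a": 1.0, "b": 0.0015}}
fwhm = build_res_func(entry)
```

With these made-up parameters, fwhm(2000.0) evaluates sqrt(1 + 3), i.e. a FWHM of about 2 keV at 2000 keV.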

legendsimflow.hpge_pars.estimate_mean_aoe(popt, energy=1593)¶

Estimate the maximum A/E from the current_pulse_model parameters popt.

Return type:: float

legendsimflow.hpge_pars.fit_currmod(times_list, current_list)¶

Fit the model to multiple raw HPGe current pulses simultaneously.

Normalises each waveform by its peak amplitude and uses iminuit.Minuit to minimise the summed RMS residual across all waveforms simultaneously. Fitting multiple waveforms provides a more robust estimate of the pulse-shape parameters than fitting a single event.

Parameters:

times_list (list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]) – list of timestep arrays, one per waveform.
current_list (list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]) – list of current-value arrays, one per waveform.

Return type:

tuple

Returns:

Tuple of the best-fit parameters (as a NumPy array), the initial guesses, the arrays of the best-fit model (time and current) evaluated around the peak, and the initial model evaluated around the peak.

legendsimflow.hpge_pars.fit_noise_gauss(data, bins, *, fit_range=None, sigma_range=None)¶

Fit the data to a Gaussian to extract the resolution.

Performs a binned maximum likelihood fit using minuit.

Parameters:

data (ArrayLike) – an array of the data to fit.
bins (int) – The number of bins.
fit_range (tuple | None) – The range to use for the fit. If None, it is determined from the data as +/- 5 standard deviations around the mean.
sigma_range (tuple | None) – The range of sigma values for the fit. If None, it is determined from the data.

Return type:

Minuit

Returns:

The minuit object holding the fit results.

legendsimflow.hpge_pars.get_current_pulse(raw_file, lh5_group, idx, dsp_config, dsp_output='curr_av', align='tp_aoe_max')¶

Extract the current pulse.

Parameters:

raw_file (Path | str) – path to the raw tier file.
lh5_group (str) – where to find the waveform table.
idx (int) – the index of the waveform to read.
dsp_config (str) – the dspeed configuration file defining the DSP processing chain to estimate the current pulse.
dsp_output (str) – the name of the DSP output corresponding to the current pulse.
align (str) – DSP value around which the pulses are aligned.

Return type:

legendsimflow.hpge_pars.get_current_pulses(raw_file_idx_pairs, lh5_group, dsp_config, dsp_output='curr_av', align='tp_aoe_max')¶

Extract current pulses for multiple events.

Calls get_current_pulse() for each (raw_file, idx) pair and returns the results as two parallel lists.

Parameters:

raw_file_idx_pairs (list[tuple[Path | str, int]]) – list of (raw_file, idx) pairs.
lh5_group (str) – where to find the waveform table.
dsp_config (str | None) – the dspeed configuration file defining the DSP processing chain to estimate the current pulse.
dsp_output (str) – the name of the DSP output corresponding to the current pulse.
align (str | None) – DSP value around which the pulses are aligned.

Return type:

tuple[list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]], list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]]

Returns:

times_list – list of timestep arrays.
current_list – list of current-value arrays.

legendsimflow.hpge_pars.get_noise_maxima_and_sample(raw_files, hit_files, lh5_group, dsp_config, dsp_output, template, *, norm=1, sample_size=100, threshold=5, maximum_number=None, energy_var='cuspEmax_cal')¶

Compute waveform maxima on-the-fly, keeping only a small sample in memory.

This avoids storing all noise waveforms at once. Instead, it iterates through waveforms, computes the maximum of waveform + template for each, and only retains the first sample_size waveforms for plotting.

Parameters:

raw_files (list) – List of paths to raw files.
hit_files (list) – List of paths to hit files.
lh5_group (str) – The name of the lh5_group to find the waveform table in.
dsp_config (str) – the dspeed configuration file defining the DSP processing chain to estimate the current pulse.
dsp_output (str) – the name of the DSP output corresponding to the current pulse.
template (ArrayLike) – the current-pulse template waveform.
norm (float) – normalisation for the template.
sample_size (int) – number of waveforms to keep for plotting.
threshold (float) – energy threshold to apply to select the noise waveforms.
maximum_number (int | None) – maximum number of waveforms to process.
energy_var (str) – the name of the energy variable to use for thresholding.

Return type:

tuple[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

sample_wfs – 2D array of the first sample_size waveforms (for plotting).
a_max – 1D array of the maximum of waveform + template for each waveform.
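
The streaming pattern described above can be sketched with a generator and plain lists. Everything here is illustrative: noise_waveforms stands in for reading waveforms from disk, and scaling the template by norm before adding it is an assumption about how the normalisation is applied.

```python
from itertools import islice


def noise_waveforms(n: int, length: int = 4):
    """Hypothetical stand-in for a generator reading waveforms from disk."""
    for i in range(n):
        yield [((i + j) % 3) - 1.0 for j in range(length)]


def maxima_and_sample(waveforms, template, *, norm=1.0, sample_size=2, maximum_number=None):
    """Stream over *waveforms*, keeping all maxima but only a few waveforms."""
    sample, a_max = [], []
    # islice stops after maximum_number waveforms (None means no limit)
    for wf in islice(waveforms, maximum_number):
        if len(sample) < sample_size:
            sample.append(wf)
        # assumption: the template is scaled by norm before being added
        a_max.append(max(w + norm * t for w, t in zip(wf, template)))
    return sample, a_max


sample_wfs, a_max = maxima_and_sample(noise_waveforms(5), [0.0, 1.0, 2.0, 1.0])
```

Only sample_size waveforms are retained, so the memory use stays bounded no matter how many waveforms the generator yields.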

legendsimflow.hpge_pars.get_waveform_maxima(template, noise_wfs, *, norm=1)¶

Extract the maximum of each waveform based on combining the template with each waveform in noise_wfs.

Note

The length of the template must be the same as the waveforms in noise_wfs

Parameters:

template (ArrayLike) – The template of the waveform to use.
noise_wfs (ArrayLike) – 2D array of each noise waveform.
norm (float) – The normalisation for the template.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

legendsimflow.hpge_pars.lookup_aoe_res_metadata(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Lookup the measured A/E resolution metadata from LEGEND-200 data.

The metadata refers to the following model:

\[\sigma_\text{A/E}(E) = \sqrt{a + (b/E)^c}\]

where $E$ is in keV.

Return type:

Returns:

Mapping of HPGe name to metadata dictionary.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
pars_db (TextDB | None) – optional existing non-lazy instance of TextDB(".../path/to/prod/generated/par_{hit_tier_name}").
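
The A/E resolution model above translates directly into a small closure. The parameter values in this sketch are made up to keep the arithmetic easy to follow and are not taken from any LEGEND-200 metadata.

```python
import math


def aoe_resolution(a: float, b: float, c: float):
    """Return sigma_A/E(E) = sqrt(a + (b / E)**c) with E in keV."""

    def sigma(energy_kev: float) -> float:
        return math.sqrt(a + (b / energy_kev) ** c)

    return sigma


# made-up parameters, chosen only for illustration
sigma = aoe_resolution(a=1e-4, b=3.0, c=2.0)
```

For positive b and c the second term shrinks with energy, so the resolution improves towards higher energies and approaches sqrt(a).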

legendsimflow.hpge_pars.lookup_currmod_fit_data(hit_files, lh5_group, ewin_center=1593, ewin_width=10, max_waveforms=1, get_drift_time=True)¶

Extract the indices of the events to fit.

Considers events with abs(A/E) < 1.5 and finds up to max_waveforms events closest to the median drift time. Returns a list of (event_index, file_index) pairs, sorted from closest to farthest from the median, with at most max_waveforms entries, together with the full and selected drift-time arrays for diagnostic purposes.

Parameters:

hit_files (list[str | Path]) – tier-hit files used to determine the best indices.
lh5_group (str) – where the tier-hit data is found in the files.
ewin_center (float) – center of the energy window to use for the event search (same units as in data).
ewin_width (float) – width of the energy window to use for the event search (same units as in data).
max_waveforms (int) – maximum number of waveforms to return.
get_drift_time (bool) – Also read the drift time, which is used to select the waveforms.

Return type:

tuple[list[tuple[int, int]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

pairs – list of (event_index, file_index) tuples, sorted by proximity to the median drift time.
all_dts – all drift-time values for events passing the energy and A/E cuts.
selected_dts – drift-time values for the selected subset of events.
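
The selection of the events closest to the median drift time can be sketched as follows. The helper is hypothetical and works on a flat list of drift times; it ignores the energy and A/E cuts and the bookkeeping of file indices described above.

```python
import statistics


def closest_to_median(drift_times, max_waveforms=1):
    """Indices of the events closest to the median drift time, nearest first."""
    median = statistics.median(drift_times)
    order = sorted(range(len(drift_times)), key=lambda i: abs(drift_times[i] - median))
    return order[:max_waveforms]


drift_times = [820.0, 1010.0, 990.0, 1500.0, 1000.0]  # made-up values in ns
picked = closest_to_median(drift_times, max_waveforms=3)
```

Since sorted() is stable, events at the same distance from the median keep their original order.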

legendsimflow.hpge_pars.lookup_currmod_fit_inputs(l200data, metadata, runid, hpge, hit_tier_name='hit', max_waveforms=100)¶

Find raw files, event indices and the DSP configuration file.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hpge (str) – name of the HPGe detector
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
max_waveforms (int) – maximum number of waveforms to return.

Return type:

tuple[list[tuple[Path, int]], Path, ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

raw_wf_pairs – list of (raw_file, event_index) pairs, up to max_waveforms.
dsp_cfg_file – path to the DSP configuration file.
all_dts – all drift-time values for events passing the energy and A/E cuts.
selected_dts – drift-time values for the selected subset of events.

legendsimflow.hpge_pars.lookup_energy_res_metadata(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Lookup the measured HPGe energy resolution metadata from LEGEND-200 data.

The metadata refers to the following model:

\[\text{FWHM}(E) = \sqrt{a + bE}\]

where $E$ is in keV.

Return type:

Returns:

Mapping of HPGe name to metadata dictionary.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
pars_db (TextDB | None) – optional existing non-lazy instance of TextDB(".../path/to/prod/generated/par_{hit_tier_name}").

legendsimflow.hpge_pars.lookup_file_paths(l200data, runid, hit_tier_name)¶

Lookup the paths to the hit and raw files.

Return type:: AttrsDict

legendsimflow.hpge_pars.lookup_psd_cut_values(l200data, metadata, runid, *, hit_tier_name='hit', pars_db=None)¶

Lookup the measured PSD cut values from LEGEND-200 data.

Return type:

Returns:

Mapping of HPGe name to metadata dictionary.

Parameters:

l200data (str | Path) – The path to the L200 data production cycle.
metadata (LegendMetadata) – The metadata instance
runid (str) – LEGEND-200 run identifier, must be of the form {EXPERIMENT}-{PERIOD}-{RUN}-{TYPE}.
hit_tier_name (str) – name of the hit tier. This is typically “hit” or “pht”.
pars_db (TextDB | None) – optional existing non-lazy instance of TextDB(".../path/to/prod/generated/par_{hit_tier_name}").

legendsimflow.hpge_pars.plot_currmod_fit_result(t, A, model_t, model_A, init_model)¶

Plot the best fit results.

Return type:: tuple

legendsimflow.hpge_pars.plot_dt_selection(all_dts, selected_dts)¶

Plot the drift-time distribution and highlight the selected waveforms.

Draws a histogram of all drift-time values (passing the energy and A/E cuts) using the hist package and overlays a shaded band that spans the range of drift times of the events chosen for the current-pulse fit.

Parameters:

all_dts (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Drift-time values for every event that passes the energy and A/E cuts.
selected_dts (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]) – Drift-time values for the subset of events selected for fitting.

Return type:

Returns:

fig – The matplotlib.figure.Figure.
ax – The matplotlib.axes.Axes.

legendsimflow.hpge_pars.plot_gauss_fit(data, fit_result, fit_range=None, bins=100, nominal_val=None)¶

Plot the result of the Gaussian fit.

Parameters:

data (ArrayLike) – an array of the data to fit.
fit_result (Minuit) – the result of the Gaussian fit.
bins (int) – The number of bins.
fit_range (tuple | None) – The range to use for the fit. If None, this is determined from the data as +/- 5 standard deviations around the mean.
nominal_val (float | None) – The nominal mean to add as a line on the plot.

Return type:

legendsimflow.hpge_pars.plot_noise_waveforms(noise, temp, norm=1)¶

Plot the waveforms with noise and the noise alone.

Return type:: tuple

legendsimflow.metadata module¶

legendsimflow.metadata._get_lh5_table(metadata, fname, hpge, tier, runid)¶

The correct LH5 table path.

Determines the correct path to a hpge detector table in tier tier.

Return type:: str

legendsimflow.metadata.decode_psd_usability(psd_usability_code)¶

Decode the PSD usability (see encode_psd_usability()).

Return type:: str

legendsimflow.metadata.decode_usability(usability_code)¶

Decode the HPGe usability (see encode_usability()).

Return type:: str

legendsimflow.metadata.encode_psd_usability(psd_usability)¶

Encode the PSD usability in an int.

Return type:: int

legendsimflow.metadata.encode_usability(usability)¶

Encode the HPGe usability in an int.

Return type:: int

legendsimflow.metadata.expand_runlist(metadata, runlist)¶

Expands a runlist as passed to the Simflow configuration.

A runlist is a list of:

runids in the form accepted by is_runid();
runlist DB queries in the form <tag>.<datatype>.<period> (see query_runlist_db()).

Return type:: list[str]

legendsimflow.metadata.extract_integer(file_path)¶

Read a single integer from a file, stripping surrounding whitespace.

Return type:: int
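
A minimal sketch of the behavior described above, using only the standard library. The helper name read_single_integer is hypothetical, and the error handling of the real function may differ.

```python
from pathlib import Path


def read_single_integer(file_path):
    """Hypothetical stand-in: read one integer from a text file."""
    # strip() drops surrounding whitespace, including a trailing newline
    return int(Path(file_path).read_text().strip())
```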

legendsimflow.metadata.get_par_settings(config, par)¶

Return the settings block for par and the current experiment.

Return type:: AttrsDict

legendsimflow.metadata.get_runlist(config, simid)¶

Gets the runlist assigned to a simulation.

If not overridden in the hit-tier simconfig, returns the global runlist stored in config.runlist.

Return type:: list[str]

legendsimflow.metadata.get_sanitized_fccd(metadata, det_name)¶

Return the FCCD value for det_name, falling back to 1 mm if the FCCD field is absent.

Parameters:

metadata (LegendMetadata) – LEGEND metadata database.
det_name (str) – Detector name.

Return type:

float

legendsimflow.metadata.get_simconfig(config, tier, simid=None, field=None)¶

Return the simulation configuration for the given tier and simid.

Raise SimflowConfigError if any key is not found.

Parameters:

config (AttrsDict) – Simflow configuration object.
tier (str) – Tier name.
simid (str | None) – Simulation identifier.
field (str | None) – If not None, return the value of this key in the simconfig.

Return type:

legendsimflow.metadata.get_tier_settings(config, tier)¶

Return the settings block for tier and the current experiment.

Return type:: AttrsDict

legendsimflow.metadata.get_vtx_simconfig(config, simid)¶

Get the vertex generation configuration for a stp-tier simid.

Returns the vtx-tier generator requested by the stp-tier simulation with identifier simid.

Parameters:

config (AttrsDict) – Snakemake config.
simid (str) – simulation identifier.

Return type:

legendsimflow.metadata.is_runid(runid)¶

Whether a runid (run identifier) is correctly formatted.

It should be in the form <experiment>-<period>-<run>-<datatype>, i.e. XXX-pNN-rMMM-AAA, where XXX is any alphanumeric experiment identifier.

Return type:: bool

legendsimflow.metadata.is_simid(simid)¶

Whether a simid (simulation identifier) is correctly formatted.

A valid simid must consist entirely of word characters (letters, digits, underscores) and hyphens, matching the pattern [-\w]+. Dots and other special characters are not allowed; in particular, dots are forbidden because they are used as the delimiter in the simlist format <tier>.<simid>.

Return type:: bool
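
The rule above can be sketched with a full-string match against the quoted pattern. The function name looks_like_simid is hypothetical; only the pattern [-\w]+ is taken from the description.

```python
import re

# pattern quoted in the description above
_SIMID_PATTERN = re.compile(r"[-\w]+")


def looks_like_simid(simid):
    """Hypothetical check: the whole string must match the simid pattern."""
    return _SIMID_PATTERN.fullmatch(simid) is not None


print(looks_like_simid("birds_nest-K40"))  # True
print(looks_like_simid("stp.birds_nest"))  # False: dots separate <tier>.<simid>
```

Using fullmatch rather than match matters here: a prefix match would accept strings that merely start with valid characters.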

legendsimflow.metadata.parse_runid(runid)¶

Extract runid fields.

Returns the experiment, period, run and datatype as a tuple. Period and run are integers.

Return type:: (str, int, int, str)
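
As an illustration of the runid format, the sketch below splits a runid with a regular expression. The function name split_runid and the exact digit counts (two for the period, three for the run, following the XXX-pNN-rMMM-AAA form quoted for is_runid()) are assumptions, not the package's actual implementation.

```python
import re

# Assumed pattern for <experiment>-pNN-rMMM-<datatype>
_RUNID = re.compile(r"([A-Za-z0-9]+)-p(\d{2})-r(\d{3})-([A-Za-z0-9]+)")


def split_runid(runid):
    """Hypothetical parser returning (experiment, period, run, datatype)."""
    m = _RUNID.fullmatch(runid)
    if m is None:
        raise ValueError(f"malformed runid: {runid!r}")
    experiment, period, run, datatype = m.groups()
    # period and run are converted to integers, as described above
    return experiment, int(period), int(run), datatype


print(split_runid("l200-p03-r001-phy"))  # ('l200', 3, 1, 'phy')
```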

legendsimflow.metadata.query_runlist_db(metadata, query)¶

Query the runlist DB stored in legend-datasets.

Run expressions of the form r00n..r00m are automatically expanded into full run lists. If for example metadata.datasets.runlists.valid.phy.p02 == "r000..r002":

>>> query_runlist_db(metadata, "valid.phy.p02")
["l200-p02-r000-phy", "l200-p02-r001-phy", "l200-p02-r002-phy"]

Parameters:

metadata (LegendMetadata) – LEGEND metadata instance.
query (str) – expression in the form <tag>.<datatype>.<period> (see contents of runlists.yaml in legend-datasets).

Return type:

list[str]
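
The expansion of run expressions can be sketched as follows. The helper expand_run_expression and its arguments are hypothetical; the inclusive end point matches the example above, while the pass-through of a single run label is an assumption.

```python
import re


def expand_run_expression(expr, experiment, period, datatype):
    """Hypothetical helper: expand 'r000..r002' into full runids."""
    m = re.fullmatch(r"r(\d+)\.\.r(\d+)", expr)
    if m is None:
        # assumption: a plain run label such as 'r004' is passed through
        runs = [expr]
    else:
        first, last = int(m.group(1)), int(m.group(2))
        width = len(m.group(1))  # keep the zero padding of the input
        runs = [f"r{i:0{width}d}" for i in range(first, last + 1)]
    return [f"{experiment}-{period}-{run}-{datatype}" for run in runs]


print(expand_run_expression("r000..r002", "l200", "p02", "phy"))
# ['l200-p02-r000-phy', 'l200-p02-r001-phy', 'l200-p02-r002-phy']
```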

legendsimflow.metadata.reference_cal_run(metadata, runid)¶

The reference calibration run for runid.

Warning

This function does not account for dataflow overrides (e.g. calibration back-applying)!

Return type:: str

legendsimflow.metadata.runinfo(metadata, runid)¶

Get the datasets.runinfo entry for a LEGEND run identifier.

Parameters:

metadata (LegendMetadata) – LEGEND metadata database.
runid (str) – a run identifier in the format <experiment>-<period>-<run>-<datatype>.

Return type:

str

legendsimflow.metadata.simpars(metadata, par, runid, experiment, default=<object object>)¶

Extract simflow parameters for a certain LEGEND run.

Queries the simflow parameters database stored under simprod.config.pars by experiment name experiment, parameter name par and LEGEND run identifier runid.

Parameters:

metadata (LegendMetadata) – LEGEND metadata database.
par (str) – name of directory under metadata.simprod.config.pars.{experiment}. Can be a nested property, e.g. geds.opv.value; both . and / are allowed as separators.
runid (str) – a run identifier in the format <experiment>-<period>-<run>-<datatype>.
experiment (str) – experiment identifier (e.g. l200cfg01, l1000dsg01). Selects the experiment-level subdirectory under simprod/config/pars/.
default (object) – value to return when the parameter directory is not found in the database or no validity entry matches runid. If not provided, such cases raise KeyError or LookupError. Other errors (e.g. malformed YAML) are always re-raised regardless of this argument.
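
The default=<object object> in the signature suggests a sentinel default, which lets None be passed as a legitimate fallback value. The sketch below shows that pattern on a plain dictionary; the names _MISSING and lookup are hypothetical, and whether simpars() is implemented this way is an assumption.

```python
_MISSING = object()  # unique sentinel, distinct from any user-supplied value


def lookup(table, key, default=_MISSING):
    """Hypothetical lookup illustrating the sentinel-default pattern."""
    try:
        return table[key]
    except KeyError:
        if default is _MISSING:
            raise  # no default given: propagate the error
        return default


print(lookup({"a": 1}, "a"))      # 1
print(lookup({}, "a", None))      # None
```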

Return type:

legendsimflow.metadata.smk_hash_simconfig(config, wildcards, field=None, ignore=None, **kwargs)¶

Get the dictionary hash for use in Snakemake rules.

Parameters:

config (AttrsDict) – Snakemake config.
wildcards (Wildcards) – Snakemake wildcards object.
field (str | None) – If not None, return the value of this key in the simconfig.
ignore (list | None) – Exclude these fields from the hash.
kwargs – provide a value for wildcards that might not be present in wildcards.

Return type:

str
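
For orientation, a stable dictionary hash with ignored fields could look like the sketch below. This is not the package's implementation: the function dict_hash, the JSON serialization, and the SHA-256 digest are all assumptions chosen for illustration.

```python
import hashlib
import json


def dict_hash(d, ignore=None):
    """Hypothetical stable hash of a JSON-serializable mapping."""
    ignored = set(ignore or [])
    kept = {k: v for k, v in d.items() if k not in ignored}
    # sort_keys makes the digest independent of insertion order
    payload = json.dumps(kept, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()
```

A hash of this kind changes only when a non-ignored field changes, which is what makes it usable as a change-detection value.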

legendsimflow.metadata.usability(metadata, det_name, runid, default=None)¶

Get the usability for analysis of det_name in run runid.

Looks for the analysis.usability metadata field in the channel map. By default, an error is thrown if no information is found. If default is set to a non-None value, it will be returned.

Return type:: str

legendsimflow.metadata.validate_simconfig_keys(simconfig, block=None)¶

Validate that all top-level keys of simconfig are valid simids.

Raises SimflowConfigError listing every invalid key if any are found.

Parameters:

simconfig (Mapping) – Dictionary whose top-level keys are expected to be simids (as loaded from a simconfig.yaml file).
block (str | None) – Optional config block label included in the error message for context.

Return type:

legendsimflow.nersc module¶

legendsimflow.nersc.dvs_ro(config, path)¶

Turn /global/... file paths to /dvs_ro/... on NERSC.

The input type is preserved.

Note

config must contain a nersc key mapped to a dictionary containing a dvs_ro: True key.

Return type:: str | Path | list[str | Path]
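
A minimal sketch of the path swap with type preservation, assuming a plain prefix replacement. The function to_dvs_ro and its enabled flag (standing in for the config lookup) are hypothetical.

```python
from pathlib import Path


def to_dvs_ro(path, enabled=True):
    """Hypothetical stand-in: swap a leading /global for /dvs_ro."""
    if not enabled:
        return path
    if isinstance(path, list):
        return [to_dvs_ro(p, enabled) for p in path]
    text = str(path)
    if text == "/global" or text.startswith("/global/"):
        text = "/dvs_ro" + text[len("/global"):]
    # rebuild the same type that was passed in
    return Path(text) if isinstance(path, Path) else text


print(to_dvs_ro("/global/cfs/cdirs/x"))  # /dvs_ro/cfs/cdirs/x
```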

legendsimflow.nersc.dvs_ro_snakemake(snakemake)¶

Swap the read-only filesystem path in all Snakemake input files.

This function is meant to be used in Snakemake scripts, where the Snakemake rule attributes (input, output, …) are accessible from the special object snakemake.

Warning

This function mutates the input snakemake object in place.

legendsimflow.partitioning module¶

legendsimflow.partitioning.partition_simstat(n_events, n_events_part, runlist)¶

Partition the simulation event statistics according to run livetime.

Returns the following dictionary:

job_000:
  l200-p03-r001-phy: [0, 300]  # interval includes its edges
  l200-p03-r002-phy: [301, 456]
job_001:
  l200-p03-r002-phy: [0, 200]
  l200-p03-r003-phy: [201, 156]
...

where the number of events of each job is partitioned in runs, such that the global event partitioning in n_events_part is respected.

Parameters:

n_events (Mapping[str, int]) –
mapping from simulation job to its number of simulation events.
```
job_0000: 5000
job_0001: 7000
...
```
n_events_part (Mapping[str, int]) –
mapping of fraction of total number of simulation events (summed over all jobs) per considered run, with weights equal to the run livetime fraction.
```
l200-p03-r001-phy: 300
l200-p03-r002-phy: 456
...
l200-<...>: tot_n_events
```
runlist (Iterable[str]) – list of runs in the form <experiment>-<period>-<run>-<datatype>.

Return type:

dict[str, dict[str, list[int]]]

legendsimflow.patterns module¶

Prepare pattern strings to be used in Snakemake rules.

Extra keyword arguments are typically interpreted as variables to be substituted in the returned (structure of) strings. They are passed to snakemake.io.expand().

Definitions:

simid: string identifier for the simulation run
simjob: one job of a simulation run (corresponds to one macro file and one output file)
jobid: zero-padded integer (i.e., a string) used to label a simulation job

legendsimflow.patterns._expand(pattern, keep_list=False, **kwargs)¶

Expand a path pattern with Snakemake wildcards.

Returns a scalar unless keep_list is set.

Return type:: str | Path

legendsimflow.patterns.benchmark_dtmap_filename(config, **kwargs)¶

The benchmark file path for drift-time map generation for a detector and voltage.

Return type:: Path

legendsimflow.patterns.benchmark_filename(config, **kwargs)¶

Formats a benchmark file path for a simid and jobid.

Return type:: Path

legendsimflow.patterns.benchmark_ideal_psl_filename(config, **kwargs)¶

The benchmark file path for ideal pulse-shape library generation for a detector and voltage.

Return type:: Path

legendsimflow.patterns.benchmark_realistic_psl_filename(config, **kwargs)¶

The benchmark file path for realistic pulse-shape library generation for a detector and run.

Return type:: Path

legendsimflow.patterns.benchmark_tier_cvt_filename(config, **kwargs)¶

The benchmark file path for the cvt tier build for a simid.

Return type:: Path

legendsimflow.patterns.benchmark_tier_pdf_filename(config, **kwargs)¶

The benchmark file path for the pdf tier build for a simid.

Return type:: Path

legendsimflow.patterns.compute_superpulses(config, **kwargs)¶

Flag to compute the superpulses.

Return type:: bool

legendsimflow.patterns.detinfo_filename(config, flag)¶

Path to the per-flag detector-info file for flag in the par tier.

The par tier caches detector-level information (usability, modelability, …) as one YAML file per flag under pars/detinfo/, each a mapping runid -> detector -> value.

Return type:: Path

legendsimflow.patterns.geom_config_filename(config, **kwargs)¶

The path to the geometry configuration YAML file for a tier and simid.

Return type:: Path

legendsimflow.patterns.geom_gdml_filename(config, **kwargs)¶

The path to the GDML geometry file for a tier and simid.

Return type:: Path

legendsimflow.patterns.geom_log_filename(config, **kwargs)¶

The log file path for geometry generation for a tier and simid.

Return type:: str

legendsimflow.patterns.geom_vis_config_filename(config)¶

The path to the optional geometry rendering vis config for the experiment.

Return type:: Path

legendsimflow.patterns.input_currmod_evt_idx_file(config, **kwargs)¶

The path to the event index file used to extract current pulse waveforms.

Return type:: Path

legendsimflow.patterns.input_simid_filenames(config, n_macros, **kwargs)¶

Returns the full path to n_macros input files for a simid.

Needed by the script that generates all macros for a simid.

Return type:: list[Path]

legendsimflow.patterns.input_simjob_filename(config, **kwargs)¶

Returns the full path to the input file for a simid, tier and job index.

Return type:: Path

legendsimflow.patterns.log_aoemeanmod_filename(config, **kwargs)¶

The log file path for A/E energy-correction extraction for a detector.

Return type:: Path

legendsimflow.patterns.log_currmod_filename(config, **kwargs)¶

The log file path for current-pulse model extraction for a detector and runid.

Return type:: Path

legendsimflow.patterns.log_dirname(config)¶

Directory where log files are stored.

Return type:: Path

legendsimflow.patterns.log_dtmap_filename(config, **kwargs)¶

The log file path for drift-time map generation for a detector and voltage.

Return type:: Path

legendsimflow.patterns.log_elecmod_filename(config, **kwargs)¶

The log file path for elec pulse model extraction for a detector and runid.

Return type:: Path

legendsimflow.patterns.log_eresmod_filename(config, **kwargs)¶

The log file path for HPGe observables model extraction for a runid.

Return type:: Path

legendsimflow.patterns.log_filename(config, **kwargs)¶

Formats a log file path for a simid and jobid.

Return type:: Path

legendsimflow.patterns.log_ideal_psl_filename(config, **kwargs)¶

The log file path for ideal pulse-shape library generation for a detector and voltage.

Return type:: Path

legendsimflow.patterns.log_realistic_psl_filename(config, **kwargs)¶

The log file path for realistic pulse-shape library generation for a detector and run.

Return type:: Path

legendsimflow.patterns.log_simstat_part_filename(config, **kwargs)¶

The log file path for simulation event statistics partitioning for a simid.

Return type:: Path

legendsimflow.patterns.log_superpulses_filename(config, build_per_runid=False, **kwargs)¶

The log file path for HPGe superpulse generation for a detector.

Return type:: Path

legendsimflow.patterns.log_tier_cvt_filename(config, **kwargs)¶

The log file path for the cvt tier build for a simid.

Return type:: Path

legendsimflow.patterns.log_tier_pdf_filename(config, **kwargs)¶

The log file path for the pdf tier build for a simid.

Return type:: Path

legendsimflow.patterns.output_aoemeanmod_filename(config, **kwargs)¶

The path to the per-detector A/E energy-dependence correction file.

Return type:: Path

legendsimflow.patterns.output_aoemeanmod_merged_filename(config, **kwargs)¶

The path to the single, merged, run-independent A/E energy-dependence correction file.

Return type:: Path

legendsimflow.patterns.output_aoeresmod_filename(config, **kwargs)¶

The path to the HPGe A/E resolution model parameter file for a runid.

Return type:: Path

legendsimflow.patterns.output_currmod_filename(config, **kwargs)¶

The path to the per-detector HPGe current-pulse model parameter file.

Return type:: Path

legendsimflow.patterns.output_currmod_merged_filename(config, **kwargs)¶

The path to the merged HPGe current-pulse model parameter file for a runid.

Return type:: Path

legendsimflow.patterns.output_dtmap_filename(config, **kwargs)¶

The path to the HPGe drift-time map file for a detector and voltage.

Return type:: Path

legendsimflow.patterns.output_dtmap_info_filename(config, **kwargs)¶

The path to the HPGe SSD-modeling info sidecar for a detector and voltage.

Return type:: Path

legendsimflow.patterns.output_dtmap_merged_filename(config, **kwargs)¶

The path to the merged HPGe drift-time map file for a runid.

Return type:: Path

legendsimflow.patterns.output_elecmod_filename(config, **kwargs)¶

The path to the per-detector HPGe electronics pulse model parameter file.

Return type:: Path

legendsimflow.patterns.output_elecmod_merged_filename(config, **kwargs)¶

The path to the HPGe electronics model file for a runid.

Return type:: Path

legendsimflow.patterns.output_eresmod_filename(config, **kwargs)¶

The path to the HPGe energy resolution model parameter file for a runid.

Return type:: Path

legendsimflow.patterns.output_ideal_psl_filename(config, **kwargs)¶

The path to the ideal HPGe pulse-shape library for a detector and voltage.

Return type:: Path

legendsimflow.patterns.output_psdcuts_filename(config, **kwargs)¶

The path to the HPGe PSD cut values file for a runid.

Return type:: Path

legendsimflow.patterns.output_realistic_psl_filename(config, **kwargs)¶

The path to the realistic HPGe pulse-shape library for a detector and run.

Return type:: Path

legendsimflow.patterns.output_realistic_psl_merged_filename(config, **kwargs)¶

The path to the merged realistic PSL file for a runid.

Return type:: Path

legendsimflow.patterns.output_simid_filenames(config, n_macros, **kwargs)¶

Returns the full path to n_macros output files for a simid.

legendsimflow.patterns.output_simid_precorr_hit_filenames(config, n_macros, **kwargs)¶

Returns the full path to n_macros temp pre-correction hit files for a simid.

Return type:: list[Path]

legendsimflow.patterns.output_simjob_filename(config, **kwargs)¶

Returns the full path to the output file for a simid, tier and job index.

Return type:: Path

legendsimflow.patterns.output_simjob_precorr_hit_filename(config, **kwargs)¶

The path to a temporary, pre-correction hit output file for a simid/jobid.

Return type:: Path

legendsimflow.patterns.output_simjob_regex(config, **kwargs)¶

A glob-style regex matching all output files for a tier.

Return type:: str

legendsimflow.patterns.output_superpulses_filename(config, build_per_runid=False, **kwargs)¶

The path to the HPGe superpulses file for a detector.

Return type:: Path

legendsimflow.patterns.output_tier_cvt_filename(config, **kwargs)¶

The path to the merged cvt tier output file for a simid.

Return type:: Path

legendsimflow.patterns.output_tier_pdf_filename(config, **kwargs)¶

The path to the merged pdf tier output file for a simid.

Return type:: Path

legendsimflow.patterns.pdf_tarball_filename(config)¶

The path to the pdf tier archive tarball for the current production cycle.

The Simflow has no explicit knowledge of the production cycle name, so the name of the directory where the Simflow lives is used as a proxy.

Return type:: Path

legendsimflow.patterns.plot_aoemeanmod_filename(config, **kwargs)¶

The path to the A/E energy-correction fit validation plot for a detector.

Return type:: Path

legendsimflow.patterns.plot_currmod_filename(config, **kwargs)¶

The path to the current-pulse model fit validation plot for a detector and runid.

Return type:: Path

legendsimflow.patterns.plot_dtmap_filename(config, **kwargs)¶

The path to the drift-time map validation plot for a detector and voltage.

Return type:: Path

legendsimflow.patterns.plot_elecmod_filename(config, **kwargs)¶

The path to the electronics fit validation plot for a detector and runid.

Return type:: Path

legendsimflow.patterns.plot_geom_hpge_mass_filename(config, **kwargs)¶

The path to the simulated-vs-measured HPGe mass plot for a stp simid.

Return type:: Path

legendsimflow.patterns.plot_geom_rendering_filename(config, **kwargs)¶

The path to the geometry rendering for a stp simid.

Return type:: Path

legendsimflow.patterns.plot_realistic_psl_filename(config, **kwargs)¶

The path to the realistic HPGe pulse-shape library plots for a detector and run.

Return type:: Path

legendsimflow.patterns.plot_superpulses_filename(config, build_per_runid=False, **kwargs)¶

The path to the HPGe superpulses diagnostic plot file for a detector.

Return type:: Path

legendsimflow.patterns.plot_superpulses_uniformity_filename(config, build_per_runid=False, **kwargs)¶

The path to the response uniformity plot for a detector.

Return type:: Path

legendsimflow.patterns.plot_tier_cvt_observables_filename(config, **kwargs)¶

The path to the observable validation plot for a cvt simid.

Return type:: Path

legendsimflow.patterns.plot_tier_hit_observables_filename(config, **kwargs)¶

The path to the observable validation plot for a hit simid.

Return type:: Path

legendsimflow.patterns.plot_tier_opt_observables_filename(config, **kwargs)¶

The path to the observable validation plot for an opt simid.

Return type:: Path

legendsimflow.patterns.plot_tier_stp_vertices_filename(config, **kwargs)¶

The path to the primary vertex validation plot for a stp simid.

Return type:: Path

legendsimflow.patterns.plots_dirname(config, tier)¶

Returns the plots directory path for a tier.

Return type:: Path

legendsimflow.patterns.plots_tarball_filename(config)¶

The path to the plots archive tarball for the current production cycle.

The Simflow has no explicit knowledge of the production cycle name, so the name of the directory where the Simflow lives is used as a proxy.

Return type:: Path

legendsimflow.patterns.simjob_base_segment(config, **kwargs)¶

Formats a segment for a path including wildcards simid and jobid.

Return type:: str

legendsimflow.patterns.simstat_part_filename(config, **kwargs)¶

The path to the simulation event statistics partitioning file.

Return type:: Path

legendsimflow.patterns.tier_cvt_base_segment(config, **kwargs)¶

The base filename segment for cvt tier files for a simid.

Return type:: str

legendsimflow.patterns.tier_pdf_base_segment(config, **kwargs)¶

The base filename segment for pdf tier files for a simid.

Return type:: str

legendsimflow.patterns.vtx_filename_for_stp(config, simid, **kwargs)¶

Returns the vertices file required by the ‘stp’ tier job, if any.

Used as a lambda function in the build_tier_stp Snakemake rule.

Return type:: Path | list

legendsimflow.plot module¶

legendsimflow.plot.decorate(fig, rotate=False)¶

Add the legend-simflow watermark to fig.

By default it sits horizontally at the bottom-right corner; with rotate it is placed vertically (rotated 90°) along the right edge.

legendsimflow.plot.n_nans(array)¶

legendsimflow.plot.plot_hist(h, ax, n_nans=None, **kwargs)¶

legendsimflow.plot.read_concat_wempty(files, table)¶

Return type:: Array | None

legendsimflow.plot.save_page(pdf, make_fig)¶

legendsimflow.plot.set_empty(ax)¶

legendsimflow.profile module¶

legendsimflow.profile._f(x)¶

Return type:: str

legendsimflow.profile._pct(x)¶

Return type:: str

legendsimflow.profile.make_profiler()¶

Return type:: tuple[Callable, Callable, Callable]

legendsimflow.psl module¶

legendsimflow.psl._check_pulse_shape_lib_keys(pulse_shape_lib)¶

Validate that the waveform map contains the required keys with correct types.

Return type:: None

legendsimflow.psl.align_waveforms_to_peak(wf_input, alignment_idx, nsamples_output_wfs, *, peak_indices=None)¶

Align an array of waveforms by shifting their maximum to a fixed index.

No normalization is performed; raw amplitudes are preserved.

Note

The output peak_indices is not the original drift time, because the current waveform inherits a baseline from the convolution.

Parameters:

wf_input (Array | ndarray) – Input array of current waveforms
alignment_idx (int) – The index in the output array where the peak will be placed
nsamples_output_wfs (int) – The total length of the resulting aligned current waveforms
peak_indices (ndarray | None) – If not None use this as the indices for alignment.

Return type:

tuple[ndarray, ndarray]

Returns:

shifted_wfs – 2D array of shifted waveforms
peak_indices – 1D array containing the original peak index for each current waveform
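
The alignment can be sketched with NumPy as below. The function align_to_peak is a hypothetical re-implementation; in particular, filling the samples outside the shifted waveform with zeros is an assumption.

```python
import numpy as np


def align_to_peak(wfs, alignment_idx, nsamples_out, peak_indices=None):
    """Hypothetical sketch: shift each row so its peak lands at alignment_idx."""
    wfs = np.asarray(wfs, dtype=float)
    if peak_indices is None:
        peak_indices = np.argmax(wfs, axis=1)
    out = np.zeros((wfs.shape[0], nsamples_out))  # assumption: zero fill
    for i, (wf, peak) in enumerate(zip(wfs, peak_indices)):
        shift = alignment_idx - int(peak)
        # part of the input that still falls inside the output window
        src_lo = max(0, -shift)
        src_hi = min(wf.size, nsamples_out - shift)
        if src_hi > src_lo:
            out[i, src_lo + shift : src_hi + shift] = wf[src_lo:src_hi]
    return out, np.asarray(peak_indices)
```

Amplitudes are copied unchanged, consistent with the statement that no normalization is performed.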

legendsimflow.psl.apply_electronics_response(wf_array, rf_kernel, batch_size=50000)¶

Vectorized convolution using FFT with batching to save memory.

Parameters:

wf_array (Array | ndarray) – Array of waveforms (all of same length)
rf_kernel (ndarray) – The response kernel (gaussian + exponential)
batch_size (int) – Number of waveforms to process at once (default is 50,000)

Return type:

Returns:

convolved_wfs – The convolved waveforms as an Awkward Array
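
A batched FFT convolution can be sketched as follows. The function fft_convolve_batched is hypothetical: it returns the 'full' linear convolution as a NumPy array, whereas the output length and the Awkward Array return type of the real function are not reproduced here.

```python
import numpy as np


def fft_convolve_batched(wfs, kernel, batch_size=50_000):
    """Hypothetical sketch: full linear convolution of each row with kernel."""
    wfs = np.asarray(wfs, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n_out = wfs.shape[1] + kernel.size - 1  # length of a 'full' convolution
    kernel_fft = np.fft.rfft(kernel, n=n_out)  # computed once, reused per batch
    out = np.empty((wfs.shape[0], n_out))
    for start in range(0, wfs.shape[0], batch_size):
        batch = wfs[start : start + batch_size]
        # zero-padding to n_out avoids circular wrap-around
        spectrum = np.fft.rfft(batch, n=n_out, axis=1) * kernel_fft
        out[start : start + batch_size] = np.fft.irfft(spectrum, n=n_out, axis=1)
    return out
```

Processing in batches bounds the size of the intermediate spectra, which is the memory saving the description refers to.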

legendsimflow.psl.build_electronics_response_kernel(dt, mu_bandwidth, sigma_bandwidth, tau_rc, gaussian_only=False, *, kernel_length=600, kernel_start=-100)¶

Create the system response kernel (gaussian + exponential decay).

This is obtained by convolving a Gaussian (representing the digitizer bandwidth) with a causal exponential decay (representing the preamplifier response). The kernel is normalized to have a sum of 1.

Note

The ‘full’ mode of convolution results in a length of 2*kernel_length - 1. If gaussian_only is True, the kernel will have a length of kernel_length and will only contain the Gaussian component, since no convolution is performed.

Parameters:

dt (float) – The time step between samples in the waveform (in ns)
mu_bandwidth (float) – The mean of the Gaussian representing the digitizer bandwidth (in ns)
sigma_bandwidth (float) – The standard deviation of the Gaussian representing the digitizer bandwidth (in ns)
tau_rc (float) – The time constant of the exponential decay representing the preamplifier response (in ns)
gaussian_only (bool) – If True, only use the Gaussian component (default is False)
kernel_length (int) – The total length of the response kernel in samples (default is 600)
kernel_start (int) – The starting index of the kernel relative to the waveform (default is -100, meaning the kernel will cover from -100 to 500 samples)

Return type:

ndarray

Returns:

rf – The normalized response kernel
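
The construction described above can be sketched with NumPy. The function response_kernel is a hypothetical re-implementation: the time axis definition and the unnormalized Gaussian shape are assumptions, while the 'full' convolution, the output lengths, and the unit-sum normalization follow the description.

```python
import numpy as np


def response_kernel(dt, mu, sigma, tau_rc, gaussian_only=False,
                    kernel_length=600, kernel_start=-100):
    """Hypothetical sketch of a Gaussian convolved with a causal exponential."""
    # sample times in ns, starting kernel_start samples relative to t = 0
    t = (kernel_start + np.arange(kernel_length)) * dt
    gauss = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    if gaussian_only:
        return gauss / gauss.sum()
    # causal decay: zero before t = 0 (clip avoids overflow for negative t)
    decay = np.where(t >= 0, np.exp(-np.clip(t, 0, None) / tau_rc), 0.0)
    kernel = np.convolve(gauss, decay, mode="full")
    return kernel / kernel.sum()


k = response_kernel(dt=1.0, mu=0.0, sigma=10.0, tau_rc=100.0)
print(len(k))  # 1199, i.e. 2 * kernel_length - 1
```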

legendsimflow.psl.get_avg_aoe(waveforms)¶

Estimate the average A/E from the PSL.

Estimated as the mode of the distribution of the maximum amplitude of each waveform.

Parameters:

waveforms (list[ndarray]) – List of 3D array of waveforms with shape (n_r, n_z, n_samples)

Return type:

tuple[Hist, float]

Returns:

hist_aoe – Histogram of the maximum amplitude distribution.
avg_aoe – The average A/E value estimated from the waveforms

legendsimflow.psl.make_realistic_pulse_shape_lib(ideal_pulse_shape_lib_obj, rf_kernel, alignment_idx, nsamples_output_current_wfs, mw_pars, dt_data=1.0)¶

Apply the waveform post-processing chain to generate a realistic waveform map.

Starts from an ideal waveform map and performs the following steps:

Converts coordinates (m to mm)
Convolves with system response
Aligns by Peak Time
Calculates compensated Drift Time

Parameters:

ideal_pulse_shape_lib_obj (Mapping[str, Array | Scalar]) –
Mapping containing the ideal waveform map with coordinates and waveforms.

This should have the following format:
- r: 1D array of radial coordinates
- z: 1D array of axial coordinates
- dt: Time step between samples in the waveforms
- waveform_X: 3D array of ideal charge waveforms for angle X (shape: [n_z, n_r, n_samples])
rf_kernel (ndarray) – The system response kernel (from build_electronics_response_kernel())
alignment_idx (int) – The index in the output array where waveform peaks will be aligned
nsamples_output_current_wfs (int) – The total length of the resulting aligned current waveforms
mw_pars (dict[str, float | int]) –
Dictionary of parameters for the moving window average step, with keys:
- length: The length of the moving window (in samples)
- num_mw: The number of moving windows to use in the moving window average
- mw_type: The type of moving window to apply (see dspeed.processors.moving_window_multi for details)
dt_data (float) – The time step of the original data waveforms (in ns), used to scale the derivative.

Return type:

dict[str, Array | Scalar]

Returns:

realistic_pulse_shape_lib – Struct containing the processed realistic waveform map with the following keys:

r: 1D array of radial coordinates
z: 1D array of axial coordinates
t0: Global time offset applied to align waveforms
waveform_X: 3D array of processed current waveforms for angle X (shape: [n_r, n_z, nsamples_output_current_wfs], spatial axes reversed relative to Julia due to HDF5 column-/row-major conversion)
drift_time_X: 2D array of calculated drift times for angle X (shape: [n_r, n_z])

legendsimflow.psl.plot_aoe_rz_map(pulse_shape_lib, detector_id, *, hpge_profile=None)¶

Plot the A/E R/Z map(s) of a pulse-shape library.

For each azimuthal angle present in the library, computes the per-pixel A/E as the maximum of the (energy-normalized) current waveform over the time axis and renders it as a symmetrized R/Z heatmap, styled like the HPGe drift-time-map validation plot. When both the <100> (0 deg) and <110> (45 deg) angles are present, an additional <100>/<110> ratio panel is appended.

Parameters:

pulse_shape_lib (Mapping[str, Array | Scalar]) – Mapping containing the pulse-shape library, as returned by make_realistic_pulse_shape_lib(). Must contain r, z (in mm) and at least one waveform_<angle>_deg key.
detector_id (str) – Detector name, e.g. "V03422A", used in the panel titles.
hpge_profile (object | None) – Optional detector geometry object (as returned by pygeomhpges.make_hpge). When given, its profile is overlaid on every panel.

Return type:

Figure

Returns:

fig (Figure)

legendsimflow.psl.plot_rz_scan(pulse_shape_lib, angle_deg, detector_id, scan='r', step=1, xlim=None)¶

Plot an R or Z scan of waveforms from a pulse-shape library.

For a given azimuthal angle, produces one figure scanning over radial or axial positions at a fixed index on the other axis. Waveforms are color-coded by spatial coordinate.

Parameters:

pulse_shape_lib (Mapping[str, Array | Scalar]) – Mapping containing the pulse-shape library (ideal or realistic), as returned by make_realistic_pulse_shape_lib() or read from the ideal LH5 file. Must contain r, z, dt keys and at least one waveform_<angle>_deg key.
angle_deg (int) – Azimuthal angle in degrees, used to select the waveform_<angle>_deg key.
detector_id (str) – Detector name, e.g. "V03422A".
scan (str) – "r" to scan over radial positions at fixed Z, "z" to scan over axial positions at fixed R.
step (int) – Stride for subsampling spatial positions (default: plot every position).
xlim (tuple[float, float] | None) – x-axis limits in ns. If None, matplotlib auto-scales.

Return type:

tuple[Figure, Axes]

Returns:

fig (Figure)
ax (Axes)

legendsimflow.psl.process_ideal_waveforms(wfs, rf_kernel, dt, alignment_idx, nsamples_output, mw_pars, dt_data, return_mode='current')¶

Apply electronics response and DSP chain to ideal charge waveforms.

Convolve -> differentiate -> MWA -> align to peak.

Parameters:

wfs (ndarray) – Charge waveforms, shape (n_wfs, n_samples).
rf_kernel (ndarray) – System response kernel from build_electronics_response_kernel().
dt (float) – Time step of the ideal waveforms in ns.
alignment_idx (int) – Sample index where the current peak is placed after alignment.
nsamples_output (int) – Length of the output waveforms.
mw_pars (dict[str, int]) – MWA parameters: length, num_mw, mw_type.
dt_data (float) – Data sampling time step in ns.
return_mode (str) – Whether to extract the “current” or the “charge” waveform.

Return type:

tuple[ndarray, ndarray]

Returns:

aligned_currents – Aligned current waveforms, shape (n_wfs, nsamples_output).
current_peak_indices – Index of the current peak for each waveform before MWA and alignment, shape (n_wfs,).

legendsimflow.psl.symmetrize(a)¶

Mirror a half [n_r, n_z] R/Z map about r = 0 into a full image.

Return type:: ndarray
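
One way to mirror a half map about r = 0 is sketched below. The function mirror_about_r0 is hypothetical, and repeating the r = 0 row rather than dropping it is an assumption.

```python
import numpy as np


def mirror_about_r0(a):
    """Hypothetical sketch: stack the r-reversed map on top of the map."""
    a = np.asarray(a)
    # assumption: the r = 0 row appears twice in the output
    return np.concatenate([a[::-1], a], axis=0)
```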

legendsimflow.reboost module¶

legendsimflow.reboost._cluster_photoelectrons_flat(offsets, t, a, thr)¶

Numba-accelerated clustering kernel for innermost list level.

Parameters:

offsets (ndarray) – 1D int64 array of list offsets (length = n_lists + 1).
t (ndarray) – 1D array of sorted times for all elements.
a (ndarray) – 1D array of amplitudes corresponding to times.
thr (float) – Maximum time span within a cluster.

Return type:

tuple[ndarray, ndarray, ndarray]

Returns:

out_t – Clustered times (first time in each cluster).
out_a – Clustered amplitudes (sum of amplitudes in each cluster).
counts – Number of clusters per original list.

legendsimflow.reboost._listoffset_chain(layout)¶

Extract the chain of offsets from nested ListOffsetArrays.

Parameters:

layout (Content) – An awkward array layout.

Return type:

tuple[list[ndarray], NumpyArray]

Returns:

offsets_chain – List of np.int64 arrays, one per nested list depth, from outermost to innermost.
content_numpy_layout – The final NumpyArray content node.

legendsimflow.reboost.cluster_photoelectrons(times, amps, thr)¶

Cluster photoelectrons within the instrument time resolution.

Clusters hits at axis=-1 (innermost lists) such that within each cluster the time span (last_time - first_time) does not exceed thr. This is useful for combining photoelectrons that arrive within the time resolution of the detector, treating them as a single detected event.

The output time is the first time of each cluster; the amplitude is the sum of all amplitudes in the cluster.

Parameters:

times (Array) – Awkward array of hit times. Must be sorted in ascending order within each innermost list. Sorting is the caller’s responsibility; unsorted input produces undefined behavior.
amps (Array) – Awkward array of amplitudes corresponding to times. Must have the same structure (nesting depth and list lengths) as times.
thr (float) – Maximum time span within a cluster (e.g., the detector time resolution).

Return type:

tuple[Array, Array]

Returns:

clustered_times – Awkward array with the same nesting structure, containing the first time of each cluster.
clustered_amps – Awkward array with the same nesting structure, containing the summed amplitude of each cluster.

Raises:

ValueError – If times and amps have different nesting depths or different numbers of elements.

Examples

>>> import awkward as ak
>>> from legendsimflow.reboost import cluster_photoelectrons
>>> times = ak.Array([[0.0, 0.6, 1.1, 1.4, 2.3]])
>>> amps = ak.Array([[1.0, 2.0, 3.0, 4.0, 5.0]])
>>> t_out, a_out = cluster_photoelectrons(times, amps, thr=1.0)
>>> ak.to_list(t_out)
[[0.0, 1.1, 2.3]]
>>> ak.to_list(a_out)
[[3.0, 7.0, 5.0]]
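
For reference, the clustering rule described above can be written in pure Python for a single sorted list. This is only an illustrative sketch (the name cluster_sorted_sketch is hypothetical); the package itself uses a Numba kernel operating on awkward arrays.

```python
def cluster_sorted_sketch(times, amps, thr):
    """Cluster one ascending list of hit times (pure-Python reference)."""
    out_t, out_a = [], []
    for t, a in zip(times, amps):
        # Open a new cluster when the span from the cluster start exceeds thr.
        if not out_t or t - out_t[-1] > thr:
            out_t.append(t)
            out_a.append(a)
        else:
            # Otherwise add the amplitude to the current cluster.
            out_a[-1] += a
    return out_t, out_a


t_out, a_out = cluster_sorted_sketch(
    [0.0, 0.6, 1.1, 1.4, 2.3], [1.0, 2.0, 3.0, 4.0, 5.0], thr=1.0
)
```

With the same input as the example above, the sketch yields the same clusters: the hit at 1.1 opens a new cluster because its distance from 0.0 exceeds thr.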

legendsimflow.reboost.extract_detailed_psd_observables(chunk, edep_active, energy, dt_map, pulse_shape_lib, det_loc, *, aoe_res, aoe_mean, psdcuts, current_reso=None)¶

Extract PSD observables for a chunk of events in an HPGe detector.

This function calculates the A/E observable, its classifier, and the single-site flag for a chunk of events in an HPGe detector, using the provided drift-time maps and pulse-shape library.

Parameters:

chunk (Array) – Awkward array containing the events to process. Must have fields ‘xloc’, ‘yloc’, ‘zloc’.
edep_active (Array) – Energy deposited in the active volume per hit, used for A/E calculation.
energy (Array) – Energy deposited in the active volume, used for A/E calculation.
dt_map (dict[str, HPGeRZField]) – Dictionary of drift-time maps for different crystal axes, as returned by load_hpge_dtmaps().
pulse_shape_lib (HPGePulseShapeLibrary) – Dictionary of waveform templates for different crystal axes, as returned by load_hpge_realistic_psl().
det_loc (Position) – Position of the detector in the global coordinate system, used for drift time correction.
det_name – Name of the detector.
aoe_res (ArrayLike) – A/E resolution (sigma) used for A/E classifier calculation, typically determined from data.
aoe_mean (ArrayLike) – A/E mean used in the A/E classifier calculation, typically from fitting simulated data.
psdcuts (Mapping) – Dictionary containing the low and high side cuts for the A/E classifier to determine single-site events.
current_reso (float | None) – Standard deviation of the Gaussian noise to smear the maximum current, representing the current resolution of the detector.

Return type:

Array

Returns:

an ak.Array with fields –

aoe: A/E observable for each event.
aoe_class: A/E classifier (normalized to resolution) for each event.
is_single_site: boolean flag indicating whether the event is classified as single-site based on A/E cuts.
t_max: drift time at the position of maximum current, useful for further PSD or analysis.

legendsimflow.reboost.extract_psd_observables(chunk, edep_active, energy, dt_map, currmod_pars, det_loc, *, aoe_res, aoe_mean, psdcuts, current_reso)¶

Extract PSD observables for a chunk of events in an HPGe detector.

This function calculates the A/E observable, its classifier, and the single-site flag for a chunk of events in an HPGe detector, using the provided drift-time maps and current model parameters.

Parameters:

chunk (Array) – Awkward array containing the events to process. Must have fields ‘xloc’, ‘yloc’, ‘zloc’.
edep_active (Array) – Energy deposited in the active volume per hit, used for A/E calculation.
energy (Array) – Energy deposited in the active volume, used for A/E calculation.
dt_map (dict[str, HPGeRZField]) – Dictionary of drift-time maps for different crystal axes, as returned by load_hpge_dtmaps().
currmod_pars (Mapping) – Dictionary of parameters for the current model, (see reboost.hpge.psd.get_current_template())
det_loc (Position) – Position of the detector in the global coordinate system, used for drift time correction.
det_name – Name of the detector.
aoe_res (ArrayLike) – A/E resolution (sigma) used for A/E classifier calculation, typically determined from data.
aoe_mean (ArrayLike) – A/E mean used in the A/E classifier calculation, typically from fitting simulated data.
psdcuts (Mapping) – Dictionary containing the low and high side cuts for the A/E classifier to determine single-site events.
current_reso (float) – Standard deviation of the Gaussian noise to smear the maximum current, representing the current resolution of the detector.

Return type:

Array

Returns:

an ak.Array with fields –

aoe: A/E observable for each event.
aoe_class: A/E classifier (normalized to resolution) for each event.
is_single_site: boolean flag indicating whether the event is classified as single-site based on A/E cuts.
t_max: drift time at the position of maximum current, useful for further PSD or analysis.

legendsimflow.reboost.gauss_smear(arr_true, arr_reso)¶

Smear values with expected resolution.

Samples from a Gaussian distribution and replaces negative values with a fixed, tiny positive value.

Return type:: Array
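
The behaviour can be illustrated with NumPy arrays. The sketch below is an assumption-laden stand-in, not the package code: the real function works on awkward arrays, and the floor value, the rng argument and the name gauss_smear_sketch are hypothetical.

```python
import numpy as np


def gauss_smear_sketch(arr_true, arr_reso, rng=None, floor=np.finfo(float).tiny):
    """Sample N(arr_true, arr_reso) and replace negative samples with `floor`."""
    rng = np.random.default_rng() if rng is None else rng
    smeared = rng.normal(loc=arr_true, scale=arr_reso)
    # Negative values are replaced by a fixed, tiny positive number (assumed).
    return np.where(smeared < 0, floor, smeared)


energies = np.zeros(1000)
smeared = gauss_smear_sketch(
    energies, np.full(1000, 1.0), rng=np.random.default_rng(0)
)
```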

legendsimflow.reboost.get_remage_detector_uids(h5file, *, lh5_table='stp')¶

Get mapping of detector names to UIDs from a remage output file.

The remage LH5 output files contain a link structure that lets the user access detector tables by UID. For example:

├── stp · struct{det1,det2,optdet1,optdet2,scint1,scint2}
└── __by_uid__ · struct{det001,det002,det011,det012,det101,det102}
    ├── det001 -> /stp/scint1
    ├── det002 -> /stp/scint2
    ├── det011 -> /stp/det1
    ├── det012 -> /stp/det2
    ├── det101 -> /stp/optdet1
    └── det102 -> /stp/optdet2

This function analyzes this structure and returns:

{1: 'scint1',
 2: 'scint2',
 11: 'det1',
 12: 'det2',
 101: 'optdet1',
 102: 'optdet2'}
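
The translation from the link tree shown above to such a mapping can be sketched without touching a file. The helper below is hypothetical and assumes that link names have the form det<UID> and that the table name is the last component of the link target; how the links are actually read from the LH5 file is not shown.

```python
def uids_from_links_sketch(links: dict[str, str]) -> dict[int, str]:
    """Map integer UIDs to table names from {link_name: target_path} pairs."""
    uids = {}
    for link_name, target in links.items():
        uid = int(link_name.removeprefix("det"))  # "det011" -> 11
        uids[uid] = target.rsplit("/", 1)[-1]  # "/stp/det1" -> "det1"
    return uids


links = {
    "det001": "/stp/scint1",
    "det002": "/stp/scint2",
    "det011": "/stp/det1",
    "det012": "/stp/det2",
    "det101": "/stp/optdet1",
    "det102": "/stp/optdet2",
}
uids = uids_from_links_sketch(links)
```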

Parameters:

h5file (str | Path) – Path to remage output file.
lh5_table (str) – Name of the LH5 table group to inspect.

Return type:

legendsimflow.reboost.get_remage_hit_range(tcm, det_name, uid, evt_idx_range)¶

Extract the range of remage output rows for an event range.

Queries the remage TCM (stored below /tcm in stp_file) with the input evt_idx_range = [i, j] to extract the first and last index of rows (hits) in the det_name detector table that correspond to the input event range. Returns the start index and number of rows to read after it as a tuple.

Parameters:

tcm (Array) – Time-coincidence map.
det_name (str) – name of the detector table in stp_file.
uid (int) – remage unique identifier for detector det_name.
evt_idx_range (list[int]) – [first, last] (i.e. first included, last included) index of events of interest present in the remage output file. Only positive indices are supported.

Return type:

tuple[int]

legendsimflow.reboost.get_senstables(geom, det_type=None)¶

Return type:: list[str]

legendsimflow.reboost.hpge_corrected_drift_time(chunk, dt_map, det_loc)¶

HPGe drift time heuristic corrected for crystal axis effects.

Note

This function will be moved to reboost.

Return type:: Array

legendsimflow.reboost.hpge_max_current(edep, drift_time, currmod_pars, **kwargs)¶

Calculate the maximum of the current pulse.

Parameters:

edep (Array) – energy deposited at each step.
drift_time (Array) – drift time of each energy deposit.
currmod_pars (Mapping) – dictionary storing the parameters of the current model (see reboost.hpge.psd.get_current_template())
kwargs – forwarded to reboost.hpge.psd.maximum_current().

Return type:

legendsimflow.reboost.load_hpge_dtmaps(config, det_name, runid)¶

Load HPGe drift-time maps from disk.

Automatically finds and loads drift-time maps for the crystal axes <100> and <110>. If no map is found, None is returned.

Parameters:

config (AttrsDict) – Simflow configuration object.
det_name (str) – HPGe detector name.
runid (str) – Run identifier.

Return type:

dict[str, HPGeRZField] | None

Note

This function will be moved to reboost.

legendsimflow.reboost.load_hpge_realistic_psl(config, det_name, runid, waveform_angles=('000',))¶

Load HPGe PSL from disk.

Loads the waveforms as well as the drift-time maps for both crystal axes.

Parameters:

config (AttrsDict) – Simflow configuration object.
det_name (str) – HPGe detector name.
runid (str) – Run identifier.
waveform_angles (Sequence[str]) – Crystal-axis angles for which to load the (large) waveform pulse-shape libraries. Defaults to the 000-degree axis only, which is the single axis consumed by extract_detailed_psd_observables(). The drift-time maps are always loaded for both axes regardless.

Return type:

tuple[dict[str, HPGeRZField], dict[str, HPGePulseShapeLibrary]] | None

Returns:

A tuple of two dictionaries: the first contains the drift-time maps for different crystal axes, keyed by angle; the second contains the corresponding pulse-shape libraries.
If no valid maps are found, None is returned.

Note

This function will be moved to reboost.

legendsimflow.reboost.make_output_chunk(chunk)¶

Prepare output detector table chunk for the hit tier.

Note

This function will be moved to reboost.

Return type:: Table

legendsimflow.reboost.smear_photoelectrons(array, fwhm_in_pe, rng=None)¶

Smear photoelectron pulse amplitudes.

Returns an array of Gaussian-distributed single-photoelectron amplitudes with the same shape as the input array.

Return type:: Array

legendsimflow.reboost.write_chunk(chunk, objname, outfile, objuid)¶

Write detector table chunks for the hit tier to disk.

Note

This function will be moved to reboost.

Return type:: None

legendsimflow.spms_pars module¶

legendsimflow.spms_pars._next_rc_evt_file(evt_files, rc_file_state)¶

Return the next evt file, cycling through the list before repeating.

Parameters:

evt_files (Sequence[str | Path]) – Ordered sequence of evt file paths to cycle through.
rc_file_state (dict[str, Any]) – Mutable state dict shared across calls. On the first call it is populated with keys order (the file list), idx (current position, int), and completed_cycle (bool, set to True once the list has been exhausted once). Subsequent calls increment idx and wrap it when all files have been visited.

Return type:

str | Path

Returns:

str | Path – Path to the next evt file to process.
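
The state handling described for rc_file_state can be sketched as follows. This is an illustration built only from the parameter description above; the name next_file_sketch is hypothetical, and whether the real function reorders the files when it fills order is not stated here.

```python
def next_file_sketch(evt_files, state):
    """Return the next file, wrapping around once all files were visited."""
    if "order" not in state:
        # First call: populate the shared state.
        state.update(order=list(evt_files), idx=0, completed_cycle=False)
    else:
        state["idx"] += 1
        if state["idx"] >= len(state["order"]):
            # All files visited: wrap and remember that a cycle completed.
            state["idx"] = 0
            state["completed_cycle"] = True
    return state["order"][state["idx"]]


files = ["a.lh5", "b.lh5", "c.lh5"]
state = {}
picks = [next_file_sketch(files, state) for _ in range(4)]
```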

legendsimflow.spms_pars._process_spms_windows(time, energy, win_ranges, time_domain_ns, min_sep_ns)¶

Process SiPM data within specified window ranges.

Each (start, end) range in win_ranges is tiled with non-overlapping windows of length time_domain_ns[1] - time_domain_ns[0], separated by min_sep_ns. PE hits falling inside each window are selected and their times are shifted so that the window start maps to time_domain_ns[0].

The function works on arrays of any rank. For N-D input (e.g. shape (n_events, n_channels, n_pe)), each extracted window produces one output block of the same shape along all but the innermost axis, with only the PE dimension filtered. Blocks from all windows are then concatenated along axis=0, so M source events processed through W windows yield W * M output entries.

Parameters:

time (Array) – PE hit times. Any shape; the innermost axis is the PE axis.
energy (Array) – PE energies, same shape as time.
win_ranges (Sequence[tuple[float, float]]) – List of (start, end) tuples defining the time ranges to tile, in nanoseconds.
time_domain_ns (tuple[float, float]) – Target time range (start, end) for output times in nanoseconds. The window length is end - start. E.g. (-1000, 5000) selects 6000 ns windows and maps their start to -1000 ns.
min_sep_ns (float) – Minimum gap between consecutive windows in nanoseconds.

Return type:

tuple[Array, Array]

Returns:

npe – PE energies extracted from all windows, concatenated along axis=0.
t0 – PE times relative to each window’s start (bounded by time_domain_ns), same shape as npe.
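
The tiling and time shift can be sketched on plain Python lists. The two helpers below are hypothetical; they assume that min_sep_ns is the gap between the end of one window and the start of the next, that a window must fit entirely inside its range, and that hits are selected on the half-open interval [start, end).

```python
def tile_windows_sketch(win_ranges, time_domain_ns, min_sep_ns):
    """List the (start, end) windows tiled inside each range."""
    length = time_domain_ns[1] - time_domain_ns[0]
    windows = []
    for start, end in win_ranges:
        lo = start
        while lo + length <= end:
            windows.append((lo, lo + length))
            lo += length + min_sep_ns  # leave a gap before the next window
    return windows


def shift_hits_sketch(times, window, time_domain_ns):
    """Keep hits inside `window` and map the window start to time_domain_ns[0]."""
    lo, hi = window
    return [t - lo + time_domain_ns[0] for t in times if lo <= t < hi]


windows = tile_windows_sketch([(1000, 44000)], (-1000, 5000), 6000)
shifted = shift_hits_sketch([500, 1000, 2500, 7000], windows[0], (-1000, 5000))
```

Under these assumptions, a (1000, 44000) ns range with 6000 ns windows and a 6000 ns gap holds four windows, so each source event would contribute four output entries.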

legendsimflow.spms_pars.build_rc_evt_index_lookup(rc_evt_files)¶

Build per-file trigger index lookup for RC extraction.

Parameters:

rc_evt_files (Sequence[str | Path]) – Evt-tier files to index.

Return type:

dict[str, dict[str, ndarray]]

Returns:

dict – Dictionary keyed by file path (as string) with entries:

forced_pulser: row indices of forced/pulser, non-muon events
geds: row indices of HPGe-triggered, non-muon events

legendsimflow.spms_pars.get_chunk_rc_data(rc_evt_files, rc_file_state, chunk_size, rc_index_lookup)¶

Assemble random-coincidence data for one chunk.

Parameters:

rc_evt_files (Sequence[str | Path]) – Ordered sequence of evt files that can provide random-coincidence data. Must not be empty.
rc_file_state (dict[str, Any]) – Mutable state for file cycling and carryover between chunks. Expected keys are created/updated internally (e.g. order, idx, counts, carryover).
chunk_size (int) – Number of random-coincidence events requested for the current chunk. Must be positive.
rc_index_lookup (dict[str, dict[str, ndarray]]) – Precomputed mapping from evt file to trigger-event indices, built with build_rc_evt_index_lookup().

Return type:

Returns:

ak.Array – Random-coincidence data for one chunk with fields rawid (chunk_size, n_channels), npe (chunk_size, n_channels, n_pe) and t0 (same shape as npe).

legendsimflow.spms_pars.get_rc_evt_mask(evt_file)¶

Compute boolean event masks for random-coincidence extraction.

Parameters:

evt_file (str | Path) – Path to the evt-tier LH5 file.

Return type:

tuple[Array, Array]

Returns:

mask_forced_pulser – Boolean mask selecting forced-trigger and pulser events, excluding muon coincidences.
mask_geds – Boolean mask selecting HPGe-triggered events, excluding muon coincidences.

legendsimflow.spms_pars.get_rc_library(evt_file, rc_index_lookup, time_domain_ns=(-1000, 5000), min_sep_ns=6000, ext_trig_range_ns=((1000, 44000), (55000, 100000)), ge_trig_range_ns=((1000, 44000),))¶

Extract a library of random-coincidence (RC) events from an evt file.

To be used in correcting the SiPM photoelectrons with random coincidences.

For each qualifying trigger event, the SiPM waveform is divided into multiple non-overlapping time windows (see _process_spms_windows). Each window yields one independent RC event, so the total number of entries in the returned library is n_source_events x n_windows. The per-channel structure is preserved: npe and t0 have shape (n_rc_events, n_channels, n_pe) and rawid has shape (n_rc_events, n_channels), matching the spms/* layout of the evt tier.

Two trigger categories are processed with different window ranges to avoid contaminating RC events with physics signal:

Forced/pulser triggers: full waveform outside the central trigger window, ((1_000, 44_000), (55_000, 100_000)) ns by default.
HPGe/LAr triggers: first half only (before the trigger), ((1_000, 44_000),) ns by default.

Both categories are filtered to exclude muon coincidences.

Parameters:

evt_file (str | Path) – Event tier data file.
rc_index_lookup (dict[str, dict[str, ndarray]]) – Precomputed mapping from evt file to trigger-event indices, built with build_rc_evt_index_lookup.
time_domain_ns (tuple[float, float]) – Target time range (start, end) for output times in nanoseconds. E.g., (-1000, 5000) means output times will be in [-1000, 5000]. Default: (-1_000, 5_000).
min_sep_ns (float) – Minimal separation time between two windows in a trace, in nanoseconds. Default 6000.
ext_trig_range_ns (Sequence[tuple[float, float]]) – Window ranges for forced/pulser trigger events, as a sequence of (start, end) pairs in nanoseconds. Default: ((1_000, 44_000), (55_000, 100_000)).
ge_trig_range_ns (Sequence[tuple[float, float]]) – Window ranges for HPGe/LAr trigger events, as a sequence of (start, end) pairs in nanoseconds. Default: ((1_000, 44_000),).

Return type:

Array

Returns:

ak.Array – Record array with fields rawid (channel UIDs, shape (n_rc_events, n_channels)), npe (PE energies, shape (n_rc_events, n_channels, n_pe)), and t0 (times relative to each window start, same shape as npe). Channel ordering within each event matches the source spms/rawid ordering in evt_file.

legendsimflow.spms_pars.lookup_evt_files(l200data, runid, evt_tier_name)¶

Look up the evt tier file paths for a given run.

Parameters:

l200data (str | Path) – Root path to the LEGEND-200 data directory.
runid (str) – Run identifier string (e.g. "l200-p16-r008-phy").
evt_tier_name (str) – Name of the evt tier (e.g. "evt").

Return type:

list[Path]

Returns:

list[Path] – Matching evt-tier file paths for the given run.

legendsimflow.superpulses module¶

Module to implement the dataset preparation and average (“superpulse”) construction to characterize the pulse shape response of HPGe detectors.

This is an important step in tuning the pulse shape discrimination (PSD) simulations.

class legendsimflow.superpulses.Slice(energy_range, drift_time_range)¶

Bases: object

Defines a 2D slice in the energy-drift-time space.

Parameters:

energy_range (tuple[float, float]) – Lower and upper bounds of the energy slice, in keV. Example: (1500.0, 2000.0)
drift_time_range (tuple[float, float]) – Lower and upper bounds of the drift time slice, in ns. Example: (900.0, 1100.0)

property drift_time_center: float¶: Center of the drift time range, in ns.

drift_time_range: tuple[float, float]¶

property energy_center: float¶: Center of the energy range, in keV.

energy_range: tuple[float, float]¶

class legendsimflow.superpulses.Superpulse(charge_wf, current_wf, charge_time_axis, current_time_axis, slice, detector, n_events_preliminary, n_events_final)¶

Bases: object

Holds the average charge and current waveforms for one slice.

One Superpulse instance is produced per slice per detector, after the full preprocessing and chi2 self-similarity cut. It carries both waveforms together with the metadata needed to identify the slice, write to LH5, and perform the subsequent electronics parameter optimisation.

Examples

Build a superpulse and access its waveforms:

sp = Superpulse(
    charge_wf=avg_charge,
    current_wf=avg_current,
    charge_time_axis=t_charge,
    current_time_axis=t_current,
    slice=Slice((1500., 2000.), (900., 1100.)),
    detector="V03422A",
    n_events_preliminary=120,
    n_events_final=98,
)
sp.charge_wf            # np.ndarray
sp.current_wf           # np.ndarray
sp.slice.drift_time_center  # 1000.0 ns

Use as dict value (Slice is hashable):

superpulses: dict[Slice, Superpulse] = {}
superpulses[sp.slice] = sp

Construct the superpulse.

Parameters:

charge_wf (ndarray) – Average normalised charge waveform, shape (n_charge_samples,). Amplitude is dimensionless (ADC / cuspEmax, normalised to 1 at plateau).
current_wf (ndarray) – Average current waveform, shape (n_current_samples,). Units: (ADC / cuspEmax) / ns * dt_data, matching the convention in psl.py.
charge_time_axis (ndarray) – Time axis for the charge waveform in ns, shape (n_charge_samples,), aligned so that tp_aoe_max = 0.
current_time_axis (ndarray) – Time axis for the current waveform in ns, shape (n_current_samples,), aligned so that tp_aoe_max = 0.
slice (Slice) – The energy-drift-time slice this superpulse represents.
detector (str) – Detector name, e.g. "V03422A".
n_events_preliminary (int) – Number of waveforms used to build the preliminary superpulse (before the chi2 cut).
n_events_final (int) – Number of waveforms surviving the chi2 cut, used to build this superpulse.

to_lgdo()¶

Serialise to a Struct ready for writing to LH5.

Returns:

Struct – Struct with the following fields:

charge_wf : Array, shape (n_charge_samples,)
current_wf : Array, shape (n_current_samples,)
charge_time_axis : Array, shape (n_charge_samples,), attrs {"units": "ns"}
current_time_axis : Array, shape (n_current_samples,), attrs {"units": "ns"}
drift_time_center : Scalar, drift time center [ns]
drift_time_lo : Scalar, drift time lower bound [ns]
drift_time_hi : Scalar, drift time upper bound [ns]
energy_lo : Scalar, energy lower bound [keV]
energy_hi : Scalar, energy upper bound [keV]
detector : Scalar, detector name string
n_events_preliminary : Scalar
n_events_final : Scalar

legendsimflow.superpulses._get_dsp_config(dsp_config)¶

Return type:: dict

legendsimflow.superpulses._get_nested_field(data, field)¶

Return type:: Array

legendsimflow.superpulses._read_and_sel_evts(evt_files, detector, t0_field=None, aoe_low_threshold=-3.0, aoe_high_threshold=3.0)¶

Read evt data and perform basic selections.

Return type:: Array

legendsimflow.superpulses._select_data_in_slice(drift_time, energy, drift_slice)¶

Filter single-detector event data to one energy-drift-time slice.

Return type:: Array

legendsimflow.superpulses.compute_chi2(charge_wfs, superpulse, bl_std, cuspEmax)¶

Compute the reduced chi-squared of each charge waveform vs the superpulse.

The normalised noise is sigma_wf = bl_std / cuspEmax for each event, and the superpulse noise sigma_sp is the mean of these values.

Parameters:

charge_wfs (ndarray) – 2D array of shape (n_events, n_samples).
superpulse (Superpulse) – Preliminary superpulse.
bl_std (ndarray) – Per-event baseline standard deviation in ADC units, shape (n_events,).
cuspEmax (ndarray) – Per-event energy estimator in ADC units, shape (n_events,).

Return type:

ndarray

Returns:

1D array of shape (n_events,) with reduced chi-squared values.

Notes

The chi-squared is computed as:

chi2 = nansum((wf - superpulse.charge_wf)**2
              / (sigma_wf**2 + sigma_sp**2))
reduced_chi2 = chi2 / n_dof

where n_dof = n_samples.
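
The formula can be sketched in NumPy as follows. For simplicity the sketch takes the superpulse charge waveform as a plain array rather than a Superpulse instance, and it assumes that the per-event sigma_wf is applied uniformly to all samples of that event; reduced_chi2_sketch is a hypothetical name.

```python
import numpy as np


def reduced_chi2_sketch(charge_wfs, sp_charge_wf, bl_std, cusp_emax):
    """Reduced chi2 of each waveform with respect to the superpulse."""
    sigma_wf = bl_std / cusp_emax  # per-event normalised noise, (n_events,)
    sigma_sp = np.mean(sigma_wf)  # superpulse noise, scalar
    resid2 = (charge_wfs - sp_charge_wf) ** 2
    # Broadcast the per-event noise over the sample axis.
    chi2 = np.nansum(resid2 / (sigma_wf[:, None] ** 2 + sigma_sp**2), axis=1)
    return chi2 / charge_wfs.shape[1]  # n_dof = n_samples


wfs = np.array([[1.0, 1.0, 1.0, 1.0], [1.1, 1.1, 1.1, 1.1]])
chi2 = reduced_chi2_sketch(
    wfs, np.ones(4), bl_std=np.array([1.0, 1.0]), cusp_emax=np.array([10.0, 10.0])
)
```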

legendsimflow.superpulses.get_drift_time(data, end_time_field, t0_field)¶

Get the drift time.

Return type:: Array

legendsimflow.superpulses.get_wfs_for_slice(raw_files, lh5_group, hit_indices, file_indices, *, dsp_config, charge_output='wf_pz_win', current_output='curr_av', bl_output='bl_std_win', energy_output='cuspEmax', align='tp_aoe_max')¶

Extract aligned charge and current waveforms for a set of slice events.

Uses the DSP processing chain via WaveformBrowser. After alignment each event has the same sampling rate and number of samples but a different x-offset. This function collects all events, finds the overlapping time region, and trims every waveform to that common window.

Parameters:

raw_files (list[str]) – List of raw files.
hit_indices (list[int]) – The list of rows in each file to read.
file_indices (list[int]) – List of file indices to read.
lh5_group (str) – HDF5 group containing the waveform table, e.g. "ch1084803/raw".
dsp_config (str | dict | Path) – Path to the production DSP configuration JSON file.
charge_output (str) – DSP output name for the charge waveform.
current_output (str) – DSP output name for the current waveform.
bl_output (str) – DSP output name for the baseline standard deviation.
energy_output (str) – DSP output name for the energy.
align (str) – DSP parameter used to align waveforms on the time axis.

Return type:

AttrsDict | None

Returns:

A dictionary with the following fields (or None if no valid waveforms found) –

charge_times: Common time axis for charge waveforms, shape (n_common_charge,).
current_times: Common time axis for current waveforms, shape (n_common_current,).
charge_wfs: Trimmed charge waveforms, shape (n_valid, n_common_charge).
current_wfs: Trimmed current waveforms, shape (n_valid, n_common_current).
bl_std: Baseline standard deviation for each event, shape (n_valid,).
energy: Energy estimator for each event, shape (n_valid,).
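
The trimming to a common window can be sketched as below. This is not the package code: it assumes that the x-offsets differ by integer multiples of the sampling period dt, and the name trim_to_common_window_sketch and its arguments are hypothetical.

```python
import numpy as np


def trim_to_common_window_sketch(offsets, wfs, dt):
    """Trim waveforms with different x-offsets to their overlapping region.

    offsets: time of the first sample of each event, shape (n_events,).
    wfs: waveforms, shape (n_events, n_samples), all sampled with period dt.
    """
    n_samples = wfs.shape[1]
    start = np.max(offsets)  # the latest-starting event defines the window start
    # Number of leading samples to drop for each event.
    skip = np.rint((start - offsets) / dt).astype(int)
    n_common = n_samples - skip.max()
    trimmed = np.stack([wf[s : s + n_common] for wf, s in zip(wfs, skip)])
    times = start + dt * np.arange(n_common)
    return times, trimmed


offsets = np.array([0.0, 16.0, -16.0])
# Each sample stores its own time, which makes the alignment easy to check.
wfs = offsets[:, None] + 16.0 * np.arange(5)
times, trimmed = trim_to_common_window_sketch(offsets, wfs, dt=16.0)
```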

legendsimflow.superpulses.lookup_superpulse_inputs(l200data, metadata, runid, hpge, max_files=None, *, evt_tier_name='evt')¶

Look up all inputs needed to build superpulses for one detector and run.

Parameters:

l200data (str | Path) – Path to the L200 data production directory.
metadata (LegendMetadata) – The metadata instance.
runid (str) – LEGEND run identifier, e.g. "l200-p16-r008-ssc".
hpge (str) – Detector name, e.g. "V03422A".
max_files (int | None) – Limit the number of files per tier. Default: all.
evt_tier_name (str) – Name of the evt tier to look for, e.g. “evt” or “pet”. Default: “evt”.

Return type:

tuple[list[Path], list[Path], Path, dict[str, int], str]

Returns:

raw_files, evt_files – Sorted lists of LH5 file paths for the raw and evt tiers.
dsp_cfg_file – Path to the DSP configuration file.
tab_map – Mapping {detector_name: rawid}.
data_runid – Run whose data was looked up: the reference calibration run for a physics runid, otherwise runid itself.

legendsimflow.superpulses.lookup_wfs_indices(slices, *, evt_files, n_target, detector, t0_field='spms/event_t0', end_time_field='geds/psd/low_aoe/time')¶

Extract the indices of the waveforms to use in superpulse construction.

The accumulation stops when either every slice has more than n_target waveforms or the average is more than 3 times n_target.

Parameters:

slices (list[Slice]) – A list of slices to extract wf indices for.
evt_files (list[str]) – List of evt files.
n_target (int) – The maximum number of waveforms to select.
detector (str) – The detector to use.
t0_field (str | None) – Field for the start time, if None will be set to 0.
end_time_field (str) – Field for the end-time of the drift time calculation.

Return type:

list[AttrsDict]

Returns:

A tuple of two elements. The first is a list with the same length as slices, where each element is an AttrsDict with three fields:

file_idx: the indices, within the file list, of the files containing the selected waveforms.
hit_idx: the rows of the selected waveforms in those files.
n_sel: the number of selected waveforms.

The second is a list of all drift times (of the considered files).
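
The stopping rule for the accumulation can be written as a small predicate. This sketch assumes that the average is taken over the per-slice counts of selected waveforms; should_stop_sketch is a hypothetical name.

```python
def should_stop_sketch(n_selected, n_target):
    """Decide whether enough waveforms were accumulated.

    n_selected: number of selected waveforms for each slice.
    """
    every_slice_full = all(n > n_target for n in n_selected)
    average_high = sum(n_selected) / len(n_selected) > 3 * n_target
    return every_slice_full or average_high


stop_full = should_stop_sketch([11, 12], n_target=10)
stop_not_yet = should_stop_sketch([5, 20], n_target=10)
stop_average = should_stop_sketch([0, 100], n_target=10)
```

The second condition lets the loop end even when a sparsely populated slice never reaches n_target.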

legendsimflow.superpulses.plot_chi2_cut(chi2_values, chi2_threshold, times, wfs, final_superpulse, curve='charge')¶

Two-panel validation plot for the chi2 self-similarity cut.

Left: histogram of reduced chi2 with threshold line. Right: all waveforms color-coded by cut status (blue=pass, red=fail) with the final superpulse overlaid.

Parameters:

chi2_values (ndarray) – Reduced chi2 per waveform, shape (n_events,).
chi2_threshold (float) – Cut threshold.
times (ndarray) – Time axis, shape (n_samples,).
wfs (ndarray) – All waveforms before cut, shape (n_events, n_samples).
final_superpulse (Superpulse) – Superpulse built from golden waveforms (after cut).
curve (str) – "charge" or "current". Default "charge".

Return type:

tuple[Figure, tuple]

Returns:

fig (matplotlib.figure.Figure)
(ax_hist, ax_wfs) (tuple of Axes)

legendsimflow.superpulses.plot_current_superpulses_fwhm_and_amplitude(lh5_file, detector, dt_range_tuning=None)¶

Plot FWHM and peak amplitude of current superpulses vs drift time.

Two-row figure sharing the x-axis (drift time center). Color coding matches plot_superpulses().

Parameters:

lh5_file (str) – Path to the LH5 file produced by write_superpulses().
detector (str) – Detector name (top-level group in the file).
dt_range_tuning (tuple[float, float] | None) – Optional (dt_lo, dt_hi) tuple marking the drift time range used for tuning.

Return type:

Returns:

fig (matplotlib.figure.Figure)
(ax_fwhm, ax_amp) (tuple of matplotlib.axes.Axes)

Notes

dt_range_tuning shows the contiguous span of slices used for the fit; individual points within this range may not all have been used if max_num_superpulses further truncated them.

legendsimflow.superpulses.plot_superpulses(lh5_file, detector, curve='charge')¶

Plot superpulses from all drift-time slices, color-coded by drift time.

Parameters:

lh5_file (str) – Path to the LH5 file produced by write_superpulses().
detector (str) – Detector name (top-level group in the file).
curve (str) – "charge" or "current".

Return type:

Returns:

fig (matplotlib.figure.Figure)
ax (matplotlib.axes.Axes)

legendsimflow.superpulses.plot_wfs_and_superpulse(charge_times, current_times, golden_charge_wfs, golden_current_wfs, superpulse, *, xlims=(-1000, 3000))¶

Plot golden charge and current waveforms with the final superpulse.

Two-panel figure: top = charge waveforms + superpulse, bottom = current waveforms + superpulse. Each panel uses its own time axis, but both are clipped to the intersection of the two time ranges for consistent visualization.

Parameters:

charge_times (ndarray) – Time axes (may have different ranges).
current_times (ndarray) – Time axes (may have different ranges).
golden_charge_wfs (ndarray) – Selected waveform arrays after chi2 cut, shape (n_events, n_samples).
golden_current_wfs (ndarray) – Selected waveform arrays after chi2 cut, shape (n_events, n_samples).
superpulse (Superpulse) – The final superpulse to overlay.

Returns:

fig
(ax_charge, ax_current) axes for the two panels.

legendsimflow.superpulses.read_superpulses(path, detector)¶

Read superpulses written by write_superpulses().

Parameters:

path (str) – Path to the LH5 file.
detector (str) – Detector name (top-level group), e.g. "V03422A".

Return type:

dict[Slice, Superpulse]

Returns:

dict[Slice, Superpulse]

legendsimflow.superpulses.write_superpulses(superpulses, output_path, detector, *, wo_mode='write_safe')¶

Write all per-slice superpulses for one detector to a single LH5 file.

Iterates over superpulses, calls Superpulse.to_lgdo() for each, and writes the result as a named group inside the file.

The LH5 file structure is:

{detector}/
    dt_{lo}_{hi}_ns/
        charge_wf             [Array, n_charge_samples]
        current_wf            [Array, n_current_samples]
        charge_time_axis      [Array, n_charge_samples, units=ns]
        current_time_axis     [Array, n_current_samples, units=ns]
        drift_time_center     [Scalar, ns]
        drift_time_lo         [Scalar, ns]
        drift_time_hi         [Scalar, ns]
        energy_lo             [Scalar, keV]
        energy_hi             [Scalar, keV]
        detector              [Scalar, str]
        n_events_preliminary  [Scalar]
        n_events_final        [Scalar]

Parameters:

superpulses (dict[Slice, Superpulse]) – Dictionary mapping each slice to its final superpulse, as accumulated in the per-slice processing loop.
output_path (str) – Path to the output LH5 file, e.g. "output/V03422A_superpulses.lh5".
detector (str) – Detector name, used as the top-level group name in the LH5 file.
wo_mode (str) – Write mode for LH5 files. Default: “write_safe”.

Return type:

legendsimflow.tcm module¶

legendsimflow.tcm.build_tcm(hit_files, out_file)¶

Re-create the TCM table from remage.

Use remage fields evtid and t0 (the latter is assumed to be in nanoseconds) to build coincidences. The settings are identical to the remage built-in TCM settings.

Return type:: None

legendsimflow.tcm.merge_stp_n_opt_tcms(tcm_stp, tcm_opt, *, scintillator_uid)¶

Merge tcm_opt rows into tcm_stp at the scintillator uid.

For each axis=0 row of tcm_stp, if tcm_stp.table_key contains scintillator_uid, replace that single element by splicing in the next row of tcm_opt.table_key. The same splice is applied to row_in_table using the corresponding tcm_opt.row_in_table, preserving alignment between table_key[i][j] and row_in_table[i][j].

Parameters:

tcm_stp – Awkward record array with fields table_key and row_in_table.
tcm_opt – Awkward record array with fields table_key and row_in_table.
scintillator_uid – Scalar value in tcm_stp.table_key marking where to splice in tcm_opt, i.e. the UID of the scintillator table.

Returns:

ak.Array – Record array with the same length as tcm_stp.
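
The splice described above can be sketched with plain Python lists standing in for the Awkward arrays. This is only an illustration of the row-level logic, not the library's implementation; the function name merge_tcms_sketch is hypothetical, and the sketch assumes scintillator_uid occurs at most once per row.

```python
def merge_tcms_sketch(stp_keys, stp_rows, opt_keys, opt_rows, scintillator_uid):
    """Splice the next OPT row into each STP row at the scintillator uid."""
    out_keys, out_rows = [], []
    next_opt = 0  # index of the next unused OPT row
    for keys, rows in zip(stp_keys, stp_rows):
        if scintillator_uid in keys:
            opt_k, opt_r = opt_keys[next_opt], opt_rows[next_opt]
            next_opt += 1
        new_keys, new_rows = [], []
        for key, row in zip(keys, rows):
            if key == scintillator_uid:
                # Replace the placeholder element by the whole OPT row,
                # applying the same splice to both fields to keep alignment.
                new_keys.extend(opt_k)
                new_rows.extend(opt_r)
            else:
                new_keys.append(key)
                new_rows.append(row)
        out_keys.append(new_keys)
        out_rows.append(new_rows)
    return out_keys, out_rows


merged_keys, merged_rows = merge_tcms_sketch(
    stp_keys=[[1, 99, 2], [3], [99]],
    stp_rows=[[10, 0, 20], [30], [1]],
    opt_keys=[[501, 502], [503]],
    opt_rows=[[7, 8], [9]],
    scintillator_uid=99,
)
print(merged_keys)
print(merged_rows)
```

The output has one row per tcm_stp row, and table_key[i][j] stays aligned with row_in_table[i][j] because both fields receive the same splice.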

legendsimflow.tcm.merge_stp_n_opt_tcms_chunk(tcm_stp, tcm_opt, *, scintillator_uid)¶

Chunk-level implementation of merge_stp_n_opt_tcms().

This function assumes tcm_opt contains exactly as many rows as there are rows in tcm_stp that contain scintillator_uid, in the same order.

legendsimflow.tcm.merge_stp_n_opt_tcms_to_lh5(stp_file, opt_file, out_file, *, scintillator_uid, buffer_len='50*MB')¶

Stream-merge STP and OPT TCMs and write unified TCM to disk in chunks.

Iterates over stp_file:/tcm using LH5Iterator. For each chunk, reads only the required number of OPT TCM rows (those corresponding to STP rows containing the scintillator_uid placeholder) via lh5.read_as with explicit indices. The merged output is appended to out_file:/tcm.

Return type:: None
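
The bookkeeping behind the chunked read can be sketched as follows: for every STP chunk, count the rows that contain the placeholder and request exactly that many OPT rows, continuing from where the previous chunk stopped. Plain lists stand in for the LH5Iterator chunks and for the explicit indices passed to lh5.read_as; the function name opt_indices_per_chunk is hypothetical.

```python
def opt_indices_per_chunk(stp_key_chunks, scintillator_uid):
    """Yield, for each STP chunk, the OPT row indices that must be read."""
    offset = 0  # number of OPT rows consumed by previous chunks
    for chunk in stp_key_chunks:
        n_needed = sum(scintillator_uid in row for row in chunk)
        yield list(range(offset, offset + n_needed))
        offset += n_needed


# Two chunks of table_key rows; 99 marks the scintillator placeholder.
chunks = [
    [[1, 99], [2]],
    [[99], [99, 3], [4]],
]
indices = list(opt_indices_per_chunk(chunks, 99))
print(indices)
```

Keeping a running offset means only the OPT rows needed by the current chunk have to be held in memory.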

legendsimflow.utils module¶

legendsimflow.utils._curve_fit_popt_to_dict(popt)¶

Get the scipy.optimize.curve_fit() parameter results as a dictionary.

Return type:: dict

legendsimflow.utils._make_path(d)¶

legendsimflow.utils._merge_defaults(user, default)¶

Recursively merge default values into user configuration.

Merges values from default into user without overwriting existing user values. For nested dictionaries, performs recursive merge.

Parameters:

user (dict) – User configuration dictionary.
default (dict) – Default configuration dictionary.

Return type:

dict

Returns:

dict – Merged configuration dictionary with user values taking precedence.
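
A minimal sketch of such a recursive merge is shown below. It assumes that user is modified in place and also returned; the actual function may differ in that detail, and the name merge_defaults_sketch is hypothetical.

```python
def merge_defaults_sketch(user: dict, default: dict) -> dict:
    """Fill missing keys of `user` from `default`, recursing into dicts."""
    for key, value in default.items():
        if key not in user:
            # Key is missing in the user configuration: take the default.
            user[key] = value
        elif isinstance(user[key], dict) and isinstance(value, dict):
            # Both sides are dictionaries: merge them recursively.
            merge_defaults_sketch(user[key], value)
        # Otherwise the existing user value is kept untouched.
    return user


user = {"paths": {"pars": "/data/pars"}, "threads": 4}
default = {
    "paths": {"pars": "/default/pars", "geom": "/default/geom"},
    "threads": 1,
    "verbose": False,
}
merged = merge_defaults_sketch(user, default)
print(merged)
```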

legendsimflow.utils.add_field_string(name, chunk, data)¶

Add a string to the output table.

This is done in an HDF5-friendly way by storing the string as a fixed-length string.

Return type:: None

legendsimflow.utils.apply_path_defaults(paths)¶

Set default values for optional path keys derived from paths.pars.

The following keys are optional in the Simflow configuration and, if absent, are derived from paths.pars:

geom: defaults to {paths.pars}/geom
dtmaps: defaults to {paths.pars}/hpge/dtmaps

Parameters:: paths (dict) – The paths section of the Simflow configuration, with all values already converted to pathlib.Path objects.
Return type:: None
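
The defaulting rule can be sketched in a few lines. The function name apply_path_defaults_sketch and the example paths are hypothetical; the sketch assumes, as the None return type suggests, that the dictionary is updated in place.

```python
from pathlib import Path


def apply_path_defaults_sketch(paths: dict) -> None:
    """Fill in geom and dtmaps from paths['pars'] when they are absent."""
    pars = paths["pars"]
    paths.setdefault("geom", pars / "geom")
    paths.setdefault("dtmaps", pars / "hpge" / "dtmaps")


paths = {"pars": Path("/prod/inputs/pars"), "geom": Path("/custom/geom")}
apply_path_defaults_sketch(paths)
print(paths["geom"])
print(paths["dtmaps"])
```

Here geom keeps the user-provided value, while dtmaps is derived from pars.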

legendsimflow.utils.check_nans_leq(array, name, less_than_frac=0.1, min_entries=100)¶

Raise an exception if the fraction of NaN values in array is above the less_than_frac threshold.

Parameters:

array (ArrayLike) – the array to analyze.
name (str) – array name for exception message.
less_than_frac (float) – raise exception if fraction of NaNs is above this threshold.
min_entries (int) – minimum number of entries required to apply the fraction check. With fewer entries, a warning is logged instead of raising an exception.

Return type:
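
The check can be sketched with the standard library only. Several details are assumptions of this example rather than documented behavior: the exception type (ValueError), the handling of an empty array, and the name check_nans_leq_sketch.

```python
import logging
import math

log = logging.getLogger(__name__)


def check_nans_leq_sketch(array, name, less_than_frac=0.1, min_entries=100):
    """Raise if too many entries of `array` are NaN."""
    values = list(array)
    if not values:
        return  # assumption: nothing to check on an empty array
    frac = sum(math.isnan(v) for v in values) / len(values)
    if frac <= less_than_frac:
        return
    msg = f"{name}: NaN fraction {frac:.2f} is above {less_than_frac}"
    if len(values) < min_entries:
        # Too few entries for a meaningful fraction: only warn.
        log.warning(msg)
        return
    raise ValueError(msg)


nan = float("nan")
check_nans_leq_sketch([1.0] * 95 + [nan] * 5, "energy")  # 5% NaN: passes
check_nans_leq_sketch([nan, 1.0], "short")  # 2 entries: warning only
try:
    check_nans_leq_sketch([1.0] * 50 + [nan] * 50, "drift_time")
    raised = False
except ValueError:
    raised = True
print(raised)
```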

legendsimflow.utils.get_dict_value(d, field, default=None)¶

Return a value from a nested dictionary using a dot-separated field path.

Parameters:

d (dict) – Dictionary to query.
field (str) – Dot-separated path (e.g. "a.b.c").
default (Any | None) – Value returned if the field is not found. Defaults to None.

Return type:

Any
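
A dot-separated lookup of this kind can be sketched as follows. The name get_dict_value_sketch is hypothetical, and returning default when an intermediate value is not a dictionary is an assumption of this example.

```python
def get_dict_value_sketch(d: dict, field: str, default=None):
    """Walk a nested dict along a dot-separated path."""
    current = d
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


config = {"a": {"b": {"c": 42}}}
found = get_dict_value_sketch(config, "a.b.c")
missing = get_dict_value_sketch(config, "a.x.c", default="n/a")
print(found, missing)
```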

legendsimflow.utils.get_evt_tier_name(l200data)¶

Extract the name of the evt tier for this production cycle.

If the pet tier is present, it is used; otherwise the evt tier is used.

Parameters:: l200data (str) – Path to the production cycle of l200 data.
Return type:: str

legendsimflow.utils.get_hit_tier_name(l200data)¶

Extract the name of the hit tier for this production cycle.

If the pht tier is present, it is used; otherwise the hit tier is used.

Parameters:: l200data (str) – Path to the production cycle of l200 data.
Return type:: str

legendsimflow.utils.hash_dict(d)¶

Compute the hash of a Python dict.

Return type:: str
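
The hashing scheme is not documented here. One common way to obtain a stable string hash of a dictionary, shown purely as an assumption, is to serialize it to JSON with sorted keys and hash the result. The name hash_dict_sketch and the choice of SHA-256 are illustrative only.

```python
import hashlib
import json


def hash_dict_sketch(d: dict) -> str:
    """Hash a dict independently of key insertion order."""
    # Sorting keys gives a canonical text form; default=str is a fallback
    # for values that JSON cannot serialize natively (e.g. paths).
    canonical = json.dumps(d, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = hash_dict_sketch({"x": 1, "y": {"z": [1, 2]}})
b = hash_dict_sketch({"y": {"z": [1, 2]}, "x": 1})
print(a == b)
```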

legendsimflow.utils.init_generated_pars_db(l200data, tier=None, lazy=True)¶

Initializes the pars database from a LEGEND-200 data production.

Parameters:

l200data (str | Path) – path to LEGEND-200 data production cycle.
tier (str | None) – pars subfolder referring to a tier. If None, return the full par database.
lazy (bool) – see TextDB.

Return type:

TextDB

legendsimflow.utils.init_simflow_context(raw_config, workflow=None, logger=None)¶

Pre-process and sanitize the Simflow configuration.

Performs the steps listed below and returns a dictionary of useful objects for the Simflow Snakefiles (i.e. the “context”):

set default configuration fields;
substitute $_ and environment variables;
convert to AttrsDict;
cast filesystem paths to pathlib.Path;
clone and configure legend-metadata;
attach a LegendMetadata instance to the Simflow configuration;
export important environment variables.

Parameters:

raw_config (dict | AttrsDict | str | Path) – Simflow configuration mapping or path to a configuration file.
workflow – Snakemake workflow instance. If None, occurrences of $_ in the configuration will be replaced with the path to the current working directory.
logger (Logger | None) – Logger to use for status messages (e.g. the Snakemake logger when called from a Snakefile). Defaults to the module logger.

Return type:
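
The substitution step can be sketched as a recursive walk over the configuration. This is a simplified illustration, not the actual implementation: the name substitute_sketch, the environment variable SIMFLOW_SCRATCH and the example paths are hypothetical. The $_ token is replaced before environment variables are expanded, because os.path.expandvars would otherwise treat $_ as a reference to a variable named "_".

```python
import os


def substitute_sketch(value, base_dir: str):
    """Replace $_ and environment variables in all strings of a config."""
    if isinstance(value, dict):
        return {k: substitute_sketch(v, base_dir) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_sketch(v, base_dir) for v in value]
    if isinstance(value, str):
        return os.path.expandvars(value.replace("$_", base_dir))
    return value


os.environ["SIMFLOW_SCRATCH"] = "/scratch"  # hypothetical variable
raw_config = {
    "paths": {"generated": "$_/generated", "tmp": "$SIMFLOW_SCRATCH/tmp"},
    "threads": 4,
}
config = substitute_sketch(raw_config, "/prod/v1")
print(config)
```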

legendsimflow.utils.link_external_paths(config, workflow_basedir, *, logger=None)¶

Symlink user-overridden paths back into their default locations.

When the user has manually overridden a paths.<key> entry in simflow-config.yaml to point outside the current production cycle (e.g. reusing the hit tier from another production), this function creates a symlink at the canonical default location pointing to the override. Snakemake rules keep reading config.paths.<key> directly; the symlink only exists so the prod cycle’s own generated/ tree shows the external data in the standard layout.

The default locations are computed from the simflow’s own template at <workflow_basedir>/../templates/default.yaml, with $_ substituted to the current working directory (the prod cycle root in the standard Snakemake invocation). Created symlinks are relative to the destination’s parent, keeping the prod cycle portable.

For each supported key:

if config.paths.<key> resolves to the default location, no override is in effect; any stale symlink at that location is removed;
otherwise a symlink is created (or refreshed) at the default location pointing to config.paths.<key>.

Real directories at the default location are never touched. The call is a safe no-op when <workflow_basedir>/../templates/default.yaml does not exist.

Supported keys (relative to paths): every tier.<name>, pars, macros, geom and dtmaps. Default paths for geom and dtmaps fall back to <pars>/geom and <pars>/hpge/dtmaps when absent from the template (mirroring apply_path_defaults()).

Parameters:

config (AttrsDict) – Simflow configuration as returned by init_simflow_context().
workflow_basedir (str | Path) – Snakemake workflow basedir (workflow.basedir in a Snakefile). Used only to locate the simflow’s default template.
logger (Logger | None) – Logger to use for status messages (e.g. the Snakemake logger when called from a Snakefile). Defaults to the module logger.

Return type:
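
The per-key rules can be sketched for a single path as below. This is an illustration under simplifying assumptions, not the library's code: link_override_sketch is a hypothetical name, paths are compared lexically after os.path.abspath, and the template lookup is left out.

```python
import os
import tempfile
from pathlib import Path


def link_override_sketch(default: Path, actual: Path) -> None:
    """Create, refresh or remove the symlink at the default location."""
    default = Path(os.path.abspath(default))
    actual = Path(os.path.abspath(actual))
    if actual == default:
        # No override in effect: drop a stale symlink, if any.
        if default.is_symlink():
            default.unlink()
        return
    if default.exists() and not default.is_symlink():
        return  # real data at the default location is never touched
    if default.is_symlink():
        default.unlink()  # refresh an existing (possibly broken) link
    default.parent.mkdir(parents=True, exist_ok=True)
    # A target relative to the link's parent keeps the tree relocatable.
    default.symlink_to(os.path.relpath(actual, start=default.parent))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    external = root / "other-prod" / "generated" / "tier" / "hit"
    external.mkdir(parents=True)
    default = root / "this-prod" / "generated" / "tier" / "hit"
    link_override_sketch(default, external)
    target = os.readlink(default)
    same_place = default.resolve() == external.resolve()
    # Calling again with no override removes the now stale symlink.
    link_override_sketch(default, default)
    removed = not default.is_symlink()

print(target, same_place, removed)
```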

legendsimflow.utils.lookup_dataflow_config(l200data)¶

Finds and loads the dataflow configuration file.

Parameters:

l200data (Path | str) – The path to the L200 data production cycle.

Return type:

Returns:

the dataflow configuration file as a dictionary with substitutions performed.

legendsimflow.utils.lookup_dsp_config(l200data)¶

Find the ICPC DSP processing-chain configuration file in an l200data production.

Parameters:: l200data (str | Path) – The path to the L200 data production cycle.
Return type:: Path
Returns:: path to the single matching DSP configuration file.

legendsimflow.utils.sanitize_dict_with_defaults(read_dict, defaults)¶

Swap-in defaults when values are illegal.

Return type:: dict

legendsimflow.utils.setup_logdir_link(config)¶

Set up the timestamp-tagged directory for the workflow log files.

Parameters:

config (AttrsDict) – Simflow configuration object.
proctime – Processing time identifier for the log directory.

Return type: