Snakemake Rules Reference

workflow/rules/aux.smk module

print_stats

Prints a table with summary runtime information for each simid.

Can be run with snakemake print_stats. The listed tiers are taken from the Simflow config field make_steps.

Note

The statistics refer to the total job wall time, as measured by Snakemake.

No wildcards are used.

print_benchmark_stats

Prints a table with summary runtime information of a benchmarking run.

Can be run with snakemake print_benchmark_stats. This is useful for tuning the number of remage primaries and jobs in the Simflow configuration. After printing the table, the rule also writes an updated generated/benchmarks/generated-simconfig.yaml with suggested primaries_per_job and number_of_jobs values, which can optionally be swapped in place of the source simconfig.yaml.

Note

The runtime and the simulation speed are extracted from the event simulation loop statistics reported by remage. These values do not account for other remage steps like initialization or post-processing.

No wildcards are used.

_init_julia_env

No description provided.

cache_detector_usabilities

Cache detector usabilities.

Querying the metadata for detector usability can be slow and become the bottleneck in post-processing (opt and hit tiers). This rule caches the mapping run -> detector -> {usability, psd_usability} on disk.
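The caching idea can be sketched as follows. This is a minimal illustration, not the rule's actual implementation: the JSON cache format, the query_usability stand-in, and the placeholder usability values are all assumptions made for the example.

```python
import json
from pathlib import Path


def query_usability(run, detector):
    """Hypothetical stand-in for the slow metadata lookup (placeholder values)."""
    return {"usability": "on", "psd_usability": "valid"}


def load_or_build_cache(cache_file, runs, detectors, query=query_usability):
    """Return the run -> detector -> usability mapping, reusing the on-disk copy if present."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Fast path: the mapping was already computed and stored
        return json.loads(cache_file.read_text())

    # Slow path: query every (run, detector) pair once, then persist the result
    mapping = {run: {det: query(run, det) for det in detectors} for run in runs}
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(mapping, indent=2))
    return mapping
```

Downstream steps would then read the cached mapping instead of repeating the metadata query for every chunk.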

archive_plots

Archive all validation plots into a single tarball.

Must be triggered manually with snakemake archive_plots, since it is not part of the default all target. Collects all plots/ subdirectories produced by the Simflow under the generated/ directory and packs them into tarballs/<cycle>-plots.tar.xz, preserving the directory tree structure.
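A rough equivalent of the archiving step, written with the Python standard library, might look like the sketch below. The function name and the choice of storing paths relative to the generated/ directory are assumptions; the actual rule may build the tarball differently.

```python
import tarfile
from pathlib import Path


def archive_plots(generated_dir, tarball):
    """Pack every plots/ directory found under generated_dir into an xz tarball."""
    generated_dir = Path(generated_dir)
    tarball = Path(tarball)
    tarball.parent.mkdir(parents=True, exist_ok=True)

    with tarfile.open(tarball, "w:xz") as tar:
        for plots_dir in sorted(generated_dir.rglob("plots")):
            if plots_dir.is_dir():
                # Store paths relative to generated_dir so the tree structure is kept
                tar.add(plots_dir, arcname=plots_dir.relative_to(generated_dir).as_posix())
    return tarball
```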

No wildcards are used.

workflow/rules/cvt.smk module

gen_all_tier_cvt

Aggregate and produce all the cvt tier files.

build_tier_cvt

Produce a cvt tier file.

cvt stands for “concatenated evt tier”. evt files for each simulation job are concatenated/aggregated into a single file.

Uses wildcard simid.

plot_tier_cvt_observables

Produce validation plots of observable distributions from the cvt tier.

Generates diagnostic plots from all cvt output files for the given simid.

Uses wildcard simid.

workflow/rules/evt.smk module

gen_all_tier_evt

Aggregate and produce all the evt tier files.

build_tier_evt

Produce an evt tier file.

Event files reorganize the hit and opt tier data into a single, event-oriented table where each row corresponds to an event.

  • a unified TCM is built from the opt and hit data. It differs from the stp tier TCM in that it also includes the SiPM channels;

  • each chunk of the unified TCM is partitioned according to the livetime span of each run (see the make_simstat_partition_file rule);

  • fields from lower tiers are restructured into events;

  • new event-level fields are computed and stored in the output file;

  • optionally, random-coincidence (RC) SiPM data from real evt files is added as spms/rc_energy and spms/rc_time (controlled by add_random_coincidences in tier/evt/{experiment}/settings.yaml).

A top-level detector_uids struct mapping detector names to reboost UIDs (union of hit and opt tiers) is also written, to enable downstream per-group filtering in the pdf tier.

Uses wildcards simid and jobid.

workflow/rules/hit.smk module

gen_all_tier_hit

Aggregate and produce all the hit tier files.

build_tier_hit

Produce a hit tier file starting from a single stp tier file.

This rule implements the post-processing of the stp tier HPGe data in chunks, in the following steps:

  • each chunk is partitioned according to the livetime span of each run (see the make_simstat_partition_file rule). For each partition:

  • the detector usability and PSD usability are retrieved from legend-metadata and stored in the output;

  • the active volume model is applied based on information from legend-metadata;

  • A/E is simulated based on current signal templates extracted from LEGEND-200 data;

  • energy is smeared according to the measured energy resolution (extracted from the data production parameters database);

  • a new time-coincidence map (TCM) across the processed detectors is created and stored in the output file.

The stp data format is preserved: detector tables are stored separately in the output file below /hit/{detector_name}.
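The energy smearing step above can be illustrated with a short sketch. The resolution model sigma(E) = sqrt(a + b * E) and its parameter values are hypothetical placeholders, not the parameters extracted from the data production database.

```python
import math
import random


def sigma_kev(energy_kev, a=0.3, b=5e-4):
    """Hypothetical resolution model: sigma(E) = sqrt(a + b * E), in keV."""
    return math.sqrt(a + b * energy_kev)


def smear_energies(energies_kev, resolution=sigma_kev, rng=None):
    """Draw each smeared energy from a Gaussian centred on the true energy."""
    if rng is None:
        rng = random.Random()
    # The width of the Gaussian depends on the energy being smeared
    return [rng.gauss(e, resolution(e)) for e in energies_kev]
```

Passing a seeded random.Random instance as rng makes the smearing reproducible, which is convenient when comparing outputs between runs.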

Uses wildcards simid and jobid.

plot_tier_hit_observables

Produce validation plots of observable distributions from the hit tier.

Generates diagnostic plots from all hit output files for the given simid.

Uses wildcard simid.

workflow/rules/opt.smk module

gen_all_tier_opt

Aggregate and produce all the opt tier files.

build_tier_opt

Produce an opt tier file starting from a single stp tier file.

This rule implements the post-processing of the stp tier liquid argon energy depositions in chunks, in the following steps:

  • each chunk is partitioned according to the livetime span of each run (see the make_simstat_partition_file rule). For each partition:

  • the detector usability is retrieved from legend-metadata and stored in the output;

  • scintillation photons are generated corresponding to simulated energy depositions;

  • detected photoelectrons are sampled according to the input optical map;

  • a finite resolution is applied to each photoelectron amplitude (see script);

  • photoelectrons are clustered in time to simulate the effect of finite time resolution of the system;

  • a new time-coincidence map (TCM) across the processed SiPMs is created and stored in the output file.

This rule can sample photoelectrons in each SiPM individually or for all SiPMs at the same time; see the relevant param flag.

The stp data format is preserved: SiPM tables are stored separately in the output file below /hit/{sipm_name}.
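The time clustering of photoelectrons listed above could be sketched as follows. The clustering criterion (distance from the first photoelectron of the open cluster) and the 10 ns default window are assumptions chosen for illustration; the actual script may use a different algorithm.

```python
def cluster_photoelectrons(times_ns, amplitudes, window_ns=10.0):
    """Merge photoelectrons that arrive within window_ns of the start of a cluster.

    Returns a list of (time, amplitude) pairs: the time is that of the first
    photoelectron in the cluster, the amplitude is the summed amplitude.
    """
    clusters = []
    for t, a in sorted(zip(times_ns, amplitudes)):
        if clusters and t - clusters[-1][0] < window_ns:
            # Within the window: add the amplitude to the open cluster
            clusters[-1][1] += a
        else:
            # Outside the window (or first photoelectron): open a new cluster
            clusters.append([t, a])
    return [(t, a) for t, a in clusters]
```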

Uses wildcards simid and jobid.

plot_tier_opt_observables

Produce validation plots of observable distributions from the opt tier.

Generates diagnostic plots from all opt output files for the given simid.

Uses wildcard simid.

workflow/rules/par.smk module

Rules to compute the simulation parameters (par step).

gen_all_tier_par

Produce all par step outputs.

make_simstat_partition_file

Create the simulation event statistics partitioning file.

This rule maps chunks of event indices to partitions associated with the data-taking runs specified in the “runlist” (e.g. from config.runlist) and stores them on disk as YAML files. The format is the following:

job_000:
  l200-p03-r001-phy: [0, 300]
  l200-p03-r002-phy: [301, 456]
job_001:
  l200-p03-r002-phy: [0, 200]
  l200-p03-r003-phy: [201, 156]
job_002:
  l200-p03-r003-phy: [0, 50]

The events simulated in job 0 (456) are split between r001 and r002. The partition corresponding to r002 is however incomplete, and 200 events are taken from the simulation job 1.

The fraction of total simulated events (summed over all simulation jobs) that belong to a partition is determined by weighting with the fraction of livetime that belongs to that run.
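The livetime weighting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: index ranges are treated as inclusive, run quotas are obtained by rounding cumulative boundaries, and the function name and input layout are hypothetical rather than taken from Simflow.

```python
def partition_events(events_per_job, livetimes):
    """Split the events of each job among runs, proportionally to run livetime.

    events_per_job: list with the number of events simulated in each job.
    livetimes: dict mapping run name -> livetime (any unit), in time order.
    Returns {job_name: {run_name: [first_index, last_index]}} (inclusive indices).
    """
    total_events = sum(events_per_job)
    total_livetime = sum(livetimes.values())

    # Rounding the cumulative boundary (not each share) guarantees that
    # the run quotas add up exactly to total_events.
    quotas, cumulative, previous = {}, 0.0, 0
    for run, livetime in livetimes.items():
        cumulative += livetime
        boundary = round(total_events * cumulative / total_livetime)
        quotas[run] = boundary - previous
        previous = boundary

    partitions = {}
    runs = iter(quotas.items())
    run, remaining = next(runs)
    for job_index, n_events in enumerate(events_per_job):
        job, start = {}, 0
        while start < n_events:
            while remaining == 0:
                # Current run is full: move on to the next one
                run, remaining = next(runs)
            take = min(remaining, n_events - start)
            job[run] = [start, start + take - 1]
            start += take
            remaining -= take
        partitions[f"job_{job_index:03d}"] = job
    return partitions
```

For two jobs of 100 events each and two runs with livetimes in a 1:3 ratio, this sketch assigns events 0 to 49 of job 0 to the first run and all remaining events to the second run.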

Uses wildcard simid.

build_hpge_drift_time_map

Produce an HPGe drift time map.

Run a Julia script based on a pulse shape simulation performed with the SolidStateDetectors.jl package, using crystal geometry information from legend-metadata.

Uses wildcards hpge_detector and hpge_voltage.

merge_hpge_drift_time_maps

Merge HPGe drift time maps into a single file.

Copy the top-level LH5 objects from each individual detector drift time map file into a single merged file using h5copy.

Uses wildcard runid.

plot_hpge_drift_time_maps

Produce a validation plot of an HPGe drift time map.

Generates diagnostic plots of the computed drift time map for a single detector at the specified operational voltage.

Uses wildcards hpge_detector and hpge_voltage.

extract_current_pulse_model

Extract the HPGe current signal model.

Fit the current signals recorded in LEGEND-200 and store the best-fit model parameters in a YAML file.

Warning

This rule does not declare the relevant LEGEND-200 data files as input: they are discovered dynamically, and listing them would slow down the DAG generation. Remember to force-rerun the rule if the input data is updated!

Uses wildcards runid and hpge_detector.

merge_current_pulse_model_pars

Merge the HPGe current signal model parameters into a single file per runid.

Collect the individual best-fit parameter files (one per detector) and write them into a single YAML file keyed by detector name.

Uses wildcard runid.

extract_hpge_observables_models

Extract and store on disk models of the HPGe observables for a run.

Stores YAML files with a mapping between HPGe detectors and respective information to reconstruct:

  • the energy resolution as a function of energy;

  • the A/E resolution as a function of energy;

as determined during energy calibration. This is done in a separate rule because the data production parameter database is large and we don’t want to use a lot of memory in the build_tier_hit rule.

Design: this rule is a collection step, not a validation step. It gathers what it can from l200data and simprod/config/pars/geds/eresmod/; the output may be incomplete. Completeness is validated downstream in build_tier_hit.

Uses wildcard runid.

workflow/rules/pdf.smk module

gen_all_tier_pdf

Aggregate and produce all the pdf tier files.

No wildcards are used.

archive_pdfs

Archive all pdf tier files into a single tarball.

Must be triggered manually with snakemake archive_pdfs, since it is not part of the default all target. Collects all LH5 files produced under tier/pdf/ and packs them into tarballs/<cycle>-pdfs.tar.xz, preserving the directory tree structure.

No wildcards are used.

build_tier_pdf

Produce a pdf tier file.

Reads cvt tier data and bins it into histograms (the PDFs) according to the PDF configuration file.
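As a minimal illustration of the binning step, the sketch below fills a fixed-width histogram from a list of values. The function name and the half-open range convention are assumptions; the real binning is driven by the PDF configuration file and may differ.

```python
def fill_histogram(values, n_bins, low, high):
    """Count values into n_bins equal-width bins on the half-open range [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v < high:
            # Guard against floating-point round-off placing v in bin n_bins
            counts[min(int((v - low) / width), n_bins - 1)] += 1
    return counts
```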

Uses wildcard simid.

workflow/rules/stp.smk module

Rules to build the stp tier.

gen_all_tier_stp

Build the entire stp tier.

gen_geom_config

Write a geometry configuration file for legend-pygeom-l200.

Start from the template/default geometry configuration file and add any extra configuration options requested in simconfig.yaml through the geom_config_extra field.
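One plausible way to combine a template configuration with extra options is a recursive dictionary merge, sketched below. Whether the rule merges nested options recursively or simply overrides top-level keys is an assumption here, and the function name is hypothetical.

```python
import copy


def merge_config(template, extra):
    """Return a copy of template with the entries of extra merged in recursively."""
    merged = copy.deepcopy(template)
    for key, value in (extra or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the extra option overrides (or adds to) the template
            merged[key] = copy.deepcopy(value)
    return merged
```

The template is left untouched, so the same default configuration can be reused for several simids.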

Uses wildcards tier and simid.

build_geom_gdml

Build a concrete geometry GDML file with pygeoml200.

Run legend-pygeom-l200 to convert the geometry configuration file into a GDML file.

Uses wildcards tier and simid.

gen_remage_macro

Write the remage macro file for a stp tier simulation to disk.

Renders the macro template for the given simid using legendsimflow.commands.make_remage_macro() and writes it to the canonical macro path under generated/macros/.

Uses wildcard simid.

build_tier_stp

Run a single simulation job for the stp tier.

Invoke remage using the macro generated by legendsimflow.commands.make_remage_macro() from simconfig.yaml.

Uses wildcards simid and jobid.

Note

The output remage file is declared as protected to avoid accidental deletions, since it typically takes a lot of resources to produce it.

plot_tier_stp_vertices

Produce plots of the primary event vertices of tier stp.

Only the first file of the simulation (i.e. job ID 0) is used. The rule is given a high priority to make sure that the plot is produced early. The maximum number of plotted events is set in the plotting script.

Uses wildcard simid.

workflow/rules/vtx.smk module

build_tier_vtx

Run a single simulation job for the vtx tier.

Run the user-defined vertex generation command from vtx_simconfig.yaml, templating it with the geometry file path, output file path, and number of events.
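The templating step could look like the sketch below. The placeholder names {geom_file}, {output_file} and {n_events} are hypothetical and are not necessarily the ones expected in vtx_simconfig.yaml.

```python
import shlex


def render_command(template, geom_file, output_file, n_events):
    """Fill a command template and split it into an argument list."""
    command = template.format(
        # Quote paths so that spaces survive the shell-style split below
        geom_file=shlex.quote(str(geom_file)),
        output_file=shlex.quote(str(output_file)),
        n_events=int(n_events),
    )
    return shlex.split(command)
```

The resulting list can be passed directly to subprocess.run without invoking a shell.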

Uses wildcards simid and jobid.