legend-simflow

End-to-end Snakemake workflow to run Monte Carlo simulations of signal and background signatures in the LEGEND experiment and produce probability-density functions (pdfs). Configuration metadata (e.g. rules for generating simulation macros or post-processing settings) is stored in legend-simflow-config.

Key concepts

The terms “(Snakemake) workflow”, “legend-simflow” and “Simflow” are used interchangeably throughout the documentation.
The Simflow is currently tailored around the remage simulation ecosystem and uses the LEGEND HDF5 (LH5) file format for inputs and outputs.
Simulations are labeled by a unique identifier (e.g. hpge-bulk-2vbb), often referred to as simid (simID). The identifiers are defined in legend-simflow-config through simconfig.yaml files in tier directories stp and vtx.
A simulation (simid) can consist of several jobs (simulation jobs with different random seeds run in parallel). Each job is assigned its own jobid (integer number).
Each simulation is defined by a template macro (also stored as metadata) and by a set of rules (in simconfig.yaml) needed to generate the actual macros (template variable substitutions, number of primaries, number of jobs, etc.).
The production is organized in tiers. The state of a simulation in a certain tier is labeled as <tier>.<simid>. Snakemake understands this syntax.
The production can be restricted to a subset of simulations by passing a list of identifiers to Snakemake.
The generated pdfs refer to a user-defined selection of LEGEND data-taking runs. This list of runs is specified through the configuration file.
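The relation between a simid, its jobs and the template macro can be sketched as follows. This is a minimal illustration, not the Simflow implementation: the template text, the variable names (`simid`, `jobid`, `n_primaries`) and the helper `render_job_macros` are all hypothetical, and the assumption that the number of primaries is given per job is ours. Only `/run/beamOn` is a real Geant4 macro command.

```python
from string import Template

# Hypothetical template macro; real templates live in legend-simflow-config.
# Lines starting with "#" are comments in Geant4 macros.
TEMPLATE = Template(
    "# simid: $simid, job: $jobid\n"
    "/run/beamOn $n_primaries\n"
)


def render_job_macros(simid, n_jobs, primaries_per_job):
    """Return one rendered macro per job, keyed by integer jobid.

    Assumption: every job simulates the same number of primaries.
    """
    return {
        jobid: TEMPLATE.substitute(
            simid=simid, jobid=jobid, n_primaries=primaries_per_job
        )
        for jobid in range(n_jobs)
    }


# Three parallel jobs for the simid used as an example in the text above
macros = render_job_macros("hpge-bulk-2vbb", n_jobs=3, primaries_per_job=1000)
```

Each entry of `macros` is a complete macro text for one job; in a real production the jobs would additionally differ by their random seed.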

Workflow overview

---
title: DAG
---
flowchart TB
	id0[plot_tier_stp_vertices]
	id1[build_tier_stp]
	id2[build_geom_gdml]
	id3[gen_geom_config]
	id4[gen_remage_macro]
	id5[plot_geom]
	id6[build_tier_vtx]
	id7[cache_detector_usabilities]
	id8[cache_modelable_hpges]
	id9[make_simstat_partition_file]
	id10[extract_hpge_observables_models]
	id11[plot_hpge_drift_time_maps]
	id12[build_hpge_drift_time_map]
	id13[_init_julia_env]
	id14[build_tier_opt]
	id15[plot_tier_opt_observables]
	id16[build_tier_hit]
	id17[plot_tier_hit_observables]
	id18[build_tier_evt]
	id19[build_tier_cvt]
	id20[plot_tier_cvt_observables]
	id21[build_tier_pdf]
	id22[merge_hpge_drift_time_maps]
	id23[merge_current_pulse_model_pars]
	id24[extract_current_pulse_model]
	id25[aggregate_hpge_ssd_modeling_info]
	id26[all]
	style id0 fill:#577AD9,stroke-width:2px,color:#333333
	style id1 fill:#B5D957,stroke-width:2px,color:#333333
	style id2 fill:#D99257,stroke-width:2px,color:#333333
	style id3 fill:#57D9A9,stroke-width:2px,color:#333333
	style id4 fill:#57D9B5,stroke-width:2px,color:#333333
	style id5 fill:#57B5D9,stroke-width:2px,color:#333333
	style id6 fill:#A9D957,stroke-width:2px,color:#333333
	style id7 fill:#9ED957,stroke-width:2px,color:#333333
	style id8 fill:#92D957,stroke-width:2px,color:#333333
	style id9 fill:#57D9C1,stroke-width:2px,color:#333333
	style id10 fill:#63D957,stroke-width:2px,color:#333333
	style id11 fill:#57A9D9,stroke-width:2px,color:#333333
	style id12 fill:#D99E57,stroke-width:2px,color:#333333
	style id13 fill:#D95757,stroke-width:2px,color:#333333
	style id14 fill:#CDD957,stroke-width:2px,color:#333333
	style id15 fill:#5786D9,stroke-width:2px,color:#333333
	style id16 fill:#D9D957,stroke-width:2px,color:#333333
	style id17 fill:#5792D9,stroke-width:2px,color:#333333
	style id18 fill:#D9CD57,stroke-width:2px,color:#333333
	style id19 fill:#D9C157,stroke-width:2px,color:#333333
	style id20 fill:#579ED9,stroke-width:2px,color:#333333
	style id21 fill:#C1D957,stroke-width:2px,color:#333333
	style id22 fill:#57CDD9,stroke-width:2px,color:#333333
	style id23 fill:#57D9CD,stroke-width:2px,color:#333333
	style id24 fill:#7AD957,stroke-width:2px,color:#333333
	style id25 fill:#D96357,stroke-width:2px,color:#333333
	style id26 fill:#D96E57,stroke-width:2px,color:#333333
	id1 --> id0
	id2 --> id1
	id4 --> id1
	id6 --> id1
	id3 --> id2
	id2 --> id4
	id3 --> id5
	id2 --> id6
	id1 --> id9
	id12 --> id11
	id13 --> id12
	id9 --> id14
	id2 --> id14
	id7 --> id14
	id1 --> id14
	id14 --> id15
	id9 --> id16
	id2 --> id16
	id22 --> id16
	id10 --> id16
	id1 --> id16
	id7 --> id16
	id23 --> id16
	id16 --> id17
	id9 --> id18
	id14 --> id18
	id1 --> id18
	id7 --> id18
	id16 --> id18
	id18 --> id19
	id19 --> id20
	id19 --> id21
	id12 --> id22
	id24 --> id23
	id12 --> id25
	id19 --> id26
	id9 --> id26
	id0 --> id26
	id5 --> id26
	id11 --> id26
	id10 --> id26
	id14 --> id26
	id15 --> id26
	id20 --> id26
	id25 --> id26
	id1 --> id26
	id21 --> id26
	id7 --> id26
	id17 --> id26
	id8 --> id26
	id16 --> id26
	id18 --> id26

The graph of rules defined in the Simflow as generated by Snakemake (snakemake --rulegraph).
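To read the rule graph programmatically, its edges can be copied into a dependency mapping and ordered with the standard library. The sketch below covers only a subset of the edges shown above (plotting, caching and pulse-shape modeling rules are left out), and the helper `upstream_rules` is a hypothetical name introduced for illustration.

```python
from graphlib import TopologicalSorter

# rule -> rules it depends on (subset of the edges in the rule graph)
DEPENDS_ON = {
    "build_geom_gdml": {"gen_geom_config"},
    "gen_remage_macro": {"build_geom_gdml"},
    "build_tier_vtx": {"build_geom_gdml"},
    "build_tier_stp": {"build_geom_gdml", "gen_remage_macro", "build_tier_vtx"},
    "make_simstat_partition_file": {"build_tier_stp"},
    "build_tier_opt": {"build_tier_stp", "make_simstat_partition_file"},
    "build_tier_hit": {"build_tier_stp", "make_simstat_partition_file"},
    "build_tier_evt": {
        "build_tier_stp",
        "make_simstat_partition_file",
        "build_tier_opt",
        "build_tier_hit",
    },
    "build_tier_cvt": {"build_tier_evt"},
    "build_tier_pdf": {"build_tier_cvt"},
}


def upstream_rules(target):
    """Collect every rule that must run before `target` (transitively)."""
    seen = set()
    stack = [target]
    while stack:
        rule = stack.pop()
        for dep in DEPENDS_ON.get(rule, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# One valid execution order for the selected rules
order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Snakemake resolves this ordering itself; the snippet is only meant as an aid for reading the graph, for example to see which rules are upstream of `build_tier_stp`.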

Geometry building: pygeoml200 is used to build, for each simid, a global and shared GDML geometry instance for the workflow.
Tier vtx building: run simulations that generate the Monte Carlo event vertices needed by some simulations in the next tier. A classic example is vertices for decays on the surface of HPGe detectors, which are generated by revertex. Simulations that do not need special event vertices start directly from tier stp.
Tier stp building: run full event simulations with remage. Simulation macro commands are generated according to rules defined in the metadata.
The LEGEND-200 data are queried to extract several runtime quantities of interest (HPGe energy resolution, electronic noise, hardware status, etc.). When PSL-based pulse-shape simulation is enabled, calibration data are also used to build per-detector data superpulses and fit an electronics-response model, which are then combined with simulation-derived drift-time maps and ideal pulse-shape libraries to produce the realistic pulse-shape library (PSL) for each run. These outputs are collectively referred to as the “pars”.
Run the first (hit-oriented) step of simulation post-processing. Here, a “hit” represents a collection of Geant4 “steps” in a single detector (sensitive volume). In these tiers, hit-wise operations such as applying optical maps or HPGe detector models are typically performed. The “run partitioning” is also performed at this stage (see below). Two tiers belong to this group:
- Tier opt: convolution of the optical models (i.e. optical “maps”)
- Tier hit: convolution of HPGe detector models (energy and pulse shape)
In both tiers, a dedicated time-coincidence map (or TCM, see pygama.evt.build_tcm.build_tcm()) is built to ease event reconstruction.
Tier evt building: a unified TCM including all hit-oriented tiers is created and used to organize the data into an event-oriented structure, i.e. a table where every row corresponds to an event.
Tier cvt building: the “concatenated event” tier performs a simple concatenation of tables from all evt files for a simid into a single file.
Validation plots are produced at several stages of the Simflow.
Tier pdf building: summarize cvt-tier output (the concatenated evt data) into histograms (the pdfs).
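The change from a hit-oriented to an event-oriented structure described above can be sketched in a few lines. This is a deliberately simplified picture and not how remage or pygama implement it: steps are grouped only by event and detector (no time information), and the detector names, energies and the helpers `build_hits` and `build_events` are made up for illustration.

```python
from collections import defaultdict

# (event id, detector, energy deposit in keV); all values are made up
steps = [
    (0, "det_a", 100.0),
    (0, "det_a", 50.0),
    (0, "det_b", 25.0),
    (1, "det_b", 300.0),
]


def build_hits(steps):
    """Sum the energy of all steps of one event in one detector (a hit)."""
    hits = defaultdict(float)
    for evtid, detector, edep in steps:
        hits[(evtid, detector)] += edep
    return dict(hits)


def build_events(hits):
    """Arrange hits into one row per event, as in an event-oriented table."""
    events = defaultdict(dict)
    for (evtid, detector), energy in hits.items():
        events[evtid][detector] = energy
    return [
        {"evtid": evtid, "energy": energies}
        for evtid, energies in sorted(events.items())
    ]


hits = build_hits(steps)
events = build_events(hits)
```

Here `hits` has one entry per (event, detector) pair, while `events` has one row per event that collects the hits of all detectors.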

Run partitioning

“Run partitioning” refers to incorporating information about the experiment’s data-taking runs for which the user wants to build pdfs:

Partition the simulated event statistics into fractions corresponding to the fraction of the total livetime spanned by each selected run. This information is extracted from legend-metadata/datasets/runinfo.yaml.
For each partition, apply HPGe models such as energy resolution or pulse shape.
…apply optical models (detection probability lookup tables) for the scintillators.
…apply detector status flags (available in legend-metadata/datasets/statuses).
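The first step above, splitting the simulated statistics in proportion to the livetime of each run, can be illustrated with a short sketch. The run labels and livetimes below are invented, the helper `partition_by_livetime` is hypothetical, and assigning contiguous index ranges to runs is an assumption made for simplicity rather than a description of the Simflow internals.

```python
from itertools import accumulate


def partition_by_livetime(n_events, livetimes):
    """Split `n_events` into contiguous [start, stop) ranges, one per run,
    with sizes proportional to each run's livetime."""
    cumulative = list(accumulate(livetimes.values()))
    total = cumulative[-1]
    # Rounding cumulative boundaries guarantees that the ranges tile
    # [0, n_events) without gaps or overlaps.
    bounds = [0] + [round(n_events * c / total) for c in cumulative]
    return {
        run: (bounds[i], bounds[i + 1]) for i, run in enumerate(livetimes)
    }


# Made-up livetimes (arbitrary units) for three hypothetical runs
livetimes = {"run_a": 10.0, "run_b": 30.0, "run_c": 60.0}
partitions = partition_by_livetime(1000, livetimes)
```

Run-dependent models (energy resolution, optical maps, status flags) would then be applied separately to the events falling in each range.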

Next steps

User manual

Related projects

remage

Development