legend-simflow
End-to-end Snakemake workflow to run Monte Carlo simulations of signal and background signatures in the LEGEND experiment and produce probability-density functions (pdfs). Configuration metadata (e.g. rules for generating simulation macros or post-processing settings) is stored at legend-simflow-config.
Key concepts
“(Snakemake) workflow”, “legend-simflow”, “Simflow” are used interchangeably throughout the documentation to refer to the same concept.
The Simflow is currently tailored around the remage simulation ecosystem and uses the LEGEND HDF5 (LH5) file format for inputs and outputs.
Simulations are labeled by a unique identifier (e.g. hpge-bulk-2vbb), often referred to as simid (simID). The identifiers are defined in legend-simflow-config through simconfig.yaml files in the tier directories stp and vtx.
A simulation (simid) can consist of several jobs (simulation jobs with different random seeds run in parallel). Each job is assigned its own jobid (integer number).
Each simulation is defined by a template macro (also stored as metadata) and by a set of rules (in simconfig.yaml) needed to generate the actual macros (template variable substitutions, number of primaries, number of jobs, etc.).
The production is organized in tiers. The state of a simulation in a certain tier is labeled as <tier>.<simid>. Snakemake understands this syntax.
The production can be restricted to a subset of simulations by passing a list of identifiers to Snakemake.
The generated pdfs refer to a user-defined selection of LEGEND data-taking runs. This list of runs is specified in the configuration file.
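The labeling scheme described above can be illustrated with a short Python sketch. It is not part of the Simflow code base: the helper names (`parse_target`, `job_labels`), the set of accepted tiers, and the job label format are assumptions made only for illustration.

```python
from typing import NamedTuple


class TierTarget(NamedTuple):
    tier: str
    simid: str


# Tier names mentioned in this documentation. Using them for validation
# is an assumption of this sketch, not Simflow behavior.
KNOWN_TIERS = {"vtx", "stp", "opt", "hit", "evt", "cvt", "pdf"}


def parse_target(label: str) -> TierTarget:
    """Split a '<tier>.<simid>' label into its two parts."""
    tier, sep, simid = label.partition(".")
    if not sep or not simid:
        raise ValueError(f"expected '<tier>.<simid>', got {label!r}")
    if tier not in KNOWN_TIERS:
        raise ValueError(f"unknown tier {tier!r}")
    return TierTarget(tier, simid)


def job_labels(simid: str, n_jobs: int) -> list[str]:
    """Hypothetical naming: one label per job, with a zero-padded jobid."""
    return [f"{simid}-job_{jobid:04d}" for jobid in range(n_jobs)]


target = parse_target("stp.hpge-bulk-2vbb")
labels = job_labels(target.simid, 3)
```

Here `target` holds the tier and the simid separately, and `labels` contains one entry per job of that simulation.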
Workflow overview
---
title: DAG
---
flowchart TB
id0[all]
id1[plot_tier_stp_vertices]
id2[build_tier_stp]
id3[build_geom_gdml]
id4[gen_geom_config]
id5[build_tier_vtx]
id6[build_tier_opt]
id7[make_simstat_partition_file]
id8[cache_detector_usabilities]
id9[plot_tier_opt_observables]
id10[build_tier_hit]
id11[merge_hpge_drift_time_maps]
id12[build_hpge_drift_time_map]
id13[_init_julia_env]
id14[merge_current_pulse_model_pars]
id15[extract_current_pulse_model]
id16[extract_hpge_observables_models]
id17[plot_tier_hit_observables]
id18[plot_hpge_drift_time_maps]
id19[build_tier_evt]
id20[build_tier_cvt]
id21[plot_tier_cvt_observables]
style id0 fill:#D96957,stroke-width:2px,color:#333333
style id1 fill:#578DD9,stroke-width:2px,color:#333333
style id2 fill:#CBD957,stroke-width:2px,color:#333333
style id3 fill:#D97B57,stroke-width:2px,color:#333333
style id4 fill:#57D995,stroke-width:2px,color:#333333
style id5 fill:#B9D957,stroke-width:2px,color:#333333
style id6 fill:#D9D457,stroke-width:2px,color:#333333
style id7 fill:#57D9A7,stroke-width:2px,color:#333333
style id8 fill:#A7D957,stroke-width:2px,color:#333333
style id9 fill:#579ED9,stroke-width:2px,color:#333333
style id10 fill:#D9C257,stroke-width:2px,color:#333333
style id11 fill:#57D9CB,stroke-width:2px,color:#333333
style id12 fill:#D98D57,stroke-width:2px,color:#333333
style id13 fill:#D95757,stroke-width:2px,color:#333333
style id14 fill:#57D9B9,stroke-width:2px,color:#333333
style id15 fill:#95D957,stroke-width:2px,color:#333333
style id16 fill:#84D957,stroke-width:2px,color:#333333
style id17 fill:#57B0D9,stroke-width:2px,color:#333333
style id18 fill:#57D4D9,stroke-width:2px,color:#333333
style id19 fill:#D9B057,stroke-width:2px,color:#333333
style id20 fill:#D99E57,stroke-width:2px,color:#333333
style id21 fill:#57C2D9,stroke-width:2px,color:#333333
id19 --> id0
id10 --> id0
id21 --> id0
id15 --> id0
id6 --> id0
id2 --> id0
id17 --> id0
id9 --> id0
id18 --> id0
id20 --> id0
id1 --> id0
id2 --> id1
id5 --> id2
id3 --> id2
id4 --> id3
id3 --> id5
id8 --> id6
id2 --> id6
id7 --> id6
id3 --> id6
id2 --> id7
id6 --> id9
id11 --> id10
id2 --> id10
id14 --> id10
id16 --> id10
id3 --> id10
id7 --> id10
id8 --> id10
id12 --> id11
id13 --> id12
id15 --> id14
id10 --> id17
id12 --> id18
id10 --> id19
id6 --> id19
id19 --> id20
id20 --> id21
The graph of rules defined in the Simflow as generated by Snakemake (snakemake --rulegraph).
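To see how the rule graph constrains execution order, the sketch below encodes the tier-building edges from the diagram (plot rules and the final "all" rule are left out) and computes one valid order with Python's standard `graphlib` module (Python 3.9 or later). This only illustrates the dependency structure; Snakemake performs its own scheduling, and the order printed here is just one of several valid ones.

```python
from graphlib import TopologicalSorter

# rule -> set of rules whose output it needs (edges copied from the graph above)
deps = {
    "build_geom_gdml": {"gen_geom_config"},
    "build_tier_vtx": {"build_geom_gdml"},
    "build_tier_stp": {"build_tier_vtx", "build_geom_gdml"},
    "make_simstat_partition_file": {"build_tier_stp"},
    "build_tier_opt": {
        "cache_detector_usabilities",
        "build_tier_stp",
        "make_simstat_partition_file",
        "build_geom_gdml",
    },
    "build_hpge_drift_time_map": {"_init_julia_env"},
    "merge_hpge_drift_time_maps": {"build_hpge_drift_time_map"},
    "merge_current_pulse_model_pars": {"extract_current_pulse_model"},
    "build_tier_hit": {
        "merge_hpge_drift_time_maps",
        "build_tier_stp",
        "merge_current_pulse_model_pars",
        "extract_hpge_observables_models",
        "build_geom_gdml",
        "make_simstat_partition_file",
        "cache_detector_usabilities",
    },
    "build_tier_evt": {"build_tier_hit", "build_tier_opt"},
    "build_tier_cvt": {"build_tier_evt"},
}

# static_order() yields every rule after all of its dependencies
order = list(TopologicalSorter(deps).static_order())
for rule in order:
    print(rule)
```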
Geometry building: pygeoml200 is used to build, for each simid, a global and shared GDML geometry instance for the workflow.
Tier vtx building: run simulations that generate Monte Carlo event vertices needed by some simulations in the next tier. A classic example is vertices for decays on the surface of HPGe detectors, which are generated by revertex. Simulations that do not need special event vertices start directly from tier stp.
Tier stp building: run full event simulations with remage. Simulation macro commands are generated according to rules defined in the metadata.
The LEGEND-200 data is queried to extract several runtime quantities of interest (HPGe energy resolution, electronic noise, hardware status, etc.). These are typically referred to as the “pars”.
Run the first (hit-oriented) step of simulation post-processing. Here, a “hit” represents a collection of Geant4 “steps” in a single detector (sensitive volume). In these tiers, hit-wise operations such as applying optical maps or HPGe detector models are typically performed. The “run partitioning” is also performed at this stage (see below). Two tiers belong to this group:
Tier opt: convolution of the optical models (i.e. optical “maps”)
Tier hit: convolution of HPGe detector models (energy and pulse shape)
In both tiers, a dedicated time-coincidence map (or TCM, see pygama.evt.build_tcm.build_tcm()) is built to ease event reconstruction.
Tier evt building: a unified TCM including all hit-oriented tiers is created and used to organize the data into an event-oriented structure, i.e. a table where every row corresponds to an event.
Tier cvt building: the “concatenated event” tier performs a simple concatenation of tables from all evt files for a simid into a single file.
Validation plots are produced at several stages of the Simflow.
Tier pdf building: summarize evt-tier output into histograms (the pdfs).
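The difference between step, hit-oriented and event-oriented data can be sketched in a few lines of plain Python. The record layout, detector names and energies below are made up for illustration, and grouping only by event and detector is a simplifying assumption: the actual processing works on LH5 tables and may apply further criteria.

```python
from collections import defaultdict

# Hypothetical step records: (event id, detector name, deposited energy in keV)
steps = [
    (0, "det_A", 100.0),
    (0, "det_A", 250.5),
    (0, "det_B", 30.0),
    (1, "det_A", 2039.0),
]


def steps_to_hits(steps):
    """Sum the energy of all steps sharing the same (event, detector)."""
    hits = defaultdict(float)
    for evtid, detector, energy in steps:
        hits[(evtid, detector)] += energy
    return dict(hits)


def hits_to_events(hits):
    """Regroup hits so that each entry (row) corresponds to one event."""
    events = defaultdict(dict)
    for (evtid, detector), energy in hits.items():
        events[evtid][detector] = energy
    return dict(events)


hits = steps_to_hits(steps)
events = hits_to_events(hits)
```

In this toy example, `hits` has one entry per detector per event, while `events` has one entry per event listing the detectors that recorded energy.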
Run partitioning
“Run partitioning” refers to incorporating information about the experiment’s data taking runs for which the user wants to build pdfs:
Partition the simulated event statistics into fractions corresponding to the actual total livetime fraction spanned by each selected run. This information is extracted from legend-metadata/datasets/runinfo.yaml.
For each partition, apply HPGe models such as energy resolution or pulse shape.
…apply optical models (detection probability lookup tables) for the scintillators.
…apply detector status flags (available in legend-metadata/datasets/statuses).
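The first step above, splitting the simulated statistics according to livetime, can be sketched as follows. This is a minimal illustration, not the Simflow implementation: the function name, the run labels and the livetime values are hypothetical, and the largest-remainder rounding is one possible way to make the integer counts add up to the total.

```python
def partition_events(n_events: int, livetimes: dict[str, float]) -> dict[str, int]:
    """Split n_events among runs in proportion to their livetime.

    Uses the largest-remainder method so that the counts sum to n_events.
    """
    total = sum(livetimes.values())
    if total <= 0:
        raise ValueError("total livetime must be positive")

    # ideal (non-integer) share of each run
    exact = {run: n_events * lt / total for run, lt in livetimes.items()}
    # round down first, then distribute what is left
    counts = {run: int(share) for run, share in exact.items()}
    leftover = n_events - sum(counts.values())

    # runs with the largest fractional part receive the remaining events
    by_remainder = sorted(exact, key=lambda run: exact[run] - counts[run], reverse=True)
    for run in by_remainder[:leftover]:
        counts[run] += 1
    return counts


# Hypothetical run labels and livetimes (arbitrary units), for illustration only
livetimes = {"run_a": 10.0, "run_b": 30.0, "run_c": 60.0}
counts = partition_events(1000, livetimes)
```

Each partition would then be processed with the models and status flags valid for the corresponding run.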
Next steps
Related projects
Development