legend-simflow

End-to-end Snakemake workflow to run Monte Carlo simulations of signal and background signatures in the LEGEND experiment and produce probability-density functions (pdfs). Configuration metadata (e.g. rules for generating simulation macros or post-processing settings) is stored at legend-simflow-config.

Key concepts

  • “(Snakemake) workflow”, “legend-simflow”, “Simflow” are used interchangeably throughout the documentation to refer to the same concept.

  • The Simflow is currently tailored around the remage simulation ecosystem and uses the LEGEND HDF5 (LH5) file format for inputs and outputs.

  • Simulations are labeled by a unique identifier (e.g. hpge-bulk-2vbb), often referred to as simid (simID). The identifiers are defined in legend-simflow-config through simconfig.yaml files in tier directories stp and vtx.

  • A simulation (simid) can consist of several jobs (simulation jobs with different random seeds run in parallel). Each job is assigned its own jobid (integer number).

  • Each simulation is defined by a template macro (also stored as metadata) and by a set of rules (in simconfig.yaml) needed to generate the actual macros (template variable substitutions, number of primaries, number of jobs, etc).

  • The production is organized in tiers. The state of a simulation in a certain tier is labeled as <tier>.<simid>. Snakemake understands this syntax.

  • The production can be restricted to a subset of simulations by passing a list of identifiers to Snakemake.

  • The generated pdfs refer to a user-defined selection of LEGEND data taking runs. Such a list of runs is specified through the configuration file.
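
As a rough illustration of the concepts above, the following sketch splits a <tier>.<simid> label and expands a template macro into one macro per job. The SimConfig class, its fields, the template text and the $SEED and $N_PRIMARIES variables are hypothetical: they do not reflect the actual simconfig.yaml schema or the Simflow implementation.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class SimConfig:
    """Hypothetical, simplified stand-in for one simconfig.yaml entry."""

    template: str  # template macro text
    primaries_per_job: int
    number_of_jobs: int


def parse_target(label: str) -> tuple[str, str]:
    """Split a '<tier>.<simid>' label into (tier, simid)."""
    tier, _, simid = label.partition(".")
    if not tier or not simid:
        raise ValueError(f"expected '<tier>.<simid>', got {label!r}")
    return tier, simid


def render_macros(config: SimConfig) -> dict[int, str]:
    """Return one macro per jobid, each with a different random seed."""
    template = Template(config.template)
    return {
        jobid: template.substitute(
            N_PRIMARIES=config.primaries_per_job,
            SEED=jobid + 1,  # assumption: the seed is simply derived from the jobid
        )
        for jobid in range(config.number_of_jobs)
    }


# Illustrative template only, not an actual LEGEND macro.
config = SimConfig(
    template="# job seed: $SEED\n/run/beamOn $N_PRIMARIES\n",
    primaries_per_job=1000,
    number_of_jobs=3,
)

tier, simid = parse_target("stp.hpge-bulk-2vbb")
macros = render_macros(config)
print(tier, simid, sorted(macros))
```

Each jobid maps to its own macro text, which mirrors the idea of one simid being split into several parallel jobs.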

Workflow overview

---
title: DAG
---
flowchart TB
	id0[all]
	id1[plot_tier_stp_vertices]
	id2[build_tier_stp]
	id3[build_geom_gdml]
	id4[gen_geom_config]
	id5[build_tier_vtx]
	id6[build_tier_opt]
	id7[make_simstat_partition_file]
	id8[cache_detector_usabilities]
	id9[plot_tier_opt_observables]
	id10[build_tier_hit]
	id11[merge_hpge_drift_time_maps]
	id12[build_hpge_drift_time_map]
	id13[_init_julia_env]
	id14[merge_current_pulse_model_pars]
	id15[extract_current_pulse_model]
	id16[extract_hpge_observables_models]
	id17[plot_tier_hit_observables]
	id18[plot_hpge_drift_time_maps]
	id19[build_tier_evt]
	id20[build_tier_cvt]
	id21[plot_tier_cvt_observables]
	style id0 fill:#D96957,stroke-width:2px,color:#333333
	style id1 fill:#578DD9,stroke-width:2px,color:#333333
	style id2 fill:#CBD957,stroke-width:2px,color:#333333
	style id3 fill:#D97B57,stroke-width:2px,color:#333333
	style id4 fill:#57D995,stroke-width:2px,color:#333333
	style id5 fill:#B9D957,stroke-width:2px,color:#333333
	style id6 fill:#D9D457,stroke-width:2px,color:#333333
	style id7 fill:#57D9A7,stroke-width:2px,color:#333333
	style id8 fill:#A7D957,stroke-width:2px,color:#333333
	style id9 fill:#579ED9,stroke-width:2px,color:#333333
	style id10 fill:#D9C257,stroke-width:2px,color:#333333
	style id11 fill:#57D9CB,stroke-width:2px,color:#333333
	style id12 fill:#D98D57,stroke-width:2px,color:#333333
	style id13 fill:#D95757,stroke-width:2px,color:#333333
	style id14 fill:#57D9B9,stroke-width:2px,color:#333333
	style id15 fill:#95D957,stroke-width:2px,color:#333333
	style id16 fill:#84D957,stroke-width:2px,color:#333333
	style id17 fill:#57B0D9,stroke-width:2px,color:#333333
	style id18 fill:#57D4D9,stroke-width:2px,color:#333333
	style id19 fill:#D9B057,stroke-width:2px,color:#333333
	style id20 fill:#D99E57,stroke-width:2px,color:#333333
	style id21 fill:#57C2D9,stroke-width:2px,color:#333333
	id19 --> id0
	id10 --> id0
	id21 --> id0
	id15 --> id0
	id6 --> id0
	id2 --> id0
	id17 --> id0
	id9 --> id0
	id18 --> id0
	id20 --> id0
	id1 --> id0
	id2 --> id1
	id5 --> id2
	id3 --> id2
	id4 --> id3
	id3 --> id5
	id8 --> id6
	id2 --> id6
	id7 --> id6
	id3 --> id6
	id2 --> id7
	id6 --> id9
	id11 --> id10
	id2 --> id10
	id14 --> id10
	id16 --> id10
	id3 --> id10
	id7 --> id10
	id8 --> id10
	id12 --> id11
	id13 --> id12
	id15 --> id14
	id10 --> id17
	id12 --> id18
	id10 --> id19
	id6 --> id19
	id19 --> id20
	id20 --> id21

    

The graph of rules defined in the Simflow as generated by Snakemake (snakemake --rulegraph).
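
To make the ordering implied by the rule graph explicit, the sketch below transcribes its edges into a dictionary and derives a valid execution order with the Python standard library. Plotting rules and the final "all" target are left out for brevity, and the upstream_rules helper is purely illustrative: Snakemake resolves dependencies internally and does not use this code.

```python
from graphlib import TopologicalSorter

# Rule -> rules it depends on, transcribed from the graph above.
DEPENDENCIES = {
    "build_geom_gdml": {"gen_geom_config"},
    "build_tier_vtx": {"build_geom_gdml"},
    "build_tier_stp": {"build_tier_vtx", "build_geom_gdml"},
    "make_simstat_partition_file": {"build_tier_stp"},
    "build_tier_opt": {
        "cache_detector_usabilities",
        "build_tier_stp",
        "make_simstat_partition_file",
        "build_geom_gdml",
    },
    "build_hpge_drift_time_map": {"_init_julia_env"},
    "merge_hpge_drift_time_maps": {"build_hpge_drift_time_map"},
    "merge_current_pulse_model_pars": {"extract_current_pulse_model"},
    "build_tier_hit": {
        "merge_hpge_drift_time_maps",
        "build_tier_stp",
        "merge_current_pulse_model_pars",
        "extract_hpge_observables_models",
        "build_geom_gdml",
        "make_simstat_partition_file",
        "cache_detector_usabilities",
    },
    "build_tier_evt": {"build_tier_hit", "build_tier_opt"},
    "build_tier_cvt": {"build_tier_evt"},
}


def upstream_rules(target: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every rule that must run before the given target rule."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# One valid execution order: every rule appears after its dependencies.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
needed = upstream_rules("build_tier_stp", DEPENDENCIES)
print(sorted(needed))
```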

  1. Geometry building: pygeoml200 is used to build, for each simid, a global and shared GDML geometry instance for the workflow.

  2. Tier vtx building: run simulations that generate the Monte Carlo event vertices needed by some simulations in the next tier. A classic example is vertices for decays on the surface of HPGe detectors, which are generated by revertex. Simulations that do not need special event vertices start directly from tier stp.

  3. Tier stp building: run full event simulations with remage. Simulation macro commands are generated according to rules defined in the metadata.

  4. The LEGEND-200 data is queried to extract several runtime quantities of interest (HPGe energy resolution, electronic noise, hardware status, etc.). These are typically referred to as the “pars”.

  5. Run the first (hit-oriented) step of simulation post-processing. Here, a “hit” represents a collection of Geant4 “steps” in a single detector (sensitive volume). In these tiers, hit-wise operations such as applying optical maps or HPGe detector models are typically performed. The “run partitioning” is also performed at this stage (see below). Two tiers belong to this group:

    • Tier opt: convolution of the optical models (i.e. optical “maps”)

    • Tier hit: convolution of HPGe detector models (energy and pulse shape)

    In both tiers, a dedicated time-coincidence map (or TCM, see pygama.evt.build_tcm.build_tcm()) is built to ease event reconstruction.

  6. Tier evt building: a unified TCM including all hit-oriented tiers is created and used to organize the data into an event-oriented structure, i.e. a table where every row corresponds to an event.

  7. Tier cvt building: the “concatenated event” tier performs a simple concatenation of tables from all evt files for a simid into a single file.

  8. Validation plots are produced at several stages of the Simflow.

  9. Tier pdf building: summarize evt-tier output into histograms (the pdfs).
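
The step, hit and event terminology used in the list above can be pictured with a toy example. The sketch below is a deliberately simplified illustration and not the Simflow, remage or pygama implementation: detector names and energies are made up, and time coincidence is reduced to sharing the same event identifier instead of building a real TCM.

```python
from collections import defaultdict

# Toy Geant4 steps: (event id, detector name, deposited energy in keV).
# Detector names and values are made up for illustration.
steps = [
    (0, "det_a", 100.0),
    (0, "det_a", 50.0),
    (0, "det_b", 25.0),
    (1, "det_b", 300.0),
]


def build_hits(steps):
    """Group steps by (event, detector) and sum their energy into one hit."""
    hits = defaultdict(float)
    for evtid, detector, energy in steps:
        hits[(evtid, detector)] += energy
    return dict(hits)


def build_events(hits):
    """Regroup hits into an event-oriented table: one row per event."""
    per_event = defaultdict(dict)
    for (evtid, detector), energy in hits.items():
        per_event[evtid][detector] = energy
    return [{"evtid": evtid, "energy": per_event[evtid]} for evtid in sorted(per_event)]


hits = build_hits(steps)
events = build_events(hits)
for row in events:
    print(row)
```

The first function corresponds loosely to the hit-oriented tiers, the second to the event-oriented structure of tier evt, where every row is an event.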

Run partitioning

“Run partitioning” refers to incorporating information about the experiment’s data taking runs for which the user wants to build pdfs:

  • Partition the simulated event statistics into fractions corresponding to the fraction of the total livetime spanned by each selected run. This information is extracted from legend-metadata/datasets/runinfo.yaml.

  • For each partition, apply HPGe models such as energy resolution or pulse-shape.

  • …apply optical models (detection probability lookup tables) for the scintillators.

  • …apply detector status flags (available in legend-metadata/datasets/statuses).
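
The livetime-based partitioning described above can be sketched as follows. This is a minimal example under stated assumptions: the partition_events function, the run labels and the livetime values are hypothetical, and the real Simflow reads livetimes from the metadata rather than from a hard-coded dictionary.

```python
def partition_events(n_events: int, livetimes: dict[str, float]) -> dict[str, range]:
    """Assign contiguous ranges of event indices to runs, proportional to livetime."""
    total = sum(livetimes.values())
    if total <= 0:
        raise ValueError("total livetime must be positive")

    partitions = {}
    start = 0
    cumulative = 0.0
    for run, livetime in livetimes.items():
        cumulative += livetime
        # Rounding the cumulative boundary guarantees that the ranges
        # do not overlap and together cover all events.
        stop = round(n_events * cumulative / total)
        partitions[run] = range(start, stop)
        start = stop
    return partitions


# Hypothetical run labels and livetimes (arbitrary units).
livetimes = {"run-a": 10.0, "run-b": 30.0, "run-c": 60.0}
partitions = partition_events(1000, livetimes)
for run, indices in partitions.items():
    print(run, indices.start, indices.stop, len(indices))
```

Within each range of indices, run-specific models (energy resolution, optical maps, detector status flags) would then be applied to the corresponding events.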
