Data names and paths¶
File naming conventions¶
The file naming convention for simulation output files is:
{experiment}-{simid}-job_{jobid}-tier_{tier}.{extension}
where each label (in curly brackets {}) should be alphanumeric (including
underscores: _) when possible. Avoid dashes (-) and other special characters
unless explicitly supported for that specific field.
experiment: name or label for the experimental configuration being simulated.
simid: stands for “simulation identifier”, i.e. a string to uniquely label a simulation. Typically specifies the physical process being simulated and the experiment’s components involved.
jobid: stands for “(simulation) job identifier”. It’s a zero-padded integer that labels independent jobs across which the simulation is split.
tier: the three-character label of the tier. At the moment the simflow supports vtx, stp, opt, hit, evt, cvt and pdf tiers.
extension: file extension. lh5 for LEGEND HDF5 files, gdml for GDML geometry files, yaml for plain-text YAML configuration files, log for log files.
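As an illustration of the convention above, the following sketch builds and parses such file names. It is not taken from the simflow code base: the helper names, the experiment label `l200p03` and the zero-padding width of 4 are assumptions made for the example.

```python
import re

# Pattern mirroring {experiment}-{simid}-job_{jobid}-tier_{tier}.{extension}.
# The simid group is greedy, so hyphens inside it are tolerated.
FILENAME_RE = re.compile(
    r"^(?P<experiment>\w+)-(?P<simid>[-\w]+)-job_(?P<jobid>\d+)"
    r"-tier_(?P<tier>\w{3})\.(?P<extension>\w+)$"
)


def build_filename(experiment, simid, jobid, tier, extension, width=4):
    """Assemble an output file name; `width` (zero padding) is an assumption."""
    return f"{experiment}-{simid}-job_{jobid:0{width}d}-tier_{tier}.{extension}"


def parse_filename(name):
    """Split a file name back into its labels, or raise ValueError."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.groupdict()


name = build_filename("l200p03", "hpge_bulk_Rn222_to_Po214", 7, "stp", "lh5")
print(name)
print(parse_filename(name))
```

Keeping labels free of dashes, as recommended above, makes this kind of parsing unambiguous.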
Metadata¶
The workflow configuration metadata is stored in
legend-simflow-config.
Documentation about which metadata is stored there (e.g. which LEGEND
experimental configurations are supported) can be found in the README.md.
In this section, the specification of the metadata format is documented.
tier/ static tier configuration¶
Metadata is organized in this directory by tier (first level) and experimental configuration (second level).
vtx tier¶
This section specifies how to configure simulations for the vtx tier,
consisting of simulated event vertices for the stp tier.
The configuration folder must contain the following files:
tier/vtx/{experiment}/simconfig.yaml
simconfig.yaml¶
This file defines a dictionary of commands used to generate vertices. The keys
in this dictionary can be referenced in the stp tier configuration metadata
(simconfig.yaml).
The configuration block must contain the command key, defining the command
that should be used to generate the vertices. The command string must
contain the following variables, which will be automatically substituted by the
simflow at runtime:
GDML_FILE: path to the GDML file defining the geometry that is being simulated. This is typically a required input of vertex generators.
OUTPUT_FILE: output file where the vertices will be saved. This file will be an input of consumer stp-tier jobs.
N_EVENTS: number of vertices to generate.
Example:
hpge_surface:
  command: >-
    revertex hpge-surf-pos --detectors [VB]* --surface-type nplus --gdml
    {GDML_FILE} --out-file {OUTPUT_FILE} --n-events {N_EVENTS}
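To make the substitution step concrete, here is a minimal sketch that fills the three documented variables into a command string. It assumes Python `str.format`-style substitution, which the curly-brace placeholders suggest but which is not confirmed here; the file names are hypothetical.

```python
# Command template as it would appear under the `command` key.
command = (
    "revertex hpge-surf-pos --detectors [VB]* --surface-type nplus "
    "--gdml {GDML_FILE} --out-file {OUTPUT_FILE} --n-events {N_EVENTS}"
)

# Hypothetical values; the real paths are chosen by the workflow at runtime.
rendered = command.format(
    GDML_FILE="geom.gdml",
    OUTPUT_FILE="vertices.lh5",
    N_EVENTS=10_000,
)
print(rendered)
```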
stp tier¶
This section specifies how to configure
remage simulations for the stp tier. The
same conventions also apply to other tiers when relevant.
The configuration folder must contain the following files:
tier/stp/{experiment}/simconfig.yaml
tier/stp/{experiment}/generators.yaml
tier/stp/{experiment}/confinement.yaml
simconfig.yaml¶
This is the main configuration file, which defines the set of remage
simulations to run. It is a mapping from simid to a configuration block, which
configures how to generate the remage macro for that simulation.
Important
simid keys must only contain word characters and hyphens, matching the pattern
[-\w]+ (letters a–z, A–Z, digits 0–9, underscores _, and hyphens -).
In particular, dots (.) are forbidden: they are the separator between tier
and simid in the simlist format (<tier>.<simid>), so a dot inside a simid
would break parsing.
In this context hyphens are technically allowed by validation, but naming with
snake case (letters, digits, underscores only) is still recommended for
clarity, because hyphens are field separators in output file names (e.g.
{experiment}-{simid}-job_{jobid}-tier_{tier}.lh5).
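The rule above can be checked with a few lines of Python. This is only a sketch of the stated pattern, not the simflow's own validation code; `is_valid_simid` is a hypothetical helper, and `re.ASCII` is used so that `\w` covers exactly the characters listed above.

```python
import re

# re.ASCII restricts \w to a-z, A-Z, 0-9 and underscore.
SIMID_RE = re.compile(r"[-\w]+", re.ASCII)


def is_valid_simid(simid):
    """Return True if the whole string matches the pattern [-\\w]+."""
    return SIMID_RE.fullmatch(simid) is not None


for candidate in ["hpge_bulk_Rn222_to_Po214", "hpge-bulk", "hpge.bulk", ""]:
    print(repr(candidate), is_valid_simid(candidate))
```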
Supported fields per simid:
template: path to a macro template file. The template must include placeholders used by the workflow (see Macro templates and substitutions). The special variable $_ is substituted with the path to the directory that contains the configuration file.
generator: a string formatted as either
  ~defines:NAME, where NAME is defined in generators.yaml, or
  ~vertices:NAME, where NAME references a vertices simulation from the vtx tier (see vtx tier). When vertices are used as the generator they carry vertex position and kinematics, so the confinement key (see below) is forbidden.
confinement: one of:
  ~defines:NAME to reference a confinement block in confinement.yaml
  ~volumes.bulk:PATTERN to confine to physical volumes matching PATTERN
  ~volumes.surface:PATTERN to sample on the surface of volumes matching PATTERN
  a list of the above strings to combine multiple volume patterns
  ~vertices:NAME to use the vertex positions simulated by the vtx tier generator NAME (see vtx tier)
  ~function:NAME to use a user-defined function to generate macro commands. NAME should have the format
  module.function(<...>,*args,**kwargs)
  and the function should return a list of remage macro commands. See legendsimflow.commands.get_confinement_from_function() for more details.
primaries_per_job: integer, the number of primaries per job; becomes N_EVENTS in the macro file.
number_of_jobs: integer, how many jobs to split the simulation into.
macro_substitutions: optional mapping of additional placeholders to values to inject into the template (e.g. HPGE_ENERGY_THRESHOLD: 450 keV).
geom_config_extra: optional nested structure to tweak the geometry configuration for this simid. This configuration block is passed unmodified to the geometry tooling (currently legend-pygeom-l200).
Example:
hpge_bulk_Rn222_to_Po214:
  template: $_/template.mac
  generator: ~defines:Rn222_to_Po214
  confinement: ~defines:hpge_bulk
  primaries_per_job: 10_000
  number_of_jobs: 4
generators.yaml¶
Defines reusable generator command snippets (Geant4 macro lines). Each key can be:
A list of strings (recommended), or
A single string containing multiple lines.
Example block simulating the \(^{238}\)U decay chain segment from \(^{222}\)Rn to \(^{214}\)Po:
Rn222_to_Po214:
  - /RMG/Generator/Select GPS
  - /gps/particle ion
  - /gps/energy 0 eV
  - /gps/ion 86 222
  - /process/had/rdm/nucleusLimits 214 222 82 86
In simconfig.yaml, reference this via generator:
~defines:Rn222_to_Po214.
confinement.yaml¶
Defines reusable
remage vertex confinement
snippets. Each key is a list of lines that will be inserted into the macro when
referenced with ~defines:.
Example:
hpge_bulk:
  - /RMG/Generator/Confine Volume
  - /RMG/Generator/Confinement/Physical/AddVolume V.*
  - /RMG/Generator/Confinement/Physical/AddVolume B.*
Alternatively, simconfig.yaml supports direct volume confinement without an entry in confinement.yaml:
~volumes.bulk:REGEX translates to:
  /RMG/Generator/Confine Volume
  /RMG/Generator/Confinement/Physical/AddVolume REGEX
~volumes.surface:REGEX additionally sets:
  /RMG/Generator/Confinement/SampleOnSurface true
A list of such tokens combines multiple volume patterns.
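The translation described above can be sketched as follows. The function name is hypothetical and the position of the SampleOnSurface line in the output is an assumption; how the simflow treats a list that mixes bulk and surface tokens is not specified here, so this sketch simply enables surface sampling if any surface token is present.

```python
def confinement_commands(tokens):
    """Translate ~volumes.bulk: / ~volumes.surface: tokens into macro lines."""
    if isinstance(tokens, str):
        tokens = [tokens]

    lines = ["/RMG/Generator/Confine Volume"]
    on_surface = False
    for token in tokens:
        # Split "~volumes.bulk:V.*" into the token kind and the volume regex.
        kind, _, pattern = token.partition(":")
        if kind == "~volumes.surface":
            on_surface = True
        elif kind != "~volumes.bulk":
            raise ValueError(f"unsupported confinement token: {token}")
        lines.append(f"/RMG/Generator/Confinement/Physical/AddVolume {pattern}")

    if on_surface:
        lines.append("/RMG/Generator/Confinement/SampleOnSurface true")
    return lines


print("\n".join(confinement_commands(["~volumes.bulk:V.*", "~volumes.bulk:B.*"])))
```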
Macro templates and substitutions¶
Template macros are standard remage/Geant4 macro files that can contain variable placeholders that the workflow substitutes:
$GENERATOR: Replaced with the generator block content.
$CONFINEMENT: Replaced with the confinement block content.
{SEED}: A 32-bit random integer generated per job.
{N_EVENTS}: The number of events to simulate for the job, taken from primaries_per_job.
Additional placeholders may be provided via the simconfig.yaml macro_substitutions mapping.
Example template snippets:
/RMG/Manager/Randomization/Seed {SEED}
...
$GENERATOR
$CONFINEMENT
...
/run/beamOn {N_EVENTS}
The workflow renders the template to a concrete macro and writes it to the
canonical input path for the job. It then builds the
remage CLI
either by passing this macro file with --macro-substitutions (SEED and
N_EVENTS), or by inlining the commands directly when using
remage’s “inline” mode.
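A rough picture of the rendering step, using the standard-library `string.Template` as a stand-in for whatever mechanism the workflow actually uses (an assumption). Only the `$`-style placeholders are filled here; `{SEED}` and `{N_EVENTS}` are left in place, as they would be when handed to remage through --macro-substitutions.

```python
from string import Template

template = Template(
    "/RMG/Manager/Randomization/Seed {SEED}\n"
    "$GENERATOR\n"
    "$CONFINEMENT\n"
    "/run/beamOn {N_EVENTS}\n"
)

# Example generator and confinement blocks, as lists of macro lines.
generator = ["/RMG/Generator/Select GPS", "/gps/particle ion"]
confinement = ["/RMG/Generator/Confine Volume"]

# safe_substitute leaves unknown $-placeholders untouched instead of raising.
macro = template.safe_substitute(
    GENERATOR="\n".join(generator),
    CONFINEMENT="\n".join(confinement),
)
print(macro)
```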
Run partitioning¶
An important post-processing step of the workflow is to fold detector models with parameters that vary during the livetime of the experiment (across “data taking runs”), satisfying the following requirements:
the contribution of each data taking run over the total in terms of livetime must be represented in the final simulated event sample;
the statistical properties of the simulated event sample must be kept intact.
Ultimately, the simulated event sample must be directly comparable to the
observed event sample. The Simflow will partition the total simulated event
sample (across all simulation jobs) according to the livetime fraction of each
run (taken from
legend-datasets) and apply run
parameters to each partition. The partitions will still all live in the same
table in the output file. A new column named runid holding the run index of
each event is appended for convenience.
Run partitioning is applied to both the opt and hit tiers: the LAr response
is processed with the optical parameters and detector usability of the
corresponding run partition, and the HPGe response with the energy resolution
and PSD parameters of the same partition.
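The idea of splitting the event sample by livetime fraction can be sketched as below. This is not the simflow's algorithm: the largest-remainder rounding, the contiguous block assignment and the livetime values are all assumptions made for illustration.

```python
def partition_counts(n_events, livetimes):
    """Split n_events across runs proportionally to livetime.

    Uses the largest-remainder method so that the counts sum to n_events.
    """
    total = sum(livetimes.values())
    exact = {run: n_events * lt / total for run, lt in livetimes.items()}
    counts = {run: int(value) for run, value in exact.items()}
    leftover = n_events - sum(counts.values())
    # Hand the remaining events to the runs with the largest fractional part.
    by_remainder = sorted(
        exact, key=lambda run: exact[run] - counts[run], reverse=True
    )
    for run in by_remainder[:leftover]:
        counts[run] += 1
    return counts


def assign_runids(n_events, livetimes):
    """Return one run index per event, in contiguous blocks."""
    counts = partition_counts(n_events, livetimes)
    runids = []
    for index, run in enumerate(livetimes):
        runids.extend([index] * counts[run])
    return runids


# Hypothetical livetimes in arbitrary units.
livetimes = {"l200-p03-r000-phy": 10.0, "l200-p03-r001-phy": 30.0}
print(partition_counts(100, livetimes))
```

The list returned by `assign_runids` plays the role of the runid column described above.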
The user selects a list of data taking runs (“run list” or “data set”) that they want to simulate. Runs are specified by run-ids (“run identifiers”) in the format:
l200-<period>-<run>-<datatype>
where
period is p03, p04, …
run is r000, r001, …
datatype is phy, cal, ssc, …
In addition, the Simflow gives the possibility to get the runlist from the
runlists.yaml database file stored in the
legend-datasets repository by
prefixing the runid string with ~runlists:, followed by a dot-separated path
to the database entry. For example:
~runlists:valid.phy.p04 -> [
'l200-p04-r000-phy',
'l200-p04-r001-phy',
'l200-p04-r002-phy',
'l200-p04-r003-phy'
]
Run lists passed to the Simflow can include both runids and runlist-file-queries.
The Simflow supports specifying a global runlist in the main configuration file,
under the field runlist:
runlist:
  - l200-p03-r000-phy
  - l200-p03-r001-phy
  - ~runlists:valid.phy.p04
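How such a mixed run list could be expanded is sketched below. The function name and the in-memory `runlists_db` mapping are hypothetical; the sketch also assumes that the database entry is already a plain list of run identifiers, which may not match the actual layout of runlists.yaml.

```python
def expand_runlist(entries, runlists_db):
    """Resolve ~runlists: queries against a nested mapping."""
    prefix = "~runlists:"
    runs = []
    for entry in entries:
        if entry.startswith(prefix):
            # Walk the dot-separated path, e.g. valid.phy.p04.
            node = runlists_db
            for key in entry[len(prefix):].split("."):
                node = node[key]
            runs.extend(node)
        else:
            runs.append(entry)
    return runs


# Hypothetical stand-in for the content of runlists.yaml.
runlists_db = {
    "valid": {"phy": {"p04": ["l200-p04-r000-phy", "l200-p04-r001-phy"]}}
}

runlist = ["l200-p03-r000-phy", "~runlists:valid.phy.p04"]
print(expand_runlist(runlist, runlists_db))
```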
Note
Per-simid runlist overrides are configured exclusively in the hit-tier
simconfig.yaml, even though the same partition is reused by the opt tier.
There is no separate opt-tier simconfig.yaml for this purpose.
hpge_bulk_Rn222_to_Po214:
  runlist:
    - l200-p03-r003-phy
    - l200-p03-r004-phy
opt tier¶
This section specifies how to configure the opt tier, which processes the
liquid argon scintillation response from the stp tier.
scintillator_volume_name: liquid_argon
optmap_per_sipm: true
optmap_scaling_factor: 0.3
photoelectron_resolution_sigma: 0.3
time_resolution_in_ns: 16
max_pes_per_hit_per_sipm: 5
max_pes_per_hit_combined: 100
buffer_len: "10*MB"
scintillator_volume_name (str): name of the scintillator volume in the GDML geometry used to identify liquid argon energy depositions (e.g. liquid_argon).
optmap_per_sipm (bool): when true, photoelectrons are sampled per SiPM channel using the per-SiPM optical map; when false, the combined map across all SiPMs is used.
optmap_scaling_factor (float): factor by which every map value is multiplied, globally scaling the photoelectron detection probability. Maps produced with SiPM PDE set to 1 should use a value equal to the true SiPM PDE.
photoelectron_resolution_sigma (float): single-photoelectron amplitude resolution (σ, relative). Applied as Gaussian smearing to each detected photoelectron.
time_resolution_in_ns (float): SiPM time resolution in nanoseconds. Photoelectrons within this window are clustered into a single hit.
max_pes_per_hit_per_sipm (int): maximum number of photoelectrons per hit per SiPM channel (used when optmap_per_sipm: true). Limits memory and processing time.
max_pes_per_hit_combined (int): maximum number of photoelectrons per hit across all SiPMs combined (used when optmap_per_sipm: false).
buffer_len (str): LH5 read chunk size (e.g. "10*MB"). Controls memory usage during processing; does not affect the output.
hit tier¶
This section specifies how to configure the post-processing of the
remage simulations from the stp tier. When this configuration is
absent, accessing any of its fields raises an error.
dead_layer_fraction: 0.5
buffer_len: "500*MB"
eresmod_default:
  expression: FWHMLinear
  parameters:
    a: 0.5
    b: 0.001
aoeresmod_default:
  expression: SigmaFit
  parameters:
    a: 0.0001
    b: 0
    c: 1
psdcuts_default:
  aoe:
    low_side: -1.5
    high_side: 3
dead_layer_fraction (float): fraction of the dead layer thickness at which the linear ramp in charge collection efficiency starts, between 0 (HPGe surface, ramp begins immediately) and 1 (ramp begins only at the full charge collection depth, i.e. no partial collection).
buffer_len (str): LH5 read chunk size (e.g. "500*MB"). Controls memory usage during processing; does not affect the output.
eresmod_default: energy resolution model applied to non-ON detectors. See HPGe observable validation (build_tier_hit) for when this fallback is triggered.
aoeresmod_default: A/E resolution model applied to detectors without a per-detector entry. See HPGe observable validation (build_tier_hit) for when this fallback is triggered.
psdcuts_default: PSD cut values applied to detectors without a per-detector entry. See HPGe observable validation (build_tier_hit) for when this fallback is triggered.
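The meaning of dead_layer_fraction can be illustrated with a piecewise-linear efficiency curve. The sketch below follows the description above only; the function name is hypothetical, and it assumes that the dead layer thickness equals the full charge collection depth (`fccd`) and that the efficiency is 0 before the ramp and 1 beyond it.

```python
def activeness(depth, fccd, dead_layer_fraction):
    """Charge collection efficiency at `depth` below the HPGe surface.

    `fccd` is the full charge collection depth, in the same units as `depth`.
    """
    ramp_start = dead_layer_fraction * fccd
    if depth >= fccd:
        return 1.0  # fully active bulk
    if depth <= ramp_start:
        return 0.0  # fully dead region
    # Linear ramp between the start of the transition layer and the FCCD.
    return (depth - ramp_start) / (fccd - ramp_start)


for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(depth, activeness(depth, fccd=1.0, dead_layer_fraction=0.5))
```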
evt tier¶
This section specifies how to configure the evt tier, which builds physics
events from the hit-level data.
add_random_coincidences: false
geds_energy_thr_kev: 25
spms_energy_thr_pe: 0
buffer_len: "50*MB"
skip_opt: false
skip_hit: false
add_random_coincidences (bool): when true, random-coincidence (RC) SiPM data (taken from l200data evt/pet tiers) is mixed in during event building.
geds_energy_thr_kev (int): HPGe hit energy threshold in keV; hits below this value are discarded.
spms_energy_thr_pe (int): SiPM hit threshold in photoelectrons; hits below this value are discarded.
buffer_len (str): LH5 read chunk size (e.g. "50*MB"). Controls memory usage during processing; does not affect the output.
skip_opt (bool, default false): when true, the opt (SiPM/LAr) tier is skipped: the opt Snakemake rule is not run and the evt output contains only HPGe data (no spms or coincident/spms tables).
skip_hit (bool, default false): when true, the hit (HPGe) tier is skipped: the hit Snakemake rule is not run and the evt output contains only SiPM data (no geds or coincident/geds tables).
Note
Setting both skip_opt and skip_hit to true simultaneously is an error.
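The constraint in the note can be expressed as a small check. This is a hypothetical helper, not the simflow's own validation; the exception type and the returned mapping are assumptions.

```python
def check_evt_config(config):
    """Reject configurations that skip both upstream tiers."""
    skip_opt = config.get("skip_opt", False)
    skip_hit = config.get("skip_hit", False)
    if skip_opt and skip_hit:
        raise ValueError("skip_opt and skip_hit cannot both be true")
    # Report which upstream tiers the evt tier would read from.
    return {"use_opt": not skip_opt, "use_hit": not skip_hit}


print(check_evt_config({"skip_opt": True, "skip_hit": False}))
```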
cvt tier¶
This section specifies how to configure the cvt tier, which concatenates event
files across simulation jobs.
buffer_len: "500*MB"
buffer_len (str): LH5 read chunk size (e.g. "500*MB"). Controls memory usage during processing; does not affect the output.
pdf tier¶
This section specifies how to configure the pdf tier, which produces
probability density functions from the concatenated event files.
buffer_len: "500*MB"
# optional: split 1-D PDFs by detector group
detector_groups:
  icpc: "V.*"
  bege: "B.*"
buffer_len (str): LH5 read chunk size (e.g. "500*MB"). Controls memory usage during processing; does not affect the output.
detector_groups (mapping, optional): maps group names to Python regex strings. Each regex is matched against LEGEND-200 detector names using re.fullmatch, so "V.*" selects all detectors whose names start with V. When this key is absent, only the implicit all group is emitted (equivalent to detector_groups: {all: ".*"}). Specifying detector_groups extends the output: every named group is produced in addition to all, which is always emitted regardless of the config. See pdf tier — probability density functions for the resulting output schema.
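The grouping rule can be sketched with `re.fullmatch`, as named above. The helper is hypothetical, and the third detector name in the example is made up to show a detector that only lands in the implicit all group.

```python
import re


def group_detectors(names, detector_groups=None):
    """Assign detector names to regex-defined groups, always including `all`."""
    groups = {"all": ".*"}
    groups.update(detector_groups or {})
    return {
        group: [name for name in names if re.fullmatch(pattern, name)]
        for group, pattern in groups.items()
    }


# "X00000A" is a made-up name that matches neither configured group.
names = ["V02160A", "B00091B", "X00000A"]
print(group_detectors(names, {"icpc": "V.*", "bege": "B.*"}))
```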
pars/ — simulation parameters¶
Metadata is organized in this directory by experimental configuration (first
level) and detector type (second level), mirroring the tier/ structure.
Drift time map settings¶
A single shared YAML file (applies to all detectors and voltages) that overrides
simulation control parameters for the
build_hpge_drift_time_map rule. When absent, the
script uses built-in production defaults.
grid_size_in_mm: 0.5
ssd_refinement_limits: [0.2, 0.1, 0.05, 0.02]
padding: 3
| Key | Type | Default | Description |
|---|---|---|---|
| grid_size_in_mm | float | 0.5 | Drift time map grid spacing in mm. Execution time scales quadratically with the inverse of the grid spacing. |
| ssd_refinement_limits | list of float | | SSD adaptive-mesh refinement thresholds. Each entry drives one refinement pass; smaller values give a more accurate electric field at higher cost. Overly coarse values can prevent full detector depletion, so change with care. |
| padding | int | | Number of pixel layers padded around the drift time map boundary to avoid grid edge effects. |
Tip
In test or CI environments, setting grid_size_in_mm: 10.0 reduces the number
of grid points by a factor of ~400 compared to the 0.5 mm production default,
cutting script runtime from many minutes to seconds.
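The factor quoted in the tip follows from the grid spacing ratio, assuming the number of grid points scales with the inverse square of the spacing (a two-dimensional map):

```python
production_mm = 0.5  # production default grid spacing
test_mm = 10.0       # coarse spacing suggested for tests and CI

# Points in a 2D grid scale with the inverse square of the spacing.
reduction = (test_mm / production_mm) ** 2
print(reduction)  # 400.0
```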
Energy resolution model defaults¶
An optional validity-based metadata directory providing HPGe-specific energy
resolution parameters. When present, it can supplement or fully replace
l200data as the source of energy resolution parameters — enabling simulations
for experiments that have not yet collected data (e.g. LEGEND-1000). The
structure follows the same validity-based format as
pars/{experiment}/geds/opv/.
default:
  expression: FWHMLinear
  parameters:
    a: 0.5
    b: 0.001
# optional per-detector override
V02160A:
  expression: FWHMLinear
  parameters:
    a: 0.3
    b: 0.0009
default (optional): energy resolution model applied to all HPGe detectors not listed explicitly.
<detector> (optional): per-detector override; key is the detector name as it appears in the channel map (e.g. V02160A).
Each entry must contain:
expression: name of the energy resolution function (e.g. FWHMLinear)
parameters: mapping of parameter names to their values
See HPGe energy resolution (extract_hpge_observables_models) for a description of how these files are used at runtime.
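The default/override lookup described above amounts to the following sketch. `resolve_model` is a hypothetical helper and the mapping stands for an already-loaded YAML file; validity handling and the interplay with l200data are deliberately left out.

```python
def resolve_model(models, detector):
    """Return the per-detector entry if present, else the `default` entry."""
    if detector in models:
        return models[detector]
    if "default" in models:
        return models["default"]
    raise KeyError(f"no model for {detector} and no default entry")


# Content of the example file above, as a Python mapping.
models = {
    "default": {"expression": "FWHMLinear", "parameters": {"a": 0.5, "b": 0.001}},
    "V02160A": {"expression": "FWHMLinear", "parameters": {"a": 0.3, "b": 0.0009}},
}

print(resolve_model(models, "V02160A")["parameters"])
print(resolve_model(models, "B00091B")["parameters"])
```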
A/E resolution model defaults¶
An optional validity-based metadata directory providing HPGe-specific A/E resolution parameters. Follows the same structure and four-case logic as Energy resolution model defaults.
default:
  expression: SigmaFit
  parameters:
    a: 0.0001
    b: 0
    c: 1
# optional per-detector override
V02160A:
  expression: SigmaFit
  parameters:
    a: 0.0002
    b: 0
    c: 1
default (optional): A/E resolution model applied to all HPGe detectors not listed explicitly.
<detector> (optional): per-detector override.
Each entry must contain:
expression: name of the A/E resolution function (e.g. SigmaFit)
parameters: mapping of parameter names to their values
See HPGe A/E resolution (extract_hpge_observables_models) for a description of how these files are used at runtime.
PSD cut defaults¶
An optional validity-based metadata directory providing HPGe-specific PSD cut values. Follows the same structure and four-case logic as Energy resolution model defaults.
default:
  aoe:
    low_side: -1.5
    high_side: 3.0
# optional per-detector override
V02160A:
  aoe:
    low_side: -1.4
    high_side: 2.9
default (optional): PSD cut values applied to all HPGe detectors not listed explicitly.
<detector> (optional): per-detector override.
Each entry must contain:
aoe.low_side: lower A/E classifier cut (in units of A/E σ)
aoe.high_side: upper A/E classifier cut (in units of A/E σ)
See HPGe PSD cuts (extract_hpge_observables_models) for a description of how these files are used at runtime.
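For orientation, a two-sided A/E cut with these values could be applied as in the sketch below. The helper name is hypothetical and treating both bounds as inclusive is an assumption; the actual selection logic lives in the hit tier.

```python
def passes_aoe_cut(classifier, cuts):
    """True if the A/E classifier lies within [low_side, high_side]."""
    aoe = cuts["aoe"]
    return aoe["low_side"] <= classifier <= aoe["high_side"]


cuts = {"aoe": {"low_side": -1.5, "high_side": 3.0}}
for value in (-2.0, 0.0, 3.5):
    print(value, passes_aoe_cut(value, cuts))
```

A NaN classifier fails both comparisons, so it never passes this check.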
Current pulse model defaults¶
An optional validity-based metadata directory providing HPGe-specific current
pulse model parameters. Follows the same structure and four-case logic as
Energy resolution model defaults, but applied per-detector rather than per-run (one
output file per (runid, hpge_detector) pair).
default:
  current_pulse_pars:
    amax: 1.0
    mu: 0.0
    sigma: 0.1
    tail_fraction: 0.5
    tau: 0.02
    high_tail_fraction: 0.0
    high_tau: 0.0
  mean_aoe: 1.0
  current_reso: 0.01
# optional per-detector override
V02160A:
  current_pulse_pars:
    amax: 1.0
    mu: 0.0
    sigma: 0.12
    tail_fraction: 0.55
    tau: 0.025
    high_tail_fraction: 0.0
    high_tau: 0.0
  mean_aoe: 0.98
  current_reso: 0.012
default (optional): current pulse model applied to all HPGe detectors not listed explicitly.
<detector> (optional): per-detector override.
Each entry must contain:
current_pulse_pars: mapping of parameter names to their values for the current pulse model (amax, mu, sigma, tail_fraction, tau, high_tail_fraction, high_tau; the last two default to 0 if omitted)
mean_aoe: mean A/E value
current_reso: current resolution (σ) from the noise-fit
See HPGe current pulse model (extract_current_pulse_model) for a description of how these files are used at runtime.
Manual HPGe skip-list¶
An optional validity-based metadata directory listing HPGe detectors that should be excluded from drift-time map and current pulse model generation, regardless of their status in the channel map.
V02160A: "broken impurity profile in legend-metadata"
B00091B: "asymmetric geometry not yet supported"
Each entry is a mapping of detector name (as it appears in the channel map, e.g.
V02160A) to a free-form reason string describing why the detector is excluded.
The reason is written to the workflow log as a WARNING when the skip is applied.
Detectors listed here are removed from the “modelable” HPGe list for the
matching runs. They will not get a drift-time map nor a current pulse model
produced. The validity rules are the same as those of the other geds/
parameter directories (see Energy resolution model defaults).
A detector that is manually skipped is treated identically to one that lacks a
drift-time map or current pulse model for any other reason: PSD output columns
are filled with NaN and the fallback A/E resolution and PSD cuts
(aoeresmod_default / psdcuts_default) are used. No hard error is raised. See
HPGe observable validation (build_tier_hit) for the full fallback policy.
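The effect of the skip-list on the “modelable” detector list can be sketched as follows. The function and logger names are hypothetical; only the behaviour described above (removal plus a WARNING carrying the reason) is reproduced.

```python
import logging

log = logging.getLogger("simflow-sketch")


def modelable_detectors(detectors, skip_list):
    """Drop manually skipped detectors, logging the reason for each."""
    kept = []
    for name in detectors:
        if name in skip_list:
            log.warning("skipping %s: %s", name, skip_list[name])
        else:
            kept.append(name)
    return kept


skip_list = {"V02160A": "broken impurity profile in legend-metadata"}
print(modelable_detectors(["V02160A", "B00091B"], skip_list))
```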