Installation and configuration¶
Clone legend-simflow and give it a custom name:
> git clone git@github.com:legend-exp/legend-simflow <path/to/prod/cycle>
We recommend tagging the production cycle with a version number to be used as
folder name (e.g. path/to/productions/v1.0.0).
Before a simulation production can be run, the user must configure the run with a dedicated file and install the required software dependencies.
The configuration file¶
The simflow-config.yaml file resides in the production directory (the root of
the GitHub repository) and allows to customize the workflow in great detail.
Here’s a basic description of its fields:
experiment: labels the experiment to be simulated. The same name is used in the Simflow metadata to label the corresponding configuration files. See legend-simflow-config for a list of currently supported experiment labels.simlist: list of simulation identifiers to be processed by Snakemake. Can be a list of strings or a path to a text file. If*orall, will process all simulations defined in the metadata.runlist: list of LEGEND data taking runs to build pdfs for, in the standard format<experiment>-<period>-<run>-<type>(e.g.l200-p03-r000-phy)make_steps: list the workflow steps to include in the DAG. Only the Snakemake rule files for the listed steps are loaded, so Snakemake will treat outputs of excluded steps as pre-existing source files rather than targets to (re)build. This is the primary mechanism for protecting existing production outputs from accidental re-execution and for speeding up DAG generation. Available steps (in canonical order):vtx,stp,par,opt,hit,evt,cvt. If omitted, all steps are included. The order in the list does not matter — steps are always sorted canonically. A few caveats apply:Steps
vtxandpardo not produce simid-scoped outputs and cannot appear as prefixes insimlistitems.The
parstep builds the parameters consumed by bothoptandhit(HPGe drift-time maps, current-pulse models, energy-resolution models, run-statistics partitioning files). Ifoptorhitis listed withoutpar, those parameter files must already exist on disk — otherwise Snakemake will fail at runtime trying to locate them.When a
simlistitem such ashit.myidis specified, the Simflow builds outputs cumulatively for all steps up tohitthat are present inmake_steps. Steps not listed are skipped even in the cumulative pass.
legend_metadata_version: optionally specify a revision (anything thatgit checkoutaccepts) for the legend-metadata instance used by the simflow. If you are developing metadata, comment this option.benchmark: section used to configure a benchmarking run:enabled: boolean flag to enable/disable the featuren_primaries: number of primary events to be simulated in the lower tiersverandraw.
paths: paths to Simflow input or output files (or folder hosting them).generated(output): root folder for all workflow outputsl200data(input): LEGEND-200 data production cycle (e.g.<...>/public/prodenv/prod-blind/ref-v1.0.0) used to extract production parameters (e.g. energy resolution)benchmarks(output): Snakemake rule benchmark fileslog(output): Snakemake rule log filesmetadata(input): Simflow input metadata. This is a clone of the legend-metadata repository. If not present at runtime, the Simflow will attempt a fresh clone.config(input): clone of legend-simflow-config. This is distributed as a submodule of legend-metadata.optical_maps(dict, input): photoelectron detection probability maps for various scintillators used in theopttier. These maps are currently not produced by the Simflow and therefore supplied as external input.lar: the liquid argon optical map file.
pars(output): root folder for all generated parameter files (e.g. YAML files storing parameters extracted from the LEGEND-200 data, geometry files, drift time maps).macros(output): generated remage macro filesgeom(optional output): generated simulation geometry files. Defaults to{paths.pars}/geomif not set.dtmaps(optional output): generated HPGe drift time maps. Defaults to{paths.pars}/hpge/dtmapsif not set.tier(dict, output): generated outputs for each tier, keyed by tier name (e.g.tier.stp,tier.hit, …). Validation plots for each tier are stored in aplots/subdirectory (e.g.tier.stp/plots/).
runcmd: command overridesremage: remage command. Useful when not working in a Pixi environment and theremagecommand is not available.
nersc: see I/O optimization.
Tip
All these configuration parameters can be overridden at runtime through
Snakemake’s --config option.
For a quick start, just copy over the default configuration file from the templates:
cp templates/default.yaml simflow-config.yaml
and customize it.
Software dependencies¶
The first step is obtaining the software, which is fully specified by the
pyproject.toml file. Since some dependencies (notably,
remage) are only available on
conda-forge, we recommend using
pixi, which can be easily installed following instructions on
the official documentation.
To load the software, just execute pixi shell. Alternatively, use the provided
pixi tasks to run the workflow without entering a shell:
pixi run prod [--profile <name>]— run a full production (recommended)pixi run dry— preview what Snakemake would do without executingpixi run touch— mark all outputs as up-to-date without re-running jobspixi run snakemake [ARGS]— pass arbitrary arguments directly to Snakemake
If you prefer, you can use a Python-only package manager of your choice, for example uv:
> uv venv
> source .venv/bin/activate
> uv pip install .
Tip
Do you want to use a remage container instead of the Conda package? Add to the configuration file, for example:
runcmd:
remage: apptainer run --cleanenv /.../remage_latest.sif