Data Design

Data Concierge. Calm Computation. Clean Data.

Data Extraction for CMS Open Data

(NanoAOD · Run2016H · SinglePhoton)


We help researchers turn raw NanoAOD ROOT files into clean, analysis-ready tables (CSV / slim ROOT) with reproducible logs and optional Certified JSON labeling (Certified / Alternative by run–lumi).

 

You keep your brain for physics — we handle the plumbing.

 

If your dataset is close (other NanoAOD eras / other CMS channels), ask — we may already support it.

What we do (starting narrow)

Scope (v1): /SinglePhoton/Run2016H-UL2016 (CMS Open Data, NanoAOD).

 

The scope can be expanded later (all NanoAOD → CMSSW pipelines → other CERN domains) once the first workflow is proven.

🌀 How we work

1) Trial / Diagnostic (fast start)

If you’re unsure what’s inside your files or how hard the extraction will be, start here.

€49–€79 (up to ~2 hours)

Deliverables:

  • Branch-map (what’s where, types, vector vs scalar)
  • One demo output (slim ROOT or small CSV)
  • Short QC note (missing branches, compression issues, other pitfalls)
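As a rough illustration of what a branch-map could look like, here is a minimal Python sketch. It is not the workflow used for the deliverable. The function name `build_branch_map`, the example type names, and the rule "a `[]` suffix or `std::vector` prefix means vector" are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple


def build_branch_map(typenames: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return sorted (branch, typename, kind) rows; kind is 'vector' or 'scalar'."""
    rows = []
    for name in sorted(typenames):
        typename = typenames[name]
        # Assumption: per-object (variable-length) branches are reported with a
        # "[]" suffix or a std::vector<...> type; anything else is per-event.
        is_vector = typename.endswith("[]") or typename.startswith("std::vector")
        rows.append((name, typename, "vector" if is_vector else "scalar"))
    return rows


# Illustrative input only; these entries were not read from a real file.
example = {"nPhoton": "uint32_t", "Photon_pt": "float[]", "run": "uint32_t"}
for name, typename, kind in build_branch_map(example):
    print(f"{name:12s} {typename:10s} {kind}")
```

With the uproot library installed, a name-to-type mapping of this shape can probably be obtained from the `Events` tree via `tree.typenames()`; check that call against the uproot version you use before relying on it.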

If you like the result → we move to the full task.

2) Full task in stages (no big-bang risk)

You provide a short technical brief (what objects, what columns, what selection).

We reply with scope + deliverables + timeline.

For complex tasks, we split into 2–4 stages.

Each stage is paid after you receive the result for that stage.

Typical stages:

  1. Discovery: confirm branches, types, event keys, small sample output
  2. Extraction: produce final slim ROOT / flat CSV schema
  3. Labeling (optional): Certified JSON → Orthodox/Heretic labels
  4. QC + plots (optional): sanity checks, distributions, quick validation
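The labeling stage can be sketched in a few lines of Python. CMS certification files map a run number (as a string) to a list of inclusive `[first, last]` luminosity-block ranges. The helper names `load_certified` and `label_event` are hypothetical, the run and lumi values are made up, and this is an illustration rather than the production script.

```python
import json
from typing import Dict, List


def load_certified(json_text: str) -> Dict[int, List[List[int]]]:
    """Parse a certification JSON: {"run": [[first_lumi, last_lumi], ...]}."""
    raw = json.loads(json_text)
    return {int(run): ranges for run, ranges in raw.items()}


def label_event(run: int, lumi: int, certified: Dict[int, List[List[int]]],
                good: str = "Orthodox", bad: str = "Heretic") -> str:
    """Return `good` if (run, lumi) lies inside a certified range, else `bad`."""
    for first, last in certified.get(run, []):
        if first <= lumi <= last:  # ranges are inclusive on both ends
            return good
    return bad


# Made-up run number and lumi ranges, for illustration only.
certified = load_certified('{"281613": [[101, 128], [130, 130]]}')
print(label_event(281613, 120, certified))
print(label_event(281613, 129, certified))
```

The default label strings follow the Orthodox/Heretic naming used on this page; pass `good` and `bad` explicitly if you prefer other names.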

 

🌱 Packages

Trial / Diagnostic (up to ~2 hours)

€49–€79

Best for sanity checks and pilot runs. Validate your branch list before committing to full processing.

Output: branch-map + 1 demo file (slim ROOT/CSV) + QC note

Single-dataset Extract

€150–€300

Includes: 1 TTree, agreed selection, one fixed column schema, 1 revision round

Output: flat CSV (or slim ROOT) + minimal QC note + reproducible run command(s)

Pipeline + QC

€400–€900

Includes: multiple input files, consistent schema, Certified JSON labeling (Orthodox/Heretic), reproducible scripts/macros, QC + basic plots

Output: dataset bundle + logs + quick-read documentation

🪄 Payment & invoicing

We work as a registered business and issue an invoice.

 

Payment in EUR or USD to an IBAN account (for international clients: SEPA/SWIFT).

 

For EU, US, and UK clients, we provide an invoice suitable for accounting purposes. Payment details are specified on the invoice.

 

Payment confirms acceptance of Terms of Service.

👁️ About Us

We’re a team of engineering-driven enthusiasts with roots in web development since 2006. Over time, we shifted toward harder, high-friction problems where discipline, reproducibility, and systematic thinking matter more than “quick hacks”.

 

Along the way, we started exploring fundamental physics and quickly learned how painful it is to work with raw datasets. We built a strict extraction workflow (see the Demo) and now offer it to anyone who needs clean, reliable data outputs.

🧪 Demo / Sample (downloadable)

CMS_2016H_photons_flat_v1 — “Minimal Photon Flat Table”

 

A ready-to-use event-level CSV extracted from CMS Open Data (Run2016H SinglePhoton), flattened to a compact schema:

  • Keys: run, luminosityBlock, event
  • Event context: nPhoton, PV_npvs, rho
  • Leading photon features: pho0_pt, pho0_eta, pho0_phi, pho0_sieie, pho0_r9, pho0_hoe, iso_all, iso_chg, energyErr
  • Includes: photons_flat.csv + photons_flat.meta.json + README.txt (repro notes & validation pointers)
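A quick sanity check on a table like this might look as follows. It is a minimal sketch that assumes the CSV is comma-delimited with a header row containing the three key columns listed above; the function name `check_flat_table` is hypothetical and not part of the demo bundle.

```python
import csv
from typing import Dict, Iterable

KEY_COLUMNS = ("run", "luminosityBlock", "event")


def check_flat_table(lines: Iterable[str]) -> Dict[str, int]:
    """Count rows and repeated (run, luminosityBlock, event) keys."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in KEY_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing key columns: {missing}")
    seen = set()
    rows = 0
    duplicates = 0
    for row in reader:
        rows += 1
        key = tuple(int(row[col]) for col in KEY_COLUMNS)
        if key in seen:
            duplicates += 1  # same event key appears more than once
        seen.add(key)
    return {"rows": rows, "duplicate_keys": duplicates}


# Usage with the demo file (path assumed):
# with open("photons_flat.csv", newline="") as fh:
#     print(check_flat_table(fh))
```

For a one-row-per-event table, `duplicate_keys` should come out as 0.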

 

The dataset contains over 1.2 million events, each represented as a single row with event-level identifiers and leading-photon observables extracted from NanoAOD.

The compressed archive size is 50.3 MB and expands to 132.4 MB after extraction.

c57e8e374f9739451ac11d327158166690a1e9500a0c1880196b7016548ff83c demo.tar.gz
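The 64-character hex string above has the shape of a SHA-256 digest. Assuming that is what it is, a downloaded archive could be checked with a short Python sketch like the one below; the function names and the local path `demo.tar.gz` are assumptions.

```python
import hashlib

# Digest as published on this page (assumed to be SHA-256).
EXPECTED = "c57e8e374f9739451ac11d327158166690a1e9500a0c1880196b7016548ff83c"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected: str = EXPECTED) -> bool:
    """True if the file's SHA-256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage (path assumed):
# print(matches_published("demo.tar.gz"))
```

A `False` result would point to an incomplete or altered download, or to the digest not being SHA-256 after all.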

🖥️ Contact


By submitting, you agree to the Terms of Service.

© Yurii Karapetian, 2025

Independent Practice

Terms of Service

Not affiliated with CERN.

Calm Design

Website Development