Benchmarks¶

This page summarizes the benchmark scripts and recent local results. Numbers will vary by machine and solver versions.

Benchmark Environment¶

Use a dedicated virtual environment under benchmark/ to avoid polluting your global Python:

python3 -m venv benchmark/.venv
source benchmark/.venv/bin/activate
python -m pip install --upgrade pip

# ILPQEC + optional solver backends
python -m pip install -e ".[pyomo,gurobi]"

# Benchmark dependencies
python -m pip install stim pymatching ldpc

Notes:

Gurobi requires a valid license and Python < 3.13.
If you do not need Gurobi, drop gurobi from the extras.
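Before launching a long run, it can help to confirm that the environment matches the setup above. The following is a minimal preflight sketch, not part of the benchmark scripts: the helper names are hypothetical, and it assumes the top-level import names stim, pymatching, and ldpc match the package names installed above.

```python
import importlib.util
import sys

# Assumed top-level import names for the benchmark dependencies listed above.
BENCHMARK_MODULES = ["stim", "pymatching", "ldpc"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gurobi_python_ok(version_info=sys.version_info):
    """Apply the 'Python < 3.13' constraint stated in the notes above."""
    return tuple(version_info[:2]) < (3, 13)


if __name__ == "__main__":
    print("missing benchmark modules:", missing_modules(BENCHMARK_MODULES))
    print("Python version acceptable for Gurobi:", gurobi_python_ok())
```

Run it with the virtual environment's interpreter so that it inspects the same site-packages the benchmarks will use. It does not check for a Gurobi license.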

Requirements¶

pip install stim pymatching ldpc tesseract-decoder

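Non-HiGHS solvers require the Pyomo extra (pip install "ilpqec[pyomo]"), except for the direct Gurobi backend (pip install "ilpqec[gurobi]"). Quote the extras so that shells such as zsh do not interpret the square brackets as a glob pattern.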

Notes:

Direct backends: HiGHS, Gurobi.
Pyomo backends: HiGHS, SCIP, CBC, GLPK, Gurobi, CPLEX.
BPOSD runs with max_iter=50, osd_order=0, and bp_method=minimum_sum.
Tesseract runs with det_beam=50 by default (adjustable via --tesseract-beam).

Circuit-level rotated surface code memory¶

benchmark/.venv/bin/python benchmark/benchmark_decoders.py \
  --compare-ilp-solvers --ilp-solvers highs,scip,gurobi,cbc,glpk \
  --shots 10000 --distance 3 --rounds 3 --noise 0.01

Results from a local macOS arm64 run (shots=10000):

Decoder	Time (ms/shot)	Logical Error Rate
ILP[highs] (direct)	2.7514	1.640%
ILP[gurobi] (direct)	0.6403	1.650%
ILP[scip] (Pyomo)	28.2160	1.670%
ILP[cbc] (Pyomo)	14.9315	1.670%
ILP[glpk] (Pyomo)	8.6292	1.670%
MWPM (pymatching)	0.0035	2.150%
BPOSD (ldpc)	0.0308	7.680%
Tesseract	0.1602	1.640%
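When reading the logical error rates above, keep the sampling noise at 10000 shots in mind. The sketch below estimates how many standard errors separate two rates, under the assumption that each rate is an independent binomial estimate. The benchmark may in fact decode the same samples with every decoder, in which case this treatment is only a rough yardstick. The function names are illustrative and not part of the benchmark scripts.

```python
import math


def binomial_standard_error(rate, shots):
    """Standard error of a failure rate estimated from `shots` independent trials."""
    return math.sqrt(rate * (1.0 - rate) / shots)


def gap_in_standard_errors(rate_a, rate_b, shots):
    """Difference between two rates, in units of the standard error of that
    difference (assumes the two estimates are independent)."""
    se_a = binomial_standard_error(rate_a, shots)
    se_b = binomial_standard_error(rate_b, shots)
    return abs(rate_a - rate_b) / math.sqrt(se_a**2 + se_b**2)


shots = 10_000
highs = 0.0164  # ILP[highs] (direct), from the table above
scip = 0.0167   # ILP[scip] (Pyomo), from the table above
mwpm = 0.0215   # MWPM (pymatching), from the table above

print(f"standard error of the highs rate: {binomial_standard_error(highs, shots):.4%}")
print(f"highs vs scip: {gap_in_standard_errors(highs, scip, shots):.2f} standard errors")
print(f"highs vs mwpm: {gap_in_standard_errors(highs, mwpm, shots):.2f} standard errors")
```

The standard error of a single rate near 1.6% at 10000 shots is about 0.13 percentage points, so the spread among the ILP solvers is well inside the noise, while the gap to MWPM is not.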

Code-capacity surface code (data errors only, perfect syndrome)¶

benchmark/.venv/bin/python benchmark/benchmark_decoders.py \
  --noise-model code_capacity --compare-ilp-solvers \
  --ilp-solvers highs,scip,gurobi,cbc,glpk \
  --shots 10000 --distance 3 --rounds 1 --noise 0.01

Results from a local macOS arm64 run (shots=10000):

Decoder	Time (ms/shot)	Logical Error Rate
ILP[highs] (direct)	3.2321	0.070%
ILP[gurobi] (direct)	0.0838	0.070%
ILP[scip] (Pyomo)	23.4834	0.070%
ILP[cbc] (Pyomo)	10.4697	0.070%
ILP[glpk] (Pyomo)	5.0085	0.070%
MWPM (pymatching)	0.0036	0.070%
BPOSD (ldpc)	0.0028	0.070%
Tesseract	0.0093	0.070%

Color code (`color_code:memory_xyz`)¶

benchmark/.venv/bin/python benchmark/benchmark_decoders.py \
  --code-task color_code:memory_xyz --compare-ilp-solvers \
  --ilp-solvers highs,scip,gurobi,cbc,glpk \
  --shots 10000 --distance 3 --rounds 3 --noise 0.01

Results from a local macOS arm64 run (shots=10000):

Decoder	Time (ms/shot)	Logical Error Rate
ILP[highs] (direct)	2.0226	4.450%
ILP[gurobi] (direct)	0.3184	4.420%
ILP[scip] (Pyomo)	24.9402	4.420%
ILP[cbc] (Pyomo)	11.6961	4.450%
ILP[glpk] (Pyomo)	6.0799	4.420%
MWPM (pymatching)	0.0034	13.420%
BPOSD (ldpc)	0.0114	9.830%
Tesseract	0.0600	4.450%
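The three benchmark invocations above differ only in a few flags, so they can be assembled from one helper. This is a sketch that uses only the flags shown on this page. The helper name build_command and its keyword arguments are hypothetical, and the relative paths assume you run from the repository root.

```python
import shlex

PYTHON = "benchmark/.venv/bin/python"
SCRIPT = "benchmark/benchmark_decoders.py"
ILP_SOLVERS = "highs,scip,gurobi,cbc,glpk"


def build_command(distance, rounds, noise=0.01, shots=10000,
                  noise_model=None, code_task=None):
    """Build the argument list for one benchmark_decoders.py run."""
    cmd = [PYTHON, SCRIPT]
    if noise_model is not None:
        cmd += ["--noise-model", noise_model]
    if code_task is not None:
        cmd += ["--code-task", code_task]
    cmd += [
        "--compare-ilp-solvers", "--ilp-solvers", ILP_SOLVERS,
        "--shots", str(shots), "--distance", str(distance),
        "--rounds", str(rounds), "--noise", str(noise),
    ]
    return cmd


commands = {
    "circuit_level": build_command(3, 3),
    "code_capacity": build_command(3, 1, noise_model="code_capacity"),
    "color_code": build_command(3, 3, code_task="color_code:memory_xyz"),
}

for name, cmd in commands.items():
    print(f"# {name}")
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute the run. Building an argument list rather than a shell string avoids quoting problems.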

DEM equivalent distance on rotated surface-code memory¶

Use the dedicated exact-analysis benchmark when you want to measure how large a Stim DEM instance can be solved to proven optimality:

benchmark/.venv/bin/python benchmark/benchmark_dem_distance.py \
  --distances 3,5,7,9 --solver highs --time-limit 300

This benchmark currently targets only the exact direct-HiGHS path. Keep --solver highs; other solver names are rejected at the CLI layer.

For each requested distance, the script generates one circuit, converts it to a decomposed DEM, and parses the DEM with Decoder()._parse_dem(...). It then runs dem_distance(...) on that exact model with merge_parallel_edges=True and flatten_dem=True, the same settings used to produce the mechanisms count.

Each output row reports:

distance: the generated circuit distance
rounds: the number of syndrome-extraction rounds used for that instance
detectors: the DEM detector count
mechanisms: the number of parsed DEM error mechanisms after flattening and parallel-edge merging
equivalent_distance: the solved exact minimum logical-fault weight when the solve succeeds
time: wall-clock runtime for the dem_distance(...) call
status: optimal when HiGHS proves optimality, otherwise the solver error text for failures or time-limited runs

This is a local exact benchmark, so the largest solvable distance depends on your machine, solver version, noise model, and time limit.
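To summarize a run, you may want the largest distance that was solved to proven optimality. The sketch below assumes each output row has already been loaded into a dictionary keyed by the field names listed above. That representation is an assumption, as is the helper name, and the example rows are placeholders rather than measured results.

```python
def largest_proven_distance(rows):
    """Return the row with the largest distance whose status is 'optimal', or None."""
    proven = [row for row in rows if row["status"] == "optimal"]
    return max(proven, key=lambda row: row["distance"]) if proven else None


# Placeholder rows for illustration only; these are not measured results.
# The non-optimal status text is hypothetical; the script reports the solver's own message.
example_rows = [
    {"distance": 3, "equivalent_distance": 3, "time": 0.1, "status": "optimal"},
    {"distance": 5, "equivalent_distance": 5, "time": 2.0, "status": "optimal"},
    {"distance": 7, "equivalent_distance": None, "time": 300.0, "status": "time limit reached"},
]

best = largest_proven_distance(example_rows)
print(best["distance"] if best else "no instance solved to optimality")
```

Filtering on status rather than on the presence of equivalent_distance keeps the summary limited to results that HiGHS actually proved optimal.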
