Accuracy
Buildable designs. Full coverage.
Footage you can defend.
Every MACRILON release is benchmarked against 174 real, human-engineered as-built FTTH
designs across 3 US markets — the networks an experienced OSP engineering team actually built. We lead
on the two things you can measure directly and absolutely — buildability and serviceable-home
coverage — and we publish footage as an honest engineering estimate band. Here is the current result,
including the misses, and exactly how it is measured.
70% of corpus designs score CONSTRUCTABLE on the buildability audit (C1–C8, including in-ROW, offset-from-road, perpendicular crossings, and no through-buildings)
97.1% of designs land within ±5% of the design of record on serviceable homes passed
±15–20% footage, delivered as an AACE estimate band benchmarked to the 174-design human-variance envelope, not to one engineer
Footage: an honest estimate band, not a single-engineer match
Total route footage is the dominant cost driver, so we keep it accurate for costing, BOM, and BEAD
cost-per-location. We report it the way the construction industry reports estimates: as a tolerance
band benchmarked to the spread that real human designs show against each other, not to one
engineer's line. Two professionals designing the same serving area routinely differ by 15–20% in footage,
and we grade against that envelope. The table below gives the share of benchmark designs whose total
footage falls within each tolerance band of the human as-built (160 scoreable designs of 174; the remainder
lack a comparable footage baseline). The run is deterministic: re-running the benchmark reproduces these
numbers exactly.
| Tolerance band | Designs passing | Rate | Framing |
|---|---|---|---|
| ±10% | 74 / 160 | 46.2% | Tighter than the self-consistency of the human corpus itself |
| ±15% | 105 / 160 | 65.6% | Estimate-grade: matches the AACE 18R-97 Class 2 lower tolerance (−15%); near the floor of human-vs-human variance |
| ±20% | 115 / 160 | 71.9% | Estimate-grade: matches the upper bound of the AACE 18R-97 Class 2 tolerance (−15/+20%) |
| ±25% | 125 / 160 | 78.1% | |
| ±30% | 132 / 160 | 82.5% | Budget-grade: matches the upper bound of the AACE 18R-97 Class 3 tolerance (−20/+30%) |
These bands count boundary cases as failures (strict inequality). In the current
release no design sits exactly on any band boundary, so strict and inclusive counting agree at every row.
When a boundary case exists we publish the strict figure and footnote the inclusive one here rather than
rounding the difference away. The ±15% row above reflects our imitation-learned research routing; the
footage configuration shipped in delivered packages scores slightly more conservatively (~62% at ±15%) and
is tuned to over-build rather than under-build. See the near-miss split below.
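The strict-versus-inclusive counting rule can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual scoring code: `tally_bands` and `format_row` are hypothetical names, and the sample errors are made up. It counts each band both ways and footnotes the inclusive figure only when some design sits exactly on a band edge.

```python
# Hypothetical sketch: tally signed footage errors (in percent) against
# symmetric tolerance bands, counting strictly (|e| < tol) and inclusively
# (|e| <= tol), and report how many designs sit exactly on each edge.
from typing import Dict, List, Tuple

BANDS = (10.0, 15.0, 20.0, 25.0, 30.0)  # ± percent, as in the table above


def tally_bands(
    errors_pct: List[float], bands: Tuple[float, ...] = BANDS
) -> Dict[float, Tuple[int, int, int]]:
    """Return {band: (strict_passes, inclusive_passes, boundary_cases)}."""
    result = {}
    for tol in bands:
        strict = sum(1 for e in errors_pct if abs(e) < tol)
        inclusive = sum(1 for e in errors_pct if abs(e) <= tol)
        # The two counts differ only by designs exactly on the edge.
        result[tol] = (strict, inclusive, inclusive - strict)
    return result


def format_row(tol: float, strict: int, inclusive: int, n: int) -> str:
    """Publish the strict rate; footnote the inclusive rate only if it differs."""
    row = f"±{tol:g}%: {strict}/{n} ({100 * strict / n:.1f}%)"
    if inclusive != strict:
        row += f" [inclusive: {inclusive}/{n} ({100 * inclusive / n:.1f}%)]"
    return row


if __name__ == "__main__":
    # Illustrative errors only, not corpus data.
    sample = [3.2, -8.9, 15.0, -16.2, 18.3, 24.1, -31.0]
    for tol, (s, i, _) in tally_bands(sample).items():
        print(format_row(tol, s, i, len(sample)))
```

In this sample the design at exactly +15.0% fails ±15% under strict counting, so that row is the only one that carries an inclusive footnote.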
The near-miss band, published with signs
10 designs sit in the 15–20% band: 4 over-builds (+15.7% to +18.3%) and 6 under-builds
(−16.2% to −19.8%). Median absolute footage error across all 160 scoreable designs is 11.2%.
The current release's routing is imitation-learned from the human as-builts themselves; it pulled most of
the previous release's over-build near-misses inside the ±15% gate, leaving a smaller, two-sided band.
We publish the signed split rather than keep a directional claim that no longer holds.
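A minimal sketch of how the signed near-miss split and the median absolute error could be computed from per-design signed footage errors. The function names and sample values are hypothetical. The band edges follow the strict counting rule above, so a design lands in the near-miss band when it fails ±15% but passes ±20%, that is, when 15 ≤ |error| < 20.

```python
# Hypothetical sketch: split the near-miss band by sign and compute the
# median absolute footage error. Inputs are signed percent errors
# (positive = more footage than the as-built, negative = less).
from statistics import median
from typing import Dict, List


def near_miss_split(
    errors_pct: List[float], lo: float = 15.0, hi: float = 20.0
) -> Dict[str, List[float]]:
    """Designs with lo <= |e| < hi, separated into over- and under-builds."""
    band = [e for e in errors_pct if lo <= abs(e) < hi]
    return {
        "over": sorted(e for e in band if e > 0),   # over-builds
        "under": sorted(e for e in band if e < 0),  # under-builds
    }


def median_abs_error(errors_pct: List[float]) -> float:
    """Median of |signed error| across all scoreable designs."""
    return median([abs(e) for e in errors_pct])


if __name__ == "__main__":
    # Illustrative errors only, not corpus data.
    sample = [3.2, -8.9, 15.7, -16.2, 18.3, -19.8, 24.1]
    print(near_miss_split(sample))
    print(median_abs_error(sample))
```

Keeping the sign, rather than reporting only |error|, is what lets a reader see whether misses lean toward over-build or under-build.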
Why footage is a band, not a single-engineer match
Context for reading the table honestly — grading a design to one human's exact footage is, on its own, the wrong test.
- Human designs disagree with each other by 15–20%. Our own scoring code measures about an 11%
footage difference between two legitimate, correct designs of the same serving area even on identical
geometry, and more once routing choices diverge. A reference that disagrees with itself by 15–20% cannot
serve as pass/fail truth, so we benchmark to that variance envelope, not to one engineer.
- AACE International 18R-97 — the construction-industry standard for estimate accuracy — expects
−20%/+30% from a Class 3 (budget-authorization) estimate and −15%/+20% from a Class 2 (bid/tender)
estimate, at 80% confidence. A pre-construction automated design is a Class 3/Class 2-maturity artifact;
our ±15–20% band sits inside what the industry accepts from human estimators, and our near misses fall on
both sides of the as-built, as the signed split above shows.
- The leading incumbent (incumbent tools) publishes its own variance — comparing its rapid designs to its own
downstream detailed design, not to as-builts — at 18% on new conduit and 10% on access cable.
Quantity deltas of 10–18% are the best public number the market leader claims against itself, which is
exactly the envelope our band lives in.
- Realized construction costs vary more than any design: the 2025 industry cost survey reports an
interquartile range of roughly ±40% on per-foot underground deployment cost across real projects, and
standard practice adds ≥10% contingency even to final engineered estimates.
- To our knowledge, no other vendor publishes design-vs-as-built accuracy at all. This page exists
because we own the only paired corpus that can measure it — and because you should not have to take a
design tool's accuracy on faith.
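The AACE ranges quoted above are asymmetric, while the table's bands are symmetric, so the two agree on over-builds but not on under-builds. The hypothetical sketch below makes that concrete using only the worst-case bounds cited on this page; treating the AACE bounds as inclusive is an assumption. For example, an under-build of −19.8% passes a strict ±20% band but falls below the Class 2 lower bound of −15%, while still sitting inside Class 3.

```python
# Hypothetical sketch: compare a symmetric ± band with the asymmetric
# AACE 18R-97 worst-case ranges quoted on this page.
# Assumption: AACE bounds are treated as inclusive here.
AACE_RANGES = {
    "Class 2": (-15.0, 20.0),  # bid/tender estimate
    "Class 3": (-20.0, 30.0),  # budget-authorization estimate
}


def within_symmetric(error_pct: float, tol: float) -> bool:
    """Strict symmetric band, matching the table's counting rule."""
    return abs(error_pct) < tol


def within_aace(error_pct: float, aace_class: str) -> bool:
    """True if a signed error sits inside the given AACE class range."""
    low, high = AACE_RANGES[aace_class]
    return low <= error_pct <= high


if __name__ == "__main__":
    for e in (18.3, -19.8, -25.0):
        print(e, within_symmetric(e, 20.0),
              within_aace(e, "Class 2"), within_aace(e, "Class 3"))
```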
Methodology
- Benchmark corpus: 174 real subdivision-scale FTTH designs across 3 US markets — Holly Springs
NC (80 designs), Chesterfield VA (57), and Hampton VA (37) — engineered and built by professional OSP
teams. We disclose the concentration plainly: the corpus comes from a single engineering lineage (one
firm's design standards), so published accuracy is measured on that distribution; a fourth market
(Greensboro NC) is being onboarded and is not yet in the benchmark. For each design we hold the human
design of record (routes, splitters, workbook) as ground truth. 160 designs have a comparable
total-footage baseline and are scored on footage; all 174 are scored on homes passed and design rules.
- Procedure: the engine receives only the boundary and the design standards — never the human
answer. It produces its full package; we then compare total constructed footage, homes passed, and
rule compliance (split architecture, port budgets, spare capacity) against the as-built.
- Footage score: signed percent difference of total design footage vs the human design.
A band "passes" when |difference| is strictly below the tolerance; a design exactly on the boundary fails.
- Homes passed: 97.1% of designs match the engineer-of-record serviceable-home count within
±5% tolerance, using only open authoritative data — no licensed location datasets. This is a direct,
absolute measurement against ground truth — not a comparison to a single subjective line.
- Buildability: every design is scored C1–C8 by our constructability audit — offset-from-road,
in-ROW, no through-buildings, correct side, perpendicular crossings, coverage, access, documentation —
producing a 0–100 score and a band (CONSTRUCTABLE / NEEDS-REVIEW / NOT-BUILDABLE). About 70% of the
corpus scores CONSTRUCTABLE. This is an advisory field check, not yet a hard release gate, and
the auditor itself is still maturing (a 2026-06-18 fix corrected an over-penalty that had wrongly
flagged buildable jobs). We report it because it measures the thing that actually matters in the field —
can a crew build this as drawn — without needing the human's answer key.
- Cost: the costed workbook is built line-item from the customer's own disclosed unit rates
(template rates are the fallback only when no rate card is present). On our calibrated benchmark build
(Holly Springs) it landed within ~2% of the engineer-of-record estimate for the same scope. On our
first never-seen-market verification run (Hampton VA), the customer-rates estimate priced
+10.9% above the engineer of record ($334,013 vs $301,106, like-for-like scope including
permits and provisioning), with the residual itemized — about half route-geometry difference, half
BOM semantics on splitter hardware. We measure and publish that gap per market rather than quote the
calibrated figure everywhere — rate-card calibration is part of onboarding.
- Reproducibility: the benchmark pipeline is deterministic — the same release re-scored against
the corpus produces byte-identical results. Numbers on this page update only when a release moves them,
in either direction.
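The footage, homes-passed, and cost comparisons above all reduce to a signed percent difference against the design of record. A minimal sketch under stated assumptions: the function names are hypothetical, not the engine's actual code; the strict pass rule comes from the footage-score definition, and applying the same strict rule to the ±5% homes-passed check is an assumption. It reproduces the published +10.9% Hampton VA cost delta from the two dollar figures.

```python
# Hypothetical sketch of the per-design comparisons described above.
# Function names are illustrative, not the engine's actual code.


def signed_pct_diff(engine_value: float, reference_value: float) -> float:
    """Signed percent difference of the engine's value vs the design of record."""
    if reference_value == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (engine_value - reference_value) / reference_value


def passes_band(engine_value: float, reference_value: float, tol_pct: float) -> bool:
    """Strict pass: |difference| must be below tol_pct; boundary cases fail."""
    return abs(signed_pct_diff(engine_value, reference_value)) < tol_pct


if __name__ == "__main__":
    # Cost figures published above for the Hampton VA run.
    delta = signed_pct_diff(334_013, 301_106)
    print(f"cost delta: {delta:+.1f}%")  # prints "cost delta: +10.9%"
    # Illustrative footage and home counts, not corpus data.
    print(passes_band(51_800, 45_000, 15.0))  # about +15.1%, fails ±15%
    print(passes_band(203, 200, 5.0))         # +1.5%, passes ±5%
```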
Page last updated June 18, 2026, from the benchmark run of record for the current release.
Audit us against your own network
The benchmark proves we build to one team's standard. Send a boundary your team has built, and check the buildability, coverage, and footage band against yours.