docs(benchmarks): rewrite report-v1.md as Draft skeleton (Phase 5 deliverable #5) #90

Merged
navigator merged 1 commit from auto/issue-86-20260526T040124Z_issue86 into main 2026-05-26 01:08:50 -03:00
Owner

Summary

Promote docs/benchmarks/report-v1.md from Stub to Draft skeleton, the Phase 5 deliverable #5 in PLAN.md Section 10. Adds the canonical landing spot for head-to-head benchmark results so Phase 2 (baseline upstream Gemmini) and Phase 3 (custom multi-Gemmini cluster) sim runs have a place to drop measured rows.

Also adds docs/benchmarks/README.md as the directory index — lists baseline-gemmini-rocket.md and report-v1.md with status taxonomy and the sim-only / honest-reporting conventions that apply to every row added later.

Sections in report-v1.md

  1. Overview — sim-only scope statement
  2. Methodology — Verilator harness, ~10 kHz sim clock budget, what counts as one inference
  3. Baselines — pins GemminiRocketConfig reference (PLAN.md Section 7)
  4. Single-chip FluidPopSoC results — placeholder table, acceptance band 2.5–3.5x (PLAN.md Section 8.3)
  5. 8-chip board results — placeholder table, acceptance band 4–6x over single-chip (PLAN.md Section 8.3)
  6. Workloads — ResNet-50 single-chip + Llama-7B 4-bit 8-chip partitioned
  7. Caveats — explicit sim-only / analytic-vs-measured / PDK-not-production reporting bounds (PLAN.md Section 9.3, 14.6)
  8. Open questions — secondary metrics (power, BW util, scratchpad hit rate) parked here
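The acceptance bands and the ~10 kHz sim clock budget listed above can be sketched as a small helper. This is only an illustration, not code from the repository: the names `Band`, `speedup`, and `sim_wall_clock_seconds` are hypothetical, and it assumes speedup is computed as reference cycles divided by candidate cycles per inference, which PLAN.md may define differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Closed acceptance interval for a speedup ratio (hypothetical helper)."""
    low: float
    high: float

    def classify(self, value: float) -> str:
        # Report where a measured speedup falls relative to the band.
        if value < self.low:
            return "below"
        if value > self.high:
            return "above"
        return "within"


# Bands quoted in the section list (PLAN.md Section 8.3).
SINGLE_CHIP_BAND = Band(2.5, 3.5)  # single-chip FluidPopSoC vs. GemminiRocketConfig
BOARD_8CHIP_BAND = Band(4.0, 6.0)  # 8-chip board vs. single-chip


def speedup(reference_cycles: int, candidate_cycles: int) -> float:
    """Assumed definition: cycles of the reference run / cycles of the candidate run."""
    if reference_cycles <= 0 or candidate_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return reference_cycles / candidate_cycles


def sim_wall_clock_seconds(cycles: int, sim_hz: float = 10_000.0) -> float:
    """Rough wall-clock estimate for a run at the ~10 kHz simulation clock."""
    return cycles / sim_hz
```

Any cycle counts passed to these helpers should come from an actual sim run. The report's tables stay TBD until Phase 2/3 produce measurements.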

Acceptance criteria — checklist

  • [x] docs/benchmarks/report-v1.md exists with the 8 required sections
  • [x] Status: Draft skeleton + Owner: TBD header
  • [x] Each section has a TODO marker and a 2-3 line intent description
  • [x] Placeholder tables present with header rows; all data cells TBD
  • [x] Explicitly states sim-only scope and references Section 9.3 honest-reporting language
  • [x] References Sections 7, 8.3, 9.3, 14.6 in the header and inline
  • [x] docs/benchmarks/README.md created listing this deliverable

No fabricated numbers — tables stay empty until Phase 2/3 produce measurements.
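The "all data cells TBD" criterion could be checked mechanically. The following is a minimal sketch, not part of this PR or the repository's CI: the function name `non_tbd_cells` is hypothetical, and it assumes the placeholder tables are Markdown pipe tables whose first column is a row label rather than data.

```python
import re

# Matches a pipe-table separator row such as "| --- | :---: |".
SEPARATOR = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def non_tbd_cells(markdown: str, label_columns: int = 1) -> list[tuple[int, str]]:
    """Return (line_number, cell_text) for every table data cell that is not 'TBD'.

    Assumption: the first `label_columns` cells of each row are labels
    (for example a workload name) and are not checked.
    """
    offenders: list[tuple[int, str]] = []
    lines = markdown.splitlines()
    in_table = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not in_table:
            # A table body starts after a separator row that follows a header row.
            if (
                number > 1
                and "|" in stripped
                and SEPARATOR.match(stripped)
                and "|" in lines[number - 2]
            ):
                in_table = True
            continue
        if "|" not in stripped:
            # First non-table line ends the current table.
            in_table = False
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        for cell in cells[label_columns:]:
            if cell != "TBD":
                offenders.append((number, cell))
    return offenders
```

An empty return value means every checked cell is still a placeholder. Once Phase 2/3 rows land, a check like this would need to be relaxed or replaced by one that requires a cited sim run per row.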

Closes #86

docs(benchmarks): rewrite report-v1.md as Draft skeleton (Phase 5 deliverable #5)
All checks were successful
build / scalafmt-check (pull_request) Successful in 3s
build / sbt-compile (pull_request) Successful in 3s
build / shell-lint (pull_request) Successful in 15s
fb610b6a2c
Promote docs/benchmarks/report-v1.md from Stub to Draft skeleton so the
Phase 5 head-to-head benchmark deliverable (PLAN.md Section 10, #5) has
its canonical landing spot before Phase 2/3 simulation runs start
producing numbers. Skeleton enumerates: Overview, Methodology, Baselines,
single-chip FluidPopSoC, 8-chip board, Workloads, Caveats, Open
questions — each with TODO + intent and placeholder tables (header rows,
TBD data cells). No fabricated numbers per PLAN.md Section 9.3.

Add docs/benchmarks/README.md as the index for the directory, listing
report-v1 and the baseline doc with status taxonomy and the sim-only /
honest-reporting conventions that apply to every numbers row added
later.

References Sections 7 (baseline), 8.3 (testbench plan + 2.5-3.5x /
4-6x acceptance bands), 9.3 (honest reporting / PDK-scaling caveat),
14.6 (analytic performance model >=80% target).
Author
Owner

VERDICT: PASS

PR #90 adds the Phase 5 deliverable #5 skeleton (docs/benchmarks/report-v1.md) plus a companion docs/benchmarks/README.md directory index, both scoped cleanly to Issue #86. The mechanical guardrails all line up: SPDX CC-BY-SA-4.0 headers are correct on both files (new on the README, preserved on the existing report), a single directory is touched, there are no off-limits paths and no AI/Anthropic attribution, and there is no Chisel module, so rules 4–6 don't apply.

Rule 7 is the one that matters most here, and the PR handles it carefully: every result-table cell is literally TBD, every section is a TODO, and the doc carries three separate "no fabricated numbers" callouts. These are the top-of-file sim-only block, the §7 Caveats list (sim-only Phase 1, analytic-model-is-not-measurement, PDK-is-not-production, baseline-freshness), and the closing paragraph that mandates citing the producing sim run for any row added later.

The numeric claims that do appear are all attributed to PLAN.md sections or already-established project documents: the 2.5–3.5× single-chip acceptance band matches what MultiGemminiCluster.SPEC.md cites for §8.3 (consistent with PR #81); the 4–6× 8-chip band cites §8.3; the ~10 kHz Verilator ceiling cites §8.3; the ≥80% analytic-model regression target cites §14.6; "Sky130 or IHP130" is consistent with the project's two-tier process plan.

The companion README sets up sensible conventions for docs/benchmarks/ (sim-only scope, reproducibility via Chipyard/Gemmini/Verilator revision capture, honest reporting, and the same Status taxonomy used in docs/spec/) and includes a forward link to baseline-gemmini-rocket.md as the speedup denominator without committing baseline numbers. The tables have header rows but their data cells are uniformly TBD, exactly as the acceptance criteria require.

Findings

None.

fluidpop-bot approved these changes 2026-05-26 01:07:22 -03:00
Dismissed
fluidpop-bot left a comment
Collaborator

CI green (head fb610b6a2c311edf19878fb2f25468a36bfad512), auto-approving
navigator force-pushed auto/issue-86-20260526T040124Z_issue86 from fb610b6a2c to 80ce6e8ef6 2026-05-26 01:07:26 -03:00
fb610b6a2c: All checks were successful
build / scalafmt-check (pull_request) Successful in 3s
build / sbt-compile (pull_request) Successful in 3s
build / shell-lint (pull_request) Successful in 15s
80ce6e8ef6: All checks were successful
build / scalafmt-check (pull_request) Successful in 3s
build / sbt-compile (pull_request) Successful in 3s
build / shell-lint (pull_request) Successful in 1m7s
fluidpop-bot left a comment
Collaborator

CI green (head 80ce6e8ef6f7d428eb16d547596b5b586567059d), auto-approving
Reference
Fluid/fluidpop-v1!90