spec(MultiGemminiCluster): promote MultiGemminiCluster.SPEC.md to Draft #93

Merged
navigator merged 1 commit from spec/multigemminicluster-draft into main 2026-05-26 01:59:37 -03:00
Owner

Refs #52 (companion Chisel skeleton, currently agent:pr-open).

This PR is opened by the autonomous spec-designer role; per
infra/ops/agents/roles/spec-designer.sh the role is designer-driven
and runs without a dedicated source issue, so no Closes # is attached.
If the swarm policy requires a tracking issue per promotion, the
follow-up tweak belongs in the role script, not in this SPEC change.

Summary

Promotes rtl/src/pop/specs/MultiGemminiCluster.SPEC.md from
Status: Stub to Status: Draft. No behaviour, no Chisel, no
edits outside the single SPEC file. The IO contract is pinned to ADRs
and to sibling SPECs already in Draft (FluidPopSoC,
InterChipFabric); widths and signal sets that no source pins are
recorded as _Open question:_ rather than invented.

Resolved TBDs

  • §Interface — RoCC attach (one per local Rocket core): core
    count tracks ADR-006 Decision (Edu: 2 Rocket cores; Pro: 4 cores).
    Cluster receives already-decoded dispatches from PopRoCCRouter,
    whose 2-bit gemmini-id encoding in funct7 is pinned by ADR-006
    Consequences and selects one of the four ADR-001 tiles.
  • §Interface — Memory attach toward DDR5 controller: DDR5
    channel count tracks ADR-005 Decision (Edu: 1 channel; Pro: 2
    channels). Intra-chip L2 coherence boundary pinned to ADR-011
    Decision: "Within a chip, 4 Gemminis share L2 coherently."
  • §Interface — Cluster-side attach toward InterChipFabric:
    cross-chip transaction set pinned to
    rtl/src/pop/specs/InterChipFabric.SPEC.md §Interface (sibling
    Draft), which in turn cites PLAN.md §3.6. Cross-chip memory model
    pinned to ADR-011 Decision.
  • §Behavior — Per-tile dispatch driven by PopRoCCRouter: fully
    pinned to ADR-006 Consequences (2-bit gemmini-id in funct7) and
    ADR-001 Decision (four Edu Gemmini tiles).
  • §Behavior — Per-Gemmini scratchpad, peer-to-peer via
    InterGemminiXbar: per-tile sizing pinned to ADR-004 Decision
    (Edu: 256 KiB SP + 64 KiB acc; Pro: 1 MiB SP + 256 KiB acc).
    Peer-to-peer routing pinned to ADR-003 Decision ("Implement a 4x4
    crossbar (InterGemminiXbar.scala) connecting scratchpad ports
    directly").
  • §Behavior — DDR5 access via local L2: intra-chip path pinned to
    ADR-011 Decision; cross-chip exit pinned to ADR-011 Decision
    ("Cross-chip data movement is explicit via PopLink transactions").
  • §Behavior — Quiescence and drain on cluster-stop CSR: scoping
    pinned to ADR-011 (local-chip only). Cluster-stop CSR is the
    runtime entry point for the chip-level quiescence sequence already
    recorded in rtl/src/pop/specs/FluidPopSoC.SPEC.md §Behavior.
  • §Invariants — Single-chip 2.5×–3.5× speedup: verbatim from
    PLAN.md §8.3 Acceptance.
  • §Invariants — Four Gemmini tiles per cluster (Edu): verbatim
    from ADR-001 Decision; drives ADR-004 per-Gemmini scratchpad
    partition and the ADR-003 4×4 crossbar width.
  • §Invariants — Intra-chip coherence at L2: verbatim ADR-011
    Decision; cross-chip non-coherence also verbatim ADR-011.
  • §Invariants — No inter-tile data corruption under concurrent
    peer traffic: derived from ADR-003 Decision plus the PLAN.md
    §8.3 single-chip system-test acceptance bar.
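
As a rough illustration of the dispatch rule in the §Interface and §Behavior entries above, the sketch below extracts a 2-bit gemmini-id from the funct7 field of an instruction word. funct7 occupying bits [31:25] is the standard RISC-V R-type layout. The placement of the id in the low two bits of funct7, the helper names, and the use of the custom-0 opcode are assumptions made for this example only; the sources quoted here pin the field width, not its position.

```python
# Sketch only. ASSUMPTION: the gemmini-id sits in funct7[1:0]; the quoted
# ADR-006 text pins a 2-bit field in funct7 but not where it lives.

FUNCT7_SHIFT = 25        # RISC-V R-type: funct7 is instruction bits [31:25]
FUNCT7_MASK = 0x7F       # funct7 is 7 bits wide
GEMMINI_ID_BITS = 2      # width quoted from ADR-006 Consequences
GEMMINI_ID_MASK = (1 << GEMMINI_ID_BITS) - 1
CUSTOM0_OPCODE = 0x0B    # RISC-V custom-0 major opcode, used here for illustration


def funct7_of(instruction: int) -> int:
    """Return the 7-bit funct7 field of a 32-bit instruction word."""
    return (instruction >> FUNCT7_SHIFT) & FUNCT7_MASK


def gemmini_id_of(instruction: int) -> int:
    """Return the assumed 2-bit gemmini-id (0..3) carried in funct7."""
    return funct7_of(instruction) & GEMMINI_ID_MASK


def encode_custom(funct7: int, opcode: int = CUSTOM0_OPCODE) -> int:
    """Hypothetical helper: build a word with only funct7 and opcode populated."""
    return ((funct7 & FUNCT7_MASK) << FUNCT7_SHIFT) | (opcode & 0x7F)


if __name__ == "__main__":
    for f7 in (0b0000000, 0b0000001, 0b0000010, 0b0000011):
        word = encode_custom(f7)
        print(f"funct7={f7:07b} -> tile {gemmini_id_of(word)}")
```

Under this assumed layout every funct7 value maps to exactly one of four tile ids, which matches the four Edu tiles of ADR-001.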

Open questions (recorded as _Open question:_ in the SPEC)

  • Per-tile bundle shape between PopRoCCRouter and the cluster —
    tracks rtl/src/pop/specs/PopRoCCRouter.SPEC.md §Interface
    (currently Status: Stub).
  • Memory-protocol choice (AXI vs TileLink), per-master ID width,
    in-flight outstanding limit, and L2 configuration (size,
    associativity, banking) — pending Chipyard pin (PLAN.md §6) and
    DDR5 PHY vendor-selection ADR.
  • Cluster-side bundle shape toward InterChipFabric (port count,
    request-ID width, in-flight tracking depth) — mirrors the same
    Open question on the sibling Draft's side; resolves jointly.
  • Performance-counter set, counter widths, CSR address-map, and
    access discipline (MMIO vs RoCC-config) — pending CSR-policy
    ADR.
  • Pro 16-Gemmini reconciliation with the ADR-006 2-bit funct7
    field — pending follow-up ADR amending the gemmini-id encoding
    for the Pro variant.
  • Scratchpad-port bundle shape (read / write port count per tile,
    eviction policy, bank-conflict resolution) — tracks
    rtl/src/pop/specs/InterGemminiXbar.SPEC.md §Interface
    (currently Status: Stub).
  • L2-side arbitration policy and DDR5-side request scheduling
    discipline (open-page / closed-page, per-bank queue depth) —
    pending DDR5 controller selection ADR.
  • Cluster-stop CSR address, register width, completion-signalling
    discipline (interrupt vs polled), and per-block drain order —
    pending the same clocking / reset ADR called out in
    rtl/src/pop/specs/FluidPopSoC.SPEC.md §Behavior.
  • Worst-case scratchpad-eviction stall bound — resolves once the
    PLAN.md §8.3 step-3 ResNet-50 single-chip bench surfaces an
    empirical worst case.
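
The Pro reconciliation item above comes down to simple arithmetic: a 2-bit field distinguishes four tiles, while 16 tiles need four bits. The sketch below works that out. The function names are hypothetical and nothing here proposes an encoding; that remains with the follow-up ADR.

```python
# Worked arithmetic for the 2-bit gemmini-id versus the Pro tile count.
# Function names are illustrative, not taken from the SPEC or any ADR.


def addressable_tiles(id_bits: int) -> int:
    """Number of distinct tile ids an id field of the given width can express."""
    if id_bits < 0:
        raise ValueError("id_bits must be non-negative")
    return 1 << id_bits


def id_bits_needed(tile_count: int) -> int:
    """Smallest field width that can name every tile, i.e. ceil(log2(tile_count))."""
    if tile_count < 1:
        raise ValueError("tile_count must be at least 1")
    return (tile_count - 1).bit_length()


if __name__ == "__main__":
    print(addressable_tiles(2))   # tiles reachable with the ADR-006 2-bit field
    print(id_bits_needed(4))      # Edu: four tiles (ADR-001)
    print(id_bits_needed(16))     # Pro: sixteen tiles (ADR-001)
```

With the quoted figures, the Edu variant fits the existing field exactly, and the Pro variant exceeds it by two bits.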

Constraint check

  • **Status:** line flipped Stub → Draft.
  • Every TBD resolved (pin to ADR / PLAN / sibling Draft, or
    recorded as _Open question:_). No fabricated widths or
    signal sets.
  • Only rtl/src/pop/specs/MultiGemminiCluster.SPEC.md changes
    (ADR-017 off-limits path policy honoured).
  • All ADR / PLAN refs from the Stub preserved (PLAN.md §8.2,
    §12.3, ADR-001, ADR-002, ADR-004). Draft body cites
    additional ADRs the body relies on for pins (PLAN.md §8.3,
    ADR-003, ADR-005, ADR-006, ADR-011) — added refs only,
    none removed.
  • Sibling Drafts cited as pins only when actually in Draft
    (FluidPopSoC, InterChipFabric). Stub siblings
    (PopRoCCRouter, InterGemminiXbar) appear only inside
    Open questions, never as a Draft-pin source.
  • ## Promotion history appended with the 2026-05-26 entry.
  • No fabricated performance / latency / capacity / vendor
    commitments — every number quoted has an ADR Decision or
    Consequences entry, or a PLAN.md section, as source.
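
The single-file constraint above lends itself to a mechanical check. The following is a minimal sketch, assuming the list of changed paths is already available as plain strings. How that list is obtained, and the function names, are assumptions, and this is not presented as the project's actual guardrail or as the ADR-017 mechanism.

```python
# Sketch of a "single SPEC file only" check over a list of changed paths.
# ASSUMPTION: the caller supplies repo-relative paths as plain strings.

ALLOWED_PATHS = frozenset({"rtl/src/pop/specs/MultiGemminiCluster.SPEC.md"})


def out_of_scope(changed_paths):
    """Return the changed paths that fall outside the allowed set, sorted."""
    return sorted(set(changed_paths) - ALLOWED_PATHS)


def touches_only_spec(changed_paths):
    """True when at least one path changed and every changed path is allowed."""
    paths = set(changed_paths)
    return bool(paths) and paths <= ALLOWED_PATHS


if __name__ == "__main__":
    ok = ["rtl/src/pop/specs/MultiGemminiCluster.SPEC.md"]
    bad = ok + ["rtl/src/pop/MultiGemminiCluster.scala"]  # hypothetical extra path
    print(touches_only_spec(ok), out_of_scope(ok))
    print(touches_only_spec(bad), out_of_scope(bad))
```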
spec(MultiGemminiCluster): promote MultiGemminiCluster.SPEC.md to Draft
All checks were successful
build / scalafmt-check (pull_request) Successful in 4s
build / sbt-compile (pull_request) Successful in 3s
build / shell-lint (pull_request) Successful in 20s
238b12534c
Lifts MultiGemminiCluster.SPEC.md from Status: Stub to Status: Draft.
No behaviour or Chisel is added; the IO contract is pinned to ADRs
and to sibling SPECs already in Draft, and signal-level details no
source pins are recorded as Open questions rather than fabricated
widths.

Resolved TBDs (pinned to a source):

- §Interface RoCC attach — one request/response per local Rocket
  core, core count tracking ADR-006 Decision (Edu: 2 cores; Pro: 4
  cores). Cluster sees only PopRoCCRouter-decoded dispatches; ADR-006
  Consequences pin the 2-bit gemmini-id encoding in funct7 that
  selects one of the four ADR-001 tiles.
- §Interface memory attach — DDR5 channel count tracking ADR-005
  Decision (Edu: 1 channel; Pro: 2 channels). Intra-chip L2
  coherence boundary pinned to ADR-011 Decision ("Within a chip,
  4 Gemminis share L2 coherently").
- §Interface cluster-side fabric attach — pinned to
  InterChipFabric.SPEC.md §Interface "Chip-side transaction-layer
  attach" (sibling now in Draft) which carries the PLAN.md §3.6
  cross-chip transaction set. Cross-chip memory model pinned to
  ADR-011 Decision.
- §Behavior per-tile dispatch — fully pinned to ADR-006 Consequences
  (PopRoCCRouter decodes 2-bit gemmini-id in funct7) and ADR-001
  Decision (four Edu tiles).
- §Behavior scratchpad sharing — per-Gemmini scratchpad sizing
  pinned to ADR-004 Decision (Edu: 256 KiB SP + 64 KiB acc; Pro:
  1 MiB SP + 256 KiB acc). Peer-to-peer routing pinned to ADR-003
  Decision (4×4 crossbar connecting scratchpad ports directly).
- §Behavior DDR5 access — intra-chip L2 path pinned to ADR-011
  Decision; cross-chip exit pinned to ADR-011 Decision
  ("Cross-chip data movement is explicit via PopLink transactions").
- §Behavior quiescence — scoping pinned to ADR-011 (local-chip
  only, no board-wide directory). Cluster-stop CSR is the runtime
  entry point for the chip-level quiescence sequence already
  recorded in FluidPopSoC.SPEC.md §Behavior.
- §Invariants single-chip speedup — verbatim PLAN.md §8.3
  Acceptance ("single-chip 2.5-3.5x speedup vs 1-Gemmini baseline").
- §Invariants tile count — verbatim ADR-001 Decision (Edu: 4
  Gemminis; Pro: 16). Drives ADR-004 per-Gemmini scratchpad
  partition and ADR-003 crossbar width.
- §Invariants intra-chip coherence — verbatim ADR-011 Decision
  ("Within a chip, 4 Gemminis share L2 coherently"; "Non-coherent
  across chips").
- §Invariants no inter-tile data corruption — derived from ADR-003
  Decision (direct scratchpad-to-scratchpad routing) plus the
  PLAN.md §8.3 single-chip system-test acceptance bar.

Open questions (no source pins, recorded as _Open question:_):

- Per-tile bundle shape between PopRoCCRouter and cluster — tracks
  PopRoCCRouter.SPEC.md §Interface (currently Stub).
- Memory-protocol choice (AXI vs TileLink), per-master ID width,
  L2 configuration — pending Chipyard pin (PLAN.md §6) and DDR5
  PHY vendor-selection ADR.
- Cluster-side bundle shape toward InterChipFabric (port count,
  request-ID width, in-flight tracking depth) — mirrors the same
  Open question in InterChipFabric.SPEC.md §Interface.
- Performance-counter set, counter widths, CSR address-map, and
  access discipline (MMIO vs RoCC-config) — pending CSR-policy
  ADR.
- Pro 16-Gemmini reconciliation with ADR-006 2-bit funct7 field —
  pending follow-up ADR amending gemmini-id encoding.
- Scratchpad-port bundle shape (read/write port count per tile,
  eviction policy, bank-conflict resolution) — tracks
  InterGemminiXbar.SPEC.md §Interface (currently Stub).
- L2-side arbitration policy and DDR5-side request scheduling —
  pending DDR5 controller selection ADR.
- Cluster-stop CSR address, register width, completion-signalling
  discipline, drain order — pending clocking/reset ADR (same one
  blocked in FluidPopSoC.SPEC.md §Behavior).
- Worst-case scratchpad-eviction stall bound — resolves once
  PLAN.md §8.3 step-3 ResNet-50 single-chip bench surfaces an
  empirical worst case.

Constraints honoured:

- Only rtl/src/pop/specs/MultiGemminiCluster.SPEC.md changes
  (ADR-017 off-limits paths respected).
- All ADR / PLAN refs from the Stub preserved (PLAN §8.2, §12.3,
  ADR-001, ADR-002, ADR-004); added refs that the Draft body now
  cites: PLAN §8.3, ADR-003, ADR-005, ADR-006, ADR-011.
- No fabricated widths, latencies, vendor commitments, schedule
  promises, or per-cycle worst cases.
- Sibling SPECs cited as Draft pins only when actually in Draft
  (FluidPopSoC, InterChipFabric). Stub siblings (InterGemminiXbar,
  PopRoCCRouter) appear only inside Open questions.
Author
Owner

VERDICT: PASS

PR #93 promotes rtl/src/pop/specs/MultiGemminiCluster.SPEC.md from Stub to Draft following the same disciplined pattern as PR #87, PR #91, and PR #92. Mechanical guardrails pass: existing SPDX header preserved, single file touched, no AI/Anthropic attribution, no off-limits paths, no Chisel module so rules 4–6 don't apply, autonomous spec-designer role explains missing Closes #. Rule 7 is the critical one and the PR is meticulous: every concrete claim is a verbatim quote from an ADR Decision/Consequences/Context or a PLAN.md section.

Spot-checking against approved PRs in this thread: ADR-006 quotes ("Edu: 2x Rocket RV64GC in-order with custom RoCC router to address any of 4 local Gemminis. Pro: 4x cores"; "Custom PopRoCCRouter.scala decodes a 2-bit gemmini-id field in funct7 of custom instruction") match PR #89's ADR-006 reference and the project's existing architecture; ADR-005 quotes match PR #87 and PR #89; ADR-004 quotes ("Edu: 256 KiB SP + 64 KiB accumulator per Gemmini. Pro: 1 MiB SP + 256 KiB accumulator") match PR #92's verbatim cite; ADR-011 quotes match PR #87, PR #91, PR #92; PLAN.md §8.3 acceptance band "single-chip 2.5-3.5x speedup vs 1-Gemmini baseline" matches PR #90; PLAN.md §3.6 transaction set matches PR #91; ADR-003 4×4/16×16 scaling matches PR #92.

Importantly, this PR retroactively confirms PR #77's attribution of "2-bit gemmini-id in funct7" to ADR-006 that I had flagged repeatedly as possibly fabricated — the verbatim quote from ADR-006 Consequences ("Custom PopRoCCRouter.scala decodes a 2-bit gemmini-id field in funct7 of custom instruction") shows that attribution was correct, vindicating PR #77. The honest flag on Pro reconciliation is exactly right — ADR-006's 2-bit funct7 field can only address 4 tiles while ADR-001 commits Pro to 16 tiles, so the spec parks this as an Open question requiring a follow-up ADR rather than silently fabricating an extended encoding.

The remaining Open questions (per-tile bundle shape, memory protocol choice, cluster-side fabric attach, performance-counter set, scratchpad-port shape, L2-side arbitration, cluster-stop CSR shape, worst-case eviction stall) are appropriately deferred to pending ADRs or sibling SPEC promotions. Sibling Draft cites (FluidPopSoC, InterChipFabric) reference SPECs that are in fact Draft per PR #87 and PR #91; sibling Stub cites (PopRoCCRouter, InterGemminiXbar) appear only inside Open questions, never as Draft-pin sources, exactly as the PR's constraint check states. Promotion history entry is appropriate.

Findings

None.

fluidpop-bot left a comment
Collaborator

CI green (head 238b12534caa607cce7899c6408aaee04a388d3d), auto-approving
Reference
Fluid/fluidpop-v1!93