feat(agents): Resolver Dispatcher + ephemeral per-issue Resolver #26

Merged
navigator merged 1 commit from feature/swarm-resolver-dispatcher-on-25 into main 2026-05-24 14:50:13 -03:00

Summary

Second wave of the swarm scaffolding. Builds on PR #25 (lib + Operator + Opener). Lands the long-running dispatcher and the ephemeral per-issue resolver it spawns.

What it does

resolver-dispatcher.sh (Type=simple, Restart=always), 5-min poll loop:

  • Reaps finished children; transitions issue labels based on resolver exit code + presence of <!-- agent:pr --> marker comment
  • Atomic claim: swaps agent:queued → agent:in-progress + posts claim marker BEFORE creating the worktree
  • git worktree add -b auto/issue-<N>-<runid> origin/main under /var/lib/fluidpop-worktrees/ (root created by PR #27's install-swarm.sh)
  • Per-worktree identity: FluidPop Swarm <swarm@pop.coop> (per ADR-017)
  • Cap: $MAX_CONCURRENT_RESOLVERS (default 2)
  • git worktree prune on startup catches orphans
  • SIGTERM-safe: traps reap-before-exit
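The reap-time label decision and the concurrency cap reduce to two small pure functions. This is a minimal Bash sketch, assuming the dispatcher already knows the resolver's exit code and whether an `<!-- agent:pr -->` marker was found. `next_label` and `free_slots` are hypothetical names; only `agent:blocked-resolver` for a non-zero exit comes from the commit message, while `agent:pr-opened` and `agent:needs-triage` are placeholder labels, not the project's real ones.

```shell
#!/usr/bin/env bash
# Sketch: next_label/free_slots are hypothetical helpers. Only
# agent:blocked-resolver is taken from the PR; the other labels are placeholders.

# $1 = resolver exit code
# $2 = 1 if an <!-- agent:pr --> marker comment was found on the issue, else 0
next_label() {
  local exit_code=$1 has_pr_marker=$2
  if (( exit_code != 0 )); then
    echo "agent:blocked-resolver"   # resolver bailed out (e.g. posted agent:fail)
  elif (( has_pr_marker == 1 )); then
    echo "agent:pr-opened"          # placeholder success label
  else
    echo "agent:needs-triage"       # placeholder: clean exit but no PR marker
  fi
}

# Prints how many resolvers may still be started in this poll cycle,
# honouring $MAX_CONCURRENT_RESOLVERS (default 2).
free_slots() {
  local running=$1 max=${MAX_CONCURRENT_RESOLVERS:-2}
  local free=$(( max - running ))
  if (( free < 0 )); then
    free=0
  fi
  echo "$free"
}
```

Keeping the decision pure (no API calls inside) makes the transition table trivial to unit-test; the caller fetches the issue comments and applies the printed label.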

resolver.sh — ephemeral one-shot, invoked by dispatcher (not by systemd):

  • Background heartbeat poster comments <!-- agent:heartbeat run=… ts=… --> every 5 min on the source issue (the Operator uses heartbeat staleness to detect dead resolvers)
  • sandbox_claude envelope with cgroup caps (MemoryMax=8G, CPUQuota=400%, TasksMax=512)
  • Post-mortem scan_escalation parses stream.jsonl for any tool call touching ADR-017 off-limits paths → labels the issue swarm:quarantined and sends a high-priority Telegram alert (no PR push)
  • claude itself does the work: branch → commit → push → open PR → post <!-- agent:pr --> marker on the source issue
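The heartbeat poster and the cgroup envelope can each be sketched in a few lines of Bash. Assumptions: `post_comment ISSUE BODY` is a hypothetical stand-in for whatever Forgejo API call the real script makes, `HEARTBEAT_INTERVAL` is a hypothetical override (default 300 s, matching the 5-min cadence), and `sandbox_argv` only assembles a `systemd-run --user --scope` command line with the limits quoted above; it is not the actual `sandbox_claude` implementation.

```shell
#!/usr/bin/env bash
# Sketch of two resolver pieces. post_comment, HEARTBEAT_INTERVAL,
# HEARTBEAT_PID and SANDBOX_ARGV are hypothetical names.

# Format one heartbeat marker comment.
heartbeat_marker() {
  local run_id=$1 ts=$2
  printf '<!-- agent:heartbeat run=%s ts=%s -->\n' "$run_id" "$ts"
}

# Start a background loop posting a heartbeat every HEARTBEAT_INTERVAL seconds.
# The caller must define post_comment ISSUE BODY.
start_heartbeat() {
  local issue=$1 run_id=$2 interval=${HEARTBEAT_INTERVAL:-300}
  (
    while :; do
      post_comment "$issue" \
        "$(heartbeat_marker "$run_id" "$(date -u +%Y-%m-%dT%H:%M:%SZ)")"
      sleep "$interval"
    done
  ) &
  HEARTBEAT_PID=$!
}

# Build the cgroup-capped command line in an array so arguments keep their quoting.
sandbox_argv() {
  SANDBOX_ARGV=(
    systemd-run --user --scope
    -p MemoryMax=8G -p CPUQuota=400% -p TasksMax=512
    -- "$@"
  )
}
```

In a resolver these would be wired roughly as `start_heartbeat "$ISSUE_N" "$RUN_ID"`, then `sandbox_argv claude <args>` followed by `"${SANDBOX_ARGV[@]}"`. Bash keeps only one EXIT trap per shell, so killing `$HEARTBEAT_PID` has to live in the same trap that writes `EXIT_FILE`.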

resolver.md prompt — hard constraints (1 PR per issue, ≤600 LOC, off-limits paths, no --no-verify, no AI attribution); step-by-step commands; explicit blocker-handling protocol.
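The ≤600 LOC constraint can be checked mechanically before pushing. A sketch, assuming "LOC" means added plus deleted lines as reported by `git diff --numstat` (binary files, shown as `-`, are ignored); `diff_loc` and `within_loc_cap` are hypothetical helpers, not something the prompt defines.

```shell
#!/usr/bin/env bash
# Sketch: diff_loc and within_loc_cap are hypothetical helpers.

# Reads `git diff --numstat` output on stdin and prints added + deleted lines.
# Binary files appear as "-<TAB>-<TAB>path" and are skipped.
diff_loc() {
  awk '$1 != "-" { total += $1 + $2 } END { print total + 0 }'
}

# Succeeds if the diff on stdin stays within CAP changed lines (default 600).
within_loc_cap() {
  local cap=${1:-600} loc
  loc=$(diff_loc)
  (( loc <= cap ))
}
```

Typical use would be `git diff --numstat origin/main...HEAD | within_loc_cap || echo "over the LOC cap"`; the three-dot form diffs from the merge base, so unrelated commits that landed on `main` meanwhile are not counted.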

Validation

  • [x] bash -n green on both .sh
  • [x] systemd-analyze --user verify green on the .service
  • [ ] End-to-end test gated by Stage 4 cold start

Notes

This PR does NOT activate the dispatcher — install/start happens in PR #27.

Depends on PR #25 (merged) for lib/*.

feat(agents): Resolver Dispatcher + ephemeral per-issue Resolver
All checks were successful
build / scalafmt-check (push) Successful in 3s
build / sbt-compile (push) Successful in 3s
build / shell-lint (push) Successful in 10s
build / scalafmt-check (pull_request) Successful in 3s
build / sbt-compile (pull_request) Successful in 4s
build / shell-lint (pull_request) Successful in 9s
b0fa96f872
Second wave of the autonomous swarm. Adds the long-running dispatcher
that consumes agent:queued issues and the per-issue ephemeral resolver
it spawns inside isolated git worktrees.

resolver-dispatcher.sh (Type=simple, Restart=always):
- 5-min poll loop with circuit-breaker guard
- Reaps finished children, transitions issue labels based on resolver
  exit code + presence of <!-- agent:pr --> marker comment
- Atomic claim: swap agent:queued -> agent:in-progress + post claim
  marker BEFORE creating worktree (the dispatcher is single-instance via
  flock, so there is no real race; this is belt and braces)
- git worktree add -b auto/issue-<N>-<runid> origin/main
  WORKTREE_ROOT=/var/lib/fluidpop-worktrees (writable by navigator;
  install-swarm.sh in PR #27 will create it with correct perms)
- Per-worktree git identity = FluidPop Swarm <swarm@pop.coop>
- Max parallel = $MAX_CONCURRENT_RESOLVERS (default 2, override via env)
- Worktree prune on startup catches orphaned worktrees from prior crashes
- SIGTERM-safe: traps to reap before exit
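The reap bookkeeping and the reap-before-exit trap can be sketched as follows, assuming the dispatcher tracks children in an associative array and each resolver's `trap EXIT` writes its exit code to a per-issue file. `CHILDREN`, `EXIT_DIR`, the `$EXIT_DIR/<issue>.exit` layout, `reap`, `on_reaped` and the 255 "no file" sentinel are all hypothetical. Whether the real dispatcher waits for running resolvers on SIGTERM or signals them first is not stated; this sketch simply waits.

```shell
#!/usr/bin/env bash
# Sketch: CHILDREN, EXIT_DIR, reap and on_reaped are hypothetical names.

declare -A CHILDREN=()   # resolver pid -> issue number

# Hook run for each reaped resolver; the real dispatcher would inspect the
# issue's comments and transition labels here.
on_reaped() {
  echo "issue=$1 exit=$2"
}

# Reap resolvers. Default mode skips children that are still alive;
# "all" blocks on every child (used from the signal trap).
reap() {
  local mode=${1:-finished} pid issue code
  for pid in "${!CHILDREN[@]}"; do
    if [[ $mode != all ]] && kill -0 "$pid" 2>/dev/null; then
      continue                          # still running
    fi
    wait "$pid" 2>/dev/null || true     # collect the child; code comes from the file
    issue=${CHILDREN[$pid]}
    # The resolver's EXIT trap writes its code here; 255 marks "no file" (hard kill).
    code=$(cat "$EXIT_DIR/$issue.exit" 2>/dev/null || echo 255)
    on_reaped "$issue" "$code"
    unset "CHILDREN[$pid]"
  done
}

# Reap everything before exiting so no resolver is left without a label transition.
trap 'reap all; exit 0' TERM INT
```

Bash only runs a trap after the current foreground command returns, so a plain `sleep 300` in the poll loop could delay shutdown by up to five minutes; `sleep 300 & wait $!` lets `wait` return as soon as SIGTERM arrives, and the trap fires immediately.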

resolver.sh (ephemeral, invoked by dispatcher — not by systemd):
- Inherits ISSUE_N, RUN_ID, WORKTREE, BRANCH, EXIT_FILE, LOG_DIR via env
- Starts a 5-min heartbeat poster in background that comments
  <!-- agent:heartbeat run=<id> ts=<utc> --> on the source issue
  (Operator uses heartbeat staleness to detect dead resolvers)
- Composes prompt from issue body + last 5 comments
- Invokes sandbox_claude (systemd-run --user --scope cgroup-capped)
- Post-mortem: scan_escalation parses stream.jsonl for any tool call
  touching ADR-017 off-limits paths; if found, labels swarm:quarantined
  and sends a high-priority Telegram alert (no PR push)
- claude itself is responsible for: branch, commit, push, open PR via
  Forgejo REST, post <!-- agent:pr pr=#N --> marker comment on source
  issue. Resolver doesn't do any of that on claude's behalf.
- trap EXIT writes exit code to file for dispatcher reap
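A rough shape for the post-mortem scan, assuming each tool call in `stream.jsonl` sits on a single line that contains `"tool_use"` together with the path it touched. The `OFF_LIMITS` prefixes below are invented placeholders (the real list is in ADR-017), and plain substring matching is deliberately over-eager: a false positive only quarantines a run, whereas a false negative would let an off-limits edit through. A JSON-aware parser would be stricter.

```shell
#!/usr/bin/env bash
# Sketch only: the OFF_LIMITS entries are placeholders, not ADR-017's list.
OFF_LIMITS=(
  ".forgejo/workflows/"
  "deploy/secrets/"
  "docs/adr/"
)

# Prints every tool_use line that mentions an off-limits prefix.
# Returns 0 if at least one was found (escalate), 1 otherwise.
scan_escalation() {
  local stream=$1 prefix found=1
  for prefix in "${OFF_LIMITS[@]}"; do
    if grep -F -e '"tool_use"' "$stream" | grep -F -e "$prefix"; then
      found=0
    fi
  done
  return "$found"
}
```

The resolver would then branch on `scan_escalation "$LOG_DIR/stream.jsonl"`: on success it applies swarm:quarantined and alerts, otherwise it leaves the PR flow alone.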

resolver.md prompt (roles/prompts/):
- Hard constraints: 1 PR per issue, <=600 LOC, no off-limits paths,
  no --no-verify, no AI attribution in artifacts
- Step-by-step: read issue -> code -> validate -> commit -> push ->
  open PR -> apply ready-for-review label -> post <!-- agent:pr --> marker
- Blocker handling: if stuck, post <!-- agent:fail reason=... --> and exit
  non-zero (dispatcher transitions issue to agent:blocked-resolver)
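The dispatcher's "presence of marker" check and the blocker protocol both come down to matching HTML-comment markers in issue comments. A sketch using Bash's `=~` regex matching; `pr_marker` and `fail_reason` are hypothetical helpers, and the regexes accept both the bare `<!-- agent:pr -->` form used in the PR description and the `<!-- agent:pr pr=#N -->` form from the commit message.

```shell
#!/usr/bin/env bash
# Sketch: pr_marker and fail_reason are hypothetical helper names.

# Succeeds if TEXT contains an agent:pr marker; prints N when it carries pr=#N.
pr_marker() {
  local re='<!-- agent:pr( pr=#([0-9]+))? -->'
  [[ $1 =~ $re ]] || return 1
  if [[ -n ${BASH_REMATCH[2]:-} ]]; then
    echo "${BASH_REMATCH[2]}"
  fi
}

# Prints the reason from an <!-- agent:fail reason=... --> marker, if present.
fail_reason() {
  local re='<!-- agent:fail reason=([^>]*) -->'
  [[ $1 =~ $re ]] || return 1
  echo "${BASH_REMATCH[1]}"
}
```

Storing the pattern in a variable and expanding it unquoted on the right of `=~` is the usual way to keep Bash from treating the regex metacharacters as literal text.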

Test:
- bash -n green on both .sh files
- systemd-analyze --user verify green on the .service
fluidpop-bot left a comment

CI green (head b0fa96f872), auto-approving

Reference
Fluid/fluidpop-v1!26