feat(ops): risk dashboard + CI runner liveness monitor (Issue #9) #14

Merged
navigator merged 1 commit from feature/risk-dashboard into main 2026-05-24 11:31:13 -03:00
Owner

Closes Issue #9 (risk-tracking dashboard).

What this PR adds:

  • docs/risk-dashboard.md (CC BY-SA 4.0): live mirror of the PLAN.md Section 0.1 risk register with all 21 failure IDs (F-T1..F-M3) and 6 tripwires (TW-1..TW-6). Updated every Friday alongside the weekly checkpoint, per PLAN.md Section 17.
  • infra/ops/runner-monitor.sh (AGPL v3): polls the Forgejo admin runners API daily and tracks state in /var/lib/fluidpop-monitor/last-status. Exits with status 2 and prints an alarm to stderr if fluidpop-ci-01 has reported a status other than online for more than 1h. Optionally posts to Mastodon if MASTODON_TOKEN is set. Silent on success (cron-safe).
  • infra/ops/runner-monitor.timer + .service: systemd timer and service for daily polling at 08:00 with a 15min randomized delay, hardened with ProtectSystem=strict and PrivateTmp.
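
For reviewers who want the alarm rule in concrete form, here is a minimal sketch of the decision logic described for runner-monitor.sh. It is not the script from this PR. The function name `evaluate_runner`, the `THRESHOLD_SECS` variable, and the state-file format (a single epoch timestamp recording when the runner was first seen non-online) are all assumptions made for illustration. How the status is fetched from the Forgejo API is deliberately left out.

```shell
#!/bin/sh
# Sketch only. The state-file format (one epoch timestamp marking when the
# runner was first observed non-online) is an assumption, not the real format.

THRESHOLD_SECS=3600   # the "> 1h" window from the PR description

# evaluate_runner STATUS NOW_EPOCH STATE_FILE
# Returns 0 when the runner is online or still inside the grace window,
# and 2 (with a message on stderr) once it has been non-online for longer
# than THRESHOLD_SECS.
evaluate_runner() {
    status=$1
    now=$2
    state_file=$3

    if [ "$status" = "online" ]; then
        # Healthy: forget any earlier outage and print nothing (cron-safe).
        rm -f "$state_file"
        return 0
    fi

    # First non-online observation: remember when the outage was first seen.
    if [ ! -s "$state_file" ]; then
        printf '%s\n' "$now" > "$state_file"
    fi

    first_seen=$(cat "$state_file")
    down_for=$((now - first_seen))

    if [ "$down_for" -gt "$THRESHOLD_SECS" ]; then
        echo "ALARM: fluidpop-ci-01 status=$status for ${down_for}s" >&2
        return 2
    fi
    return 0
}
```

A caller would pass the status string it obtained from the API, for example `evaluate_runner "$status" "$(date +%s)" /var/lib/fluidpop-monitor/last-status`, and exit with the function's return code. Under this particular sketch, a daily timer means the alarm fires on the first poll that lands more than an hour after the outage was first recorded.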

Closes the TW-3-adjacent gap (CI infra failure tripwire) noted in PLAN.md Section 0.5. See also ADR-016 for the runner inventory and ownership.

Tests: none added; the change is docs and shell only. CI runs shell-lint on the new monitor script. Per the GitOps workflow, this PR will be auto-merged via infra/forgejo/auto-merge.sh once CI is green.

feat(ops): risk dashboard + CI runner liveness monitor (Issue #9)
All checks were successful
build / scalafmt-check (push) Successful in 4s
build / sbt-compile (push) Successful in 5s
build / shell-lint (push) Successful in 19s
build / scalafmt-check (pull_request) Successful in 4s
build / sbt-compile (pull_request) Successful in 5s
build / shell-lint (pull_request) Successful in 8s
e63072b7d6
docs/risk-dashboard.md (CC BY-SA 4.0): live mirror of PLAN.md Section 0.1
risk register, updated every Friday alongside weekly checkpoint. All 21
failure IDs (F-T1 to F-M3) listed with Status/Trend/P/I/Mitigation/Owner.
6 tripwires from Section 0.5 listed with armed status.

infra/ops/runner-monitor.sh (AGPL v3): polls Forgejo admin runners API
daily, tracks state in /var/lib/fluidpop-monitor/last-status, exit 2
with stderr alarm if fluidpop-ci-01 status != online for > 1h. Optional
Mastodon post to social.pop.coop if MASTODON_TOKEN set. Silent on
success (cron-safe).
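
The optional Mastodon notification could look roughly like the sketch below. It is an illustration, not the code in this commit: the function name `post_alarm` and the `MASTODON_HOST` variable are hypothetical, and only `MASTODON_TOKEN` and the host social.pop.coop come from the message above. The request uses Mastodon's standard `POST /api/v1/statuses` endpoint with a bearer token.

```shell
#!/bin/sh
# Sketch only. post_alarm and MASTODON_HOST are illustrative names; the
# commit message only mentions MASTODON_TOKEN and social.pop.coop.

MASTODON_HOST=${MASTODON_HOST:-social.pop.coop}

# post_alarm MESSAGE
# Does nothing and returns 0 when MASTODON_TOKEN is unset or empty, so the
# monitor keeps working on hosts without credentials.
post_alarm() {
    if [ -z "${MASTODON_TOKEN:-}" ]; then
        return 0
    fi
    # --fail turns HTTP errors into a non-zero exit; the response body is
    # discarded so a successful post stays silent.
    curl --fail --silent --show-error --max-time 10 \
        --header "Authorization: Bearer ${MASTODON_TOKEN}" \
        --data-urlencode "status=$1" \
        "https://${MASTODON_HOST}/api/v1/statuses" > /dev/null
}
```

Keeping the post behind a token check means the stderr alarm and exit code remain the primary signal, with the Mastodon post as a best-effort extra.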

infra/ops/runner-monitor.timer: systemd timer, daily 08:00 with 15min
randomization, Persistent=true.

infra/ops/runner-monitor.service: companion oneshot service with
EnvironmentFile=settings.env, hardening (ProtectSystem=strict,
PrivateTmp, NoNewPrivileges).
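
As a reading aid, the two units described above might be laid out roughly as follows. This is a sketch under stated assumptions, not the files in this commit: the `Description=` text, the absolute paths for `EnvironmentFile=` and `ExecStart=`, and the `StateDirectory=` line are hypothetical. The schedule, `Persistent=true`, the oneshot type, and the three hardening directives are taken from the message.

```ini
# runner-monitor.timer (sketch)
[Unit]
Description=Daily CI runner liveness check

[Timer]
# Daily at 08:00, spread by up to 15 minutes.
OnCalendar=*-*-* 08:00:00
RandomizedDelaySec=15min
# Run a missed check at next boot if the machine was off at 08:00.
Persistent=true

[Install]
WantedBy=timers.target
```

```ini
# runner-monitor.service (sketch)
[Unit]
Description=Check fluidpop-ci-01 liveness

[Service]
Type=oneshot
# Hypothetical paths; the commit message only names the file settings.env.
EnvironmentFile=/etc/fluidpop-monitor/settings.env
ExecStart=/usr/local/bin/runner-monitor.sh
# Assumption: one way to keep /var/lib/fluidpop-monitor writable while
# ProtectSystem=strict mounts the rest of the file system read-only.
StateDirectory=fluidpop-monitor
ProtectSystem=strict
PrivateTmp=true
NoNewPrivileges=true
```

With `ProtectSystem=strict` the state path has to be made writable explicitly in some way; `StateDirectory=` is one option and `ReadWritePaths=` is another.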

Closes the TW-3-adjacent gap noted in PLAN.md Section 0.5 (CI infra
failure tripwire). See also ADR-016.
fluidpop-bot left a comment
Collaborator

CI green (head e63072b7d628a2fedc709d34c72fdd54016516d3), auto-approving
Reference
Fluid/fluidpop-v1!14