fix(ops): make telegram-claude-bridge daemon resilient #21
Reference: Fluid/fluidpop-v1!21
Three fixes after the PR #20 daemon crashed on its first production deploy to the agent host.
Symptoms (from agent journalctl):
Root causes:
- `set -e` killed the daemon on any non-zero exit anywhere in the per-message path. Removed it (kept `-u` and `pipefail`).
- `save_offset` ran AFTER `process_message`; if processing crashed first, the offset was lost and the daemon looped forever on the same message.
- `... | jq | while read` created a subshell that lost state (offset writes from the subshell were visible, but the parent died first).

Fix:
- Drop `-e`; keep `-u -o pipefail`.
- Call `save_offset` BEFORE `process_message`.
- Add `||` fallbacks on `jq` calls so malformed Telegram responses don't kill the loop.

Verified locally:
`claude -p reply ok` returns in ~3s on the agent host (Xeon E5-2630 v2 with AVX), so the failing component is the daemon shell, not `claude` or systemd.

Tests: shell only; shell-lint validates the daemon. Auto-merge via `auto-merge.sh`.
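A minimal bash sketch of a poll step that applies the three fixes, for reviewers who want to see the pattern in one place. Assumptions: `handle_batch`, `load_offset`, the `OFFSET_FILE` path, and the stub body of `process_message` are hypothetical (the PR names `save_offset` and `process_message` but does not show their bodies); the real daemon would fetch the JSON from Telegram's `getUpdates` and call `claude` inside `process_message`. Feeding the loop from a here-string instead of `... | jq | while read` is one common way to keep the loop in the parent shell; the PR does not say which approach it used.

```bash
#!/usr/bin/env bash
# Sketch only: function names other than save_offset/process_message,
# and the OFFSET_FILE location, are hypothetical.
set -uo pipefail   # deliberately no -e: one bad message must not kill the daemon

OFFSET_FILE="${OFFSET_FILE:-/tmp/bridge-offset}"   # hypothetical location

save_offset() {
  printf '%s\n' "$1" > "$OFFSET_FILE"
}

# Next getUpdates offset; 0 when no offset has been saved yet.
load_offset() {
  cat "$OFFSET_FILE" 2>/dev/null || echo 0
}

# Hypothetical stand-in for the real handler (which would call claude).
# Returns non-zero for empty text to simulate a failing message.
process_message() {
  local text=$1
  [ -n "$text" ] || return 1
  printf 'handled: %s\n' "$text"
}

# Processes one getUpdates JSON response.
# Assumes message text is single-line; multi-line text would split rows.
handle_batch() {
  local json=$1 lines update_id text
  # || fallback: malformed JSON (or a missing .result) becomes an empty batch
  lines=$(jq -r '.result[] | "\(.update_id)\t\(.message.text // "")"' <<<"$json" 2>/dev/null) || lines=""
  # The here-string keeps the loop in the current shell; a pipe into
  # `while read` would run the loop body in a subshell.
  while IFS=$'\t' read -r update_id text; do
    # Skip blank or non-numeric ids so arithmetic under -u cannot abort
    [[ $update_id =~ ^[0-9]+$ ]] || continue
    # Persist the offset BEFORE processing, so a crash cannot replay this update
    save_offset $((update_id + 1))
    process_message "$text" || echo "warn: update $update_id failed" >&2
  done <<<"$lines"
}
```

Because the offset is written before `process_message` runs, an update that crashes the handler is skipped on restart rather than replayed forever; the trade-off is at-most-once delivery for that update.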
CI green (head 975dbee2e5), auto-approving.