Bug: Sub-Agents Persist After Reboot and Never Time Out
A bug in OpenClaw causes sub-agent sessions to survive host reboots indefinitely, cluttering your session list and potentially causing confusion about which agents are actually running.
The Problem
When you spawn sub-agents using sessions_spawn, OpenClaw creates persistent session entries that track the spawned agent's state. These entries are written to disk so they survive gateway restarts, which is normally desirable behavior.
However, there's a gap in the cleanup logic: the timeout/reaper mechanism only runs while a gateway process is active, and nothing sweeps the persisted entries on startup. So if your host machine goes down (crash, reboot, power loss) while a sub-agent is mid-run, that session entry never gets cleaned up.
The result? Ghost sub-agents that appear in sessions_list forever, showing as running even though their underlying processes died with the reboot.
Why This Matters
- Session list pollution: Over time, stale entries accumulate and make it harder to see which agents are actually active.
- Potential resource confusion: If you're monitoring session counts or building automation around sub-agent state, stale entries throw off your numbers.
- Memory overhead: Each stale session entry consumes memory in the gateway process, though this is typically minimal.
Workarounds (Until a Fix Lands)
Manual Cleanup
Sub-agent state is persisted in ~/.openclaw/subagents/runs.json. You can safely remove entries with timestamps older than your last reboot:
# Check the file
cat ~/.openclaw/subagents/runs.json
# Back it up first
cp ~/.openclaw/subagents/runs.json ~/.openclaw/subagents/runs.json.bak
# Then edit to remove stale entries
Gateway Restart Trick
Running openclaw gateway restart after a host reboot sometimes triggers a re-scan that clears stale sessions. This behavior is inconsistent, but worth trying before manual cleanup.
openclaw gateway restart
Automation Option
If you're running OpenClaw via systemd or launchd, you could add a post-boot hook that clears the runs.json file or marks all entries as timed out.
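As a rough illustration of what such a post-boot hook might run, here is a minimal Python sketch. The layout of runs.json isn't documented in this post, so the schema below is purely an assumption: a top-level object mapping run IDs to entries, each with a `status` string and a `startedAt` timestamp in epoch milliseconds. The function names and the `timed_out` status value are also hypothetical. Inspect your actual file and adjust before using anything like this.

```python
import json
import shutil
from pathlib import Path

# Path taken from the post; everything about the file's schema is assumed.
RUNS_FILE = Path.home() / ".openclaw" / "subagents" / "runs.json"


def mark_stale_runs(runs: dict, boot_time_ms: int) -> int:
    """Mark 'running' entries that started before the last boot as timed out.

    Assumes (hypothetically) that each entry is a dict with a "status"
    string and a "startedAt" timestamp in epoch milliseconds.
    Returns the number of entries changed.
    """
    changed = 0
    for entry in runs.values():
        if not isinstance(entry, dict):
            continue  # skip anything that doesn't match the assumed shape
        if entry.get("status") != "running":
            continue
        started = entry.get("startedAt")
        if isinstance(started, (int, float)) and started < boot_time_ms:
            entry["status"] = "timed_out"  # hypothetical status value
            changed += 1
    return changed


def clean_runs_file(path: Path, boot_time_ms: int) -> int:
    """Back up the file, then rewrite it with stale entries marked."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    runs = json.loads(path.read_text())
    changed = mark_stale_runs(runs, boot_time_ms)
    path.write_text(json.dumps(runs, indent=2))
    return changed
```

A hook would call something like `clean_runs_file(RUNS_FILE, boot_time_ms)` with the boot time obtained however your platform exposes it. It is probably safest to run this while the gateway is stopped, since a running gateway may rewrite the file underneath you.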
The Fix
The proposed solution is straightforward: add a startup sweep that either marks all running sub-agents as timed-out, or checks if their underlying processes still exist before treating them as active.
This would need to account for edge cases like legitimate sub-agents that are genuinely still running (though this shouldn't happen after a full reboot).
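To make the second variant concrete, here is a minimal sketch of what a startup sweep with a liveness check could look like. This is not OpenClaw's actual code or the maintainers' planned fix. It assumes, hypothetically, that each entry records a `pid` and a `status` field, and it relies on the POSIX convention that sending signal 0 only tests whether a process exists.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists.

    POSIX only: do not use os.kill(pid, 0) this way on Windows.
    """
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def startup_sweep(runs: dict, is_alive=pid_alive) -> list:
    """Mark 'running' entries whose process is gone as timed out.

    Assumes (hypothetically) entries shaped like
    {"status": "running", "pid": 1234}. Returns the IDs that were marked.
    """
    stale = []
    for run_id, entry in runs.items():
        if entry.get("status") != "running":
            continue
        pid = entry.get("pid")
        # A missing or malformed PID is treated as stale.
        if not isinstance(pid, int) or not is_alive(pid):
            entry["status"] = "timed_out"  # hypothetical status value
            stale.append(run_id)
    return stale
```

One caveat with a PID-only check: after a reboot, an old PID can be reused by an unrelated process, so a stale entry might look alive. A more careful version would also compare the entry's start time against the process start time or the host's boot time.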
Related Issue
This is being tracked in openclaw/openclaw#29795. If you're experiencing this, adding your logs and reproduction steps would help the maintainers prioritize the fix.
Has anyone else run into this? Share your workarounds in the comments.