# Telegram Watchdog — Log-Based Recovery

## The Problem

When the Telegram connection drops, the gateway retries 10 times and then puts the platform into the `paused` state, after which messages silently stop flowing. The gateway itself stays alive (cron jobs, the CLI, and other platforms are unaffected); only Telegram is down.

## Log Signatures

```
# Connection failure begins
WARNING [Telegram] Primary api.telegram.org connection failed
WARNING [Telegram] Fallback IP X.X.X.X failed: All connection attempts failed
WARNING gateway.run: Reconnect telegram error: telegram connect timed out after 30s

# Retry loop (exponential backoff: 5s, 10s, 20s, 40s, 80s, 160s, 300s...)
WARNING [Telegram] Telegram network error (attempt N/10), reconnecting in Xs

# Platform pauses (last resort after 10 failures)
WARNING gateway.run: telegram paused after 10 consecutive failures
  — fix the underlying issue then run `/platform resume telegram` to retry,
    or `hermes gateway restart` to restart the gateway.

# Successful recovery
INFO gateway.platforms.telegram: [Telegram] Connected to Telegram (polling mode)
```
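
The retry comment in the excerpt above lists delays that double from 5s and then flatten at 300s. The sketch below reproduces that sequence so the time between the first failure and the pause can be estimated. It assumes the cap stays at 300s for all remaining attempts and that every attempt is followed by its delay; neither is confirmed by the logs, and `backoff_delays` is a hypothetical helper, not part of the gateway.

```bash
# Hypothetical helper: prints one delay (in seconds) per retry attempt.
# Assumption: delays double from BASE and are capped at CAP, as the
# "5s, 10s, 20s, ..., 300s..." comment in the log excerpt suggests.
# The body runs in a subshell "( )" so its variables do not leak.
backoff_delays() (
  attempts="${1:-10}"
  delay="${2:-5}"
  cap="${3:-300}"
  n=1
  while [ "$n" -le "$attempts" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay="$cap"
    fi
    n=$((n + 1))
  done
)

# Sum of all delays across the 10 attempts, in seconds.
backoff_delays 10 | awk '{ total += $1 } END { print total }'   # prints 1515
```

Under these assumptions the delays alone add up to about 25 minutes, not counting the connect timeout spent on each attempt.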

## Detection via Log Analysis

```bash
# Check last Connected vs last paused
grep -E 'Connected to Telegram|telegram paused' ~/.hermes/logs/agent.log | tail -5

# If paused exists and is newer than last Connected → Telegram is down
```
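
The comparison described above can be automated by looking only at the most recent matching line: if it is a pause entry, no reconnect has been logged since. This is a minimal sketch; `telegram_state` and its `ok` / `down` / `unknown` outputs are hypothetical names chosen for illustration, and the log is assumed to be append-only and in chronological order.

```bash
# Hypothetical helper: reports the Telegram state from the gateway log.
# Prints "down" if the newest matching entry is a pause, "ok" if it is a
# successful connect, and "unknown" if neither entry is found.
telegram_state() (
  log="${1:-$HOME/.hermes/logs/agent.log}"
  # Keep only the newest "Connected" or "paused" entry.
  last=$(grep -E 'Connected to Telegram|telegram paused' "$log" 2>/dev/null | tail -n 1) || true
  case "$last" in
    *'telegram paused'*)       echo "down" ;;
    *'Connected to Telegram'*) echo "ok" ;;
    *)                         echo "unknown" ;;
  esac
)

# Usage: telegram_state ~/.hermes/logs/agent.log
```

An `unknown` result usually means the log was rotated or the path is wrong, so it is worth treating separately from `ok`.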

## Recovery Commands

```bash
# Option 1: Restart gateway (reconnects Telegram automatically)
hermes gateway restart

# Option 2: From Telegram chat (if reachable) — NOT usable in cron
/platform resume telegram

# Option 3: Via systemctl directly
systemctl --user restart hermes-gateway
```

## Why `/platform resume telegram` Fails in Cron

The `/platform resume telegram` command is a slash command delivered as a Telegram message, so the gateway needs an active Telegram connection to receive it. While Telegram is paused, the bot cannot receive any messages, including slash commands from the user. Cron sessions have no Telegram connection at all.

**Therefore:** Always use `hermes gateway restart` for automated recovery from cron.
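
For a non-interactive job, the check, restart, and verify sequence could look like the sketch below. The `telegram_watchdog` function, the `RESTART_CMD` and `WAIT_SECS` overrides, and the output strings are assumptions made for illustration; only `hermes gateway restart`, the log path, and the two log phrases come from this document. It covers the paused-without-reconnect case only, not a time-based staleness check.

```bash
# Hypothetical cron-side watchdog. The body runs in a subshell "( )",
# so "exit" ends only the function, not the calling shell.
telegram_watchdog() (
  log="${1:-$HOME/.hermes/logs/agent.log}"
  restart_cmd="${RESTART_CMD:-hermes gateway restart}"  # override is for testing
  wait_secs="${WAIT_SECS:-10}"

  # Newest "Connected" or "paused" entry in the log.
  last_event() {
    grep -E 'Connected to Telegram|telegram paused' "$log" 2>/dev/null | tail -n 1
  }

  [ -r "$log" ] || { echo "Telegram watchdog: cannot read $log"; exit 1; }

  case "$(last_event)" in
    *'telegram paused'*) ;;                       # paused, no reconnect since
    *) echo "Telegram watchdog: OK"; exit 0 ;;
  esac

  # Restart the gateway; its own output goes to stderr to keep stdout clean.
  $restart_cmd >&2 || { echo "Telegram watchdog: restart command failed"; exit 1; }
  sleep "$wait_secs"

  case "$(last_event)" in
    *'Connected to Telegram'*) echo "Telegram watchdog: restarted and connection restored" ;;
    *) echo "Telegram watchdog: still not connected after restart"; exit 1 ;;
  esac
)

# Usage: telegram_watchdog ~/.hermes/logs/agent.log
```

If the reconnect regularly takes longer than the wait, raising `WAIT_SECS` is a simpler fix than restarting a second time.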

## Cron Watchdog Prompt Template

```
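You are the Telegram watchdog. Check whether the Telegram connection is healthy, and recover it automatically if it is not.

## Detection steps
1. Read the log file ~/.hermes/logs/agent.log
2. Find the timestamp of the most recent "Connected to Telegram" entry
3. Find the timestamp of the most recent "telegram paused" entry

## Decision logic
- If the log has a recent "telegram paused" entry with no "Connected to Telegram" entry after it,
  Telegram is disconnected and has not recovered
- Or if more than 30 minutes have passed since the last "Connected to Telegram" entry and there has been no new message activity in that time

## Recovery (if needed)
hermes gateway restart
Wait 10 seconds, then check the log again to confirm recovery.

## Output
- Healthy: reply "Telegram watchdog: OK"
- Recovered: reply "Telegram watchdog: restarted and connection restored"
- Any other error: describe the problem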
```

## Recommended Schedule

- **High availability:** `0 8 * * 1-5` (every weekday 08:00)
- **Normal:** `0 8 * * 1,3,5` (Mon/Wed/Fri 08:00)
- **Low priority:** `0 9 * * *` (daily 09:00, same as morning briefing)
