# Vision / Image Handling Notes

## Problem: vision_analyze returns "看不到图片" ("can't see the image") with local cache files

When `vision_analyze(image_url="/home/kuhnn/.hermes/image_cache/img_xxx.jpg")` is called on a local cache file, the reply is "没有看到图片" ("I don't see an image"), even when the file is a valid JPEG (960×1280).

**Root cause:** The MiniMax provider's `/anthropic/v1/messages` interface does not support passing base64 image data in the tool-result context, so the MiniMax vision implementation in `auxiliary_client.py` fails to embed the image correctly.

**Immediate workaround — send via Telegram:**
```python
send_message(
    message="MEDIA:/home/kuhnn/.hermes/image_cache/img_40fd30c92feb.jpg",
    target="telegram:7862937585"
)
```

Other approaches that do NOT work:
- `browser_navigate(url="file:///path/to/image.jpg")` → blank page
- `execute_code` + PIL → verifies file validity only, no visual analysis
- `terminal + file/exiftool` → metadata only

## Definitive fix: configure OpenAI vision provider

Switch `auxiliary.vision` to OpenAI (which natively supports image inputs):

**Option A — CLI:**
```bash
hermes config set auxiliary.vision.provider openai
hermes config set auxiliary.vision.model gpt-4o
```

**Option B — Edit `~/.hermes/config.yaml`:**
```yaml
auxiliary:
  vision:
    provider: openai
    model: gpt-4o
    api_key: ''   # leave empty; reads from .env
```

**Then add to `~/.hermes/.env`:**
```
OPENAI_API_KEY=sk-...
```

**Restart required:** `/restart` (gateway) or start a new session.
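
Before restarting, it can help to confirm that the key is actually present in the `.env` file. The following is a minimal stdlib-only sketch, assuming the file uses plain `KEY=value` lines (optionally prefixed with `export`); `read_env_value` is a hypothetical helper written for these notes, not part of Hermes.

```python
from pathlib import Path


def read_env_value(env_text: str, name: str):
    """Return the value assigned to `name` in .env-style text, or None."""
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional shell-style "export " prefix (assumption).
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            # Drop surrounding whitespace and quotes.
            return value.strip().strip("'\"")
    return None


if __name__ == "__main__":
    env_path = Path.home() / ".hermes" / ".env"
    text = env_path.read_text() if env_path.exists() else ""
    key = read_env_value(text, "OPENAI_API_KEY")
    # Print only whether the key is set, never the key itself.
    print("OPENAI_API_KEY set:", bool(key))
```

An empty or missing value reports `False`, which is the case to fix before running `/restart`.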

## openai-codex OAuth — broken for API calls (2026-05-19)

The `openai-codex` OAuth token (from `hermes auth add openai-codex`) refreshes successfully, but the actual Codex API endpoint (`https://chatgpt.com/backend-api/codex/models`) rejects it with:

```
{"detail":"Invalid client_version format"}
```

**Diagnosis steps:**
```bash
# 1. Check token claims (account type, expiry)
python3 -c "
import json, sys, base64
d = json.load(sys.stdin)
tok = d['credential_pool']['openai-codex'][0]['access_token']
seg = tok.split('.')[1]
# JWT segments are unpadded URL-safe base64; pad to a multiple of 4
seg += '=' * (-len(seg) % 4)
print(json.dumps(json.loads(base64.urlsafe_b64decode(seg)), indent=2))
" < ~/.hermes/auth.json

# 2. Probe the Codex endpoint directly
ACCESS_TOKEN=$(cat ~/.hermes/auth.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['credential_pool']['openai-codex'][0]['access_token'])")
curl -s "https://chatgpt.com/backend-api/codex/models" \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```

**The token is valid (exp 2027, ChatGPT Plus account), but the endpoint requires a `client_version` parameter that the OAuth flow doesn't provide.** This looks like a backend incompatibility: the OAuth token appears to be designed for the ChatGPT web UI, not the Codex API.
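
The claim check in step 1 can also be written as a small reusable function. This is a sketch under the assumption that the access token is a standard three-segment JWT; `decode_jwt_payload` and `seconds_until_expiry` are hypothetical helper names, and the payload is decoded without verifying the signature, so it is suitable for inspection only.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    segment = parts[1]
    # JWTs use unpadded URL-safe base64; restore padding to a multiple of 4.
    segment += "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment))


def seconds_until_expiry(claims: dict, now=None):
    """Return seconds until the `exp` claim, or None if the claim is absent."""
    exp = claims.get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)
```

A positive result from `seconds_until_expiry(decode_jwt_payload(tok))` means the token has not expired, which helps separate an expired credential from an endpoint-side rejection such as the `client_version` error.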

**MCP tool (`mcp_minimax_understand_image`) — "login fail" with correct env vars (2026-05-23):**
- config.yaml `mcp_servers.minimax.env.MINIMAX_API_KEY` is set correctly (masked as `sk-cp-...VxYz`)
- The MCP server process DOES receive the env var
- Tool still returns: `login fail: Please carry the API secret key in the 'Authorization' field of the request header`
- Likely root cause: the MCP server's internal HTTP client for `/v1/coding_plan/vlm` does not correctly attach the `Authorization: Bearer` header; it may be sending `X-Api-Key` instead, or omitting the header entirely
- **Status: known bug, no config fix available.** Do NOT rely on this tool for local image files

**Diagnosis path used (2026-05-23):**
```bash
# 1. Verify image file exists and is valid
file ~/.hermes/image_cache/img_xxx.jpg
# → JPEG image data, JFIF standard, 960x1280, progressive, 185KB

# 2. Check gateway image routing log
grep -i "image\|vision" ~/.hermes/logs/gateway.log | tail -10
# → "Image routing: text (mode=text). Pre-analyzing 1 image(s) via vision_analyze."

# 3. Check errors.log for vision failures
grep -i "vision\|image\|understand" ~/.hermes/logs/errors.log | tail -10

# 4. Call the MiniMax VLM endpoint directly to check the auth header format
curl -s -X POST https://api.minimaxi.com/v1/coding_plan/vlm \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "X-Api-Key: $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test","image_url":""}'
# → confirms the API rejects requests that use the wrong header format

# 5. Check MCP server config
grep -A5 "mcp_servers:" ~/.hermes/config.yaml
# → env.MINIMAX_API_KEY is set, but MCP server HTTP client ignores it
```

**Key distinctions**

- **ChatGPT Plus** subscription ≠ **OpenAI API Key** — these are independent products
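- **OpenAI Codex OAuth** reuses the ChatGPT OAuth token and would in principle be sufficient for vision, but as of 2026-05-19 the Codex endpoint rejects that token (see the `client_version` error above)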
- **OpenAI API Key** (`sk-...`) also works — set in `.env` as `OPENAI_API_KEY`
- Codex OAuth tokens are stored in `auth.json` credential pool, NOT in `.env`

## MiniMax official image understanding — direct API

MiniMax provides image understanding via `/v1/coding_plan/vlm` — a **separate endpoint** from `/anthropic/v1/messages` or `/v1/chat/completions`. The MCP tool `mcp_minimax_understand_image` calls this internally.

**Correct endpoint:** `https://api.minimaxi.com/v1/coding_plan/vlm`

**Known working call pattern:**
```python
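import base64
import json
import os
import urllib.request

# API key read from the environment (same variable name the MCP server config uses)
api_key = os.environ['MINIMAX_API_KEY']

# Encode the local image as base64 so it can be sent as a data URL
with open('/path/to/image.jpg', 'rb') as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "描述这张图片",  # "Describe this image"
    "image_url": f"data:image/jpeg;base64,{img_b64}"
}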

req = urllib.request.Request(
    'https://api.minimaxi.com/v1/coding_plan/vlm',
    data=json.dumps(payload).encode(),
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    method='POST'
)
with urllib.request.urlopen(req, timeout=60) as resp:
    result = json.loads(resp.read())
    print(result['content'])
```

**Common errors when calling the wrong endpoint:**
- `404 page not found` on `/anthropic/v1/messages` or `/anthropic/v1/chat/completions` → wrong path
- `HTTP 400` on `/v1/chat/completions` → wrong endpoint for VL
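- `login fail: Please carry the API secret key` → the request reached the API without a valid `Authorization: Bearer` header. For the MCP tool, first check `MINIMAX_API_KEY` in the MCP server env (not just `.env`); the error can persist even when the variable is set (see the known bug above)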
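
The list above can be turned into a rough triage helper. This is only a heuristic sketch built from the three cases in these notes; `classify_vlm_error` is a hypothetical function, and real responses may not match these patterns exactly.

```python
def classify_vlm_error(status: int, body: str) -> str:
    """Map an HTTP status and response body to a likely cause from these notes."""
    text = body.lower()
    # Check the body first: it is not confirmed which HTTP status
    # accompanies the "login fail" message.
    if "login fail" in text:
        return "auth: Authorization header missing or malformed"
    if status == 404:
        return "wrong path: endpoint not found"
    if status == 400:
        return "wrong endpoint for image understanding"
    return "unknown"
```

For example, `classify_vlm_error(404, "404 page not found")` points at the path rather than at the credentials, which avoids rotating an API key that was never the problem.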

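**MCP tool and local file paths:** `mcp_minimax_understand_image` supports local file paths natively: it converts them to base64 internally via `process_image_url()` in utils.py. If it fails on local paths, first check that the MCP server process has its env vars (`~/.hermes/config.yaml` → `mcp_servers.minimax.env`). If they are set and the tool still returns `login fail`, that is the known header bug described above; the direct API call pattern in this section bypasses the MCP server.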
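
For illustration, a local-path-to-base64 conversion of the kind described above might look like the following. This is a hypothetical sketch, not the actual `process_image_url()` from the MCP server; the function name `to_image_url` and the use of `mimetypes` to pick the MIME type are assumptions.

```python
import base64
import mimetypes
from pathlib import Path


def to_image_url(source: str) -> str:
    """Return `source` unchanged if it is already a URL, else a base64 data URL."""
    # Remote URLs and existing data URLs pass through untouched.
    if source.startswith(("http://", "https://", "data:")):
        return source
    path = Path(source).expanduser()
    # Guess the MIME type from the file extension rather than hardcoding JPEG.
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be used as the `image_url` field of the direct API payload, so the same code path handles both cached files and remote images.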
