# Agent Detection Kit

Drop-in starter for catching unexpected Claude behavior at the three tiers that matter:

1. **CI / machine detectors** (`.github/workflows/`) — catches pattern-level issues on every PR
2. **Hooks** (`.claude/hooks/`) — catches issues in flight, as Claude works
3. **Subagents** (`.claude/agents/`) — read-only reviewers for periodic audits

Plus supporting scripts for log rotation and weekly review generation.

---

## What this kit does NOT include

**Observability** (error tracking, metrics, dashboards) is the fourth detection tier, but it depends entirely on your specific stack (Sentry vs Rollbar, Datadog vs Prometheus, etc.). Wire that up separately against whatever you already run. The outcomes to watch for are listed at the end of this README.

---

## Quick install

From the root of your repo:

```bash
# Assuming you've cloned or downloaded the kit to ./detection-kit
cp -r detection-kit/.github/* .github/
cp -r detection-kit/.claude/* .claude/
cp -r detection-kit/scripts/* scripts/
chmod +x .claude/hooks/*.sh
chmod +x .claude/scripts/*.sh
chmod +x scripts/ci/checks/*.sh

# Verify
ls -la .claude/hooks/
ls -la .claude/agents/
ls -la scripts/ci/checks/
```

Commit everything. On the next PR, you should see the new CI checks running.

---

## First-run setup

### 1. Generate baselines

Before CI can check "did coverage drop?" or "did tests slow down?" it needs a baseline. On your main branch:

```bash
npm test -- --coverage --coverageReporters=json-summary
mkdir -p docs
cp coverage/coverage-summary.json docs/coverage-baseline.json

# Time the test suite
START=$(date +%s); npm test --silent > /dev/null 2>&1; END=$(date +%s)
echo $((END - START)) > docs/test-runtime-baseline.txt

git add docs/coverage-baseline.json docs/test-runtime-baseline.txt
git commit -m "chore: add CI baselines"
```

### 2. Tune the permission rules

Open `.claude/settings.json` and review:

- **`allow` list:** Add commands specific to your stack. If you use `make` heavily, add `Bash(make:*)`. If you use a specific test runner, add it.
- **`ask` list:** Anything you want Claude to confirm before running. Defaults cover package installs, pushes, and docker.
- **`deny` list:** Absolutely-never actions. Defaults cover `rm -rf`, force push to protected branches, reading `.env` files.

### 3. Tune the custom checks

The scripts in `scripts/ci/checks/` are examples for common patterns. Review each and adapt:

- `no-direct-env-access.sh` — change `src/config/` to wherever your config module lives, or delete if you don't have one yet
- `migrations-reversible.sh` — change `MIGRATIONS_DIR` if your migrations live elsewhere
- `routes-have-auth.sh` — change `ROUTES_DIR` and the auth-marker patterns to match your framework
- `no-silent-errors.sh` — usually works as-is; customize if your codebase has legitimate empty-catch patterns
- `openapi-matches-routes.sh` — only relevant if you maintain an OpenAPI spec; delete otherwise

Add your own custom checks as you notice patterns. The template is simple:

```bash
#!/usr/bin/env bash
# What this checks and why
set -euo pipefail
# ... check logic ...
# exit 0 if pass, exit 1 if fail
```

### 4. Install the weekly review cron (optional)

```bash
# Run weekly review every Friday at 4pm
(crontab -l 2>/dev/null; echo "0 16 * * 5 cd $(pwd) && ./.claude/scripts/weekly-review.sh") | crontab -
```

Or just run it manually every Friday:
```bash
./.claude/scripts/weekly-review.sh
```

### 5. Test the hooks

Start Claude Code and try each tier:

```bash
cd /your/repo
claude
```

Then in the Claude conversation:

- Ask Claude to `rm -rf /tmp/test` → should be blocked by `block-dangerous.sh`
- Ask Claude to write a file with a `TODO:` comment → should be blocked by `detect-antipatterns.sh`
- Ask Claude to edit a file in `src/auth/` → should be flagged in `.claude/logs/sensitive.log`
- End the session and check `.claude/logs/session-summaries.md` → should have a new entry

If any of these fail, the hook script probably needs `chmod +x`. Run `chmod +x .claude/hooks/*.sh` and retry.

---

## Daily workflow

### As Claude works

Hooks fire automatically. You'll see:

- `.claude/logs/bash.log` — every command Claude runs
- `.claude/logs/sensitive.log` — every time Claude edits a sensitive file
- `.claude/logs/session-summaries.md` — a summary at the end of each session

### When a PR opens

CI runs the agent gauntlet automatically. The PR gets a "Change Summary" comment listing new files, deleted files, sensitive-area touches, and top modified files. This is your 30-second read.

### Before merging a significant PR

Invoke the PR reviewer:

```
In Claude Code:
> Use the pr-reviewer agent to review PR #42
```

Get a severity-rated report. Address critical findings; decide on the others.

### For anything security-sensitive

Invoke the security reviewer:

```
> Use the security-reviewer agent to review the changes in src/auth/
```

### Weekly

Run the review generator and then invoke the architecture auditor:

```bash
./.claude/scripts/weekly-review.sh
```

Then in Claude Code:
```
> Use the architecture-auditor agent to audit the last 7 days of changes on main
```

Read the report. For each finding, decide:
- Fix now? → File an issue or open a PR
- Update rules to prevent recurrence? → Update CLAUDE.md, settings.json, a skill, or a hook
- Accept? → Note it in the weekly review file

### Monthly

- Audit `CLAUDE.md` for bloat (target: <200 lines)
- Audit skills — are any duplicative or never-activating? Retire them
- Audit `.claude/logs/` — rotate with `./.claude/scripts/rotate-logs.sh`
- Review `ask` rules — are you approving the same things constantly? Promote to `allow`

---

## What outcomes to watch for (observability tier)

This kit doesn't set up observability. But these are the signals that indicate something got through the first three tiers:

| Signal | What it means |
|--------|---------------|
| Sentry error rate spike within 1 hour of deploy | New code is throwing in production |
| p99 latency increase on endpoints touched in recent PRs | Performance regression (usually N+1 or missing index) |
| Test suite runtime grew > 2x in a week | Slow tests added, or parallelization broken |
| New slow queries in DB slow log | New query pattern that needs an index |
| Bundle size grew > 10% in a week | Heavy dependency or large import added |
| Memory or CPU trending up without feature launches | Leak introduced |
| API error rate by endpoint (p50/p99/max) | Behavior changed on an endpoint |
| New dependencies added this sprint (count) | Dependency creep |
| Alert rate (pages per week) | System stability |

Set up dashboards / alerts for these in whatever tool you use (Datadog, Grafana, New Relic, Sentry, etc.). When one fires, the investigation is: "what was in the most recent deploy?" → "what changes did Claude make that might cause this?" → root cause.

---

## Directory structure reference

```
.
├── .github/
│   └── workflows/
│       └── agent-gauntlet.yml        # CI: 10-job detection pipeline
│
├── .claude/
│   ├── settings.json                 # Permissions + hook wiring
│   ├── hooks/
│   │   ├── log-bash.sh               # Audit trail for every command
│   │   ├── block-dangerous.sh        # Block dangerous command patterns
│   │   ├── flag-sensitive.sh         # Log edits to sensitive files
│   │   ├── detect-antipatterns.sh    # Block bad code at write time
│   │   ├── auto-format.sh            # Format files after write
│   │   └── session-summary.sh        # One-page summary at session end
│   ├── agents/
│   │   ├── architecture-auditor.md   # Weekly architectural audit
│   │   ├── security-reviewer.md      # Security review for sensitive changes
│   │   └── pr-reviewer.md            # Pre-merge PR review
│   ├── scripts/
│   │   ├── rotate-logs.sh            # Weekly log rotation
│   │   └── weekly-review.sh          # Weekly review report generator
│   └── logs/                         # Generated at runtime, gitignored
│
└── scripts/
    └── ci/
        └── checks/
            ├── no-direct-env-access.sh
            ├── migrations-reversible.sh
            ├── routes-have-auth.sh
            ├── no-silent-errors.sh
            └── openapi-matches-routes.sh
```

---

## Maintenance

Every piece of this kit should evolve with your codebase. Treat it like code:

- Commit changes to the kit through PRs, not directly
- Review hooks and checks periodically — false positives train Claude (and humans) to ignore them
- When you discover a new class of mistake Claude makes, add detection for it in the appropriate tier
- Retire checks that haven't fired in months — they're probably no longer relevant

The kit is a starting point, not an end state. After three months of use, it should look meaningfully different from what you installed — more checks specific to your patterns, tightened rules based on observed mistakes, retired checks that became obsolete.

---

## `.gitignore` additions

Add these to your `.gitignore`:

```
.claude/logs/bash.log
.claude/logs/bash.jsonl
.claude/logs/sensitive.log
.claude/logs/sudo.log
.claude/logs/archive/
```

Keep `session-summaries.md` and `weekly-reviews/` in git — they're useful for team review and historical context.