| | Offline evals | Online evals |
|---|---|---|
| Tool | `nao test` | Recommendations |
| When | Before you ship, in CI | Continuously, on production usage |
| Signal | A fixed suite of questions you wrote | What real users actually asked and where the agent struggled |
| Answers | "Did I regress against my benchmark?" | "Where is my context missing, wrong, or unclear?" |

`nao test` catches regressions against a benchmark you control. Recommendations catch the gaps you didn’t think to test, by mining what users do once the agent is live. Together they close the loop between shipping context and improving it.
How it works
On a schedule you choose (daily, weekly, or monthly), or on demand with Run now, nao runs an analysis agent that audits your project context against real usage. The agent reads your usage data only through a read-only, project-scoped sandbox: auth and PII columns are excluded, and it can see only the current project. It produces a ranked list of context improvements, each naming the file to change and what is missing.
What it scans
The agent mines nao’s own usage to find friction, then reads your context files to locate where each fix belongs.
Usage signal (read-only SQL over nao’s usage views):
- Tool errors - queries or tools that fail repeatedly
- Negative feedback - messages users downvoted
- Regenerations and corrections - answers users had to redo or correct
- Coverage gaps - questions the agent couldn’t answer, or recurring analysis patterns it handles inconsistently
Context files: `RULES.md`, `semantics/*.md`, `databases/**`, `docs/`, and synced repositories under `repos/<name>/`.
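As a rough illustration of the "Tool errors" signal read through a read-only, project-scoped view, here is a minimal sketch that uses SQLite as a stand-in. The table `tool_calls`, the view `usage_tool_calls`, their columns, and the function `repeated_tool_errors` are all hypothetical; nao's actual usage views and sandbox are not described in this page.

```python
import sqlite3

# Hypothetical schema: stand-in data, not nao's real usage tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tool_calls (
    project_id TEXT, user_email TEXT, tool TEXT, target TEXT, ok INTEGER
);
INSERT INTO tool_calls VALUES
    ('p1', 'a@example.com', 'run_sql', 'revenue', 0),
    ('p1', 'b@example.com', 'run_sql', 'revenue', 0),
    ('p1', 'a@example.com', 'run_sql', 'signups', 1),
    ('p2', 'c@example.com', 'run_sql', 'revenue', 0);
-- The view exposes a single project and omits the PII column (user_email).
CREATE VIEW usage_tool_calls AS
    SELECT tool, target, ok FROM tool_calls WHERE project_id = 'p1';
""")
# Reject writes on this connection. A real sandbox would also hide the
# base table; this sketch only illustrates the idea.
conn.execute("PRAGMA query_only = ON")


def repeated_tool_errors(conn, min_failures=2):
    """Return (tool, target, failures) for calls that failed repeatedly."""
    rows = conn.execute(
        """SELECT tool, target, COUNT(*) AS failures
           FROM usage_tool_calls
           WHERE ok = 0
           GROUP BY tool, target
           HAVING COUNT(*) >= ?
           ORDER BY failures DESC""",
        (min_failures,),
    )
    return rows.fetchall()


print(repeated_tool_errors(conn))
```

In this sketch the failure recorded for project `p2` never reaches the query, because the view filters it out before any analysis runs.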
What it outputs
As an admin, review recommendations under Settings -> Recommendations:
- Impact-ordered cards - the highest-friction issues first.
- A concrete target - each recommendation names the file to edit and what is missing, wrong, or unclear.
- Provenance - the chat the friction came from, and the model that produced the recommendation.
- Lifecycle actions - acknowledge, snooze, or dismiss each one as you work through them.
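One way to picture a recommendation card is as a small record with an impact score, a target file, provenance, and a lifecycle status. The sketch below is illustrative only: the field names, the `Status` values, and the numeric `impact` score are assumptions, not nao's data model.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical states mirroring the lifecycle actions listed above.
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    SNOOZED = "snoozed"
    DISMISSED = "dismissed"


@dataclass
class Recommendation:
    impact: float          # assumed score; higher means more user friction
    target_file: str       # the context file to edit
    problem: str           # what is missing, wrong, or unclear
    source_chat_id: str    # provenance: the chat the friction came from
    model: str             # provenance: the model that produced the card
    status: Status = Status.OPEN


def review_queue(recs):
    """Return recommendations still awaiting a decision, highest impact first."""
    pending = [r for r in recs if r.status is Status.OPEN]
    return sorted(pending, key=lambda r: r.impact, reverse=True)


recs = [
    Recommendation(0.4, "RULES.md", "no rule for fiscal-year dates", "chat-17", "model-a"),
    Recommendation(0.9, "semantics/revenue.md", "net vs. gross revenue is ambiguous", "chat-42", "model-a"),
    Recommendation(0.7, "docs/onboarding.md", "outdated table name", "chat-08", "model-a", Status.DISMISSED),
]
for rec in review_queue(recs):
    print(rec.target_file, "-", rec.problem)
```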
What you can configure
From the Recommendations settings page you control:
- Analysis model - which LLM provider and model run the audit.
- Run frequency - daily, weekly, or monthly, at a fixed time of day. You can also trigger a run any time with Run now.
- Custom system prompt instructions - extra guidance appended to the built-in audit instructions on every run. For example: “Spot recurring analysis patterns and propose new skills to structure them” or “Scan the conversations of the last 3 weeks”.
- GitHub repository - the `owner/name` repo where fixes are opened as pull requests (see below). Project files are not synced from this repo; it is only the target for PRs.
- YOLO mode - when on, nao opens pull requests automatically after each run, without a human reviewing the recommendation first, and marks those recommendations as applied. Leave it off to review each recommendation before opening a PR.
The feature is enabled with `BETA_CONTEXT_RECOMMENDATIONS_ENABLED=true`.
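The settings above can be pictured as a small validated record. This is a sketch under stated assumptions: the class `RecommendationSettings`, its field names, the `run_at` format, and the rule that YOLO mode requires a target repository are all hypothetical, and these options are set on the settings page rather than through code.

```python
import re
from dataclasses import dataclass
from typing import Optional

FREQUENCIES = ("daily", "weekly", "monthly")
# Loose check for "owner/name"; GitHub's real naming rules are stricter.
REPO_RE = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


@dataclass
class RecommendationSettings:
    provider: str
    model: str
    frequency: str = "weekly"
    run_at: str = "06:00"              # assumed fixed time of day
    extra_instructions: str = ""
    github_repo: Optional[str] = None
    yolo_mode: bool = False

    def problems(self):
        """Return a list of human-readable validation problems."""
        found = []
        if self.frequency not in FREQUENCIES:
            found.append(f"frequency must be one of {FREQUENCIES}")
        if self.github_repo is not None and not REPO_RE.fullmatch(self.github_repo):
            found.append("github_repo must look like owner/name")
        # Assumption: automatic PRs only make sense with a target repository.
        if self.yolo_mode and self.github_repo is None:
            found.append("YOLO mode needs a target repository for its PRs")
        return found


def audit_prompt(builtin, settings):
    """Append the custom instructions to the built-in audit instructions."""
    extra = settings.extra_instructions.strip()
    return f"{builtin}\n\n{extra}" if extra else builtin


settings = RecommendationSettings(
    "some-provider", "some-model",
    extra_instructions="Scan the conversations of the last 3 weeks",
    github_repo="acme/analytics-context",
)
print(settings.problems())
print(audit_prompt("Audit the project context.", settings))
```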
Ways to use it
- Catch tools that fail regularly - a metric the agent keeps querying wrong usually means a definition is missing or ambiguous in `semantics/`.
- Act on new user feedback - downvotes and corrections surface as recommendations instead of sitting unread in the feedback log.
- Encode recurring analysis patterns - when many users ask the same kind of question, the agent suggests documenting it once in your context so every answer is consistent.
Turn a recommendation into a pull request
When a GitHub repository is connected, you can open the fix as a pull request directly from a recommendation: nao writes the edit to the right context file and opens the PR for your review.
- Connect GitHub to your nao account. nao uses the same GitHub OAuth app as GitHub SSO for repository access. Connect it once at the account level.
- Choose the target repository on the Recommendations settings page, in `owner/name` format. nao suggests the repositories already declared in your `nao_config.yaml`.
- Open the PR. From a recommendation, nao proposes the concrete edit (for human-written files like `RULES.md` or `semantics/**`, or for upstream files under `repos/<name>/` when that repo maps to a GitHub URL) and opens a pull request. You review and merge it like any other change, keeping your context under version control.
For fixes in generated files (like `databases/**`, which `nao sync` rewrites) or in unconnected sources, nao gives you clear guidance to apply yourself instead of opening a PR.
To skip the manual step entirely, turn on YOLO mode (see What you can configure) and nao opens these PRs automatically after each run for you to review and merge.
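The split between fixes that become pull requests and fixes that come back as guidance can be sketched as a simple path check. The function `route_fix`, its return values, and the `repo_urls` mapping are hypothetical; the fallback for paths this page does not mention (such as `docs/`) is an assumption of the sketch, not documented behavior.

```python
def route_fix(path, repo_urls):
    """Decide how a fix for `path` would be delivered in this sketch.

    repo_urls maps a synced repo name to its GitHub URL, or to None when
    the repo is not connected to GitHub.
    """
    # Generated files are rewritten by `nao sync`, so a PR would not stick.
    if path.startswith("databases/"):
        return "guidance"
    # Human-written context files can be edited through a pull request.
    if path == "RULES.md" or path.startswith("semantics/"):
        return "pull_request"
    # Upstream files qualify only when the repo maps to a GitHub URL.
    if path.startswith("repos/"):
        parts = path.split("/")
        name = parts[1] if len(parts) > 1 else ""
        return "pull_request" if repo_urls.get(name) else "guidance"
    # Anything else is not covered by the source; fall back to guidance.
    return "guidance"


# Example mapping with made-up repository names and URL.
repo_urls = {"dbt": "https://github.com/acme/dbt", "legacy": None}
for path in [
    "RULES.md",
    "semantics/revenue.md",
    "databases/warehouse/orders.md",
    "repos/dbt/models/orders.sql",
    "repos/legacy/etl.py",
]:
    print(path, "->", route_fix(path, repo_urls))
```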
Next steps
- Evaluation - Build the offline test suite that complements recommendations
- Playbook - See where recommendations fit in the end-to-end workflow