What these skills are
nao publishes six context-engineering skills: standard SKILL.md files (YAML frontmatter + markdown) that any agentic CLI (Claude Code, Codex, Cursor, the Claude Agent SDK, …) can load and invoke.
Each skill is a self-contained workflow that automates one stage of the Context Engineering lifecycle: scoping a project, writing rules, building a test suite, auditing what's there, adding a semantic layer once tests show metric gaps, and deploying context to production via CI.
These skills are for the human/agent driving nao (in their IDE or terminal). They are different from the runtime skills the chat agent calls at query time; see Tools, MCPs, Skills for those.
Install
The published source of truth lives at github.com/getnao/nao/tree/main/skills. Install all six into the current project's .claude/skills/ with:
nao skills add getnao/nao
nao skills is a thin wrapper around the open-source skills CLI from Vercel Labs, so the equivalent direct call also works:
npx skills add getnao/nao
Run from the root of the project where nao_config.yaml lives. Re-run any time to pick up updates; pass through --force to overwrite local edits.
Once installed, the skills are auto-discovered by Claude Code, Codex, and other agentic CLIs that load .claude/skills/. Trigger one by name (e.g. "use the setup-context skill") or let the agent route on the skill description.
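To check what an agent will see after installation, the sketch below lists each installed skill with the name and description from its SKILL.md frontmatter. It assumes one subfolder per skill under .claude/skills/ and flat `name` / `description` keys in the frontmatter; the helper names are hypothetical, and the parser is deliberately minimal rather than a full YAML reader.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading '---' ... '---' block.

    Nested YAML and multi-line values are intentionally not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        # Skip indented (nested) lines and comments.
        if sep and not line.startswith((" ", "\t", "#")):
            meta[key.strip()] = value.strip().strip("\"'")
    return {}  # closing '---' never found: treat as no frontmatter


def list_skills(skills_dir: Path) -> list[tuple[str, str]]:
    """Return (name, description) for every <skills_dir>/<skill>/SKILL.md."""
    found = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        found.append((meta.get("name", skill_file.parent.name),
                      meta.get("description", "")))
    return found
```

Calling `list_skills(Path(".claude/skills"))` from the project root should show one entry per installed skill; the description is the text an agent routes on when no skill is named explicitly.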
The six skills
setup-context
Takes the user from pip install nao-core to a synced project with a starter RULES.md. First-time install only; for editing rules, generating tests, or reviewing an existing context, use the other skills below.
Steps
- Ask everything in one round: warehouse + auth, scope (which tables, ≤100 with 20 as the target), extra context (dbt / ETL / BI repos, Notion, internal docs), LLM provider.
- Look up the warehouse-specific config from docs.getnao.io/nao-agent/context-builder/databases, write nao_config.yaml, run nao init, then print a summary for the user to confirm before continuing.
- Run nao sync to populate databases/, repos/, docs/, semantics/. Don't move on until sync is clean.
- Generate RULES.md by handing off to write-context-rules.
- Wire up the LLM key via ${ANTHROPIC_API_KEY} (or equivalent); never paste keys into chat.
- Recommend next steps: smoke test with nao chat, review RULES.md, then create-context-tests.
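The "keys only via environment references" step can be checked mechanically. The following is a rough sketch, not part of nao: it scans config text for sensitive-looking keys whose value is a literal instead of a `${VAR}` reference. The key names it looks for (`key`, `token`, `secret`, `password`) and the sample config layout are assumptions for illustration; the real nao_config.yaml schema may differ.

```python
import re

# Assumed heuristics: which key names count as sensitive is a guess.
SENSITIVE = re.compile(r"(key|token|secret|password)", re.IGNORECASE)
ENV_REF = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def inline_secrets(config_text: str) -> list[str]:
    """Return sensitive-looking keys whose value is not a ${VAR} reference."""
    flagged = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0]  # drop trailing comments (naive)
        key, sep, value = line.partition(":")
        value = value.strip().strip("\"'")
        if sep and value and SENSITIVE.search(key) and not ENV_REF.match(value):
            flagged.append(key.strip())
    return flagged
```

An empty result means every sensitive-looking value is an environment reference; anything returned is worth moving into an environment variable before committing.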
write-context-rules
Owns RULES.md. Generates the six standard sections, section by section, showing the user each block before moving on. If RULES.md already has content, runs an audit-and-fill flow that fills only what's missing.
Steps
- `## Business overview`: Product + Business model (sourced from web search + databases/ + dbt repo).
- `## Data architecture`: Warehouse, data stack, layers, sources.
- `## Core data models`: `### Most Used Tables` (one-line pointers) + `### Tables detail` (Purpose, Granularity, Key Columns ≤10, Use For).
- `## Key Metrics Reference`: grouped by category; **metric** → table, column, formula.
- `## Date filtering`: three example formulas (last X weeks / last X days / current month) keyed off the user's week-boundary and current-period-inclusion conventions.
- `## Analysis Process`: five subsections: Understand → Select Table → Write Query → Validate → Context.
- Validate metrics with the user: confirm every source-of-truth pointer in the metrics reference.
- Date filtering, with the user: pick week boundary (Sunday vs Monday) and current-period inclusion.
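The two date conventions (week boundary and current-period inclusion) change the answer to "last X weeks", which is why they are confirmed with the user. As an illustration only, here is a small sketch of how the two choices move the date range; the function names and the half-open range convention are assumptions, not something the skill prescribes.

```python
from datetime import date, timedelta


def week_start(day: date, starts_on_monday: bool = True) -> date:
    """First day of the week containing `day` under the chosen convention."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    offset = day.weekday() if starts_on_monday else (day.weekday() + 1) % 7
    return day - timedelta(days=offset)


def last_x_weeks(today: date, x: int, starts_on_monday: bool = True,
                 include_current: bool = False) -> tuple[date, date]:
    """Half-open [start, end) date range for 'last x weeks'."""
    current = week_start(today, starts_on_monday)
    if include_current:
        # The current partial week counts as one of the x weeks.
        return current - timedelta(weeks=x - 1), today + timedelta(days=1)
    # Only complete weeks: stop at the start of the current week.
    return current - timedelta(weeks=x), current


# Wednesday 2024-05-15, weeks start Monday, current week excluded:
# last_x_weeks(date(2024, 5, 15), 2) -> (date(2024, 4, 29), date(2024, 5, 13))
```

The same question on the same day yields a different range once either convention flips, so the SQL formulas in the Date filtering section need to encode exactly one of them.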
Template (templates/RULES.md), the six-section scaffold:
# RULES.md
> Included with every message sent to the nao agent. Keep it lean.
> Per-table detail belongs in `databases/<table>.md`, not here.
## Business overview
**Product**: …
**Business model**: …
## Data architecture
**Warehouse:** …
**Data stack:** …
**Data layers:** …
**Data sources:** …
## Core data models
### Most Used Tables
- `<table>`: one-line purpose. See `databases/.../table=<table>/` folder.
### Tables detail
#### `<table>`
**Purpose**: …
**Granularity**: One row per …
**Key Columns**: (≤10)
**Use For**: …
## Key Metrics Reference
### <Category>
- **<metric>**: `<table>.<column>`, `<formula>`
## Date filtering
> Convention: e.g. "Week starts Monday; 'last X weeks' excludes the current incomplete week."
### Last X weeks
```sql
…
```
### Last X days
### Current month
## Analysis Process
### 1. Understand the Question
### 2. Select the Right Table(s)
### 3. Write Efficient Queries
### 4. Validate Results
### 5. Provide Context
create-context-tests
Generates a test suite of natural-language → SQL pairs that becomes the reliability benchmark. nao test runs each prompt through the agent, executes both the agent's SQL and the test's expected SQL, and diffs the result data row-by-row. See Evaluation for the scoring model.
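To make "diffs the result data row-by-row" concrete, here is a minimal sketch of such a comparison. It is not nao's implementation: the positional row matching, the column-name check, and the float tolerance are all assumptions chosen for illustration, and `diff_rows` is a hypothetical name.

```python
import math


def diff_rows(expected: list[dict], actual: list[dict],
              rel_tol: float = 1e-9) -> list[str]:
    """Return human-readable mismatches; an empty list means the results match."""
    problems = []
    if len(expected) != len(actual):
        problems.append(f"row count: expected {len(expected)}, got {len(actual)}")
    for i, (exp, act) in enumerate(zip(expected, actual)):
        if exp.keys() != act.keys():
            problems.append(f"row {i}: columns {sorted(act)} != {sorted(exp)}")
            continue
        for col, want in exp.items():
            got = act[col]
            both_numbers = (isinstance(want, (int, float))
                            and isinstance(got, (int, float)))
            # Numbers compare with a tolerance; everything else must be equal.
            same = (math.isclose(want, got, rel_tol=rel_tol)
                    if both_numbers else want == got)
            if not same:
                problems.append(f"row {i}, {col}: expected {want!r}, got {got!r}")
    return problems
```

Under this sketch a column that is named differently counts as a mismatch even when the values agree, which is one reason the authoring rules below care about output column names.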
Two authoring rules
- Prompts read like real chat. Short, vague, no table / column / method hints.
"How's churn looking this quarter?", not "What was the churn rate from fct_subscriptions in Q1?".
- Output column names encode format / unit, not source.
churn_rate_float_0_1, not churn_rate_from_fct_subscriptions.
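The second rule can be linted with a simple heuristic. The sketch below flags output column names that appear to encode a source table; the token list (`from`, `fct`, `dim`, `stg`) is an assumption based on common warehouse naming, not a rule defined by the skill.

```python
import re

# Assumed heuristic: these underscore-delimited tokens usually point at a
# source table rather than a format or unit.
SOURCE_HINT = re.compile(r"(^|_)(from|fct|dim|stg)(_|$)")


def leaks_source(column_name: str) -> bool:
    """True if the column name looks like it names its source table."""
    return bool(SOURCE_HINT.search(column_name.lower()))
```

With the examples above, `churn_rate_from_fct_subscriptions` is flagged and `churn_rate_float_0_1` is not.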
Steps
- Ask once: does the user have trusted source-of-truth queries (Looker, dashboards, prior benchmarks)? Transform each into a test; for metrics without a trusted query, draft new ones.
- Save flat under tests/ (no subfolders), one YAML file per test.
- Have the user validate: prompts match their team's phrasing, SQL matches their definition of truth.
- Run nao test -m <model_id> -t 10, then recap pass rate, token cost, and wall-clock time as the baseline.
- Diagnose failures: read tests/outputs/, identify the rule gap, route to write-context-rules for the smallest fix. Re-run between fixes so impact is attributable.
Template (templates/test.yaml):
name: churn_rate_last_quarter
prompt: How's churn looking this quarter?
sql: |
  SELECT
    SAFE_DIVIDE(churned, total) AS churn_rate_float_0_1
  FROM (
    SELECT
      COUNTIF(churned_at IS NOT NULL) AS churned,
      COUNT(*) AS total
    FROM <project>.<schema>.fct_subscriptions
    WHERE started_at < DATE_TRUNC(CURRENT_DATE, QUARTER)
  );
# Optional:
# category: revenue | activity | conversion | churn | retention | …
# difficulty: easy | medium | hard
# notes: why this test matters / what failure mode it catches
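Before running the suite, it can help to sanity-check each test against the template. The sketch below works on an already-parsed test (a plain dict, however the YAML was loaded) and checks only what the template shows: `name`, `prompt`, and `sql` are present, and `difficulty`, if given, is one of the three listed values. Treating the first three fields as required is an assumption drawn from the template, and `validate_test` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("name", "prompt", "sql")   # assumed required, per the template
DIFFICULTIES = ("easy", "medium", "hard")     # values listed in the template


def validate_test(test: dict) -> list[str]:
    """Return a list of problems with one parsed test; empty means it looks fine."""
    errors = [
        f"missing or empty field: {field}"
        for field in REQUIRED_FIELDS
        if not str(test.get(field) or "").strip()
    ]
    difficulty = test.get("difficulty")
    if difficulty is not None and difficulty not in DIFFICULTIES:
        errors.append(f"unexpected difficulty: {difficulty!r}")
    return errors
```

`category` is left unchecked because the template's list of categories is open-ended.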
audit-context
Diagnoses a nao context. Finds gaps, MECE violations, failure root causes, and bloat. Output is a short in-conversation report ending in a prioritized plan. Diagnose only, never fixes. Routes fixes to write-context-rules / add-semantic-layer / create-context-tests. Run any time: right after setup-context, mid-build, before a release, or when behavior gets surprising.
Steps: six checks in order
- Synced context: what's wired in (warehouse, repos, Notion, semantic layer, MCPs) vs missing. Has nao sync run? Scope check: ≤100 tables hard ceiling, ≤20 ideal. Oversized scope is the biggest predictor of reliability failure; flag it explicitly.
- RULES.md vs target structure: six sections from write-context-rules. Per section, mark present / missing / thin. Flag placeholders, TODO: markers, and metric entries with no source-of-truth pointer.
- Per-table coverage: for every table in databases/, is it in ### Most Used Tables? Does it have a ### Tables detail block? dbt context (schema.yml)? Per-table gaps: undocumented columns, calculated fields with no explanation, foreign keys with no relation.
- Data model consistency (MECE): two tables computing the same metric differently? Asked metrics no in-scope table can answer? Duplicated columns under different names? Ambiguous columns (amount without unit, status without enum values)?
- Test coverage: if tests/ is empty, recommend create-context-tests. Otherwise read tests/outputs/ and categorize each failure (data model / date selection / test issue / interpretation / metric definition) with the smallest rule change per failure.
- Token optimization: files >40KB, ### Tables detail blocks past the 10-column cap, duplication between RULES.md and databases/<table>.md, in-scope tables with no mention in any test.
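Two of these checks are purely mechanical and can be sketched in a few lines: the scope thresholds and the oversized-file scan. The thresholds come from the checks above; interpreting "40KB" as 40 × 1024 bytes, the verdict labels, and the function names are assumptions.

```python
from pathlib import Path

MAX_FILE_BYTES = 40 * 1024   # ">40KB", assuming KB means 1024 bytes
HARD_TABLE_CEILING = 100     # hard ceiling from the scope check
IDEAL_TABLE_COUNT = 20       # ideal scope from the scope check


def scope_verdict(table_count: int) -> str:
    """Classify the number of in-scope tables (labels are illustrative)."""
    if table_count > HARD_TABLE_CEILING:
        return "over ceiling"
    if table_count > IDEAL_TABLE_COUNT:
        return "wide"
    return "ideal"


def oversized_files(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for files above the cap, largest first."""
    hits = []
    for path in root.rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > MAX_FILE_BYTES:
                hits.append((str(path.relative_to(root)), size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The remaining checks (MECE violations, failure categorization) need judgment and are better left to the skill's conversational report.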
Output
Lead with a one-paragraph summary (sync state | scope wideness | rules quality (N/6 sections substantive) | test coverage), deep-dive only sections with findings, end with a prioritized plan that names the skill that does each fix:
## Plan
1. (easy / 5 min) … → write-context-rules
2. (small / 30 min) … → create-context-tests
3. (medium / 1-2 hr) … → audit-context (rerun after)
4. (large / multi-session) … → add-semantic-layer
add-semantic-layer
Wires a semantic layer into the agent so metric queries go through a single canonical definition. Only after nao test shows metric-reliability failures, not before. A semantic layer reduces the scope of answerable questions; the trade-off only pays off when reliability is the bottleneck. Schema gaps or date logic failures are rule-fixes, not semantic-layer fixes.
Steps
- Pick the tool:
| Option | Type | When |
|---|---|---|
| dbt MetricFlow | Metric store | Already running dbt Cloud with the Semantic Layer enabled. |
| Snowflake views / semantic | Semantic layer | Snowflake; using curated views or native semantic views. |
| nao semantic files | Semantic layer | No existing layer. Want a lightweight in-repo YAML. |
| Other (Looker, Cube, …) | Varies | Search the MCP registry; otherwise fall back to nao YAML. |
- Install the matching MCP under .claude/mcp.json (dbt-mcp / mcp-server-snowflake / Cortex MCP, etc.). Credentials via ${ENV_VAR} only; never inline.
- Hand off to write-context-rules to route every metric in ## Key Metrics Reference through the new layer (e.g. MRR → query via dbt MCP query_metric (semantic layer)).
- Validate: nao chat one of the user's top questions, confirm the agent uses the semantic layer, then nao test and compare to the pre-semantic-layer baseline pass rate. Reliability is the only reason to do this, so measure it.
Template for the nao YAML option (templates/semantic.yaml):
dimensions:
  - name: date
    type: date
    description: Calendar date. Use this for any time-based slicing.
  - name: plan
    type: categorical
    description: Subscription plan tier.
    values: [free, pro, enterprise]
metrics:
  - name: mrr
    definition: Monthly Recurring Revenue from active paying subscriptions, in USD.
    source:
      table: fct_stripe_mrr
      column: mrr_amount
    aggregation: SUM   # SUM | COUNT | COUNT_DISTINCT | AVG | MIN | MAX
    grain: month       # day | week | month | quarter | year
    dimensions: [date, plan, country]
    filters:
      - "status = 'active'"
For the dbt MetricFlow / Snowflake / Cortex paths, the metric definitions live upstream; this skill installs the MCP and routes RULES.md to it instead of writing local YAML.
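To show why a single canonical definition helps, here is a hypothetical sketch of how one metric entry from the nao YAML template could be turned into SQL. This is not how nao resolves semantic files: it assumes `table` and `column` nest under `source`, that dimension names are usable as column names, and that filters are raw SQL predicates. `metric_sql` is an illustrative helper only.

```python
def metric_sql(metric: dict, group_by: tuple[str, ...] = ()) -> str:
    """Build one SELECT for a parsed metric entry (illustrative only)."""
    source = metric["source"]
    column = source["column"]
    aggregation = metric["aggregation"].upper()
    if aggregation == "COUNT_DISTINCT":
        expression = f"COUNT(DISTINCT {column})"
    else:
        expression = f"{aggregation}({column})"

    dims = list(group_by)
    unknown = [d for d in dims if d not in metric.get("dimensions", [])]
    if unknown:
        # Refuse slices the metric does not declare.
        raise ValueError(f"dimensions not declared for {metric['name']}: {unknown}")

    select_list = ", ".join(dims + [f"{expression} AS {metric['name']}"])
    sql = f"SELECT {select_list} FROM {source['table']}"
    filters = metric.get("filters", [])
    if filters:
        sql += " WHERE " + " AND ".join(f"({f})" for f in filters)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql
```

Every question about MRR would then share the same aggregation and filter, and a request to slice by an undeclared dimension fails loudly. That is the narrowing of answerable questions described above, traded for consistency.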
deploy-context
Wires a GitHub Actions workflow that runs nao deploy on every push to main, so context changes go live automatically. Handles API key creation, GitHub Secrets setup, .naoignore hardening, and the workflow file itself.
Steps
- Prerequisites check: remote URL, repo, project_name, confirm who can mint an API key.
- API key creation: walk through Settings -> Organization -> API Keys. The key is shown once; store it as a GitHub Secret, never in the workflow or nao_config.yaml.
- GitHub Secrets: NAO_URL and NAO_API_KEY, plus optional warehouse/Notion secrets if the user also wants nao sync in CI. For multi-environment teams, use GitHub Environments.
- .naoignore hardening: review the always-excluded set (.git, .venv, .env, node_modules, __pycache__, repos, *.pyc) and add project-specific patterns.
- Workflow file: .github/workflows/nao-deploy.yml with on: push: branches: [main], workflow_dispatch, concurrency serialization, and the API key passed via env: (never a CLI literal).
- Optional nao sync step: most teams keep sync on a cron and let its commit trigger the deploy workflow.
- End-to-end verification: trigger a push, watch for the success message, confirm the remote project matches main.
Guardrails: never commit secrets; one API key per repo/environment; no pull_request trigger (forks could exfiltrate the key); every deploy is a full replacement.
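Since every deploy is a full replacement, it is worth previewing which paths the ignore rules would drop. The sketch below applies the always-excluded set listed in the steps plus any project-specific patterns. The matching semantics are an assumption (each pattern is tested against every path component, roughly gitignore-style); the actual .naoignore rules may differ, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The always-excluded set named in the .naoignore hardening step.
ALWAYS_EXCLUDED = (".git", ".venv", ".env", "node_modules",
                   "__pycache__", "repos", "*.pyc")


def is_excluded(path: str, extra_patterns: tuple[str, ...] = ()) -> bool:
    """True if any component of `path` matches an exclusion pattern."""
    patterns = ALWAYS_EXCLUDED + tuple(extra_patterns)
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern)
               for part in parts for pattern in patterns)
```

For example, `is_excluded("repos/dbt/models/a.sql")` is true under these assumptions, while `is_excluded("RULES.md")` is false.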
When to use which
setup-context      →  write-context-rules  →  create-context-tests  →  audit-context (anytime)
(first time only)     (any rules change)      (benchmark + extend)     (diagnose, never fix)
                                                       │
                                                       ▼
                                              tests reveal metric
                                              reliability gaps?
                                                       │
                                                       ▼
                                              add-semantic-layer
                                              (then back to write-context-rules)

                                              Ready to ship?
                                                       │
                                                       ▼
                                              deploy-context
                                              (CI/CD to production)
| If you want to… | Use |
|---|---|
| Set up nao on a brand-new project | setup-context |
| Generate or rewrite RULES.md | write-context-rules |
| Add tests for a new metric, or build the first benchmark | create-context-tests |
| Find out what's missing, broken, or bloated | audit-context |
| Make metric calculations consistent across questions | add-semantic-layer (only after tests show the gap) |
| Refine RULES.md because the agent keeps making the same miss | write-context-rules (preceded by audit-context if unsure) |
| Auto-deploy context to production on every push | deploy-context |
Source material
The skills are distilled from the public Context Engineering content.
Read each skill's source SKILL.md in the repo: github.com/getnao/nao/tree/main/skills.