Connect your databases to give your agent access to table schemas, descriptions, and sample rows.

Supported Databases

The current nao CLI supports these database types in nao_config.yaml:
  • Athena
  • BigQuery
  • ClickHouse
  • Databricks
  • DuckDB
  • Fabric
  • MSSQL
  • MySQL
  • Postgres
  • Redshift
  • Snowflake
  • Trino

Common Parameters

Every database entry supports these shared fields:
databases:
  - name: warehouse_prod
    type: snowflake
    include:
      - analytics.*
    exclude:
      - analytics.tmp_*
    templates:
      - columns
      - how_to_use
      - preview
  • name: Friendly connection name
  • type: One of athena, bigquery, clickhouse, databricks, duckdb, fabric, mssql, mysql, postgres, redshift, snowflake, trino
  • include: Optional glob patterns for schema.table values to include
  • exclude: Optional glob patterns for schema.table values to exclude
  • templates: Optional list of rendered context files
The templates field used to be called accessors. The old key still works (nao will read it and migrate automatically), but new configs should use templates.
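For reference, a config still using the legacy key would look like this (identical behavior; only the key name differs):
databases:
  - name: warehouse_prod
    type: snowflake
    accessors:        # legacy spelling of templates; nao reads and migrates it
      - columns
      - how_to_use
      - preview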

Templates

These are the built-in templates nao can render per table:
  • columns
  • description
  • how_to_use (default, AI-friendly per-table usage context built from query history)
  • preview
  • profiling
  • indexes (optional, currently used for ClickHouse table/index metadata)
  • ai_summary (optional, AI-generated table summary)
If you omit templates, nao renders columns, how_to_use, preview, and profiling by default. how_to_use replaces description as the default narrative file because it carries more usable signal for the agent (metadata, partitioning, common joins, frequent queries).
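If you prefer to make the defaults explicit in your config, the equivalent entry is:
databases:
  - name: warehouse_prod
    type: snowflake
    templates:
      - columns
      - how_to_use
      - preview
      - profiling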

how_to_use

how_to_use produces a single markdown file per table that combines:
  • Row count, column count, and partitioning information
  • The table description (when one is available)
  • Usage statistics pulled from your warehouse’s query history:
    • How often the table is queried
    • The tables it is most often joined with
    • The most frequently run queries against it
To enable query-history-based context, set query_history_days on the database (defaults to 0, meaning no history is fetched):
databases:
  - name: warehouse_prod
    type: snowflake
    query_history_days: 30
    templates:
      - columns
      - how_to_use
      - preview
Query history is supported on BigQuery, Snowflake, Databricks, Postgres, and Redshift. On warehouses without history support, how_to_use still renders the metadata + description sections.

profiling

Use the optional profiling config block to control profiling refresh behavior:
databases:
  - name: warehouse_prod
    type: snowflake
    templates:
      - columns
      - description
      - preview
      - profiling
    profiling:
      refresh_policy: <policy>
      interval_days: <days>
  • refresh_policy: Profiling refresh strategy.
  • interval_days: Optional refresh interval in days.
Profiling works across all supported warehouses for primitive columns and for complex column types (array, struct, map, json, row, tuple, variant, object, super). For array columns, nao unpacks the values before computing distinct counts and top values; other complex types are stringified before profiling.

ai_summary

ai_summary is opt-in. To use it, add ai_summary to templates and configure llm.annotation_model in nao_config.yaml. When enabled, nao renders databases/ai_summary.md.j2 and calls prompt("...") inside that template to generate LLM-based summaries during nao sync.
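As a sketch, the two pieces together might look like the snippet below; the nesting of annotation_model under a top-level llm key and the model placeholder are assumptions inferred from the llm.annotation_model dotted path, not a verified reference:
llm:
  annotation_model: <annotation-model-id>   # assumption: llm.annotation_model read as a nested key

databases:
  - name: warehouse_prod
    type: snowflake
    templates:
      - columns
      - how_to_use
      - ai_summary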

Database Parameters

Athena

databases:
  - name: athena_prod
    type: athena
    s3_staging_dir: s3://my-query-results/athena/
    region_name: us-east-1
    schema_name: analytics
    work_group: primary
    profile_name: default
Optional auth fields:
  • profile_name
  • aws_access_key_id
  • aws_secret_access_key
  • aws_session_token

BigQuery

databases:
  - name: bigquery_prod
    type: bigquery
    project_id: my-gcp-project
    dataset_id: analytics
    credentials_path: /path/to/service-account.json
    sso: false
    location: US
    max_query_size: 5
    partition_filters:
      events: "event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
You can authenticate with any one of the following (an SSO-based example is sketched below):
  • credentials_path
  • credentials_json
  • sso: true for ADC / browser auth
Optional for partitioned tables:
  • partition_filters: map of table_name: SQL filter used for preview queries on tables that enforce require_partition_filter = TRUE
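For example, the same connection authenticating through Application Default Credentials instead of a service-account file (a sketch; only the auth field changes):
databases:
  - name: bigquery_prod
    type: bigquery
    project_id: my-gcp-project
    dataset_id: analytics
    sso: true   # ADC / browser auth instead of credentials_path or credentials_json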

Limit query size

Use max_query_size (in GB) to cap how much data a single query can scan. When set, nao runs a BigQuery dry run before every SQL execution and rejects the query if the estimated bytes processed exceed the limit.
databases:
  - name: bigquery_prod
    type: bigquery
    project_id: my-gcp-project
    dataset_id: analytics
    max_query_size: 5  # GB
  • The check applies to every query the agent runs - chat, stories, evaluations, anything going through nao chat or nao test.
  • The limit is enforced before BigQuery scans the data, so blocked queries cost nothing.
  • Errors include the estimated bytes and the configured limit so the agent can suggest a tighter filter and retry.
  • Leave the field unset (or set it to 0) to disable the check.
When you create a BigQuery connection through nao init, the CLI prompts for a maximum query size as part of the interactive setup. The same field is available in the IDE under Settings -> Warehouse Connections for the cloud and IDE flows.

ClickHouse

databases:
  - name: clickhouse_prod
    type: clickhouse
    host: your-instance.clickhouse.cloud
    port: 8443
    database: analytics
    user: default
    password: "{{ env('CLICKHOUSE_PASSWORD') }}"
    secure: true
    templates:
      - columns
      - description
      - preview
      - indexes

Databricks

databases:
  - name: databricks_prod
    type: databricks
    server_hostname: adb-1234567890123456.7.azuredatabricks.net
    http_path: /sql/1.0/warehouses/abc123
    access_token: "{{ env('DATABRICKS_TOKEN') }}"
    catalog: main
    schema_name: analytics

DuckDB

databases:
  - name: duckdb_local
    type: duckdb
    path: ./jaffle_shop.duckdb

Fabric

databases:
  - name: fabric_prod
    type: fabric
    server: myworkspace.datawarehouse.fabric.microsoft.com
    database: analytics
    schema_name: dbo
    auth_method: azure_cli
Fabric supports these authentication modes:
  • sql_password (SQL username/password)
  • azure_cli (az login token)
  • azure_interactive (browser login)
  • azure_service_principal (client ID and secret)
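As an illustration, a SQL-password variant could look like the sketch below; the user and password field names are assumptions carried over from the MSSQL connection rather than documented Fabric parameters:
databases:
  - name: fabric_prod
    type: fabric
    server: myworkspace.datawarehouse.fabric.microsoft.com
    database: analytics
    schema_name: dbo
    auth_method: sql_password
    user: "{{ env('FABRIC_USER') }}"          # assumed field name
    password: "{{ env('FABRIC_PASSWORD') }}"  # assumed field name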

MSSQL

databases:
  - name: mssql_prod
    type: mssql
    host: sqlserver.example.com
    port: 1433
    database: analytics
    user: "{{ env('MSSQL_USER') }}"
    password: "{{ env('MSSQL_PASSWORD') }}"
    driver: FreeTDS
    schema_name: dbo

MySQL

databases:
  - name: mysql_prod
    type: mysql
    host: mysql.example.com
    port: 3306
    database: analytics
    user: "{{ env('MYSQL_USER') }}"
    password: "{{ env('MYSQL_PASSWORD') }}"

Postgres

databases:
  - name: postgres_prod
    type: postgres
    host: postgres.example.com
    port: 5432
    database: analytics
    user: "{{ env('POSTGRES_USER') }}"
    password: "{{ env('POSTGRES_PASSWORD') }}"
    schema_name: public

Redshift

databases:
  - name: redshift_prod
    type: redshift
    host: cluster.region.redshift.amazonaws.com
    port: 5439
    database: analytics
    user: "{{ env('REDSHIFT_USER') }}"
    password: "{{ env('REDSHIFT_PASSWORD') }}"
    schema_name: public
    sslmode: require
Optional SSH tunnel:
databases:
  - name: redshift_prod
    type: redshift
    host: cluster.region.redshift.amazonaws.com
    port: 5439
    database: analytics
    user: "{{ env('REDSHIFT_USER') }}"
    password: "{{ env('REDSHIFT_PASSWORD') }}"
    ssh_tunnel:
      ssh_host: bastion.example.com
      ssh_port: 22
      ssh_username: ec2-user
      ssh_private_key_path: ~/.ssh/id_rsa
      ssh_private_key_passphrase: "{{ env('SSH_KEY_PASSPHRASE') }}"

Snowflake

databases:
  - name: snowflake_prod
    type: snowflake
    username: "{{ env('SNOWFLAKE_USER') }}"
    account_id: xy12345.us-east-1
    password: "{{ env('SNOWFLAKE_PASSWORD') }}"
    database: ANALYTICS
    warehouse: COMPUTE_WH
    schema_name: PUBLIC
Snowflake also supports:
  • private_key_path
  • passphrase
  • authenticator
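A key-pair authentication sketch using those fields (the key path and environment variable name are placeholders; passphrase is only needed if the key is encrypted):
databases:
  - name: snowflake_prod
    type: snowflake
    username: "{{ env('SNOWFLAKE_USER') }}"
    account_id: xy12345.us-east-1
    database: ANALYTICS
    warehouse: COMPUTE_WH
    private_key_path: /path/to/rsa_key.p8
    passphrase: "{{ env('SNOWFLAKE_KEY_PASSPHRASE') }}"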
For SSO, use:
databases:
  - name: snowflake_prod
    type: snowflake
    username: "{{ env('SNOWFLAKE_USER') }}"
    account_id: xy12345.us-east-1
    database: ANALYTICS
    warehouse: COMPUTE_WH
    authenticator: externalbrowser

Trino

databases:
  - name: trino_prod
    type: trino
    host: trino.example.com
    port: 8080
    catalog: iceberg
    user: "{{ env('TRINO_USER') }}"
    password: "{{ env('TRINO_PASSWORD') }}"
    schema_name: analytics

SQL dialect handling

When the agent generates SQL, nao auto-detects the warehouse dialect from the target database’s type and injects extra rules into the system prompt so the query uses the right syntax:
  • T-SQL (MSSQL, Fabric): use TOP N instead of LIMIT.
  • BigQuery: quote identifiers with backticks and use SAFE_DIVIDE(a, b) instead of a / b to avoid divide-by-zero errors.
  • MySQL: quote identifiers with backticks and use IFNULL instead of COALESCE for null handling.
PostgreSQL, Snowflake, Redshift, Databricks, and other standard SQL warehouses do not get extra dialect rules - the agent falls back to standard SQL. If a chat uses several databases of different types, the rules for each are scoped to queries targeting that database.
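For example, with two connections of different types in one config, BigQuery queries pick up the backtick and SAFE_DIVIDE rules while Postgres queries stay on standard SQL:
databases:
  - name: bigquery_prod
    type: bigquery
    project_id: my-gcp-project
    dataset_id: analytics
  - name: postgres_prod
    type: postgres
    host: postgres.example.com
    port: 5432
    database: analytics
    user: "{{ env('POSTGRES_USER') }}"
    password: "{{ env('POSTGRES_PASSWORD') }}"
    schema_name: public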

Synchronization

Once configured, sync your database schemas:
nao sync
This will:
  1. Connect to each database
  2. Extract schema information
  3. Render the configured templates
  4. Save the output in databases/

Context Files

After syncing, you’ll see a structure like:
databases/
└── type=bigquery/
    └── database=my-gcp-project/
        └── schema=analytics/
            └── table=dim_users/
                ├── columns.md
                ├── description.md
                ├── preview.md
                └── profiling.md
Example generated files:
columns.md
# dim_users

**Dataset:** `prod_silver`

## Columns (5)

- user_id (int64)
- email (string)
- username (string)
- created_at (timestamp('UTC'))
- is_paying (boolean)
description.md
# dim_users

**Dataset:** `prod_silver`

## Table Metadata

| Property | Value |
|----------|-------|
| **Row Count** | 2,198 |
| **Column Count** | 5 |
preview.md
# dim_users - Preview

**Dataset:** `prod_silver`

## Rows (10)

- {"user_id": 101, "email": "user_101@example.com", "username": "user_101", "created_at": "2026-03-01 10:15:00+00:00", "is_paying": true}
- {"user_id": 102, "email": "user_102@example.com", "username": "user_102", "created_at": "2026-03-02 08:05:42+00:00", "is_paying": false}
profiling.md
# dim_users - Profiling

**Dataset:** `prod_silver`

**Computed at:** `2026-03-14T18:57:58.672988+00:00`

**Columns:** 12

## Column Profiles (JSONL)

- {"column": "user_id", "type": "int64", "total_count": 2198, "null_count": 0, "null_percentage": 0.0, "distinct_count": 2198}
- {"column": "is_paying", "type": "boolean", "total_count": 2198, "null_count": 0, "null_percentage": 0.0, "distinct_count": 2, "top_values": [{"value": false, "count": 698}, {"value": true, "count": 1500}]}
- {"column": "country", "type": "string", "total_count": 1598, "null_count": 600, "null_percentage": 27.29, "distinct_count": 56, "top_values": [{"value": "US", "count": 500}, {"value": "FR", "count": 300}, {"value": "ES", "count": 200}]}

Table Selection

Control which tables are synced with include and exclude. Use glob patterns on schema.table:
  • analytics.orders
  • analytics.*
  • *.orders
  • *_staging
  • test_*
  • *
If both are set, nao applies include first and then removes anything matched by exclude.
databases:
  - name: warehouse_prod
    type: snowflake
    include:
      - analytics.*
      - marts.fct_*
    exclude:
      - analytics.tmp_*
      - marts.fct_*_backup

Best Practices

  • Start with your core schemas only
  • Keep templates small if token usage matters
  • Use include and exclude to avoid temp, backup, and test tables
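Putting those together, a conservative starting point might look like:
databases:
  - name: warehouse_prod
    type: snowflake
    include:
      - analytics.*          # core schema only
    exclude:
      - analytics.tmp_*      # skip temp tables
      - analytics.*_backup   # skip backups
    templates:
      - columns              # keep the rendered context small
      - how_to_use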

See also: Context Engineering Principles, which covers how to find the optimal balance between comprehensiveness and efficiency.