Connect your databases to give your agent access to table schemas, descriptions, and sample rows.

Supported Databases

The nao CLI currently supports these database types in nao_config.yaml:
  • Athena
  • BigQuery
  • ClickHouse
  • Databricks
  • DuckDB
  • Fabric
  • MSSQL
  • Postgres
  • Redshift
  • Snowflake
  • Trino

Common Parameters

Every database entry supports these shared fields:
databases:
  - name: warehouse_prod
    type: snowflake
    include:
      - analytics.*
    exclude:
      - analytics.tmp_*
    accessors:
      - columns
      - description
      - preview
  • name: Friendly connection name
  • type: One of athena, bigquery, clickhouse, databricks, duckdb, fabric, mssql, postgres, redshift, snowflake, trino
  • include: Optional glob patterns for schema.table values to include
  • exclude: Optional glob patterns for schema.table values to exclude
  • accessors: Optional list of rendered context files

Accessors

The built-in accessors are:
  • columns
  • description
  • preview
  • profiling
  • indexes (optional, currently used for ClickHouse table/index metadata)
  • ai_summary (optional, AI-generated table summary)
If you omit accessors, nao renders all four standard accessors (columns, description, preview, and profiling) by default. Use the optional profiling config block to control profiling refresh behavior:
databases:
  - name: warehouse_prod
    type: snowflake
    accessors:
      - columns
      - description
      - preview
      - profiling
    profiling:
      refresh_policy: <policy>
      interval_days: <days>
  • refresh_policy: Profiling refresh strategy.
  • interval_days: Optional refresh interval in days.
ai_summary is opt-in. To use it, add ai_summary to accessors and configure llm.annotation_model in nao_config.yaml. When enabled, nao can render databases/ai_summary.md.j2 and call prompt("...") inside that template to generate LLM-based summaries during nao sync.
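Putting that together, an opt-in ai_summary setup might look like the following sketch. The annotation model name shown here is an example, not a documented default:

```yaml
# Sketch: enabling the opt-in ai_summary accessor.
# gpt-4o-mini is an example model name, not a default.
llm:
  annotation_model: gpt-4o-mini

databases:
  - name: warehouse_prod
    type: snowflake
    accessors:
      - columns
      - description
      - ai_summary
```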

Database Parameters

Athena

databases:
  - name: athena_prod
    type: athena
    s3_staging_dir: s3://my-query-results/athena/
    region_name: us-east-1
    schema_name: analytics
    work_group: primary
    profile_name: default
Optional auth fields:
  • profile_name
  • aws_access_key_id
  • aws_secret_access_key
  • aws_session_token
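As a sketch, a static-credentials setup using the optional fields above could look like this; the env() templating mirrors the other examples on this page:

```yaml
databases:
  - name: athena_prod
    type: athena
    s3_staging_dir: s3://my-query-results/athena/
    region_name: us-east-1
    # Static credentials instead of a named profile:
    aws_access_key_id: "{{ env('AWS_ACCESS_KEY_ID') }}"
    aws_secret_access_key: "{{ env('AWS_SECRET_ACCESS_KEY') }}"
    aws_session_token: "{{ env('AWS_SESSION_TOKEN') }}"
```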

BigQuery

databases:
  - name: bigquery_prod
    type: bigquery
    project_id: my-gcp-project
    dataset_id: analytics
    credentials_path: /path/to/service-account.json
    sso: false
    location: US
    partition_filters:
      events: "event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)"
You can authenticate with one of:
  • credentials_path
  • credentials_json
  • sso: true for ADC / browser auth
Optional for partitioned tables:
  • partition_filters: map of table_name: SQL filter used for preview queries on tables that enforce require_partition_filter = TRUE
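As a sketch, the two alternative auth styles listed above could be configured like this; passing the service-account JSON through an environment variable is an assumed pattern, not one confirmed by this page:

```yaml
databases:
  # Inline service-account JSON (assumed env-variable pattern):
  - name: bigquery_json_auth
    type: bigquery
    project_id: my-gcp-project
    credentials_json: "{{ env('GCP_SERVICE_ACCOUNT_JSON') }}"

  # Application Default Credentials / browser auth:
  - name: bigquery_sso_auth
    type: bigquery
    project_id: my-gcp-project
    sso: true
```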

ClickHouse

databases:
  - name: clickhouse_prod
    type: clickhouse
    host: your-instance.clickhouse.cloud
    port: 8443
    database: analytics
    user: default
    password: "{{ env('CLICKHOUSE_PASSWORD') }}"
    secure: true
    accessors:
      - columns
      - description
      - preview
      - indexes

Databricks

databases:
  - name: databricks_prod
    type: databricks
    server_hostname: adb-1234567890123456.7.azuredatabricks.net
    http_path: /sql/1.0/warehouses/abc123
    access_token: "{{ env('DATABRICKS_TOKEN') }}"
    catalog: main
    schema_name: analytics

DuckDB

databases:
  - name: duckdb_local
    type: duckdb
    path: ./jaffle_shop.duckdb

Fabric

databases:
  - name: fabric_prod
    type: fabric
    server: myworkspace.datawarehouse.fabric.microsoft.com
    database: analytics
    schema_name: dbo
    auth_method: azure_cli
Fabric supports these authentication modes:
  • sql_password (SQL username/password)
  • azure_cli (az login token)
  • azure_interactive (browser login)
  • azure_service_principal (client ID and secret)
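A service-principal setup might look like the sketch below. Note that the client_id, client_secret, and tenant_id field names are assumptions based on the "client ID and secret" description above; they are not confirmed by this page:

```yaml
# Sketch only: credential field names are assumptions.
databases:
  - name: fabric_sp
    type: fabric
    server: myworkspace.datawarehouse.fabric.microsoft.com
    database: analytics
    schema_name: dbo
    auth_method: azure_service_principal
    client_id: "{{ env('AZURE_CLIENT_ID') }}"
    client_secret: "{{ env('AZURE_CLIENT_SECRET') }}"
    tenant_id: "{{ env('AZURE_TENANT_ID') }}"
```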

MSSQL

databases:
  - name: mssql_prod
    type: mssql
    host: sqlserver.example.com
    port: 1433
    database: analytics
    user: "{{ env('MSSQL_USER') }}"
    password: "{{ env('MSSQL_PASSWORD') }}"
    driver: FreeTDS
    schema_name: dbo

Postgres

databases:
  - name: postgres_prod
    type: postgres
    host: postgres.example.com
    port: 5432
    database: analytics
    user: "{{ env('POSTGRES_USER') }}"
    password: "{{ env('POSTGRES_PASSWORD') }}"
    schema_name: public

Redshift

databases:
  - name: redshift_prod
    type: redshift
    host: cluster.region.redshift.amazonaws.com
    port: 5439
    database: analytics
    user: "{{ env('REDSHIFT_USER') }}"
    password: "{{ env('REDSHIFT_PASSWORD') }}"
    schema_name: public
    sslmode: require
Optional SSH tunnel:
databases:
  - name: redshift_prod
    type: redshift
    host: cluster.region.redshift.amazonaws.com
    port: 5439
    database: analytics
    user: "{{ env('REDSHIFT_USER') }}"
    password: "{{ env('REDSHIFT_PASSWORD') }}"
    ssh_tunnel:
      ssh_host: bastion.example.com
      ssh_port: 22
      ssh_username: ec2-user
      ssh_private_key_path: ~/.ssh/id_rsa
      ssh_private_key_passphrase: "{{ env('SSH_KEY_PASSPHRASE') }}"

Snowflake

databases:
  - name: snowflake_prod
    type: snowflake
    username: "{{ env('SNOWFLAKE_USER') }}"
    account_id: xy12345.us-east-1
    password: "{{ env('SNOWFLAKE_PASSWORD') }}"
    database: ANALYTICS
    warehouse: COMPUTE_WH
    schema_name: PUBLIC
Snowflake also supports:
  • private_key_path
  • passphrase
  • authenticator
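For key-pair authentication, a sketch using the private_key_path and passphrase fields listed above (the key path shown is an example):

```yaml
databases:
  - name: snowflake_keypair
    type: snowflake
    username: "{{ env('SNOWFLAKE_USER') }}"
    account_id: xy12345.us-east-1
    database: ANALYTICS
    warehouse: COMPUTE_WH
    private_key_path: ~/.ssh/snowflake_rsa_key.p8
    passphrase: "{{ env('SNOWFLAKE_KEY_PASSPHRASE') }}"
```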
For SSO, use:
databases:
  - name: snowflake_prod
    type: snowflake
    username: "{{ env('SNOWFLAKE_USER') }}"
    account_id: xy12345.us-east-1
    database: ANALYTICS
    warehouse: COMPUTE_WH
    authenticator: externalbrowser

Trino

databases:
  - name: trino_prod
    type: trino
    host: trino.example.com
    port: 8080
    catalog: iceberg
    user: "{{ env('TRINO_USER') }}"
    password: "{{ env('TRINO_PASSWORD') }}"
    schema_name: analytics

Synchronization

Once configured, sync your database schemas:
nao sync
This will:
  1. Connect to each database
  2. Extract schema information
  3. Render the configured accessors
  4. Save the output in databases/

Context Files

After syncing, you’ll see a structure like:
databases/
└── type=bigquery/
    └── database=my-gcp-project/
        └── schema=analytics/
            └── table=dim_users/
                ├── columns.md
                ├── description.md
                ├── preview.md
                └── profiling.md
Example generated files:

columns.md
# dim_users

**Dataset:** `prod_silver`

## Columns (5)

- user_id (int64)
- email (string)
- username (string)
- created_at (timestamp('UTC'))
- is_paying (boolean)
description.md
# dim_users

**Dataset:** `prod_silver`

## Table Metadata

| Property | Value |
|----------|-------|
| **Row Count** | 2,198 |
| **Column Count** | 5 |
preview.md
# dim_users - Preview

**Dataset:** `prod_silver`

## Rows (10)

- {"user_id": 101, "email": "user_101@example.com", "username": "user_101", "created_at": "2026-03-01 10:15:00+00:00", "is_paying": true}
- {"user_id": 102, "email": "user_102@example.com", "username": "user_102", "created_at": "2026-03-02 08:05:42+00:00", "is_paying": false}
profiling.md
# dim_users - Profiling

**Dataset:** `prod_silver`

**Computed at:** `2026-03-14T18:57:58.672988+00:00`

**Columns:** 12

## Column Profiles (JSONL)

- {"column": "user_id", "type": "int64", "total_count": 2198, "null_count": 0, "null_percentage": 0.0, "distinct_count": 2198}
- {"column": "is_paying", "type": "boolean", "total_count": 2198, "null_count": 0, "null_percentage": 0.0, "distinct_count": 2, "top_values": [{"value": false, "count": 698}, {"value": true, "count": 1500}]}
- {"column": "country", "type": "string", "total_count": 1598, "null_count": 600, "null_percentage": 27.29, "distinct_count": 56, "top_values": [{"value": "US", "count": 500}, {"value": "FR", "count": 300}, {"value": "ES", "count": 200}]}

Table Selection

Control which tables are synced with include and exclude. Use glob patterns on schema.table:
  • analytics.orders
  • analytics.*
  • *.orders
  • *_staging
  • test_*
  • *
If both are set, nao first selects the tables matching include, then removes any that also match exclude.
databases:
  - name: warehouse_prod
    type: snowflake
    include:
      - analytics.*
      - marts.fct_*
    exclude:
      - analytics.tmp_*
      - marts.fct_*_backup
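To make the include-then-exclude order concrete, here is the same config annotated with hypothetical tables and how each would be resolved:

```yaml
databases:
  - name: warehouse_prod
    type: snowflake
    include:
      - analytics.*         # analytics.orders       -> included
      - marts.fct_*         # marts.fct_sales        -> included
    exclude:
      - analytics.tmp_*     # analytics.tmp_scratch  -> matched include, then excluded
      - marts.fct_*_backup  # marts.fct_sales_backup -> matched include, then excluded
                            # finance.ledger         -> never matched include, not synced
```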

Best Practices

  • Start with your core schemas only
  • Keep accessors small if token usage matters
  • Use include and exclude to avoid temp, backup, and test tables
