What is Monte Carlo toolkit?

Monte Carlo toolkit is a free, open-source AI agent skill. It generates SQL validation notebooks for dbt changes from a GitHub PR URL or a local dbt repo path.
How do I install Monte Carlo toolkit?

Install Monte Carlo toolkit with a single command: npx mdskills install monte-carlo-data/generate-validation-notebook. This downloads the skill files into your project and your AI agent picks them up automatically.
What platforms support Monte Carlo toolkit?

Monte Carlo toolkit works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Gemini CLI, Amp, Roo Code, and Goose. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.
Monte Carlo toolkit

Name: Monte Carlo toolkit: AI Agent Skill
Rating: 8.2 (1 review)
Author: monte-carlo-data
Verified
Plugin · AI & Machine Learning · Intermediate
Updated 3 weeks ago
Skill Advisor: 8.2
Comprehensive dbt validation workflow with detailed parsing, schema resolution, and multi-mode support
+ Provides extensive step-by-step instructions for PR and local mode with clear fallback logic
+ Integrates helper scripts for schema resolution and URL generation with proper error handling
+ Generates multiple query patterns with segmentation and time-axis detection from dbt metadata
- Phase 3 query patterns appear incomplete (Pattern 5 description cuts off mid-sentence)
- Complex multi-phase workflow may be challenging for agents to execute reliably without clear checkpoints
SKILL.md
---
name: generate-validation-notebook
description: Generate SQL validation notebooks for dbt changes. Pass a GitHub PR URL or local dbt repo path.
---

> **Tip:** This skill works well with Sonnet. Run `/model sonnet` before invoking for faster generation.

Generate a SQL Notebook with validation queries for dbt changes.

**Arguments:** $ARGUMENTS

Parse the arguments:
- **Target** (required): first argument -- a GitHub PR URL or local dbt repo path
- **MC Base URL** (optional): `--mc-base-url <URL>` -- defaults to `https://getmontecarlo.com`
- **Models** (optional): `--models <model1,model2,...>` -- comma-separated list of model filenames (without `.sql` extension) to generate queries for. Only these models will be included. By default, all changed models are included up to a maximum of 10.

---

# Setup

**Prerequisites:**
- **`gh`** (GitHub CLI) -- required for PR mode. Must be authenticated (`gh auth status`).
- **`python3`** -- required for helper scripts.
- **`pyyaml`** -- install with `pip3 install pyyaml` (or `pip install pyyaml`, `uv pip install pyyaml`, etc.)

**Note:** Generated SQL uses ANSI-compatible syntax that works across Snowflake, BigQuery, Redshift, and Athena. Minor adjustments may be needed for specific warehouse quirks.

This skill includes two helper scripts in `${CLAUDE_PLUGIN_ROOT}/skills/generate-validation-notebook/scripts/`:

- **`resolve_dbt_schema.py`** - Resolves dbt model output schemas from `dbt_project.yml` routing rules and model config overrides.
- **`generate_notebook_url.py`** - Encodes notebook YAML into a base64 import URL and opens it in the browser.

# Mode Detection

Auto-detect mode from the target argument:
- If target looks like a URL (contains `://` or `github.com`) -> **PR mode**
- If target is a path (`.`, `/path/to/repo`, relative path) -> **Local mode**

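The mode detection rule above can be sketched as a small function. This is only an illustration of the rule as stated; the function name `detect_mode` and the return values `"pr"` and `"local"` are hypothetical and are not part of the skill's helper scripts.

```python
def detect_mode(target: str) -> str:
    """Return "pr" for URL-like targets, "local" for filesystem paths.

    Hypothetical helper: mirrors the rule "contains :// or github.com".
    """
    if "://" in target or "github.com" in target:
        return "pr"
    return "local"


# Example usage with the kinds of targets the skill accepts.
print(detect_mode("https://github.com/monte-carlo-data/dbt/pull/3386"))
print(detect_mode("."))
```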
---

# Context

This command generates a SQL Notebook containing validation queries for dbt changes. The notebook can be opened in the MC Bridge SQL Notebook interface for interactive validation.

The output is an import URL that opens directly in the notebook interface:
```
<MC_BASE_URL>/notebooks/import#<base64-encoded-yaml>
```

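To make the URL shape concrete, here is a minimal sketch of how such an import URL could be assembled. It assumes standard base64 over the UTF-8 YAML text; the real `generate_notebook_url.py` may use a different base64 variant and also validates the notebook, so treat `build_import_url` as a hypothetical stand-in rather than the script's implementation.

```python
import base64


def build_import_url(notebook_yaml: str,
                     mc_base_url: str = "https://getmontecarlo.com") -> str:
    """Assemble <MC_BASE_URL>/notebooks/import#<base64-encoded-yaml>.

    Assumption: plain standard base64 of the UTF-8 encoded YAML.
    """
    encoded = base64.b64encode(notebook_yaml.encode("utf-8")).decode("ascii")
    return f"{mc_base_url.rstrip('/')}/notebooks/import#{encoded}"


# Example usage with a tiny placeholder document.
url = build_import_url("version: 1\n")
print(url)
```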
**Key Features:**
- **Database Parameters**: Two `text` parameters (`prod_db` and `dev_db`) for selecting databases
- **Schema Inference**: Automatically infers schema per model from `dbt_project.yml` and model configs
- **Single-table queries**: Basic validation queries using `{{prod_db}}.<SCHEMA>.<TABLE>`
- **Comparison queries**: Before/after queries comparing `{{prod_db}}` vs `{{dev_db}}`
- **Flexible usage**: Users can set both parameters to the same database for single-database analysis

# Notebook YAML Spec Reference

Key structure:
```yaml
version: 1
metadata:
  id: string           # kebab-case + random suffix
  name: string         # display name
  created_at: string   # ISO 8601
  updated_at: string   # ISO 8601
default_context:       # optional database/schema context
  database: string
  schema: string
cells:
  - id: string
    type: sql | markdown | parameter
    content: string    # SQL, markdown, or parameter config (JSON)
    display_type: table | bar | timeseries
```

## Parameter Cell Spec

Parameter cells allow defining variables referenced in SQL via `{{param_name}}` syntax:

```yaml
- id: param-prod-db
  type: parameter
  content:
    name: prod_db              # variable name
    config:
      type: text                   # free-form text input
      default_value: "ANALYTICS"
      placeholder: "Prod database"
  display_type: table
```

Parameter types:
- `text`: Free-form text input (used for database names)
- `schema_selector`: Two dropdowns (database -> schema), value stored as `DATABASE.SCHEMA`
- `dropdown`: Select from predefined options

# Task

Generate a SQL Notebook with validation queries based on the mode and target.

## Phase 1: Get Changed Files

The approach differs based on mode:

### If PR mode (GitHub PR):

1. Extract the PR number and repo from the target URL.
   - Example: `https://github.com/monte-carlo-data/dbt/pull/3386` -> owner=`monte-carlo-data`, repo=`dbt`, PR=`3386`

2. Fetch PR metadata using `gh`:
```bash
gh pr view <PR#> --repo <owner>/<repo> --json number,title,author,mergedAt,headRefOid
```

3. Fetch the list of changed files:
```bash
gh pr view <PR#> --repo <owner>/<repo> --json files --jq '.files[].path'
```

4. Fetch the diff:
```bash
gh pr diff <PR#> --repo <owner>/<repo>
```

5. Filter the changed files list to only `.sql` files under `models/` or `snapshots/` directories (at any depth -- e.g., `models/`, `analytics/models/`, `dbt/models/`). These are the dbt models to analyze. If no model SQL files were changed, report that and stop.

6. For each changed model file, fetch the full file content at the head SHA (`headRefOid` from step 2). Quote the API path so the shell does not treat `?` as a glob character:
```bash
gh api "repos/<owner>/<repo>/contents/<file_path>?ref=<head_sha>" --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
```

7. **Fetch dbt_project.yml** for schema resolution. Detect the dbt project root from the changed file paths: it is the common parent directory that contains `dbt_project.yml`. Substitute candidate `<dbt_root>` values into the command below, in order, until one succeeds:
```bash
gh api "repos/<owner>/<repo>/contents/<dbt_root>/dbt_project.yml?ref=<head_sha>" --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
```
Common `<dbt_root>` locations: `analytics`, `.` (repo root), `dbt`, `transform`.

Save `dbt_project.yml` to `/tmp/validation_notebook_working/<PR#>/dbt_project.yml`.

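Steps 1 and 5 of PR mode (URL parsing and model-file filtering) can be sketched as follows. The names `parse_pr_url` and `is_model_sql` are hypothetical helpers written for illustration; they are not part of the skill's scripts, and the regular expression only covers the URL shape shown in the example above.

```python
import re
from pathlib import PurePosixPath

# Matches the owner/repo/pull/<number> portion of a GitHub PR URL.
PR_URL_RE = re.compile(
    r"github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<number>\d+)"
)


def parse_pr_url(url: str) -> tuple[str, str, str]:
    """Return (owner, repo, pr_number) extracted from a GitHub PR URL."""
    match = PR_URL_RE.search(url)
    if match is None:
        raise ValueError(f"not a GitHub PR URL: {url}")
    return match["owner"], match["repo"], match["number"]


def is_model_sql(path: str) -> bool:
    """True for .sql files under a models/ or snapshots/ directory at any depth."""
    p = PurePosixPath(path)
    in_model_dir = any(part in ("models", "snapshots") for part in p.parts[:-1])
    return p.suffix == ".sql" and in_model_dir


# Example usage.
print(parse_pr_url("https://github.com/monte-carlo-data/dbt/pull/3386"))
changed = ["analytics/models/staging/orders.sql", "macros/util.sql", "models/schema.yml"]
print([f for f in changed if is_model_sql(f)])
```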
### If Local mode (Local Directory):

1. Change to the target directory.

2. Get current branch info:
```bash
git rev-parse --abbrev-ref HEAD
```

3. Detect base branch - try `main`, `master`, `develop` in order, or use upstream tracking branch.

4. Get the list of changed SQL files compared to base branch:
```bash
git diff --name-only <base_branch>...HEAD -- '*.sql'
```

5. Filter to only `.sql` files under `models/` or `snapshots/` directories (at any depth -- e.g., `models/`, `analytics/models/`, `dbt/models/`). If no model SQL files were changed, report that and stop.

6. Get the diff for each changed file:
```bash
git diff <base_branch>...HEAD -- <file_path>
```

7. Read model files directly from the filesystem.

8. **Find dbt_project.yml**:
```bash
find . -name "dbt_project.yml" -type f | head -1
```

9. For notebook metadata in local mode, use:
   - **ID**: `local-<branch-name>-<timestamp>`
   - **Title**: `Local: <branch-name>`
   - **Author**: Output of `git config user.name`
   - **Merged**: "N/A (local)"

### Model Selection (applies to both modes)

After filtering to `.sql` files under `models/` or `snapshots/`:

1. **If `--models` was specified:** Filter the changed files list to only include models whose filename (without `.sql` extension, case-insensitive) matches one of the specified model names. If any specified model is not found in the changed files, warn the user but continue with the models that were found. If none match, report that and stop.

2. **Model cap:** If more than 10 models remain after filtering, select the first 10 (by file path order) and warn the user:
   ```
   ⚠️ <total_count> models changed -- generating validation queries for the first 10 only.
   To generate for specific models, re-run with: --models <model1,model2,...>
   Skipped models: <list of skipped model filenames>
   ```

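The selection rules above (optional `--models` filter, then a cap of 10 by file path order) can be sketched like this. `select_models` is a hypothetical helper for illustration; it returns the selected files, the files skipped by the cap, and any requested names that were not found so the caller can emit the warnings.

```python
from pathlib import PurePosixPath

MAX_MODELS = 10  # cap stated in the Model Selection rules


def select_models(changed_files, requested=None, cap=MAX_MODELS):
    """Return (selected, skipped, missing) for a list of changed model paths."""
    files = sorted(changed_files)  # "by file path order"
    missing = []
    if requested:
        wanted = {name.lower() for name in requested}
        found = {PurePosixPath(f).stem.lower() for f in files}
        missing = sorted(wanted - found)  # warn about these, but continue
        files = [f for f in files if PurePosixPath(f).stem.lower() in wanted]
    return files[:cap], files[cap:], missing


# Example usage: 12 changed models, no --models filter.
changed = [f"models/m{i:02d}.sql" for i in range(12)]
selected, skipped, missing = select_models(changed)
print(len(selected), skipped, missing)
```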
## Phase 2: Parse Changed Models

For EACH changed dbt model `.sql` file, parse and extract:

### 2a. Model Metadata

**Output table name** -- Derive from file name:
- `<any_path>/models/<subdir>/<model_name>.sql` -> table is `<MODEL_NAME>` (uppercase, taken from the filename)

**Output schema** -- Use the schema resolution script:

1. **Setup**: Save `dbt_project.yml` and model files to `/tmp/validation_notebook_working/<id>/` preserving paths:
   ```
   /tmp/validation_notebook_working/<id>/
   +-- dbt_project.yml
   +-- models/
       +-- <path>/<model>.sql
   ```

2. **Run the script** for each model:
   ```bash
   python3 ${CLAUDE_PLUGIN_ROOT}/skills/generate-validation-notebook/scripts/resolve_dbt_schema.py /tmp/validation_notebook_working/<id>/dbt_project.yml /tmp/validation_notebook_working/<id>/models/<path>/<model>.sql
   ```

3. **Error handling**: If the script fails, **STOP immediately** and report the error. Do NOT proceed with notebook generation if schema resolution fails.

4. **Output**: The script prints the resolved schema (e.g., `PROD`, `PROD_STAGE`, `PROD_LINEAGE`)

**Note**: Do NOT manually parse dbt_project.yml or model configs for schema -- always use the script. It handles model config overrides, dbt_project.yml routing rules, PROD_ prefix for custom schemas, and defaults to `PROD`.

**Config block** -- Look for `{{ config(...) }}` and extract:
- `materialized` -- 'table', 'view', 'incremental', 'ephemeral'
- `unique_key` -- the dedup key (may be a string or list)
- `cluster_by` -- clustering fields (may contain the time axis)

**Core segmentation fields** -- Scan the entire model SQL for fields likely to be business keys:
- Fields named `*_id` (e.g., `account_id`, `resource_id`, `monitor_id`) that appear in JOIN ON, GROUP BY, PARTITION BY, or `unique_key`
- Deduplicate and rank by frequency. Take the top 3.

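As a rough illustration of the segmentation heuristic, the sketch below counts `*_id` identifiers on lines that mention a key context and returns the three most frequent. It is a line-based approximation under stated assumptions, not a SQL parser: multi-line clauses are not tracked, and `top_segmentation_fields` is a hypothetical name.

```python
import re
from collections import Counter

# Lines mentioning any of these are treated as "key context" (assumption).
KEY_CONTEXT = re.compile(
    r"\b(JOIN|ON|GROUP\s+BY|PARTITION\s+BY|unique_key)\b", re.IGNORECASE
)
# Identifiers ending in _id, e.g. account_id, monitor_id.
ID_FIELD = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*_id)\b", re.IGNORECASE)


def top_segmentation_fields(sql: str, limit: int = 3) -> list[str]:
    """Rank *_id fields by how often they appear on key-context lines."""
    counts = Counter()
    for line in sql.splitlines():
        if KEY_CONTEXT.search(line):
            counts.update(name.lower() for name in ID_FIELD.findall(line))
    return [name for name, _ in counts.most_common(limit)]


# Example usage on a small model body.
sample = """select a.account_id, b.monitor_id
from a
join b on a.account_id = b.account_id
group by a.account_id, b.monitor_id"""
print(top_segmentation_fields(sample))
```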
**Time axis field** -- Detect the model's time dimension (in priority order):
1. `is_incremental()` block: field used in the WHERE comparison
2. `cluster_by` config: timestamp/date fields
3. Field name conventions: `ingest_ts`, `created_time`, `date_part`, `timestamp`, `run_start_time`, `export_ts`, `event_created_time`
4. ORDER BY DESC in QUALIFY/ROW_NUMBER

If no time axis is found, skip time-axis queries for this model.

### 2b. Diff Analysis

Parse the diff hunks for this file. Classify each changed line:

- **Changed fields** -- Lines added/modified in SELECT clauses or CTE definitions. Extract the output column name.
- **Changed filters** -- Lines added/modified in WHERE clauses.
- **Changed joins** -- Lines added/modified in JOIN ON conditions.
- **Changed unique_key** -- If `unique_key` in config was modified, note both old and new values.
- **New columns** -- Columns in "after" SELECT that don't appear in "before" (pure additions).

### 2c. Model Classification

Classify each model as **new** or **modified** based on the diff:
- If the diff for this file contains `new file mode` → classify as **new**
- Otherwise → classify as **modified**

This classification determines which query patterns are generated in Phase 3.

**Note:** For **new models**, Phase 2b diff analysis is skipped (there is no "before" to compare against). Phase 2a metadata extraction still applies.

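The classification rule in 2c can be sketched as a check on the per-file diff header. `classify_model` is a hypothetical helper; it looks for a line that starts with `new file mode`, which is slightly stricter than a substring match and is an assumption made here to avoid matching that phrase inside added SQL content.

```python
def classify_model(file_diff: str) -> str:
    """Return "new" if the diff header declares a new file, else "modified"."""
    for line in file_diff.splitlines():
        # Git emits "new file mode <mode>" in the header of an added file.
        if line.startswith("new file mode"):
            return "new"
    return "modified"


# Example usage with a minimal added-file diff header.
added = "diff --git a/models/x.sql b/models/x.sql\nnew file mode 100644\n+select 1"
print(classify_model(added))
```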
## Phase 3: Generate Validation Queries

For each changed model, generate the applicable queries based on its classification (new vs modified).

**CRITICAL: Parameter Placeholder Syntax**

Use **double curly braces** `{{...}}` for parameter placeholders. Do NOT use `${...}` or any other syntax.

Correct: `{{prod_db}}.PROD.AGENT_RUNS`
Wrong: `${prod_db}.PROD.AGENT_RUNS`

**Table Reference Format:**
- Use `{{prod_db}}.<SCHEMA>.<TABLE_NAME>` for prod queries
- Use `{{dev_db}}.<SCHEMA>.<TABLE_NAME>` for dev queries
- `<SCHEMA>` is **hardcoded per-model** using the output from the schema resolution script

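A small sketch of building a table reference in this format is shown below. `table_ref` is a hypothetical helper for illustration; it uses plain string concatenation so the double curly braces are emitted literally, and it uppercases the model name as described in Phase 2a.

```python
def table_ref(db_param: str, schema: str, model_name: str) -> str:
    """Build "{{<db_param>}}.<SCHEMA>.<TABLE_NAME>" for a resolved schema."""
    if db_param not in ("prod_db", "dev_db"):
        # Only these two notebook parameters exist; schema is never a parameter.
        raise ValueError(f"unknown database parameter: {db_param}")
    return "{{" + db_param + "}}." + schema + "." + model_name.upper()


# Example usage for a model file named agent_runs.sql resolved to schema PROD.
print(table_ref("prod_db", "PROD", "agent_runs"))
print(table_ref("dev_db", "PROD", "agent_runs"))
```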
---

### Query Patterns for NEW Models

For new models, all queries target `{{dev_db}}` only. No comparison queries are generated since no prod table exists.

#### Pattern 7-new: Total Row Count
**Trigger:** Always.

```sql
SELECT COUNT(*) AS total_rows
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

#### Pattern 9: Sample Data Preview
**Trigger:** Always.

```sql
SELECT *
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20
```

#### Pattern 2-new: Core Segmentation Counts
**Trigger:** Always.

```sql
SELECT
    <segmentation_field>,
    COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100
```

#### Pattern 5: Uniqueness Check
**Trigger:** Always for new models (verify unique_key constraint from the start).

```sql
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT <key_fields>) AS distinct_keys,
    COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

```sql
SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100
```

#### Pattern 6-new: NULL Rate Check (all columns)
**Trigger:** Always. Checks all output columns since everything is new.

```sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) AS <col1>_null_count,
    ROUND(100.0 * SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col1>_null_pct,
    SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) AS <col2>_null_count,
    ROUND(100.0 * SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col2>_null_pct
    -- repeat for each output column
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

#### Pattern 8: Time-Axis Continuity
**Trigger:** Model is `materialized='incremental'` OR a time axis field was identified.

```sql
SELECT
    CAST(<time_axis> AS DATE) AS day,
    COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30
```

---

### Query Patterns for MODIFIED Models

For modified models, single-table queries use `{{prod_db}}` and comparison queries use both.

#### Pattern 7: Total Row Count
**Trigger:** Always.

```sql
SELECT COUNT(*) AS total_rows
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
```

#### Pattern 9: Sample Data Preview
**Trigger:** Always.

```sql
SELECT *
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20
```

#### Pattern 2: Core Segmentation Counts
**Trigger:** Always.

```sql
SELECT
    <segmentation_field>,
    COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100
```

#### Pattern 1: Changed Field Distribution
**Trigger:** Changed fields found in Phase 2b. **Exclude added columns** (from "New columns" in Phase 2b) -- only include fields that exist in prod.

```sql
SELECT
    <changed_field>,
    COUNT(*) AS row_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <changed_field>
ORDER BY row_count DESC
LIMIT 100
```

#### Pattern 5: Uniqueness Check
**Trigger:** JOIN condition changed, `unique_key` changed, or model is incremental.

```sql
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT <key_fields>) AS distinct_keys,
    COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

```sql
SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100
```

#### Pattern 6: NULL Rate Check
**Trigger:** New column added, or column wrapped in COALESCE/NULLIF.

**Important:** Added columns (from "New columns" in Phase 2b) do NOT exist in prod yet. For added columns, query `{{dev_db}}` only. For modified columns (COALESCE/NULLIF changes), compare both databases.

**For added columns** (dev only):
```sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

**For modified columns** (prod vs dev):
```sql
SELECT
    'prod' AS source,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT
    'dev' AS source,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

#### Pattern 8: Time-Axis Continuity
**Trigger:** Model is `materialized='incremental'` OR a time axis field was identified.

```sql
SELECT
    CAST(<time_axis> AS DATE) AS day,
    COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30
```

#### Pattern 3: Before/After Comparison
**Trigger:** Always (for changed fields + top segmentation field). **Modified models only.**

**Important:** Exclude added columns (from "New columns" in Phase 2b) from `<group_fields>`. Only use fields that exist in BOTH prod and dev. Added columns don't exist in prod and will cause query errors.

```sql
WITH prod AS (
    SELECT <group_fields>, COUNT(*) AS cnt
    FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
    GROUP BY <group_fields>
),
dev AS (
    SELECT <group_fields>, COUNT(*) AS cnt
    FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
    GROUP BY <group_fields>
)
SELECT
    COALESCE(b.<field>, d.<field>) AS <field>,
    COALESCE(b.cnt, 0) AS cnt_prod,
    COALESCE(d.cnt, 0) AS cnt_dev,
    COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0) AS diff
FROM prod b
FULL OUTER JOIN dev d ON b.<field> = d.<field>
-- Repeat the expression instead of ABS(diff): some warehouses reject a
-- select alias used inside an ORDER BY expression.
ORDER BY ABS(COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0)) DESC
LIMIT 100
```

#### Pattern 7b: Row Count Comparison
**Trigger:** Always. **Modified models only.**

```sql
SELECT 'prod' AS source, COUNT(*) AS row_count FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT 'dev' AS source, COUNT(*) AS row_count FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
```

## Phase 4: Build Notebook YAML

### 4a. Metadata
```yaml
version: 1
metadata:
  id: validation-pr-<PR_NUMBER>-<random_suffix>
  name: "Validation: PR #<PR_NUMBER> - <PR_TITLE_TRUNCATED>"
  created_at: "<current_iso_timestamp>"
  updated_at: "<current_iso_timestamp>"
```

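The metadata fields above could be filled in along the following lines. This is a sketch under assumptions: `build_metadata` is a hypothetical helper, the random suffix is six hex characters (the spec only says "random suffix"), and the name is cut to at most 49 characters to satisfy the guideline of keeping it under 50.

```python
import secrets
from datetime import datetime, timezone


def build_metadata(pr_number: int, pr_title: str, max_name_len: int = 49) -> dict:
    """Build the notebook metadata mapping for PR mode."""
    now = datetime.now(timezone.utc).isoformat()  # ISO 8601 timestamp
    name = f"Validation: PR #{pr_number} - {pr_title}"
    if len(name) > max_name_len:
        # Truncate and mark the cut with an ASCII ellipsis.
        name = name[: max_name_len - 3].rstrip() + "..."
    return {
        "id": f"validation-pr-{pr_number}-{secrets.token_hex(3)}",
        "name": name,
        "created_at": now,
        "updated_at": now,
    }


# Example usage with a long, made-up PR title.
meta = build_metadata(3386, "Refactor incremental logic for the agent runs model")
print(meta["id"], "|", meta["name"])
```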
### 4b. Parameter Cells

**Only include `prod_db` if there are modified models.** If all models are new, only include `dev_db`.

```yaml
# Include ONLY if there are modified models:
- id: param-prod-db
  type: parameter
  content:
    name: prod_db
    config:
      type: text
      default_value: "ANALYTICS"
      placeholder: "Prod database (e.g., ANALYTICS)"
  display_type: table

# Always include:
- id: param-dev-db
  type: parameter
  content:
    name: dev_db
    config:
      type: text
      default_value: "PERSONAL_<USER>"
      placeholder: "Dev database (e.g., PERSONAL_JSMITH)"
  display_type: table
```

### 4c. Markdown Summary Cell
```yaml
- id: cell-summary
  type: markdown
  content: |
    # Validation Queries for <PR or Local Branch>
    ## Summary
    - **Title:** <title>
    - **Author:** <author>
    - **Source:** <PR URL or "Local branch: <branch>">
    - **Status:** <merge_timestamp or "Not yet merged" or "N/A (local)">
    ## Changes
    <brief description based on diff analysis>
    ## Changed Models
    - `<SCHEMA>.<TABLE_NAME>` (from `<file_path>`)
    ## How to Use
    1. Select your Snowflake connector above
    2. Set **dev_db** to your dev database (e.g., `PERSONAL_JSMITH`)
    3. If modified models are present, set **prod_db** to your prod database (e.g., `ANALYTICS`)
    4. Run single-table queries first, then comparison queries
  display_type: table
```

### 4d. SQL Cell Format
```yaml
- id: cell-<pattern>-<model>-<index>
  type: sql
  content: |
    /*
    ========================================
    <Pattern Name (human-readable, e.g. "Total Row Count" -- do NOT include pattern numbers like "Pattern 7:")>
    ========================================
    Model: <SCHEMA>.<TABLE_NAME>
    Triggered by: <why this pattern was generated>
    What to look for: <interpretation guidance>
    ----------------------------------------
    */
    <actual_sql_query>
  display_type: table
```

### 4e. Cell Organization

Cells are ordered consistently for both model types, following this sequence:

**New models:**
1. Summary markdown cell (note that model is new)
2. Parameter cells (dev_db only -- no prod_db if all models are new)
3. Total row count (Pattern 7-new)
4. Sample data preview (Pattern 9)
5. Core segmentation counts (Pattern 2-new)
6. Uniqueness check (Pattern 5), NULL rate check (Pattern 6-new), Time-axis continuity (Pattern 8)

**Modified models:**
1. Summary markdown cell
2. Parameter cells (prod_db, dev_db)
3. Total row count (Pattern 7)
4. Sample data preview (Pattern 9)
5. Core segmentation counts (Pattern 2)
6. Changed field distribution (Pattern 1)
7. Uniqueness check (Pattern 5), NULL rate check (Pattern 6), Time-axis continuity (Pattern 8)
8. Before/after comparisons (Pattern 3), Row count comparison (Pattern 7b)

## Phase 5: Generate Import URL

1. Write notebook YAML to `/tmp/validation_notebook_working/<id>/notebook.yaml`
2. Run the URL generation script:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/generate-validation-notebook/scripts/generate_notebook_url.py /tmp/validation_notebook_working/<id>/notebook.yaml --mc-base-url <MC_BASE_URL>
```
3. The script validates both YAML syntax and notebook schema (required fields on metadata and cells). If validation fails, read the error messages carefully, fix the YAML to match the spec in Phase 4, and re-run.

## Phase 6: Output

Present:
```markdown
# Validation Notebook Generated
## Summary
- **Source:** PR #<number> - <title> OR Local: <branch>
- **Author:** <author>
- **Changed Models:** <count> models (of <total_count> changed)
- **Generated Queries:** <count> queries

> ⚠️ If models were capped: "Only the first 10 of <total_count> changed models were included. Re-run with `--models` to select specific models."

## Notebook Opened
The notebook has been opened directly in your browser.
Select your Snowflake connector in the notebook interface to begin running queries.
*Make sure MC Bridge is running. Let me know if you want tips on how to install this locally.*
```

## Important Guidelines

1. **Do NOT execute queries** -- only generate the notebook
2. **Keep SQL readable** -- proper formatting and meaningful aliases
3. **Include LIMIT 100** on queries that could return many rows
4. **Use double curly braces** -- `{{prod_db}}` NOT `${prod_db}`
5. **Use correct table format** -- `{{prod_db}}.<SCHEMA>.<TABLE>` and `{{dev_db}}.<SCHEMA>.<TABLE>`
6. **Always use the schema resolution script** -- do NOT manually parse dbt_project.yml
7. **Schema is NOT a parameter** -- only `prod_db` and `dev_db` are parameters
8. **Skip ephemeral models** -- they have no physical table
9. **Truncate notebook name** -- keep under 50 chars
10. **Generate unique cell IDs** -- use pattern like `cell-p3-model-1`
11. **YAML multiline content** -- use `|` block scalar for SQL with comments
12. **ASCII-only YAML** -- the script sanitizes and validates before encoding

## Query Pattern Reference

| Pattern | Name | Trigger | Model Type | Database | Order |
|---------|------|---------|------------|----------|-------|
| 7 / 7-new | Total Row Count | Always | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 1 |
| 9 | Sample Data Preview | Always | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 2 |
| 2 / 2-new | Core Segmentation Counts | Always | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 3 |
| 1 | Changed Field Distribution | Column modified in diff (not added) | Modified only | `{{prod_db}}` | 4 |
| 5 | Uniqueness Check | JOIN/unique_key changed or model is incremental (modified) / Always (new) | Both | `{{dev_db}}` | 5 |
| 6 / 6-new | NULL Rate Check | New column or COALESCE/NULLIF (modified) / Always (new) | Both | Added col: `{{dev_db}}` only; COALESCE/NULLIF: Both (modified) / `{{dev_db}}` (new) | 5 |
| 8 | Time-Axis Continuity | Incremental or time field | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 5 |
| 3 | Before/After Comparison | Changed fields (not added) | Modified only | Both | 6 |
| 7b | Row Count Comparison | Always | Modified only | Both | 6 |

## MC Bridge Setup Help

If the user asks how to install or set up MC Bridge, fetch the README from the mc-bridge repo and show the relevant quick start / setup instructions:

```bash
gh api repos/monte-carlo-data/mc-bridge/readme --jq '.content' | base64 --decode
```

Focus on: how to install, configure connections, and run MC Bridge. Don't dump the entire README -- extract just the setup-relevant sections.
