Pre-ingestion verification for epistemic quality in RAG systems. Ensures documents are properly qualified before entering knowledge bases. Produces CGD (Clarity-Gated Documents) and validates SOT (Source of Truth) files.
Add this skill
```bash
npx mdskills install frmoretto/clarity-gate
```
⚠️ LATEST: Version 2.1 released (2026-01-27). RFC-001 applied: claim status semantics, bundled scripts. See CHANGELOG.
✅ This README passed Clarity Gate verification (2026-01-13, adversarial mode, Claude Opus 4.5)
Open-source pre-ingestion verification for epistemic quality in RAG systems.
"Detection finds what is; enforcement ensures what should be. In practice: find the missing uncertainty markers before they become confident hallucinations."
If you feed a well-aligned model a document that states "Revenue will reach $50M by Q4" as fact (when it's actually a projection), the model will confidently report this as fact.
The model isn't hallucinating. It's faithfully representing what it was told.
The failure happened before the model saw the input.
| Document Says | Accuracy Check | Epistemic Check |
|---|---|---|
| "Revenue will be $50M" (unmarked projection) | ✅ PASS | ❌ FAIL — projection stated as fact |
| "Our approach outperforms X" (no evidence) | ✅ PASS | ❌ FAIL — ungrounded assertion |
| "Users prefer feature Y" (no methodology) | ✅ PASS | ❌ FAIL — missing epistemic basis |
Accuracy verification asks: "Does this match the source?"
Epistemic verification asks: "Is this claim properly qualified?"
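To make the contrast concrete, here is a deliberately naive sketch of an epistemic check in Python: it flags future-tense claims that carry no hedging marker. The hedge list and regex are illustrative only; Clarity Gate's actual 9-point verification is much broader than this.

```python
import re

# Hedging markers that qualify a claim (illustrative list, not the skill's vocabulary)
HEDGES = {"projected", "estimated", "expected", "assuming", "may", "might", "approximately"}

def flags_unmarked_projection(sentence: str) -> bool:
    """Flag future-tense claims ('will ...') that contain no hedging marker."""
    is_future_claim = re.search(r"\bwill\b", sentence, re.IGNORECASE) is not None
    words = {w.lower().strip(".,") for w in sentence.split()}
    return is_future_claim and not (words & HEDGES)

print(flags_unmarked_projection("Revenue will reach $50M by Q4"))       # True — unmarked projection
print(flags_unmarked_projection("Revenue is projected to reach $50M"))  # False — properly qualified
```

An accuracy check would pass both sentences; only the epistemic check distinguishes them.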
Both matter. Accuracy verification has mature open-source tools. Epistemic verification has detection systems (UnScientify, HedgeHunter, BioScope), but as of the 2.0 release (January 13, 2026) I found no open-source pre-ingestion epistemic enforcement system (methodology: deep research conducted via multiple LLMs). Corrections welcome.
Clarity Gate is a proposal for that layer.
Clarity Gate is an open-source pre-ingestion verification system for epistemic quality.
| Component | Status |
|---|---|
| Pre-ingestion gate pattern | ✅ Proven (Adlib, pharma QMS) |
| Epistemic detection | ✅ Proven (UnScientify, HedgeHunter) |
| Pre-ingestion epistemic enforcement | ❌ Gap (to my knowledge) |
| Open-source accessibility | ❌ Gap |
| Dimension | Enterprise (Adlib) | Clarity Gate |
|---|---|---|
| License | Proprietary | Open source (CC BY 4.0) |
| Focus | Accuracy, compliance | Epistemic quality |
| Target | Fortune 500 | Founders, researchers, small teams |
| Cost | Enterprise pricing | Free |
Most valuable when:
| Aspect | Semantica / LlamaIndex | Clarity Gate |
|---|---|---|
| Stage | Post-extraction | Pre-ingestion |
| Input | Structured entities | Raw documents |
| Problem | "Which value is correct?" | "Is this claim properly qualified?" |
| Output | Resolved knowledge graph | Annotated document (CGD) |
| Conflict | Multi-source disagreement | Unmarked projections/assumptions |
They're complementary: Use Clarity Gate before Semantica/LlamaIndex.
Upload the dist/clarity-gate.skill file. Same as Option 1: Claude Desktop uses the same skill format as claude.ai.
Clone the repo — Claude Code auto-detects skills in .claude/skills/:
```bash
git clone https://github.com/frmoretto/clarity-gate
cd clarity-gate
# Claude Code will automatically detect .claude/skills/clarity-gate/SKILL.md
```
Or copy .claude/skills/clarity-gate/ to your project's .claude/skills/ directory.
Ask Claude: "Run clarity gate on this document"
Add skills/clarity-gate/SKILL.md to project knowledge. Claude will search it when needed, though Skills provide better integration.
Copy the canonical skill to the appropriate directory:
| Platform | Location |
|---|---|
| OpenAI Codex | .codex/skills/clarity-gate/SKILL.md |
| GitHub Copilot | .github/skills/clarity-gate/SKILL.md |
Use skills/clarity-gate/SKILL.md (agentskills.io format).
Use the 9-point verification as a manual review process.
For Cursor, Windsurf, or other AI tools, extract the 9 verification points into your .cursorrules. The methodology is tool-agnostic—only SKILL.md is Claude-optimized.
| Platform | Skill Location | Frontmatter Format |
|---|---|---|
| Claude.ai / Claude Desktop | .claude/skills/clarity-gate/ | Minimal (name, description only) |
| Claude Code | .claude/skills/clarity-gate/ | Minimal |
| OpenAI Codex | .codex/skills/clarity-gate/ | agentskills.io (full) |
| GitHub Copilot | .github/skills/clarity-gate/ | agentskills.io (full) |
| Canonical | skills/clarity-gate/ | agentskills.io (full) |
Pre-built skill file: dist/clarity-gate.skill
See CLARITY_GATE_FORMAT_SPEC.md for the complete format specification (v2.0).
Verify Mode (default):
"Run clarity gate on this document"
→ Issues report + Two-Round HITL verification
Annotate Mode:
"Run clarity gate and annotate this document"
→ Complete document with fixes applied inline (CGD)
The annotated output is a Clarity-Gated Document (CGD).
```mermaid
flowchart TD
  A[Raw Docs<br/>notes, PRDs, transcripts] --> B[process<br/>add epistemic markers<br/>compute document-sha256]
  B --> C[CGD<br/>safe for RAG ingestion]
  C -->|optional| D[promote<br/>add tier block]
  D --> E[SOT<br/>canonical + extractable]
  C --> F[generate HITL queue<br/>claim IDs + locations]
  F --> G[Human review<br/>confirm/reject]
  G --> H[apply-hitl<br/>transaction + checkpoint]
  H --> C
```
See ARCHITECTURE.md for full details and examples.
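The `compute document-sha256` step in the pipeline can be sketched with Python's standard library. This sketch assumes the hash is taken over the document's raw UTF-8 bytes; the exact canonicalization rules are defined in the format spec.

```python
import hashlib

def document_sha256(text: str) -> str:
    """Content hash pinning a CGD to the exact bytes that were verified."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

digest = document_sha256("Revenue is projected to reach $50M by Q4.")
print(digest)  # 64 hex characters; changes if even one byte of the document changes
```

Storing this digest in the CGD lets downstream tooling detect when a verified document has been edited after the fact.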
Different claims need different types of verification:
| Claim Type | What Human Checks | Cognitive Load |
|---|---|---|
| LLM found source, human witnessed | "Did I interpret correctly?" | Low (quick scan) |
| Human's own data | "Is this actually true?" | High (real verification) |
| No source found | "Is this actually true?" | High (real verification) |
The system separates these into two rounds:
Quick scan of claims from sources found in the current session:
```markdown
## Derived Data Confirmation

These claims came from sources found in this session:
- [Specific claim from source A] (source link)
- [Specific claim from source B] (source link)

Reply "confirmed" or flag any I misread.
```
Full verification of claims needing actual checking:
```markdown
## HITL Verification Required

| # | Claim | Why HITL Needed | Human Confirms |
|---|-------|-----------------|----------------|
| 1 | Benchmark scores (100%, 75%→100%) | Your experiment data | [ ] True / [ ] False |
```
Result: Human attention focused on claims that actually need it.
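The two-round split can be sketched as a simple partition on claim provenance. The `Claim` shape and field names below are hypothetical, not the skill's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    source_url: Optional[str]  # None when no source was found this session

def route(claims: list) -> tuple:
    """Round A: quick confirmation of sourced claims. Round B: real verification."""
    round_a = [c for c in claims if c.source_url is not None]
    round_b = [c for c in claims if c.source_url is None]
    return round_a, round_b

claims = [
    Claim("Benchmark scores 75%→100%", None),                 # human's own data → Round B
    Claim("Adlib ships a pre-ingestion gate", "https://example.com/adlib"),  # witnessed → Round A
]
round_a, round_b = route(claims)
```

Only `round_b` demands the high-cognitive-load "is this actually true?" check; `round_a` is a quick scan.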
```mermaid
flowchart TD
  A[Claim Extracted] --> B{Source of Truth Exists?}
  B -->|YES| C[Tier 1: Automated Verification]
  B -->|NO| D[Tier 2: HITL Two-Round Verification]
  C --> E[Tier 1A: Internal]
  C --> F[Tier 1B: External]
  E --> G[PASS / BLOCK]
  F --> G
  D --> H[Round A]
  H --> I[Round B]
  I --> J[APPROVE / REJECT]
```
Checks for contradictions within a document — no external systems required.
| Check Type | Example |
|---|---|
| Figure vs. Text | Figure shows β=0.33, text claims β=0.73 |
| Abstract vs. Body | Abstract claims "40% improvement," body shows 28% |
| Table vs. Prose | Table lists 5 features, text references 7 |
See biology paper example for a real case where Clarity Gate detected a Δ=0.40 discrepancy. Try it yourself at arxiparse.org.
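A minimal sketch of that kind of figure-vs-text check: collect every reported value of a symbol and flag pairs that disagree beyond a tolerance. The regex and tolerance are illustrative, not the skill's implementation:

```python
import re

def beta_discrepancies(doc: str, tol: float = 0.01) -> list:
    """Return pairs of β values reported in the same document that disagree."""
    values = [float(v) for v in re.findall(r"β\s*=\s*(\d+\.\d+)", doc)]
    return [(a, b) for i, a in enumerate(values) for b in values[i + 1:] if abs(a - b) > tol]

doc = "Figure 2 shows β=0.33 for the treatment group. The text claims β=0.73."
print(beta_discrepancies(doc))  # [(0.33, 0.73)] — a Δ=0.40 internal contradiction
```

Because the check only compares a document against itself, it needs no external systems or connectors.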
For claims verifiable against structured sources. Users provide connectors.
The system detects which specific claims need human review AND what kind of review each needs.
Example: Most claims in a document typically pass automated checks, with the remainder split between Round A (quick confirmation) and Round B (real verification). (Illustrative — actual ratios vary by document type.)
Layer 4: Human Strategic Oversight
Layer 3: AI Behavior Verification (behavioral evals, red-teaming)
Layer 2: Input/Context Verification

**Clarity Gate verifies FORM, not TRUTH.**
This system checks whether claims are properly marked as uncertain — it cannot verify if claims are actually true.
**Risk:** An LLM can hallucinate facts INTO a document, then "pass" Clarity Gate by adding source markers to false claims.
**Mitigation:** Two-Round HITL verification is **mandatory** before declaring PASS. See [SKILL.md](skills/clarity-gate/SKILL.md) for the full protocol.
---
## Non-Goals (By Design)
- Does **not** prove truth automatically — enforces correct labeling and verification workflow
- Does **not** replace source citations — prevents epistemic category errors
- Does **not** require a centralized database — file-first and Git-friendly
---
## Roadmap
| Phase | Status | Description |
|-------|--------|-------------|
| **Phase 1** | ✅ Ready | Internal consistency checks + Two-Round HITL + annotation (Claude skill) |
| **Phase 2** | 🔜 Planned | npm/PyPI validators for CI/CD integration |
| **Phase 3** | 🔜 Planned | External verification hooks (user connectors) |
| **Phase 4** | 🔜 Planned | Confidence scoring for HITL optimization |
See [ROADMAP.md](docs/ROADMAP.md) for details.
---
## Documentation
| Document | Description |
|----------|-------------|
| [CLARITY_GATE_FORMAT_SPEC.md](docs/CLARITY_GATE_FORMAT_SPEC.md) | Unified format specification (v2.0) |
| [CLARITY_GATE_PROCEDURES.md](docs/CLARITY_GATE_PROCEDURES.md) | Verification procedures and workflows |
| [ARCHITECTURE.md](docs/ARCHITECTURE.md) | Full 9-point system, verification hierarchy |
| [PRIOR_ART.md](docs/PRIOR_ART.md) | Landscape of existing systems |
| [ROADMAP.md](docs/ROADMAP.md) | Phase 1/2/3 development plan |
| [BENCHMARK_RESULTS.md](docs/research/BENCHMARK_RESULTS.md) | Empirical validation (+19-25% improvement for mid-tier models) |
| [SKILL.md](skills/clarity-gate/SKILL.md) | Claude skill implementation (v2.0) |
| [examples/](examples/) | Real-world verification examples |
---
## Related
**arxiparse.org** — Live implementation for scientific papers
[arxiparse.org](https://arxiparse.org)
**Source of Truth Creator** — Create epistemically calibrated documents (use before verification)
[github.com/frmoretto/source-of-truth-creator](https://github.com/frmoretto/source-of-truth-creator)
**Stream Coding** — Documentation-first methodology where Clarity Gate originated
[github.com/frmoretto/stream-coding](https://github.com/frmoretto/stream-coding)
---
## License
CC BY 4.0 — Use freely with attribution.
---
## Author
**Francesco Marinoni Moretto**
- GitHub: [@frmoretto](https://github.com/frmoretto)
- LinkedIn: [francesco-moretto](https://www.linkedin.com/in/francesco-moretto/)
---
## Contributing
Looking for:
1. **Prior art** — Open-source pre-ingestion gates for epistemic quality I missed?
2. **Integration** — LlamaIndex, LangChain implementations
3. **Verification feedback** — Are the 9 points the right focus?
4. **Real-world examples** — Documents that expose edge cases
Open an issue or PR.
Best experience: Claude Code
```bash
/plugin marketplace add frmoretto/clarity-gate
```
Then /plugin menu → select skill → restart. Use /skill-name:init for first-time setup.
Other platforms
Install via CLI
Install Clarity Gate with a single command:
```bash
npx mdskills install frmoretto/clarity-gate
```
This downloads the skill files into your project, and your AI agent picks them up automatically. Clarity Gate is a free, open-source AI agent skill.
Clarity Gate works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue.dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.