<!-- SEO Meta Information and Structured Data -->
<div itemscope itemtype="https://schema.org/SoftwareApplication" align="center" xmlns="http://www.w3.org/1999/html">
  <meta itemprop="name" content="Dingo: A Comprehensive AI Data Quality Evaluation Tool">
  <meta itemprop="description" content="Comprehensive AI-powered data quality assessment platform for machine learning datasets, LLM training data validation, hallucination detection, and RAG system evaluation">
  <meta itemprop="applicationCategory" content="Data Quality Software">
  <meta itemprop="operatingSystem" content="Cross-platform">
  <meta itemprop="programmingLanguage" content="Python">
  <meta itemprop="url" content="https://github.com/MigoXLab/dingo">
  <meta itemprop="downloadUrl" content="https://pypi.org/project/dingo-python/">
  <meta itemprop="softwareVersion" content="latest">
  <meta itemprop="license" content="Apache-2.0">

<!-- logo -->
<p align="center">
  <img src="docs/assets/dingo-logo.png" width="300px" style="vertical-align:middle;" alt="Dingo AI Data Quality Evaluation Tool Logo">
</p>

<!-- badges -->
<p align="center">
  <a href="https://github.com/pre-commit/pre-commit"><img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white" alt="pre-commit"></a>
  <a href="https://pypi.org/project/dingo-python/"><img src="https://img.shields.io/pypi/v/dingo-python.svg" alt="PyPI version"></a>
  <a href="https://pypi.org/project/dingo-python/"><img src="https://img.shields.io/pypi/pyversions/dingo-python.svg" alt="Python versions"></a>
  <a href="https://github.com/DataEval/dingo/blob/main/LICENSE"><img src="https://img.shields.io/github/license/DataEval/dingo" alt="License"></a>
  <a href="https://github.com/DataEval/dingo/stargazers"><img src="https://img.shields.io/github/stars/DataEval/dingo" alt="GitHub stars"></a>
  <a href="https://github.com/DataEval/dingo/network/members"><img src="https://img.shields.io/github/forks/DataEval/dingo" alt="GitHub forks"></a>
  <a href="https://github.com/DataEval/dingo/issues"><img src="https://img.shields.io/github/issues/DataEval/dingo" alt="GitHub issues"></a>
  <a href="https://mseep.ai/app/dataeval-dingo"><img src="https://mseep.net/pr/dataeval-dingo-badge.png" alt="MseeP.ai Security Assessment Badge" height="20"></a>
  <a href="https://deepwiki.com/MigoXLab/dingo"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
  <a href="https://archestra.ai/mcp-catalog/dataeval__dingo"><img src="https://archestra.ai/mcp-catalog/api/badge/quality/DataEval/dingo" alt="Trust Score"></a>
</p>

</div>

<div align="center">

[English](README.md) · [简体中文](README_zh-CN.md) · [日本語](README_ja.md)

</div>

<!-- join us -->
<p align="center">
  👋 Join us on <a href="https://discord.gg/Jhgb2eKWh8" target="_blank">Discord</a> and <a href="./docs/assets/wechat.jpg" target="_blank">WeChat</a>
</p>

<p align="center">
  If you like Dingo, please give us a ⭐ on GitHub!
  <br/>
  <a href="https://github.com/DataEval/dingo/stargazers" target="_blank">
    <img src="docs/assets/clickstar_2.gif" alt="Click Star" width="480">
  </a>
</p>

# Introduction

**Dingo** is a comprehensive AI data, model, and application quality evaluation tool designed for ML practitioners, data engineers, and AI researchers. It helps you systematically assess and improve the quality of training data, fine-tuning datasets, and production AI systems.

## Why Dingo?

🎯 **Production-Grade Quality Checks** - From pre-training datasets to RAG systems, ensure your AI is built on high-quality data

🗄️ **Multi-Source Data Integration** - Seamlessly connect to local files, SQL databases (PostgreSQL/MySQL/SQLite), HuggingFace datasets, and S3 storage

🔍 **Multi-Field Evaluation** - Apply different quality rules to different fields in parallel (e.g., ISBN validation for `isbn`, text quality for `title`)

🤖 **RAG System Assessment** - Comprehensive evaluation of retrieval and generation quality with 5 academically backed metrics

🧠 **LLM, Rule & Agent Hybrid** - Combine fast heuristic rules (30+ built-in) with LLM- and agent-based deep assessment

🚀 **Flexible Execution** - Run locally for rapid iteration or scale with Spark for billion-scale datasets

📊 **Rich Reporting** - Detailed quality reports with GUI visualization and field-level insights

## Architecture Diagram

# Quick Start

## Installation

```shell
pip install dingo-python
```
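Once installed, a quick sanity check is to import the two entry points used throughout this README:

```shell
python -c "from dingo.config import InputArgs; from dingo.exec import Executor; print('dingo ready')"
```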
## Example Use Cases of Dingo

### 1. Evaluate LLM chat data

```python
from dingo.config.input_args import EvaluatorLLMArgs
from dingo.io.input import Data
from dingo.model.llm.text_quality.llm_text_quality_v4 import LLMTextQualityV4
from dingo.model.rule.rule_common import RuleEnterAndSpace

data = Data(
    data_id='123',
    prompt="hello, introduce the world",
    content="Hello! The world is a vast and diverse place, full of wonders, cultures, and incredible natural beauty."
)


def llm():
    LLMTextQualityV4.dynamic_config = EvaluatorLLMArgs(
        key='YOUR_API_KEY',
        api_url='https://api.openai.com/v1/chat/completions',
        model='gpt-4o',
    )
    res = LLMTextQualityV4.eval(data)
    print(res)


def rule():
    res = RuleEnterAndSpace().eval(data)
    print(res)
```

### 2. Evaluate Dataset

```python
from dingo.config import InputArgs
from dingo.exec import Executor

# Evaluate a dataset from Hugging Face
input_data = {
    "input_path": "tatsu-lab/alpaca",  # Dataset ID on the Hugging Face hub
    "dataset": {
        "source": "hugging_face",
        "format": "plaintext"  # Plain-text format
    },
    "executor": {
        "result_save": {
            "bad": True  # Save records that fail a check
        }
    },
    "evaluator": [
        {
            "evals": [
                {"name": "RuleColonEnd"},
                {"name": "RuleSpecialCharacter"}
            ]
        }
    ]
}

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
```

## Command Line Interface

### Evaluate with Rule Sets

```shell
python -m dingo.run.cli --input test/env/local_plaintext.json
```

### Evaluate with LLM (e.g., GPT-4o)

```shell
python -m dingo.run.cli --input test/env/local_json.json
```
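The `--input` argument points to a JSON config file. The shipped files under `test/env/` are not reproduced here, but they follow the same schema as the Python `input_data` dict above. A sketch of what such a config might contain — the `"local"` source and `"jsonl"` format strings are assumptions patterned on the file names and the supported-format lists below, so check the repo's `test/env/` files for the authoritative schema:

```json
{
  "input_path": "test/data/test_local_jsonl.jsonl",
  "dataset": {
    "source": "local",
    "format": "jsonl"
  },
  "executor": {
    "result_save": {"bad": true}
  },
  "evaluator": [
    {
      "evals": [
        {"name": "RuleColonEnd"},
        {"name": "RuleSpecialCharacter"}
      ]
    }
  ]
}
```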
## GUI Visualization

After an evaluation runs with `result_save.bad=True`, a frontend page is generated automatically. To start the frontend manually:

```shell
python -m dingo.run.vsl --input output_directory
```

where `output_directory` contains the evaluation results, including a `summary.json` file.
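Only `summary.json` is pinned down by this README; the rest of the layout below is a sketch inferred from the field-grouped outputs and per-rule detail reports described later, so exact file names may differ by version:

```
output_directory/
├── summary.json            # overall score, counts, and type_ratio
└── <field or rule group>/  # per-violation detail records saved when result_save.bad=true
```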
## Online Demo
Try Dingo in our hosted demo on [Hugging Face Spaces 🤗](https://huggingface.co/spaces/DataEval/dingo).

## Local Demo
Run the demo locally:

```shell
cd app_gradio
python app.py
```

## Google Colab Demo
Experience Dingo interactively in a [Google Colab notebook](https://colab.research.google.com/github/DataEval/dingo/blob/dev/examples/colab/dingo_colab_demo.ipynb).

# MCP Server

Dingo includes an experimental Model Context Protocol (MCP) server. For details on running the server and integrating it with clients like Cursor, please see the dedicated documentation:

[English](README_mcp.md) · [简体中文](README_mcp_zh-CN.md) · [日本語](README_mcp_ja.md)

## Video Demonstration

To help you get started quickly with Dingo MCP, we've created a video walkthrough:

https://github.com/user-attachments/assets/aca26f4c-3f2e-445e-9ef9-9331c4d7a37b

The video demonstrates, step by step, how to use the Dingo MCP server with Cursor.

# 🎓 Key Concepts for Practitioners

## What Makes Dingo Production-Ready?

### 1. **Multi-Field Evaluation Pipeline**
Apply different quality checks to different fields in a single pass:
```python
"evaluator": [
    {"fields": {"content": "isbn"}, "evals": [{"name": "RuleIsbn"}]},
    {"fields": {"content": "title"}, "evals": [{"name": "RuleAbnormalChar"}]},
    {"fields": {"content": "description"}, "evals": [{"name": "LLMTextQualityV5"}]}
]
```
**Why It Matters**: Evaluate structured data (like database tables) without writing separate scripts for each field.

### 2. **Stream Processing for Large Datasets**
SQL datasources use SQLAlchemy's server-side cursors:
```python
# Handles billions of rows without OOM
for data in dataset.get_data():  # Yields one row at a time
    result = evaluator.eval(data)
```
**Why It Matters**: Process production databases without exporting to intermediate files.

### 3. **Field Isolation in Memory**
RAG evaluations prevent context bleeding across different field combinations:
```
outputs/
├── user_input,response,retrieved_contexts/   # Faithfulness group
└── user_input,response/                      # Answer Relevancy group
```
**Why It Matters**: Accurate metric calculations when evaluating multiple field combinations.

### 4. **Hybrid Rule-LLM Strategy**
Combine fast rules (100% coverage) with sampled LLM checks (e.g., 10% coverage):
```python
"evals": [
    {"name": "RuleAbnormalChar"},   # Fast, runs on all data
    {"name": "LLMTextQualityV5"}    # Expensive, sample if needed
]
```
**Why It Matters**: Balance cost and coverage for production-scale evaluation.
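A minimal sketch of this strategy using the SDK calls from the Quick Start example — `RuleEnterAndSpace` and `LLMTextQualityV4` are the same classes used there, the 10% rate is illustrative, and it assumes `LLMTextQualityV4.dynamic_config` has already been set as shown earlier:

```python
import random

from dingo.io.input import Data
from dingo.model.llm.text_quality.llm_text_quality_v4 import LLMTextQualityV4
from dingo.model.rule.rule_common import RuleEnterAndSpace


def hybrid_eval(batch, llm_sample_rate=0.1):
    """Run the cheap rule on every record, the LLM on a random sample."""
    results = []
    for data in batch:  # each item is a Data instance
        results.append(RuleEnterAndSpace().eval(data))  # fast, 100% coverage
        if random.random() < llm_sample_rate:           # expensive, ~10% coverage
            results.append(LLMTextQualityV4.eval(data))
    return results
```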
### 5. **Extensibility Through Registration**
Clean plugin architecture for custom rules, prompts, and models:
```python
@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class MyCustomRule(BaseRule):
    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        # Example: check if content is empty
        if not input_data.content:
            return EvalDetail(
                metric=cls.__name__,
                status=True,  # Found an issue
                label=[f'{cls.metric_type}.{cls.__name__}'],
                reason=["Content is empty"]
            )
        return EvalDetail(
            metric=cls.__name__,
            status=False,  # No issue found
            label=['QUALITY_GOOD']
        )
```
**Why It Matters**: Adapt to domain-specific requirements without forking the codebase.

---

# 📚 Data Quality Metrics

Dingo provides **70+ evaluation metrics** across multiple dimensions, combining rule-based speed with LLM-based depth.

## Metric Categories

| Category | Examples | Use Case |
|----------|----------|----------|
| **Pretrain Text Quality** | Completeness, Effectiveness, Similarity, Security | LLM pre-training data filtering |
| **SFT Data Quality** | Honest, Helpful, Harmless (3H) | Instruction fine-tuning data |
| **RAG Evaluation** | Faithfulness, Context Precision, Answer Relevancy | RAG system assessment |
| **Hallucination Detection** | HHEM-2.1-Open, Factuality Check | Production AI reliability |
| **Classification** | Topic categorization, Content labeling | Data organization |
| **Multimodal** | Image-text relevance, VLM quality | Vision-language data |
| **Security** | PII detection, Perspective API toxicity | Privacy and safety |

📊 **[View Complete Metrics Documentation →](docs/metrics.md)**
📖 **[RAG Evaluation Guide →](docs/rag_evaluation_metrics.md)** | **[中文版](docs/rag_evaluation_metrics_zh.md)**
🔍 **[Hallucination Detection Guide →](docs/hallucination_detection_guide.md)** | **[中文版](docs/hallucination_guide.md)**
✅ **[Factuality Assessment Guide →](docs/factuality_assessment_guide.md)** | **[中文版](docs/factcheck_guide.md)**

Most metrics are backed by academic research to ensure scientific rigor.

## Quick Metric Usage

```python
llm_config = {
    "model": "gpt-4o",
    "key": "YOUR_API_KEY",
    "api_url": "https://api.openai.com/v1/chat/completions"
}

input_data = {
    "evaluator": [
        {
            "fields": {"content": "content"},
            "evals": [
                {"name": "RuleAbnormalChar"},                       # Rule-based (fast)
                {"name": "LLMTextQualityV5", "config": llm_config}  # LLM-based (deep)
            ]
        }
    ]
}
```

**Customization**: All prompts live in the `dingo/model/llm/` directory, organized by category (`text_quality/`, `rag/`, `hhh/`, etc.). Extend or modify them for domain-specific requirements.
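To run the evaluator block above end-to-end, attach a data source and reuse the executor flow from the Quick Start. A sketch reusing the Hugging Face source from that example (`input_data` is the dict defined in the previous snippet):

```python
from dingo.config import InputArgs
from dingo.exec import Executor

input_data.update({
    "input_path": "tatsu-lab/alpaca",  # the Hugging Face dataset from the Quick Start
    "dataset": {"source": "hugging_face", "format": "plaintext"},
    "executor": {"result_save": {"bad": True}},
})

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
```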
# 🌟 Feature Highlights

## 📊 Multi-Source Data Integration

**Diverse Data Sources** - Connect to where your data lives
✅ **Local Files**: JSONL, CSV, TXT, Parquet
✅ **SQL Databases**: PostgreSQL, MySQL, SQLite, Oracle, SQL Server (with stream processing)
✅ **Cloud Storage**: S3 and S3-compatible storage
✅ **ML Platforms**: Direct HuggingFace datasets integration

**Enterprise-Ready SQL Support** - Production database integration
✅ Memory-efficient streaming for billion-scale datasets
✅ Connection pooling and automatic resource cleanup
✅ Complex SQL queries (JOIN, WHERE, aggregations)
✅ Multiple dialect support via SQLAlchemy

**Multi-Field Quality Checks** - Different rules for different fields
✅ Parallel evaluation pipelines (e.g., ISBN validation + text quality simultaneously)
✅ Field aliasing and nested field extraction (`user.profile.name`) — see the config sketch at the end of this section
✅ Independent result reports per field
✅ ETL pipeline architecture for flexible data transformation

---

## 🤖 RAG System Evaluation

**5 Academically Backed Metrics** - Based on RAGAS, DeepEval, and TruLens research
✅ **Faithfulness**: Answer-context consistency (hallucination detection)
✅ **Answer Relevancy**: Answer-query alignment
✅ **Context Precision**: Retrieval precision
✅ **Context Recall**: Retrieval recall
✅ **Context Relevancy**: Context-query relevance

**Comprehensive Reporting** - Auto-aggregated statistics
✅ Average, min, max, and standard deviation for each metric
✅ Field-grouped results
✅ Batch and single evaluation modes

📖 **[View RAG Evaluation Guide →](docs/rag_evaluation_metrics.md)**

---

## 🧠 Hybrid Evaluation System

**Rule-Based** - Fast, deterministic, cost-effective
✅ 30+ built-in rules (text quality, format, PII detection)
✅ Regex, heuristic, and statistical checks
✅ Custom rule registration

**LLM-Based** - Deep semantic understanding
✅ OpenAI (GPT-4o, GPT-3.5), DeepSeek, Kimi
✅ Local models (Llama3, Qwen)
✅ Vision-Language Models (InternVL, Gemini)
✅ Custom prompt registration

**Agent-Based** - Multi-step reasoning with tools
✅ Web search integration (Tavily)
✅ Adaptive context gathering
✅ Multi-source fact verification
✅ Custom agent & tool registration

**Extensible Architecture**
✅ Plugin-based rule/prompt/model registration
✅ Clean separation of concerns (agents, tools, orchestration)
✅ Domain-specific customization

---

## 🚀 Flexible Execution & Integration

**Multiple Interfaces**
✅ CLI for quick checks
✅ Python SDK for integration
✅ MCP (Model Context Protocol) server for IDEs (Cursor, etc.)

**Scalable Execution**
✅ Local executor for rapid iteration
✅ Spark executor for distributed processing
✅ Configurable concurrency and batching

**Data Sources**
✅ Local files, Hugging Face hub, S3, and SQL databases (see Multi-Source Data Integration above)

**Modalities**
✅ Text (chat, documents, code)
✅ Images (with VLM support)
✅ Multimodal (text + image consistency)

---

## 📈 Rich Reporting & Visualization

**Multi-Level Reports**
✅ Summary JSON with overall scores
✅ Field-level breakdown
✅ Per-rule violation details
✅ Type and name distribution

**GUI Visualization**
✅ Built-in web interface
✅ Interactive data exploration
✅ Anomaly tracking

**Metric Aggregation**
✅ Automatic statistics (avg, min, max, std_dev)
✅ Field-grouped metrics
✅ Overall quality score

---
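As promised under Multi-Field Quality Checks above, here is the nested-field extraction as a config fragment. The dotted path `user.profile.name` and the `fields` mapping both appear earlier in this README; combining them this way is illustrative rather than documented:

```json
{
  "evaluator": [
    {
      "fields": {"content": "user.profile.name"},
      "evals": [{"name": "RuleAbnormalChar"}]
    }
  ]
}
```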
# 📖 User Guide

## 🔧 Extensibility

Dingo uses a clean plugin architecture for domain-specific customization:

### Custom Rule Registration

```python
from dingo.model import Model
from dingo.model.rule.base import BaseRule
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail

@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class DomainSpecificRule(BaseRule):
    """Check domain-specific patterns"""

    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        text = input_data.content

        # Your custom logic
        is_valid = your_validation_logic(text)

        return EvalDetail(
            metric=cls.__name__,
            status=not is_valid,  # False = good, True = bad
            label=['QUALITY_GOOD' if is_valid else 'QUALITY_BAD_CUSTOM'],
            reason=["Validation details..."]
        )
```

### Custom LLM/Prompt Registration

```python
from dingo.model import Model
from dingo.model.llm.base_openai import BaseOpenAI

@Model.llm_register('custom_evaluator')
class CustomEvaluator(BaseOpenAI):
    """Custom LLM evaluator with specialized prompts"""

    _metric_info = {
        "metric_name": "CustomEvaluator",
        "metric_type": "LLM-Based Quality",
        "category": "Custom Category"
    }

    prompt = """Your custom prompt here..."""
```

**Examples:**
- [Custom Rules](examples/register/sdk_register_rule.py)
- [Custom Models](examples/register/sdk_register_llm.py)

### Agent-Based Evaluation with Tools

Dingo supports agent-based evaluators that can use external tools for multi-step reasoning and adaptive context gathering:

```python
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail
from dingo.model import Model
from dingo.model.llm.agent.base_agent import BaseAgent

@Model.llm_register('MyAgent')
class MyAgent(BaseAgent):
    """Custom agent with tool support"""

    available_tools = ["tavily_search", "my_custom_tool"]
    max_iterations = 5

    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        # Use tools for fact-checking
        search_result = cls.execute_tool('tavily_search', query=input_data.content)

        # Multi-step reasoning with LLM
        result = cls.send_messages([...])

        return EvalDetail(...)
```

**Built-in Agent:**
- `AgentHallucination`: Enhanced hallucination detection with web-search fallback

**Configuration Example:**
```json
{
  "evaluator": [{
    "evals": [{
      "name": "AgentHallucination",
      "config": {
        "key": "openai-api-key",
        "model": "gpt-4",
        "parameters": {
          "agent_config": {
            "max_iterations": 5,
            "tools": {
              "tavily_search": {"api_key": "tavily-key"}
            }
          }
        }
      }
    }]
  }]
}
```

**Learn More:**
- [Agent Development Guide](docs/agent_development_guide.md) - Comprehensive guide for creating custom agents and tools
- [AgentHallucination Example](examples/agent/agent_hallucination_example.py) - Production agent example
- [AgentFactCheck Example](examples/agent/agent_executor_example.py) - LangChain agent example

## ⚙️ Execution Modes

### Local Executor (Development & Small-Scale)

```python
from dingo.config import InputArgs
from dingo.exec import Executor

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()

# Access results
summary = executor.get_summary()           # Overall metrics
bad_data = executor.get_bad_info_list()    # Quality issues
good_data = executor.get_good_info_list()  # High-quality data
```

**Best For**: Rapid iteration, debugging, datasets < 100K rows

### Spark Executor (Production & Large-Scale)

```python
from pyspark.sql import SparkSession
from dingo.exec import Executor

spark = SparkSession.builder.appName("Dingo").getOrCreate()
spark_rdd = spark.sparkContext.parallelize(your_data)

executor = Executor.exec_map["spark"](
    input_args,
    spark_session=spark,
    spark_rdd=spark_rdd
)
result = executor.execute()
```

**Best For**: Production pipelines, distributed processing, datasets > 1M rows

## Evaluation Reports

After evaluation, Dingo generates:

1. **Summary Report** (`summary.json`): Overall metrics and scores
2. **Detailed Reports**: Specific issues for each rule violation

Report field definitions:

1. **score**: `num_good` / `total`, expressed as a percentage
2. **type_ratio**: the count of a given type / `total`, e.g., `QUALITY_BAD_COMPLETENESS` / `total`

In the example below, 1 of 2 records passed, so `score` is 50.0; each bad type was hit once, so each `type_ratio` entry is 1 / 2 = 0.5.

Example summary:
```json
{
  "task_id": "d6c922ec-981c-11ef-b723-7c10c9512fac",
  "task_name": "dingo",
  "eval_group": "default",
  "input_path": "test/data/test_local_jsonl.jsonl",
  "output_path": "outputs/d6c921ac-981c-11ef-b723-7c10c9512fac",
  "create_time": "20241101_144510",
  "score": 50.0,
  "num_good": 1,
  "num_bad": 1,
  "total": 2,
  "type_ratio": {
    "content": {
      "QUALITY_BAD_COMPLETENESS.RuleColonEnd": 0.5,
      "QUALITY_BAD_RELEVANCE.RuleSpecialCharacter": 0.5
    }
  }
}
```

# 🚀 Roadmap & Contributions

## Future Plans

- [ ] **Agent-as-a-Judge** - Multi-agent debate patterns for bias reduction and complex reasoning
- [ ] **SaaS Platform** - Hosted evaluation service with API access and dashboard
- [ ] **Audio & Video Modalities** - Extend beyond text and image
- [ ] **Diversity Metrics** - Statistical diversity assessment
- [ ] **Real-time Monitoring** - Continuous quality checks in production pipelines

## Limitations

The built-in detection rules and model-based methods primarily target common data quality issues. For specialized evaluation needs, we recommend registering custom detection rules (see Extensibility above).

# Acknowledgments

- [RedPajama-Data](https://github.com/togethercomputer/RedPajama-Data)
- [mlflow](https://github.com/mlflow/mlflow)
- [deepeval](https://github.com/confident-ai/deepeval)
- [ragas](https://github.com/explodinggradients/ragas)

# Contribution

We appreciate all contributors for their efforts to improve and enhance `Dingo`. Please refer to the [Contribution Guide](docs/en/CONTRIBUTING.md) for guidance on contributing to the project.

# License

This project uses the [Apache 2.0 Open Source License](LICENSE).

This project uses fastText for some functionality, including language detection. fastText is licensed under the MIT License, which is compatible with our Apache 2.0 license and provides flexibility for various usage scenarios.

# Citation

If you find this project useful, please consider citing our tool:

```
@misc{dingo,
  title={Dingo: A Comprehensive AI Data Quality Evaluation Tool for Large Models},
  author={Dingo Contributors},
  howpublished={\url{https://github.com/MigoXLab/dingo}},
  year={2024}
}
```