<!-- SEO Meta Information and Structured Data -->
<div itemscope itemtype="https://schema.org/SoftwareApplication" align="center" xmlns="http://www.w3.org/1999/html">
  <meta itemprop="name" content="Dingo: A Comprehensive AI Data Quality Evaluation Tool">
  <meta itemprop="description" content="Comprehensive AI-powered data quality assessment platform for machine learning datasets, LLM training data validation, hallucination detection, and RAG system evaluation">
  <meta itemprop="applicationCategory" content="Data Quality Software">
  <meta itemprop="operatingSystem" content="Cross-platform">
  <meta itemprop="programmingLanguage" content="Python">
  <meta itemprop="url" content="https://github.com/MigoXLab/dingo">
  <meta itemprop="downloadUrl" content="https://pypi.org/project/dingo-python/">
  <meta itemprop="softwareVersion" content="latest">
  <meta itemprop="license" content="Apache-2.0">

<!-- logo -->
<p align="center">
  <img src="docs/assets/dingo-logo.png" width="300px" style="vertical-align:middle;" alt="Dingo AI Data Quality Evaluation Tool Logo">
</p>

<!-- badges -->
<p align="center">
  <a href="https://github.com/pre-commit/pre-commit"><img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white" alt="pre-commit"></a>
  <a href="https://pypi.org/project/dingo-python/"><img src="https://img.shields.io/pypi/v/dingo-python.svg" alt="PyPI version"></a>
  <a href="https://pypi.org/project/dingo-python/"><img src="https://img.shields.io/pypi/pyversions/dingo-python.svg" alt="Python versions"></a>
  <a href="https://github.com/DataEval/dingo/blob/main/LICENSE"><img src="https://img.shields.io/github/license/DataEval/dingo" alt="License"></a>
  <a href="https://github.com/DataEval/dingo/stargazers"><img src="https://img.shields.io/github/stars/DataEval/dingo" alt="GitHub stars"></a>
  <a href="https://github.com/DataEval/dingo/network/members"><img src="https://img.shields.io/github/forks/DataEval/dingo" alt="GitHub forks"></a>
  <a href="https://github.com/DataEval/dingo/issues"><img src="https://img.shields.io/github/issues/DataEval/dingo" alt="GitHub issues"></a>
  <a href="https://mseep.ai/app/dataeval-dingo"><img src="https://mseep.net/pr/dataeval-dingo-badge.png" alt="MseeP.ai Security Assessment Badge" height="20"></a>
  <a href="https://deepwiki.com/MigoXLab/dingo"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
  <a href="https://archestra.ai/mcp-catalog/dataeval__dingo"><img src="https://archestra.ai/mcp-catalog/api/badge/quality/DataEval/dingo" alt="Trust Score"></a>
</p>

</div>

<div align="center">

[English](README.md) · [简体中文](README_zh-CN.md) · [日本語](README_ja.md)

</div>

<!-- join us -->
<p align="center">
  👋 Join us on <a href="https://discord.gg/Jhgb2eKWh8" target="_blank">Discord</a> and <a href="./docs/assets/wechat.jpg" target="_blank">WeChat</a>
</p>

<p align="center">
  If you like Dingo, please give us a ⭐ on GitHub!
  <br/>
  <a href="https://github.com/DataEval/dingo/stargazers" target="_blank">
    <img src="docs/assets/clickstar_2.gif" alt="Click Star" width="480">
  </a>
</p>

# Introduction

**Dingo** is a comprehensive AI data, model, and application quality evaluation tool designed for ML practitioners, data engineers, and AI researchers. It helps you systematically assess and improve the quality of training data, fine-tuning datasets, and production AI systems.

## Why Dingo?

🎯 **Production-Grade Quality Checks** - From pre-training datasets to RAG systems, ensure your AI is built on high-quality data

🗄️ **Multi-Source Data Integration** - Seamlessly connect to local files, SQL databases (PostgreSQL/MySQL/SQLite), HuggingFace datasets, and S3 storage

🔍 **Multi-Field Evaluation** - Apply different quality rules to different fields in parallel (e.g., ISBN validation for `isbn`, text quality for `title`)

🤖 **RAG System Assessment** - Comprehensive evaluation of retrieval and generation quality with 5 academically backed metrics

🧠 **LLM, Rule & Agent Hybrid** - Combine fast heuristic rules (30+ built-in) with LLM- and agent-based deep assessment

🚀 **Flexible Execution** - Run locally for rapid iteration or scale with Spark for billion-scale datasets

📊 **Rich Reporting** - Detailed quality reports with GUI visualization and field-level insights

## Architecture Diagram

# Quick Start

## Installation

```shell
pip install dingo-python
```
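Once installed, a quick sanity check is to import the two entry points used throughout this README:

```shell
python -c "from dingo.config import InputArgs; from dingo.exec import Executor; print('dingo ready')"
```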
## Example Use Cases of Dingo

### 1. Evaluate LLM chat data

```python
from dingo.config.input_args import EvaluatorLLMArgs
from dingo.io.input import Data
from dingo.model.llm.text_quality.llm_text_quality_v4 import LLMTextQualityV4
from dingo.model.rule.rule_common import RuleEnterAndSpace

data = Data(
    data_id='123',
    prompt="hello, introduce the world",
    content="Hello! The world is a vast and diverse place, full of wonders, cultures, and incredible natural beauty."
)


def llm():
    LLMTextQualityV4.dynamic_config = EvaluatorLLMArgs(
        key='YOUR_API_KEY',
        api_url='https://api.openai.com/v1/chat/completions',
        model='gpt-4o',
    )
    res = LLMTextQualityV4.eval(data)
    print(res)


def rule():
    res = RuleEnterAndSpace().eval(data)
    print(res)
```

### 2. Evaluate Dataset

```python
from dingo.config import InputArgs
from dingo.exec import Executor

# Evaluate a dataset from Hugging Face
input_data = {
    "input_path": "tatsu-lab/alpaca",  # Dataset ID on the Hugging Face hub
    "dataset": {
        "source": "hugging_face",
        "format": "plaintext"  # Plain-text format
    },
    "executor": {
        "result_save": {
            "bad": True  # Save records that fail a check
        }
    },
    "evaluator": [
        {
            "evals": [
                {"name": "RuleColonEnd"},
                {"name": "RuleSpecialCharacter"}
            ]
        }
    ]
}

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
```

## Command Line Interface

### Evaluate with Rule Sets

```shell
python -m dingo.run.cli --input test/env/local_plaintext.json
```

### Evaluate with LLM (e.g., GPT-4o)

```shell
python -m dingo.run.cli --input test/env/local_json.json
```
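The `--input` argument points to a JSON config file. The shipped files under `test/env/` are not reproduced here, but they follow the same schema as the Python `input_data` dict above. A sketch of what such a config might contain — the `"local"` source and `"jsonl"` format strings are assumptions patterned on the file names and the supported-format lists below, so check the repo's `test/env/` files for the authoritative schema:

```json
{
  "input_path": "test/data/test_local_jsonl.jsonl",
  "dataset": {
    "source": "local",
    "format": "jsonl"
  },
  "executor": {
    "result_save": {"bad": true}
  },
  "evaluator": [
    {
      "evals": [
        {"name": "RuleColonEnd"},
        {"name": "RuleSpecialCharacter"}
      ]
    }
  ]
}
```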
## GUI Visualization

After an evaluation runs with `result_save.bad=True`, a frontend page is generated automatically. To start the frontend manually:

```shell
python -m dingo.run.vsl --input output_directory
```

where `output_directory` contains the evaluation results, including a `summary.json` file.
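Only `summary.json` is pinned down by this README; the rest of the layout below is a sketch inferred from the field-grouped outputs and per-rule detail reports described later, so exact file names may differ by version:

```
output_directory/
├── summary.json            # overall score, counts, and type_ratio
└── <field or rule group>/  # per-violation detail records saved when result_save.bad=true
```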
## Online Demo
Try Dingo in our hosted demo on [Hugging Face Spaces 🤗](https://huggingface.co/spaces/DataEval/dingo).

## Local Demo
Run the demo locally:

```shell
cd app_gradio
python app.py
```

## Google Colab Demo
Experience Dingo interactively in a [Google Colab notebook](https://colab.research.google.com/github/DataEval/dingo/blob/dev/examples/colab/dingo_colab_demo.ipynb).

# MCP Server

Dingo includes an experimental Model Context Protocol (MCP) server. For details on running the server and integrating it with clients like Cursor, please see the dedicated documentation:

[English](README_mcp.md) · [简体中文](README_mcp_zh-CN.md) · [日本語](README_mcp_ja.md)

## Video Demonstration

To help you get started quickly with Dingo MCP, we've created a video walkthrough:

https://github.com/user-attachments/assets/aca26f4c-3f2e-445e-9ef9-9331c4d7a37b

The video demonstrates, step by step, how to use the Dingo MCP server with Cursor.

# 🎓 Key Concepts for Practitioners

## What Makes Dingo Production-Ready?

### 1. **Multi-Field Evaluation Pipeline**
Apply different quality checks to different fields in a single pass:
```python
"evaluator": [
    {"fields": {"content": "isbn"}, "evals": [{"name": "RuleIsbn"}]},
    {"fields": {"content": "title"}, "evals": [{"name": "RuleAbnormalChar"}]},
    {"fields": {"content": "description"}, "evals": [{"name": "LLMTextQualityV5"}]}
]
```
**Why It Matters**: Evaluate structured data (like database tables) without writing separate scripts for each field.

### 2. **Stream Processing for Large Datasets**
SQL datasources use SQLAlchemy's server-side cursors:
```python
# Handles billions of rows without OOM
for data in dataset.get_data():  # Yields one row at a time
    result = evaluator.eval(data)
```
**Why It Matters**: Process production databases without exporting to intermediate files.

### 3. **Field Isolation in Memory**
RAG evaluations prevent context bleeding across different field combinations:
```
outputs/
├── user_input,response,retrieved_contexts/   # Faithfulness group
└── user_input,response/                      # Answer Relevancy group
```
**Why It Matters**: Accurate metric calculations when evaluating multiple field combinations.

### 4. **Hybrid Rule-LLM Strategy**
Combine fast rules (100% coverage) with sampled LLM checks (e.g., 10% coverage):
```python
"evals": [
    {"name": "RuleAbnormalChar"},   # Fast, runs on all data
    {"name": "LLMTextQualityV5"}    # Expensive, sample if needed
]
```
**Why It Matters**: Balance cost and coverage for production-scale evaluation.
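A minimal sketch of this strategy using the SDK calls from the Quick Start example — `RuleEnterAndSpace` and `LLMTextQualityV4` are the same classes used there, the 10% rate is illustrative, and it assumes `LLMTextQualityV4.dynamic_config` has already been set as shown earlier:

```python
import random

from dingo.io.input import Data
from dingo.model.llm.text_quality.llm_text_quality_v4 import LLMTextQualityV4
from dingo.model.rule.rule_common import RuleEnterAndSpace


def hybrid_eval(batch, llm_sample_rate=0.1):
    """Run the cheap rule on every record, the LLM on a random sample."""
    results = []
    for data in batch:  # each item is a Data instance
        results.append(RuleEnterAndSpace().eval(data))  # fast, 100% coverage
        if random.random() < llm_sample_rate:           # expensive, ~10% coverage
            results.append(LLMTextQualityV4.eval(data))
    return results
```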
### 5. **Extensibility Through Registration**
Clean plugin architecture for custom rules, prompts, and models:
```python
@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class MyCustomRule(BaseRule):
    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        # Example: check if content is empty
        if not input_data.content:
            return EvalDetail(
                metric=cls.__name__,
                status=True,  # Found an issue
                label=[f'{cls.metric_type}.{cls.__name__}'],
                reason=["Content is empty"]
            )
        return EvalDetail(
            metric=cls.__name__,
            status=False,  # No issue found
            label=['QUALITY_GOOD']
        )
```
**Why It Matters**: Adapt to domain-specific requirements without forking the codebase.

---

# 📚 Data Quality Metrics

Dingo provides **70+ evaluation metrics** across multiple dimensions, combining rule-based speed with LLM-based depth.

## Metric Categories

| Category | Examples | Use Case |
|----------|----------|----------|
| **Pretrain Text Quality** | Completeness, Effectiveness, Similarity, Security | LLM pre-training data filtering |
| **SFT Data Quality** | Honest, Helpful, Harmless (3H) | Instruction fine-tuning data |
| **RAG Evaluation** | Faithfulness, Context Precision, Answer Relevancy | RAG system assessment |
| **Hallucination Detection** | HHEM-2.1-Open, Factuality Check | Production AI reliability |
| **Classification** | Topic categorization, Content labeling | Data organization |
| **Multimodal** | Image-text relevance, VLM quality | Vision-language data |
| **Security** | PII detection, Perspective API toxicity | Privacy and safety |

📊 **[View Complete Metrics Documentation →](docs/metrics.md)**
📖 **[RAG Evaluation Guide →](docs/rag_evaluation_metrics.md)** | **[中文版](docs/rag_evaluation_metrics_zh.md)**
🔍 **[Hallucination Detection Guide →](docs/hallucination_detection_guide.md)** | **[中文版](docs/hallucination_guide.md)**
✅ **[Factuality Assessment Guide →](docs/factuality_assessment_guide.md)** | **[中文版](docs/factcheck_guide.md)**

Most metrics are backed by academic research to ensure scientific rigor.

## Quick Metric Usage

```python
llm_config = {
    "model": "gpt-4o",
    "key": "YOUR_API_KEY",
    "api_url": "https://api.openai.com/v1/chat/completions"
}

input_data = {
    "evaluator": [
        {
            "fields": {"content": "content"},
            "evals": [
                {"name": "RuleAbnormalChar"},                       # Rule-based (fast)
                {"name": "LLMTextQualityV5", "config": llm_config}  # LLM-based (deep)
            ]
        }
    ]
}
```

**Customization**: All prompts live in the `dingo/model/llm/` directory, organized by category (`text_quality/`, `rag/`, `hhh/`, etc.). Extend or modify them for domain-specific requirements.
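To run the evaluator block above end-to-end, attach a data source and reuse the executor flow from the Quick Start. A sketch reusing the Hugging Face source from that example (`input_data` is the dict defined in the previous snippet):

```python
from dingo.config import InputArgs
from dingo.exec import Executor

input_data.update({
    "input_path": "tatsu-lab/alpaca",  # the Hugging Face dataset from the Quick Start
    "dataset": {"source": "hugging_face", "format": "plaintext"},
    "executor": {"result_save": {"bad": True}},
})

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
```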
# 🌟 Feature Highlights

## 📊 Multi-Source Data Integration

**Diverse Data Sources** - Connect to where your data lives
✅ **Local Files**: JSONL, CSV, TXT, Parquet
✅ **SQL Databases**: PostgreSQL, MySQL, SQLite, Oracle, SQL Server (with stream processing)
✅ **Cloud Storage**: S3 and S3-compatible storage
✅ **ML Platforms**: Direct HuggingFace datasets integration

**Enterprise-Ready SQL Support** - Production database integration
✅ Memory-efficient streaming for billion-scale datasets
✅ Connection pooling and automatic resource cleanup
✅ Complex SQL queries (JOIN, WHERE, aggregations)
✅ Multiple dialect support via SQLAlchemy

**Multi-Field Quality Checks** - Different rules for different fields
✅ Parallel evaluation pipelines (e.g., ISBN validation + text quality simultaneously)
✅ Field aliasing and nested field extraction (`user.profile.name`) — see the config sketch at the end of this section
✅ Independent result reports per field
✅ ETL pipeline architecture for flexible data transformation

---

## 🤖 RAG System Evaluation

**5 Academically Backed Metrics** - Based on RAGAS, DeepEval, and TruLens research
✅ **Faithfulness**: Answer-context consistency (hallucination detection)
✅ **Answer Relevancy**: Answer-query alignment
✅ **Context Precision**: Retrieval precision
✅ **Context Recall**: Retrieval recall
✅ **Context Relevancy**: Context-query relevance

**Comprehensive Reporting** - Auto-aggregated statistics
✅ Average, min, max, and standard deviation for each metric
✅ Field-grouped results
✅ Batch and single evaluation modes

📖 **[View RAG Evaluation Guide →](docs/rag_evaluation_metrics.md)**

---

## 🧠 Hybrid Evaluation System

**Rule-Based** - Fast, deterministic, cost-effective
✅ 30+ built-in rules (text quality, format, PII detection)
✅ Regex, heuristic, and statistical checks
✅ Custom rule registration

**LLM-Based** - Deep semantic understanding
✅ OpenAI (GPT-4o, GPT-3.5), DeepSeek, Kimi
✅ Local models (Llama3, Qwen)
✅ Vision-Language Models (InternVL, Gemini)
✅ Custom prompt registration

**Agent-Based** - Multi-step reasoning with tools
✅ Web search integration (Tavily)
✅ Adaptive context gathering
✅ Multi-source fact verification
✅ Custom agent & tool registration

**Extensible Architecture**
✅ Plugin-based rule/prompt/model registration
✅ Clean separation of concerns (agents, tools, orchestration)
✅ Domain-specific customization

---

## 🚀 Flexible Execution & Integration

**Multiple Interfaces**
✅ CLI for quick checks
✅ Python SDK for integration
✅ MCP (Model Context Protocol) server for IDEs (Cursor, etc.)

**Scalable Execution**
✅ Local executor for rapid iteration
✅ Spark executor for distributed processing
✅ Configurable concurrency and batching

**Data Sources**
✅ Local files, Hugging Face hub, S3, and SQL databases (see Multi-Source Data Integration above)

**Modalities**
✅ Text (chat, documents, code)
✅ Images (with VLM support)
✅ Multimodal (text + image consistency)

---

## 📈 Rich Reporting & Visualization

**Multi-Level Reports**
✅ Summary JSON with overall scores
✅ Field-level breakdown
✅ Per-rule violation details
✅ Type and name distribution

**GUI Visualization**
✅ Built-in web interface
✅ Interactive data exploration
✅ Anomaly tracking

**Metric Aggregation**
✅ Automatic statistics (avg, min, max, std_dev)
✅ Field-grouped metrics
✅ Overall quality score

---
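As promised under Multi-Field Quality Checks above, here is the nested-field extraction as a config fragment. The dotted path `user.profile.name` and the `fields` mapping both appear earlier in this README; combining them this way is illustrative rather than documented:

```json
{
  "evaluator": [
    {
      "fields": {"content": "user.profile.name"},
      "evals": [{"name": "RuleAbnormalChar"}]
    }
  ]
}
```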
# 📖 User Guide

## 🔧 Extensibility

Dingo uses a clean plugin architecture for domain-specific customization:

### Custom Rule Registration

```python
from dingo.model import Model
from dingo.model.rule.base import BaseRule
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail

@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class DomainSpecificRule(BaseRule):
    """Check domain-specific patterns"""

    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        text = input_data.content

        # Your custom logic
        is_valid = your_validation_logic(text)

        return EvalDetail(
            metric=cls.__name__,
            status=not is_valid,  # False = good, True = bad
            label=['QUALITY_GOOD' if is_valid else 'QUALITY_BAD_CUSTOM'],
            reason=["Validation details..."]
        )
```

### Custom LLM/Prompt Registration

```python
from dingo.model import Model
from dingo.model.llm.base_openai import BaseOpenAI

@Model.llm_register('custom_evaluator')
class CustomEvaluator(BaseOpenAI):
    """Custom LLM evaluator with specialized prompts"""

    _metric_info = {
        "metric_name": "CustomEvaluator",
        "metric_type": "LLM-Based Quality",
        "category": "Custom Category"
    }

    prompt = """Your custom prompt here..."""
```

**Examples:**
- [Custom Rules](examples/register/sdk_register_rule.py)
- [Custom Models](examples/register/sdk_register_llm.py)

### Agent-Based Evaluation with Tools

Dingo supports agent-based evaluators that can use external tools for multi-step reasoning and adaptive context gathering:

```python
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail
from dingo.model import Model
from dingo.model.llm.agent.base_agent import BaseAgent

@Model.llm_register('MyAgent')
class MyAgent(BaseAgent):
    """Custom agent with tool support"""

    available_tools = ["tavily_search", "my_custom_tool"]
    max_iterations = 5

    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        # Use tools for fact-checking
        search_result = cls.execute_tool('tavily_search', query=input_data.content)

        # Multi-step reasoning with LLM
        result = cls.send_messages([...])

        return EvalDetail(...)
```

**Built-in Agent:**
- `AgentHallucination`: Enhanced hallucination detection with web-search fallback

**Configuration Example:**
```json
{
  "evaluator": [{
    "evals": [{
      "name": "AgentHallucination",
      "config": {
        "key": "openai-api-key",
        "model": "gpt-4",
        "parameters": {
          "agent_config": {
            "max_iterations": 5,
            "tools": {
              "tavily_search": {"api_key": "tavily-key"}
            }
          }
        }
      }
    }]
  }]
}
```

**Learn More:**
- [Agent Development Guide](docs/agent_development_guide.md) - Comprehensive guide for creating custom agents and tools
- [AgentHallucination Example](examples/agent/agent_hallucination_example.py) - Production agent example
- [AgentFactCheck Example](examples/agent/agent_executor_example.py) - LangChain agent example

## ⚙️ Execution Modes

### Local Executor (Development & Small-Scale)

```python
from dingo.config import InputArgs
from dingo.exec import Executor

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()

# Access results
summary = executor.get_summary()           # Overall metrics
bad_data = executor.get_bad_info_list()    # Quality issues
good_data = executor.get_good_info_list()  # High-quality data
```

**Best For**: Rapid iteration, debugging, datasets < 100K rows

### Spark Executor (Production & Large-Scale)

```python
from pyspark.sql import SparkSession
from dingo.exec import Executor

spark = SparkSession.builder.appName("Dingo").getOrCreate()
spark_rdd = spark.sparkContext.parallelize(your_data)

executor = Executor.exec_map["spark"](
    input_args,
    spark_session=spark,
    spark_rdd=spark_rdd
)
result = executor.execute()
```

**Best For**: Production pipelines, distributed processing, datasets > 1M rows

## Evaluation Reports

After evaluation, Dingo generates:

1. **Summary Report** (`summary.json`): Overall metrics and scores
2. **Detailed Reports**: Specific issues for each rule violation

Report field definitions:

1. **score**: `num_good` / `total`, expressed as a percentage
2. **type_ratio**: the count of a given type / `total`, e.g., `QUALITY_BAD_COMPLETENESS` / `total`

In the example below, 1 of 2 records passed, so `score` is 50.0; each bad type was hit once, so each `type_ratio` entry is 1 / 2 = 0.5.

Example summary:
```json
{
  "task_id": "d6c922ec-981c-11ef-b723-7c10c9512fac",
  "task_name": "dingo",
  "eval_group": "default",
  "input_path": "test/data/test_local_jsonl.jsonl",
  "output_path": "outputs/d6c921ac-981c-11ef-b723-7c10c9512fac",
  "create_time": "20241101_144510",
  "score": 50.0,
  "num_good": 1,
  "num_bad": 1,
  "total": 2,
  "type_ratio": {
    "content": {
      "QUALITY_BAD_COMPLETENESS.RuleColonEnd": 0.5,
      "QUALITY_BAD_RELEVANCE.RuleSpecialCharacter": 0.5
    }
  }
}
```

# 🚀 Roadmap & Contributions

## Future Plans

- [ ] **Agent-as-a-Judge** - Multi-agent debate patterns for bias reduction and complex reasoning
- [ ] **SaaS Platform** - Hosted evaluation service with API access and dashboard
- [ ] **Audio & Video Modalities** - Extend beyond text and image
- [ ] **Diversity Metrics** - Statistical diversity assessment
- [ ] **Real-time Monitoring** - Continuous quality checks in production pipelines

## Limitations

The built-in detection rules and model-based methods primarily target common data quality issues. For specialized evaluation needs, we recommend registering custom detection rules (see Extensibility above).

# Acknowledgments

- [RedPajama-Data](https://github.com/togethercomputer/RedPajama-Data)
- [mlflow](https://github.com/mlflow/mlflow)
- [deepeval](https://github.com/confident-ai/deepeval)
- [ragas](https://github.com/explodinggradients/ragas)

# Contribution

We appreciate all contributors for their efforts to improve and enhance `Dingo`. Please refer to the [Contribution Guide](docs/en/CONTRIBUTING.md) for guidance on contributing to the project.

# License

This project uses the [Apache 2.0 Open Source License](LICENSE).

This project uses fastText for some functionality, including language detection. fastText is licensed under the MIT License, which is compatible with our Apache 2.0 license and provides flexibility for various usage scenarios.

# Citation

If you find this project useful, please consider citing our tool:

```
@misc{dingo,
  title={Dingo: A Comprehensive AI Data Quality Evaluation Tool for Large Models},
  author={Dingo Contributors},
  howpublished={\url{https://github.com/MigoXLab/dingo}},
  year={2024}
}
```