# AI Research Engineering `Skills` Library

> **The most comprehensive open-source library of AI research engineering skills for AI agents**

<p align="center">
  <img src="docs/assets/promo.gif" alt="AI Research Skills Demo" width="700">
</p>

<p align="center">
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
  <a href="https://www.npmjs.com/package/@orchestra-research/ai-research-skills"><img src="https://img.shields.io/npm/v/@orchestra-research/ai-research-skills.svg" alt="npm version"></a>
  <a href="https://www.orchestra-research.com/perspectives/ai-research-skills"><img src="https://img.shields.io/badge/Blog-Read%20More-orange.svg" alt="Blog Post"></a>
  <a href="https://join.slack.com/t/orchestrarese-efu1990/shared_invite/zt-3iu6gr8io-zJvpkZTPToEviQ9KFZvNSg"><img src="https://img.shields.io/badge/Slack-Join%20Community-4A154B.svg?logo=slack" alt="Slack"></a>
  <a href="https://x.com/orch_research"><img src="https://img.shields.io/badge/Twitter-Follow-1DA1F2.svg?logo=x" alt="Twitter"></a>
  <a href="https://www.linkedin.com/company/orchestra-research/"><img src="https://img.shields.io/badge/LinkedIn-Follow-0A66C2.svg?logo=linkedin" alt="LinkedIn"></a>
</p>

<div align="center">

### **85 Skills Powering AI Research in 2026**

</div>

<details>
<summary><b>View All 21 Categories</b></summary>

<div align="center">

| | | |
|:---:|:---:|:---:|
| **Model Architecture** (5) | **Fine-Tuning** (4) | **Post-Training** (8) |
| **Distributed Training** (6) | **Optimization** (6) | **Inference** (4) |
| **Tokenization** (2) | **Data Processing** (2) | **Evaluation** (3) |
| **Safety & Alignment** (4) | **Agents** (4) | **RAG** (5) |
| **Multimodal** (7) | **Prompt Engineering** (4) | **MLOps** (3) |
| **Observability** (2) | **Infrastructure** (3) | **Mech Interp** (4) |
| **Emerging Techniques** (6) | **ML Paper Writing** (1) | **Ideation** (2) |

</div>

</details>

---

## Table of Contents

- [Our Mission](#our-mission)
- [Path Towards AI Research Agent](#path-towards-ai-research-agent)
- [Available AI Research Engineering Skills](#available-ai-research-engineering-skills)
- [Demos](#demos)
- [Skill Structure](#skill-structure)
- [Roadmap](#roadmap)
- [Repository Structure](#repository-structure)
- [Use Cases](#use-cases)
- [Contributing](#contributing)
- [Community](#community)

## Our Mission

We provide the layer of **Engineering Ability** that **enables your coding agent to write and conduct AI research experiments**, including preparing datasets, executing training pipelines, deploying models, and building your AI agents.

<p align="center">
  <img src="docs/skills.png" alt="AI Research Agent System" width="50%">
  <br>
  <em>System diagram of an AI research agent</em>
</p>

## Path Towards AI Research Agent

Modern AI research requires mastering dozens of specialized tools and frameworks.
AI researchers spend more time debugging infrastructure than testing hypotheses—slowing the pace of scientific discovery.
We provide a comprehensive library of expert-level research engineering skills that enable AI agents to autonomously implement and execute different stages of AI research experiments—from data preparation and model training to evaluation and deployment.

- **Specialized Expertise** - Each skill provides deep, production-ready knowledge of a specific framework (Megatron-LM, vLLM, TRL, etc.)
- **End-to-End Coverage** - 85 skills spanning the full AI research lifecycle, from model architecture to deployment
- **Research-Grade Quality** - Documentation sourced from official repos, real GitHub issues, and battle-tested production workflows

## Available AI Research Engineering Skills

**Quality over quantity**: Each skill provides comprehensive, expert-level guidance with real code examples, troubleshooting guides, and production-ready workflows.

### 📦 Quick Install (Recommended)

Install skills to **any coding agent** (Claude Code, OpenCode, Cursor, Codex, Gemini CLI, Qwen Code) with one command:

```bash
npx @orchestra-research/ai-research-skills
```

This launches an interactive installer that:
- **Auto-detects** your installed coding agents
- **Installs** skills to `~/.orchestra/skills/` with symlinks to each agent
- **Offers** everything, a quickstart bundle, installation by category, or individual skills
- **Updates** installed skills with the latest versions
- **Uninstalls** all or selected skills

<details>
<summary><b>CLI Commands</b></summary>

```bash
# Interactive installer (recommended)
npx @orchestra-research/ai-research-skills

# Direct commands
npx @orchestra-research/ai-research-skills list    # View installed skills
npx @orchestra-research/ai-research-skills update  # Update installed skills
```

</details>

<details>
<summary><b>Claude Code Marketplace (Alternative)</b></summary>

Install skill categories directly using the **Claude Code CLI**:

```bash
# Add the marketplace
/plugin marketplace add orchestra-research/AI-research-SKILLs

# Install by category (21 categories available)
/plugin install fine-tuning@ai-research-skills          # Axolotl, LLaMA-Factory, PEFT, Unsloth
/plugin install post-training@ai-research-skills        # TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge
/plugin install inference-serving@ai-research-skills    # vLLM, TensorRT-LLM, llama.cpp, SGLang
/plugin install distributed-training@ai-research-skills
/plugin install optimization@ai-research-skills
```

</details>

### All 21 Categories (85 Skills)

| Category | Skills | Included |
|----------|--------|----------|
| Model Architecture | 5 | LitGPT, Mamba, NanoGPT, RWKV, TorchTitan |
| Tokenization | 2 | HuggingFace Tokenizers, SentencePiece |
| Fine-Tuning | 4 | Axolotl, LLaMA-Factory, PEFT, Unsloth |
| Mech Interp | 4 | TransformerLens, SAELens, pyvene, nnsight |
| Data Processing | 2 | NeMo Curator, Ray Data |
| Post-Training | 8 | TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge |
| Safety | 4 | Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard |
| Distributed | 6 | DeepSpeed, FSDP, Accelerate, Megatron-Core, Lightning, Ray Train |
| Infrastructure | 3 | Modal, Lambda Labs, SkyPilot |
| Optimization | 6 | Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF |
| Evaluation | 3 | lm-eval-harness, BigCode, NeMo Evaluator |
| Inference | 4 | vLLM, TensorRT-LLM, llama.cpp, SGLang |
| MLOps | 3 | W&B, MLflow, TensorBoard |
| Agents | 4 | LangChain, LlamaIndex, CrewAI, AutoGPT |
| RAG | 5 | Chroma, FAISS, Pinecone, Qdrant, Sentence Transformers |
| Prompt Eng | 4 | DSPy, Instructor, Guidance, Outlines |
| Observability | 2 | LangSmith, Phoenix |
| Multimodal | 7 | CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, AudioCraft |
| Emerging | 6 | MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning |
| ML Paper Writing | 1 | ML Paper Writing (LaTeX templates, citation verification) |
| Ideation | 2 | Research Brainstorming, Creative Thinking |

<details>
<summary><b>View All 85 Skills in Detail</b></summary>

### 🏗️ Model Architecture (5 skills)
- **[LitGPT](01-model-architecture/litgpt/)** - Lightning AI's 20+ clean LLM implementations with production training recipes (462 lines + 4 refs)
- **[Mamba](01-model-architecture/mamba/)** - State-space models with O(n) complexity, 5× faster than Transformers (253 lines + 3 refs)
- **[RWKV](01-model-architecture/rwkv/)** - RNN+Transformer hybrid, infinite context, Linux Foundation project (253 lines + 3 refs)
- **[NanoGPT](01-model-architecture/nanogpt/)** - Educational GPT in ~300 lines by Karpathy (283 lines + 3 refs)
- **[TorchTitan](01-model-architecture/torchtitan/)** - PyTorch-native distributed training for Llama 3.1 with 4D parallelism

### 🔤 Tokenization (2 skills)
- **[HuggingFace Tokenizers](02-tokenization/huggingface-tokenizers/)** - Rust-based, <20s/GB, BPE/WordPiece/Unigram algorithms (486 lines + 4 refs)
- **[SentencePiece](02-tokenization/sentencepiece/)** - Language-independent, 50k sentences/sec, used by T5/ALBERT (228 lines + 2 refs)

### 🎯 Fine-Tuning (4 skills)
- **[Axolotl](03-fine-tuning/axolotl/)** - YAML-based fine-tuning with 100+ models (156 lines + 4 refs)
- **[LLaMA-Factory](03-fine-tuning/llama-factory/)** - WebUI no-code fine-tuning (78 lines + 5 refs)
- **[Unsloth](03-fine-tuning/unsloth/)** - 2x faster QLoRA fine-tuning (75 lines + 4 refs)
- **[PEFT](03-fine-tuning/peft/)** - Parameter-efficient fine-tuning with LoRA, QLoRA, DoRA, 25+ methods (431 lines + 2 refs)

### 🔬 Mechanistic Interpretability (4 skills)
- **[TransformerLens](04-mechanistic-interpretability/transformer-lens/)** - Neel Nanda's library for mech interp with HookPoints, activation caching (346 lines + 3 refs)
- **[SAELens](04-mechanistic-interpretability/saelens/)** - Sparse Autoencoder training and analysis for feature discovery (386 lines + 3 refs)
- **[pyvene](04-mechanistic-interpretability/pyvene/)** - Stanford's causal intervention library with declarative configs (473 lines + 3 refs)
- **[nnsight](04-mechanistic-interpretability/nnsight/)** - Remote interpretability via NDIF, run experiments on 70B+ models (436 lines + 3 refs)

### 📊 Data Processing (2 skills)
- **[Ray Data](05-data-processing/ray-data/)** - Distributed ML data processing, streaming execution, GPU support (318 lines + 2 refs)
- **[NeMo Curator](05-data-processing/nemo-curator/)** - GPU-accelerated data curation, 16× faster deduplication (375 lines + 2 refs)

### 🎓 Post-Training (8 skills)
- **[TRL Fine-Tuning](06-post-training/trl-fine-tuning/)** - Transformer Reinforcement Learning (447 lines + 4 refs)
- **[GRPO-RL-Training](06-post-training/grpo-rl-training/)** (TRL) - Group Relative Policy Optimization with TRL (569 lines, **gold standard**)
- **[OpenRLHF](06-post-training/openrlhf/)** - Full RLHF pipeline with Ray + vLLM (241 lines + 4 refs)
- **[SimPO](06-post-training/simpo/)** - Simple Preference Optimization, no reference model needed (211 lines + 3 refs)
- **[verl](06-post-training/verl/)** - ByteDance's HybridFlow RL framework, FSDP/Megatron + vLLM/SGLang backends (389 lines + 2 refs)
- **[slime](06-post-training/slime/)** - THUDM's Megatron+SGLang framework powering GLM-4.x models (464 lines + 2 refs)
- **[miles](06-post-training/miles/)** - Enterprise fork of slime with FP8, INT4, speculative RL for MoE training (315 lines + 2 refs)
- **[torchforge](06-post-training/torchforge/)** - Meta's PyTorch-native RL with Monarch+TorchTitan+vLLM (380 lines + 2 refs)

### 🛡️ Safety & Alignment (4 skills)
- **[Constitutional AI](07-safety-alignment/constitutional-ai/)** - AI-driven self-improvement via principles (282 lines)
- **[LlamaGuard](07-safety-alignment/llamaguard/)** - Safety classifier for LLM inputs/outputs (329 lines)
- **[NeMo Guardrails](07-safety-alignment/nemo-guardrails/)** - Programmable guardrails with Colang (289 lines)
- **[Prompt Guard](07-safety-alignment/prompt-guard/)** - Meta's 86M prompt injection & jailbreak detector, 99%+ TPR, <2ms GPU (313 lines)

### ⚡ Distributed Training (6 skills)
- **[Megatron-Core](08-distributed-training/megatron-core/)** - NVIDIA's framework for training 2B-462B param models with 47% MFU on H100 (359 lines + 4 refs)
- **[DeepSpeed](08-distributed-training/deepspeed/)** - Microsoft's ZeRO optimization (137 lines + 9 refs)
- **[PyTorch FSDP2](08-distributed-training/pytorch-fsdp2/)** - Fully Sharded Data Parallel v2 with `fully_shard` and DTensor (231 lines + 12 refs)
- **[Accelerate](08-distributed-training/accelerate/)** - HuggingFace's 4-line distributed training API (324 lines + 3 refs)
- **[PyTorch Lightning](08-distributed-training/pytorch-lightning/)** - High-level training framework with Trainer class (339 lines + 3 refs)
- **[Ray Train](08-distributed-training/ray-train/)** - Multi-node orchestration and hyperparameter tuning (399 lines + 1 ref)

### 🚀 Optimization (6 skills)
- **[Flash Attention](10-optimization/flash-attention/)** - 2-4x faster attention with memory efficiency (359 lines + 2 refs)
- **[bitsandbytes](10-optimization/bitsandbytes/)** - 8-bit/4-bit quantization for 50-75% memory reduction (403 lines + 3 refs)
- **[GPTQ](10-optimization/gptq/)** - 4-bit post-training quantization, 4× memory reduction, <2% accuracy loss (443 lines + 3 refs)
- **[AWQ](10-optimization/awq/)** - Activation-aware weight quantization, 4-bit with minimal accuracy loss (310 lines + 2 refs)
- **[HQQ](10-optimization/hqq/)** - Half-Quadratic Quantization, no calibration data needed, multi-backend (370 lines + 2 refs)
- **[GGUF](10-optimization/gguf/)** - llama.cpp quantization format, K-quant methods, CPU/Metal inference (380 lines + 2 refs)

### 📊 Evaluation (3 skills)
- **[lm-evaluation-harness](11-evaluation/lm-evaluation-harness/)** - EleutherAI's standard for benchmarking LLMs across 60+ tasks (482 lines + 4 refs)
- **[BigCode Evaluation Harness](11-evaluation/bigcode-evaluation-harness/)** - Code model benchmarking with HumanEval, MBPP, MultiPL-E, pass@k metrics (406 lines + 3 refs)
- **[NeMo Evaluator](11-evaluation/nemo-evaluator/)** - NVIDIA's enterprise platform for 100+ benchmarks across 18+ harnesses with multi-backend execution (454 lines + 4 refs)

### ☁️ Infrastructure (3 skills)
- **[Modal](09-infrastructure/modal/)** - Serverless GPU cloud with Python-native API, T4-H200 on-demand (342 lines + 2 refs)
- **[SkyPilot](09-infrastructure/skypilot/)** - Multi-cloud orchestration across 20+ providers with spot recovery (390 lines + 2 refs)
- **[Lambda Labs](09-infrastructure/lambda-labs/)** - Reserved/on-demand GPU cloud with H100/A100, persistent filesystems (390 lines + 2 refs)

### 🔥 Inference & Serving (4 skills)
- **[vLLM](12-inference-serving/vllm/)** - High-throughput LLM serving with PagedAttention (356 lines + 4 refs, **production-ready**)
- **[TensorRT-LLM](12-inference-serving/tensorrt-llm/)** - NVIDIA's fastest inference, 24k tok/s, FP8/INT4 quantization (180 lines + 3 refs)
- **[llama.cpp](12-inference-serving/llama-cpp/)** - CPU/Apple Silicon inference, GGUF quantization (251 lines + 3 refs)
- **[SGLang](12-inference-serving/sglang/)** - Structured generation with RadixAttention, 5-10× faster for agents (435 lines + 3 refs)

### 🤖 Agents (4 skills)
- **[LangChain](14-agents/langchain/)** - Most popular agent framework, 500+ integrations, ReAct pattern (658 lines + 3 refs, **production-ready**)
- **[LlamaIndex](14-agents/llamaindex/)** - Data framework for LLM apps, 300+ connectors, RAG-focused (535 lines + 3 refs)
- **[CrewAI](14-agents/crewai/)** - Multi-agent orchestration, role-based collaboration, autonomous workflows (498 lines + 3 refs)
- **[AutoGPT](14-agents/autogpt/)** - Autonomous AI agent platform, visual workflow builder, continuous execution (400 lines + 2 refs)

### 🔍 RAG (5 skills)
- **[Chroma](15-rag/chroma/)** - Open-source embedding database, local/cloud, 24k stars (385 lines + 1 ref)
- **[FAISS](15-rag/faiss/)** - Facebook's similarity search, billion-scale, GPU acceleration (295 lines)
- **[Sentence Transformers](15-rag/sentence-transformers/)** - 5000+ embedding models, multilingual, 15k stars (370 lines)
- **[Pinecone](15-rag/pinecone/)** - Managed vector database, auto-scaling, <100ms latency (410 lines)
- **[Qdrant](15-rag/qdrant/)** - High-performance vector search, Rust-powered, hybrid search with filtering (493 lines + 2 refs)

### 🎨 Multimodal (7 skills)
- **[CLIP](18-multimodal/clip/)** - OpenAI's vision-language model, zero-shot classification, 25k stars (320 lines)
- **[Whisper](18-multimodal/whisper/)** - Robust speech recognition, 99 languages, 73k stars (395 lines)
- **[LLaVA](18-multimodal/llava/)** - Vision-language assistant, image chat, GPT-4V level (360 lines)
- **[Stable Diffusion](18-multimodal/stable-diffusion/)** - Text-to-image generation via HuggingFace Diffusers, SDXL, ControlNet (380 lines + 2 refs)
- **[Segment Anything](18-multimodal/segment-anything/)** - Meta's SAM for zero-shot image segmentation with points/boxes (500 lines + 2 refs)
- **[BLIP-2](18-multimodal/blip-2/)** - Vision-language pretraining with Q-Former, image captioning, VQA (500 lines + 2 refs)
- **[AudioCraft](18-multimodal/audiocraft/)** - Meta's MusicGen/AudioGen for text-to-music and text-to-sound (470 lines + 2 refs)

### 🎯 Prompt Engineering (4 skills)
- **[DSPy](16-prompt-engineering/dspy/)** - Declarative prompt programming with optimizers, Stanford NLP, 22k stars (438 lines + 3 refs)
- **[Instructor](16-prompt-engineering/instructor/)** - Structured LLM outputs with Pydantic validation, 15k stars (726 lines + 3 refs)
- **[Guidance](16-prompt-engineering/guidance/)** - Constrained generation with regex/grammars, Microsoft Research, 18k stars (485 lines + 3 refs)
- **[Outlines](16-prompt-engineering/outlines/)** - Structured text with FSM, zero-overhead, 8k stars (601 lines + 3 refs)

### 📊 MLOps (3 skills)
- **[Weights & Biases](13-mlops/weights-and-biases/)** - Experiment tracking, sweeps, artifacts, model registry (427 lines + 3 refs)
- **[MLflow](13-mlops/mlflow/)** - Model registry, tracking, deployment, autologging (514 lines + 3 refs)
- **[TensorBoard](13-mlops/tensorboard/)** - Visualization, profiling, embeddings, scalars/images (538 lines + 3 refs)

### 👁️ Observability (2 skills)
- **[LangSmith](17-observability/langsmith/)** - LLM observability, tracing, evaluation, monitoring for AI apps (422 lines + 2 refs)
- **[Phoenix](17-observability/phoenix/)** - Open-source AI observability with OpenTelemetry tracing and LLM evaluation (380 lines + 2 refs)

### 🔬 Emerging Techniques (6 skills)
- **[MoE Training](19-emerging-techniques/moe-training/)** - Mixture of Experts training with DeepSpeed, Mixtral 8x7B, 5× cost reduction (515 lines + 3 refs)
- **[Model Merging](19-emerging-techniques/model-merging/)** - Combine models with TIES, DARE, SLERP using mergekit (528 lines + 3 refs)
- **[Long Context](19-emerging-techniques/long-context/)** - Extend context windows with RoPE, YaRN, ALiBi, 32k-128k tokens (624 lines + 3 refs)
- **[Speculative Decoding](19-emerging-techniques/speculative-decoding/)** - 1.5-3.6× faster inference with Medusa, Lookahead (379 lines)
- **[Knowledge Distillation](19-emerging-techniques/knowledge-distillation/)** - Compress models 70B→7B with MiniLLM, temperature scaling (424 lines)
- **[Model Pruning](19-emerging-techniques/model-pruning/)** - 50% sparsity with Wanda, SparseGPT, <1% accuracy loss (417 lines)

### 📝 ML Paper Writing (1 skill)
- **[ML Paper Writing](20-ml-paper-writing/)** - Write publication-ready papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM with LaTeX templates, citation verification, and writing best practices (532 lines + 5 refs)

### 💡 Ideation (2 skills)
- **[Research Brainstorming](21-research-ideation/brainstorming-research-ideas/)** - Structured ideation frameworks for discovering high-impact research directions with 10 complementary lenses (384 lines)
- **[Creative Thinking](21-research-ideation/creative-thinking-for-research/)** - Cognitive science frameworks (bisociation, structure-mapping, constraint manipulation) for genuinely novel research ideas (366 lines)

</details>

## Demos

All 85 skills in this repo are automatically synced to [Orchestra Research](https://www.orchestra-research.com/research-skills), where you can add them to your projects with one click and use them with AI research agents.

**See skills in action → [demos/](demos/README.md)**

We maintain a curated collection of demo repositories showing how to use skills for real AI research tasks:

| Demo | Skills Used | What It Does |
|------|-------------|--------------|
| **[NeMo Eval: GPQA Benchmark](https://github.com/zechenzhangAGI/Nemo-Eval-Skill-Demo)** | NeMo Evaluator | Compare Llama 8B/70B/405B on graduate-level science questions |
| **[LoRA Without Regret Reproduction](https://www.orchestra-research.com/perspectives/LLM-with-Orchestra)** | GRPO, TRL | Reproduce SFT + GRPO RL experiments via prompting |
| **ML Paper Writing** *(coming soon)* | ML Paper Writing | Transform a research repo → publication-ready paper |
| **[Layer-Wise Quantization Experiment](https://github.com/AmberLJC/llama-quantization-experiment)** | llama.cpp, GGUF | Investigate optimal layer precision allocation—early layers at Q8 achieve 1.9× compression with 1.3% perplexity loss |
| **[Cross-Lingual Alignment Analysis](https://github.com/AmberLJC/faiss-demo)** | FAISS | Quantify how well multilingual embeddings align semantic concepts across 8 languages using FAISS similarity search |

**Featured Demo**: Reproduce Thinking Machines Lab's "LoRA Without Regret" paper **by simply prompting an AI agent**. The agent autonomously writes training code for both SFT and GRPO reinforcement learning, provisions H100 GPUs, runs LoRA rank ablation experiments overnight, and generates publication-ready analysis. No manual coding required—just describe what you want to reproduce. ([Blog](https://www.orchestra-research.com/perspectives/LLM-with-Orchestra) | [Video](https://www.youtube.com/watch?v=X0DoLYfXl5I))

## Skill Structure

Each skill follows a battle-tested format for maximum usefulness:

```
skill-name/
├── SKILL.md              # Quick reference (50-150 lines)
│   ├── Metadata (name, description, version)
│   ├── When to use this skill
│   ├── Quick patterns & examples
│   └── Links to references
│
├── references/           # Deep documentation (300KB+)
│   ├── README.md         # From GitHub/official docs
│   ├── api.md            # API reference
│   ├── tutorials.md      # Step-by-step guides
│   ├── issues.md         # Real GitHub issues & solutions
│   ├── releases.md       # Version history & breaking changes
│   └── file_structure.md # Codebase navigation
│
├── scripts/              # Helper scripts (optional)
└── assets/               # Templates & examples (optional)
```

<details>
<summary><b>Quality Standards</b></summary>

- 300KB+ documentation from official sources
- Real GitHub issues & solutions (when available)
- Code examples with language detection
- Version history & breaking changes
- Links to official docs

</details>

## Roadmap

We're building towards 80 comprehensive skills across the full AI research lifecycle.
See our [detailed roadmap](docs/ROADMAP.md) for the complete development plan.

[View Full Roadmap →](docs/ROADMAP.md)

<details>
<summary><b>View Detailed Statistics</b></summary>

| Metric | Current | Target |
|--------|---------|--------|
| **Skills** | **85** (high-quality, standardized YAML) | 80 ✅ |
| **Avg Lines/Skill** | **420 lines** (focused + progressive disclosure) | 200-600 lines |
| **Documentation** | **~130,000 lines** total (SKILL.md + references) | 100,000+ lines |
| **Gold Standard Skills** | **65** with comprehensive references | 50+ |
| **Contributors** | 1 | 100+ |
| **Coverage** | Architecture, Tokenization, Fine-Tuning, Mechanistic Interpretability, Data Processing, Post-Training, Safety, Distributed, Optimization, Evaluation, Infrastructure, Inference, Agents, RAG, Multimodal, Prompt Engineering, MLOps, Observability, ML Paper Writing, Ideation | Full Lifecycle ✅ |

**Recent Progress**: npm package `@orchestra-research/ai-research-skills` for one-command installation across all coding agents

**Philosophy**: Quality > Quantity.
Following [Anthropic official best practices](anthropic_official_docs/best_practices.md), each skill provides 200-500 lines of focused, actionable guidance with progressive disclosure.

</details>

## Repository Structure

```
claude-ai-research-skills/
├── README.md                          ← You are here
├── CONTRIBUTING.md                    ← Contribution guide
├── demos/                             ← Curated demo gallery (links to demo repos)
├── docs/
├── 01-model-architecture/             (5 skills ✓ - LitGPT, Mamba, RWKV, NanoGPT, TorchTitan)
├── 02-tokenization/                   (2 skills ✓ - HuggingFace Tokenizers, SentencePiece)
├── 03-fine-tuning/                    (4 skills ✓ - Axolotl, LLaMA-Factory, Unsloth, PEFT)
├── 04-mechanistic-interpretability/   (4 skills ✓ - TransformerLens, SAELens, pyvene, nnsight)
├── 05-data-processing/                (2 skills ✓ - Ray Data, NeMo Curator)
├── 06-post-training/                  (8 skills ✓ - TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge)
├── 07-safety-alignment/               (4 skills ✓ - Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard)
├── 08-distributed-training/           (6 skills ✓ - Megatron-Core, DeepSpeed, FSDP, Accelerate, Lightning, Ray Train)
├── 09-infrastructure/                 (3 skills ✓ - Modal, SkyPilot, Lambda Labs)
├── 10-optimization/                   (6 skills ✓ - Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF)
├── 11-evaluation/                     (3 skills ✓ - lm-evaluation-harness, BigCode, NeMo Evaluator)
├── 12-inference-serving/              (4 skills ✓ - vLLM, TensorRT-LLM, llama.cpp, SGLang)
├── 13-mlops/                          (3 skills ✓ - Weights & Biases, MLflow, TensorBoard)
├── 14-agents/                         (4 skills ✓ - LangChain, LlamaIndex, CrewAI, AutoGPT)
├── 15-rag/                            (5 skills ✓ - Chroma, FAISS, Sentence Transformers, Pinecone, Qdrant)
├── 16-prompt-engineering/             (4 skills ✓ - DSPy, Instructor, Guidance, Outlines)
├── 17-observability/                  (2 skills ✓ - LangSmith, Phoenix)
├── 18-multimodal/                     (7 skills ✓ - CLIP, Whisper, LLaVA, Stable Diffusion, SAM, BLIP-2, AudioCraft)
├── 19-emerging-techniques/            (6 skills ✓ - MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning)
├── 20-ml-paper-writing/               (1 skill ✓ - ML Paper Writing with LaTeX templates)
├── 21-research-ideation/              (2 skills ✓ - Research Brainstorming, Creative Thinking)
└── packages/ai-research-skills/       (npm package for one-command installation)
```

## Use Cases

### For Researchers
"I need to fine-tune Llama 3 with custom data"
→ **03-fine-tuning/axolotl/** - YAML configs, 100+ model support

### For ML Engineers
"How do I optimize inference latency?"
→ **12-inference-serving/vllm/** - PagedAttention, batching

### For Students
"I want to learn how transformers work"
→ **01-model-architecture/litgpt/** - Clean implementations

### For Teams
"We need to scale training to 100 GPUs"
→ **08-distributed-training/deepspeed/** - ZeRO stages, 3D parallelism

## License

MIT License - See [LICENSE](LICENSE) for details.

**Note**: Individual skills may reference libraries with different licenses. Please check each project's license before use.

## Acknowledgments

Built with:
- **[Claude Code](https://www.claude.com/product/claude-code)** - AI pair programming
- **[Skill Seeker](https://github.com/yusufkaraaslan/Skill_Seekers)** - Automated doc scraping
- **Open Source AI Community** - For amazing tools and docs

Special thanks to:
- EleutherAI, HuggingFace, NVIDIA, Lightning AI, Meta AI, Anthropic
- All researchers who maintain excellent documentation

## Contributing

We welcome contributions from the AI research community!
See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on:

- Adding new skills
- Improving existing skills
- Quality standards and best practices
- Submission process

All contributors are featured in our [Contributors Hall of Fame](CONTRIBUTORS.md) 🌟

## Recent Updates

<details open>
<summary><b>February 2026 - v0.15.0 🛡️ Prompt Guard & 83 Skills</b></summary>

- 🛡️ **NEW SKILL**: Prompt Guard - Meta's 86M prompt injection & jailbreak detector
- ⚡ 99%+ TPR, <1% FPR, <2ms GPU latency, multilingual (8 languages)
- 🔒 3 workflows: user input filtering, third-party data filtering, batch RAG processing
- 📊 **83 total skills** across 20 categories

</details>

<details>
<summary><b>January 2026 - v0.14.0 📦 npm Package & 82 Skills</b></summary>

- 📦 **NEW**: `npx @orchestra-research/ai-research-skills` - One-command installation for all coding agents
- 🤖 **Supported agents**: Claude Code, OpenCode, Cursor, Codex, Gemini CLI, Qwen Code
- ✨ Interactive installer with category/individual skill selection
- 🔄 Update installed skills, selective uninstall
- 📊 **82 total skills** (5 new post-training skills: verl, slime, miles, torchforge + TorchTitan)
- 🏗️ Megatron-Core moved to Distributed Training category

</details>

<details>
<summary><b>January 2026 - v0.13.0 📝 ML Paper Writing & Demos Gallery</b></summary>

- 📝 **NEW CATEGORY**: ML Paper Writing (20th category, 77th skill)
- 🎯 Write publication-ready papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM
- 📚 Writing philosophy from top researchers (Neel Nanda, Farquhar, Gopen & Swan, Lipton, Perez)
- 🔬 Citation verification workflow - never hallucinate references
- 📄 LaTeX templates for 6 major conferences
- 🎪 **NEW**: Curated demos gallery (`demos/`) showcasing skills in action
- 🔗 Demo repos: NeMo Evaluator benchmark, LoRA Without Regret reproduction
- 📖 936-line comprehensive SKILL.md with 4 workflows

</details>

<details>
<summary><b>January 2026 - v0.12.0 📊 NeMo Evaluator SDK</b></summary>

- 📊 **NEW SKILL**: NeMo Evaluator SDK for enterprise LLM benchmarking
- 🔧 NVIDIA's evaluation platform with 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM)
- ⚡ Multi-backend execution: local Docker, Slurm HPC, Lepton cloud
- 📦 Container-first architecture for reproducible evaluation
- 📝 454 lines SKILL.md + 4 comprehensive reference files (~48KB documentation)

</details>

<details>
<summary><b>December 2025 - v0.11.0 🔬 Mechanistic Interpretability</b></summary>

- 🔬 **NEW CATEGORY**: Mechanistic Interpretability (4 skills)
- 🔍 TransformerLens skill: Neel Nanda's library for mech interp with HookPoints, activation caching, circuit analysis
- 🧠 SAELens skill: Sparse Autoencoder training and analysis for feature discovery, monosemanticity research
- ⚡ pyvene skill: Stanford's causal intervention library with declarative configs, DAS, activation patching
- 🌐 nnsight skill: Remote interpretability via NDIF, run experiments on 70B+ models without local GPUs
- 📝 ~6,500 new lines of documentation across 16 files
- **76 total skills** (filling the missing 04 category slot)

</details>

<details>
<summary><b>November 25, 2025 - v0.10.0 🎉 70 Skills Complete!</b></summary>

- 🎉 **ROADMAP COMPLETE**: Reached 70-skill milestone!
- 🚀 Added 4 skills: Lambda Labs, Segment Anything (SAM), BLIP-2, AudioCraft
- ☁️ Lambda Labs skill: Reserved/on-demand GPU cloud with H100/A100, persistent filesystems, 1-Click Clusters
- 🖼️ SAM skill: Meta's Segment Anything for zero-shot image segmentation with points/boxes/masks
- 👁️ BLIP-2 skill: Vision-language pretraining with Q-Former, image captioning, VQA
- 🎵 AudioCraft skill: Meta's MusicGen/AudioGen for text-to-music and text-to-sound generation
- 📝 ~10,000 new lines of documentation across 12 files
- **70 total skills** (100% roadmap complete!)

</details>

<details>
<summary><b>November 25, 2025 - v0.9.0</b></summary>

- 🚀 Added 2 infrastructure skills: Modal, SkyPilot
- ☁️ Modal skill: Serverless GPU cloud with Python-native API, T4-H200 on-demand, auto-scaling
- 🌐 SkyPilot skill: Multi-cloud orchestration across 20+ providers with spot recovery
- ✨ New Infrastructure category (2 skills - serverless GPU and multi-cloud orchestration)
- 📝 ~2,500 new lines of documentation across 6 files
- **66 total skills** (94% towards 70-skill target)

</details>

<details>
<summary><b>November 25, 2025 - v0.8.0</b></summary>

- 🚀 Added 5 high-priority skills: HQQ, GGUF, Phoenix, AutoGPT, Stable Diffusion
- ⚡ HQQ skill: Half-Quadratic Quantization without calibration data, multi-backend support
- 📦 GGUF skill: llama.cpp quantization format, K-quant methods, CPU/Metal inference
- 👁️ Phoenix skill: Open-source AI observability with OpenTelemetry tracing and LLM evaluation
- 🤖 AutoGPT skill: Autonomous AI agent platform with visual workflow builder
- 🎨 Stable Diffusion skill: Text-to-image generation via Diffusers, SDXL, ControlNet, LoRA
- 📝 ~9,000 new lines of documentation across 15 files
- **64 total skills** (91% towards 70-skill target)

</details>

<details>
<summary><b>November 25, 2025 - v0.7.0</b></summary>

- 🚀 Added 5 high-priority skills: PEFT, CrewAI, Qdrant, AWQ, LangSmith
- ✨ New Observability category with LangSmith for LLM tracing and evaluation
- 🎯 PEFT skill: Parameter-efficient fine-tuning with LoRA, QLoRA, DoRA, 25+ methods
- 🤖 CrewAI skill: Multi-agent orchestration with role-based collaboration
- 🔍 Qdrant skill: High-performance Rust vector search with hybrid filtering
- ⚡ AWQ skill: Activation-aware 4-bit quantization with minimal accuracy loss
- 📝 ~8,000 new lines of documentation across 15 files
- **59 total skills** (84% towards 70-skill target)

</details>

<details>
<summary><b>November 15, 2025 - v0.6.0</b></summary>

- 📊 Added 3 comprehensive MLOps skills: Weights & Biases, MLflow, TensorBoard
- ✨ New MLOps category (3 skills - experiment tracking, model registry, visualization)
- 📝 ~10,000 new lines of documentation across 13 files
- 🔧 Comprehensive coverage: experiment tracking, hyperparameter sweeps, model registry, profiling, embeddings visualization
- **54 total skills** (77% towards 70-skill target)

</details>

<details>
<summary><b>November 12, 2025 - v0.5.0</b></summary>

- 🎯 Added 4 comprehensive prompt engineering skills: DSPy, Instructor, Guidance, Outlines
- ✨ New Prompt Engineering category (4 skills - DSPy, Instructor, Guidance, Outlines)
- 📝 ~10,000 new lines of documentation across 16 files
- 🔧 Comprehensive coverage: declarative programming, structured outputs, constrained generation, FSM-based generation
- **47 total skills** (67% towards 70-skill target)

</details>

<details>
<summary><b>November 9, 2025 - v0.4.0</b></summary>

- 🤖 Added 11 comprehensive skills: LangChain, LlamaIndex, Chroma, FAISS, Sentence Transformers, Pinecone, CLIP, Whisper, LLaVA
- ✨ New Agents category (2 skills - LangChain, LlamaIndex)
- 🔍 New RAG category (4 skills - Chroma, FAISS, Sentence Transformers, Pinecone)
- 🎨 New Multimodal category (3 skills - CLIP, Whisper, LLaVA)
- 📝 ~15,000 new lines of documentation
- **43 total skills** (61% towards 70-skill target)

</details>

<details>
<summary><b>November 8, 2025 - v0.3.0</b></summary>

- 🚀 Added 8 comprehensive skills: TensorRT-LLM, llama.cpp, SGLang, GPTQ, HuggingFace Tokenizers, SentencePiece, Ray Data, NeMo Curator
- ⚡ Completed Inference & Serving category (4/4 skills)
- 🔤 New Tokenization category (2 skills)
- 📊 New Data Processing category (2 skills)
- 📝 9,617 new lines of documentation across 30 files
- **32 total skills** (45% towards 70-skill target)

</details>

<details>
<summary><b>November 6, 2025
- v0.2.0</b></summary>612613- Added 10 skills from GitHub (Megatron-Core, Lightning, Ray Train, etc.)614- Improved skill structure with comprehensive references615- Created strategic roadmap to 70 skills616- Added contribution guidelines617618</details>619620<details>621<summary><b>November 3, 2025 - v0.1.0</b></summary>622623- 🎉 Initial release with 5 fine-tuning skills624625</details>626627## Community628629Join our community to stay updated, ask questions, and connect with other AI researchers:630631- **[SkillEvolve Meta-Skill](https://github.com/Skill-Evolve/meta-skill)** - Connect your agent to the collective intelligence of the community. Captures techniques discovered during sessions and shares them back as curated skills.632- **[Slack Community](https://join.slack.com/t/orchestrarese-efu1990/shared_invite/zt-3iu6gr8io-zJvpkZTPToEviQ9KFZvNSg)** - Chat with the team and other users633- **[Twitter/X](https://x.com/orch_research)** - Follow for updates and announcements634- **[LinkedIn](https://www.linkedin.com/company/orchestra-research/)** - Connect professionally635636## Star History637638<a href="https://star-history.com/#orchestra-research/AI-research-SKILLs&Date">639 <picture>640 <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=orchestra-research/AI-research-SKILLs&type=Date&theme=dark" />641 <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=orchestra-research/AI-research-SKILLs&type=Date" />642 <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=orchestra-research/AI-research-SKILLs&type=Date" />643 </picture>644</a>645