What is AI Research Engineering Skills Library?

AI Research Engineering Skills Library is a free, open-source AI agent skill. It provides the layer of engineering ability that enables your coding agent to write and conduct AI research experiments, including preparing datasets, executing training pipelines, deploying models, and building your AI agents.

How do I install AI Research Engineering Skills Library?

Install AI Research Engineering Skills Library with a single command: npx mdskills install Orchestra-Research/ai-research-skills. This downloads the skill files into your project and your AI agent picks them up automatically.

What platforms support AI Research Engineering Skills Library?

AI Research Engineering Skills Library works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, Command Code, and Factory. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.

AI Research Engineering Skills Library

Name: AI Research Engineering Skills Library: AI Agent Skill
Brand: Orchestra-Research
Rating: 8 (1 reviews)
Author: Orchestra-Research

The most comprehensive open-source library of AI research engineering skills for AI agents

85 Skills Powering AI Research in 2026

View All 21 Categories


Model Architecture (5)	Fine-Tuning (4)	Post-Training (8)
Distributed Training (6)	Optimization (6)	Inference (4)
Tokenization (2)	Data Processing (2)	Evaluation (3)
Safety & Alignment (4)	Agents (4)	RAG (5)
Multimodal (7)	Prompt Engineering (4)	MLOps (3)
Observability (2)	Infrastructure (3)	Mech Interp (4)
Emerging Techniques (6)	ML Paper Writing (1)	Ideation (2)

Our Mission
Path Towards AI Research Agent
Available AI Research Engineering Skills
Demos
Skill Structure
Roadmap
Repository Structure
Use Cases
Contributing
Community

Our Mission

We provide the layer of Engineering Ability that enables your coding agent to write and conduct AI research experiments, including preparing datasets, executing training pipelines, deploying models, and building your AI agents.

AI Research Agent System

System diagram of an AI research agent

Path Towards AI Research Agent

Modern AI research requires mastering dozens of specialized tools and frameworks. AI researchers spend more time debugging infrastructure than testing hypotheses, which slows the pace of scientific discovery. We provide a comprehensive library of expert-level research engineering skills that enable AI agents to autonomously implement and execute the different stages of AI research experiments, from data preparation and model training to evaluation and deployment.

Specialized Expertise - Each skill provides deep, production-ready knowledge of a specific framework (Megatron-LM, vLLM, TRL, etc.)
End-to-End Coverage - 85 skills spanning the full AI research lifecycle, from model architecture to deployment
Research-Grade Quality - Documentation sourced from official repos, real GitHub issues, and battle-tested production workflows

Available AI Research Engineering Skills

Quality over quantity: Each skill provides comprehensive, expert-level guidance with real code examples, troubleshooting guides, and production-ready workflows.

📦 Quick Install (Recommended)

Install skills to any coding agent (Claude Code, OpenCode, Cursor, Codex, Gemini CLI, Qwen Code) with one command:

npx @orchestra-research/ai-research-skills

This launches an interactive installer that:

Auto-detects your installed coding agents
Installs skills to ~/.orchestra/skills/ with symlinks to each agent
Offers everything, quickstart bundle, by category, or individual skills
Updates installed skills with latest versions
Uninstalls all or selected skills
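
The layout the installer describes, one central copy of each skill plus a symlink per agent, can be sketched in shell as follows. This is an illustration of the pattern, not the installer's actual code: the sandbox directory, the `demo-skill` name, and the `agents/example-agent/skills` path are hypothetical stand-ins, and only `~/.orchestra/skills/` is named in the text above.

```shell
# Work in a throwaway sandbox so nothing under the real $HOME is touched.
sandbox="$(mktemp -d)"
store="$sandbox/.orchestra/skills"                 # stands in for ~/.orchestra/skills/
agent_dir="$sandbox/agents/example-agent/skills"   # hypothetical agent skill directory

# One real copy of the skill lives in the central store.
mkdir -p "$store/demo-skill" "$agent_dir"
printf '# demo-skill\n' > "$store/demo-skill/SKILL.md"

# The agent gets a symlink pointing back at the central copy.
ln -s "$store/demo-skill" "$agent_dir/demo-skill"

# An update to the central copy is visible through the link straight away.
printf 'updated\n' >> "$store/demo-skill/SKILL.md"
link_target="$(readlink "$agent_dir/demo-skill")"
line_count="$(wc -l < "$agent_dir/demo-skill/SKILL.md" | tr -d ' ')"
echo "link -> $link_target ($line_count lines)"
```

Because every agent reads through a link to the same directory, updating or removing a skill only has to happen in one place.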

CLI Commands

# Interactive installer (recommended)
npx @orchestra-research/ai-research-skills

# Direct commands
npx @orchestra-research/ai-research-skills list      # View installed skills
npx @orchestra-research/ai-research-skills update    # Update installed skills

Claude Code Marketplace (Alternative)

Install skill categories directly using the Claude Code CLI:

# Add the marketplace
/plugin marketplace add orchestra-research/AI-research-SKILLs

# Install by category (21 categories available)
/plugin install fine-tuning@ai-research-skills        # Axolotl, LLaMA-Factory, PEFT, Unsloth
/plugin install post-training@ai-research-skills      # TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge
/plugin install inference-serving@ai-research-skills  # vLLM, TensorRT-LLM, llama.cpp, SGLang
/plugin install distributed-training@ai-research-skills
/plugin install optimization@ai-research-skills

All 21 Categories (85 Skills)

Category	Skills	Included
Model Architecture	5	LitGPT, Mamba, NanoGPT, RWKV, TorchTitan
Tokenization	2	HuggingFace Tokenizers, SentencePiece
Fine-Tuning	4	Axolotl, LLaMA-Factory, PEFT, Unsloth
Mech Interp	4	TransformerLens, SAELens, pyvene, nnsight
Data Processing	2	NeMo Curator, Ray Data
Post-Training	8	TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge
Safety	4	Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard
Distributed	6	DeepSpeed, FSDP, Accelerate, Megatron-Core, Lightning, Ray Train
Infrastructure	3	Modal, Lambda Labs, SkyPilot
Optimization	6	Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF
Evaluation	3	lm-eval-harness, BigCode, NeMo Evaluator
Inference	4	vLLM, TensorRT-LLM, llama.cpp, SGLang
MLOps	3	W&B, MLflow, TensorBoard
Agents	4	LangChain, LlamaIndex, CrewAI, AutoGPT
RAG	5	Chroma, FAISS, Pinecone, Qdrant, Sentence Transformers
Prompt Eng	4	DSPy, Instructor, Guidance, Outlines
Observability	2	LangSmith, Phoenix
Multimodal	7	CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, AudioCraft
Emerging	6	MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning
ML Paper Writing	1	ML Paper Writing (LaTeX templates, citation verification)
Ideation	2	Research Brainstorming, Creative Thinking
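
As a quick arithmetic check, the per-category counts in the table can be summed and compared with the headline figures. The sketch below copies the Skills column by hand, in table order, so it has to be updated whenever the table changes.

```shell
# Skill counts copied from the table above, top to bottom.
counts="5 2 4 4 2 8 4 6 3 6 3 4 3 4 5 4 2 7 6 1 2"

total=0
categories=0
for n in $counts; do
  total=$((total + n))
  categories=$((categories + 1))
done

echo "$categories categories, $total skills"
```

This prints `21 categories, 85 skills`, matching the totals quoted in the heading.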

View All 85 Skills in Detail

🏗️ Model Architecture (5 skills)

LitGPT - Lightning AI's 20+ clean LLM implementations with production training recipes (462 lines + 4 refs)
Mamba - State-space models with O(n) complexity in sequence length, up to 5× higher inference throughput than Transformers (253 lines + 3 refs)
RWKV - RNN+Transformer hybrid, infinite context, Linux Foundation project (253 lines + 3 refs)
NanoGPT - Educational GPT in ~300 lines by Karpathy (283 lines + 3 refs)
TorchTitan - PyTorch-native distributed training for Llama 3.1 with 4D parallelism

🔤 Tokenization (2 skills)

HuggingFace Tokenizers - Rust-based,

Demos

All 85 skills in this repo are automatically synced to Orchestra Research, where you can add them to your projects with one click and use them with AI research agents.

See skills in action → demos/

We maintain a curated collection of demo repositories showing how to use skills for real AI research tasks:

Demo	Skills Used	What It Does
NeMo Eval: GPQA Benchmark	NeMo Evaluator	Compare Llama 8B/70B/405B on graduate-level science questions
LoRA Without Regret Reproduction	GRPO, TRL	Reproduce SFT + GRPO RL experiments via prompting
ML Paper Writing (coming soon)	ML Paper Writing	Transform research repo → publication-ready paper
Layer-Wise Quantization Experiment	llama.cpp, GGUF	Investigate optimal layer precision allocation—early layers at Q8 achieve 1.9× compression with 1.3% perplexity loss
Cross-Lingual Alignment Analysis	FAISS	Quantify how well multilingual embeddings align semantic concepts across 8 languages using FAISS similarity search

Featured Demo: Reproduce Thinking Machines Lab's "LoRA Without Regret" paper by simply prompting an AI agent. The agent autonomously writes training code for both SFT and GRPO reinforcement learning, provisions H100 GPUs, runs LoRA rank ablation experiments overnight, and generates publication-ready analysis. No manual coding required—just describe what you want to reproduce. (Blog | Video)

Skill Structure

Each skill follows a battle-tested format for maximum usefulness:

skill-name/
├── SKILL.md                    # Quick reference (50-150 lines)
│   ├── Metadata (name, description, version)
│   ├── When to use this skill
│   ├── Quick patterns & examples
│   └── Links to references
│
├── references/                 # Deep documentation (300KB+)
│   ├── README.md              # From GitHub/official docs
│   ├── api.md                 # API reference
│   ├── tutorials.md           # Step-by-step guides
│   ├── issues.md              # Real GitHub issues & solutions
│   ├── releases.md            # Version history & breaking changes
│   └── file_structure.md      # Codebase navigation
│
├── scripts/                    # Helper scripts (optional)
└── assets/                     # Templates & examples (optional)
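
A new skill directory in this shape can be scaffolded with a few shell commands. This is a minimal sketch, assuming the metadata is written as YAML front matter at the top of SKILL.md; the skill name `my-skill`, the field values, and the section headings are placeholders rather than content taken from the repository.

```shell
# Scaffold the layout shown above for a hypothetical skill called "my-skill".
skill_root="$(mktemp -d)/my-skill"
mkdir -p "$skill_root/references" "$skill_root/scripts" "$skill_root/assets"

# SKILL.md: metadata first, then the quick-reference sections.
cat > "$skill_root/SKILL.md" <<'EOF'
---
name: my-skill
description: One-line summary of what the skill covers and when to use it.
version: 0.1.0
---

# When to use this skill

# Quick patterns & examples

# References

See references/api.md and references/issues.md.
EOF

# Empty placeholders for the deep-documentation files.
for doc in README api tutorials issues releases file_structure; do
  : > "$skill_root/references/$doc.md"
done

# Show what was created.
find "$skill_root" | sort
```

The `scripts/` and `assets/` directories are optional in the layout above; the sketch creates them empty only to show where they sit.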

Quality Standards

300KB+ documentation from official sources
Real GitHub issues & solutions (when available)
Code examples with language detection
Version history & breaking changes
Links to official docs
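
The size-related standards lend themselves to a quick automated check. The following is a minimal sketch, not the project's own tooling: `check_skill` is a hypothetical helper, and it reads "300KB" as 300,000 bytes, which is an assumption.

```shell
# check_skill DIR: print the size of a skill's docs and return success only
# if references/ holds at least 300,000 bytes (assumed reading of "300KB+").
# Hypothetical helper, not part of the repository.
check_skill() {
  dir="$1"
  if [ ! -f "$dir/SKILL.md" ]; then
    echo "FAIL: $dir has no SKILL.md"
    return 1
  fi

  # Line count of the quick reference.
  lines="$(wc -l < "$dir/SKILL.md" | tr -d ' ')"

  # Total bytes of reference documentation (0 if the directory is missing).
  ref_bytes=0
  if [ -d "$dir/references" ]; then
    ref_bytes="$(find "$dir/references" -type f -exec cat {} + | wc -c | tr -d ' ')"
  fi

  echo "$dir: SKILL.md=$lines lines, references=$ref_bytes bytes"
  [ "$ref_bytes" -ge 300000 ]
}

# Demo on a throwaway skill with a deliberately small references/ directory.
demo="$(mktemp -d)/demo-skill"
mkdir -p "$demo/references"
printf 'line one\nline two\n' > "$demo/SKILL.md"
head -c 1000 /dev/zero > "$demo/references/api.md"
check_skill "$demo" || echo "references/ is below the 300KB guideline"
```

The remaining standards (real GitHub issues, version history, links to official docs) concern content rather than size and still need a human review.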

Roadmap

We set an initial target of 80 comprehensive skills across the full AI research lifecycle, and the library now stands at 85. See our detailed roadmap for the complete development plan.

View Full Roadmap →

View Detailed Statistics

Metric	Current	Target
Skills	85 (high-quality, standardized YAML)	80 ✅
Avg Lines/Skill	420 lines (focused + progressive disclosure)	200-600 lines
Documentation	~130,000 lines total (SKILL.md + references)	100,000+ lines
Gold Standard Skills	65 with comprehensive references	50+
Contributors	1	100+
Coverage	Architecture, Tokenization, Fine-Tuning, Mechanistic Interpretability, Data Processing, Post-Training, Safety, Distributed, Optimization, Evaluation, Infrastructure, Inference, Agents, RAG, Multimodal, Prompt Engineering, MLOps, Observability, Emerging Techniques, ML Paper Writing, Ideation	Full Lifecycle ✅

Recent Progress: npm package @orchestra-research/ai-research-skills for one-command installation across all coding agents

Philosophy: Quality > Quantity. Following Anthropic's official best practices, each skill provides 200-500 lines of focused, actionable guidance with progressive disclosure.

Repository Structure

claude-ai-research-skills/
├── README.md                    ← You are here
├── CONTRIBUTING.md              ← Contribution guide
├── demos/                       ← Curated demo gallery (links to demo repos)
├── docs/ 
├── 01-model-architecture/       (5 skills ✓ - LitGPT, Mamba, RWKV, NanoGPT, TorchTitan)
├── 02-tokenization/             (2 skills ✓ - HuggingFace Tokenizers, SentencePiece)
├── 03-fine-tuning/              (4 skills ✓ - Axolotl, LLaMA-Factory, Unsloth, PEFT)
├── 04-mechanistic-interpretability/ (4 skills ✓ - TransformerLens, SAELens, pyvene, nnsight)
├── 05-data-processing/          (2 skills ✓ - Ray Data, NeMo Curator)
├── 06-post-training/            (8 skills ✓ - TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge)
├── 07-safety-alignment/         (4 skills ✓ - Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard)
├── 08-distributed-training/     (6 skills ✓ - Megatron-Core, DeepSpeed, FSDP, Accelerate, Lightning, Ray Train)
├── 09-infrastructure/           (3 skills ✓ - Modal, SkyPilot, Lambda Labs)
├── 10-optimization/             (6 skills ✓ - Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF)
├── 11-evaluation/               (3 skills ✓ - lm-evaluation-harness, BigCode, NeMo Evaluator)
├── 12-inference-serving/        (4 skills ✓ - vLLM, TensorRT-LLM, llama.cpp, SGLang)
├── 13-mlops/                    (3 skills ✓ - Weights & Biases, MLflow, TensorBoard)
├── 14-agents/                   (4 skills ✓ - LangChain, LlamaIndex, CrewAI, AutoGPT)
├── 15-rag/                      (5 skills ✓ - Chroma, FAISS, Sentence Transformers, Pinecone, Qdrant)
├── 16-prompt-engineering/       (4 skills ✓ - DSPy, Instructor, Guidance, Outlines)
├── 17-observability/            (2 skills ✓ - LangSmith, Phoenix)
├── 18-multimodal/               (7 skills ✓ - CLIP, Whisper, LLaVA, Stable Diffusion, SAM, BLIP-2, AudioCraft)
├── 19-emerging-techniques/      (6 skills ✓ - MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning)
├── 20-ml-paper-writing/         (1 skill ✓ - ML Paper Writing with LaTeX templates)
├── 21-research-ideation/        (2 skills ✓ - Research Brainstorming, Creative Thinking)
└── packages/ai-research-skills/ (npm package for one-command installation)
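
With this numbered layout, the per-category skill counts can be recomputed from a checkout instead of being maintained by hand. This is a sketch under one assumption: each skill is a directory directly under its category and contains a SKILL.md. The `count_skills` helper and the stand-in tree are hypothetical; of the skill directory names used, only `litgpt` appears in this README.

```shell
# count_skills ROOT: print "<category> <count>" for each numbered category
# directory under ROOT. A skill is counted when <category>/<skill>/SKILL.md exists.
count_skills() {
  root="$1"
  for category in "$root"/[0-9][0-9]-*/; do
    [ -d "$category" ] || continue
    n=0
    for skill_md in "$category"*/SKILL.md; do
      if [ -f "$skill_md" ]; then
        n=$((n + 1))
      fi
    done
    echo "$(basename "$category") $n"
  done
}

# Demo on a tiny stand-in tree; the real repository has 21 category directories.
repo="$(mktemp -d)"
mkdir -p "$repo/01-model-architecture/litgpt" \
         "$repo/01-model-architecture/mamba" \
         "$repo/02-tokenization/sentencepiece" \
         "$repo/docs"
touch "$repo/01-model-architecture/litgpt/SKILL.md" \
      "$repo/01-model-architecture/mamba/SKILL.md" \
      "$repo/02-tokenization/sentencepiece/SKILL.md"

count_skills "$repo"
```

For the stand-in tree this prints `01-model-architecture 2` and `02-tokenization 1`. Unnumbered directories such as `docs/` are skipped by the glob.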

Use Cases

For Researchers

"I need to fine-tune Llama 3 with custom data" → 03-fine-tuning/axolotl/ - YAML configs, 100+ model support

For ML Engineers

"How do I optimize inference latency?" → 12-inference-serving/vllm/ - PagedAttention, batching

For Students

"I want to learn how transformers work" → 01-model-architecture/litgpt/ - Clean implementations

For Teams

"We need to scale training to 100 GPUs" → 08-distributed-training/deepspeed/ - ZeRO stages, 3D parallelism

License

MIT License - See LICENSE for details.

Note: Individual skills may reference libraries with different licenses. Please check each project's license before use.

Acknowledgments

Built with:

Claude Code - AI pair programming
Skill Seeker - Automated doc scraping
Open Source AI Community - For amazing tools and docs

Special thanks to:

EleutherAI, HuggingFace, NVIDIA, Lightning AI, Meta AI, Anthropic
All researchers who maintain excellent documentation

Contributing

We welcome contributions from the AI research community! See CONTRIBUTING.md for detailed guidelines on:

Adding new skills
Improving existing skills
Quality standards and best practices
Submission process

All contributors are featured in our Contributors Hall of Fame 🌟

Recent Updates

February 2026 - v0.15.0 🛡️ Prompt Guard & 83 Skills

🛡️ NEW SKILL: Prompt Guard - Meta's 86M prompt injection & jailbreak detector
⚡ 99%+ TPR,

January 2026 - v0.14.0 📦 npm Package & 82 Skills

📦 NEW: npx @orchestra-research/ai-research-skills - One-command installation for all coding agents
🤖 Supported agents: Claude Code, OpenCode, Cursor, Codex, Gemini CLI, Qwen Code
✨ Interactive installer with category/individual skill selection
🔄 Update installed skills, selective uninstall
📊 82 total skills (5 new skills: four post-training skills, verl, slime, miles, and torchforge, plus TorchTitan)
🏗️ Megatron-Core moved to Distributed Training category

January 2026 - v0.13.0 📝 ML Paper Writing & Demos Gallery

📝 NEW CATEGORY: ML Paper Writing (20th category, 77th skill)
🎯 Write publication-ready papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM
📚 Writing philosophy from top researchers (Neel Nanda, Farquhar, Gopen & Swan, Lipton, Perez)
🔬 Citation verification workflow - never hallucinate references
📄 LaTeX templates for 6 major conferences
🎪 NEW: Curated demos gallery (demos/) showcasing skills in action
🔗 Demo repos: NeMo Evaluator benchmark, LoRA Without Regret reproduction
📖 936-line comprehensive SKILL.md with 4 workflows

January 2026 - v0.12.0 📊 NeMo Evaluator SDK

📊 NEW SKILL: NeMo Evaluator SDK for enterprise LLM benchmarking
🔧 NVIDIA's evaluation platform with 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM)
⚡ Multi-backend execution: local Docker, Slurm HPC, Lepton cloud
📦 Container-first architecture for reproducible evaluation
📝 454 lines SKILL.md + 4 comprehensive reference files (~48KB documentation)

December 2025 - v0.11.0 🔬 Mechanistic Interpretability

🔬 NEW CATEGORY: Mechanistic Interpretability (4 skills)
🔍 TransformerLens skill: Neel Nanda's library for mech interp with HookPoints, activation caching, circuit analysis
🧠 SAELens skill: Sparse Autoencoder training and analysis for feature discovery, monosemanticity research
⚡ pyvene skill: Stanford's causal intervention library with declarative configs, DAS, activation patching
🌐 nnsight skill: Remote interpretability via NDIF, run experiments on 70B+ models without local GPUs
📝 ~6,500 new lines of documentation across 16 files
76 total skills (filling the missing 04 category slot)

November 25, 2025 - v0.10.0 🎉 70 Skills Complete!

🎉 ROADMAP COMPLETE: Reached 70-skill milestone!
🚀 Added 4 skills: Lambda Labs, Segment Anything (SAM), BLIP-2, AudioCraft
☁️ Lambda Labs skill: Reserved/on-demand GPU cloud with H100/A100, persistent filesystems, 1-Click Clusters
🖼️ SAM skill: Meta's Segment Anything for zero-shot image segmentation with points/boxes/masks
👁️ BLIP-2 skill: Vision-language pretraining with Q-Former, image captioning, VQA
🎵 AudioCraft skill: Meta's MusicGen/AudioGen for text-to-music and text-to-sound generation
📝 ~10,000 new lines of documentation across 12 files
70 total skills (100% roadmap complete!)

November 25, 2025 - v0.9.0

🚀 Added 2 infrastructure skills: Modal, SkyPilot
☁️ Modal skill: Serverless GPU cloud with Python-native API, T4-H200 on-demand, auto-scaling
🌐 SkyPilot skill: Multi-cloud orchestration across 20+ providers with spot recovery
✨ New Infrastructure category (2 skills - serverless GPU and multi-cloud orchestration)
📝 ~2,500 new lines of documentation across 6 files
66 total skills (94% towards 70-skill target)

November 25, 2025 - v0.8.0

🚀 Added 5 high-priority skills: HQQ, GGUF, Phoenix, AutoGPT, Stable Diffusion
⚡ HQQ skill: Half-Quadratic Quantization without calibration data, multi-backend support
📦 GGUF skill: llama.cpp quantization format, K-quant methods, CPU/Metal inference
👁️ Phoenix skill: Open-source AI observability with OpenTelemetry tracing and LLM evaluation
🤖 AutoGPT skill: Autonomous AI agent platform with visual workflow builder
🎨 Stable Diffusion skill: Text-to-image generation via Diffusers, SDXL, ControlNet, LoRA
📝 ~9,000 new lines of documentation across 15 files
64 total skills (91% towards 70-skill target)

November 25, 2025 - v0.7.0

🚀 Added 5 high-priority skills: PEFT, CrewAI, Qdrant, AWQ, LangSmith
✨ New Observability category with LangSmith for LLM tracing and evaluation
🎯 PEFT skill: Parameter-efficient fine-tuning with LoRA, QLoRA, DoRA, 25+ methods
🤖 CrewAI skill: Multi-agent orchestration with role-based collaboration
🔍 Qdrant skill: High-performance Rust vector search with hybrid filtering
⚡ AWQ skill: Activation-aware 4-bit quantization with minimal accuracy loss
📝 ~8,000 new lines of documentation across 15 files
59 total skills (84% towards 70-skill target)

November 15, 2025 - v0.6.0

📊 Added 3 comprehensive MLOps skills: Weights & Biases, MLflow, TensorBoard
✨ New MLOps category (3 skills - experiment tracking, model registry, visualization)
📝 ~10,000 new lines of documentation across 13 files
🔧 Comprehensive coverage: experiment tracking, hyperparameter sweeps, model registry, profiling, embeddings visualization
54 total skills (77% towards 70-skill target)

November 12, 2025 - v0.5.0

🎯 Added 4 comprehensive prompt engineering skills: DSPy, Instructor, Guidance, Outlines
✨ New Prompt Engineering category (4 skills - DSPy, Instructor, Guidance, Outlines)
📝 ~10,000 new lines of documentation across 16 files
🔧 Comprehensive coverage: declarative programming, structured outputs, constrained generation, FSM-based generation
47 total skills (67% towards 70-skill target)

November 9, 2025 - v0.4.0

🤖 Added 11 comprehensive skills: LangChain, LlamaIndex, Chroma, FAISS, Sentence Transformers, Pinecone, CLIP, Whisper, LLaVA
✨ New Agents category (2 skills - LangChain, LlamaIndex)
🔍 New RAG category (4 skills - Chroma, FAISS, Sentence Transformers, Pinecone)
🎨 New Multimodal category (3 skills - CLIP, Whisper, LLaVA)
📝 ~15,000 new lines of documentation
43 total skills (61% towards 70-skill target)

November 8, 2025 - v0.3.0

🚀 Added 8 comprehensive skills: TensorRT-LLM, llama.cpp, SGLang, GPTQ, HuggingFace Tokenizers, SentencePiece, Ray Data, NeMo Curator
⚡ Completed Inference & Serving category (4/4 skills)
🔤 New Tokenization category (2 skills)
📊 New Data Processing category (2 skills)
📝 9,617 new lines of documentation across 30 files
32 total skills (45% towards 70-skill target)

November 6, 2025 - v0.2.0

Added 10 skills from GitHub (Megatron-Core, Lightning, Ray Train, etc.)
Improved skill structure with comprehensive references
Created strategic roadmap to 70 skills
Added contribution guidelines

November 3, 2025 - v0.1.0

🎉 Initial release with 5 fine-tuning skills

Community

Join our community to stay updated, ask questions, and connect with other AI researchers:

SkillEvolve Meta-Skill - Connect your agent to the collective intelligence of the community. Captures techniques discovered during sessions and shares them back as curated skills.
Slack Community - Chat with the team and other users
Twitter/X - Follow for updates and announcements
LinkedIn - Connect professionally
