Add this skill:

```bash
npx mdskills install huggingface/hugging-face-model-trainer
```

Comprehensive TRL training guide with clear MCP integration, multi-method support, and practical examples.
---
name: hugging-face-model-trainer
description: This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
license: Complete terms in LICENSE.txt
---

# TRL Training on Hugging Face Jobs

## Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.

**TRL provides multiple training methods:**
- **SFT** (Supervised Fine-Tuning) - Standard instruction tuning
- **DPO** (Direct Preference Optimization) - Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) - Online RL training
- **Reward Modeling** - Train reward models for RLHF

**For detailed TRL method documentation:**
```python
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.
```

**See also:** `references/training_methods.md` for method overviews and selection guidance

## When to Use This Skill

Use this skill when users want to:
- Fine-tune language models on cloud GPUs without local infrastructure
- Train with TRL methods (SFT, DPO, GRPO, etc.)
- Run training jobs on Hugging Face Jobs infrastructure
- Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
- Ensure trained models are permanently saved to the Hub
- Use modern workflows with optimized defaults

### When to Use Unsloth

Use **Unsloth** (`references/unsloth.md`) instead of standard TRL when:
- **Limited GPU memory** - Unsloth uses ~60% less VRAM
- **Speed matters** - Unsloth is ~2x faster
- Training **large models (>13B)** - memory efficiency is critical
- Training **Vision-Language Models (VLMs)** - Unsloth has `FastVisionModel` support

See `references/unsloth.md` for complete Unsloth documentation and `scripts/unsloth_sft_example.py` for a production-ready training script.

## Key Directives

When assisting with training jobs:

1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`. If the user asks to "train a model", "fine-tune", or similar, you MUST create the training script AND submit the job immediately using `hf_jobs()`.

2. **Always include Trackio** - Every training script should include Trackio for real-time monitoring. Use the example scripts in `scripts/` as templates.

3. **Provide job details after submission** - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.

4. **Use example scripts as templates** - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.
## Local Script Dependencies

To run scripts locally (like `estimate_cost.py`), install dependencies:
```bash
pip install -r requirements.txt
```

## Prerequisites Checklist

Before starting any training job, verify:

### ✅ **Account & Authentication**
- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require a paid plan)
- Authenticated login: Check with `hf_whoami()`
- **HF_TOKEN for Hub Push** ⚠️ CRITICAL - The training environment is ephemeral; you must push to Hub or ALL training results are lost
- Token must have write permissions
- **MUST pass `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config** to make the token available (the `$HF_TOKEN` syntax references your actual token value)

### ✅ **Dataset Requirements**
- Dataset must exist on Hub or be loadable via `datasets.load_dataset()`
- Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
- **ALWAYS validate unknown datasets** before GPU training to prevent format failures (see Dataset Validation section below)
- Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)

### ⚠️ **Critical Settings**
- **Timeout must exceed expected training time** - The default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. The job fails and loses all progress if the timeout is exceeded.
- **Hub push must be enabled** - Config: `push_to_hub=True`, `hub_model_id="username/model-name"`; Job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`

## Asynchronous Job Guidelines

**⚠️ IMPORTANT: Training jobs run asynchronously and can take hours**

### Action Required

**When user requests training:**
1. **Create the training script** with Trackio included (use `scripts/train_sft_example.py` as template)
2. **Submit immediately** using the `hf_jobs()` MCP tool with the script content inline - don't save to file unless the user requests it
3. **Report submission** with job ID, monitoring URL, and estimated time
4. **Wait for user** to request status checks - don't poll automatically

### Ground Rules
- **Jobs run in background** - Submission returns immediately; training continues independently
- **Initial logs delayed** - Can take 30-60 seconds for logs to appear
- **User checks status** - Wait for user to request status updates
- **Avoid polling** - Check logs only on user request; provide monitoring links instead

### After Submission

**Provide to user:**
- ✅ Job ID and monitoring URL
- ✅ Expected completion time
- ✅ Trackio dashboard URL
- ✅ Note that user can request status checks later

**Example Response:**
```
✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!
```

## Quick Start: Four Approaches

**💡 Tip for Demos:** For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.

### Sequence Length Configuration

**TRL config classes use `max_length` (not `max_seq_length`)** to control tokenized sequence length:

```python
# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512)    # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)   # Longer context (2048 tokens)

# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512)  # TypeError!
```

**Default behavior:** `max_length=1024` (truncates from the right). This works well for most training.
**When to override:**
- **Longer context**: Set higher (e.g., `max_length=2048`)
- **Memory constraints**: Set lower (e.g., `max_length=512`)
- **Vision models**: Set `max_length=None` (prevents cutting image tokens)

**Usually you don't need to set this parameter at all** - the examples below use the sensible default.

### Approach 1: UV Scripts (Recommended—Default Choice)

UV scripts use PEP 723 inline dependencies for clean, self-contained training. **This is the primary approach for Claude Code.**

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # Trackio project name to group related runs
        run_name="meaningful_run_name",     # descriptive name for this specific run
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
**When to use:** Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring `hf_jobs()`
#### Working with Scripts

⚠️ **Important:** The `script` parameter accepts either inline code (as shown above) OR a URL. **Local file paths do NOT work.**

**Why local paths don't work:**
Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:
- Inline code (recommended for custom training)
- Publicly accessible URLs
- Private repo URLs (with HF_TOKEN)

**Common mistakes:**
```python
# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})
```

**Correct approaches:**
```python
# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})

# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})

# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
```

**To use local scripts:** Upload to HF Hub first:
```bash
huggingface-cli repo create my-training-scripts --type model
huggingface-cli upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
```

### Approach 2: TRL Maintained Scripts (Official Examples)

TRL provides battle-tested scripts for all methods.
They can be run directly from URLs (use the raw URL, not the GitHub `blob` page):

```python
hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** No code to write, maintained by the TRL team, production-tested
**When to use:** Standard TRL training, quick experiments, no need for custom code
**Available:** https://github.com/huggingface/trl/tree/main/examples/scripts

### Finding More UV Scripts on Hub

The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation

### Approach 3: HF Jobs CLI (Direct Terminal Commands)

When the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.

**⚠️ CRITICAL: CLI Syntax Rules**

```bash
# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"

# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
```

**Key syntax rules:**
1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)
2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL
3. Use `--secrets` (plural), not `--secret`
4. The script URL must be the last positional argument

**Complete CLI example:**
```bash
hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"
```

**Check job status via CLI:**
```bash
hf jobs ps                # List all jobs
hf jobs logs <job-id>     # View logs
hf jobs inspect <job-id>  # Job details
hf jobs cancel <job-id>   # Cancel a job
```

### Approach 4: TRL Jobs Package (Simplified Training)

The `trl-jobs` package provides optimized defaults and one-liner training.

```bash
# Install
pip install trl-jobs

# Train with SFT (simplest possible)
trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara
```

**Benefits:** Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
**When to use:** User working directly in a terminal (not Claude Code context), quick local experimentation
**Repository:** https://github.com/huggingface/trl-jobs

⚠️ **In Claude Code context, prefer the `hf_jobs()` MCP tool (Approach 1) when available.**

## Hardware Selection

| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
|------------|---------------------|------------------|----------|
| <1B params | `t4-small` | ~$0.75 | Demos, quick tests (skip eval steps) |
| 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |
| 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |
| 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |
| 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |
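As a rough starting point, the table can be encoded as a size-to-flavor lookup. This is a sketch only - real choices also depend on LoRA usage, batch size, and sequence length:

```python
# Sketch: map parameter count (in billions) to a starting GPU flavor,
# following the thresholds in the table above (illustrative, not exact limits).
def suggest_flavor(params_b: float) -> str:
    if params_b < 1:
        return "t4-small"    # demos and quick tests
    if params_b < 3:
        return "t4-medium"   # development
    if params_b < 7:
        return "a10g-small"  # production training
    if params_b < 13:
        return "a10g-large"  # use LoRA
    return "a100-large"      # use LoRA

print(suggest_flavor(0.5))  # t4-small
print(suggest_flavor(7.0))  # a10g-large
```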
**GPU Flavors:** cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

**Guidelines:**
- Use **LoRA/PEFT** for models >7B to reduce memory
- Multi-GPU is handled automatically by TRL/Accelerate
- Start with smaller hardware for testing

**See:** `references/hardware_guide.md` for detailed specifications

## Critical: Saving Results to Hub

**⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB**

The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, **ALL TRAINING IS LOST**.

### Required Configuration

**In training script/config:**
```python
SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",           # Optional: push checkpoints
)
```

**In job submission:**
```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

### Verification Checklist

Before submitting:
- [ ] `push_to_hub=True` set in config
- [ ] `hub_model_id` includes username/repo-name
- [ ] `secrets` parameter includes HF_TOKEN
- [ ] User has write access to target repo

**See:** `references/hub_saving.md` for detailed troubleshooting

## Timeout Management

**⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING**

### Setting Timeouts

```python
{
    "timeout": "2h"  # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
}
```

### Timeout Guidelines

| Scenario | Recommended | Notes |
|----------|-------------|-------|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |

**Always add 20-30% buffer** for model/dataset loading, checkpoint saving, Hub push operations, and network delays.

**On timeout:** Job killed immediately, all unsaved progress lost, must restart from beginning
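The buffer rule can be applied mechanically once a runtime estimate exists. A sketch with placeholder numbers - `seconds_per_step` is an assumption you would measure on a short test run:

```python
# Sketch: derive a job timeout from estimated training time plus a 30% buffer.
epochs, examples, effective_batch = 3, 5000, 8
seconds_per_step = 0.9  # placeholder - measure on a short test run

steps = epochs * examples // effective_batch
total_seconds = steps * seconds_per_step * 1.3  # +30% for loading and Hub push

timeout_minutes = int(total_seconds / 60) + 1
print(f'Use "timeout": "{timeout_minutes}m"')
```

With these placeholder numbers this yields a ~37-minute timeout; real jobs should round up generously.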
## Cost Estimation

**Offer to estimate cost when planning jobs with known parameters.** Use `scripts/estimate_cost.py`:

```bash
uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3
```

Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.

**When to offer:** User is planning a job, asks about cost/time, is choosing hardware, or the job will run >1 hour or cost >$5

## Example Training Scripts

**Production-ready templates with all best practices:**

- **`scripts/train_sft_example.py`** - Complete SFT training with Trackio, LoRA, checkpoints
- **`scripts/train_dpo_example.py`** - DPO training for preference learning
- **`scripts/train_grpo_example.py`** - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use them as templates for custom scripts.

## Monitoring and Tracking

**Trackio** provides real-time metrics visualization. See `references/trackio_guide.md` for the complete setup guide.
**Key points:**
- Add `trackio` to dependencies
- Configure the trainer with `report_to="trackio"` and a recognizable `run_name`

### Trackio Configuration Defaults

**Use sensible defaults unless the user specifies otherwise.** When generating training scripts with Trackio:

**Default Configuration:**
- **Space ID**: `{username}/trackio` (use "trackio" as the default Space name)
- **Run naming**: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
- **Config**: Keep minimal - only include hyperparameters and model/dataset info
- **Project name**: Use a project name to associate runs with a particular project

**User overrides:** If the user requests a specific Trackio configuration (custom Space, run naming, grouping, or additional config), apply their preferences instead of the defaults. This keeps training scripts portable and makes it easy to manage multiple jobs with the same configuration.

See `references/trackio_guide.md` for complete documentation, including grouping runs for experiments.

### Check Job Status

```python
# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
```

**Remember:** Wait for the user to request status checks. Avoid polling repeatedly.
## Dataset Validation

**Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**

### Why Validate

- 50%+ of training failures are due to dataset format issues
- DPO is especially strict: it requires exact column names (`prompt`, `chosen`, `rejected`)
- Failed GPU jobs waste $1-10 and 30-60 minutes
- Validation on CPU costs ~$0.01 and takes <1 minute

### When to Validate

**ALWAYS validate for:**
- Unknown or custom datasets
- DPO training (CRITICAL - 90% of datasets need mapping)
- Any dataset not explicitly TRL-compatible

**Skip validation for known TRL datasets:**
- `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.

### Usage

```python
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
```

The script is fast and will usually complete synchronously.

### Reading Results

The output shows compatibility for each training method:

- **`✓ READY`** - Dataset is compatible, use directly
- **`✗ NEEDS MAPPING`** - Compatible but needs preprocessing (mapping code provided)
- **`✗ INCOMPATIBLE`** - Cannot be used for this method

When mapping is needed, the output includes a **"MAPPING CODE"** section with copy-paste ready Python code.

### Example Workflow

```python
# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})

# 2. Check output markers:
#    ✓ READY → proceed with training
#    ✗ NEEDS MAPPING → apply mapping code
#    ✗ INCOMPATIBLE → choose different method/dataset
```
```python
# 3. If mapping is needed, apply it before training:
def format_for_dpo(example):
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }
dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)

# 4. Launch training job with confidence
```

### Common Scenario: DPO Format Mismatch

Most DPO datasets use non-standard column names. Example:

```
Dataset has:  instruction, chosen_response, rejected_response
DPO expects:  prompt, chosen, rejected
```

The validator detects this and provides exact mapping code to fix it.

## Converting Models to GGUF

After training, convert models to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.

**What is GGUF:**
- Optimized for CPU/GPU inference with llama.cpp
- Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
- Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
- Typically 2-8GB for 7B models (vs 14GB unquantized)

**When to convert:**
- Running models locally with Ollama or LM Studio
- Reducing model size with quantization
- Deploying to edge devices
- Sharing models for local-first use

**See:** `references/gguf_conversion.md` for the complete conversion guide, including a production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.

**Quick conversion:**
```python
hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
```

## Common Training Patterns

See `references/training_patterns.md` for detailed examples including:
- Quick demo (5-10 minutes)
- Production training with checkpoints
- Multi-GPU training
- DPO training (preference learning)
- GRPO training (online RL)

## Common Failure Modes

### Out of Memory (OOM)

**Fix (try in order):**
1. Reduce batch size: `per_device_train_batch_size=1`, increase `gradient_accumulation_steps=8`. Effective batch size is `per_device_train_batch_size` x `gradient_accumulation_steps`; for best performance keep the effective batch size close to 128.
2. Enable `gradient_checkpointing=True`
3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large, etc.

### Dataset Misformatted

**Fix:**
1. Validate first with the dataset inspector:
   ```bash
   uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
     --dataset name --split train
   ```
2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
3. Apply mapping code from inspector output if needed

### Job Timeout

**Fix:**
1. Check logs for actual runtime: `hf_jobs("logs", {"job_id": "..."})`
2. Increase timeout with buffer: `"timeout": "3h"` (add 30% to estimated time)
3. Or reduce training: lower `num_train_epochs`, use a smaller dataset, set `max_steps`
4. Save checkpoints: `save_strategy="steps"`, `save_steps=500`, `hub_strategy="every_save"`

**Note:** The default 30min is insufficient for real training. Minimum 1-2 hours.

### Hub Push Failures

**Fix:**
1. Add to job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
2. Add to config: `push_to_hub=True`, `hub_model_id="username/model-name"`
3. Verify auth: `mcp__huggingface__hf_whoami()`
4. Check that the token has write permissions and the repo exists (or set `hub_private_repo=True`)

### Missing Dependencies

**Fix:**
Add to the PEP 723 header:
```python
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
```

## Troubleshooting

**Common issues:**
- Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
- Model not saved to Hub → Check `push_to_hub=True`, `hub_model_id`, `secrets=HF_TOKEN`
- Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
- Dataset format error → Validate with dataset inspector (see Dataset Validation section)
- Import/module errors → Add PEP 723 header with dependencies, verify format
- Authentication errors → Check `mcp__huggingface__hf_whoami()`, token permissions, secrets parameter

**See:** `references/troubleshooting.md` for the complete troubleshooting guide

## Resources

### References (In This Skill)
- `references/training_methods.md` - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
- `references/training_patterns.md` - Common training patterns and examples
- `references/unsloth.md` - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
- `references/gguf_conversion.md` - Complete GGUF conversion guide
- `references/trackio_guide.md` - Trackio monitoring setup
- `references/hardware_guide.md` - Hardware specs and selection
- `references/hub_saving.md` - Hub authentication troubleshooting
- `references/troubleshooting.md` - Common issues and solutions

### Scripts (In This Skill)
- `scripts/train_sft_example.py` - Production SFT template
- `scripts/train_dpo_example.py` - Production DPO template
- `scripts/train_grpo_example.py` - Production GRPO template
- `scripts/unsloth_sft_example.py` - Unsloth text LLM training template (faster, less VRAM)
- `scripts/estimate_cost.py` - Estimate time and cost (offer when appropriate)
- `scripts/convert_to_gguf.py` - Complete GGUF conversion script

### External Scripts
- [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Validate dataset format before training (use via `uv run` or `hf_jobs`)

### External Links
- [TRL Documentation](https://huggingface.co/docs/trl)
- [TRL Jobs Training Guide](https://huggingface.co/docs/trl/en/jobs_training)
- [TRL Jobs Package](https://github.com/huggingface/trl-jobs)
- [HF Jobs Documentation](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [TRL Example Scripts](https://github.com/huggingface/trl/tree/main/examples/scripts)
- [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/)
- [UV Scripts Organization](https://huggingface.co/uv-scripts)

## Key Takeaways

1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless the user requests it
2. **Jobs are asynchronous** - Don't wait/poll; let the user check when ready
3. **Always set timeout** - The default 30 min is insufficient; minimum 1-2 hours recommended
4. **Always enable Hub push** - The environment is ephemeral; without push, all results are lost
5. **Include Trackio** - Use example scripts as templates for real-time monitoring
6. **Offer cost estimation** - When parameters are known, use `scripts/estimate_cost.py`
7. **Use UV scripts (Approach 1)** - Default to `hf_jobs("uv", {...})` with inline scripts; use TRL maintained scripts for standard training; avoid bash `trl-jobs` commands in Claude Code
8. **Use `hf_doc_fetch`/`hf_doc_search`** for the latest TRL documentation
9. **Validate dataset format** before training with the dataset inspector (see Dataset Validation section)
10. **Choose appropriate hardware** for model size; use LoRA for models >7B