Add this skill:

```bash
npx mdskills install huggingface/hugging-face-model-trainer
```

Comprehensive TRL training guide with clear MCP integration, multi-method support, and practical examples.
---
name: hugging-face-model-trainer
description: This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
license: Complete terms in LICENSE.txt
---

# TRL Training on Hugging Face Jobs

## Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.

**TRL provides multiple training methods:**
- **SFT** (Supervised Fine-Tuning) - Standard instruction tuning
- **DPO** (Direct Preference Optimization) - Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) - Online RL training
- **Reward Modeling** - Train reward models for RLHF

**For detailed TRL method documentation:**
```python
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.
```

**See also:** `references/training_methods.md` for method overviews and selection guidance

## When to Use This Skill

Use this skill when users want to:
- Fine-tune language models on cloud GPUs without local infrastructure
- Train with TRL methods (SFT, DPO, GRPO, etc.)
- Run training jobs on Hugging Face Jobs infrastructure
- Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
- Ensure trained models are permanently saved to the Hub
- Use modern workflows with optimized defaults

### When to Use Unsloth

Use **Unsloth** (`references/unsloth.md`) instead of standard TRL when:
- **Limited GPU memory** - Unsloth uses ~60% less VRAM
- **Speed matters** - Unsloth is ~2x faster
- Training **large models (>13B)** - memory efficiency is critical
- Training **Vision-Language Models (VLMs)** - Unsloth has `FastVisionModel` support

See `references/unsloth.md` for complete Unsloth documentation and `scripts/unsloth_sft_example.py` for a production-ready training script.

## Key Directives

When assisting with training jobs:

1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`. If the user asks to "train a model", "fine-tune", or similar, you MUST create the training script AND submit the job immediately using `hf_jobs()`.

2. **Always include Trackio** - Every training script should include Trackio for real-time monitoring. Use the example scripts in `scripts/` as templates.

3. **Provide job details after submission** - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.

4. **Use example scripts as templates** - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.
## Local Script Dependencies

To run scripts locally (like `estimate_cost.py`), install dependencies:
```bash
pip install -r requirements.txt
```

## Prerequisites Checklist

Before starting any training job, verify:

### ✅ **Account & Authentication**
- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require a paid plan)
- Authenticated login: Check with `hf_whoami()`
- **HF_TOKEN for Hub Push** ⚠️ CRITICAL - The training environment is ephemeral; you must push to Hub or ALL training results are lost
- Token must have write permissions
- **MUST pass `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config** to make the token available (the `$HF_TOKEN` syntax references your actual token value)

### ✅ **Dataset Requirements**
- Dataset must exist on Hub or be loadable via `datasets.load_dataset()`
- Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
- **ALWAYS validate unknown datasets** before GPU training to prevent format failures (see Dataset Validation section below)
- Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)

### ⚠️ **Critical Settings**
- **Timeout must exceed expected training time** - The default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. The job fails and loses all progress if the timeout is exceeded.
- **Hub push must be enabled** - Config: `push_to_hub=True`, `hub_model_id="username/model-name"`; Job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`

## Asynchronous Job Guidelines

**⚠️ IMPORTANT: Training jobs run asynchronously and can take hours**

### Action Required

**When user requests training:**
1. **Create the training script** with Trackio included (use `scripts/train_sft_example.py` as template)
2. **Submit immediately** using the `hf_jobs()` MCP tool with the script content inline - don't save to file unless the user requests it
3. **Report submission** with job ID, monitoring URL, and estimated time
4. **Wait for user** to request status checks - don't poll automatically

### Ground Rules
- **Jobs run in background** - Submission returns immediately; training continues independently
- **Initial logs delayed** - Can take 30-60 seconds for logs to appear
- **User checks status** - Wait for user to request status updates
- **Avoid polling** - Check logs only on user request; provide monitoring links instead

### After Submission

**Provide to user:**
- ✅ Job ID and monitoring URL
- ✅ Expected completion time
- ✅ Trackio dashboard URL
- ✅ Note that user can request status checks later

**Example Response:**
```
✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!
```

## Quick Start: Four Approaches

**💡 Tip for Demos:** For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.

### Sequence Length Configuration

**TRL config classes use `max_length` (not `max_seq_length`)** to control tokenized sequence length:

```python
# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512)    # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)   # Longer context (2048 tokens)

# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512)  # TypeError!
```

**Default behavior:** `max_length=1024` (truncates from the right). This works well for most training.
**When to override:**
- **Longer context**: Set higher (e.g., `max_length=2048`)
- **Memory constraints**: Set lower (e.g., `max_length=512`)
- **Vision models**: Set `max_length=None` (prevents cutting image tokens)

**Usually you don't need to set this parameter at all** - the examples below use the sensible default.

### Approach 1: UV Scripts (Recommended—Default Choice)

UV scripts use PEP 723 inline dependencies for clean, self-contained training. **This is the primary approach for Claude Code.**

```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # Trackio project name to group related runs
        run_name="meaningful_run_name",     # descriptive name for this specific run
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
**When to use:** Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring `hf_jobs()`
#### Working with Scripts

⚠️ **Important:** The `script` parameter accepts either inline code (as shown above) OR a URL. **Local file paths do NOT work.**

**Why local paths don't work:**
Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:
- Inline code (recommended for custom training)
- Publicly accessible URLs
- Private repo URLs (with HF_TOKEN)

**Common mistakes:**
```python
# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})
```

**Correct approaches:**
```python
# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})

# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})

# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
```

**To use local scripts:** Upload to HF Hub first:
```bash
huggingface-cli repo create my-training-scripts --type model
huggingface-cli upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
```

### Approach 2: TRL Maintained Scripts (Official Examples)

TRL provides battle-tested scripts for all methods.
They can be run directly from URLs (use the raw URL, not the GitHub `blob` page):

```python
hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

**Benefits:** No code to write, maintained by the TRL team, production-tested
**When to use:** Standard TRL training, quick experiments, no need for custom code
**Available:** https://github.com/huggingface/trl/tree/main/examples/scripts

### Finding More UV Scripts on Hub

The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation

### Approach 3: HF Jobs CLI (Direct Terminal Commands)

When the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.

**⚠️ CRITICAL: CLI Syntax Rules**

```bash
# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"

# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
```

**Key syntax rules:**
1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)
2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL
3. Use `--secrets` (plural), not `--secret`
4. The script URL must be the last positional argument

**Complete CLI example:**
```bash
hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"
```

**Check job status via CLI:**
```bash
hf jobs ps                # List all jobs
hf jobs logs <job-id>     # View logs
hf jobs inspect <job-id>  # Job details
hf jobs cancel <job-id>   # Cancel a job
```

### Approach 4: TRL Jobs Package (Simplified Training)

The `trl-jobs` package provides optimized defaults and one-liner training.

```bash
# Install
pip install trl-jobs

# Train with SFT (simplest possible)
trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara
```

**Benefits:** Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
**When to use:** User working directly in a terminal (not Claude Code context), quick local experimentation
**Repository:** https://github.com/huggingface/trl-jobs

⚠️ **In Claude Code context, prefer the `hf_jobs()` MCP tool (Approach 1) when available.**

## Hardware Selection

| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
|------------|---------------------|------------------|----------|
| <1B params | `t4-small` | ~$0.75 | Demos, quick tests (skip eval steps) |
| 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |
| 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |
| 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |
| 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |
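As a rough starting point, the table can be encoded as a size-to-flavor lookup. This is a sketch only - real choices also depend on LoRA usage, batch size, and sequence length:

```python
# Sketch: map parameter count (in billions) to a starting GPU flavor,
# following the thresholds in the table above (illustrative, not exact limits).
def suggest_flavor(params_b: float) -> str:
    if params_b < 1:
        return "t4-small"    # demos and quick tests
    if params_b < 3:
        return "t4-medium"   # development
    if params_b < 7:
        return "a10g-small"  # production training
    if params_b < 13:
        return "a10g-large"  # use LoRA
    return "a100-large"      # use LoRA

print(suggest_flavor(0.5))  # t4-small
print(suggest_flavor(7.0))  # a10g-large
```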
**GPU Flavors:** cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

**Guidelines:**
- Use **LoRA/PEFT** for models >7B to reduce memory
- Multi-GPU is handled automatically by TRL/Accelerate
- Start with smaller hardware for testing

**See:** `references/hardware_guide.md` for detailed specifications

## Critical: Saving Results to Hub

**⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB**

The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, **ALL TRAINING IS LOST**.

### Required Configuration

**In training script/config:**
```python
SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",           # Optional: push checkpoints
)
```

**In job submission:**
```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

### Verification Checklist

Before submitting:
- [ ] `push_to_hub=True` set in config
- [ ] `hub_model_id` includes username/repo-name
- [ ] `secrets` parameter includes HF_TOKEN
- [ ] User has write access to target repo

**See:** `references/hub_saving.md` for detailed troubleshooting

## Timeout Management

**⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING**

### Setting Timeouts

```python
{
    "timeout": "2h"  # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
}
```

### Timeout Guidelines

| Scenario | Recommended | Notes |
|----------|-------------|-------|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |

**Always add 20-30% buffer** for model/dataset loading, checkpoint saving, Hub push operations, and network delays.

**On timeout:** Job killed immediately, all unsaved progress lost, must restart from beginning
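The buffer rule can be applied mechanically once a runtime estimate exists. A sketch with placeholder numbers - `seconds_per_step` is an assumption you would measure on a short test run:

```python
# Sketch: derive a job timeout from estimated training time plus a 30% buffer.
epochs, examples, effective_batch = 3, 5000, 8
seconds_per_step = 0.9  # placeholder - measure on a short test run

steps = epochs * examples // effective_batch
total_seconds = steps * seconds_per_step * 1.3  # +30% for loading and Hub push

timeout_minutes = int(total_seconds / 60) + 1
print(f'Use "timeout": "{timeout_minutes}m"')
```

With these placeholder numbers this yields a ~37-minute timeout; real jobs should round up generously.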
## Cost Estimation

**Offer to estimate cost when planning jobs with known parameters.** Use `scripts/estimate_cost.py`:

```bash
uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3
```

Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.

**When to offer:** User is planning a job, asks about cost/time, is choosing hardware, or the job will run >1 hour or cost >$5

## Example Training Scripts

**Production-ready templates with all best practices:**

- **`scripts/train_sft_example.py`** - Complete SFT training with Trackio, LoRA, checkpoints
- **`scripts/train_dpo_example.py`** - DPO training for preference learning
- **`scripts/train_grpo_example.py`** - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use them as templates for custom scripts.

## Monitoring and Tracking

**Trackio** provides real-time metrics visualization. See `references/trackio_guide.md` for the complete setup guide.
**Key points:**
- Add `trackio` to dependencies
- Configure the trainer with `report_to="trackio"` and a recognizable `run_name`

### Trackio Configuration Defaults

**Use sensible defaults unless the user specifies otherwise.** When generating training scripts with Trackio:

**Default Configuration:**
- **Space ID**: `{username}/trackio` (use "trackio" as the default Space name)
- **Run naming**: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
- **Config**: Keep minimal - only include hyperparameters and model/dataset info
- **Project name**: Use a project name to associate runs with a particular project

**User overrides:** If the user requests a specific Trackio configuration (custom Space, run naming, grouping, or additional config), apply their preferences instead of the defaults. This keeps training scripts portable and makes it easy to manage multiple jobs with the same configuration.

See `references/trackio_guide.md` for complete documentation, including grouping runs for experiments.

### Check Job Status

```python
# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
```

**Remember:** Wait for the user to request status checks. Avoid polling repeatedly.
## Dataset Validation

**Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**

### Why Validate

- 50%+ of training failures are due to dataset format issues
- DPO is especially strict: it requires exact column names (`prompt`, `chosen`, `rejected`)
- Failed GPU jobs waste $1-10 and 30-60 minutes
- Validation on CPU costs ~$0.01 and takes <1 minute

### When to Validate

**ALWAYS validate for:**
- Unknown or custom datasets
- DPO training (CRITICAL - 90% of datasets need mapping)
- Any dataset not explicitly TRL-compatible

**Skip validation for known TRL datasets:**
- `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.

### Usage

```python
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
```

The script is fast and will usually complete synchronously.

### Reading Results

The output shows compatibility for each training method:

- **`✓ READY`** - Dataset is compatible, use directly
- **`✗ NEEDS MAPPING`** - Compatible but needs preprocessing (mapping code provided)
- **`✗ INCOMPATIBLE`** - Cannot be used for this method

When mapping is needed, the output includes a **"MAPPING CODE"** section with copy-paste ready Python code.

### Example Workflow

```python
# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})

# 2. Check output markers:
#    ✓ READY → proceed with training
#    ✗ NEEDS MAPPING → apply mapping code
#    ✗ INCOMPATIBLE → choose different method/dataset
```
```python
# 3. If mapping is needed, apply it before training:
def format_for_dpo(example):
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }
dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)

# 4. Launch training job with confidence
```

### Common Scenario: DPO Format Mismatch

Most DPO datasets use non-standard column names. Example:

```
Dataset has:  instruction, chosen_response, rejected_response
DPO expects:  prompt, chosen, rejected
```

The validator detects this and provides exact mapping code to fix it.

## Converting Models to GGUF

After training, convert models to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.

**What is GGUF:**
- Optimized for CPU/GPU inference with llama.cpp
- Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
- Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
- Typically 2-8GB for 7B models (vs 14GB unquantized)

**When to convert:**
- Running models locally with Ollama or LM Studio
- Reducing model size with quantization
- Deploying to edge devices
- Sharing models for local-first use

**See:** `references/gguf_conversion.md` for the complete conversion guide, including a production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.

**Quick conversion:**
```python
hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})
```

## Common Training Patterns

See `references/training_patterns.md` for detailed examples including:
- Quick demo (5-10 minutes)
- Production training with checkpoints
- Multi-GPU training
- DPO training (preference learning)
- GRPO training (online RL)

## Common Failure Modes

### Out of Memory (OOM)

**Fix (try in order):**
1. Reduce batch size: `per_device_train_batch_size=1`, increase `gradient_accumulation_steps=8`. Effective batch size is `per_device_train_batch_size` x `gradient_accumulation_steps`; for best performance keep the effective batch size close to 128.
2. Enable `gradient_checkpointing=True`
3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large, etc.

### Dataset Misformatted

**Fix:**
1. Validate first with the dataset inspector:
   ```bash
   uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
     --dataset name --split train
   ```
2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
3. Apply mapping code from inspector output if needed

### Job Timeout

**Fix:**
1. Check logs for actual runtime: `hf_jobs("logs", {"job_id": "..."})`
2. Increase timeout with buffer: `"timeout": "3h"` (add 30% to estimated time)
3. Or reduce training: lower `num_train_epochs`, use a smaller dataset, set `max_steps`
4. Save checkpoints: `save_strategy="steps"`, `save_steps=500`, `hub_strategy="every_save"`

**Note:** The default 30min is insufficient for real training. Minimum 1-2 hours.

### Hub Push Failures

**Fix:**
1. Add to job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
2. Add to config: `push_to_hub=True`, `hub_model_id="username/model-name"`
3. Verify auth: `mcp__huggingface__hf_whoami()`
4. Check that the token has write permissions and the repo exists (or set `hub_private_repo=True`)

### Missing Dependencies

**Fix:**
Add to the PEP 723 header:
```python
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
```

## Troubleshooting

**Common issues:**
- Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
- Model not saved to Hub → Check `push_to_hub=True`, `hub_model_id`, `secrets=HF_TOKEN`
- Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
- Dataset format error → Validate with dataset inspector (see Dataset Validation section)
- Import/module errors → Add PEP 723 header with dependencies, verify format
- Authentication errors → Check `mcp__huggingface__hf_whoami()`, token permissions, secrets parameter

**See:** `references/troubleshooting.md` for the complete troubleshooting guide

## Resources

### References (In This Skill)
- `references/training_methods.md` - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
- `references/training_patterns.md` - Common training patterns and examples
- `references/unsloth.md` - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
- `references/gguf_conversion.md` - Complete GGUF conversion guide
- `references/trackio_guide.md` - Trackio monitoring setup
- `references/hardware_guide.md` - Hardware specs and selection
- `references/hub_saving.md` - Hub authentication troubleshooting
- `references/troubleshooting.md` - Common issues and solutions

### Scripts (In This Skill)
- `scripts/train_sft_example.py` - Production SFT template
- `scripts/train_dpo_example.py` - Production DPO template
- `scripts/train_grpo_example.py` - Production GRPO template
- `scripts/unsloth_sft_example.py` - Unsloth text LLM training template (faster, less VRAM)
- `scripts/estimate_cost.py` - Estimate time and cost (offer when appropriate)
- `scripts/convert_to_gguf.py` - Complete GGUF conversion script

### External Scripts
- [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Validate dataset format before training (use via `uv run` or `hf_jobs`)

### External Links
- [TRL Documentation](https://huggingface.co/docs/trl)
- [TRL Jobs Training Guide](https://huggingface.co/docs/trl/en/jobs_training)
- [TRL Jobs Package](https://github.com/huggingface/trl-jobs)
- [HF Jobs Documentation](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [TRL Example Scripts](https://github.com/huggingface/trl/tree/main/examples/scripts)
- [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/)
- [UV Scripts Organization](https://huggingface.co/uv-scripts)

## Key Takeaways

1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless the user requests it
2. **Jobs are asynchronous** - Don't wait/poll; let the user check when ready
3. **Always set timeout** - The default 30 min is insufficient; minimum 1-2 hours recommended
4. **Always enable Hub push** - The environment is ephemeral; without push, all results are lost
5. **Include Trackio** - Use example scripts as templates for real-time monitoring
6. **Offer cost estimation** - When parameters are known, use `scripts/estimate_cost.py`
7. **Use UV scripts (Approach 1)** - Default to `hf_jobs("uv", {...})` with inline scripts; use TRL maintained scripts for standard training; avoid bash `trl-jobs` commands in Claude Code
8. **Use `hf_doc_fetch`/`hf_doc_search`** for the latest TRL documentation
9. **Validate dataset format** before training with the dataset inspector (see Dataset Validation section)
10. **Choose appropriate hardware** for model size; use LoRA for models >7B