---
name: hugging-face-jobs
description: This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.
license: Complete terms in LICENSE.txt
---

# Running Workloads on Hugging Face Jobs

## Overview

Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.

**Common use cases:**
- **Data Processing** - Transform, filter, or analyze large datasets
- **Batch Inference** - Run inference on thousands of samples
- **Experiments & Benchmarks** - Reproducible ML experiments
- **Model Training** - Fine-tune models (see `model-trainer` skill for TRL-specific training)
- **Synthetic Data Generation** - Generate datasets using LLMs
- **Development & Testing** - Test code without local GPU setup
- **Scheduled Jobs** - Automate recurring tasks

**For model training specifically:** See the `model-trainer` skill for TRL-based training workflows.

## When to Use This Skill

Use this skill when users want to:
- Run Python workloads on cloud infrastructure
- Execute jobs without local GPU/TPU setup
- Process data at scale
- Run batch inference or experiments
- Schedule recurring tasks
- Use GPUs/TPUs for any workload
- Persist results to the Hugging Face Hub

## Key Directives

When assisting with jobs:

1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})` or `hf_jobs("run", {...})`. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`.

2. **Always handle authentication** - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See Token Usage section below.

3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.

4. **Set appropriate timeouts** - Default 30 min may be insufficient for long-running tasks.

## Prerequisites Checklist

Before starting any job, verify:

### ✅ **Account & Authentication**
- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require paid plan)
- Authenticated login: Check with `hf_whoami()`
- **HF_TOKEN for Hub Access** ⚠️ CRITICAL - Required for any Hub operations (push models/datasets, download private repos, etc.)
- Token must have appropriate permissions (read for downloads, write for uploads)

### ✅ **Token Usage** (See Token Usage section for details)

**When tokens are required:**
- Pushing models/datasets to Hub
- Accessing private repositories
- Using Hub APIs in scripts
- Any authenticated Hub operations

**How to provide tokens:**
```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Recommended: automatic token
}
```

**⚠️ CRITICAL:** The `$HF_TOKEN` placeholder is automatically replaced with your logged-in token. Never hardcode tokens in scripts.

## Token Usage Guide

### Understanding Tokens

**What are HF Tokens?**
- Authentication credentials for Hugging Face Hub
- Required for authenticated operations (push, private repos, API access)
- Stored securely on your machine after `hf auth login`

**Token Types:**
- **Read Token** - Can download models/datasets, read private repos
- **Write Token** - Can push models/datasets, create repos, modify content
- **Organization Token** - Can act on behalf of an organization

### When Tokens Are Required

**Always Required:**
- Pushing models/datasets to Hub
- Accessing private repositories
- Creating new repositories
- Modifying existing repositories
- Using Hub APIs programmatically

**Not Required:**
- Downloading public models/datasets
- Running jobs that don't interact with Hub
- Reading public repository information

### How to Provide Tokens to Jobs

#### Method 1: Automatic Token (Recommended)

```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Automatic replacement
})
```

**How it works:**
- `$HF_TOKEN` is a placeholder that gets replaced with your actual token
- Uses the token from your logged-in session (`hf auth login`)
- Most secure and convenient method
- Token is encrypted server-side when passed as a secret

**Benefits:**
- No token exposure in code
- Uses your current login session
- Automatically updated if you re-login
- Works seamlessly with MCP tools

#### Method 2: Explicit Token (Not Recommended)

```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Hardcoded token
})
```

**When to use:**
- Only if automatic token doesn't work
- Testing with a specific token
- Organization tokens (use with caution)

**Security concerns:**
- Token visible in code/logs
- Must manually update if token rotates
- Risk of token exposure

#### Method 3: Environment Variable (Less Secure)

```python
hf_jobs("uv", {
    "script": "your_script.py",
    "env": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Less secure than secrets
})
```

**Difference from secrets:**
- `env` variables are visible in job logs
- `secrets` are encrypted server-side
- Always prefer `secrets` for tokens

### Using Tokens in Scripts

**In your Python script, tokens are available as environment variables:**

```python
# /// script
# dependencies = ["huggingface-hub"]
# ///

import os
from huggingface_hub import HfApi

# Token is automatically available if passed via secrets
token = os.environ.get("HF_TOKEN")

# Use with Hub API
api = HfApi(token=token)

# Or let huggingface_hub auto-detect
api = HfApi()  # Automatically uses HF_TOKEN env var
```

**Best practices:**
- Don't hardcode tokens in scripts
- Use `os.environ.get("HF_TOKEN")` to access
- Let `huggingface_hub` auto-detect when possible
- Verify token exists before Hub operations

### Token Verification

**Check if you're logged in:**
```python
from huggingface_hub import whoami
user_info = whoami()  # Returns your username if authenticated
```

**Verify token in job:**
```python
import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...")  # Should start with "hf_"
```

### Common Token Issues

**Error: 401 Unauthorized**
- **Cause:** Token missing or invalid
- **Fix:** Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to job config
- **Verify:** Check `hf_whoami()` works locally

**Error: 403 Forbidden**
- **Cause:** Token lacks required permissions
- **Fix:** Ensure token has write permissions for push operations
- **Check:** Token type at https://huggingface.co/settings/tokens

**Error: Token not found in environment**
- **Cause:** `secrets` not passed or wrong key name
- **Fix:** Use `secrets={"HF_TOKEN": "$HF_TOKEN"}` (not `env`)
- **Verify:** Script checks `os.environ.get("HF_TOKEN")`

**Error: Repository access denied**
- **Cause:** Token doesn't have access to private repo
- **Fix:** Use token from account with access
- **Check:** Verify repo visibility and your permissions

### Token Security Best Practices

1. **Never commit tokens** - Use `$HF_TOKEN` placeholder or environment variables
2. **Use secrets, not env** - Secrets are encrypted server-side
3. **Rotate tokens regularly** - Generate new tokens periodically
4. **Use minimal permissions** - Create tokens with only needed permissions
5. **Don't share tokens** - Each user should use their own token
6. **Monitor token usage** - Check token activity in Hub settings

### Complete Token Example

```python
# Example: Push results to Hub
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["huggingface-hub", "datasets"]
# ///

import os
from huggingface_hub import HfApi
from datasets import Dataset

# Verify token is available
assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"

# Use token for Hub operations
api = HfApi(token=os.environ["HF_TOKEN"])

# Create and push dataset
data = {"text": ["Hello", "World"]}
dataset = Dataset.from_dict(data)
dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])

print("✅ Dataset pushed successfully!")
""",
    "flavor": "cpu-basic",
    "timeout": "30m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Token provided securely
})
```

## Quick Start: Two Approaches

### Approach 1: UV Scripts (Recommended)

UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.

**MCP Tool:**
```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["transformers", "torch"]
# ///

from transformers import pipeline
import torch

# Your workload here
classifier = pipeline("sentiment-analysis")
result = classifier("I love Hugging Face!")
print(result)
""",
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

**CLI Equivalent:**
```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m
```

**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")
```

**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline, no file saving required

**When to use:** Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`

#### Custom Docker Images for UV Scripts

By default, UV scripts use `ghcr.io/astral-sh/uv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:

```python
hf_jobs("uv", {
    "script": "inference.py",
    "image": "vllm/vllm-openai:latest",  # Pre-built image with vLLM
    "flavor": "a10g-large"
})
```

**CLI:**
```bash
hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py
```

**Benefits:** Faster startup, pre-installed dependencies, optimized for specific frameworks

#### Python Version

By default, UV scripts use Python 3.12. Specify a different version:

```python
hf_jobs("uv", {
    "script": "my_script.py",
    "python": "3.11",  # Use Python 3.11
    "flavor": "cpu-basic"
})
```

**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")
```

#### Working with Scripts

⚠️ **Important:** There are *two* "script path" stories depending on how you run Jobs:

- **Using the `hf_jobs()` MCP tool (recommended in this repo)**: the `script` value must be **inline code** (a string) or a **URL**. A local filesystem path (like `"./scripts/foo.py"`) won't exist inside the remote container.
- **Using the `hf jobs uv run` CLI**: local file paths **do work** (the CLI uploads your script).

**Common mistake with `hf_jobs()` MCP tool:**

```python
# ❌ Will fail (remote container can't see your local path)
hf_jobs("uv", {"script": "./scripts/foo.py"})
```

**Correct patterns with `hf_jobs()` MCP tool:**

```python
# ✅ Inline: read the local script file and pass its *contents*
from pathlib import Path
script = Path("hf-jobs/scripts/foo.py").read_text()
hf_jobs("uv", {"script": script})

# ✅ URL: host the script somewhere reachable
hf_jobs("uv", {"script": "https://huggingface.co/datasets/uv-scripts/.../raw/main/foo.py"})

# ✅ URL from GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py"})
```

**CLI equivalent (local paths supported):**

```bash
hf jobs uv run ./scripts/foo.py -- --your --args
```

#### Adding Dependencies at Runtime

Add extra dependencies beyond what's in the PEP 723 header:

```python
hf_jobs("uv", {
    "script": "inference.py",
    "dependencies": ["transformers", "torch>=2.0"],  # Extra deps
    "flavor": "a10g-small"
})
```

**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])
```

### Approach 2: Docker-Based Jobs

Run jobs with custom Docker images and commands.

**MCP Tool:**
```python
hf_jobs("run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Hello from HF Jobs!')"],
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

**CLI Equivalent:**
```bash
hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"
```

**Python API:**
```python
from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")
```

**Benefits:** Full Docker control, use pre-built images, run any command
**When to use:** Need specific Docker images, non-Python workloads, complex environments

**Example with GPU:**
```python
hf_jobs("run", {
    "image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
    "command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
    "flavor": "a10g-small",
    "timeout": "1h"
})
```

**Using Hugging Face Spaces as Images:**

You can use Docker images from HF Spaces:
```python
hf_jobs("run", {
    "image": "hf.co/spaces/lhoestq/duckdb",  # Space as Docker image
    "command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
    "flavor": "cpu-basic"
})
```

**CLI:**
```bash
hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"
```

### Finding More UV Scripts on Hub

The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation

## Hardware Selection

> **Reference:** [HF Jobs Hardware Docs](https://huggingface.co/docs/hub/en/spaces-config-reference) (updated 07/2025)

| Workload Type | Recommended Hardware | Use Case |
|---------------|---------------------|----------|
| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |
| Small models, demos | `t4-small` | <1B models, quick tests |
| Medium models | `t4-medium`, `l4x1` | 1-7B models |
| Large models, production | `a10g-small`, `a10g-large` | 7-13B models |
| Very large models | `a100-large` | 13B+ models |
| Batch inference | `a10g-large`, `a100-large` | High-throughput |
| Multi-GPU workloads | `l4x4`, `a10g-largex2`, `a10g-largex4` | Parallel/large models |
| TPU workloads | `v5e-1x1`, `v5e-2x2`, `v5e-2x4` | JAX/Flax, TPU-optimized |

**All Available Flavors:**
- **CPU:** `cpu-basic`, `cpu-upgrade`
- **GPU:** `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- **TPU:** `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

**Guidelines:**
- Start with smaller hardware for testing
- Scale up based on actual needs
- Use multi-GPU for parallel workloads or large models
- Use TPUs for JAX/Flax workloads
- See `references/hardware_guide.md` for detailed specifications

## Critical: Saving Results

**⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS**

The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, **ALL WORK IS LOST**.

### Persistence Options

**1. Push to Hugging Face Hub (Recommended)**

```python
# Push models
model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])

# Push datasets
dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])

# Push artifacts
api.upload_file(
    path_or_fileobj="results.json",
    path_in_repo="results.json",
    repo_id="username/results",
    token=os.environ["HF_TOKEN"]
)
```

**2. Use External Storage**

```python
# Upload to S3, GCS, etc.
import boto3
s3 = boto3.client('s3')
s3.upload_file('results.json', 'my-bucket', 'results.json')
```

**3. Send Results via API**

```python
# POST results to your API
import requests
requests.post("https://your-api.com/results", json=results)
```

### Required Configuration for Hub Push

**In job submission:**
```python
{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}
```

**In script:**
```python
import os
from huggingface_hub import HfApi

# Token automatically available from secrets
api = HfApi(token=os.environ.get("HF_TOKEN"))

# Push your results
api.upload_file(...)
```

### Verification Checklist

Before submitting:
- [ ] Results persistence method chosen
- [ ] `secrets={"HF_TOKEN": "$HF_TOKEN"}` if using Hub
- [ ] Script handles missing token gracefully
- [ ] Test persistence path works

**See:** `references/hub_saving.md` for detailed Hub persistence guide

## Timeout Management

**⚠️ DEFAULT: 30 MINUTES**

Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.

### Setting Timeouts

**MCP Tool:**
```python
{
    "timeout": "2h"  # 2 hours
}
```

**Supported formats:**
- Integer/float: seconds (e.g., `300` = 5 minutes)
- String with suffix: `"5m"` (minutes), `"2h"` (hours), `"1d"` (days)
- Examples: `"90m"`, `"2h"`, `"1.5h"`, `300`, `"1d"`

**Python API:**
```python
from huggingface_hub import run_job, run_uv_job

run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200)  # 2 hours in seconds
```

### Timeout Guidelines

| Scenario | Recommended | Notes |
|----------|-------------|-------|
| Quick test | 10-30 min | Verify setup |
| Data processing | 1-2 hours | Depends on data size |
| Batch inference | 2-4 hours | Large batches |
| Experiments | 4-8 hours | Multiple runs |
| Long-running | 8-24 hours | Production workloads |

**Always add 20-30% buffer** for setup, network delays, and cleanup.

**On timeout:** Job killed immediately, all unsaved progress lost

## Cost Estimation

**General guidelines:**

```
Total Cost = (Hours of runtime) × (Cost per hour)
```

**Example calculations:**

**Quick test:**
- Hardware: cpu-basic ($0.10/hour)
- Time: 15 minutes (0.25 hours)
- Cost: $0.03

**Data processing:**
- Hardware: l4x1 ($2.50/hour)
- Time: 2 hours
- Cost: $5.00

**Batch inference:**
- Hardware: a10g-large ($5/hour)
- Time: 4 hours
- Cost: $20.00

**Cost optimization tips:**
1. Start small - Test on cpu-basic or t4-small
2. Monitor runtime - Set appropriate timeouts
3. Use checkpoints - Resume if job fails
4. Optimize code - Reduce unnecessary compute
5. Choose right hardware - Don't over-provision

## Monitoring and Tracking

### Check Job Status

**MCP Tool:**
```python
# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})

# Cancel a job
hf_jobs("cancel", {"job_id": "your-job-id"})
```

**Python API:**
```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job

# List your jobs
jobs = list_jobs()

# List running jobs only
running = [j for j in list_jobs() if j.status.stage == "RUNNING"]

# Inspect specific job
job_info = inspect_job(job_id="your-job-id")

# View logs
for log in fetch_job_logs(job_id="your-job-id"):
    print(log)

# Cancel a job
cancel_job(job_id="your-job-id")
```

**CLI:**
```bash
hf jobs ps              # List jobs
hf jobs logs <job-id>   # View logs
hf jobs cancel <job-id> # Cancel job
```

**Remember:** Wait for user to request status checks. Avoid polling repeatedly.

### Job URLs

After submission, jobs have monitoring URLs:
```
https://huggingface.co/jobs/username/job-id
```

View logs, status, and details in the browser.

### Wait for Multiple Jobs

```python
import time
from huggingface_hub import inspect_job, run_job

# Run multiple jobs
jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]

# Wait for all to complete
for job in jobs:
    while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
        time.sleep(10)
```

## Scheduled Jobs

Run jobs on a schedule using CRON expressions or predefined schedules.

**MCP Tool:**
```python
# Schedule a UV script that runs every hour
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "@hourly",
    "flavor": "cpu-basic"
})

# Schedule with CRON syntax
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "0 9 * * 1",  # 9 AM every Monday
    "flavor": "cpu-basic"
})

# Schedule a Docker-based job
hf_jobs("scheduled run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Scheduled!')"],
    "schedule": "@daily",
    "flavor": "cpu-basic"
})
```

**Python API:**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

# Schedule a Docker job
create_scheduled_job(
    image="python:3.12",
    command=["python", "-c", "print('Running on schedule!')"],
    schedule="@hourly"
)

# Schedule a UV script
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")

# Schedule with GPU
create_scheduled_uv_job(
    "ml_inference.py",
    schedule="0 */6 * * *",  # Every 6 hours
    flavor="a10g-small"
)
```

**Available schedules:**
- `@annually`, `@yearly` - Once per year
- `@monthly` - Once per month
- `@weekly` - Once per week
- `@daily` - Once per day
- `@hourly` - Once per hour
- CRON expression - Custom schedule (e.g., `"*/5 * * * *"` for every 5 minutes)

**Manage scheduled jobs:**
```python
# MCP Tool
hf_jobs("scheduled ps")                          # List scheduled jobs
hf_jobs("scheduled inspect", {"job_id": "..."})  # Inspect details
hf_jobs("scheduled suspend", {"job_id": "..."})  # Pause
hf_jobs("scheduled resume", {"job_id": "..."})   # Resume
hf_jobs("scheduled delete", {"job_id": "..."})   # Delete
```

**Python API for management:**
```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

# List all scheduled jobs
scheduled = list_scheduled_jobs()

# Inspect a scheduled job
info = inspect_scheduled_job(scheduled_job_id)

# Suspend (pause) a scheduled job
suspend_scheduled_job(scheduled_job_id)

# Resume a scheduled job
resume_scheduled_job(scheduled_job_id)

# Delete a scheduled job
delete_scheduled_job(scheduled_job_id)
```

## Webhooks: Trigger Jobs on Events

Trigger jobs automatically when changes happen in Hugging Face repositories.

**Python API:**
```python
from huggingface_hub import create_webhook

# Create webhook that triggers a job when a repo changes
webhook = create_webhook(
    job_id=job.id,
    watched=[
        {"type": "user", "name": "your-username"},
        {"type": "org", "name": "your-org-name"}
    ],
    domains=["repo", "discussion"],
    secret="your-secret"
)
```

**How it works:**
1. Webhook listens for changes in watched repositories
2. When triggered, the job runs with the `WEBHOOK_PAYLOAD` environment variable
3. Your script can parse the payload to understand what changed

**Use cases:**
- Auto-process new datasets when uploaded
- Trigger inference when models are updated
- Run tests when code changes
- Generate reports on repository activity

**Access webhook payload in script:**
```python
import os
import json

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")
```

See [Webhooks Documentation](https://huggingface.co/docs/huggingface_hub/guides/webhooks) for more details.

## Common Workload Patterns

This repository ships ready-to-run UV scripts in `hf-jobs/scripts/`. Prefer using them instead of inventing new templates.

### Pattern 1: Dataset → Model Responses (vLLM) — `scripts/generate-responses.py`

**What it does:** loads a Hub dataset (chat `messages` or a `prompt` column), applies a model chat template, generates responses with vLLM, and **pushes** the output dataset + dataset card back to the Hub.

**Requires:** GPU + **write** token (it pushes a dataset).

```python
from pathlib import Path

script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "username/input-dataset",
        "username/output-dataset",
        "--messages-column", "messages",
        "--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "--temperature", "0.7",
        "--top-p", "0.8",
        "--max-tokens", "2048",
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

### Pattern 2: CoT Self-Instruct Synthetic Data — `scripts/cot-self-instruct.py`

**What it does:** generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then **pushes** the generated dataset + dataset card to the Hub.

**Requires:** GPU + **write** token (it pushes a dataset).

```python
from pathlib import Path

script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--seed-dataset", "davanstrien/s1k-reasoning",
        "--output-dataset", "username/synthetic-math",
        "--task-type", "reasoning",
        "--num-samples", "5000",
        "--filter-method", "answer-consistency",
    ],
    "flavor": "l4x4",
    "timeout": "8h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

### Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts/finepdfs-stats.py`

**What it does:** scans parquet directly from Hub (no 300GB download), computes temporal stats, and (optionally) uploads results to a Hub dataset repo.

**Requires:** CPU is often enough; token needed **only** if you pass `--output-repo` (upload).

```python
from pathlib import Path

script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--limit", "10000",
        "--show-plan",
        "--output-repo", "username/finepdfs-temporal-stats",
    ],
    "flavor": "cpu-upgrade",
    "timeout": "2h",
    "env": {"HF_XET_HIGH_PERFORMANCE": "1"},
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

## Common Failure Modes

### Out of Memory (OOM)

**Fix:**
1. Reduce batch size or data chunk size
2. Process data in smaller batches
3. Upgrade hardware: cpu → t4 → a10g → a100

### Job Timeout

**Fix:**
1. Check logs for actual runtime
2. Increase timeout with buffer: `"timeout": "3h"`
3. Optimize code for faster execution
4. Process data in chunks

### Hub Push Failures

**Fix:**
1. Add to job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
2. Verify token in script: `assert "HF_TOKEN" in os.environ`
3. Check token permissions
4. Verify repo exists or can be created

### Missing Dependencies

**Fix:**
Add to PEP 723 header:
```python
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
```

### Authentication Errors

**Fix:**
1. Check `hf_whoami()` works locally
2. Verify `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config
3. Re-login: `hf auth login`
4. Check token has required permissions

## Troubleshooting

**Common issues:**
- Job times out → Increase timeout, optimize code
- Results not saved → Check persistence method, verify HF_TOKEN
- Out of Memory → Reduce batch size, upgrade hardware
- Import errors → Add dependencies to PEP 723 header
- Authentication errors → Check token, verify secrets parameter

**See:** `references/troubleshooting.md` for complete troubleshooting guide

## Resources

### References (In This Skill)
- `references/token_usage.md` - Complete token usage guide
- `references/hardware_guide.md` - Hardware specs and selection
- `references/hub_saving.md` - Hub persistence guide
- `references/troubleshooting.md` - Common issues and solutions

### Scripts (In This Skill)
- `scripts/generate-responses.py` - vLLM batch generation: dataset → responses → push to Hub
- `scripts/cot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub
- `scripts/finepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` parquet on Hub (optional push)

### External Links

**Official Documentation:**
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs) - Main documentation
- [HF Jobs CLI Reference](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-jobs) - Command line interface
- [HF Jobs API Reference](https://huggingface.co/docs/huggingface_hub/package_reference/hf_api) - Python API details
- [Hardware Flavors Reference](https://huggingface.co/docs/hub/en/spaces-config-reference) - Available hardware

**Related Tools:**
- [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/) - PEP 723 inline dependencies
- [UV Scripts Organization](https://huggingface.co/uv-scripts) - Community UV script collection
- [HF Hub Authentication](https://huggingface.co/docs/huggingface_hub/quick-start#authentication) - Token setup
- [Webhooks Documentation](https://huggingface.co/docs/huggingface_hub/guides/webhooks) - Event triggers

## Key Takeaways

1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless user requests
2. **Jobs are asynchronous** - Don't wait/poll; let user check when ready
3. **Always set timeout** - Default 30 min may be insufficient; set appropriate timeout
4. **Always persist results** - Environment is ephemeral; without persistence, all work is lost
5. **Use tokens securely** - Always use `secrets={"HF_TOKEN": "$HF_TOKEN"}` for Hub operations
6. **Choose appropriate hardware** - Start small, scale up based on needs (see hardware guide)
7. **Use UV scripts** - Default to `hf_jobs("uv", {...})` with inline scripts for Python workloads
8. **Handle authentication** - Verify tokens are available before Hub operations
9. **Monitor jobs** - Provide job URLs and status check commands
10. **Optimize costs** - Choose right hardware, set appropriate timeouts

## Quick Reference: MCP Tool vs CLI vs Python API

| Operation | MCP Tool | CLI | Python API |
|-----------|----------|-----|------------|
| Run UV script | `hf_jobs("uv", {...})` | `hf jobs uv run script.py` | `run_uv_job("script.py")` |
| Run Docker job | `hf_jobs("run", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |
| List jobs | `hf_jobs("ps")` | `hf jobs ps` | `list_jobs()` |
| View logs | `hf_jobs("logs", {...})` | `hf jobs logs <id>` | `fetch_job_logs(job_id)` |
| Cancel job | `hf_jobs("cancel", {...})` | `hf jobs cancel <id>` | `cancel_job(job_id)` |
| Schedule UV | `hf_jobs("scheduled uv", {...})` | - | `create_scheduled_uv_job()` |
| Schedule Docker | `hf_jobs("scheduled run", {...})` | - | `create_scheduled_job()` |
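
### Appendix: Quick Cost Sanity Check

The cost arithmetic from the Cost Estimation section (rate × hours, plus the recommended 20-30% buffer) can be wrapped in a small helper for budgeting before submission. This is a sketch: `estimate_cost` and the `RATES` table are illustrative, reusing the example prices above, not official Hugging Face pricing.

```python
def estimate_cost(rate_per_hour: float, est_hours: float, buffer: float = 0.25) -> float:
    """Estimated job cost in dollars: hourly rate × runtime, padded with a setup/cleanup buffer."""
    return round(rate_per_hour * est_hours * (1 + buffer), 2)

# Illustrative rates from the examples above (not official pricing)
RATES = {"cpu-basic": 0.10, "l4x1": 2.50, "a10g-large": 5.00}

# Budget for a 4-hour batch-inference run on a10g-large with a 25% buffer
print(f"${estimate_cost(RATES['a10g-large'], 4):.2f}")  # $25.00
```

The buffer mirrors the timeout advice: size the timeout (and the budget) for estimated runtime plus 20-30% headroom, since a job killed at the timeout boundary loses unsaved work but still accrues its runtime.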