Design durable workflows with Temporal for distributed systems. Covers workflow vs activity separation, saga patterns, state management, and determinism constraints. Use when building long-running processes, distributed transactions, or microservice orchestration.
Add this skill:
npx mdskills install sickn33/workflow-orchestration-patterns

Comprehensive workflow orchestration reference with clear patterns and best practices.
---
name: workflow-orchestration-patterns
description: Design durable workflows with Temporal for distributed systems. Covers workflow vs activity separation, saga patterns, state management, and determinism constraints. Use when building long-running processes, distributed transactions, or microservice orchestration.
---

# Workflow Orchestration Patterns

Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.

## Use this skill when

- Working on workflow orchestration tasks or workflows
- Needing guidance, best practices, or checklists for workflow orchestration patterns

## Do not use this skill when

- The task is unrelated to workflow orchestration patterns
- You need a different domain or tool outside this scope

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## When to Use Workflow Orchestration

### Ideal Use Cases (Source: docs.temporal.io)

- **Multi-step processes** spanning machines/services/databases
- **Distributed transactions** requiring all-or-nothing semantics
- **Long-running workflows** (hours to years) with automatic state persistence
- **Failure recovery** that must resume from last successful step
- **Business processes**: bookings, orders, campaigns, approvals
- **Entity lifecycle management**: inventory tracking, account management, cart workflows
- **Infrastructure automation**: CI/CD pipelines, provisioning, deployments
- **Human-in-the-loop** systems requiring timeouts and escalations

### When NOT to Use

- Simple CRUD operations (use direct API calls)
- Pure data processing pipelines (use Airflow, batch processing)
- Stateless request/response (use standard APIs)
- Real-time streaming (use Kafka, event processors)

## Critical Design Decision: Workflows vs Activities

**The Fundamental Rule** (Source: temporal.io/blog/workflow-engine-principles):

- **Workflows** = Orchestration logic and decision-making
- **Activities** = External interactions (APIs, databases, network calls)

### Workflows (Orchestration)

**Characteristics:**

- Contain business logic and coordination
- **MUST be deterministic** (same inputs → same outputs)
- **Cannot** perform direct external calls
- State automatically preserved across failures
- Can run for years despite infrastructure failures

**Example workflow tasks:**

- Decide which steps to execute
- Handle compensation logic
- Manage timeouts and retries
- Coordinate child workflows

### Activities (External Interactions)

**Characteristics:**

- Handle all external system interactions
- Can be non-deterministic (API calls, DB writes)
- Include built-in timeouts and retry logic
- **Must be idempotent** (calling N times = calling once)
- Short-lived (seconds to minutes typically)

**Example activity tasks:**

- Call payment gateway API
- Write to database
- Send emails or notifications
- Query external services

### Design Decision Framework

```
Does it touch external systems? → Activity
Is it orchestration/decision logic? → Workflow
```

## Core Workflow Patterns

### 1. Saga Pattern with Compensation

**Purpose**: Implement distributed transactions with rollback capability

**Pattern** (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):

```
For each step:
  1. Register compensation BEFORE executing
  2. Execute the step (via activity)
  3. On failure, run all compensations in reverse order (LIFO)
```

**Example: Payment Workflow**

1. Reserve inventory (compensation: release inventory)
2. Charge payment (compensation: refund payment)
3. Fulfill order (compensation: cancel fulfillment)

**Critical Requirements:**

- Compensations must be idempotent
- Register compensation BEFORE executing step
- Run compensations in reverse order
- Handle partial failures gracefully

### 2. Entity Workflows (Actor Model)

**Purpose**: Long-lived workflow representing a single entity instance

**Pattern** (Source: docs.temporal.io/evaluate/use-cases-design-patterns):

- One workflow execution = one entity (cart, account, inventory item)
- Workflow persists for entity lifetime
- Receives signals for state changes
- Supports queries for current state

**Example Use Cases:**

- Shopping cart (add items, checkout, expiration)
- Bank account (deposits, withdrawals, balance checks)
- Product inventory (stock updates, reservations)

**Benefits:**

- Encapsulates entity behavior
- Guarantees consistency per entity
- Natural event sourcing

### 3. Fan-Out/Fan-In (Parallel Execution)

**Purpose**: Execute multiple tasks in parallel, aggregate results

**Pattern:**

- Spawn child workflows or parallel activities
- Wait for all to complete
- Aggregate results
- Handle partial failures

**Scaling Rule** (Source: temporal.io/blog/workflow-engine-principles):

- Don't scale individual workflows
- For 1M tasks: spawn 1K child workflows × 1K tasks each
- Keep each workflow bounded

### 4. Async Callback Pattern

**Purpose**: Wait for external event or human approval

**Pattern:**

- Workflow sends request and waits for signal
- External system processes asynchronously
- External system sends signal to resume workflow
- Workflow continues with response

**Use Cases:**

- Human approval workflows
- Webhook callbacks
- Long-running external processes

## State Management and Determinism

### Automatic State Preservation

**How Temporal Works** (Source: docs.temporal.io/workflows):

- Complete program state preserved automatically
- Event History records every command and event
- Seamless recovery from crashes
- Applications restore pre-failure state

### Determinism Constraints

**Workflows Execute as State Machines**:

- Replay behavior must be consistent
- Same inputs → identical outputs every time

**Prohibited in Workflows** (Source: docs.temporal.io/workflows):

- ❌ Threading, locks, synchronization primitives
- ❌ Random number generation (`random()`)
- ❌ Global state or static variables
- ❌ System time (`datetime.now()`)
- ❌ Direct file I/O or network calls
- ❌ Non-deterministic libraries

**Allowed in Workflows**:

- ✅ `workflow.now()` (deterministic time)
- ✅ `workflow.random()` (deterministic random)
- ✅ Pure functions and calculations
- ✅ Calling activities (non-deterministic operations)

### Versioning Strategies

**Challenge**: Changing workflow code while old executions are still running

**Solutions**:

1. **Versioning API**: Use the SDK's versioning/patching API for safe changes (for example `workflow.patched()` in the Python SDK or `workflow.GetVersion()` in the Go SDK)
2. **New Workflow Type**: Create new workflow, route new executions to it
3. **Backward Compatibility**: Ensure old events replay correctly

## Resilience and Error Handling

### Retry Policies

**Default Behavior**: Unless configured otherwise, Temporal retries a failed activity with no limit on attempts, until it succeeds or an activity timeout expires

**Configure Retry**:

- Initial retry interval
- Backoff coefficient (exponential backoff)
- Maximum interval (cap retry delay)
- Maximum attempts (eventually fail)

**Non-Retryable Errors**:

- Invalid input (validation failures)
- Business rule violations
- Permanent failures (resource not found)

### Idempotency Requirements

**Why Critical** (Source: docs.temporal.io/activities):

- Activities may execute multiple times
- Network failures trigger retries
- Duplicate execution must be safe

**Implementation Strategies**:

- Idempotency keys (deduplication)
- Check-then-act with unique constraints
- Upsert operations instead of insert
- Track processed request IDs

### Activity Heartbeats

**Purpose**: Detect stalled long-running activities

**Pattern**:

- Activity sends periodic heartbeat
- Includes progress information
- Timeout if no heartbeat received
- Enables progress-based retry

## Best Practices

### Workflow Design

1. **Keep workflows focused** - Single responsibility per workflow
2. **Small workflows** - Use child workflows for scalability
3. **Clear boundaries** - Workflow orchestrates, activities execute
4. **Test locally** - Use time-skipping test environment

### Activity Design

1. **Idempotent operations** - Safe to retry
2. **Short-lived** - Seconds to minutes, not hours
3. **Timeout configuration** - Always set timeouts
4. **Heartbeat for long tasks** - Report progress
5. **Error handling** - Distinguish retryable vs non-retryable

### Common Pitfalls

**Workflow Violations**:

- Using `datetime.now()` instead of `workflow.now()`
- Threading or unmanaged concurrency in workflow code
- Calling external APIs directly from workflow
- Non-deterministic logic in workflows

**Activity Mistakes**:

- Non-idempotent operations (can't handle retries)
- Missing timeouts (activities can hang or retry indefinitely)
- No error classification (validation errors get retried)
- Ignoring payload limits (2 MB per payload)

### Operational Considerations

**Monitoring**:

- Workflow execution duration
- Activity failure rates
- Retry attempts and backoff
- Pending workflow counts

**Scalability**:

- Horizontal scaling with workers
- Task queue partitioning
- Child workflow decomposition
- Activity batching when appropriate

## Additional Resources

**Official Documentation**:

- Temporal Core Concepts: docs.temporal.io/workflows
- Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns
- Best Practices: docs.temporal.io/develop/best-practices
- Saga Pattern: temporal.io/blog/saga-pattern-made-easy

**Key Principles**:

1. Workflows = orchestration, Activities = external calls
2. Determinism is non-negotiable for workflows
3. Idempotency is critical for activities
4. State preservation is automatic
5. Design for failure and recovery
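
**Illustrative sketch: saga compensation order**

The saga rules above (register the compensation before executing the step, then undo in reverse order on failure) can be sketched in plain Python without the Temporal SDK. Everything here is hypothetical and for illustration only: `run_saga`, the step names, and the no-op actions are assumptions, and in a real workflow each action and compensation would be an activity call.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are hypothetical placeholders.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, run registered compensations in LIFO order."""
    compensations: List[Tuple[str, Callable[[], None]]] = []
    try:
        for name, action, compensate in steps:
            # Register the compensation BEFORE executing the step, so a step
            # that fails after a partial side effect is still rolled back.
            compensations.append((name, compensate))
            action()
            log.append(f"done:{name}")
        return True
    except Exception as exc:
        log.append(f"failed:{exc}")
        # Undo in reverse (LIFO) order. Compensations must be idempotent:
        # the failed step's compensation runs even if the step did nothing.
        for name, compensate in reversed(compensations):
            compensate()
            log.append(f"compensated:{name}")
        return False


def fail_fulfillment() -> None:
    raise RuntimeError("fulfillment unavailable")


def demo() -> List[str]:
    """Payment workflow from the example above, with the last step failing."""
    log: List[str] = []
    noop = lambda: None
    steps: List[Step] = [
        ("reserve_inventory", noop, noop),
        ("charge_payment", noop, noop),
        ("fulfill_order", fail_fulfillment, noop),
    ]
    run_saga(steps, log)
    return log
```

Calling `demo()` returns a log in which the two completed steps are followed by the failure and then by three compensations, newest first. Registering before executing is why `fulfill_order` is compensated even though it never completed.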
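
**Illustrative sketch: retry backoff schedule**

The retry settings listed under Retry Policies (initial interval, backoff coefficient, maximum interval, maximum attempts) combine into a delay schedule. The sketch below computes that schedule in plain Python. It is not the Temporal SDK's API: `retry_delays`, `is_retryable`, and the `NON_RETRYABLE` tuple are hypothetical names, and it assumes that maximum attempts counts the first execution and that no jitter is applied.

```python
from typing import List

# Hypothetical classification: these error types are treated as permanent.
NON_RETRYABLE = (ValueError, KeyError)


def is_retryable(exc: Exception) -> bool:
    """Validation-style errors fail fast; everything else may be retried."""
    return not isinstance(exc, NON_RETRYABLE)


def retry_delays(
    initial_interval: float,
    backoff_coefficient: float,
    maximum_interval: float,
    maximum_attempts: int,
) -> List[float]:
    """Seconds to wait before each retry.

    Assumes maximum_attempts includes the first attempt, so N attempts
    produce N - 1 delays.
    """
    delays: List[float] = []
    for retry in range(maximum_attempts - 1):
        # Exponential backoff, capped at maximum_interval.
        delay = initial_interval * (backoff_coefficient ** retry)
        delays.append(min(delay, maximum_interval))
    return delays
```

For example, `retry_delays(1.0, 2.0, 10.0, 6)` yields `[1.0, 2.0, 4.0, 8.0, 10.0]`: the fifth delay would be 16 seconds but is capped at the maximum interval.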