How do I install On Call Handoff Patterns?

Install On Call Handoff Patterns with a single command: `npx mdskills install sickn33/on-call-handoff-patterns`. This downloads the skill files into your project, and your AI agent picks them up automatically.

What platforms support On Call Handoff Patterns?

On Call Handoff Patterns works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.


On Call Handoff Patterns

Rating: 8 (1 review)
Author: sickn33

Verified

Writing & Docs · Intermediate

Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use when transitioning on-call responsibilities, documenting shift summaries, or improving on-call processes.

by @sickn331 downloads13,166Updated 2/20/2026

Add this skill

npx mdskills install sickn33/on-call-handoff-patterns


Skill Advisor: 8.0

Comprehensive on-call handoff guidance with excellent templates and actionable checklists

+ Provides three detailed templates for different handoff scenarios
+ Includes practical checklists for pre-shift, during-shift, and post-shift activities
+ Covers escalation procedures and best practices with clear do's and don'ts
- Declares shell/network/write permissions, but the instructions don't specify when or how to use them

SKILL.md


---
name: on-call-handoff-patterns
description: Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use when transitioning on-call responsibilities, documenting shift summaries, or improving on-call processes.
---

# On-Call Handoff Patterns

Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.

## Do not use this skill when

- The task is unrelated to on-call handoff patterns
- You need a different domain or tool outside this scope

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Use this skill when

- Transitioning on-call responsibilities
- Writing shift handoff summaries
- Documenting ongoing investigations
- Establishing on-call rotation procedures
- Improving handoff quality
- Onboarding new on-call engineers

## Core Concepts

### 1. Handoff Components

| Component | Purpose |
|-----------|---------|
| **Active Incidents** | What's currently broken |
| **Ongoing Investigations** | Issues being debugged |
| **Recent Changes** | Deployments, configs |
| **Known Issues** | Workarounds in place |
| **Upcoming Events** | Maintenance, releases |

### 2. Handoff Timing

```
Recommended: 30 min overlap between shifts
(plus 5 min for the incoming engineer to verify alerting)

Outgoing:
├── 15 min: Write handoff document
└── 15 min: Sync call with incoming

Incoming:
├── 15 min: Review handoff document
├── 15 min: Sync call with outgoing
└── 5 min: Verify alerting setup
```
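The timing outline above can be turned into concrete clock times by working backwards from the handoff time. The following is a minimal sketch, not part of the original skill: the function name `handoff_schedule` and the dictionary keys are hypothetical, and it assumes the sync call ends exactly at the handoff time, that the two 15-minute document blocks run in parallel, and that alerting is verified right after the sync.

```python
from datetime import datetime, timedelta, timezone

# Durations taken from the timing outline (minutes).
DOC_BLOCK = timedelta(minutes=15)        # write (outgoing) / review (incoming)
SYNC_CALL = timedelta(minutes=15)
VERIFY_ALERTING = timedelta(minutes=5)


def handoff_schedule(handoff_time: datetime) -> dict[str, datetime]:
    """Work backwards from the handoff time to the start of each step.

    Assumption: the sync call ends at the handoff time and the incoming
    engineer verifies alerting immediately afterwards.
    """
    sync_start = handoff_time - SYNC_CALL
    return {
        "doc_block_start": sync_start - DOC_BLOCK,
        "sync_call_start": sync_start,
        "handoff": handoff_time,
        "alerting_verified_by": handoff_time + VERIFY_ALERTING,
    }


if __name__ == "__main__":
    # Handoff time borrowed from Template 1 (2024-01-22 09:00 UTC).
    plan = handoff_schedule(datetime(2024, 1, 22, 9, 0, tzinfo=timezone.utc))
    for step, when in plan.items():
        print(f"{when:%H:%M} UTC  {step}")
```

Under these assumptions, a 09:00 UTC handoff means the outgoing engineer starts writing at 08:30 and the sync call starts at 08:45.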

## Templates

### Template 1: Shift Handoff Document

````markdown
# On-Call Handoff: Platform Team

**Outgoing**: @alice (2024-01-15 to 2024-01-22)
**Incoming**: @bob (2024-01-22 to 2024-01-29)
**Handoff Time**: 2024-01-22 09:00 UTC

---

## 🔴 Active Incidents

### None currently active
No active incidents at handoff time.

---

## 🟡 Ongoing Investigations

### 1. Intermittent API Timeouts (ENG-1234)
**Status**: Investigating
**Started**: 2024-01-20
**Impact**: ~0.1% of requests timing out

**Context**:
- Timeouts correlate with database backup window (02:00-03:00 UTC)
- Suspect backup process causing lock contention
- Added extra logging in PR #567 (deployed 01/21)

**Next Steps**:
- [ ] Review new logs after tonight's backup
- [ ] Consider moving backup window if confirmed

**Resources**:
- Dashboard: [API Latency](https://grafana/d/api-latency)
- Thread: #platform-eng (01/20, 14:32)

---

### 2. Memory Growth in Auth Service (ENG-1235)
**Status**: Monitoring
**Started**: 2024-01-18
**Impact**: None yet (proactive)

**Context**:
- Memory usage growing ~5% per day
- No memory leak found in profiling
- Suspect connection pool not releasing properly

**Next Steps**:
- [ ] Review heap dump from 01/21
- [ ] Consider restart if usage > 80%

**Resources**:
- Dashboard: [Auth Service Memory](https://grafana/d/auth-memory)
- Analysis doc: [Memory Investigation](https://docs/eng-1235)

---

## 🟢 Resolved This Shift

### Payment Service Outage (2024-01-19)
- **Duration**: 23 minutes
- **Root Cause**: Database connection exhaustion
- **Resolution**: Rolled back v2.3.4, increased pool size
- **Postmortem**: [POSTMORTEM-89](https://docs/postmortem-89)
- **Follow-up tickets**: ENG-1230, ENG-1231

---

## 📋 Recent Changes

### Deployments
| Service | Version | Time | Notes |
|---------|---------|------|-------|
| api-gateway | v3.2.1 | 01/21 14:00 | Bug fix for header parsing |
| user-service | v2.8.0 | 01/20 10:00 | New profile features |
| auth-service | v4.1.2 | 01/19 16:00 | Security patch |

### Configuration Changes
- 01/21: Increased API rate limit from 1000 to 1500 RPS
- 01/20: Updated database connection pool max from 50 to 75

### Infrastructure
- 01/20: Added 2 nodes to Kubernetes cluster
- 01/19: Upgraded Redis from 6.2 to 7.0

---

## ⚠️ Known Issues & Workarounds

### 1. Slow Dashboard Loading
**Issue**: Grafana dashboards slow on Monday mornings
**Workaround**: Wait 5 min after 08:00 UTC for cache warm-up
**Ticket**: OPS-456 (P3)

### 2. Flaky Integration Test
**Issue**: `test_payment_flow` fails intermittently in CI
**Workaround**: Re-run failed job (usually passes on retry)
**Ticket**: ENG-1200 (P2)

---

## 📅 Upcoming Events

| Date | Event | Impact | Contact |
|------|-------|--------|---------|
| 01/23 02:00 | Database maintenance | 5 min read-only | @dba-team |
| 01/24 14:00 | Major release v5.0 | Monitor closely | @release-team |
| 01/25 | Marketing campaign | 2x traffic expected | @platform |

---

## 📞 Escalation Reminders

| Issue Type | First Escalation | Second Escalation |
|------------|------------------|-------------------|
| Payment issues | @payments-oncall | @payments-manager |
| Auth issues | @auth-oncall | @security-team |
| Database issues | @dba-team | @infra-manager |
| Unknown/severe | @engineering-manager | @vp-engineering |

---

## 🔧 Quick Reference

### Common Commands
```bash
# Check service health (list pods that are not Running)
kubectl get pods -A | grep -v Running

# Recent cluster events (including rollouts)
kubectl get events --sort-by='.lastTimestamp' | tail -20

# Database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"

# Clear cache (emergency only)
redis-cli FLUSHDB
```

### Important Links
- [Runbooks](https://wiki/runbooks)
- [Service Catalog](https://wiki/services)
- [Incident Slack](https://slack.com/incidents)
- [PagerDuty](https://pagerduty.com/schedules)

---

## Handoff Checklist

### Outgoing Engineer
- [x] Document active incidents
- [x] Document ongoing investigations
- [x] List recent changes
- [x] Note known issues
- [x] Add upcoming events
- [x] Sync with incoming engineer

### Incoming Engineer
- [ ] Read this document
- [ ] Join sync call
- [ ] Verify PagerDuty is routing to you
- [ ] Verify Slack notifications working
- [ ] Check VPN/access working
- [ ] Review critical dashboards
````
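A handoff document is only useful if it covers every component listed under Core Concepts. As a rough sketch (not part of the original skill), a draft can be checked for missing sections before the sync call. The function name `missing_sections` is hypothetical, and the check assumes each component appears in a level-2 (`## `) heading as it does in Template 1.

```python
import re

# Section names drawn from the "Handoff Components" table. Matching is a
# case-insensitive substring test, so emoji prefixes and suffixes such as
# "& Workarounds" do not matter.
REQUIRED_SECTIONS = (
    "Active Incidents",
    "Ongoing Investigations",
    "Recent Changes",
    "Known Issues",
    "Upcoming Events",
)


def missing_sections(handoff_markdown: str) -> list[str]:
    """Return the required sections that have no matching `## ` heading."""
    headings = [
        match.group(1).lower()
        for match in re.finditer(r"^##\s+(.+)$", handoff_markdown, re.MULTILINE)
    ]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(name.lower() in heading for heading in headings)
    ]


if __name__ == "__main__":
    draft = "## 🔴 Active Incidents\nNone.\n\n## 📋 Recent Changes\n- none\n"
    print(missing_sections(draft))
```

An empty result means all five components have a heading; it says nothing about whether the sections are filled in well.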

### Template 2: Quick Handoff (Async)

```markdown
# Quick Handoff: @alice → @bob

## TL;DR
- No active incidents
- 1 investigation ongoing (API timeouts, see ENG-1234)
- Major release on 01/24 - be ready for issues

## Watch List
1. API latency around 02:00-03:00 UTC (backup window)
2. Auth service memory (restart if > 80%)

## Recent
- Deployed api-gateway v3.2.1 yesterday (stable)
- Increased rate limits to 1500 RPS

## Coming Up
- 01/23 02:00 - DB maintenance (5 min read-only)
- 01/24 14:00 - v5.0 release

## Questions?
I'll be available on Slack until 17:00 today.
```
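A watch-list item such as "restart if > 80%" is easier to act on when the incoming engineer knows roughly when the threshold will be crossed. The sketch below is illustrative only: the function name `days_until_threshold` is hypothetical, and it assumes the "~5% per day" growth from Template 1 means percentage points per day and stays linear. The current usage of 60% in the example is made up.

```python
import math
from typing import Optional


def days_until_threshold(current_pct: float, growth_pct_per_day: float,
                         threshold_pct: float = 80.0) -> Optional[int]:
    """Whole days until usage first exceeds the threshold.

    Assumes linear growth in percentage points per day. Returns 0 if usage
    is already above the threshold and None if usage is not growing.
    """
    if current_pct > threshold_pct:
        return 0
    if growth_pct_per_day <= 0:
        return None
    remaining = threshold_pct - current_pct
    # floor(...) + 1 is the first whole day strictly above the threshold.
    return math.floor(remaining / growth_pct_per_day) + 1


if __name__ == "__main__":
    # Hypothetical reading: 60% today, growing 5 points per day.
    print(days_until_threshold(60.0, 5.0))
```

If the estimate falls inside the incoming shift, it is worth saying so explicitly in the watch list rather than leaving only the threshold.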

### Template 3: Incident Handoff (Mid-Incident)

```markdown
# INCIDENT HANDOFF: Payment Service Degradation

**Incident Start**: 2024-01-22 08:15 UTC
**Current Status**: Mitigating
**Severity**: SEV2

---

## Current State
- Error rate: 15% (down from 40%)
- Mitigation in progress: scaling up pods
- ETA to resolution: ~30 min

## What We Know
1. Root cause: Memory pressure on payment-service pods
2. Triggered by: Unusual traffic spike (3x normal)
3. Contributing: Inefficient query in checkout flow

## What We've Done
- Scaled payment-service from 5 → 15 pods
- Enabled rate limiting on checkout endpoint
- Disabled non-critical features

## What Needs to Happen
1. Monitor error rate - should reach <1% in ~15 min
2. If not improving, escalate to @payments-manager
3. Once stable, begin root cause investigation

## Key People
- Incident Commander: @alice (handing off)
- Comms Lead: @charlie
- Technical Lead: @bob (incoming)

## Communication
- Status page: Updated at 08:45
- Customer support: Notified
- Exec team: Aware

## Resources
- Incident channel: #inc-20240122-payment
- Dashboard: [Payment Service](https://grafana/d/payments)
- Runbook: [Payment Degradation](https://wiki/runbooks/payments)

---

**Incoming on-call (@bob) - Please confirm you have:**
- [ ] Joined #inc-20240122-payment
- [ ] Access to dashboards
- [ ] Understand current state
- [ ] Know escalation path
```
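The "What Needs to Happen" steps in the template above amount to a small decision rule that the incoming engineer applies while watching the dashboard. The following sketch shows one possible reading of it. The function name `next_action` is hypothetical, "not improving" is interpreted as the latest sample being no lower than the first, and the samples are assumed to cover the ~15 min monitoring window.

```python
def next_action(error_rates: list[float], target_pct: float = 1.0) -> str:
    """Pick the next step from error-rate samples (percent, oldest first).

    Assumption: "not improving" means the latest sample is no lower than
    the first sample in the monitoring window.
    """
    if not error_rates:
        raise ValueError("need at least one error-rate sample")
    latest = error_rates[-1]
    if latest < target_pct:
        return "begin root cause investigation"
    if len(error_rates) >= 2 and latest >= error_rates[0]:
        return "escalate to @payments-manager"
    return "keep monitoring"


if __name__ == "__main__":
    print(next_action([15.0, 8.0, 0.5]))
    print(next_action([15.0, 16.0, 15.0]))
    print(next_action([15.0, 9.0, 4.0]))
```

Writing the rule down this explicitly in the handoff removes guesswork about when the incoming engineer is expected to escalate.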

## Handoff Sync Meeting

### Agenda (15 minutes)

```markdown
## Handoff Sync: @alice → @bob

1. **Active Issues** (5 min)
   - Walk through any ongoing incidents
   - Discuss investigation status
   - Transfer context and theories

2. **Recent Changes** (3 min)
   - Deployments to watch
   - Config changes
   - Known regressions

3. **Upcoming Events** (3 min)
   - Maintenance windows
   - Expected traffic changes
   - Releases planned

4. **Questions** (4 min)
   - Clarify anything unclear
   - Confirm access and alerting
   - Exchange contact info
```

## On-Call Best Practices

### Before Your Shift

```markdown
## Pre-Shift Checklist

### Access Verification
- [ ] VPN working
- [ ] kubectl access to all clusters
- [ ] Database read access
- [ ] Log aggregator access (Splunk/Datadog)
- [ ] PagerDuty app installed and logged in

### Alerting Setup
- [ ] PagerDuty schedule shows you as primary
- [ ] Phone notifications enabled
- [ ] Slack notifications for incident channels
- [ ] Test alert received and acknowledged

### Knowledge Refresh
- [ ] Review recent incidents (past 2 weeks)
- [ ] Check service changelog
- [ ] Skim critical runbooks
- [ ] Know escalation contacts

### Environment Ready
- [ ] Laptop charged and accessible
- [ ] Phone charged
- [ ] Quiet space available for calls
- [ ] Secondary contact identified (if traveling)
```

### During Your Shift

```markdown
## Daily On-Call Routine

### Morning (start of day)
- [ ] Check overnight alerts
- [ ] Review dashboards for anomalies
- [ ] Check for any P0/P1 tickets created
- [ ] Skim incident channels for context

### Throughout Day
- [ ] Respond to alerts within SLA
- [ ] Document investigation progress
- [ ] Update team on significant issues
- [ ] Triage incoming pages

### End of Day
- [ ] Hand off any active issues
- [ ] Update investigation docs
- [ ] Note anything for next shift
```

### After Your Shift

```markdown
## Post-Shift Checklist

- [ ] Complete handoff document
- [ ] Sync with incoming on-call
- [ ] Verify PagerDuty routing changed
- [ ] Close/update investigation tickets
- [ ] File postmortems for any incidents
- [ ] Take time off if shift was stressful
```

## Escalation Guidelines

### When to Escalate

```markdown
## Escalation Triggers

### Immediate Escalation
- SEV1 incident declared
- Data breach suspected
- Unable to diagnose within 30 min
- Customer or legal escalation received

### Consider Escalation
- Issue spans multiple teams
- Requires expertise you don't have
- Business impact exceeds threshold
- You're uncertain about next steps

### How to Escalate
1. Page the appropriate escalation path
2. Provide brief context in Slack
3. Stay engaged until escalation acknowledges
4. Hand off cleanly, don't just disappear
```
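The trigger lists above can be read as a two-tier classification. As a minimal sketch under that reading (not part of the original skill), the code below maps a set of incident signals to "immediate", "consider", or "none". The `IncidentSignals` field names and the `escalation_level` function are illustrative, and "business impact exceeds threshold" is left as a boolean because the source does not define the threshold.

```python
from dataclasses import dataclass


@dataclass
class IncidentSignals:
    """Signals mirroring the trigger lists; all field names are illustrative."""
    sev1_declared: bool = False
    data_breach_suspected: bool = False
    minutes_undiagnosed: int = 0
    customer_or_legal_escalation: bool = False
    spans_multiple_teams: bool = False
    needs_outside_expertise: bool = False
    business_impact_over_threshold: bool = False
    unsure_of_next_steps: bool = False


def escalation_level(s: IncidentSignals) -> str:
    """Return "immediate", "consider", or "none" for the given signals."""
    if (s.sev1_declared
            or s.data_breach_suspected
            or s.minutes_undiagnosed >= 30
            or s.customer_or_legal_escalation):
        return "immediate"
    if (s.spans_multiple_teams
            or s.needs_outside_expertise
            or s.business_impact_over_threshold
            or s.unsure_of_next_steps):
        return "consider"
    return "none"


if __name__ == "__main__":
    print(escalation_level(IncidentSignals(minutes_undiagnosed=35)))
    print(escalation_level(IncidentSignals(spans_multiple_teams=True)))
```

Immediate triggers are checked first, so an incident that matches both tiers is always classified as "immediate".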

## Best Practices

### Do's
- **Document everything** - Future you will thank you
- **Escalate early** - Better safe than sorry
- **Take breaks** - Alert fatigue is real
- **Prefer synchronous handoffs** - Async handoffs lose context; use the quick async template only when a call isn't possible
- **Test your setup** - Before incidents, not during

### Don'ts
- **Don't skip handoffs** - Context loss causes incidents
- **Don't play the hero** - Escalate when needed
- **Don't ignore alerts** - Even if they seem minor
- **Don't work sick** - Swap shifts instead
- **Don't disappear** - Stay reachable during shift

## Resources

- [Google SRE - Being On-Call](https://sre.google/sre-book/being-on-call/)
- [PagerDuty On-Call Guide](https://www.pagerduty.com/resources/learn/on-call-management/)
- [Increment On-Call Issue](https://increment.com/on-call/)
