How do I install SLO Implementation?

Install SLO Implementation with a single command: npx mdskills install sickn33/slo-implementation. This downloads the skill files into your project, and your AI agent picks them up automatically.

What platforms support SLO Implementation?

SLO Implementation works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.


SLO Implementation

Name: SLO Implementation: AI Agent Skill
Rating: 9 (1 review)
Author: sickn33

Verified

Monitoring & Debugging · Intermediate

Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.

by @sickn33 · 1 installs · 0 · Updated 2/20/2026

Add this skill

npx mdskills install sickn33/slo-implementation


Skill Advisor: 9.0

Comprehensive SRE framework with concrete Prometheus queries, error budget policies, and multi-window alerting

+ Provides production-ready Prometheus recording rules and alert configurations
+ Includes clear error budget calculations with actionable policy thresholds
+ Offers specific SLO targets with downtime tables and burn rate formulas
- Requests network access without clear justification for SLO implementation tasks

SKILL.md


---
name: slo-implementation
description: Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
---

# SLO Implementation

Framework for defining and implementing Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

## Do not use this skill when

- The task is unrelated to SLO implementation
- You need a different domain or tool outside this scope

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Purpose

Implement measurable reliability targets using SLIs, SLOs, and error budgets to balance reliability with innovation velocity.

## Use this skill when

- Defining service reliability targets
- Measuring user-perceived reliability
- Implementing error budgets
- Creating SLO-based alerts
- Tracking reliability goals

## SLI/SLO/SLA Hierarchy

```
SLA (Service Level Agreement)
  ↓ Contract with customers
SLO (Service Level Objective)
  ↓ Internal reliability target
SLI (Service Level Indicator)
  ↓ Actual measurement
```

## Defining SLIs

### Common SLI Types

#### 1. Availability SLI
```promql
# Successful requests / Total requests
sum(rate(http_requests_total{status!~"5.."}[28d]))
/
sum(rate(http_requests_total[28d]))
```

#### 2. Latency SLI
```promql
# Requests below latency threshold / Total requests
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
/
sum(rate(http_request_duration_seconds_count[28d]))
```

#### 3. Durability SLI
```promql
# Successful writes / Total writes over the SLO window
# increase() keeps the ratio windowed and tolerant of counter resets
sum(increase(storage_writes_successful_total[28d]))
/
sum(increase(storage_writes_total[28d]))
```

**Reference:** See `references/slo-definitions.md`

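As an illustration of the ratios above, the following Python sketch computes availability and latency SLIs from raw request counts. The function names and sample counts are hypothetical, and treating a window with no traffic as fully compliant is an assumption rather than something this skill specifies.

```python
def availability_sli(total_requests: int, server_errors: int) -> float:
    """Fraction of requests that did not end in a 5xx response."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return (total_requests - server_errors) / total_requests


def latency_sli(fast_requests: int, total_requests: int) -> float:
    """Fraction of requests that completed under the latency threshold."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully compliant
    return fast_requests / total_requests


if __name__ == "__main__":
    # Hypothetical 28-day counts for a single service.
    print(f"availability: {availability_sli(1_000_000, 500):.4%}")
    print(f"latency:      {latency_sli(992_000, 1_000_000):.4%}")
```

Both helpers mirror the "good events / total events" shape of the PromQL expressions, so the same pattern applies to any other SLI expressed as a ratio.
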
## Setting SLO Targets

### Availability SLO Examples

| SLO %  | Downtime/Month | Downtime/Year |
|--------|----------------|---------------|
| 99%    | 7.2 hours      | 3.65 days     |
| 99.9%  | 43.2 minutes   | 8.76 hours    |
| 99.95% | 21.6 minutes   | 4.38 hours    |
| 99.99% | 4.32 minutes   | 52.56 minutes |

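The table values follow from multiplying the period length by (1 - SLO). Here is a minimal sketch, assuming a 30-day month and a 365-day year, which is what the figures above appear to use. `allowed_downtime` is a hypothetical helper name.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, period: timedelta) -> timedelta:
    """Downtime permitted within `period` while still meeting the SLO."""
    return period * (1 - slo_percent / 100)


if __name__ == "__main__":
    month = timedelta(days=30)   # assumption: 30-day month
    year = timedelta(days=365)   # assumption: 365-day year
    for slo in (99.0, 99.9, 99.95, 99.99):
        per_month = allowed_downtime(slo, month).total_seconds() / 60
        per_year = allowed_downtime(slo, year).total_seconds() / 3600
        print(f"{slo}%: {per_month:.2f} min/month, {per_year:.2f} h/year")
```

Passing `timedelta(days=28)` instead gives the budget for the 28-day window used in the Prometheus examples.
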
### Choose Appropriate SLOs

**Consider:**
- User expectations
- Business requirements
- Current performance
- Cost of reliability
- Competitor benchmarks

**Example SLOs:**
```yaml
slos:
  - name: api_availability
    target: 99.9   # percent of requests that succeed
    window: 28d
    sli: |
      sum(rate(http_requests_total{status!~"5.."}[28d]))
      /
      sum(rate(http_requests_total[28d]))

  - name: api_latency_p95
    target: 99     # percent of requests completing within 500 ms
    window: 28d
    sli: |
      sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
      /
      sum(rate(http_request_duration_seconds_count[28d]))
```

## Error Budget Calculation

### Error Budget Formula

```
Error Budget = 1 - SLO Target
```

**Example:**
- SLO: 99.9% availability
- Error Budget: 0.1% = 43.2 minutes per 30-day month
- Current Error: 0.05% = 21.6 minutes per 30-day month
- Remaining Budget: 50%

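The example above can be reproduced in a few lines of Python. This is a sketch only: `error_budget` and `budget_remaining` are hypothetical names, and both inputs are expressed as fractions (0.999 rather than 99.9).

```python
def error_budget(slo_target: float) -> float:
    """Allowed failure fraction, e.g. 0.001 for a 99.9% SLO."""
    return 1 - slo_target


def budget_remaining(slo_target: float, sli: float) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    consumed = (1 - sli) / error_budget(slo_target)
    return 1 - consumed


if __name__ == "__main__":
    # 99.9% SLO with a measured SLI of 99.95% (0.05% errors).
    remaining = budget_remaining(slo_target=0.999, sli=0.9995)
    print(f"Remaining budget: {remaining:.0%}")
```

The result is algebraically the same as `(sli - slo) / (1 - slo)`, so a negative value signals that the SLO has been violated for the window.
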
### Error Budget Policy

```yaml
error_budget_policy:
  - remaining_budget: 100%
    action: Normal development velocity
  - remaining_budget: 50%
    action: Consider postponing risky changes
  - remaining_budget: 10%
    action: Freeze non-critical changes
  - remaining_budget: 0%
    action: Feature freeze, focus on reliability
```

**Reference:** See `references/error-budget.md`

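One way to apply the policy programmatically is sketched below. The policy lists thresholds without saying how values between them are handled, so this sketch assumes each action applies once the remaining budget is at or below its threshold. `POLICY` and `policy_action` are hypothetical names.

```python
# (threshold in percent, action), ordered from most to least restrictive.
POLICY = [
    (0.0, "Feature freeze, focus on reliability"),
    (10.0, "Freeze non-critical changes"),
    (50.0, "Consider postponing risky changes"),
    (100.0, "Normal development velocity"),
]


def policy_action(remaining_percent: float) -> str:
    """Return the most restrictive action whose threshold has been reached."""
    for threshold, action in POLICY:
        if remaining_percent <= threshold:
            return action
    # Values above 100% should not occur; fall back to the least restrictive tier.
    return POLICY[-1][1]


if __name__ == "__main__":
    for remaining in (65.0, 50.0, 5.0, -3.0):
        print(f"{remaining:>6.1f}% remaining -> {policy_action(remaining)}")
```

Teams that prefer the opposite reading (an action applies only while the budget is above its threshold) would need to adjust the comparison accordingly.
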
## SLO Implementation

### Prometheus Recording Rules

```yaml
# SLI Recording Rules
groups:
  - name: sli_rules
    interval: 30s
    rules:
      # Availability SLI
      - record: sli:http_availability:ratio
        expr: |
          sum(rate(http_requests_total{status!~"5.."}[28d]))
          /
          sum(rate(http_requests_total[28d]))

      # Latency SLI (requests < 500ms)
      - record: sli:http_latency:ratio
        expr: |
          sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
          /
          sum(rate(http_request_duration_seconds_count[28d]))

  - name: slo_rules
    interval: 5m
    rules:
      # SLO compliance (1 = meeting SLO, 0 = violating)
      - record: slo:http_availability:compliance
        expr: sli:http_availability:ratio >= bool 0.999

      - record: slo:http_latency:compliance
        expr: sli:http_latency:ratio >= bool 0.99

      # Error budget remaining (percentage)
      - record: slo:http_availability:error_budget_remaining
        expr: |
          (sli:http_availability:ratio - 0.999) / (1 - 0.999) * 100

      # Error budget burn rate, one rule per window used by the alerts
      - record: slo:http_availability:burn_rate_5m
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_30m
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[30m]))
            /
            sum(rate(http_requests_total[30m]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_1h
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[1h]))
            /
            sum(rate(http_requests_total[1h]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_6h
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[6h]))
            /
            sum(rate(http_requests_total[6h]))
          )) / (1 - 0.999)
```

### SLO Alerting Rules

```yaml
groups:
  - name: slo_alerts
    interval: 1m
    rules:
      # Fast burn: 14.4x rate, 1 hour window
      # Consumes 2% of a 30-day error budget in 1 hour
      - alert: SLOErrorBudgetBurnFast
        expr: |
          slo:http_availability:burn_rate_1h > 14.4
          and
          slo:http_availability:burn_rate_5m > 14.4
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Fast error budget burn detected"
          description: "Error budget burning at {{ $value }}x rate"

      # Slow burn: 6x rate, 6 hour window
      # Consumes 5% of a 30-day error budget in 6 hours
      - alert: SLOErrorBudgetBurnSlow
        expr: |
          slo:http_availability:burn_rate_6h > 6
          and
          slo:http_availability:burn_rate_30m > 6
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Slow error budget burn detected"
          description: "Error budget burning at {{ $value }}x rate"

      # Error budget exhausted
      - alert: SLOErrorBudgetExhausted
        expr: slo:http_availability:error_budget_remaining < 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "SLO error budget exhausted"
          description: "Error budget remaining: {{ $value }}%"
```

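The thresholds in these alerts come from a simple relationship: the share of budget consumed equals the burn rate multiplied by the alert window, divided by the SLO window. The sketch below checks the 2% and 5% figures quoted in the rule comments, which hold for a 30-day SLO window, and shows the slightly higher shares for the 28-day window used elsewhere in this skill. Function names are hypothetical.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1 - slo_target)


def budget_consumed(rate: float, alert_window_hours: float,
                    slo_window_days: float) -> float:
    """Fraction of the total error budget spent during the alert window."""
    return rate * alert_window_hours / (slo_window_days * 24)


if __name__ == "__main__":
    for rate, hours in ((14.4, 1), (6, 6)):
        for days in (30, 28):
            share = budget_consumed(rate, hours, days)
            print(f"{rate}x for {hours}h, {days}-day window: {share:.2%} of budget")
```

A burn rate of 1 means the budget would be exactly used up at the end of the SLO window, so any sustained value above 1 eventually leads to a violation.
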
## SLO Dashboard

**Grafana Dashboard Structure:**

```
┌────────────────────────────────────┐
│ SLO Compliance (Current)           │
│ ✓ 99.95% (Target: 99.9%)           │
├────────────────────────────────────┤
│ Error Budget Remaining: 65%        │
│ ████████░░ 65%                     │
├────────────────────────────────────┤
│ SLI Trend (28 days)                │
│ [Time series graph]                │
├────────────────────────────────────┤
│ Burn Rate Analysis                 │
│ [Burn rate by time window]         │
└────────────────────────────────────┘
```

**Example Queries:**

```promql
# Current SLO compliance (SLI as a percentage)
sli:http_availability:ratio * 100

# Error budget remaining
slo:http_availability:error_budget_remaining

# Days until error budget exhausted (at the 28-day average burn rate):
# remaining fraction * window length / burn rate
(slo:http_availability:error_budget_remaining / 100) * 28
/
((1 - sli:http_availability:ratio) / (1 - 0.999))
```

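The "days until exhausted" query divides the remaining share of the budget, scaled by the window length, by the burn rate. A minimal Python sketch of the same arithmetic follows. `days_until_exhausted` is a hypothetical helper, and returning infinity when nothing is burning is an assumption made for convenience.

```python
import math


def days_until_exhausted(remaining_fraction: float, burn_rate: float,
                         window_days: float = 28) -> float:
    """Days until the error budget reaches zero at a constant burn rate."""
    if remaining_fraction <= 0:
        return 0.0  # budget is already spent
    if burn_rate <= 0:
        return math.inf  # assumption: no errors means the budget never runs out
    return remaining_fraction * window_days / burn_rate


if __name__ == "__main__":
    # 65% of the budget left, as in the dashboard mock-up above.
    for rate in (0.5, 1.0, 6.0, 14.4):
        days = days_until_exhausted(0.65, rate)
        print(f"burn rate {rate:>4}x -> {days:.1f} days left")
```

Feeding in a short-window burn rate (for example the 1-hour value) instead of the 28-day average gives a projection that reacts faster to an ongoing incident.
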
## Multi-Window Burn Rate Alerts

```yaml
# Combination of short and long windows reduces false positives
rules:
  - alert: SLOBurnRateHigh
    expr: |
      (
        slo:http_availability:burn_rate_1h > 14.4
        and
        slo:http_availability:burn_rate_5m > 14.4
      )
      or
      (
        slo:http_availability:burn_rate_6h > 6
        and
        slo:http_availability:burn_rate_30m > 6
      )
    labels:
      severity: critical
```

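To see why pairing a long and a short window reduces noise, the condition can be modelled in plain Python. This is an illustrative sketch rather than part of the skill: the dictionary keys and sample burn rates are hypothetical, while the thresholds match the rule above.

```python
def should_alert(burn_rates: dict[str, float]) -> bool:
    """Evaluate the combined fast-burn / slow-burn condition."""
    # The long window shows the burn is significant;
    # the short window shows it is still happening now.
    fast = burn_rates["1h"] > 14.4 and burn_rates["5m"] > 14.4
    slow = burn_rates["6h"] > 6 and burn_rates["30m"] > 6
    return fast or slow


if __name__ == "__main__":
    samples = {
        "brief spike": {"5m": 20.0, "30m": 4.0, "1h": 2.0, "6h": 0.5},
        "sustained fast burn": {"5m": 18.0, "30m": 17.0, "1h": 16.0, "6h": 3.0},
        "slow burn": {"5m": 7.0, "30m": 7.5, "1h": 7.0, "6h": 6.5},
        "recovered incident": {"5m": 0.2, "30m": 1.0, "1h": 15.0, "6h": 7.0},
    }
    for name, rates in samples.items():
        print(f"{name}: alert={should_alert(rates)}")
```

The brief spike and the recovered incident both stay quiet because only one window of each pair exceeds its threshold.
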
## SLO Review Process

### Weekly Review
- Current SLO compliance
- Error budget status
- Trend analysis
- Incident impact

### Monthly Review
- SLO achievement
- Error budget usage
- Incident postmortems
- SLO adjustments

### Quarterly Review
- SLO relevance
- Target adjustments
- Process improvements
- Tooling enhancements

## Best Practices

1. **Start with user-facing services**
2. **Use multiple SLIs** (availability, latency, etc.)
3. **Set achievable SLOs** (don't aim for 100%)
4. **Implement multi-window alerts** to reduce noise
5. **Track error budget** consistently
6. **Review SLOs regularly**
7. **Document SLO decisions**
8. **Align with business goals**
9. **Automate SLO reporting**
10. **Use SLOs for prioritization**

## Reference Files

- `assets/slo-template.md` - SLO definition template
- `references/slo-definitions.md` - SLO definition patterns
- `references/error-budget.md` - Error budget calculations

## Related Skills

- `prometheus-configuration` - For metric collection
- `grafana-dashboards` - For SLO visualization
