How do I install Prometheus Configuration?

Install Prometheus Configuration with a single command: npx mdskills install sickn33/prometheus-configuration. This downloads the skill files into your project and your AI agent picks them up automatically.

What platforms support Prometheus Configuration?

Prometheus Configuration works with Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf, Continue Dev, Codex, Gemini CLI, Amp, Roo Code, Goose, OpenCode, Trae, Qodo, and Command Code. Skills use the open SKILL.md format, which is compatible with any AI coding agent that reads markdown instructions.

Prometheus Configuration

Name: Prometheus Configuration: AI Agent Skill
Rating: 8 (1 review)
Author: sickn33

Verified

Monitoring & Debugging · Intermediate

Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.

by @sickn33 · Updated 2/20/2026

Skill Advisor: 8.0

Comprehensive Prometheus guide with detailed configs, rules, and multiple deployment patterns

+ Provides complete configuration examples for diverse deployment scenarios
+ Includes validation commands and troubleshooting steps
+ Documents recording and alert rules with clear metric patterns
- References assets/scripts that may not exist in the actual skill package

SKILL.md

---
name: prometheus-configuration
description: Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
---

# Prometheus Configuration

Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.

## Do not use this skill when

- The task is unrelated to Prometheus configuration
- You need a different domain or tool outside this scope

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Purpose

Configure Prometheus for comprehensive metric collection, alerting, and monitoring of infrastructure and applications.

## Use this skill when

- Setting up Prometheus monitoring
- Configuring metric scraping
- Creating recording rules
- Designing alert rules
- Implementing service discovery

## Prometheus Architecture

```
┌──────────────┐
│ Applications │ ← Instrumented with client libraries
└──────┬───────┘
       │ /metrics endpoint
       ↓
┌──────────────┐
│  Prometheus  │ ← Scrapes metrics periodically
│    Server    │
└──────┬───────┘
       │
       ├─→ Alertmanager (alerts)
       ├─→ Grafana (visualization)
       └─→ Long-term storage (Thanos/Cortex)
```

## Installation

### Kubernetes with Helm

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
```
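
The same overrides can also live in a values file, which is easier to review than a chain of `--set` flags. The sketch below assumes a file named `values.yaml` (a hypothetical name) and expresses storage through the chart's `storageSpec.volumeClaimTemplate`; the access mode is an assumption, and your cluster may also need a `storageClassName`.

```yaml
# values.yaml (hypothetical file name)
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]  # assumption: single-node volume access
          resources:
            requests:
              storage: 50Gi
```

It would be passed to the install command with `-f values.yaml` in place of the two `--set` flags.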

### Docker Compose

```yaml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

volumes:
  prometheus-data:
```
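
The `prometheus.yml` in the next section points at `alertmanager:9093` and loads rules from `/etc/prometheus/rules/*.yml`, neither of which this Compose file provides. A minimal sketch of the keys that could be merged into it, assuming a local `./rules` directory and a local `./alertmanager.yml` (both hypothetical paths):

```yaml
services:
  prometheus:
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules:/etc/prometheus/rules   # assumption: rule files live in ./rules
      - prometheus-data:/prometheus

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      # assumption: an Alertmanager config exists next to the Compose file
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
```

Because both services share the default Compose network, Prometheus can reach Alertmanager by its service name.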

## Configuration File

**prometheus.yml:**
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    region: 'us-west-2'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

# Load rules files
rule_files:
  - /etc/prometheus/rules/*.yml

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node exporters
  - job_name: 'node-exporter'
    static_configs:
      - targets:
        - 'node1:9100'
        - 'node2:9100'
        - 'node3:9100'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+)(:[0-9]+)?'
        replacement: '${1}'

  # Kubernetes pods with annotations
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod

  # Application metrics
  - job_name: 'my-app'
    static_configs:
      - targets:
        - 'app1.example.com:9090'
        - 'app2.example.com:9090'
    metrics_path: '/metrics'
    scheme: 'https'
    tls_config:
      ca_file: /etc/prometheus/ca.crt
      cert_file: /etc/prometheus/client.crt
      key_file: /etc/prometheus/client.key
```

**Reference:** See `assets/prometheus.yml.template`
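
The `kubernetes-pods` job only keeps pods that opt in through annotations. As a sketch, a pod that those relabel rules would scrape at `/metrics` on port 8080 might look like this; the pod name, image, and port are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical
  annotations:
    prometheus.io/scrape: "true"     # matched by the `keep` rule
    prometheus.io/path: "/metrics"   # copied into __metrics_path__
    prometheus.io/port: "8080"       # rewritten into __address__
spec:
  containers:
    - name: app
      image: example/app:1.0         # hypothetical image
      ports:
        - containerPort: 8080
```

Annotation values must be quoted strings, which is why `"true"` and `"8080"` are not written as a bare boolean and integer.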

## Scrape Configurations

### Static Targets

```yaml
scrape_configs:
  - job_name: 'static-targets'
    static_configs:
      - targets: ['host1:9100', 'host2:9100']
        labels:
          env: 'production'
          region: 'us-west-2'
```

### File-based Service Discovery

```yaml
scrape_configs:
  - job_name: 'file-sd'
    file_sd_configs:
      - files:
        - /etc/prometheus/targets/*.json
        - /etc/prometheus/targets/*.yml
        refresh_interval: 5m
```

**targets/production.json:**
```json
[
  {
    "targets": ["app1:9090", "app2:9090"],
    "labels": {
      "env": "production",
      "service": "api"
    }
  }
]
```
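
Because the job also globs `*.yml`, a target group can be written in YAML with the same structure. A minimal sketch, with `targets/staging.yml` and the `app3`/`app4` hosts as hypothetical names:

```yaml
# targets/staging.yml (hypothetical file)
- targets:
    - 'app3:9090'
    - 'app4:9090'
  labels:
    env: staging
    service: api
```

Prometheus watches these files for changes, so edits are picked up without a reload; `refresh_interval` acts as a periodic fallback re-read.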

### Kubernetes Service Discovery

```yaml
scrape_configs:
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

**Reference:** See `references/scrape-configs.md`

## Recording Rules

Create pre-computed metrics for frequently queried expressions:

```yaml
# /etc/prometheus/rules/recording_rules.yml
groups:
  - name: api_metrics
    interval: 15s
    rules:
      # HTTP request rate per job
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # 5xx error rate per job
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))

      # Errors as a percentage of all requests
      - record: job:http_requests_error_rate:percentage
        expr: |
          (job:http_requests_errors:rate5m / job:http_requests:rate5m) * 100

      # P95 latency
      - record: job:http_request_duration:p95
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

  - name: resource_metrics
    interval: 30s
    rules:
      # CPU utilization percentage
      - record: instance:node_cpu:utilization
        expr: |
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

      # Memory utilization percentage
      - record: instance:node_memory:utilization
        expr: |
          100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)

      # Disk usage percentage
      - record: instance:node_disk:utilization
        expr: |
          100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
```

**Reference:** See `references/recording-rules.md`

## Alert Rules

```yaml
# /etc/prometheus/rules/alert_rules.yml
groups:
  - name: availability
    interval: 30s
    rules:
      - alert: ServiceDown
        expr: up{job="my-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.instance }} is down"
          description: "{{ $labels.job }} has been down for more than 1 minute"

      - alert: HighErrorRate
        expr: job:http_requests_error_rate:percentage > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate for {{ $labels.job }}"
          description: "Error rate is {{ $value }}% (threshold: 5%)"

      - alert: HighLatency
        expr: job:http_request_duration:p95 > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency for {{ $labels.job }}"
          description: "P95 latency is {{ $value }}s (threshold: 1s)"

  - name: resources
    interval: 1m
    rules:
      - alert: HighCPUUsage
        expr: instance:node_cpu:utilization > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: HighMemoryUsage
        expr: instance:node_memory:utilization > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

      - alert: DiskSpaceLow
        expr: instance:node_disk:utilization > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk usage is {{ $value }}%"
```

## Validation

```bash
# Validate configuration
promtool check config prometheus.yml

# Validate rules
promtool check rules /etc/prometheus/rules/*.yml

# Test query
promtool query instant http://localhost:9090 'up'
```

**Reference:** See `scripts/validate-prometheus.sh`
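
Beyond syntax checks, `promtool test rules` can exercise alert rules against synthetic series. The sketch below assumes it is saved as `alert_rules_test.yml` (a hypothetical name) in the same directory as the `alert_rules.yml` file from the Alert Rules section, and feeds a made-up `up` series that stays at 0 so that `ServiceDown` should be firing two minutes in:

```yaml
# alert_rules_test.yml (hypothetical file name)
rule_files:
  - alert_rules.yml          # path is relative to this test file

evaluation_interval: 1m

tests:
  - interval: 1m             # spacing between the synthetic samples
    input_series:
      # assumption: one my-app target that is down for the whole test
      - series: 'up{job="my-app", instance="app1.example.com:9090"}'
        values: '0 0 0 0 0'
    alert_rule_test:
      - eval_time: 2m        # past the rule's `for: 1m` window
        alertname: ServiceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: my-app
              instance: app1.example.com:9090
            exp_annotations:
              summary: "Service app1.example.com:9090 is down"
              description: "my-app has been down for more than 1 minute"
```

It would be run with `promtool test rules alert_rules_test.yml`.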

## Best Practices

1. **Use consistent naming** for metrics (prefix_name_unit)
2. **Set appropriate scrape intervals** (15-60s typical)
3. **Use recording rules** for expensive queries
4. **Implement high availability** (multiple Prometheus instances)
5. **Configure retention** based on storage capacity
6. **Use relabeling** for metric cleanup
7. **Monitor Prometheus itself**
8. **Implement federation** for large deployments
9. **Use Thanos/Cortex** for long-term storage
10. **Document custom metrics**
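
Several of these practices map directly onto configuration. The fragment below is a sketch of three of them (relabeling for cleanup, federation, and remote write to long-term storage); the target hosts, the dropped metric and label names, and the remote-write URL are all hypothetical placeholders:

```yaml
scrape_configs:
  # Practice 6: drop unwanted series and labels after the scrape
  - job_name: 'my-app'
    static_configs:
      - targets: ['app1.example.com:9090']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*'            # hypothetical: metrics you do not need
        action: drop
      - regex: 'pod_template_hash'   # hypothetical: label to strip
        action: labeldrop

  # Practice 8: a global server pulling aggregated series from a shard
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'     # only job-level recording-rule outputs
    static_configs:
      - targets: ['prometheus-shard-1:9090']  # hypothetical shard address

# Practice 9: ship samples to a long-term store
remote_write:
  - url: 'http://thanos-receive:19291/api/v1/receive'  # hypothetical endpoint
```

Federating only the `job:` recording-rule series keeps the global server small, since raw per-instance series stay on the shards.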

## Troubleshooting

**Check scrape targets:**
```bash
curl http://localhost:9090/api/v1/targets
```

**Check configuration:**
```bash
curl http://localhost:9090/api/v1/status/config
```

**Test query:**
```bash
curl 'http://localhost:9090/api/v1/query?query=up'
```

## Reference Files

- `assets/prometheus.yml.template` - Complete configuration template
- `references/scrape-configs.md` - Scrape configuration patterns
- `references/recording-rules.md` - Recording rule examples
- `scripts/validate-prometheus.sh` - Validation script

## Related Skills

- `grafana-dashboards` - For visualization
- `slo-implementation` - For SLO monitoring
- `distributed-tracing` - For request tracing
