When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
Add this skill
`npx mdskills install coreyhaines31/ab-test-setup`

Comprehensive A/B testing workflow with strong rigor gates and clear decision frameworks.
---
name: ab-test-setup
description: When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
metadata:
  version: 1.0.0
---

# A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

## Initial Assessment

**Check for product marketing context first:**
If `.claude/product-marketing-context.md` exists, read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

1. **Test Context** - What are you trying to improve? What change are you considering?
2. **Current State** - Baseline conversion rate? Current traffic volume?
3. **Constraints** - Technical complexity? Timeline? Tools available?

---

## Core Principles

### 1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data

### 2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked

### 3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology

### 4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm
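Principle 3 ("pre-determine sample size") can be made concrete with the standard two-proportion power approximation. This is a minimal sketch, assuming a two-sided z-test at alpha = 0.05 with 80% power; exact figures vary slightly between calculators depending on the approximation each one uses.

```python
import math

def n_per_variant(baseline, relative_lift, z_alpha=1.96, z_power=0.84):
    """Approximate sample size per variant for a two-sided
    two-proportion z-test (alpha = 0.05, power = 0.80)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    # Sum of per-arm Bernoulli variances, divided by the squared effect.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# A 3% baseline with a 20% relative lift needs roughly 14k users per variant.
print(n_per_variant(0.03, 0.20))
```

Note how quickly the requirement grows as the detectable lift shrinks: halving the lift roughly quadruples the sample size.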
---

## Hypothesis Framework

### Structure

```
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
```

### Example

**Weak**: "Changing the button color might increase clicks."

**Strong**: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using a contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."

---

## Test Types

| Type | Description | Traffic Needed |
|------|-------------|----------------|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |

---

## Sample Size

### Quick Reference

| Baseline | 10% Lift | 20% Lift | 50% Lift |
|----------|----------|----------|----------|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |

**Calculators:**
- [Evan Miller's](https://www.evanmiller.org/ab-testing/sample-size.html)
- [Optimizely's](https://www.optimizely.com/sample-size-calculator/)

**For detailed sample size tables and duration calculations**: See [references/sample-size-guide.md](references/sample-size-guide.md)

---

## Metrics Selection

### Primary Metric
- Single metric that matters most
- Directly tied to the hypothesis
- What you'll use to call the test

### Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked

### Guardrail Metrics
- Things that shouldn't get worse
- Stop the test if significantly negative

### Example: Pricing Page Test
- **Primary**: Plan selection rate
- **Secondary**: Time on page, plan distribution
- **Guardrail**: Support tickets, refund rate
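The primary/secondary/guardrail split above can be encoded in a test spec so the stop condition is mechanical rather than ad hoc. A sketch using the pricing-page example; the metric names, baselines, and thresholds are illustrative placeholders, not values from this document.

```python
# Hypothetical spec for the pricing-page test; all numbers are placeholders.
pricing_test = {
    "primary": "plan_selection_rate",
    "secondary": ["time_on_page", "plan_distribution"],
    "guardrails": {
        # Stop the test if a guardrail worsens more than 25% vs. baseline.
        "support_ticket_rate": {"baseline": 0.020, "max_relative_increase": 0.25},
        "refund_rate": {"baseline": 0.010, "max_relative_increase": 0.25},
    },
}

def breached_guardrails(spec, observed):
    """Return the guardrail metrics whose observed rate exceeds its limit."""
    breached = []
    for name, rule in spec["guardrails"].items():
        limit = rule["baseline"] * (1 + rule["max_relative_increase"])
        if observed.get(name, 0.0) > limit:
            breached.append(name)
    return breached
```

For example, `breached_guardrails(pricing_test, {"support_ticket_rate": 0.030, "refund_rate": 0.011})` flags only the support-ticket rate (limit 0.025); writing the thresholds down before launch removes the temptation to rationalize them afterward.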
---

## Designing Variants

### What to Vary

| Category | Examples |
|----------|----------|
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |

### Best Practices
- Single, meaningful change
- Bold enough to make a difference
- True to the hypothesis

---

## Traffic Allocation

| Approach | Split | When to Use |
|----------|-------|-------------|
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of a bad variant |
| Ramping | Start small, increase | Technical risk mitigation |

**Considerations:**
- Consistency: Users see the same variant on return
- Balanced exposure across time of day/week

---

## Implementation

### Client-Side
- JavaScript modifies the page after load
- Quick to implement, can cause flicker
- Tools: PostHog, Optimizely, VWO

### Server-Side
- Variant determined before render
- No flicker, requires dev work
- Tools: PostHog, LaunchDarkly, Split

---

## Running the Test

### Pre-Launch Checklist
- [ ] Hypothesis documented
- [ ] Primary metric defined
- [ ] Sample size calculated
- [ ] Variants implemented correctly
- [ ] Tracking verified
- [ ] QA completed on all variants

### During the Test

**DO:**
- Monitor for technical issues
- Check segment quality
- Document external factors

**DON'T:**
- Peek at results and stop early
- Make changes to variants
- Add traffic from new sources

### The Peeking Problem
Looking at results before reaching the sample size and stopping early leads to false positives and wrong decisions. Pre-commit to sample size and trust the process.
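The peeking problem is easy to demonstrate with an A/A simulation: both arms share an identical conversion rate, yet checking significance at several interim looks "finds" far more winners than a single pre-committed look. A self-contained sketch; the look schedule and simulation counts are illustrative.

```python
import math
import random

def is_significant(conv_a, conv_b, n):
    """Two-sided two-proportion z-test at alpha = 0.05, equal n per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return abs(conv_a / n - conv_b / n) / se > 1.96

def false_positive_rates(looks=5, n_per_look=500, sims=400, p=0.05, seed=7):
    """Share of A/A tests called significant at any look vs. only the final look."""
    rng = random.Random(seed)
    any_look = final_look = 0
    for _ in range(sims):
        a = b = 0
        hit_any = hit_final = False
        for look in range(1, looks + 1):
            a += sum(rng.random() < p for _ in range(n_per_look))
            b += sum(rng.random() < p for _ in range(n_per_look))
            sig = is_significant(a, b, look * n_per_look)
            hit_any = hit_any or sig
            hit_final = sig  # only the last assignment (the final look) survives
        any_look += hit_any
        final_look += hit_final
    return any_look / sims, final_look / sims

peeking, committed = false_positive_rates()
print(f"peeking: {peeking:.0%}, pre-committed: {committed:.0%}")
```

With no real difference between arms, the pre-committed rate sits near the nominal 5%, while stopping at the first significant look inflates it well beyond that; this is the mechanism behind "don't peek and stop early."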
---

## Analyzing Results

### Statistical Significance
- 95% confidence = p-value < 0.05
- Means there's less than a 5% chance of seeing a difference this large if the variants truly performed the same
- Not a guarantee, just a threshold

### Analysis Checklist

1. **Reach sample size?** If not, the result is preliminary
2. **Statistically significant?** Check confidence intervals
3. **Effect size meaningful?** Compare to MDE, project impact
4. **Secondary metrics consistent?** Do they support the primary?
5. **Guardrail concerns?** Did anything get worse?
6. **Segment differences?** Mobile vs. desktop? New vs. returning?

### Interpreting Results

| Result | Conclusion |
|--------|------------|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or a bolder test |
| Mixed signals | Dig deeper, maybe segment |

---

## Documentation

Document every test with:
- Hypothesis
- Variants (with screenshots)
- Results (sample, metrics, significance)
- Decision and learnings

**For templates**: See [references/test-templates.md](references/test-templates.md)

---

## Common Mistakes

### Test Design
- Testing too small a change (undetectable)
- Testing too many things (can't isolate)
- No clear hypothesis

### Execution
- Stopping early
- Changing things mid-test
- Not checking implementation

### Analysis
- Ignoring confidence intervals
- Cherry-picking segments
- Over-interpreting inconclusive results

---

## Task-Specific Questions

1. What's your current conversion rate?
2. How much traffic does this page get?
3. What change are you considering and why?
4. What's the smallest improvement worth detecting?
5. What tools do you have for testing?
6. Have you tested this area before?
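The significance check from "Analyzing Results" above can be run without a dedicated tool. A minimal sketch of the pooled two-proportion z-test; the conversion counts are illustrative, and the error function stands in for a normal CDF so no SciPy is needed.

```python
import math

def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided normal-tail p-value via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 5.00% vs. 5.75% conversion with 10k users per variant.
z, p = two_proportion_test(500, 10_000, 575, 10_000)
```

Here p is roughly 0.019, below the 0.05 threshold; per the checklist above, that call only stands if the pre-committed sample size was actually reached.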
---

## Related Skills

- **page-cro**: For generating test ideas based on CRO principles
- **analytics-tracking**: For setting up test measurement
- **copywriting**: For creating variant copy