How do I install LLM Council?

Install LLM Council with a single command: npx mdskills install elhamid/llm-council. This downloads the skill files into your project and your AI agent picks them up automatically.

What platforms support LLM Council?

LLM Council works with Claude Code, Claude Desktop, Cursor, Vscode Copilot, Windsurf, Continue Dev, Codex, Gemini Cli, Amp, Roo Code, Goose, Opencode, Trae, Qodo, Command Code, Chatgpt, Grok. Skills use the open SKILL.md format which is compatible with any AI coding agent that reads markdown instructions.

← Back to skills

LLM Council

Name: LLM Council: AI Agent Skill
Rating: 6 (1 reviews)
Author: elhamid

AI & Machine LearningIntermediate

This is a fork of karpathy/llm-council. New in this fork since the previously published state: - Conversation history is complete again: storage loads older runs from backend/data/conversations.json and supports the legacy data/ layout (including data/conversations/ per-conversation JSON files when present), so older runs show up in the UI. - Sidebar no longer truncates early: the frontend now req

by @elhamid4 downloads0Updated 2/24/2026

Add this skill

npx mdskills install elhamid/llm-council

Fork & Edit

Skill Advisor6.0

Multi-LLM collaborative system with decision traceability and anonymized peer review

+Implements sophisticated three-stage consensus protocol with role separation
+Provides decision auditability through preserved rankings and model mappings
+Reduces model-brand bias via anonymized Stage 2 evaluations
-Documentation describes a tool/app, not agent-executable skill instructions
-Declared shell execution permission appears unused in described workflow

SKILL.md

Edit in Browser

1# LLM Council
2 
3![llmcouncil](header.jpg)
4 
5## Fork notes (elhamid)
6 
7This is a fork of `karpathy/llm-council`.
8 
9### Update — 2025-12-19 (Stability + auditability pass)
10 
11New in this fork since the previously published state:
12 
13- **Conversation history is complete again:** storage loads older runs from `backend/data/conversations.json` and supports the legacy `data/` layout (including `data/conversations/` per-conversation JSON files when present), so older runs show up in the UI.
14- **Sidebar no longer truncates early:** the frontend now requests `/api/conversations?limit=500` so the conversation list reliably shows full history.
15- **Stage 2 judge duplication fixed:** Stage 2 now dedupes judge models so evaluation does not silently double-count the same judge.
16- **Titles persist correctly:** after Stage 3 completes, a title is derived and saved to the conversation record, so it sticks across refreshes and appears in history.
17- **Tooling + reproducibility:** added Stage 2 smoke/quality scripts and supporting evaluation artifacts to make regressions repeatable.
18- **Roles updated + clarified:** earlier fork notes described roles as **Analyst / Researcher / Critic / Provocateur**; the implemented role set is now **Builder / Reviewer / Synthesizer / Contrarian** (with a provider-default mapping in `backend/roles.py`). This keeps the “multi-perspective” intent, but with clearer, more actionable role behavior.
19 
20### Value added — 2025-12-14 (decision-quality focused)
21- **Decision-auditable runs:** every council response is saved with a compact decision trace (Stage 1 answers, Stage 2 rankings, and the Stage 2→model mapping), so you can inspect *why* the Chairman concluded what it did — not just the final text.
22- **Reduced “model-brand” bias in judging:** Stage 2 rankings operate on anonymized responses (Response A/B/C/…), and the label→model mapping is preserved for post-hoc review. This keeps peer review focused on content quality rather than model identity.
23- **Role-separated council behavior:** explicit role specs (Analyst / Researcher / Critic / Provocateur + Chairman) make the council behave more like a real review board: one pushes rigor, one hunts missing facts, one stress-tests, one challenges assumptions — then the Chairman synthesizes.
24- **Repeatable scoring across runs:** aggregated ranks (average rank + count) are persisted so you can compare council behavior over time and across prompts, instead of treating each run as a one-off chat.
25 
26### Implementation notes (supporting the above)
27- Real SSE endpoint (`text/event-stream`) for `/api/conversations/{id}/message/stream` with incremental `stage*_start/complete` events.
28- Persist `meta/metadata` (`label_to_model`, `aggregate_rankings`, `model_roles`) so Stage2 renders correctly and the run is reviewable later.
29- Frontend Stage2 reads `msg.meta || msg.metadata` so fork/upstream payload shapes both render.
30 
31----
32 
33The idea of this repo is that instead of asking a question to your favorite LLM provider (e.g. OpenAI GPT 5.1, Google Gemini 3.0 Pro, Anthropic Claude Sonnet 4.5, xAI Grok 4, eg.c), you can group them into your "LLM Council". This repo is a simple, local web app that essentially looks like ChatGPT except it uses OpenRouter to send your query to multiple LLMs, it then asks them to review and rank each other's work, and finally a Chairman LLM produces the final response.
34 
35In a bit more detail, here is what happens when you submit a query:
36 
371. **Stage 1: First opinions**. The user query is given to all LLMs individually, and the responses are collected. The individual responses are shown in a "tab view", so that the user can inspect them all one by one.
382. **Stage 2: Review**. Each individual LLM is given the responses of the other LLMs. Under the hood, the LLM identities are anonymized so that the LLM can't play favorites when judging their outputs. The LLM is asked to rank them in accuracy and insight.
393. **Stage 3: Final response**. The designated Chairman of the LLM Council takes all of the model's responses and compiles them into a single final answer that is presented to the user.
40 
41## Vibe Code Alert
42 
43This project was 99% vibe coded as a fun Saturday hack because I wanted to explore and evaluate a number of LLMs side by side in the process of [reading books together with LLMs](https://x.com/karpathy/status/1990577951671509438). It's nice and useful to see multiple responses side by side, and also the cross-opinions of all LLMs on each other's outputs. I'm not going to support it in any way, it's provided here as is for other people's inspiration and I don't intend to improve it. Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like.
44 
45## Setup
46 
47### 1. Install Dependencies
48 
49The project uses [uv](https://docs.astral.sh/uv/) for project management.
50 
51**Backend:**
52```bash
53uv sync
54```
55 
56**Frontend:**
57```bash
58cd frontend
59npm install
60cd ..
61```
62 
63### 2. Configure API Key
64 
65Create a `.env` file in the project root:
66 
67```bash
68OPENROUTER_API_KEY=sk-or-v1-...
69```
70 
71Get your API key at [openrouter.ai](https://openrouter.ai/). Make sure to purchase the credits you need, or sign up for automatic top up.
72 
73### 3. Configure Models (Optional)
74 
75Edit `backend/config.py` to customize the council:
76 
77```python
78COUNCIL_MODELS = [
79    "openai/gpt-5.1",
80    "google/gemini-3-pro-preview",
81    "anthropic/claude-sonnet-4.5",
82    "x-ai/grok-4",
83]
84 
85CHAIRMAN_MODEL = "google/gemini-3-pro-preview"
86```
87 
88## Running the Application
89 
90**Option 1: Use the start script**
91```bash
92./start.sh
93```
94 
95**Option 2: Run manually**
96 
97Terminal 1 (Backend):
98```bash
99uv run python -m backend.main
100```
101 
102Terminal 2 (Frontend):
103```bash
104cd frontend
105npm run dev
106```
107 
108Then open http://localhost:5173 in your browser.
109 
110## Tech Stack
111 
112- **Backend:** FastAPI (Python 3.10+), async httpx, OpenRouter API
113- **Frontend:** React + Vite, react-markdown for rendering
114- **Storage:** JSON files in `data/conversations/`
115- **Package Management:** uv for Python, npm for JavaScript
116

Full transparency — inspect the skill content before installing.