Install: `npx mdskills install sickn33/llm-app-patterns`
---
name: llm-app-patterns
description: "Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, building agents, or setting up LLM observability."
---

# 🤖 LLM Application Patterns

> Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langgenius/dify) and industry best practices.

## When to Use This Skill

Use this skill when:

- Designing LLM-powered applications
- Implementing RAG (Retrieval-Augmented Generation)
- Building AI agents with tools
- Setting up LLMOps monitoring
- Choosing between agent architectures

---

## 1. RAG Pipeline Architecture

### Overview

RAG (Retrieval-Augmented Generation) grounds LLM responses in your data.

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Ingest    │────▶│  Retrieve   │────▶│  Generate   │
│  Documents  │     │   Context   │     │  Response   │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       ▼                   ▼                   ▼
 ┌───────────┐       ┌───────────┐       ┌───────────┐
 │ Chunking  │       │  Vector   │       │    LLM    │
 │ Embedding │       │  Search   │       │ + Context │
 └───────────┘       └───────────┘       └───────────┘
```

### 1.1 Document Ingestion

```python
# Chunking strategies
class ChunkingStrategy:
    # Fixed-size chunks (simple but may break context)
    FIXED_SIZE = "fixed_size"  # e.g., 512 tokens

    # Semantic chunking (preserves meaning)
    SEMANTIC = "semantic"  # Split on paragraphs/sections

    # Recursive splitting (tries multiple separators)
    RECURSIVE = "recursive"  # ["\n\n", "\n", " ", ""]

    # Document-aware (respects structure)
    DOCUMENT_AWARE = "document_aware"  # Headers, lists, etc.

# Recommended settings
CHUNK_CONFIG = {
    "chunk_size": 512,      # tokens
    "chunk_overlap": 50,    # token overlap between chunks
    "separators": ["\n\n", "\n", ". ", " "],
}
```

### 1.2 Embedding & Storage

```python
# Vector database selection
VECTOR_DB_OPTIONS = {
    "pinecone": {
        "use_case": "Production, managed service",
        "scale": "Billions of vectors",
        "features": ["Hybrid search", "Metadata filtering"]
    },
    "weaviate": {
        "use_case": "Self-hosted, multi-modal",
        "scale": "Millions of vectors",
        "features": ["GraphQL API", "Modules"]
    },
    "chromadb": {
        "use_case": "Development, prototyping",
        "scale": "Thousands of vectors",
        "features": ["Simple API", "In-memory option"]
    },
    "pgvector": {
        "use_case": "Existing Postgres infrastructure",
        "scale": "Millions of vectors",
        "features": ["SQL integration", "ACID compliance"]
    }
}

# Embedding model selection
EMBEDDING_MODELS = {
    "openai/text-embedding-3-small": {
        "dimensions": 1536,
        "cost": "$0.02/1M tokens",
        "quality": "Good for most use cases"
    },
    "openai/text-embedding-3-large": {
        "dimensions": 3072,
        "cost": "$0.13/1M tokens",
        "quality": "Best for complex queries"
    },
    "local/bge-large": {
        "dimensions": 1024,
        "cost": "Free (compute only)",
        "quality": "Comparable to OpenAI small"
    }
}
```

### 1.3 Retrieval Strategies

```python
# Basic semantic search
def semantic_search(query: str, top_k: int = 5):
    query_embedding = embed(query)
    results = vector_db.similarity_search(
        query_embedding,
        top_k=top_k
    )
    return results

# Hybrid search (semantic + keyword)
def hybrid_search(query: str, top_k: int = 5, alpha: float = 0.5):
    """
    alpha=1.0: Pure semantic
    alpha=0.0: Pure keyword (BM25)
    alpha=0.5: Balanced
    """
    semantic_results = vector_db.similarity_search(embed(query), top_k=top_k)
    keyword_results = bm25_search(query, top_k=top_k)

    # Weighted Reciprocal Rank Fusion
    return rrf_merge(semantic_results, keyword_results, alpha)

# Multi-query retrieval
def multi_query_retrieval(query: str):
    """Generate multiple query variations for better recall"""
    queries = llm.generate_query_variations(query, n=3)
    all_results = []
    for q in queries:
        all_results.extend(semantic_search(q))
    return deduplicate(all_results)

# Contextual compression
def compressed_retrieval(query: str):
    """Retrieve, then compress to the relevant parts only"""
    docs = semantic_search(query, top_k=10)
    compressed = llm.extract_relevant_parts(docs, query)
    return compressed
```

### 1.4 Generation with Context

```python
RAG_PROMPT_TEMPLATE = """
Answer the user's question based ONLY on the following context.
If the context doesn't contain enough information, say "I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:"""

def generate_with_rag(question: str):
    # Retrieve
    context_docs = hybrid_search(question, top_k=5)
    context = "\n\n".join([doc.content for doc in context_docs])

    # Generate
    prompt = RAG_PROMPT_TEMPLATE.format(
        context=context,
        question=question
    )

    response = llm.generate(prompt)

    # Return with citations
    return {
        "answer": response,
        "sources": [doc.metadata for doc in context_docs]
    }
```

---

## 2. Agent Architectures

### 2.1 ReAct Pattern (Reasoning + Acting)

```
Thought: I need to search for information about X
Action: search("X")
Observation: [search results]
Thought: Based on the results, I should...
Action: calculate(...)
Observation: [calculation result]
Thought: I now have enough information
Action: final_answer("The answer is...")
```

```python
REACT_PROMPT = """
You are an AI assistant that can use tools to answer questions.

Available tools:
{tools_description}

Use this format:
Thought: [your reasoning about what to do next]
Action: [tool_name(arguments)]
Observation: [tool result - this will be filled in]
... (repeat Thought/Action/Observation as needed)
Thought: I have enough information to answer
Final Answer: [your final response]

Question: {question}
"""

class ReActAgent:
    def __init__(self, tools: list, llm):
        self.tools = {t.name: t for t in tools}
        self.llm = llm
        self.max_iterations = 10

    def run(self, question: str) -> str:
        prompt = REACT_PROMPT.format(
            tools_description=self._format_tools(),
            question=question
        )

        for _ in range(self.max_iterations):
            response = self.llm.generate(prompt)

            if "Final Answer:" in response:
                return self._extract_final_answer(response)

            action = self._parse_action(response)
            observation = self._execute_tool(action)
            # Keep the model's reasoning in the transcript, then append the observation
            prompt += f"{response}\nObservation: {observation}\n"

        return "Max iterations reached"
```

### 2.2 Function Calling Pattern

```python
# Define tools as functions with schemas
TOOLS = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculations",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
]

class FunctionCallingAgent:
    def run(self, question: str) -> str:
        messages = [{"role": "user", "content": question}]

        while True:
            response = self.llm.chat(
                messages=messages,
                tools=TOOLS,
                tool_choice="auto"
            )

            if response.tool_calls:
                # Echo the assistant turn (including its tool calls) into history
                messages.append({
                    "role": "assistant",
                    "content": response.content,
                    "tool_calls": response.tool_calls
                })
                for tool_call in response.tool_calls:
                    result = self._execute_tool(
                        tool_call.name,
                        tool_call.arguments
                    )
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": str(result)
                    })
            else:
                return response.content
```

### 2.3 Plan-and-Execute Pattern

```python
class PlanAndExecuteAgent:
    """
    1. Create a plan (list of steps)
    2. Execute each step
    3. Replan if needed
    """

    def run(self, task: str) -> str:
        # Planning phase
        plan = self.planner.create_plan(task)
        # Returns: ["Step 1: ...", "Step 2: ...", ...]

        results = []
        for step in plan:
            # Execute each step
            result = self.executor.execute(step, context=results)
            results.append(result)

            # Check if replanning is needed
            if self._needs_replan(task, results):
                new_plan = self.planner.replan(
                    task,
                    completed=results,
                    remaining=plan[len(results):]
                )
                plan = new_plan

        # Synthesize final answer
        return self.synthesizer.summarize(task, results)
```

### 2.4 Multi-Agent Collaboration

```python
class AgentTeam:
    """
    Specialized agents collaborating on complex tasks
    """

    def __init__(self):
        self.agents = {
            "researcher": ResearchAgent(),
            "analyst": AnalystAgent(),
            "writer": WriterAgent(),
            "critic": CriticAgent()
        }
        self.coordinator = CoordinatorAgent()

    def solve(self, task: str) -> str:
        # Coordinator assigns subtasks
        assignments = self.coordinator.decompose(task)

        results = {}
        for assignment in assignments:
            agent = self.agents[assignment.agent]
            result = agent.execute(
                assignment.subtask,
                context=results
            )
            results[assignment.id] = result

        # Critic reviews
        critique = self.agents["critic"].review(results)

        if critique.needs_revision:
            # Iterate with feedback
            return self.solve_with_feedback(task, results, critique)

        return self.coordinator.synthesize(results)
```

---

## 3. Prompt IDE Patterns

### 3.1 Prompt Templates with Variables

```python
class PromptTemplate:
    def __init__(self, template: str, variables: list[str]):
        self.template = template
        self.variables = variables

    def format(self, **kwargs) -> str:
        # Validate that all variables are provided
        missing = set(self.variables) - set(kwargs.keys())
        if missing:
            raise ValueError(f"Missing variables: {missing}")

        return self.template.format(**kwargs)

    def with_examples(self, examples: list[dict]) -> str:
        """Add few-shot examples"""
        example_text = "\n\n".join([
            f"Input: {ex['input']}\nOutput: {ex['output']}"
            for ex in examples
        ])
        return f"{example_text}\n\n{self.template}"

# Usage
summarizer = PromptTemplate(
    template="Summarize the following text in {style} style:\n\n{text}",
    variables=["style", "text"]
)

prompt = summarizer.format(
    style="professional",
    text="Long article content..."
)
```

### 3.2 Prompt Versioning & A/B Testing

```python
from datetime import datetime

class PromptRegistry:
    def __init__(self, db):
        self.db = db

    def register(self, name: str, template: str, version: str):
        """Store prompt with version"""
        self.db.save({
            "name": name,
            "template": template,
            "version": version,
            "created_at": datetime.now(),
            "metrics": {}
        })

    def get(self, name: str, version: str = "latest") -> str:
        """Retrieve a specific version"""
        return self.db.get(name, version)

    def ab_test(self, name: str, user_id: str) -> str:
        """Return a variant based on the user's bucket"""
        variants = self.db.get_all_versions(name)
        bucket = hash(user_id) % len(variants)
        return variants[bucket]

    def record_outcome(self, prompt_id: str, outcome: dict):
        """Track prompt performance"""
        self.db.update_metrics(prompt_id, outcome)
```

### 3.3 Prompt Chaining

```python
class PromptChain:
    """
    Chain prompts together, passing each output as input to the next
    """

    def __init__(self, steps: list[dict]):
        self.steps = steps

    def run(self, initial_input: str) -> dict:
        context = {"input": initial_input}
        results = []

        for step in self.steps:
            prompt = step["prompt"].format(**context)
            output = llm.generate(prompt)

            # Parse output if needed
            if step.get("parser"):
                output = step["parser"](output)

            context[step["output_key"]] = output
            results.append({
                "step": step["name"],
                "output": output
            })

        return {
            "final_output": context[self.steps[-1]["output_key"]],
            "intermediate_results": results
        }

# Example: Research → Analyze → Summarize
chain = PromptChain([
    {
        "name": "research",
        "prompt": "Research the topic: {input}",
        "output_key": "research"
    },
    {
        "name": "analyze",
        "prompt": "Analyze these findings:\n{research}",
        "output_key": "analysis"
    },
    {
        "name": "summarize",
        "prompt": "Summarize this analysis in 3 bullet points:\n{analysis}",
        "output_key": "summary"
    }
])
```

---

## 4. LLMOps & Observability

### 4.1 Metrics to Track

```python
LLM_METRICS = {
    # Performance
    "latency_p50": "50th percentile response time",
    "latency_p99": "99th percentile response time",
    "tokens_per_second": "Generation speed",

    # Quality
    "user_satisfaction": "Thumbs up/down ratio",
    "task_completion": "% tasks completed successfully",
    "hallucination_rate": "% responses with factual errors",

    # Cost
    "cost_per_request": "Average $ per API call",
    "tokens_per_request": "Average tokens used",
    "cache_hit_rate": "% requests served from cache",

    # Reliability
    "error_rate": "% failed requests",
    "timeout_rate": "% requests that timed out",
    "retry_rate": "% requests needing retry"
}
```

### 4.2 Logging & Tracing

```python
import json
import logging
from datetime import datetime

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class LLMLogger:
    def log_request(self, request_id: str, data: dict):
        """Log an LLM request for debugging and analysis"""
        log_entry = {
            "request_id": request_id,
            "timestamp": datetime.now().isoformat(),
            "model": data["model"],
            "prompt": data["prompt"][:500],  # Truncate for storage
            "prompt_tokens": data["prompt_tokens"],
            "temperature": data.get("temperature", 1.0),
            "user_id": data.get("user_id"),
        }
        logging.info(f"LLM_REQUEST: {json.dumps(log_entry)}")

    def log_response(self, request_id: str, data: dict):
        """Log an LLM response"""
        log_entry = {
            "request_id": request_id,
            "completion_tokens": data["completion_tokens"],
            "total_tokens": data["total_tokens"],
            "latency_ms": data["latency_ms"],
            "finish_reason": data["finish_reason"],
            "cost_usd": self._calculate_cost(data),
        }
        logging.info(f"LLM_RESPONSE: {json.dumps(log_entry)}")

# Distributed tracing
@tracer.start_as_current_span("llm_call")
def call_llm(prompt: str) -> str:
    span = trace.get_current_span()
    span.set_attribute("prompt.length", len(prompt))

    response = llm.generate(prompt)

    span.set_attribute("response.length", len(response.content))
    span.set_attribute("tokens.total", response.usage.total_tokens)

    return response.content
```

### 4.3 Evaluation Framework

```python
class LLMEvaluator:
    """
    Evaluate LLM outputs for quality
    """

    def evaluate_response(self,
                          question: str,
                          response: str,
                          ground_truth: str = None) -> dict:
        scores = {}

        # Relevance: does it answer the question?
        scores["relevance"] = self._score_relevance(question, response)

        # Coherence: is it well-structured?
        scores["coherence"] = self._score_coherence(response)

        # Groundedness: is it based on the provided context?
        scores["groundedness"] = self._score_groundedness(response)

        # Accuracy: does it match the ground truth?
        if ground_truth:
            scores["accuracy"] = self._score_accuracy(response, ground_truth)

        # Safety: is it free of harmful content?
        scores["safety"] = self._score_safety(response)

        return scores

    def run_benchmark(self, test_cases: list[dict]) -> dict:
        """Run evaluation on a test set"""
        results = []
        for case in test_cases:
            response = llm.generate(case["prompt"])
            scores = self.evaluate_response(
                question=case["prompt"],
                response=response,
                ground_truth=case.get("expected")
            )
            results.append(scores)

        return self._aggregate_scores(results)
```

---

## 5. Production Patterns

### 5.1 Caching Strategy

```python
import hashlib
import json

class LLMCache:
    def __init__(self, redis_client, ttl_seconds=3600):
        self.redis = redis_client
        self.ttl = ttl_seconds

    def _cache_key(self, prompt: str, model: str, **kwargs) -> str:
        """Generate a deterministic cache key"""
        content = f"{model}:{prompt}:{json.dumps(kwargs, sort_keys=True)}"
        return hashlib.sha256(content.encode()).hexdigest()

    def get_or_generate(self, prompt: str, model: str, **kwargs) -> str:
        key = self._cache_key(prompt, model, **kwargs)

        # Check cache
        cached = self.redis.get(key)
        if cached:
            return cached.decode()

        # Generate
        response = llm.generate(prompt, model=model, **kwargs)

        # Cache (only cache deterministic outputs)
        if kwargs.get("temperature", 1.0) == 0:
            self.redis.setex(key, self.ttl, response)

        return response
```

### 5.2 Rate Limiting & Retry

```python
import time

from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

class RateLimiter:
    def __init__(self, requests_per_minute: int):
        self.rpm = requests_per_minute
        self.timestamps = []

    def acquire(self):
        """Wait if the rate limit would be exceeded"""
        now = time.time()

        # Drop timestamps outside the one-minute window
        self.timestamps = [t for t in self.timestamps if now - t < 60]

        if len(self.timestamps) >= self.rpm:
            sleep_time = 60 - (now - self.timestamps[0])
            time.sleep(sleep_time)

        self.timestamps.append(time.time())

# Retry with exponential backoff
def _is_retryable(exc: BaseException) -> bool:
    # Retry rate limits and server errors (5xx); don't retry client errors
    return isinstance(exc, RateLimitError) or (
        isinstance(exc, APIError) and exc.status_code >= 500
    )

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception(_is_retryable),
)
def call_llm_with_retry(prompt: str) -> str:
    return llm.generate(prompt)
```

### 5.3 Fallback Strategy

```python
import logging

class AllModelsFailedError(Exception):
    pass

class LLMWithFallback:
    def __init__(self, primary: str, fallbacks: list[str]):
        self.primary = primary
        self.fallbacks = fallbacks

    def generate(self, prompt: str, **kwargs) -> str:
        models = [self.primary] + self.fallbacks

        for model in models:
            try:
                return llm.generate(prompt, model=model, **kwargs)
            except (RateLimitError, APIError) as e:
                logging.warning(f"Model {model} failed: {e}")
                continue

        raise AllModelsFailedError("All models exhausted")

# Usage
llm_client = LLMWithFallback(
    primary="gpt-4-turbo",
    fallbacks=["gpt-3.5-turbo", "claude-3-sonnet"]
)
```

---

## Architecture Decision Matrix

| Pattern              | Use When         | Complexity | Cost      |
| :------------------- | :--------------- | :--------- | :-------- |
| **Simple RAG**       | FAQ, docs search | Low        | Low       |
| **Hybrid RAG**       | Mixed queries    | Medium     | Medium    |
| **ReAct Agent**      | Multi-step tasks | Medium     | Medium    |
| **Function Calling** | Structured tools | Low        | Low       |
| **Plan-Execute**     | Complex tasks    | High       | High      |
| **Multi-Agent**      | Research tasks   | Very High  | Very High |

---

## Resources

- [Dify Platform](https://github.com/langgenius/dify)
- [LangChain Docs](https://python.langchain.com/)
- [LlamaIndex](https://www.llamaindex.ai/)
- [Anthropic Cookbook](https://github.com/anthropics/anthropic-cookbook)
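The hybrid search pattern in §1.3 relies on an `rrf_merge` helper that the guide never defines. A minimal sketch of weighted Reciprocal Rank Fusion — assuming each result list is an ordered list of document IDs and that `alpha` weights the semantic side, with the standard `k = 60` smoothing constant:

```python
def rrf_merge(semantic_ids: list[str], keyword_ids: list[str],
              alpha: float = 0.5, k: int = 60) -> list[str]:
    """Weighted Reciprocal Rank Fusion: score(doc) = weight / (k + rank)."""
    scores: dict[str, float] = {}
    # Semantic results contribute alpha / (k + rank)
    for rank, doc_id in enumerate(semantic_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    # Keyword (BM25) results contribute (1 - alpha) / (k + rank)
    for rank, doc_id in enumerate(keyword_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A document that ranks moderately in both lists outranks one that ranks well in only one, which is the point of fusing the two retrievers.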
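The `CHUNK_CONFIG` settings in §1.1 can be exercised with a minimal fixed-size splitter. This is a sketch only: whitespace `split()` stands in for a real tokenizer, and the function name is not part of the guide's API.

```python
def chunk_fixed_size(text: str, chunk_size: int = 512,
                     chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size token chunks with overlap between neighbors."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()  # stand-in for a real tokenizer
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

With the recommended 512/50 settings, each chunk repeats the last 50 tokens of its predecessor so that sentences straddling a boundary stay retrievable.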