This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines.
Add this skill
npx mdskills install muratcankoylan/evaluation

Comprehensive evaluation framework guide with multi-dimensional rubrics and practical implementation patterns.