Benchmarking AI Agent Skills
Browse AI agent skills tagged "Benchmarking". Find and install skills, MCP servers, and plugins for your AI coding assistant.
2 listings
Autoresearch ML
Plugin
Edit code → commit → run benchmark → measure metric → keep improvement or revert → repeat forever. Works for any optimization target: LLM training loss, test speed, bundle size, build time, Lighthouse scores, and more. Inspired by Karpathy's autoresearch, pi-autoresearch, and litesearch. This plugin provides two skills that work together. Autoresearch is the core engine (works for any metric), and…
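The keep-or-revert cycle described above is a greedy hill climb on a single metric. The sketch below shows that control flow only. It assumes an in-memory `state` stands in for the code under test, and that "commit" and "revert" reduce to accepting or discarding a candidate. `keep_or_revert_loop`, `propose_edit` and `run_benchmark` are hypothetical names, not the plugin's API. The loop is bounded by `iterations` here, whereas the listing describes it as running forever.

```python
from typing import Callable, List, Tuple


def keep_or_revert_loop(
    state: float,
    propose_edit: Callable[[float, int], float],
    run_benchmark: Callable[[float], float],
    iterations: int,
    lower_is_better: bool = True,
) -> Tuple[float, float, List[float]]:
    """Greedy hill climb: accept an edit only if the benchmark metric improves.

    Hypothetical sketch: `state` stands in for the code under test. In a real
    setup the edit would be a git commit and a rejected edit a git revert.
    """
    best_metric = run_benchmark(state)  # measure the starting point once
    history = [best_metric]             # best metric seen after each step
    for step in range(iterations):      # bounded here instead of "forever"
        candidate = propose_edit(state, step)  # "edit code → commit"
        metric = run_benchmark(candidate)      # "run benchmark → measure metric"
        if lower_is_better:
            improved = metric < best_metric
        else:
            improved = metric > best_metric
        if improved:
            state, best_metric = candidate, metric  # keep the improvement
        # else: drop the candidate, i.e. the "revert" step
        history.append(best_metric)
    return state, best_metric, history


# Toy usage: minimize (x - 3)^2 using a fixed, hypothetical sequence of edits.
deltas = [1.0, -0.5, 1.0, 1.0, 0.5]
final_state, final_metric, history = keep_or_revert_loop(
    state=0.0,
    propose_edit=lambda s, i: s + deltas[i % len(deltas)],
    run_benchmark=lambda x: (x - 3.0) ** 2,
    iterations=5,
)
print(final_state, final_metric, history)
```

Because a change is kept only when it strictly beats the best metric so far, the recorded best value never gets worse. The trade-off is that a purely greedy loop can stall in a local optimum.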
Testing With Pitlane
MCP Server
A feedback loop for people building AI skills and MCP servers. You're building a skill, an MCP server, or a custom prompt strategy that's supposed to make an AI coding assistant better at a specific job. But how do you know it actually works? How do you know your latest commit made things better and not worse? Pitlane gives you the answer. Define the tasks your skill should help with, set up a bas…
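Asking whether the latest commit "made things better and not worse" amounts to comparing per-task outcomes between two versions. Pitlane's own task format and scoring are not described in this listing. The dictionaries of pass/fail results and the `compare_runs` helper below are therefore hypothetical illustrations, not Pitlane's interface. The sketch reports overall pass rates and also lists which tasks flipped in each direction, because an unchanged pass rate can still hide a regression.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Comparison:
    previous_rate: float  # fraction of shared tasks passing before the change
    latest_rate: float    # fraction of shared tasks passing after the change
    fixed: List[str]      # tasks that failed before and pass now
    regressed: List[str]  # tasks that passed before and fail now


def compare_runs(previous: Dict[str, bool], latest: Dict[str, bool]) -> Comparison:
    """Compare pass/fail results per task for two versions (hypothetical format)."""
    tasks = sorted(previous.keys() & latest.keys())  # only tasks run in both versions
    if not tasks:
        raise ValueError("the two runs share no tasks")
    previous_rate = sum(previous[t] for t in tasks) / len(tasks)
    latest_rate = sum(latest[t] for t in tasks) / len(tasks)
    fixed = [t for t in tasks if not previous[t] and latest[t]]
    regressed = [t for t in tasks if previous[t] and not latest[t]]
    return Comparison(previous_rate, latest_rate, fixed, regressed)


# Toy usage with made-up task names and outcomes.
previous = {"add-endpoint": True, "fix-flaky-test": False,
            "write-migration": False, "refactor-module": True}
latest = {"add-endpoint": True, "fix-flaky-test": True,
          "write-migration": True, "refactor-module": False}
result = compare_runs(previous, latest)
print(result)
```

In this toy run the pass rate rises, but `refactor-module` still shows up as a regression worth investigating.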