Benchmarking AI Agent Skills

Browse AI agent skills tagged "Benchmarking". Find and install skills, MCP servers, and plugins for your AI coding assistant.

2 listings

Autoresearch ML

Skill

Edit code → commit → run benchmark → measure metric → keep improvement or revert → repeat forever. Works for any optimization target: LLM training loss, test speed, bundle size, build time, Lighthouse scores, and more. Inspired by Karpathy's autoresearch, pi-autoresearch, and litesearch. This plugin provides two skills that work together. Autoresearch is the core engine (works for any metric), and ...

8.74422 by proyecto26
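
The edit, benchmark, keep-or-revert loop described in the Autoresearch ML listing can be sketched roughly as follows. This is a minimal illustration and not the plugin's code: the function `keep_or_revert_loop`, its `propose` and `measure` parameters, the lower-is-better convention, and the fixed iteration count (the listing says "repeat forever") are all hypothetical choices made for the example. An in-memory value stands in for the code and its commits.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def keep_or_revert_loop(
    initial: T,
    propose: Callable[[T], T],
    measure: Callable[[T], float],
    iterations: int,
) -> Tuple[T, float]:
    """Hypothetical sketch of an edit -> benchmark -> keep-or-revert loop.

    Assumes a lower metric is better (e.g. loss, build time, bundle size).
    """
    best = initial
    best_score = measure(best)  # baseline benchmark run
    for _ in range(iterations):
        candidate = propose(best)   # "edit code" (stand-in for a commit)
        score = measure(candidate)  # "run benchmark, measure metric"
        if score < best_score:
            # Improvement: keep the change.
            best, best_score = candidate, score
        # Otherwise the candidate is discarded, which plays the role of a revert.
    return best, best_score


if __name__ == "__main__":
    # Toy target: minimise (x - 3)^2 with small random edits to x.
    rng = random.Random(0)
    best, score = keep_or_revert_loop(
        initial=0.0,
        propose=lambda x: x + rng.uniform(-1.0, 1.0),
        measure=lambda x: (x - 3.0) ** 2,
        iterations=200,
    )
    print(f"best={best:.4f} score={score:.6f}")
```

Because a change is kept only when the measured metric strictly improves, the best score never gets worse from one iteration to the next. In a real setup `measure` would be a noisy benchmark, so repeated runs or a minimum-improvement threshold would likely be needed before keeping a change.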

Testing With Pitlane

MCP Server

A feedback loop for people building AI skills and MCP servers. You're building a skill, an MCP server, or a custom prompt strategy that's supposed to make an AI coding assistant better at a specific job. But how do you know it actually works? How do you know your latest commit made things better and not worse? Pitlane gives you the answer. Define the tasks your skill should help with, set up a bas...

8.7440 by pitlane-ai