SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
Install with `npx mdskills install sickn33/computer-vision-expert`.

---
name: computer-vision-expert
description: SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
---

# Computer Vision Expert (SOTA 2026)

**Role**: Advanced Vision Systems Architect & Spatial Intelligence Expert

## Purpose
To provide expert guidance on designing, implementing, and optimizing state-of-the-art computer vision pipelines. This ranges from real-time object detection with YOLO26 to foundation-model-based segmentation with SAM 3 and visual reasoning with VLMs.

## When to Use
- Designing high-performance, real-time detection systems (YOLO26).
- Implementing zero-shot or text-guided segmentation tasks (SAM 3).
- Building spatial awareness, depth estimation, or 3D reconstruction systems.
- Optimizing vision models for edge deployment (ONNX, TensorRT, NPU).
- Bridging classical geometry (calibration) with modern deep learning.

## Capabilities

### 1. Unified Real-Time Detection (YOLO26)
- **NMS-Free Architecture**: Mastery of end-to-end inference without Non-Maximum Suppression, which reduces latency and pipeline complexity.
- **Edge Deployment**: Optimization for low-power hardware, aided by YOLO26's removal of Distribution Focal Loss (DFL), together with the MuSGD optimizer for more stable training.
- **Improved Small-Object Recognition**: Expertise in using ProgLoss and STAL label assignment for high precision in IoT and industrial settings.

### 2. Promptable Segmentation (SAM 3)
- **Text-to-Mask**: Ability to segment objects from natural-language descriptions (e.g., "the blue container on the right").
- **SAM 3D**: Reconstructing objects, scenes, and human bodies in 3D from single- or multi-view images.
- **Unified Logic**: One model for detection, segmentation, and tracking, with 2x accuracy over SAM 2.

### 3. Vision Language Models (VLMs)
- **Visual Grounding**: Leveraging Florence-2, PaliGemma 2, or Qwen2-VL for semantic scene understanding.
- **Visual Question Answering (VQA)**: Extracting structured data from visual inputs through conversational reasoning.

### 4. Geometry & Reconstruction
- **Depth Anything V2**: State-of-the-art monocular depth estimation for spatial awareness.
- **Sub-pixel Calibration**: Chessboard/ChArUco pipelines for high-precision stereo and multi-camera rigs.
- **Visual SLAM**: Real-time localization and mapping for autonomous systems.

## Patterns

### 1. Text-Guided Vision Pipelines
- Use SAM 3's text-to-mask capability to isolate specific parts during inspection without training a custom detector for every variation.
- Combine YOLO26 for fast candidate proposals with SAM 3 for precise mask refinement.

### 2. Deployment-First Design
- Leverage YOLO26's simplified, NMS-free ONNX/TensorRT exports.
- Use MuSGD for faster training convergence on custom datasets.

### 3. Progressive 3D Scene Reconstruction
- Integrate monocular depth maps with geometric homographies to build accurate 2.5D/3D scene representations.

## Anti-Patterns

- **Manual NMS Post-processing**: Prefer NMS-free architectures (e.g., YOLOv10, YOLO26) for lower overhead.
- **Click-Only Segmentation**: Relying solely on manual point prompts when SAM 3's text grounding removes the need for them in many scenarios.
- **Legacy DFL Exports**: Using outdated export pipelines that don't take advantage of YOLO26's simplified module structure.

## Sharp Edges (2026)

| Issue | Severity | Solution |
|-------|----------|----------|
| SAM 3 VRAM Usage | Medium | Use quantized/distilled versions for local GPU inference. |
| Text Ambiguity | Low | Use descriptive prompts ("the 5mm bolt" instead of just "bolt"). |
| Motion Blur | Medium | Use a faster shutter speed, or rely on SAM 3's temporal tracking consistency. |
| Hardware Compatibility | Low | YOLO26's simplified architecture (no DFL, no NMS) is highly compatible with NPUs/TPUs. |

## Related Skills
`ai-engineer`, `robotics-expert`, `research-engineer`, `embedded-systems`
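
## Example: Greedy NMS (What NMS-Free Detectors Remove)

The NMS-free items under Capabilities and Anti-Patterns refer to a specific post-processing step. This is a minimal, class-agnostic sketch of the classic greedy NMS that end-to-end detectors such as YOLOv10 and YOLO26 are designed to make unnecessary. It is illustrative only and is not Ultralytics code. The `(x1, y1, x2, y2)` pixel box format, the helper names, and the default 0.5 IoU threshold are assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first (class-agnostic)."""
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining candidates that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

The loop is sequential and data-dependent. That is part of why it adds latency and complicates ONNX/TensorRT export. An end-to-end head is instead trained to emit at most one box per object, so this step can be dropped.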
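
## Example: Lifting a Depth Map to 3D Points

The Progressive 3D Scene Reconstruction pattern starts by lifting per-pixel depth into camera-frame 3D points. The sketch below applies the standard pinhole back-projection `X = (u - cx) * Z / fx`, `Y = (v - cy) * Z / fy`. It assumes two things:

- a metric depth map in meters, and
- intrinsics `fx, fy, cx, cy` known from calibration.

Relative (affine-invariant) depth, such as the output of non-metric Depth Anything V2 checkpoints, would first need conversion to metric depth. The function name, the `depth[v][u]` layout, and the `stride` parameter are hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]], fx: float, fy: float,
                      cx: float, cy: float, stride: int = 1) -> List[Point3D]:
    """Lift a metric depth map to camera-frame 3D points (pinhole model).

    Assumed layout: depth[v][u] holds Z in meters for pixel column u, row v.
    Non-positive depths are treated as invalid and skipped.
    `stride` (hypothetical knob) subsamples pixels to keep the cloud small.
    """
    points: List[Point3D] = []
    for v in range(0, len(depth), stride):
        row = depth[v]
        for u in range(0, len(row), stride):
            z = row[u]
            if z <= 0:
                continue  # missing or invalid measurement
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 depth map in meters; the 0.0 entry is an invalid pixel.
    depth = [[2.0, 2.0],
             [0.0, 4.0]]
    cloud = backproject_depth(depth, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
    print(len(cloud), cloud[-1])
```

The result is a plain point list. It can then be transformed into a world frame with camera extrinsics, or fused across frames, for example with poses from Visual SLAM.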
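
## Example: Mapping Image Points Through a Homography

The reconstruction pattern also mentions geometric homographies. A homography `H` is a 3x3 matrix that maps points between two planes, for example from the image to a ground plane for a bird's-eye view. This minimal sketch applies `H` in homogeneous coordinates and performs the perspective division. `H` is assumed to come from elsewhere, for instance OpenCV's `cv2.findHomography` on four or more point correspondences. The helper name and the toy matrices are hypothetical.

```python
from typing import Sequence, Tuple

Matrix3x3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3x3, u: float, v: float) -> Tuple[float, float]:
    """Map image point (u, v) through homography H.

    Computes [x', y', w'] = H @ [u, v, 1] and returns (x'/w', y'/w').
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        # The point lies on the line that H sends to infinity.
        raise ValueError("point maps to infinity (w' is ~0)")
    return x / w, y / w


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Toy perspective H (hypothetical values): w' grows with v.
    perspective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 1.0]]
    print(apply_homography(identity, 3.0, 4.0))     # prints (3.0, 4.0)
    print(apply_homography(perspective, 2.0, 2.0))  # prints (1.0, 1.0)
```

In a detection pipeline, you can map the bottom-center of each bounding box through an image-to-ground `H`. This gives an approximate ground-plane position, valid only for points that actually lie on that plane.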
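
## Example: RMS Reprojection Error for Calibration

A common way to judge a chessboard/ChArUco calibration (the Sub-pixel Calibration capability above) is the RMS reprojection error. Project the known target points with the estimated intrinsics, then measure the pixel distance to the detected corners. This sketch makes two simplifying assumptions:

- the points are already expressed in the camera frame (extrinsics applied), and
- lens distortion is ignored.

The function names are hypothetical. OpenCV's `cv2.calibrateCamera` returns an overall RMS reprojection error of this kind, computed with its full distortion model.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(p: Point3D, fx: float, fy: float, cx: float, cy: float) -> Point2D:
    """Pinhole projection of a camera-frame point (no lens distortion)."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def rms_reprojection_error(points_3d: Sequence[Point3D], observed: Sequence[Point2D],
                           fx: float, fy: float, cx: float, cy: float) -> float:
    """Root-mean-square pixel distance between projected and detected points."""
    if not points_3d or len(points_3d) != len(observed):
        raise ValueError("need equal, non-empty lists of 3D points and 2D detections")
    total_sq = 0.0
    for p, (u_obs, v_obs) in zip(points_3d, observed):
        u, v = project(p, fx, fy, cx, cy)
        total_sq += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(total_sq / len(points_3d))


if __name__ == "__main__":
    # Toy target points (meters, camera frame) and detected corners (pixels).
    pts = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
    obs = [(323.0, 244.0), (330.0, 240.0)]
    print(rms_reprojection_error(pts, obs, fx=100.0, fy=100.0, cx=320.0, cy=240.0))
```

"Sub-pixel" calibration means driving this error below one pixel. It is only meaningful when the corner detections themselves are refined to sub-pixel precision.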