STACKQUADRANT

Evaluation & Testing

Frameworks for evaluating, benchmarking, and testing AI systems

29 repos

rhesis-ai/rhesis

5.4

The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.

31123 Python

PetroIvaniuk/llms-tools

4.7

A list of LLMs Tools & Projects

30640

athina-ai/athina-evals

4.1

Python SDK for running evaluations on LLM-generated responses

29921 Python

ai-dashboad/flutter-skill

5.1

AI-powered E2E testing for 10 platforms. 253 MCP tools. Zero config. Works with Claude, Cursor, Windsurf, Copilot. Test Flutter, React Native, iOS, Android, Web, Electron, Tauri, KMP, .NET MAUI — all from natural language.

19023 Dart

PramodDutta/qaskills

4.0

QA Skills is a curated directory of testing-specific skills for AI coding agents (Claude Code, Cursor, Copilot, etc.).

1024 TypeScript