STACKQUADRANT

Evaluation & Testing

Frameworks for evaluating, benchmarking, and testing AI systems

36 repos

PacificAI/langtest

5.9

Deliver safe & effective language models

55949 Python

Pacific-AI-Corp/langtest

5.9

Deliver safe & effective language models

55949 Python

relari-ai/continuous-eval

4.7

Data-Driven Evaluation for LLM-Powered Applications

51638 Python

ifixai-ai/iFixAi

5.9

The open-source diagnostic for AI misalignment. 32 tests across fabrication, manipulation, deception, unpredictability, and opacity. Provider-agnostic. Runs against OpenAI, Anthropic, Bedrock, Azure, Gemini, and more. Letter grade in under 5 minutes, content-addressed manifest for bit-identical replay. Built by iMe.

46287 Python

JonathanChavezTamales/llm-leaderboard

4.7

A comprehensive set of LLM benchmark scores and provider prices (deprecated; see the README for details).

36140 JavaScript

rhesis-ai/rhesis

5.5

The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.

35824 Python

palico-ai/palico-ai

4.5

Build, improve the performance of, and productionize your LLM application with an integrated framework.

34228 TypeScript

PetroIvaniuk/llms-tools

4.9

A list of LLM tools & projects

31744

faiscadev/fakecloud

5.4

Free, open-source AWS emulator. LocalStack alternative: 26 services, 1,924 operations, 100% conformance. No account, no auth token, no paid tier.

31119 Rust

athina-ai/athina-evals

4.0

Python SDK for running evaluations on LLM-generated responses

30022 Python

ai-dashboad/flutter-skill

5.3

AI-powered E2E testing for 10 platforms. 253 MCP tools. Zero config. Works with Claude, Cursor, Windsurf, and Copilot. Test Flutter, React Native, iOS, Android, Web, Electron, Tauri, KMP, and .NET MAUI apps, all from natural language.

27836 Dart

PramodDutta/qaskills

4.2

QA Skills Directory: a curated directory of testing-specific skills for AI coding agents (Claude Code, Cursor, Copilot, etc.).

13311 TypeScript