deepeval
confident-ai/deepeval
8.0
Evaluation & Testing
★ 15.9k · ◇ 1.5k · Python · Apache-2.0 · 1d ago
Ragas
explodinggradients/ragas
7.5
Evaluation & Testing
★ 14.2k · ◇ 1.5k · Python · Apache-2.0 · 3mo ago
garak
NVIDIA/garak
7.5
Evaluation & Testing
★ 8.0k · ◇ 989 · Python · Apache-2.0 · today
chinese-llm-benchmark
jeinlee1991/chinese-llm-benchmark
6.3
Evaluation & Testing
★ 6.1k · ◇ 247 · today
LLM-Engineers-Handbook
PacktPublishing/LLM-Engineers-Handbook
6.7
Evaluation & Testing
★ 5.1k · ◇ 1.2k · Python · MIT · 1mo ago
lmms-eval
EvolvingLMMs-Lab/lmms-eval
7.5
Evaluation & Testing
★ 4.2k · ◇ 597 · Python · NOASSERTION · 1d ago
agenta
Agenta-AI/agenta
7.8
Evaluation & Testing
★ 4.2k · ◇ 531 · TypeScript · NOASSERTION · today
AI-Infra-Guard
Tencent/AI-Infra-Guard
7.4
Evaluation & Testing
★ 3.8k · ◇ 375 · Python · Apache-2.0 · today
trulens
truera/trulens
7.3
Evaluation & Testing
★ 3.4k · ◇ 284 · Python · MIT · today
lmnr
lmnr-ai/lmnr
6.9
Evaluation & Testing
★ 3.0k · ◇ 203 · TypeScript · Apache-2.0 · today
Observal
BlazeUp-AI/Observal
5.9
Evaluation & Testing
★ 1.9k · ◇ 327 · Python · NOASSERTION · today
aisheets
huggingface/aisheets
6.2
Evaluation & Testing
★ 1.6k · ◇ 141 · TypeScript · Apache-2.0 · 8d ago
FuzzyAI
cyberark/FuzzyAI
5.5
Evaluation & Testing
★ 1.5k · ◇ 203 · Jupyter Notebook · Apache-2.0 · 3mo ago
prompty
microsoft/prompty
6.7
Evaluation & Testing
★ 1.2k · ◇ 116 · TypeScript · MIT · 1d ago
uqlm
cvs-health/uqlm
6.6
Evaluation & Testing
★ 1.2k · ◇ 125 · Python · Apache-2.0 · 2d ago
judgeval
JudgmentLabs/judgeval
6.7
Evaluation & Testing
★ 1.0k · ◇ 92 · Python · Apache-2.0 · 2d ago
WHartTest
MGdaasLab/WHartTest
6.2
Evaluation & Testing
★ 925 · ◇ 123 · Python · MIT · today
passmark
bug0inc/passmark
5.7
Evaluation & Testing
★ 897 · ◇ 163 · TypeScript · NOASSERTION · 5d ago
scenario
langwatch/scenario
5.9
Evaluation & Testing
★ 893 · ◇ 65 · Python · MIT · today
FinSight-AI
juanjuandog/FinSight-AI
5.1
Evaluation & Testing
★ 842 · ◇ 56 · Java · MIT · 8d ago
Awesome-LLM-Eval
onejune2018/Awesome-LLM-Eval
4.8
Evaluation & Testing
★ 639 · ◇ 70 · MIT · 6mo ago
Awesome-LLM-in-Social-Science
ValueByte-AI/Awesome-LLM-in-Social-Science
4.8
Evaluation & Testing
★ 624 · ◇ 47 · MIT · 3mo ago
aimock
CopilotKit/aimock
6.3
Evaluation & Testing
★ 614 · ◇ 40 · TypeScript · MIT · today
agent-skills-eval
darkrishabh/agent-skills-eval
5.3
Evaluation & Testing
★ 560 · ◇ 28 · TypeScript · MIT · 13d ago
langtest
PacificAI/langtest
5.9
Evaluation & Testing
★ 559 · ◇ 49 · Python · Apache-2.0 · 1mo ago
langtest
Pacific-AI-Corp/langtest
5.9
Evaluation & Testing
★ 559 · ◇ 49 · Python · Apache-2.0 · 1mo ago
continuous-eval
relari-ai/continuous-eval
4.7
Evaluation & Testing
★ 516 · ◇ 38 · Python · Apache-2.0 · 1y ago
iFixAi
ifixai-ai/iFixAi
5.9
Evaluation & Testing
★ 462 · ◇ 87 · Python · Apache-2.0 · today
llm-leaderboard
JonathanChavezTamales/llm-leaderboard
4.7
Evaluation & Testing
★ 361 · ◇ 40 · JavaScript · NOASSERTION · 7mo ago
rhesis
rhesis-ai/rhesis
5.5
Evaluation & Testing
★ 358 · ◇ 24 · Python · NOASSERTION · today
palico-ai
palico-ai/palico-ai
4.5
Evaluation & Testing
★ 342 · ◇ 28 · TypeScript · MIT · 1y ago
llms-tools
PetroIvaniuk/llms-tools
4.9
Evaluation & Testing
★ 317 · ◇ 44 · Apache-2.0 · 1d ago
fakecloud
faiscadev/fakecloud
5.4
Evaluation & Testing
★ 311 · ◇ 19 · Rust · AGPL-3.0 · today
athina-evals
athina-ai/athina-evals
4.0
Evaluation & Testing
★ 300 · ◇ 22 · Python · 12mo ago
flutter-skill
ai-dashboad/flutter-skill
5.3
Evaluation & Testing
★ 278 · ◇ 36 · Dart · MIT · 12d ago
qaskills
PramodDutta/qaskills
4.2
Evaluation & Testing
★ 133 · ◇ 11 · TypeScript · 8d ago