
The idea is to see whether the current AI setup is cost-effective in a chatbot/LLM application. We want a way to benchmark whether an LLM app is correct AND at what price.
The tool runs simulated tests against a chatbot.
It simulates multiple types of tests:
1. happy path
2. confusing questions
3. inappropriate questions
4. abort scenarios
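A minimal sketch of how the four test types above could be represented and scored. All names here (`Scenario`, `ChatCase`, `run_suite`, `stub_bot`) are hypothetical and not taken from the tool itself, and "correct" is assumed to mean "the reply contains an expected piece of text", which is a simplification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Scenario(Enum):
    HAPPY_PATH = "happy path"
    CONFUSING = "confusing questions"
    INAPPROPRIATE = "inappropriate questions"
    ABORT = "abort scenarios"


@dataclass
class ChatCase:
    scenario: Scenario
    prompt: str
    # Assumption: a reply counts as correct if it contains this text.
    expected_substring: str


def run_suite(chatbot: Callable[[str], str], cases: list[ChatCase]) -> dict[Scenario, float]:
    """Return the fraction of correct replies for each scenario that has cases."""
    totals: dict[Scenario, int] = {}
    passed: dict[Scenario, int] = {}
    for case in cases:
        reply = chatbot(case.prompt)
        totals[case.scenario] = totals.get(case.scenario, 0) + 1
        if case.expected_substring.lower() in reply.lower():
            passed[case.scenario] = passed.get(case.scenario, 0) + 1
    return {s: passed.get(s, 0) / n for s, n in totals.items()}


def stub_bot(prompt: str) -> str:
    """Stand-in for a real chatbot call, used only to make the sketch runnable."""
    if "refund" in prompt.lower():
        return "You can request a refund from your order page."
    return "Sorry, I can't help with that."
```

In a real run, `stub_bot` would be replaced by a function that calls the chatbot under test, and the substring check by whatever correctness criterion the benchmark uses.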
It measures chatbot accuracy: did it give correct responses or not?
It also measures the number of words and tokens.
Between test runs you get a price/performance report.
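The measurements above could be combined into a per-run price/performance summary roughly as follows. This is a sketch under stated assumptions: `RunResult`, `approx_tokens`, and `summarize` are hypothetical names, the token count is a rough "about 4 characters per token" heuristic rather than a real tokenizer, and the price per 1,000 tokens is a parameter the caller supplies.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    label: str            # e.g. a model or configuration name
    correct: int          # number of correct replies
    total: int            # number of test cases run
    responses: list[str]  # raw chatbot replies


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real tokenizer will differ.
    return max(1, len(text) // 4)


def summarize(run: RunResult, price_per_1k_tokens: float) -> dict:
    """Combine accuracy, word count, token count, and estimated cost for one run."""
    words = sum(len(r.split()) for r in run.responses)
    tokens = sum(approx_tokens(r) for r in run.responses)
    cost = tokens / 1000 * price_per_1k_tokens
    return {
        "label": run.label,
        "accuracy": run.correct / run.total if run.total else 0.0,
        "words": words,
        "tokens": tokens,
        "cost": cost,
        # Cost per correct reply is one possible price/performance figure.
        "cost_per_correct": cost / run.correct if run.correct else float("inf"),
    }
```

Calling `summarize` on two runs with the same test cases gives two rows that can be compared side by side, for example to check whether a cheaper setup loses accuracy.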
https://llm-performance-benchmarking.lovable.app/
