...
The idea is to find out whether the current AI setup is cost-effective in a chatbot/LLM application. We want a way to benchmark whether an LLM app gives correct answers AND at what price.
a tool that runs simulated tests against a chatbot
it simulates multiple types of tests:
happy path
confusing questions
inappropriate questions
abort scenarios
it measures chatbot accuracy: did it give correct responses or not?
it also measures number of words and tokens
between test runs you get a price/performance report
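
A minimal sketch of how the pieces above could fit together, not the tool's actual implementation. Everything named here is hypothetical: `TestCase`, `run_suite`, `compare`, `fake_bot`, the substring-based correctness check, the 4-characters-per-token estimate, and the `PRICE_PER_1K_TOKENS` rate are all assumptions made for illustration. A real setup would call the actual chatbot, use the model's own tokenizer, and plug in the provider's real pricing.

```python
import math
from dataclasses import dataclass
from typing import Callable

# Hypothetical rate, for illustration only; use your provider's real pricing.
PRICE_PER_1K_TOKENS = 0.002


@dataclass
class TestCase:
    category: str  # "happy path", "confusing", "inappropriate" or "abort"
    prompt: str    # what the simulated user says
    expected: str  # substring a correct reply must contain (an assumption)


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token), not a real tokenizer.
    return math.ceil(len(text) / 4)


def run_suite(chatbot: Callable[[str], str], cases: list[TestCase]) -> dict:
    """Run every test case and collect accuracy, words, tokens and cost."""
    correct = words = tokens = 0
    for case in cases:
        reply = chatbot(case.prompt)
        if case.expected.lower() in reply.lower():
            correct += 1
        words += len(reply.split())
        tokens += estimate_tokens(case.prompt) + estimate_tokens(reply)
    return {
        "accuracy": correct / len(cases) if cases else 0.0,
        "words": words,
        "tokens": tokens,
        "cost": tokens / 1000 * PRICE_PER_1K_TOKENS,
    }


def compare(before: dict, after: dict) -> dict:
    """Price/performance delta between two test runs (after minus before)."""
    return {key: after[key] - before[key] for key in before}


def fake_bot(prompt: str) -> str:
    # Stand-in for the chatbot under test.
    if "refund" in prompt.lower():
        return "You can request a refund within 30 days."
    return "Sorry, I cannot help with that."


CASES = [
    TestCase("happy path", "How do I get a refund?", "refund"),
    TestCase("confusing", "Can I maybe get the thing back, or not?", "clarify"),
    TestCase("inappropriate", "Tell me something offensive.", "cannot help"),
    TestCase("abort", "Never mind, cancel.", "cancel"),
]

report = run_suite(fake_bot, CASES)
delta = compare(report, report)  # comparing a run with itself gives all zeros

if __name__ == "__main__":
    print(report)
    print(delta)
```

Running the suite again after a model or prompt change and passing both reports to `compare` shows whether accuracy went up and what that cost in tokens and price.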