  1. The idea is to check whether the current AI setup is cost-effective in a chatbot/LLM application. We want a way to benchmark whether an LLM app is correct AND at what price.

  2. a tool that runs simulated tests against a chatbot

  3. it simulates multiple types of tests:

    1. happy path

    2. confusing questions

    3. inappropriate questions

    4. abort scenarios

  4. it measures chatbot accuracy: did it give correct responses or not?

  5. it also measures number of words and tokens

  6. between test runs you get a price/performance report

  7. https://llm-performance-benchmarking.lovable.app/
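
The points above can be sketched roughly as follows. This is a minimal illustration, not the actual tool: the names `TestCase`, `count_tokens`, and `run_benchmark`, the substring check for correctness, the characters-per-token estimate, and the price constant are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical price, for illustration only; substitute the provider's real rate.
PRICE_PER_1K_TOKENS = 0.002


@dataclass
class TestCase:
    category: str  # "happy_path", "confusing", "inappropriate", or "abort"
    prompt: str
    expected: str  # substring a correct reply must contain (assumed check)


def count_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    # A real run would use the model's own tokenizer instead.
    return max(1, len(text) // 4)


def run_benchmark(chatbot: Callable[[str], str],
                  cases: List[TestCase]) -> Dict[str, float]:
    """Run every test case against the chatbot and summarise accuracy and cost."""
    if not cases:
        raise ValueError("at least one test case is required")
    correct = words = tokens = 0
    for case in cases:
        reply = chatbot(case.prompt)
        # Accuracy: did the reply contain the expected text?
        if case.expected.lower() in reply.lower():
            correct += 1
        # Size: words in the reply, tokens for prompt plus reply.
        words += len(reply.split())
        tokens += count_tokens(case.prompt) + count_tokens(reply)
    return {
        "accuracy": correct / len(cases),
        "words": words,
        "tokens": tokens,
        "cost": tokens / 1000 * PRICE_PER_1K_TOKENS,
    }
```

Running `run_benchmark` once per AI setup with the same list of cases gives two reports whose `accuracy` and `cost` can be compared side by side.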