2023 AI Prototype

Requires the user to describe the scenario they want to create.
It works by understanding the flow, extracting context from the HTML, and generating test steps that are compatible with gotestpro test steps.
We spent more than a month on this step generation.
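
A rough sketch of the "extract context from the HTML" part, to make the idea concrete. This is not the prototype's code: the tag list, the kept attributes, the prompt wording, and the names `extract_context` and `build_prompt` are all assumptions for illustration, and the call to the model is left out.

```python
from html.parser import HTMLParser

# Tags treated as interactive; this set is an assumption for illustration.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Attributes kept because they help identify an element (also an assumption).
KEPT_ATTRS = ("id", "name", "type", "href", "aria-label")


class InteractiveElementCollector(HTMLParser):
    """Collects a compact description of each interactive element on a page."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        # Drop attributes that are empty or not useful for locating the element.
        kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v}
        self.elements.append({"tag": tag, **kept})


def extract_context(html: str) -> list[dict]:
    """Return the interactive elements found in the HTML, in document order."""
    parser = InteractiveElementCollector()
    parser.feed(html)
    return parser.elements


def build_prompt(scenario: str, html: str) -> str:
    """Combine the user's scenario with the page context into one prompt."""
    lines = [f"Scenario: {scenario}", "Interactive elements on the page:"]
    for el in extract_context(html):
        parts = [el["tag"]] + [f'{k}="{v}"' for k, v in el.items() if k != "tag"]
        lines.append(f"- <{' '.join(parts)}>")
    lines.append("Write numbered test steps that use only the elements listed above.")
    return "\n".join(lines)
```

Sending only the interactive elements instead of the full page keeps the prompt small, which may also help with the consistency issues noted below.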
Success Outcomes:
- generated test steps
- worked in gotestpro
Failure Outcomes:
- inconsistent results: individual steps don’t always work, and full scenarios don’t work
Next steps:
- ChatGPT 3.5 has changed since the prototype, and prompt engineering techniques have changed quite a bit, so the prompts need to be revisited
- narrow scope to Salesforce Commerce, as per conversation with Asif Lala
- combine with the ZeroStep prototype

ZeroStep Prototype

We have a prototype that takes a text request, e.g. the user says “click on product info”, and the library interacts with the browser, clicks on the product, and proceeds to the product detail steps.

Success Outcomes:
- Generates test steps and executes them in the browser
- Can import steps from a CSV file, so a user who already has manual test cases can feed them into the sequence
- Tested against a sample shoe store e-commerce site (air birds, a current customer)
Failure Outcomes:
- Inconsistent results, especially with menu items or items where labels aren’t clear.
- The same step that works on one run may not work on the next.
- How the text command is worded (prompt engineering) makes a huge difference.
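
A minimal sketch of the CSV import mentioned above, assuming a file with a header row and a `step` column holding one text command per row. The column name and the helpers `load_steps` and `merge_steps` are hypothetical; the real file layout may differ, and running the steps in the browser is not shown.

```python
import csv
import io


def load_steps(csv_text: str, column: str = "step") -> list[str]:
    """Read text commands from CSV content, skipping blank entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    steps = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:
            steps.append(text)
    return steps


def merge_steps(generated: list[str], imported: list[str], position: int) -> list[str]:
    """Insert imported manual steps into the generated sequence at `position`."""
    return generated[:position] + imported + generated[position:]


manual = load_steps("step\nclick on product info\nadd to cart\n")
sequence = merge_steps(["open the home page", "go to checkout"], manual, position=1)
```

Here `sequence` keeps the generated first step, then the two imported manual steps, then the generated last step.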

The idea is to see whether the current AI setup is cost-effective in a chatbot/LLM application. We want a way to benchmark whether an LLM app is correct AND at what price.
The tool runs a simulation of tests against a chatbot.
It simulates multiple types of tests:
1. happy path
2. confusing questions
3. inappropriate questions
4. abort scenarios
It measures chatbot accuracy: did it give correct responses or not?
It also measures the number of words and tokens.
Between test runs you get a price/performance report.
https://llm-performance-benchmarking.lovable.app/
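
A sketch of how the price/performance numbers could be rolled up from one test run, under stated assumptions: each result already carries a correct/incorrect verdict and a token count from the provider, and pricing is a flat rate per 1,000 tokens. `RunResult`, `summarize`, `compare`, and the price used in the test are hypothetical and not taken from the tool.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    category: str   # e.g. "happy path", "confusing", "inappropriate", "abort"
    response: str   # the chatbot's reply
    correct: bool   # verdict on whether the reply was acceptable
    tokens: int     # tokens billed for this exchange (assumed to be provided)


def summarize(results: list[RunResult], price_per_1k_tokens: float) -> dict:
    """Roll up accuracy, word count, token count, and cost for one test run."""
    total = len(results)
    correct = sum(r.correct for r in results)
    tokens = sum(r.tokens for r in results)
    cost = tokens / 1000 * price_per_1k_tokens
    return {
        "accuracy": correct / total if total else 0.0,
        "words": sum(len(r.response.split()) for r in results),
        "tokens": tokens,
        "cost": cost,
        # Cost divided by the number of correct answers; None if none were correct.
        "cost_per_correct": cost / correct if correct else None,
    }


def compare(before: dict, after: dict) -> dict:
    """Difference between two run summaries (after minus before)."""
    return {k: after[k] - before[k] for k in ("accuracy", "tokens", "cost")}
```

Reporting cost per correct answer alongside raw accuracy is one way to put "correct AND at what price" into a single number; other weightings are possible.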