2023 AI Prototype

  • Requires the user to describe the scenario it needs to create.

  • It works by understanding the flow, extracting context from the HTML, and generating test steps that are compatible with gotestpro.

  • We spent more than one month on this step generation.

  • Success Outcomes:

    • generated test steps

    • worked in gotestpro

  • Failure Outcomes:

    • inconsistent results: individual steps don’t always work, and full scenarios don’t work

  • Next steps:

    • ChatGPT 3.5 has changed, and prompt engineering practices have changed quite a bit since then

    • narrow scope to Salesforce Commerce, as per conversation with Asif Lala

    • combine with the ZeroStep prototype
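
As a rough illustration of the "extracting context from the HTML" step described above, the sketch below collects the interactive elements on a page and folds them into a prompt. It is not the prototype's actual code: the tag set, the `extract_context` and `build_prompt` names, and the prompt wording are all assumptions made for this example, and it uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Tags treated as interactive; this set is an assumption for illustration.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements and their identifying attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        element = {
            "tag": tag,
            "id": attr_map.get("id"),
            "name": attr_map.get("name"),
            "text": "",
        }
        self.elements.append(element)
        # Only links and buttons carry a visible text label here.
        self._open = element if tag in {"a", "button"} else None

    def handle_data(self, data):
        if self._open is not None:
            joined = self._open["text"] + " " + data.strip()
            self._open["text"] = joined.strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None


def extract_context(html):
    """Return the interactive elements found in an HTML string."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    return collector.elements


def build_prompt(scenario, elements):
    """Combine the user's scenario with page context (hypothetical wording)."""
    lines = [f"Scenario: {scenario}", "Interactive elements on the page:"]
    for el in elements:
        label = el["text"] or el["name"] or el["id"] or "(unlabeled)"
        lines.append(f"- <{el['tag']}> {label}")
    lines.append("Generate test steps for this scenario.")
    return "\n".join(lines)
```

Elements that fall through to "(unlabeled)" are the kind of unclear target that tends to produce inconsistent steps, so listing them explicitly may help when tuning prompts.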

ZeroStep Prototype

  1. We have a prototype that takes a text request: the user says “click on product info”, and the library interacts with the browser, clicks on the product, and proceeds to the product detail steps.

  • Success Outcomes:

    • Generates test steps and executes them in the browser

    • Can import steps from a CSV file, so a user who already has manual test cases can feed them into the sequence

    • Tested with a sample shoe store e-commerce site (air birds current customer)

  • Failure Outcomes:

    • Inconsistent results, especially with menu items or items where labels aren’t clear.

    • The same command that works one time may not work the next time.

    • Prompt engineering the text command makes a huge difference.
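
The CSV import and the run-to-run inconsistency noted above can be sketched together. This is a minimal example under stated assumptions, not the prototype's code: the `order` and `command` column names are a hypothetical CSV layout, and `execute` is a hypothetical callable standing in for the AI-driven browser library. Recording the number of attempts per step is one way to make the inconsistency measurable.

```python
import csv
import io


def load_steps(csv_text):
    """Parse manual test cases into an ordered list of text commands.

    Assumes a header row with 'order' and 'command' columns
    (a hypothetical layout chosen for this example).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["order"]))
    return [row["command"].strip() for row in rows if row["command"].strip()]


def run_sequence(steps, execute, max_attempts=3):
    """Run each step through `execute`, retrying steps that fail.

    `execute` is a hypothetical stand-in for the browser automation call;
    it takes a text command and returns True on success.
    """
    results = []
    for step in steps:
        attempts = 0
        passed = False
        while attempts < max_attempts and not passed:
            attempts += 1
            passed = bool(execute(step))
        results.append({"step": step, "passed": passed, "attempts": attempts})
        if not passed:
            break  # later steps depend on earlier ones, so stop here
    return results
```

A step that passes only on its second or third attempt is a candidate for rewording, since the wording of the text command makes a large difference.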

LLM Cost Performance Benchmarking

  1. The idea is to see whether the current AI setup is cost-effective in a chatbot/LLM application. We want a way to benchmark whether an LLM app gives correct answers and at what price.

  2. A tool that runs simulated tests against a chatbot

  3. it simulates multiple types of tests:

    1. happy path

    2. confusing questions

    3. inappropriate questions

    4. abort scenarios

  4. it measures chatbot accuracy: did it give correct responses or not?

  5. it also measures number of words and tokens

  6. between test runs you get a price/performance report

  7. https://llm-performance-benchmarking.lovable.app/
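
The measurements listed above (accuracy, words, tokens, price) can be combined into a per-run report along these lines. This is a sketch under assumptions, not the tool's implementation: the `ChatResult` and `build_report` names are hypothetical, the token count is a rough 4-characters-per-token estimate rather than a real tokenizer, and `price_per_1k_tokens` is a placeholder the caller supplies.

```python
from dataclasses import dataclass

# The four scenario types listed above.
SCENARIO_TYPES = ("happy_path", "confusing", "inappropriate", "abort")


@dataclass
class ChatResult:
    scenario_type: str
    prompt: str
    response: str
    correct: bool  # did the chatbot give the expected response?


def count_words(text):
    return len(text.split())


def estimate_tokens(text):
    # Rough heuristic: about 4 characters per token, rounded up.
    # A real run would use the provider's reported token usage instead.
    return (len(text) + 3) // 4


def build_report(results, price_per_1k_tokens):
    """Summarize accuracy, volume, and estimated cost for one test run."""
    total = len(results)
    correct = sum(1 for r in results if r.correct)
    words = sum(count_words(r.prompt) + count_words(r.response) for r in results)
    tokens = sum(
        estimate_tokens(r.prompt) + estimate_tokens(r.response) for r in results
    )
    cost = tokens / 1000 * price_per_1k_tokens
    accuracy_by_type = {}
    for kind in SCENARIO_TYPES:
        subset = [r for r in results if r.scenario_type == kind]
        if subset:
            accuracy_by_type[kind] = sum(r.correct for r in subset) / len(subset)
    return {
        "accuracy": correct / total if total else 0.0,
        "accuracy_by_type": accuracy_by_type,
        "words": words,
        "tokens": tokens,
        "cost": cost,
        # Cost per correct answer ties price to performance in one number.
        "cost_per_correct": cost / correct if correct else None,
    }
```

Comparing the `cost_per_correct` value between two test runs gives a single figure for whether a change in model or prompt bought more accuracy than it cost.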
