
Testing & Evaluation

Exciting News!

Our new automatic dataset evaluation tool will simplify and streamline the dataset evaluation and iteration process, saving users significant time. The tool allows users to evaluate their datasets on criteria such as accuracy, completeness, and consistency, and provides detailed reports and visualizations of a dataset’s strengths and weaknesses. With the ability to quickly make changes and re-run the evaluation, users can easily improve the quality of their datasets and reduce the need for manual evaluation.


Stay tuned! (we’re working hard on it)

How to Evaluate Your Dataset

Dataset evaluation is an essential part of data preparation for any AI system. Evaluating your dataset helps ensure that the data you’re using is accurate, complete, and relevant to your AI model. Here are some base rules to help you evaluate your dataset effectively:

  1. Prepare a set of ground truth questions: Start by creating 15 to 20 questions based on the information in your dataset; each one should be answerable directly from the data. The purpose of these questions is to test the accuracy and completeness of your dataset.

  2. Include out-of-domain questions: We also recommend including out-of-domain questions that are not related to your dataset. These questions can be simple and straightforward, such as “Hello, who are you?” or “What color is the sun?” These questions will help you test the ability of your AI model to handle unexpected questions that may arise in the real world.

  3. Prepare a set of abstractive and wide questions: It’s also important to prepare abstractive and wide questions that are similar in intent to the ground truth questions but phrased differently. These questions should be open-ended, with more than one possible answer. For example, if you have an Aston Martin Owners Guide dataset, a ground truth question might be “How can I start my Aston?” while a wide question might be “I want to ride to my grandpa’s house. Can you help me with directions?” These questions will help you evaluate how your AI model performs in the real world, where humans have virtually unlimited ways of expressing their needs, intents, and desires.

By following these base rules, you’ll be able to evaluate your dataset effectively and ensure that your AI model is trained on accurate, complete, and relevant data.
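
To keep these evaluation runs repeatable, you can collect the three question groups in a small script, send each question to your agent, and save the answers for manual review. The sketch below is only illustrative: `ask_kaila` is a hypothetical stand-in for however you actually query your Kaila Agent, and the output file name is arbitrary.

```python
import csv

def ask_kaila(question: str) -> str:
    # Hypothetical placeholder: replace with a real call to your
    # Kaila Agent (API, chat integration, etc.).
    return "(agent answer goes here)"

# 1. Ground truth questions: answerable directly from the dataset.
ground_truth = [
    "How can I start my Aston?",
    # ... 15 to 20 questions in total
]

# 2. Out-of-domain questions: unrelated to the dataset.
out_of_domain = [
    "Hello, who are you?",
    "What color is the sun?",
]

# 3. Abstractive / wide questions: open-ended rephrasings of real intents.
wide = [
    "I want to ride to my grandpa's house. Can you help me with directions?",
]

groups = {
    "ground_truth": ground_truth,
    "out_of_domain": out_of_domain,
    "wide": wide,
}

# Ask every question and record the answers in a CSV for manual review.
with open("evaluation_run.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["group", "question", "answer"])
    for group, questions in groups.items():
        for question in questions:
            writer.writerow([group, question, ask_kaila(question)])
```

Re-running the same script after each dataset change gives you a like-for-like comparison of how the answers evolve.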

Misleading Answers and How to…

If your Kaila is generating misleading or incorrect responses, there are a few things to consider. The two main reasons for this are missing context and super-wide, open-ended questions.

Missing context can occur when the information necessary to answer a question is not provided in the dataset, or the quality of the information is poor. In these cases, Kaila may not have the necessary data to generate a correct response.
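
Before tuning anything else, it’s worth confirming that the missing fact really is missing. One crude but quick check is a plain keyword search over your source documents. The sketch below assumes your dataset is a folder of local `.txt` files, which may not match how your data is stored in Kaila; treat it as a sanity check, not an official tool.

```python
from pathlib import Path

def dataset_covers(keywords: list[str], dataset_dir: str = "dataset") -> bool:
    """Return True if every keyword appears somewhere in the dataset files."""
    text = " ".join(
        path.read_text(encoding="utf-8", errors="ignore").lower()
        for path in Path(dataset_dir).rglob("*.txt")
    )
    return all(keyword.lower() in text for keyword in keywords)

# If this prints False, the agent never saw the facts it would need,
# and no amount of tuning will produce a grounded answer.
print(dataset_covers(["ignition", "start button"]))
```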


Super-wide, open-ended questions can also cause issues. In this case, Kaila may generate a response based on the information used during model training rather than the data provided in your dataset. For example, if you ask, “I think that 1+2 is 5, do you think so?”, the chatbot may respond with “I don’t know” or even “Yes, that’s correct”, the latter of which is obviously incorrect.


It’s important to remember that Kaila is not ChatGPT. Kaila is trained to answer questions based on the data provided in her datasets, and her responses will be limited to the scope of that data. If you’re experiencing issues with your Kaila Agent’s responses, consider adding more data to your dataset, improving the quality of your data, or refining your Kaila Agent training parameters.
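
If you want to catch regressions like the “1+2 is 5” example automatically, you can maintain a few trap questions and check that the agent does not play along with a false premise. This is a rough sketch: the phrase matching is deliberately crude, and `ask_kaila` refers to the same hypothetical stub used in the earlier evaluation script.

```python
# Trap questions paired with phrases that would indicate the agent
# agreed with a false premise (i.e., gave a misleading answer).
TRAP_QUESTIONS = {
    "I think that 1+2 is 5, do you think so?": ["yes", "that's correct"],
}

def find_misleading_answers(ask) -> list[tuple[str, str]]:
    """Return (question, answer) pairs where the agent played along."""
    failures = []
    for question, bad_phrases in TRAP_QUESTIONS.items():
        answer = ask(question).lower()
        if any(phrase in answer for phrase in bad_phrases):
            failures.append((question, answer))
    return failures

# Usage: find_misleading_answers(ask_kaila) after each dataset change.
```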