Training Document Search

You can now upload custom training data to improve the accuracy of Document Search queries. If you need help with managing Documents in Knowledge Base, please refer to this user guide.

Why Train your Model

Alli’s Document Search works by extracting information from documents using a pre-trained AI model. Alli is accurate “out of the box”, but to get a high-performing model for your own content, it is crucial to train the model with a sufficient amount of relevant data.

This is done by adding training data, adding test data, and retraining the model so that the training and test data take effect. This ensures that the model can give accurate, relevant answers for your specific needs. If the model’s performance is not satisfactory, you can retrain with additional data or revert to a previous model. In this document we will cover how to:

  • Manage your model versions
  • Add training data
  • Add test data to view accuracy metrics
  • Retrain the model so that the training data takes effect

Before training

After training

Adding more relevant training entries generally leads to even better results!

How to Manage Your Model

To manage training data, model versions, and test data, open “Knowledge Base” -> “Documents” and click the settings (gear) icon.

How to find the Model Management screen

Manage Model Versioning

Here you can rename the model version, add a description if needed, and view metrics on answer accuracy and document hit accuracy. These metrics are populated by adding test data. We will discuss how to add test data after covering training data.

Manage Training Data

Here you can manually enter training data or modify/delete existing entries. You can also upload training data in bulk by clicking Upload training data. Keep in mind that each question-document pair must be unique, so you cannot have two entries with the same question and document title. The more diverse and relevant the training data is, the more effective it will be at fine-tuning the model.

  1. Type in the question for training data
  2. Provide the document where the proper answer resides
  3. Allow the AI to search that document for the possible answer
  4. Choose the answer
  5. Add another piece of training data after submitting this one
  6. Submit or cancel adding training data

Here is an example of training data populated properly

As the example shows, the AI model suggests multiple answers that may be relevant to the question being added. For training data, choosing the proper answer is optional.

In the uploaded file, please label your first column “Question”, your second column “Document Title”, and your third column “Answer”. “Question” and “Document Title” are required fields. A sample file with the correct format can also be downloaded from the Upload training data window. After uploading your file, Alli will report any failed rows. (All properly formatted, non-duplicate rows will be added even if other rows fail.)
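If you prepare the upload file with a script, the rules above can be checked before uploading. The following is a minimal sketch, not Alli’s own validation: it assumes the file is a CSV, that duplicates are detected by an exact match on question and document title after trimming whitespace, and the function names and sample rows are invented for illustration. When in doubt, follow the sample file from the Upload training data window.

```python
import csv
import io

# Column labels required by the guide, in order.
HEADER = ["Question", "Document Title", "Answer"]


def check_rows(rows):
    """Split rows into (accepted, failed), mirroring the upload rules.

    A row fails if a required field is blank or if its question-document
    pair repeats an earlier row. A failed row does not block other rows.
    """
    accepted, failed, seen = [], [], set()
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        question = (row.get("Question") or "").strip()
        title = (row.get("Document Title") or "").strip()
        if not question or not title:
            failed.append((number, "missing required field"))
            continue
        key = (question, title)  # assumption: exact match defines a duplicate
        if key in seen:
            failed.append((number, "duplicate question-document pair"))
            continue
        seen.add(key)
        accepted.append({
            "Question": question,
            "Document Title": title,
            "Answer": (row.get("Answer") or "").strip(),  # optional field
        })
    return accepted, failed


def to_csv(rows):
    """Serialize accepted rows with the three column labels as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADER)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample rows: one valid, one duplicate, one missing a question,
# and one valid row without an answer.
sample = [
    {"Question": "What is the refund period?", "Document Title": "Refund Policy.pdf", "Answer": "30 days"},
    {"Question": "What is the refund period?", "Document Title": "Refund Policy.pdf", "Answer": ""},
    {"Question": "", "Document Title": "Shipping Guide.pdf", "Answer": ""},
    {"Question": "Who approves travel?", "Document Title": "Travel Policy.pdf", "Answer": ""},
]
accepted, failed = check_rows(sample)
csv_text = to_csv(accepted)
```

Here `failed` lists the row numbers you would expect to see in the failure report, and `csv_text` contains only the rows that should upload cleanly.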

Failure report on uploading malformed entries.

You can also add training data directly from your Candidates. Please note that Candidates with only question content cannot be added to training data.

How to add training entries from Candidates

Manage Test Data

Here you can manually enter test data or modify/delete existing entries. You can also upload test data in bulk by clicking Upload test data. Keep in mind that each question-document pair must be unique, so you cannot have two entries with the same question and document title. Test data is used to benchmark the model’s performance after retraining with training data.

  1. Type in the question for test data
  2. Provide the document where the proper answer resides
  3. Allow the AI to search that document for the possible answer
  4. Choose the answer (an answer must be chosen to populate document hit accuracy)
  5. Add another piece of test data after submitting this one
  6. Submit or cancel adding test data

Here is an example of test data populated properly. Unlike training data, an answer must be chosen to populate all accuracy metrics.

A sample file with the correct format can also be downloaded from the Upload test data window. After uploading your file, Alli will report any failed rows. (All properly formatted, non-duplicate rows will be added even if other rows fail.)
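To make the two metrics that test data populates more concrete, here is a small sketch of how such numbers could be computed. The definitions are assumptions for illustration, not Alli’s documented formulas: document hit accuracy is taken as the share of test questions whose top result comes from the expected document, and answer accuracy as the share whose top answer also matches the chosen answer. The class names, function name, and sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkEntry:
    """One test data entry: question, expected document, chosen answer."""
    question: str
    document_title: str
    answer: str


@dataclass
class Prediction:
    """The top search result returned for a question (hypothetical shape)."""
    document_title: str
    answer: str


def accuracy_metrics(entries, predictions):
    """Return (document_hit_accuracy, answer_accuracy) as fractions in [0, 1].

    Assumed definitions: a document hit means the top result comes from the
    expected document; an answer hit additionally requires the same answer.
    """
    if not entries:
        return 0.0, 0.0
    doc_hits = 0
    answer_hits = 0
    for entry in entries:
        predicted = predictions.get(entry.question)
        if predicted is None:
            continue  # no result counts as a miss for both metrics
        if predicted.document_title == entry.document_title:
            doc_hits += 1
            if predicted.answer.strip() == entry.answer.strip():
                answer_hits += 1
    total = len(entries)
    return doc_hits / total, answer_hits / total


# Invented test data and model output.
entries = [
    BenchmarkEntry("What is the refund period?", "Refund Policy.pdf", "30 days"),
    BenchmarkEntry("Who approves travel?", "Travel Policy.pdf", "Your manager"),
    BenchmarkEntry("How long does shipping take?", "Shipping Guide.pdf", "5 business days"),
    BenchmarkEntry("How many leave days do I get?", "Leave Policy.pdf", "15 days"),
]
predictions = {
    "What is the refund period?": Prediction("Refund Policy.pdf", "30 days"),
    "Who approves travel?": Prediction("Travel Policy.pdf", "Your manager"),
    "How long does shipping take?": Prediction("Shipping Guide.pdf", "2 business days"),
    "How many leave days do I get?": Prediction("Refund Policy.pdf", "30 days"),
}
doc_hit_accuracy, answer_accuracy = accuracy_metrics(entries, predictions)
```

In this invented example, three of the four top results come from the expected document and two of them also carry the chosen answer, which is why test entries need an answer before every metric can be filled in.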

How to Retrain the Model

Once your training data is ready, you must retrain your model to see the effects. Return to the Documents page and click “RETRAIN Documents”.

How to tell retraining is in-progress

Feel free to navigate away or close the window during training. Once training is complete, the “in progress” bar will disappear. Congratulations! You’ve just successfully trained your model.

If training seems too slow, the training status shows the resources allocated to Document Search. If more resources are needed, contact your account manager.

Once model training has completed, you can compare your models. The example below shows three trained models, only one of which has training and test data properly populated. From this screen you can see each model’s accuracy metrics and manage which model is deployed. Once you are happy with the results, deploy the model you want.
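If you track model versions outside of Alli, the comparison described above can be sketched as a simple rule. This is only an illustration of one possible decision rule, assuming you copy the metrics from the Model Management screen by hand; the `ModelVersion` class, the `pick_version` function, the version names, and the numbers are all invented, and deployment itself still happens in the dashboard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    """Metrics copied from the Model Management screen (hypothetical shape)."""
    name: str
    answer_accuracy: Optional[float]        # None when no test data is populated
    document_hit_accuracy: Optional[float]  # None when no test data is populated


def pick_version(versions):
    """Return the version with the best metrics, or None if none are scored.

    Versions without test data are skipped because there is nothing to
    compare. Ties on answer accuracy are broken by document hit accuracy.
    """
    scored = [
        v for v in versions
        if v.answer_accuracy is not None and v.document_hit_accuracy is not None
    ]
    if not scored:
        return None
    return max(scored, key=lambda v: (v.answer_accuracy, v.document_hit_accuracy))


# Invented example: one version has no test data, two have metrics.
versions = [
    ModelVersion("Version A", None, None),
    ModelVersion("Version B", 0.5, 0.75),
    ModelVersion("Version C", 0.75, 0.75),
]
best = pick_version(versions)
```

A version without test data cannot be compared at all, which is one more reason to populate test data before deciding what to deploy.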

Training Settings

You can configure your document training by changing the new training settings. They can be found by clicking the settings icon on the Documents page and navigating to the “Training Settings” tab.

How to find the Training Settings menu 

Here is a brief description of what each setting does. (These descriptions are also available through the tooltips.)

  • Consider Document Title: When turned on, Alli considers the document’s title (file name) when running Document Search.
  • Document Title Weight: This setting decides how much weight the document title carries in Document Search. ‘Consider Document Title’ must be on for this setting to apply. Even when an answer does not contain any keywords from the question, a fine-tuned model can identify a specific document as having the answer to that question and to similar questions. A heavier document title weight will alter the score of answers found in this manner.
  • # of Answer Candidates per Document: This setting decides the maximum number of results extracted from one document. The default is 0, which means there is no limit per document.
  • Remove Similar Results: Hides similar Document Search results if there are any. You can remove all similar results, those with the same hashtags, or those extracted from the same document. If there are similar documents (e.g. the same document published in different years), it is best practice to select “Remove if extracted from the same document” to show as many results as possible.
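The first three settings above can be pictured with a small ranking sketch. This is not Alli’s scoring code: the linear blend of a content score and a title score, the `Result` fields, the function names, and the sample numbers are all assumptions chosen to show how a title weight and a per-document limit could interact.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One candidate answer (hypothetical shape)."""
    document_title: str
    answer: str
    content_score: float  # how well the passage matches the question
    title_score: float    # how well the document title matches the question


def final_score(result, consider_title, title_weight):
    """Blend content and title scores; the title is ignored when the toggle is off."""
    if not consider_title:
        return result.content_score
    # Assumed formula: a simple weighted average of the two scores.
    return (1 - title_weight) * result.content_score + title_weight * result.title_score


def rank(results, consider_title=True, title_weight=0.5, max_per_document=0):
    """Sort by final score, then keep at most max_per_document per document."""
    ordered = sorted(
        results,
        key=lambda r: final_score(r, consider_title, title_weight),
        reverse=True,
    )
    if max_per_document == 0:  # 0 means no limit per document
        return ordered
    kept, counts = [], {}
    for r in ordered:
        seen = counts.get(r.document_title, 0)
        if seen < max_per_document:
            kept.append(r)
            counts[r.document_title] = seen + 1
    return kept


# Invented candidates: A and B come from the same document.
results = [
    Result("Leave Policy 2023", "A", content_score=0.75, title_score=0.25),
    Result("Leave Policy 2023", "B", content_score=0.25, title_score=0.25),
    Result("Holiday Calendar", "C", content_score=0.5, title_score=1.0),
]
title_off = [r.answer for r in rank(results, consider_title=False)]
title_on = [r.answer for r in rank(results, title_weight=0.5)]
one_per_doc = [r.answer for r in rank(results, title_weight=0.5, max_per_document=1)]
```

With the title ignored, answer A ranks first on content alone. With a title weight of 0.5, answer C moves to the top because its document title matches the question well, and limiting results to one per document drops answer B.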

You can also use customer feedback from search results and/or agent feedback from Query Training to further improve Document Search. For more information, please refer to the user guides below:

This concludes the user guide to End-to-End Document Search Optimization. For general information about Document Search, please refer to the user guides below: