Training Document Search

You can now upload custom training data to improve the accuracy of Document Search queries. If you need help with managing Documents in Knowledge Base, please refer to this user guide.

Why Train Your Model #

Alli’s Document Search works by extracting information from documents using a pre-trained AI model. While Alli is very accurate “out of the box”, to develop a high-performing AI model, it is crucial to train the model with a sufficient amount of relevant data.

This is done by adding training data and test data, then retraining the model so that the new data takes effect. This helps the model give accurate, relevant answers for your specific needs. If the model’s performance is not satisfactory, retrain it with additional data or revert to a previous model version. In this document we will cover how to:

Manage your model versions
Add training data
Add test data to view accuracy metrics
Retrain the model so that the training data is used

Adding more relevant training entries generally leads to even better results!

How to Manage Your Model #

To manage training data, model versions, and test data, open “Knowledge Base” -> “Documents” and click the settings gear icon.

Manage Model Versioning #

Here you can rename the model version, add a description if needed, and view metrics on answer accuracy and document hit accuracy. These metrics are populated by adding test data. We will discuss how to add test data after covering training data.

Manage Training Data #

Here you can manually enter training data or modify/delete existing entries. You can also upload training data in bulk by clicking Upload training data. Keep in mind that question-document pairs are unique, so you cannot have two entries with the same question and document title. The more diverse and relevant the training data is, the more effective it will be at fine-tuning the model.

Type in the question for training data
Provide the document where the proper answer resides
Allow the AI to search that document for the possible answer
Choose the answer
Add another piece of training data after submitting this one
Submit or cancel adding training data

Here is an example of training data populated properly

As we can see, the AI model will provide multiple answers that may be relevant to the question being added. It is optional to include the proper answer.

In the uploaded file, please label your first column “Question”, your second column “Document Title”, and your third column “Answer”. “Question” and “Document Title” are required fields. A sample file with the correct format can also be downloaded from the Upload training data window. After uploading your file, Alli will report any failed rows. (All properly formatted, non-duplicate rows will be added regardless of failures on different rows.)
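
The upload rules above (required “Question” and “Document Title” columns, optional “Answer”, unique question-document pairs) can be checked locally before uploading. The following is a minimal sketch, not Alli's own validation logic: it assumes the file is a CSV, and the function name `check_rows`, the failure messages, and the sample rows are all hypothetical.

```python
import csv
import io


def check_rows(csv_text):
    """Split rows into (accepted, failed) using the rules described above.

    Assumption: the upload is a CSV with the headers "Question",
    "Document Title", and "Answer". This mirrors the documented rules only;
    it is not Alli's actual validator.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    accepted, failed, seen = [], [], set()
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        question = (row.get("Question") or "").strip()
        title = (row.get("Document Title") or "").strip()
        answer = (row.get("Answer") or "").strip()  # optional column
        if not question or not title:
            failed.append((row_number, "missing required field"))
        elif (question, title) in seen:
            failed.append((row_number, "duplicate question-document pair"))
        else:
            seen.add((question, title))
            accepted.append(
                {"Question": question, "Document Title": title, "Answer": answer}
            )
    return accepted, failed


# Hypothetical sample data: row 3 repeats row 2, row 4 has no document title.
sample = """Question,Document Title,Answer
What is the refund window?,Refund Policy.pdf,30 days
What is the refund window?,Refund Policy.pdf,
How do I reset my password?,,
Who approves travel?,Travel Policy.pdf,
"""

accepted, failed = check_rows(sample)
print(len(accepted), "rows accepted")
for row_number, reason in failed:
    print(f"row {row_number}: {reason}")
```

As in the upload window, valid rows are kept even when other rows fail, so a failure report like this one only lists the rows that need fixing.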

Failure report on uploading malformed entries.

You can also add training data directly from your Candidates. Please note that Candidates with only question content cannot be added to training data.

How to add training entries from Candidates

Manage Test Data #

Here you can manually enter test data or modify/delete existing entries. You can also upload test data in bulk by clicking Upload test data. Keep in mind that question-document pairs are unique, so you cannot have two entries with the same question and document title. Test data lets you benchmark the model’s performance after retraining with training data.

Type in the question for test data
Provide the document where the proper answer resides
Allow the AI to search that document for the possible answer
Choose the answer. An answer must be chosen to populate document hit accuracy
Add another piece of test data after submitting this one
Submit or cancel adding test data

Here is an example of test data populated properly. Unlike training data, an answer must be chosen to populate all accuracy metrics.

A sample file with the correct format can also be downloaded from the Upload test data window. After uploading your file, Alli will report any failed rows. (All properly formatted, non-duplicate rows will be added regardless of failures on different rows.)
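
To make the two metrics concrete, here is a rough sketch of how answer accuracy and document hit accuracy could be computed from test data. The definitions are assumptions made for illustration, since this guide does not state Alli's exact formulas: a document hit means the expected document appears among the results, and an answer hit means the expected answer was returned from that document. The names `evaluate`, `fake_search`, and `CANNED` are hypothetical.

```python
def evaluate(test_data, search):
    """Return (answer_accuracy, document_hit_accuracy) as fractions in [0, 1].

    Assumed definitions (not confirmed by Alli's documentation):
    - document hit: the expected document title appears in the results
    - answer hit: the expected answer was returned from that document
    `search` is any callable mapping a question to (document_title, answer) pairs.
    """
    answer_hits = 0
    document_hits = 0
    for entry in test_data:
        results = search(entry["question"])
        if any(title == entry["document_title"] for title, _ in results):
            document_hits += 1
        if any(
            title == entry["document_title"] and answer == entry["answer"]
            for title, answer in results
        ):
            answer_hits += 1
    total = len(test_data)
    if total == 0:
        return 0.0, 0.0
    return answer_hits / total, document_hits / total


# Canned results standing in for a real search call (hypothetical data).
CANNED = {
    "What is the refund window?": [("Refund Policy.pdf", "30 days")],
    "Who approves travel?": [("Travel Policy.pdf", "The finance team")],
}


def fake_search(question):
    return CANNED.get(question, [])


test_data = [
    {"question": "What is the refund window?",
     "document_title": "Refund Policy.pdf", "answer": "30 days"},
    {"question": "Who approves travel?",
     "document_title": "Travel Policy.pdf", "answer": "Your line manager"},
]

answer_accuracy, document_hit_accuracy = evaluate(test_data, fake_search)
print(f"answer accuracy: {answer_accuracy:.0%}")
print(f"document hit accuracy: {document_hit_accuracy:.0%}")
```

In this toy data both questions find the right document, but only one returns the expected answer, which is why the two metrics can differ for the same model version.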

How to Retrain the Model #

Once your training data is ready, you must retrain your model to see the effects. Return to the Documents page and click “RETRAIN Documents”.

Feel free to navigate away or close the window during training. Once training is complete, the “in progress” bar will disappear. Congratulations! You’ve just successfully trained your model.

If training seems too slow, check the training status to see which resources are allocated to document search. If more resources are needed, contact your account manager.

Below is an example taken after training has completed. It shows three trained models, only one of which has proper training and test data populated. From here we can manage which model is deployed and easily compare accuracy metrics. Once we are happy with the results, we can deploy the desired model.

Training Settings #

You can configure your document training by changing the new training settings. They can be found by clicking the settings icon on the Documents page and navigating to the “Training Settings” tab.

Here is a brief description of what each setting does. (These descriptions are also available through the tooltips.)

Consider Document Title: When turned on, Alli considers the document’s title (file name) when running a document search.
Document Title Weight: ‘Consider Document Title’ must be on for this setting to apply. This setting decides the weight of the document title in a knowledge base search.
- With a heavier document title weight, keywords in the document name have a higher impact on the final score of an answer, even if the answer itself contains no keywords or similar terms from the question.
- A lighter document title weight relies more heavily on the contents of the documents themselves and gives the document name less influence.
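
The effect of the title weight can be illustrated with a simple weighted average. This is only a sketch under stated assumptions: the guide does not describe Alli's actual scoring formula, and `combined_score`, its parameters, the 0 to 1 weight range, and the example scores are all hypothetical.

```python
def combined_score(content_score, title_score, title_weight, consider_title=True):
    """Blend a content match score with a title match score.

    Assumption: a linear blend with title_weight in [0, 1]. Alli's real
    formula is not documented here and may differ.
    """
    if not consider_title:
        # With 'Consider Document Title' off, the title plays no role.
        return content_score
    if not 0.0 <= title_weight <= 1.0:
        raise ValueError("title_weight must be between 0 and 1")
    return (1.0 - title_weight) * content_score + title_weight * title_score


# An answer whose text barely matches the question (0.2) but whose
# document title matches it well (0.9).
light = combined_score(content_score=0.2, title_score=0.9, title_weight=0.1)
heavy = combined_score(content_score=0.2, title_score=0.9, title_weight=0.6)
print(f"light title weight: {light:.2f}")
print(f"heavy title weight: {heavy:.2f}")
```

With the heavier weight, the same answer scores noticeably higher because the title match dominates, which matches the behavior described in the two bullets above.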

# of Answer Candidates per Document: This setting decides the maximum number of results extracted from one document. Default is 0 which means there is no limit per document.
Remove Similar Results: Hide similar document search results if there are any. You can remove all similar results, ones with the same hashtags, or ones extracted from the same document. If there are similar documents (e.g. the same document published in different years), it is best practice to select “Remove if extracted from the same document” to show as many results as possible.
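
The “# of Answer Candidates per Document” setting described above can be pictured as a per-document cap applied to a ranked result list. The sketch below is illustrative only and is not Alli's implementation: the function `limit_per_document`, the result dictionaries, and the scores are hypothetical, and results are assumed to arrive sorted best first.

```python
def limit_per_document(results, max_per_document=0):
    """Keep at most max_per_document results from each document.

    A value of 0 means no per-document limit, matching the default
    described above. Results are assumed to be sorted best first.
    """
    if max_per_document < 0:
        raise ValueError("max_per_document must be 0 or greater")
    if max_per_document == 0:
        return list(results)
    kept = []
    counts = {}
    for result in results:
        title = result["document_title"]
        if counts.get(title, 0) < max_per_document:
            kept.append(result)
            counts[title] = counts.get(title, 0) + 1
    return kept


# Hypothetical ranked results: three from one document, one from another.
results = [
    {"document_title": "Report 2023.pdf", "score": 0.9},
    {"document_title": "Report 2023.pdf", "score": 0.8},
    {"document_title": "Report 2022.pdf", "score": 0.7},
    {"document_title": "Report 2023.pdf", "score": 0.6},
]

for limit in (0, 1, 2):
    kept = limit_per_document(results, max_per_document=limit)
    print(limit, [r["score"] for r in kept])
```

A low cap keeps one dominant document from crowding out others, while the default of 0 returns every candidate.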
