Elasticsearch
For more information on Elasticsearch and RAG broadly, see the Elasticsearch article in RAG at GitLab.
Retrieve GitLab Documentation
A proof of concept was done to move the documentation embeddings from the embedding database to Elasticsearch.
Synchronizing embeddings with data source
The same procedure used for PostgreSQL can be followed to keep the embeddings up to date in Elasticsearch.
Retrieval
To get the nearest neighbours, the following query can be executed against an index containing the embeddings:
{
  "knn": {
    "field": vector_field_containing_embeddings,
    "query_vector": embedding_for_question,
    "k": limit,
    "num_candidates": number_of_candidates_to_compare
  }
}
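The placeholders in the query above can be filled in programmatically. The following is a minimal sketch, not the implementation used in the proof of concept: the helper name `build_knn_query`, the field name `embedding`, and the three-element vector are hypothetical and chosen only for illustration. The check that `num_candidates` is not smaller than `k` is a client-side sanity check added here on the assumption that Elasticsearch rejects such requests.

```python
import json
from typing import Any, Dict, List


def build_knn_query(
    field: str,
    query_vector: List[float],
    k: int,
    num_candidates: int,
) -> Dict[str, Any]:
    """Build a kNN search body with the same shape as the query above.

    All argument values are supplied by the caller; nothing here talks to
    Elasticsearch, the function only assembles the request body.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if num_candidates < k:
        # Assumed constraint: the candidate pool should not be smaller
        # than the number of neighbours requested.
        raise ValueError("num_candidates must not be smaller than k")

    return {
        "knn": {
            "field": field,                      # vector field containing embeddings
            "query_vector": list(query_vector),  # embedding for the question
            "k": k,                              # number of neighbours to return
            "num_candidates": num_candidates,    # candidates to compare
        }
    }


if __name__ == "__main__":
    # Hypothetical field name and a toy vector; real embeddings are much longer.
    body = build_knn_query("embedding", [0.1, 0.2, 0.3], k=5, num_candidates=50)
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized to JSON and sent as the body of a search request against the index that holds the embeddings. Raising `num_candidates` relative to `k` generally trades query speed for recall.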
Requirements to get to self-managed
- Productionalize the PoC MR
- Get more self-managed instances to install Elasticsearch by shipping GitLab with Elasticsearch. Elastic has approved shipping it under the free license. Making it easy for customers to host Elasticsearch is expected to take more than 2 milestones of work.