Earlier this year, due to soaring interest and multiple product initiatives, we went on a quest to evaluate and deploy an internally hosted vector database.
We’re in the midst of the AI revolution, and many new AI-inspired product initiatives rely on vector embeddings. Embeddings are generated by AI models, and take the form of high dimensional vectors of features that represent the data in a manner that can be used to create clusters, find similarity, and perform semantic search.
In order to find similar entities by matching these embedding vectors, we have to be able to store and query them in a scalable and efficient way. Because the vectors are high dimensional, it is not feasible to perform exhaustive pairwise distance calculations on large datasets to determine similarity; therefore, these need to be calculated using Approximate Nearest Neighbour (ANN) search.
Vector databases implement one or more of these algorithms, and there are many candidates in the market to choose from. We see brand new products along with older products adding vector storage and ANN search to their existing capabilities.
After an initial investigation, the decision was taken to move ahead with Elasticsearch 8 (ES) as it was already entrenched in the Automattic stack. Additionally, performing hybrid search would be easily attainable since most, if not all, of the content we wanted to create embeddings for was already being indexed by existing pipelines on ES. However, as soon as we started testing different use cases, we ran into the vector size limitation on the larger OpenAI text-embedding-ada-002 embedding vectors, which have an output size of 1536 dimensions.
According to the ES documentation, you cannot have an indexed vector of size greater than 1024 dimensions. ES does allow you to create a dense vector field with a size of up to 2048 dimensions, but it cannot be indexed. 1024 is fine for most run-of-the-mill use cases, but not for the recommended OpenAI models and so it was actually a blocker for us.
The Elastiknn plugin that Tumblr is using in ES 7 also has a recommendation in the docs not to exceed ~1000. We weren’t sure if it would actually prevent indexing above that value, as ES 8 does, but it was apparent that the lookup speed was going to be negatively affected. So ES was out of the running as a candidate.
In the initial round of fact finding, feature comparison, and experimentation, the list of options has been whittled down. Milvus and Vald were excluded because of their dependency on Kubernetes for manageable deployments. This is because we are deploying the vector database as a shared service internally and we didn’t want to have to deploy a new Kubernetes production cluster simply to install a vector database.
Chroma lacked core scaling and consistency features. Weaviate has a limitation that if you scale the nodes you have to rebuild indexes, as well as the other limitations have they listed, such as replication still being under development. These two products are still obviously being developed and these problems will be solved, but for now we’ll pass.
RediSearch was originally high on the list, because it has the base core features we need and is also already part of the Automattic stack, with its scalability being well known. It does come with a caveat of its licensing though, so it’s not a clear winner on those merits alone.
That left us with these three contenders:
|Consistency||Well known||Configurable, focused on availability and maximum throughput of search ops.||Configurable, designed for high efficiency under workloads where eventual consistency is an acceptable tradeoff.|
|Notable features||1. Vector range queries|
2. Potential to include RedisBloom
|1. Automated vector embedding quantization|
2. Node “Listener” mode for backups and geo-replication
3. Multi-vector support
|1. Automated vector embedding|
2. Ranking using ONNX models
3. Search with multi-vector indexing
Installation and Configuration
With RediSearch, it was trivial to get a cluster up and running and quickly become productive. The query syntax is a little unintuitive, but this can be fixed by abstracting it via a library.
Qdrant was very simple to get up and running as well, even more so than RediSearch, because it uses the raft consensus protocol, which just works™ , and creating a collection is a single API request, and even simpler with the Python library.
Vespa has been in development for a long time and shows this in its configuration and setup requirements. Setting up a cluster wasn’t simple, and neither was getting used to the configuration you have to create for each application package, which requires a schema, document, etc. What a finicky experience and sophisticated offering for our simpler needs! 🙂 There are many example applications to work off, but I can see this being very anti-agile for products/features that only require vector storage.
Vespa has by far the richest feature-set. In fact, I ended up getting carried away with prototyping and testing multi-vector indexing and custom embeddings—not to mention the hybrid search support with traditional BM25. The major downside is the complexity and somewhat steep learning curve to become productive. I would say that it is a strong contender to be an Elasticsearch replacement, but its rich feature-set and therefore complexity count against it in this comparison.
RediSearch has a dependability and familiarity that puts it in good stead to be a natural choice, however, Qdrant matches its simplicity, is recognized by industry, and has the biggest standout feature for our use case of a pure vector database, the automated quantization. Since vector embeddings are stored in memory, this feature has the potential to have significant impact as we scale.
To summarize, when it comes to selecting between them for scalability, performance, observability, and vector database functionality, the two options we considered the best were RediSearch and Qdrant. Based on our evaluation, the recommendation was to deploy a Qdrant cluster, which is exactly what we did.
We conducted this evaluation so that we could launch an internal service that was performant, easy to deploy, and maintain. So far the Qdrant cluster has only peaked at serving 37 searches per second in production and it has not been necessary to horizontally scale the service, however we have upgraded it effortlessly a few times without incident with the Qdrant Team releasing new features at a steady release cadence.
As our internal service’s usage expands across Automattic, we’ll begin to learn the limits of the cluster and enter into the horizontal scaling phase with certainty that, considering how effortless the administration has been so far, the process should go off without a hitch. Let us know if you’ve had a similar experience or maybe your tests resulted in a different outcome? We’ll be happy to hear about them.