Six months ago I was the guy defending Pinecone in every group chat. The managed service was fine, the latency was acceptable, the price was high but predictable, and the API had not burned me. Then my bill hit four figures on a product that was not earning four figures, and I started looking at alternatives with the energy of a man who had just seen his AWS statement.
Two months later I have run the same workload across four different vector stores. The workload is a RAG pipeline for a niche docs site with roughly 2 million embedded chunks, 500 to 2,000 queries per day, and a latency budget of 200 milliseconds at the p95. Nothing exotic. Just the kind of RAG setup that a lot of developers ship and then stop thinking about until the invoice arrives.
This post is the honest write-up of what each option actually felt like to run, where each one broke, and which one I ended up keeping in production. If you are picking a vector database in 2026 and you want numbers from a real workload instead of a benchmark deck, this is that.
What Changed In Vector Storage In 2026
Before the comparison, a quick orientation on where the market sits today, because the vector DB landscape has shifted more than most people noticed.
Pinecone is still the most well-known managed option, but its mindshare is cracking. The complaints that used to be whispered are now loud. Cost at scale is bad. The query model is opinionated in ways that frustrate people. The free tier got stingier.
pgvector is no longer the “cute little extension” it was in 2023. Postgres 17 and 18 landed serious improvements to parallel query execution, and pgvector itself shipped major improvements to HNSW indexing and index builds. For most workloads under 10 million vectors, it is now fully production grade and the operational story is simpler than any dedicated vector DB.
Turbopuffer went from “interesting beta thing” to one of the most talked-about storage layers in the AI infrastructure space. It is built on object storage, which means it is dramatically cheaper than alternatives for large corpora, at the cost of some latency.
Qdrant keeps quietly eating market share. The open-source story is strong, the hosted product is solid, and the feature velocity is faster than its competitors. It has become the default pick for people who want serious filtering and hybrid search without paying Pinecone prices.
Weaviate, Milvus, Chroma, and LanceDB all still exist and still have fans, but none of them punched through enough to warrant prime billing in this comparison. I will touch on them briefly at the end.
The rest of this post is the four options I actually ran in parallel, with the same data, the same queries, and the same team writing the glue code.
pgvector: The Boring Winner For Most Projects
pgvector is a Postgres extension that adds a vector column type and similarity search operators. You install it, you create a column, you insert embeddings, you query with cosine or L2 distance. If you already run Postgres, which you probably do, there is nothing new to learn, deploy, or monitor.
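The whole happy path fits on one screen. Here is a minimal sketch using psycopg and the pgvector-python helper; the table and column names are my own placeholders, not anything canonical:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=docs", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets you pass numpy arrays as vector values

conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id        bigserial PRIMARY KEY,
        body      text,
        embedding vector(1024)
    )
""")

# HNSW index for cosine distance. m and ef_construction are the two
# build-time knobs worth knowing about; these values are pgvector's defaults.
conn.execute("""
    CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
""")

emb = np.zeros(1024)  # stand-in for a real embedding
conn.execute(
    "INSERT INTO chunks (body, embedding) VALUES (%s, %s)",
    ("some chunk of the docs", emb),
)

# <=> is cosine distance; ORDER BY ... LIMIT is the shape the HNSW index serves.
rows = conn.execute(
    "SELECT id, body FROM chunks ORDER BY embedding <=> %s LIMIT 5",
    (emb,),
).fetchall()
```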
What worked
The operational story is the thing. pgvector is just Postgres. Backups, migrations, monitoring, connection pooling, ACID transactions, joins against your relational data, all the things you already know how to do in Postgres. You do not need a separate vector pipeline, a separate set of credentials, a separate billing dashboard, or a separate on-call rotation.
Joining vector search results against other tables is trivial. Need to filter embeddings by tenant, by access permission, by date range, by any column on any other table? It is a SQL query. On any dedicated vector DB, this same operation is a second round trip plus whatever metadata filtering their API supports, which is usually less flexible than SQL.
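Concretely, a tenant-plus-recency scope is one query. The documents table and its columns here are hypothetical, and the chunks table is from the sketch above:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=docs", autocommit=True)
register_vector(conn)
emb = np.zeros(1024)  # stand-in for the query embedding

# documents, document_id, tenant_id, and published_at are invented for
# illustration; the point is that scoping is plain SQL.
rows = conn.execute(
    """
    SELECT c.id, c.body
    FROM chunks c
    JOIN documents d ON d.id = c.document_id
    WHERE d.tenant_id = %(tenant)s
      AND d.published_at >= now() - interval '90 days'
    ORDER BY c.embedding <=> %(emb)s
    LIMIT 5
    """,
    {"tenant": 42, "emb": emb},
).fetchall()
```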
HNSW index performance has caught up. On my 2 million vector workload with 1024-dimensional embeddings, pgvector served queries in 8 to 25 milliseconds at the p95, well under my 200ms budget. Index build time on the initial load was about 40 minutes on a mid-sized RDS instance, which is acceptable for most projects.
Cost is hard to beat. If you are already running Postgres, adding pgvector is free. The extra compute and storage is measurable but small. My RDS bill went up maybe 15 percent after adding the vector workload. Compared to Pinecone at roughly six times that marginal cost per month, the math is obvious.
Where it hurts
It does not scale past a certain point. At 2 million vectors I was comfortable. At 20 million vectors on the same instance, query latency climbed into the hundreds of milliseconds and index build time became a weekend project. You can scale up the instance, but eventually you are running a database server that is mostly serving vector queries, which is a weird shape for Postgres.
Hybrid search support is okay, not great. You can combine lexical and vector search in SQL, but you are writing the combination yourself. Dedicated vector DBs and search engines have more mature hybrid retrieval built in.
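To be concrete about what “writing the combination yourself” means, here is a hand-rolled reciprocal rank fusion over a lexical top 50 and a vector top 50. It assumes the chunks table from earlier plus a tsvector column I am calling tsv; treat it as a sketch, not tuned retrieval:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=docs", autocommit=True)
register_vector(conn)
emb = np.zeros(1024)  # stand-in for the query embedding

# Reciprocal rank fusion (RRF): score each id by 1/(60 + rank) in each list
# and sum. 60 is the customary RRF smoothing constant.
HYBRID_SQL = """
WITH lexical AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY lex_score DESC) AS r
    FROM (
        SELECT id, ts_rank(tsv, plainto_tsquery('english', %(q)s)) AS lex_score
        FROM chunks
        WHERE tsv @@ plainto_tsquery('english', %(q)s)
        ORDER BY lex_score DESC
        LIMIT 50
    ) t
),
semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY dist) AS r
    FROM (
        SELECT id, embedding <=> %(emb)s AS dist
        FROM chunks
        ORDER BY dist
        LIMIT 50
    ) t
)
SELECT id,
       COALESCE(1.0 / (60 + l.r), 0) + COALESCE(1.0 / (60 + s.r), 0) AS score
FROM lexical l
FULL OUTER JOIN semantic s USING (id)
ORDER BY score DESC
LIMIT 10
"""
rows = conn.execute(HYBRID_SQL, {"q": "rotating api keys", "emb": emb}).fetchall()
```

It works, and the inner subqueries keep both index scans cheap, but every weighting decision is yours to make and maintain. That is the gap the dedicated engines close for you.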
Re-indexing after large inserts is not free. If you bulk-insert millions of vectors, you will generally want to rebuild the HNSW index for query speed, and a plain rebuild blocks writes; REINDEX CONCURRENTLY avoids the lock but takes longer and more disk. For workloads where data changes constantly, this is annoying.
Metadata filtering on very selective queries is slower than pure vector search. pgvector is improving here but the pre-filter vs post-filter decision still requires some thought on complex queries.
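One mitigation worth knowing about, assuming you are on pgvector 0.8 or newer: iterative index scans, which keep walking the HNSW graph until the filter has produced enough rows. The tenant_id column is hypothetical, and the knob names are worth verifying against the docs for your version:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=docs", autocommit=True)
register_vector(conn)
emb = np.zeros(1024)

conn.execute("SET hnsw.iterative_scan = 'relaxed_order'")  # pgvector 0.8+
conn.execute("SET hnsw.ef_search = 200")  # widen the candidate pool as well

rows = conn.execute(
    """
    SELECT id, body
    FROM chunks
    WHERE tenant_id = %s  -- the selective filter in question
    ORDER BY embedding <=> %s
    LIMIT 5
    """,
    (42, emb),
).fetchall()
```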
Who it is for
pgvector is the right default for any project where you are already on Postgres, your vector count is under 10 million, and you do not have a specific reason to reach for a dedicated vector DB. That covers most indie and small-team projects. If you are running a RAG setup like the hybrid pattern I described recently, pgvector will handle it fine.
Pinecone: The Managed Option That Used To Be The Default
Pinecone was the early winner in the managed vector DB space and it is still the option most developers have heard of first. It is a hosted, serverless vector database with a clean API, no ops overhead, and a reputation for “it just works.”
What worked
Setup time is legitimately short. Sign up, get an API key, point your embedder at the index, start querying. There is no infrastructure to manage. There is no version upgrade to worry about. There is no disk to run out of.
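The entire integration is about a dozen lines with the current Python SDK. The index name and metadata fields below are placeholders:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("docs")

emb = [0.0] * 1024  # stand-in for a real embedding

# Upsert takes (id, values, metadata) records.
index.upsert(vectors=[
    {"id": "chunk-1", "values": emb, "metadata": {"tenant": "acme", "path": "/docs/auth"}},
])

# Filters use a small Mongo-style operator set ($eq, $in, $gte, ...),
# which is also roughly where the expressiveness ceiling sits.
res = index.query(
    vector=emb,
    top_k=5,
    filter={"tenant": {"$eq": "acme"}},
    include_metadata=True,
)
```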
Serverless pricing on smaller workloads is reasonable. For a project under 100,000 vectors with modest traffic, Pinecone’s free or low tier is genuinely fine and you should not over-engineer the choice.
The API is clean and has good client libraries across the major languages. Error messages are decent. The docs are well-maintained. You do not spend your first week fighting the tool.
Performance is consistent. My queries ran in 40 to 80 milliseconds at the p95, which is slower than pgvector on my particular setup but well within any user-facing latency budget.
Where it hurts
Cost at scale is brutal. This is the complaint you will hear most often and it is deserved. At 2 million vectors with low query volume, my Pinecone bill was roughly 6x what pgvector cost me, and the numbers get worse as vector count goes up. At 10 million vectors, Pinecone crosses into “is this even worth it” territory for anyone who is not funded.
The pod model has been confusing for years. Serverless improved the on-ramp, but tuning for performance still involves understanding concepts that are specific to Pinecone and not transferable to any other system. When you hit a performance issue, the first hour is often spent learning Pinecone-specific terminology rather than debugging.
Metadata filtering is limited compared to a real database. You can filter by fields, but complex queries with multiple conditions or joins are either not possible or have to be implemented in application code. This is fine for simple tenant-scoped lookups. It is painful for richer filtering patterns.
Lock-in is real. Your data lives in Pinecone’s format in Pinecone’s system. Migrating away requires rebuilding your index somewhere else and re-embedding if you want to change models. The cost of switching is non-trivial and Pinecone’s pricing power comes from knowing that.
Who it is for
Pinecone makes sense if you want zero ops overhead, your vector count is moderate, and the cost is acceptable for your business model. It is a reasonable pick for teams with funding and no appetite for managing infrastructure. It is a bad pick for bootstrapped projects or for anything where unit economics matter.
Turbopuffer: The Object Storage Play
Turbopuffer took a bet that the future of vector search is backed by object storage. Instead of keeping all vectors in memory or on attached SSDs, it stores them in S3 or equivalent and uses caching and smart indexing to make queries fast enough without the memory footprint.
What worked
Cost per vector is dramatically lower than any traditional vector DB. For large corpora, the difference is not 2x or 3x. It is 10x to 50x. If you are embedding an entire documentation corpus, a legal archive, or a multi-tenant dataset with millions of vectors per tenant, Turbopuffer’s economics are in a different league.
Scaling to very large datasets is effectively unbounded. You are limited by object storage, which is to say, practically not limited at all. I tested up to 50 million vectors and latency stayed stable, which is the part that matters.
The API is clean and simple. Insert, query, filter, done. Less surface area than most of the competition, which is good when you want a tool to get out of your way.
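For flavor, here is roughly that surface area in the Python client. I am writing this from memory and the client has been iterating fast, so treat every name below as an assumption and check the current docs before copying anything:

```python
import turbopuffer as tpuf

tpuf.api_key = "..."  # assumption: module-level key, as in earlier client versions

ns = tpuf.Namespace("docs")

# Attributes are columnar: one list per attribute, aligned with ids.
ns.upsert(
    ids=[1, 2],
    vectors=[[0.1] * 1024, [0.2] * 1024],
    attributes={"tenant": ["acme", "acme"]},
)

results = ns.query(
    vector=[0.1] * 1024,
    top_k=5,
    distance_metric="cosine_distance",
    filters={"tenant": [["Eq", "acme"]]},  # filter syntax per the older client docs
)
```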
Where it hurts
Cold query latency is the trade. First queries against a cold shard are significantly slower than in-memory alternatives, often in the 300 to 800 millisecond range on my workload. Caching warms it up for subsequent queries, but if your query pattern has a lot of cache misses, you feel it.
The hosted product is younger than Pinecone’s. The docs are good but there is less third-party content, fewer tutorials, and fewer people who have already hit your specific problem and written about it.
Hybrid search is improving but not as polished as Qdrant’s. If you need serious lexical-plus-vector retrieval with tuned weighting, you are doing more of the work yourself.
Metadata filtering is capable but, like most dedicated vector stores, not as expressive as SQL. For filter-heavy workloads, this can push complexity into your application layer.
Who it is for
Turbopuffer is the pick for workloads where you have a lot of vectors and cost matters. Multi-tenant apps with per-tenant corpora. Large document archives. Anything where you would have looked at Pinecone’s pricing and spit out your coffee. If your traffic pattern tolerates occasional colder queries, the cost savings are the kind that change whether the feature ships at all.
Qdrant: The Feature-Rich Open Option
Qdrant is an open-source vector database written in Rust. You can self-host it or use the hosted product. It has arguably the richest feature set of any option in this comparison: advanced filtering, hybrid search, quantization, sparse vectors, and a lot of knobs for tuning.
What worked
Hybrid search is genuinely excellent. Qdrant supports lexical and dense retrieval in the same query with tuned weighting, and the results are noticeably better than any of the other options on queries where both signals matter.
Filtering is expressive. You can filter on nested fields, ranges, geo-spatial conditions, and logical combinations with syntax that reads cleanly. For filter-heavy workloads this is a meaningful step up.
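A filtered dense search with qdrant-client reads like this; the collection and payload fields are invented. Note that newer client versions route everything, including hybrid sparse-plus-dense queries, through the Query API (query_points), so check which entry point your version prefers:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")

# Dense search constrained to one tenant and a numeric range, expressed
# as structured conditions rather than an operator-soup dict.
hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 1024,
    query_filter=Filter(
        must=[
            FieldCondition(key="tenant", match=MatchValue(value="acme")),
            FieldCondition(key="year", range=Range(gte=2024)),
        ]
    ),
    limit=5,
)
```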
Self-hosting works well. The defaults are sensible. Resource usage is reasonable. Upgrading between versions has been smooth for me. If you want a vector DB you control on infrastructure you control, Qdrant is the one that gives you the least operational pain.
Hosted pricing is competitive. Cheaper than Pinecone for comparable workloads, more flexible than pgvector once your dataset grows past its comfort zone.
Performance on my workload came in around 15 to 40 milliseconds at the p95, between pgvector and Pinecone.
Where it hurts
The surface area is larger than some developers want. All those features mean more things to learn, more choices to make at setup time, and more opportunities to misconfigure something. If you just want a simple vector store, Qdrant can feel like overkill.
The Rust core is tight, but the client libraries are not equally polished across languages. The Python client is great. The TypeScript client is fine. Others vary.
You are running a second piece of infrastructure. That is the trade against pgvector. It is not a lot of operational overhead, but it is not zero.
Who it is for
Qdrant is the right pick when you need advanced filtering, hybrid search, or serious tuning options, and either you are comfortable with self-hosting or their hosted pricing works for you. It is also a great middle ground between the “just use Postgres” extreme and the “pay someone else to care about this” extreme.
Side By Side: The Numbers
Here is how the four options stacked up on my actual workload. These numbers are for 2 million vectors, 1024-dimensional embeddings, and roughly 1,000 queries per day with moderate filtering.
| Option | Latency (p95) | Monthly cost | Setup |
| --- | --- | --- | --- |
| pgvector | 8-25 ms | ~$30 (marginal bump to the existing RDS bill) | Minutes if you are already on Postgres |
| Qdrant, self-hosted | 15-40 ms | ~$40 | Half a day |
| Qdrant, hosted | 15-40 ms | ~$80 | An hour |
| Turbopuffer | 25-60 ms warm, 300-800 ms cold | ~$25 | An hour |
| Pinecone, serverless | 40-80 ms | ~$180 | Minutes |
Feature richness for advanced retrieval. Qdrant wins by a meaningful margin. pgvector is capable but you are writing more SQL. Pinecone is adequate for the common cases. Turbopuffer is improving but still the newest of the four.
Scaling ceiling. Turbopuffer is effectively unbounded. Qdrant scales well with effort. Pinecone scales if you pay for it. pgvector tops out for most teams around 10 million vectors without significant tuning.
What I Actually Ended Up With
I moved the production workload from Pinecone to pgvector. It was the right call for my specific situation. The vector count was well inside pgvector’s comfort zone. The ops savings were real because I was already running Postgres. The cost drop was dramatic. The latency was actually better than Pinecone on my workload, which I was not expecting.
I kept Qdrant for a different project where I needed serious hybrid search. The lexical plus dense retrieval combination delivered results that neither pure vector search nor pure text search was producing, and the feature gap between Qdrant and pgvector on that specific pattern mattered.
I am using Turbopuffer for a third project, a large documentation archive where cost per vector is the dominant factor. The cold query latency is real but the archive is not user-facing, so a slower first query per topic is acceptable.
Pinecone is gone from my stack. I do not have a bad word to say about the product. The pricing just stopped making sense for my unit economics. If I had a funded company where ops simplicity was worth six times the bill, I might still be there.
The interesting thing about this exercise is that the right answer was different for each project. “Which vector DB should I use” does not have one answer. It has one answer per workload, and the workload details that matter are size, latency, filter complexity, and whether your business model can absorb the cost.
The Quick Decision Guide
If you remember nothing else from this post, here is the shortest version I can give:
- Already running Postgres and under 10 million vectors? Start with pgvector.
- Need serious hybrid search or rich filtering? Qdrant.
- Very large corpus and cost is the biggest factor? Turbopuffer.
- Want zero ops and have the budget? Pinecone is still fine.
Then validate your pick with real queries on real data before you commit. Benchmarks from a blog post, including this one, are a starting point, not a conclusion. The workload that matters is yours.
What About The Other Options
A few words on the vector stores that did not make prime billing, because they will come up and you should know where they sit.
Weaviate has strong semantic search features and a loyal community. It feels like Qdrant’s slightly older cousin. If you are already on Weaviate, stay on Weaviate. If you are picking new, Qdrant generally wins the comparison.
Milvus is the big-data option. Built for serious scale, used by teams with tens or hundreds of millions of vectors. If Turbopuffer’s cold latency is a dealbreaker and you need in-memory performance at massive scale, this is the category where Milvus lives. For smaller workloads it is overkill.
Chroma is the developer-experience darling for local and small-scale RAG. It is great for prototypes and local development. It is not where you want to be at production scale.
LanceDB is a quiet sleeper. File-based, embedded, extremely simple to integrate, and the developer ergonomics are excellent. It is the right pick for desktop apps and some edge cases but has not yet hit the traction of the others.
Elasticsearch and OpenSearch with vector plugins deserve a mention. If you already run one of them for lexical search, adding vector search is a reasonable incremental step. The vector-first options generally outperform them on pure vector workloads, but the “we already have it” argument is strong.
The Real Lesson
I spent two months testing four vector databases and the most valuable thing I learned was not which one was best. It was that I had been paying for Pinecone out of habit, not because it was the right tool. I adopted it two years ago when it was the obvious default, never revisited the decision, and let the bill grow.
That pattern is the pattern worth breaking. Vector databases are a category where the landscape shifted hard between 2023 and 2026. If you have not looked at your setup in the last year, there is a real chance you are running the wrong tool for your current workload, or paying three times what you need to.
The cost of picking wrong used to be “minor inefficiency.” The cost now, for a small team running real traffic, is the difference between a feature that pays for itself and a feature that quietly drains your runway. Pick with your eyes open, benchmark on your data, and do not let yesterday’s default choice be today’s default line item.
The vector DB question in 2026 has four reasonable answers, not one. Figure out which one fits your workload, stop overpaying for the others, and get back to building the part of the product users actually care about.