SpacetimeDB: a short technical review
The database market is harsh, particularly for newcomers. It’s very hard to launch a new product and differentiate yourself from the incumbents, and even harder to gain any long-term traction. Earlier this week, SpacetimeDB launched version 2.0 of their database with a peculiar approach that —as far as I can tell— hasn’t been done before: a slightly surreal (meme-y) video where they mock their competitors (drinking “competitor’s tears”), and a set of benchmarks that seem too good to be true (they are, indeed, not true) and that also mock other databases. Look at the cute magnifying glass next to the big losers of that benchmark. You gotta zoom in to see how much they suck! Good stuff.

I’ll be upfront and admit that I find this distasteful. But nonetheless, I think there are interesting ideas in this product, and I’d like to do a short technical review whilst being as fair as possible.
Benchmarks
One common mistake newcomers to the database space make is believing that you can win by having “the best performance”. I’ve never seen this in practice. The (very few) companies that have built a sustainable database offering are winning by providing good, honest technical work that stands on its own. Of course, having benchmarks definitely helps with that. But then the benchmarks have to be good, honest technical work.
The ones that SpacetimeDB provided are neither of those things. They have quite a few technical flaws in what they measure. You can see an alternate set of benchmarks here where SpacetimeDB stacks up very poorly against the competition.
Nonetheless, the big fundamental flaw in those benchmarks is that they’re not honest. And I get where they’re coming from, I do: they’re not honest because their database offering is something very different to the competition, and that makes it very enticing to write benchmarks like that. Their product is in a different segment of the database space, and they’re choosing to compare their product against databases that make different tradeoffs. It’s an appealing comparison, but it’s not a fair one.
I’ll give you an example of what this looks like, which I went through myself: a couple years ago I was working at PlanetScale and we shipped a MySQL extension for vector similarity search. We had some very specific goals for the implementation; it was very different from everything else out there because it was fully transactional, and the vector data was stored on disk, managed by MySQL’s buffer pools. This is in contrast to simpler approaches such as pgvector, which use HNSW and require the similarity graph to fit in memory. It was a very different product, with very different trade-offs. And it was immensely alluring to take an EC2 instance with 32GB of RAM and throw 64GB of vector data into our database. Then do the same with a Postgres instance and pgvector. It’s the exact same machine, exact same dataset! It’s doing the same queries! But PlanetScale is doing tens of thousands per second and pgvector takes more than 3 seconds to finish a single query because the HNSW graph keeps being paged back and forth from disk.
It was indeed very alluring to show that in a benchmark. “We’re 10000 times faster than pgvector!”. But come on now. That’s not honest. Yes, it’s the same machine, the same dataset, and the same queries, but it’s not the same thing. We did not publish those benchmarks; instead we published a technical breakdown of the implementation, without unfair comparisons, which was very well received.
You don’t need “INSANE BENCHMARKS” to win at this. You just need solid technical work and solid technical writing explaining the trade-offs and limitations of your offering. You can see another example with Turbopuffer. Their benchmarks are not impressive, particularly when compared to their competitors. Their documentation has more lines discussing the things the database cannot do than the things it can do. But everyone knows that if your use case fits their offering, they have the best product for search in the market. Miles ahead of the competition. They don’t drink their competitors’ tears, they just quietly take their customers.
Anyway: back to SpacetimeDB and their benchmarks. They have a very different offering than their competitors! It’s an all-in-one database + application server, where you deploy a database instance and your application’s code runs inside the database itself. That’s a very interesting idea, I think. You could say it’s just like stored procedures in a relational database, but with better developer experience. Fair. You can totally build a viable product out of that, though!
You gotta admit, however, that it has very little to do with the multi-region, highly available distributed databases against which it’s benchmarking itself. If your application code is running inside the database and your competitors have a separate application that must perform individual network requests for each query, then yes, you should be ahead in benchmarks that measure QPS. But are those honest benchmarks? Is that comparison what you want to show to potential customers evaluating your technical offering?
I’d say it’s not a very good thing to highlight. Accessing data in-memory is faster than accessing data over a network, and you’ve built a benchmark harness to prove that. As a potential user, I am not very impressed. I think from a marketing point of view, it would be much more interesting to show how fast you can access data in-memory, and then explain the trade-offs you’ve taken to get at those speeds.
From what I gather, there’s no clear technical breakdown on their website that explains this. So let’s give it a go here.
Storage
There are several reasons why SpacetimeDB shows such good write performance in the synthetic benchmarks they’ve published. Obviously, the elephant in the room is that application logic runs locally next to the database and it can be exceedingly efficient when writing to the data store that way. They boost this efficiency even further with other tricks (such as batching writes), but to get to the performance numbers they’re showing, you need to cut corners somewhere: the data store is in-memory, which is very much unlike a traditional RDBMS.
Now: the good news is that writes to the in-memory store are linearizable. There’s some bad news, however. Proving linearizability of a system is usually an arduous task, but I did not need to whip out TLA+ here. Linearizability is trivially provable in this system. Because the system is, well, a hash table with a lock in front of it.
This may seem exaggerated but trust me, it is actually quite an accurate description of how the storage engine is designed. The committed state for the whole database in a SpacetimeDB instance is wrapped in a single read-write mutex. All write operations happen sequentially, which is what makes linearizability trivial to prove: two writes cannot happen at the same time, so they cannot conflict or race. But a read and a write cannot happen at the same time either!
What happens if there are too many writes? Do the readers starve? Building a data store on top of a single global lock with read/write semantics is a valid technical choice. Perhaps it is a bit questionable to market that as “a database”. But it seems to me that if you’re going all in with that approach, if that lock will provide the concurrency control for your whole database, you need to have very explicit, customizable semantics for prioritizing readers and writers, to ensure the server remains responsive regardless of the workload.
In this case, the behavior is an implementation detail, not particularly defined nor explained anywhere. The mutex is an off-the-shelf parking_lot::RwLock, from the parking_lot crate. It has eventual fairness, which means that readers will eventually acquire the lock, even during high write-throughput scenarios. They can still be randomly delayed, though, by up to 0.5ms. The parking_lot crate is a Rust port of WebKit’s original WTF::ParkingLot; this 2024 changeset shows how eventual fairness was implemented there. You should read it, it has very good performance insights on mutex contention. Think of it as a palate cleanser from this blog post. Now back to the hash table & the lock.
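If “a hash table with a lock in front of it” sounds abstract, here’s a toy model in Rust. This is my own deliberate simplification, not SpacetimeDB’s actual code (and I’m using the standard library’s RwLock rather than parking_lot to keep it self-contained), but it captures the concurrency semantics described above:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Toy model: the entire committed state of the "database" lives
// behind a single reader-writer lock.
pub struct ToyDb {
    state: RwLock<HashMap<String, String>>,
}

impl ToyDb {
    pub fn new() -> Self {
        ToyDb { state: RwLock::new(HashMap::new()) }
    }

    // A write takes the exclusive lock: no other write *or read* can
    // proceed until it returns. Fully serialized writes are trivially
    // linearizable -- there is nothing left to race against.
    pub fn write(&self, key: &str, value: &str) {
        let mut guard = self.state.write().unwrap();
        guard.insert(key.to_string(), value.to_string());
    }

    // Reads share the lock with each other, but every reader is
    // blocked for the full duration of any in-flight write.
    pub fn read(&self, key: &str) -> Option<String> {
        self.state.read().unwrap().get(key).cloned()
    }
}
```

The correctness story is exactly as short as the code: one lock, one critical section, no concurrency to reason about. That’s also the entire performance story, for better and for worse.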
So what happens in this system during a write? Well, anything happens. It really is quite magical. While the global lock is held, a Wasmtime runtime is used to execute “reducers” (arbitrary user code, compiled to WebAssembly). While the reducer is executing, no other reducers can execute and write to the database. No other code can read from the database either. From their official documentation, reducers “cannot perform HTTP requests”. Yeah. No shit. The critical section for all writes to this database is exclusive and serialized, and it executes arbitrary user code. You’d better not be doing HTTP requests in the middle of it.
There’s a bit of an escape hatch here: you can use “Procedures” in the server. As of this week’s release they are still in Beta (the documentation warns that the API may change in the future). They do allow you to run expensive code, including HTTP requests, so that’s a good thing. From inside a procedure, you can open a transaction, which again acquires the global mutex and doesn’t allow any other concurrent writes nor reads to the database, so make sure you commit it very very quickly or the whole system will stall.
For reads, the story is very similar. They’re supposed to happen through “Views”, which are the read-only equivalent to reducers. Since they acquire a reader lock on the global mutex, several views can run concurrently, but the database cannot be written to while views are executing. Just like reducers, views are arbitrary user code compiled to WebAssembly.
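To make the execution model concrete, here’s a minimal Rust sketch of what “user code inside the critical section” means. The names (`Module`, `run_reducer`, `run_view`) are mine, not SpacetimeDB’s API, and the real system runs the closures as WebAssembly rather than native code — but the locking semantics are the point:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type State = HashMap<String, i64>;

// Hypothetical sketch of the execution model: user code runs *inside*
// the lock's critical section.
pub struct Module {
    state: RwLock<State>,
}

impl Module {
    pub fn new() -> Self {
        Module { state: RwLock::new(State::new()) }
    }

    // A "reducer": arbitrary user code executed while holding the
    // exclusive write lock. If `f` stalls, every other reducer AND
    // every view stalls with it.
    pub fn run_reducer<F: FnOnce(&mut State)>(&self, f: F) {
        let mut guard = self.state.write().unwrap();
        f(&mut guard); // the whole database waits on this closure
    }

    // A "view": read-only user code under a shared lock. Views can run
    // concurrently with each other, but never with a reducer.
    pub fn run_view<R, F: FnOnce(&State) -> R>(&self, f: F) -> R {
        let guard = self.state.read().unwrap();
        f(&guard)
    }
}
```

Notice there is no mechanism here (nor, as far as I can tell, in the real system) that bounds how long `f` runs while holding the lock; the only guardrail is sandboxing away the obvious foot-guns like network I/O.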
Durability
One obvious consequence of this single-mutex design for a database is that you need to be doing the least amount of work possible in the critical path for the transaction. HTTP requests are definitely out of the question. But you cannot do other “expensive” stuff that other RDBMSs often do either, such as, you know, persisting the transaction to disk (teehee).
This fully in-memory database is backed by a Write Ahead Log, but the WAL is not committed to disk as part of the write transaction. The WAL is asynchronous, and is flushed periodically to disk in the background (by default, every 50ms).
Can you actually make this fully consistent? The limitations of the single-mutex design make this complicated, as the WAL can never be written synchronously (it would completely stall all other writes and reads in the application). The system does provide an option when reading, with peculiar semantics. The withConfirmedReads flag makes reads return only data that has been synced to disk, by sleeping on the server until the WAL entries covering the result of the query have been flushed. This can be a sleep of up to 50ms, which is a long time for a request. It’s not very ergonomic behavior, but the assumption here is that this is a database for “mostly ephemeral” data, and your average query doesn’t need that kind of durability guarantee.
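The durability scheme can be sketched in a few lines. Again, this is my own toy model, not SpacetimeDB’s implementation — the “disk” here is just a flushed offset counter — but it shows the two behaviors described above: writes acknowledge before they’re durable, and a confirmed read blocks until the background flush catches up:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Toy model of an asynchronous WAL: writes append in memory and return
// immediately; a background thread "flushes" periodically.
struct WalState {
    appended: u64, // highest WAL offset written (in memory only)
    flushed: u64,  // highest WAL offset durable on "disk"
}

pub struct Wal {
    state: Mutex<WalState>,
    flushed_cv: Condvar,
}

impl Wal {
    pub fn new() -> Arc<Self> {
        let wal = Arc::new(Wal {
            state: Mutex::new(WalState { appended: 0, flushed: 0 }),
            flushed_cv: Condvar::new(),
        });
        // Background flusher, standing in for the periodic disk sync
        // (every 50ms by default, per the docs).
        let bg = Arc::clone(&wal);
        thread::spawn(move || loop {
            thread::sleep(Duration::from_millis(50));
            let mut s = bg.state.lock().unwrap();
            s.flushed = s.appended; // pretend we fsync'd the log here
            bg.flushed_cv.notify_all();
        });
        wal
    }

    // A write acknowledges as soon as its entry is in memory.
    pub fn append(&self) -> u64 {
        let mut s = self.state.lock().unwrap();
        s.appended += 1;
        s.appended
    }

    // A "confirmed read" blocks until the offset it depends on has been
    // flushed -- i.e. it can sleep for up to a full flush interval.
    pub fn wait_confirmed(&self, offset: u64) {
        let mut s = self.state.lock().unwrap();
        while s.flushed < offset {
            s = self.flushed_cv.wait(s).unwrap();
        }
    }

    pub fn flushed_offset(&self) -> u64 {
        self.state.lock().unwrap().flushed
    }
}
```

The consequence falls out immediately: any write acknowledged in the window between `append` and the next flush is lost on a crash, and any read that insists on durable data pays up to a full flush interval in latency.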
This whole thing is giving big MongoDB-2011 vibes. In many ways, really. The guys at Mongo launched a pretty shitty database with very impressive benchmarks, and eventually got bullied by the internet (see: MongoDB is Web Scale) into implementing a proper storage engine. They acquired WiredTiger, which really is a proper storage engine. Fifteen years later, they are a serious and viable database company. And yet there’s still a lot of technical people who remember the early days of Mongo and refuse to use it in production or recommend it. Their information is outdated. Modern Mongo is a serious database that works. But the bad technical reputation lingers, and will linger forever.
I think there’s a big lesson to be learned here: in 2026, if I were to launch a database product that is a hash table with a single lock in front of it, I’d do it quietly. Because cutting corners when launching a database product has been proved to be a viable approach (I wouldn’t do it myself, but MongoDB did it with great success). But as soon as the product catches on, if it does, you need to rush to pay back the technical debt and the reputational debt. A marketing video with laser beams and a “bottle of tears” makes this much more complicated.
Tradeoffs
We’ve seen the technical choices that allow SpacetimeDB to perform so well in specific benchmarks (i.e. benchmarks where they measure how fast our application can write to the database; given that our application is the database). These choices are not explained upfront in the documentation, and sadly the trade-offs that these choices imply are also not explicitly listed.
Going through them briefly: this is not a distributed system and it has very hard limits on scalability and availability. You can deploy a “SpacetimeDB cluster”, meaning a primary instance and several followers with eventually consistent replication (emphasis on eventually consistent; the WAL is eventually consistent, the replication is too, there’s a lot of margin for things to go wrong here), but your whole system is bottlenecked by the CPU and RAM capacity of the machine where your main SpacetimeDB instance is deployed. You need enough CPU for your database to execute all the queries, but also for your whole application to execute all its application logic, as again the application lives inside the database. You need enough RAM to fit all your database’s data in-memory. SpacetimeDB is not disk-backed at all; it just flushes a WAL to disk (and periodically, snapshots that make recovering from the WAL quicker on restarts). If your dataset grows larger than RAM, your database (and your application, which are the same thing) will fall over. The only option for scalability here is vertical: buying a bigger machine to run your database.
These tradeoffs are, again, perfectly valid. But they clearly position SpacetimeDB as “a more powerful Redis”, not “a more performant relational database”. It’s very puzzling why the authors chose to benchmark it as the latter.
Use Cases
The original version of SpacetimeDB was developed as the backend of an MMORPG (a real game that you can play on Steam right now). That seems fair to me. I think all the technical choices in the database fit this use case. You can asynchronously flush to disk a WAL entry that says that xXxPussyHunter420xXx has looted [Thunderfury, Blessed Blade of the Windseeker]. 50ms of delay is OK here. He’ll get upset if the instance crashes just right then, but he’ll get over it.
Of course, there’s not a lot of studios building MMORPGs right now, and the ones that are building any kind of multiplayer games really tend to prefer their own in-house backends. They’re large studios after all, they’ve done this before. So I totally get why they’re pivoting SpacetimeDB v2 into something with broader appeal.
Their marketing page now says that “LLMs go much further with SpacetimeDB because it handles all the persistence, logic, deployment, and real-time sync in a single cohesive backend.” That’s also a fair choice. Agentic coding is the thing happening right now. Building a database that targets LLMs seems like a good idea. But I’ll be honest here: they literally made the worst possible technical choices for this use case.
The whole shtick of SpacetimeDB is that the performance and availability of both your application and your database are 100% dominated by short segments of user code which cannot perform any operations with side effects or stalls, because they’re compiled to WebAssembly and executed by a virtual machine inside a critical section that serializes all writes and reads to the storage backend of your application. The absence of side effects or stalls cannot be enforced by the type system, and is dependent on the particular WASM bytecode that is generated by a JIT compiler at runtime. Any mistake inside these critical sections, any operation that could cause them to stall under load, is probably only going to be seen in production, and is going to degrade the performance of your whole application — most likely to the point of causing an availability issue.
This is not the ideal environment for an LLM to program in. lol
Having said that: I think there’s a product here, and some lessons to learn. Perhaps the authors will eventually apply them to SpacetimeDB v3 and launch a more resilient and LLM-friendly database, where application code is isolated and can run for as long as it needs, without affecting other application code running locally, even when faced with serious implementation bugs; where transactions can run for as long as they need without affecting the performance of other transactions, and are implicitly throttled if they’re taking too long because the LLM did not provide an optimal query plan. Perhaps we’ll see a system that is much more resilient to failure, but with much less “impressive performance”; perhaps the system will be trivially distributed so that the AI agent doesn’t have to design a distributed system itself; perhaps it will launch with fewer silly benchmarks and with more technical details.
Now that’d be a product to keep an eye out for.