Some thoughts on "The Law of Leaky Abstractions"
Some precautions to avoid falling into the trap of leaky abstractions and why every abstraction eventually forces you to look under the hood. A reflection on Joel Spolsky's article.
When building software, we often encounter different types of theories, patterns, and principles that help us solve common problems. Some of them are more general, while others are more specific to a particular context or problem.
As we begin to dive deeper into software design, concepts like layered architecture, Domain-Driven Design (DDD), and Clean Architecture emerge, among others. Thus, we find statements such as:
“Isolating the business layer from persistence ensures that the core logic remains independent of the database technology, improving maintainability, testability, and portability. This is achieved through the use of repositories to encapsulate data access, allowing the domain model to be agnostic of how data is stored or retrieved.” - Repository Pattern
“Define entities and aggregates in the domain layer, preventing them from containing database-specific annotations.” - Domain-Driven Design (DDD)
“The domain layer defines the data repository interfaces, and the persistence layer implements them, inverting the dependency flow so that it points toward the business logic.” - Dependency Inversion
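Taken together, the three statements above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `Order`, `OrderRepository`, `InMemoryOrderRepository`, and `pay_order` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Order:
    # Plain domain entity: no database annotations, no ORM base class.
    order_id: str
    total_cents: int
    paid: bool = False


class OrderRepository(Protocol):
    # Interface defined by the domain layer (hypothetical).
    def get(self, order_id: str) -> Optional[Order]: ...
    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository:
    # Persistence-layer implementation. A SQL-backed class would expose
    # the same two methods and could be swapped in without touching pay_order.
    def __init__(self) -> None:
        self._rows = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order


def pay_order(repo: OrderRepository, order_id: str) -> Order:
    # Business logic depends only on the interface, never on a concrete store.
    order = repo.get(order_id)
    if order is None:
        raise LookupError(f"order {order_id} not found")
    order.paid = True
    repo.save(order)
    return order
```

Note what the interface does not say: nothing about transactions, locking, or what happens when two callers pay the same order at once. Those details are exactly what the abstraction hides.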
Following these proposals, we write code in many forms (functions, classes, components, adapters) that hides the particularities and, in part, the complexity of the underlying infrastructure.
Therefore, as more infrastructure components and technologies are added to a project, more abstractions are generated, and we have less control over what lies under the hood.
Let’s think of a project where we have technologies like an event bus and/or a message broker, a database, a file storage system, a caching system, etc. Each of these components carries a technological complexity depending on the technology used, and every abstraction we create with the idea of making the business logic as agnostic and independent of the implementation as possible is a potential source of future problems.
This is what Joel Spolsky talks about in his article “The Law of Leaky Abstractions”, which was published in 2002 but remains just as relevant today.
Let’s consider a simple example, familiar to most: an ORM.
An ORM is an abstraction that, from the developer’s perspective, has a clear purpose: to interact with different databases using the same mental model, the same entities, and similar operations. Saving, updating, or querying data should be conceptually equivalent regardless of the underlying engine.
Under this abstraction, working with data becomes similar to manipulating objects in memory; persisting them is as simple as invoking a method, and querying them resembles accessing properties of an already loaded object.
However, this simplification begins to blur when the specific details of each technology come into play.
For example, different database engines behave differently regarding:
- Transaction management and isolation levels
- Consistency
- Support for joins and complex relationships
- Indexing strategies
- Performance under certain query patterns
What seems like a homogeneous operation in an ORM can actually translate into completely different strategies depending on the engine. An efficient query in a relational system can be extremely costly or even impossible in a non-relational one.
This is where the abstraction begins to “leak”.
The idea that we are working with a uniform model breaks down when:
- Performance is not as expected
- Concurrency issues arise
- ORM queries generate more operations than anticipated
- Certain operations cannot be expressed efficiently without “escaping” the ORM
In these cases, as developers, we are forced to understand details that, in theory, were abstracted away:
- How the engine executes a query
- What indexes are being used
- What kind of locks are being applied
- How relationships are being materialized
In other words, the abstraction is no longer sufficient, and the underlying reality resurfaces.
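One common form of this leak is a query count that grows with the data. The sketch below imitates lazy loading with the standard `sqlite3` module; `CountingDb`, `Author`, `titles_lazy`, and `titles_joined` are hypothetical names for illustration and do not correspond to any real ORM's API.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many statements are sent to the engine."""

    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def run(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


class Author:
    """Hypothetical lazy-loading entity, imitating what many ORMs do."""

    def __init__(self, db, author_id, name):
        self._db, self.id, self.name = db, author_id, name

    @property
    def posts(self):
        # Looks like attribute access, but runs a query on every access.
        return self._db.run(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (self.id,)
        )


def titles_lazy(db):
    # One query for the authors, plus one more per author.
    rows = db.run("SELECT id, name FROM authors ORDER BY id")
    authors = [Author(db, author_id, name) for author_id, name in rows]
    return {a.name: [title for (title,) in a.posts] for a in authors}


def titles_joined(db):
    # Same result in a single statement, written with the engine in mind.
    rows = db.run(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus'), (3, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C'), (4, 3, 'D');
""")
```

With three authors, the lazy version issues four statements and the joined version issues one. The two functions return the same dictionary, so nothing in the calling code reveals the difference until the table grows.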
The ORM tries to hide the complexity of multiple storage systems under a single interface, but it cannot do so perfectly because the differences between those systems are fundamental, not accidental.
And this phenomenon is not exclusive to ORMs. It also appears when we try to:
- Model distributed operations as if they were local
- Encapsulate the network as if it were a function call
- Represent concurrent systems as if they were sequential
In all these cases, the abstraction works well until it doesn’t.
Therefore, one of the main risks when applying patterns like Repository, DDD, or Clean Architecture lies not in the patterns themselves, but in assuming that the abstraction they propose (or the one we implemented) is sufficient for all scenarios.
When that happens, we stop designing with the reality of the system in mind and start designing based on an idealized representation of it. And that is where the problems begin.
Because of this, I want to share some ideas on how we can prevent and/or mitigate these problems.
Understand the underlying complexity:
- Before using an abstraction, carefully analyze its limitations and possible consequences. Take the necessary time to understand the pros, cons, and implications not just of the abstraction you are creating or using, but also of the underlying infrastructure and chosen technologies.
- Understand what is being hidden (even if you don’t use it right now).
Never blindly trust an abstraction:
- Every abstraction will fail at some point. Design with that moment in mind.
- The network can fail: a request might reach a server and execute, yet the response never makes it back. How are you going to guarantee idempotency? How are you going to handle timeouts and retries?
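As one possible answer to those questions, the sketch below simulates a lost response and uses an idempotency key so that a retry does not apply the operation twice. Everything here is an assumption made for illustration: `FlakyPaymentServer` and `charge_with_retries` are hypothetical, and a real client would also need backoff and a request timeout.

```python
import uuid


class FlakyPaymentServer:
    """Hypothetical server: applies the charge, but its first response is lost."""

    def __init__(self):
        self.balance_cents = 0
        self.calls = 0
        self._seen = {}  # idempotency key -> stored result

    def charge(self, idempotency_key, amount_cents):
        self.calls += 1
        if idempotency_key not in self._seen:
            # First time this key is seen: perform the side effect once.
            self.balance_cents += amount_cents
            self._seen[idempotency_key] = {
                "status": "charged",
                "amount_cents": amount_cents,
            }
        if self.calls == 1:
            # The work was done, yet the caller never learns about it.
            raise TimeoutError("response lost")
        # Repeated keys get the stored result instead of a second charge.
        return self._seen[idempotency_key]


def charge_with_retries(server, amount_cents, max_attempts=3):
    # One key per logical operation, reused across every retry.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return server.charge(key, amount_cents)
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # A real client would wait (with backoff) before retrying.
```

The important design choice is that the key is generated once, outside the loop. Generating a new key on each attempt would turn every retry into a new charge.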
Choose carefully where to abstract:
- Pay special attention to operations that are critical to your business consistency, or sensitive to performance and scalability.
- At times, it is preferable for your business logic to know that its persistence is a PostgreSQL database and how it can handle concurrency, rather than blindly trusting an abstraction that hides it or does not adapt to your needs.
- It is perfectly fine to abstract repetitive, stable, or non-critical operations, for example: logging, data parsing, and other utilities.
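To make the point about concurrency concrete, here is a minimal sketch of an optimistic, versioned update in which the caller has to know about the version column instead of having it hidden. It uses `sqlite3` only so that it runs anywhere; the table layout, `withdraw`, and `StaleWriteError` are hypothetical. On PostgreSQL, row locking with `SELECT ... FOR UPDATE` would be another option, which is precisely the kind of engine-specific knowledge being discussed.

```python
import sqlite3


class StaleWriteError(Exception):
    """Raised when the row changed after it was read, or funds are insufficient."""


def withdraw(conn, account_id, amount, expected_version):
    # The WHERE clause only matches if nobody modified the row since we read
    # it and the balance still covers the amount, so check and write happen
    # in a single statement.
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND balance >= ?",
        (amount, account_id, expected_version, amount),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise StaleWriteError("row changed or insufficient funds; re-read and retry")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
conn.commit()
```

If two callers both read version 0 and each try to withdraw 80, only the first update matches a row. The second raises `StaleWriteError` and must re-read before deciding what to do, instead of silently overdrawing the account.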
Maintain escape hatches:
- Every good abstraction should let you bypass it when you need to. If an abstraction cannot satisfy a specific need, it should offer a clear alternative.
- It is not always possible to abstract everything; therefore, make sure your abstraction is flexible enough to be broken when necessary. This is why some ORMs have utilities like rawQuery or executeRaw to execute SQL queries directly.
- Accept that you can, and probably will be forced to, look under the hood and eventually break abstractions.
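A minimal sketch of what such an escape hatch might look like in a hand-written repository follows. `UserRepository` and its `raw_query` method are hypothetical names chosen for this example, in the spirit of the ORM utilities mentioned above, and the backing store is an in-memory `sqlite3` database.

```python
import sqlite3


class UserRepository:
    """Hypothetical repository with a small high-level API and one escape hatch."""

    def __init__(self, conn):
        self._conn = conn

    def add(self, name, age):
        self._conn.execute(
            "INSERT INTO users (name, age) VALUES (?, ?)", (name, age)
        )

    def find_by_name(self, name):
        return self._conn.execute(
            "SELECT name, age FROM users WHERE name = ?", (name,)
        ).fetchone()

    def raw_query(self, sql, params=()):
        # Escape hatch: parameterized SQL that bypasses the high-level methods.
        return self._conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
repo = UserRepository(conn)
for name, age in [("Ana", 31), ("Bo", 45), ("Cy", 38)]:
    repo.add(name, age)

# An aggregate report the high-level methods were never designed for.
rows = repo.raw_query("SELECT COUNT(*), AVG(age) FROM users WHERE age > ?", (30,))
```

Keeping the escape hatch on the repository itself, instead of letting callers reach for the connection, at least keeps every place where the abstraction is broken visible and searchable.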
And remember, there is no worse abstraction than the one that makes you ignore reality. Complexity does not disappear; it waits for the right moment to return. When it does, you can ignore it, but you cannot avoid its consequences.
Thank you so much for reading! I hope to see you again in my next post. I already have some ideas in the works regarding specific solutions to these challenges. See you soon!