C1. Reliable, Scalable, and Maintainable Applications - Part 3
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Hello 👋 ,
Welcome to a new post!
Guys, we are growing 🥳! Thank you so much for your love. We are just getting started and this feels amazing. A growing community to me means more fun connecting with you guys, but also a lot more accountability and discipline to deliver on what I promised. So, if you don’t want me to stop writing, share as much as possible (See what I did there? 😜).
Okay, back to the topic now. This is the last part of the first chapter of the first book in the first week of this blog. Complex, isn’t it? That’s the topic for today: maintainability.
In the previous posts we covered reliability (Part I) and scalability (Part II). If you missed them, go and read them first; it will not take more than 10 minutes (Duh! That’s like the whole point of this newsletter!).
Maintainability
What is maintenance?
Fixing bugs, keeping the system operational, investigating failures, adapting to new platforms, modifying it for new use cases, repaying technical debt (that thing you said you would do "as a fast follow-up" but never did), and adding new features - all these activities constitute maintenance.
If a company had $100 for building and maintaining software, it would spend roughly $67 on maintenance; everything else, such as requirement analysis, design, development, and testing, would cost just $33.
What I am trying to say is, most of the cost of software is in its ongoing maintenance, not in its initial development.
What are the design principles for minimizing operational pain?
Operability - Making life easy for operations
Simplicity - Managing complexity
Evolvability - Making change easy
Operability: Making Life Easy for Operations
What is operability?
It means making it easy for operations teams to keep the system running smoothly.
Good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations.
What are the responsibilities of an operations team?
Monitoring the health of the system and recovering quickly when it goes bad
Tracking down the root cause of problems
Keeping software & platforms up to date, including security patches
Keeping tabs on the services you interact with
Anticipating future problems and solving them before they occur (e.g. capacity planning)
Maintaining good documentation
These responsibilities can be seen as requirements; the items that follow are the action items for meeting them.
How can we achieve these?
Systems should be mundane and boring. Surprises lead to heavy maintenance.
Provide visibility into the runtime behaviour and internals of the system through monitoring and metric dashboards
Design the system to work with standard tools, with good support for automation and integration
Good documentation and standard operating procedure (SOP) wikis
Make the system self-healing, but still give admins manual control over the system state when needed
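To make the last two ideas a bit more concrete, here is a minimal sketch of a self-healing loop that also counts what it sees and lets an admin switch the healing off. Everything in it (the Supervisor class, the paused flag, the metric names, the max_restarts limit) is hypothetical and only for illustration; a real system would normally lean on its orchestration and monitoring tools instead.

```python
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    """Hypothetical supervisor: restarts an unhealthy worker and records metrics."""
    max_restarts: int = 3
    paused: bool = False  # manual override: an admin can switch auto-healing off
    metrics: dict = field(
        default_factory=lambda: {"checks": 0, "failures": 0, "restarts": 0}
    )

    def check(self, is_healthy, restart):
        """Run one health check and heal automatically unless healing is paused."""
        self.metrics["checks"] += 1
        if is_healthy():
            return "healthy"
        self.metrics["failures"] += 1
        if self.paused:
            return "needs-admin"  # failure is counted, but no automatic action
        if self.metrics["restarts"] >= self.max_restarts:
            return "gave-up"  # stop restarting in a loop and escalate to a human
        restart()
        self.metrics["restarts"] += 1
        return "restarted"


# Usage: a fake worker that is down and comes back after a restart.
state = {"up": False}
supervisor = Supervisor()
first = supervisor.check(lambda: state["up"], lambda: state.update(up=True))
second = supervisor.check(lambda: state["up"], lambda: state.update(up=True))
```

The counters in metrics are the kind of numbers you would put on a dashboard, and the paused flag is the manual control: the system keeps reporting failures, but a human decides what happens next.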
Tip: Automation & good documentation go a long way. Identify a recurring problem in your system and automate the fix. Repeat. Sometimes the recurring problem is big enough to give you an opportunity for re-architecture. (I know what you are thinking, so I will say it out loud: "and then promotion (bang on)" 😝 )
Simplicity: Managing Complexity
What is simplicity?
It is the opposite of complexity 😝. Coming to that next.
Basically, you start with a small and beautiful codebase, but as it grows, making changes becomes complex and slows everyone down, further increasing the cost of maintenance. Such a codebase is also called a big ball of mud.
What is complexity?
Complexity is anything related to the structure of a software system that makes it hard to understand and modify the system.
What are the symptoms of complexity?
These concepts can be explained simply with engineers' day-to-day dialogues. I bet you have said at least one of them, along with some nice words obviously 🤬🤬.
Change amplification - "I make a change here and something else breaks there", or a seemingly simple change needs edits in many different places
Cognitive load - "Why is it so difficult to understand?", "Don't we have a simpler explanation for all of this?", "Why do I need to know and test so many things to make a simple change?"
Unknown unknowns - "I don't even know what to do to make this thing work"
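Change amplification in particular is easy to show in code. The sketch below uses a made-up business rule (free shipping from a total of 50) and made-up function names. In the first version the rule is copied into two functions, so changing it means hunting down every copy; in the second version it lives in one place.

```python
# Amplified: the same rule (free shipping from 50) is copied into two places.
def cart_banner_v1(total):
    return "Free shipping!" if total >= 50 else "Add more for free shipping"


def shipping_cost_v1(total):
    return 0 if total >= 50 else 5


# Contained: the rule lives in one place, so changing it is a one-line edit.
FREE_SHIPPING_THRESHOLD = 50  # hypothetical business rule


def qualifies_for_free_shipping(total):
    return total >= FREE_SHIPPING_THRESHOLD


def cart_banner(total):
    if qualifies_for_free_shipping(total):
        return "Free shipping!"
    return "Add more for free shipping"


def shipping_cost(total):
    return 0 if qualifies_for_free_shipping(total) else 5
```

If the threshold moves to 75, the first version needs two edits, and forgetting one of them gives you a banner that promises free shipping while the checkout still charges for it.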
What are the causes of complexity?
Dependencies - When a given piece of code cannot be understood and modified in isolation; the code relates in some way to other code, and the other code must be considered and/or modified if the given code is changed
Obscurity - Tribal knowledge (only a few people know how things work), poor documentation, and unclean code; anything which doesn't give you answers to your questions easily can be classified under this category
How to manage complexity?
The best tool we have to manage complexity is abstraction.
My way to put it - The easiest way to understand abstraction is to think like a lazy person - "I don't care how you do it, just give me the result". For example, when writing to a file in a program, you don't care which block it is stored on, or whether it ends up on an SSD or an HDD. All of that is handled by the high-level language and the operating system APIs.
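The file example can be written out in a few lines. This is only a sketch; save_note, load_note, and the note.txt file name are made up for illustration. Notice what is missing: nothing in the code mentions disk blocks, file systems, or the type of storage device.

```python
import os
import tempfile


def save_note(path, text):
    """Write text to a file; the OS decides which blocks and which device hold it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_note(path):
    """Read the text back, again without knowing where it physically lives."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Usage: write and read back inside a temporary folder that cleans itself up.
with tempfile.TemporaryDirectory() as folder:
    note_path = os.path.join(folder, "note.txt")
    save_note(note_path, "just give me the result")
    round_trip = load_note(note_path)
```

The same two functions work whether the folder sits on an SSD, an HDD, or a network drive. That is the lazy guy getting his result.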
But defining such good abstractions is a difficult task. Throughout these posts, we will look for good abstractions which can be reused as components in our own designs.
Note - Though the book explains this section pretty well, I rewrote it as a summary of another amazing book, “A Philosophy of Software Design” by John Ousterhout. The concepts are the same, but the explanation is concise and clear.
Evolvability: Making Change Easy
The only thing constant is change.
What level of evolvability will be dealt with in this series?
Changes in business priorities, new feature requests, changes in legal or regulatory requirements, and system growth that forces architectural changes.
Methods such as TDD (test-driven development) & refactoring are applied at the source code level, across a few files
In this book we try to apply refactoring at the architecture level. For example, if you want to convert your single-writer blog into a writing platform such as Substack, how will you do that?
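For contrast, this is roughly what the source-code-level version looks like: a small test pins down the behaviour, and then the implementation is refactored underneath it. The slugify functions and the test cases are made up for illustration, and the two versions are only claimed to agree on the plain ASCII titles that the test covers.

```python
import re


def slugify_v1(title):
    # First version: works, but the loop is hard to read.
    out = ""
    for ch in title.lower():
        if ch.isalnum():
            out += ch
        elif out and out[-1] != "-":
            out += "-"
    return out.strip("-")


def slugify(title):
    # Refactored version: one regular expression replaces the whole loop.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def check(fn):
    # The test is written once and guards both versions.
    assert fn("Hello, World!") == "hello-world"
    assert fn("  Part 3  ") == "part-3"
    return True


old_passes = check(slugify_v1)
new_passes = check(slugify)
```

Because the test does not care how the slug is produced, the implementation can be swapped freely. The architecture-level question is the same one on a bigger scale: what plays the role of check when the thing being replaced is a whole component?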
That’s it for this post. That’s the end of chapter 1. Whoo hoo! Now don’t forget to revise. That’s the important part. In the next chapter we will cover data models and query languages. Stay tuned!