The Decision-Making Framework Every Engineering Team Needs
A practical way to turn chaos into shared decisions
One of the first issues of this newsletter explored decision paralysis and how to overcome it. Today, I’m excited to share a guest post by Alfonso, who offers a fresh perspective on decision-making. Don’t miss it!
I’m Alfonso Graziano, and I want to thank you for reading and for being part of this conversation. I’m a Tech Lead and Software Engineer at Nearform, where I lead technical teams and help build reliable, scalable systems. I also write and speak about leadership, system design, and decision-making in tech.
I hope this post gives you some practical tools (and maybe a few lightbulb moments) for making better decisions in messy, real-world situations. I’d love to hear your thoughts; feel free to drop a comment or reach out on LinkedIn to share your perspective.
Today we will talk about Decision Making, but before doing so, let me tell you a story. A team inherited a complex platform from another company. At first, they were excited: a big client, a challenging product, plenty of interesting work.
But the excitement faded fast…
There were no tests. No unit tests, no integration tests, nothing. Just a giant, tangled codebase and a vague hope that things “kind of work.”
Releases became a ritual of anxiety. Developers pushed code, then manually clicked through pages hoping nothing exploded. Spoiler: things exploded.
QA was in survival mode. They didn’t even try to verify everything (they couldn’t). So they focused on the top 10 screens and crossed their fingers.
Customers kept reporting bugs. The dev team started burning out.
They talked about writing tests. They really did, but there was a roadmap. There were features to ship, there were deadlines.
So the decision was made: “We’ll do the tests later”. They didn’t document it, didn’t evaluate options, didn’t set conditions. They just… let the pain keep growing quietly in the background.
Six months later, the same conversations were happening, just with more pressure, more bugs, more team churn, and less trust from stakeholders.
This is not a rare story. It’s what happens when you make an important decision without a framework, when urgency beats clarity and good intentions harden into technical debt.
Now let’s rewind. What if they had approached that decision deliberately?
Let me show you a framework that could have changed everything.
The framework I’m about to walk you through is inspired by the book Clear Thinking by Shane Parrish. The core idea is simple: eliminate mental friction, slow down the right way, and act with intention.
Most of the steps you’ll see below are adapted from Shane’s work, with software engineering examples. If you’re curious and want to go deeper, I highly recommend exploring Farnam Street; it’s one of the best resources out there for thinking clearly in a noisy world.
Ready to learn how to make better decisions? Let’s get started!
1. Define the Problem
This is where many teams go wrong. They jump into fixing things without really understanding what’s broken.
In our story, the team said:
“We don’t have time to write tests, we have to ship features.”
But that’s not a clear problem, it’s more like an excuse. To define a problem well, you need to slow down and ask a few basic questions:
What exactly is going wrong?
Write it down in plain language. Avoid vague words like “mess” or “bad quality.” Describe what people are seeing and feeling.
Can we observe it?
Is there real evidence? Metrics? Bugs? Complaints? Logs? Anything that proves the problem exists in the real world, not just in someone’s head.
Can we measure it?
Rough numbers are fine. Just enough to track if things are getting better or worse. Here’s an example of a well-defined problem:
“In the last three months, every release has caused 3–5 bugs in production. QA can’t test everything manually, and each bug takes about 4 hours to fix. Developers lose 1–2 days per week doing rework.”
Now the problem is specific, observable, and measurable. Everyone on the team can look at it and say, “Yep, that’s real.” And here’s the final step: write it down and check:
“Do we all agree this is the main thing we want to fix?”
If the answer is yes, you’re ready to move on. If not, pause and align. Because if you’re solving different problems, you’re going in different directions, even if you’re all working hard.
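To see why rough numbers are enough, here is a back-of-the-envelope sketch that turns the example problem statement above into a single trackable metric. Every input is either an illustrative figure from the statement or an assumption I’ve labeled as such:

```python
# Rough cost model for the example problem statement above.
# Inputs are the illustrative figures from the text, plus two assumptions.

BUGS_PER_RELEASE = 4        # midpoint of the 3-5 bugs observed per release
HOURS_PER_BUG = 4           # average time to fix one production bug
RELEASES_PER_MONTH = 2      # assumption: release cadence
REWORK_DAYS_PER_WEEK = 1.5  # midpoint of 1-2 rework days per developer
DEVELOPERS = 5              # assumption: team size

def monthly_rework_hours() -> float:
    """Hours the team loses to bug fixing and rework each month."""
    bug_fixing = BUGS_PER_RELEASE * HOURS_PER_BUG * RELEASES_PER_MONTH
    rework = REWORK_DAYS_PER_WEEK * 8 * 4 * DEVELOPERS  # 8h days, ~4 weeks
    return bug_fixing + rework

print(f"Estimated hours lost per month: {monthly_rework_hours():.0f}")
```

A single number like this is crude, but it gives the team something to re-measure later and check whether things are getting better or worse.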
2. Find the Root Cause
Once the team had a clear definition, the next step should have been to ask why this was happening. But instead, they just accepted the situation: “Well, we’re busy, we inherited this mess, what can you do?”
If they had pushed further, maybe with a few rounds of “why?”, they would have seen something deeper.
QA couldn’t cover all scenarios because there were no automated tests. There were no tests because devs had no time. There was no time because stakeholders kept pushing for features. And stakeholders pushed because they didn’t fully understand the cost of all these bugs.
So the root problem wasn’t just missing tests: it was a system of misaligned priorities. Test debt wasn’t technical, it was cultural.
That’s a very different challenge. And you can’t solve it by just tossing in a Cypress test here and there.
3. Separate Problem and Solution
Instead of clarifying the problem and then creating space to explore multiple paths, the team jumped straight to “we don’t have time for tests”, which is already a decision disguised as a fact.
They never paused to separate the what from the how.
They skipped the messy, valuable part where you sit with the problem and explore options. They didn’t write anything down. They didn’t involve the right people. They didn’t document assumptions or tradeoffs. They just moved on.
What they should have done was treat problem definition as its own outcome.
Write it down. Get team alignment. Share it with stakeholders. Ask: “Do we all agree that this is the thing we’re trying to solve?”
Only then should they have started thinking about solutions. But we’ll get to that in the next steps.
One actionable trick to do this effectively is to split the problem definition and the solution into two separate meetings. The outcome of the first meeting is a clear, complete statement of the problem.
4. Explore Solutions
This is where the team usually jumps in too fast: someone says, “Let’s pause all feature work and build a full test suite,” or “Let’s just keep things as they are and hope for the best.”
That’s binary thinking. You feel like you have to choose between two extremes. But good decisions often come from exploring the space in between.
A better move is to push yourself to come up with at least three possible options, even ones that seem weird or imperfect. This helps the team think more creatively and avoid tunnel vision.
You can also use mental tools like:
Second-order thinking: “And then what?” What happens after the first result?
Premeditatio malorum: What could go wrong with each option?
Both/And thinking: Can we combine ideas instead of choosing just one?
In our case, the team might have explored options like:
Pause new features for 2 months to build automated tests
Start writing tests only for new code, and slowly improve coverage over time
Introduce lightweight contract tests on key flows to support QA, while keeping up with delivery
(bonus) Hire a testing specialist or ask for temporary QA support to help bootstrap automation
Then they could ask, “Can we mix these?”
Maybe they combine #2 and #3: test only new code and build coverage on high-risk flows.
Suddenly, it’s not a painful yes-or-no decision anymore. It’s a design space.
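Option #3 above mentions lightweight contract tests. As a sketch of what that can look like (the checkout flow, the fields, and the `fetch_checkout_summary` helper are all hypothetical examples), a contract test only pins down the shape of a response on a critical flow, not its full behavior:

```python
# Minimal contract-test sketch for one critical flow.
# The flow, fields, and helper are hypothetical examples.

def fetch_checkout_summary(order_id: str) -> dict:
    """Stand-in for a real call to the checkout service."""
    return {"order_id": order_id, "total_cents": 4999, "currency": "EUR"}

def test_checkout_summary_contract():
    response = fetch_checkout_summary("order-123")
    # The contract: these fields must exist, with these types and values.
    assert isinstance(response["order_id"], str)
    assert isinstance(response["total_cents"], int)
    assert response["currency"] in {"EUR", "USD", "GBP"}

test_checkout_summary_contract()
print("contract holds")
```

Because it checks shape rather than every behavior, a test like this is cheap to write and maintain while still catching the regressions QA fears most.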
5. Consider Opportunity Costs
Every decision comes with a hidden cost: what you’re not doing because you chose something else. This is called opportunity cost, and teams often ignore it.
It’s easy to see what you get. It’s harder to see what you give up, especially invisible resources like time, focus, team motivation, or the trust of stakeholders.
You can use a few helpful questions:
Compared to what? What’s the real alternative?
Then what happens? What are the consequences in 3, 6, 12 months?
What am I sacrificing? What does this cost us in energy, morale, or speed?
In this story, pausing all work for testing sounds great… but it would delay customer features, cause tension with the business team, and risk team burnout.
On the other hand, ignoring tests means bugs keep happening, devs keep fixing them, and velocity stays low anyway, just hidden under fire-fighting.
By asking these questions, the team could see that the real cost of not investing in testing is actually higher than the short-term cost of slowing down.
It also shows that small, steady changes (like writing tests for new code) might offer better long-term value with less disruption.
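One way to make “then what happens?” concrete is to project both paths a few months out. This toy model (every number is invented for illustration) compares the cumulative hours lost under the status quo against an up-front testing investment whose savings compound monthly:

```python
# Toy opportunity-cost projection; all numbers are illustrative.

def cumulative_cost(months: int, monthly_firefighting: float,
                    invest_upfront: float = 0.0,
                    monthly_saving: float = 0.0) -> float:
    """Total hours spent after `months`, given an optional one-off
    investment that reduces the monthly fire-fighting cost."""
    return invest_upfront + months * (monthly_firefighting - monthly_saving)

status_quo = cumulative_cost(12, monthly_firefighting=80)
with_tests = cumulative_cost(12, monthly_firefighting=80,
                             invest_upfront=160, monthly_saving=40)
print(status_quo, with_tests)
```

With these made-up numbers the investment pays for itself within a few months, which is exactly the kind of comparison “compared to what?” is asking for.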
6. Define Evaluation Criteria
Now that the team has some real options on the table, it’s time to agree on how you’ll judge them.
Without criteria, you end up debating personal opinions like “I think this is better” vs “I don’t like that.” It becomes a debate instead of a decision.
So the team should define 2–4 clear, shared criteria, like:
Impact on release stability
Speed of delivery
Effort required from the team
Long-term maintainability
Then: prioritize them. Not all criteria are equally important.
In this case, the team might agree that the top priority is:
“Reduce production bugs without blocking feature delivery.”
With that clarity, they can now look at each option and ask:
“How well does this meet our most important goal?”
This step turns a messy debate into a shared evaluation process and helps everyone feel more confident (and aligned) in the final choice.
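The criteria-plus-priorities step above is essentially a weighted decision matrix. Here is a minimal sketch, with weights and option scores invented purely for illustration:

```python
# Weighted decision matrix; weights and scores are made-up examples.

criteria_weights = {
    "release_stability": 0.4,   # top priority gets the largest weight
    "delivery_speed": 0.3,
    "team_effort": 0.15,
    "maintainability": 0.15,
}

# Each option scored 1-5 against each criterion (illustrative).
options = {
    "pause features, build full suite": {
        "release_stability": 5, "delivery_speed": 1,
        "team_effort": 2, "maintainability": 5},
    "tests for new code only": {
        "release_stability": 3, "delivery_speed": 4,
        "team_effort": 4, "maintainability": 4},
}

def weighted_score(scores: dict) -> float:
    """Sum of each criterion score multiplied by its weight."""
    return sum(criteria_weights[c] * s for c, s in scores.items())

best = max(options, key=lambda name: weighted_score(options[name]))
print(best)
```

The point is not the arithmetic; it’s that writing the weights down forces the team to agree on priorities before arguing about options.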
7. Gather Information
By this point, the team has a clear problem, a list of possible solutions, and a way to compare them. But they’re not ready to decide yet, not without real information.
Too often, decisions are based on memory, assumptions, or whoever speaks the loudest in the meeting. That’s risky.
A better approach is to look for two things:
High-quality data: What’s really happening? Look at bug reports, time spent fixing regressions, CI duration, and QA effort. Try to get numbers that are close to the source.
Expert advice: Talk to teams who’ve solved this kind of problem before. Ask: what worked? What failed? What surprised you?
In this case, the team might dig into metrics like how many bugs are reported each sprint, how long it takes to fix them, and how often things break in specific areas of the app.
They might also talk to another team in the company that recently adopted test automation, or even reach out to the broader dev community.
And don’t forget to check for biases and incentives. If someone is strongly pushing a full rewrite or a fancy new test framework, ask why. Do they have past experience or are they just excited about using a new tool?
8. Make the Decision
Now it’s time to decide, but not just based on gut feeling. There are two things to consider here:
Consequentiality: How big are the long-term effects of this choice?
Reversibility: How hard is it to undo if it goes wrong?
These two questions help you pick a decision-making strategy:
If the decision is low risk and easy to reverse, go fast. That’s ASAP: act quickly, learn quickly.
If the decision is high risk and hard to undo, go slow. That’s ALAP: wait until you have enough data.
Or maybe you hit a moment where it’s clear what to do: that’s STOP (you’ve stopped learning new information), FLOP (the window of opportunity is closing), or KNOW (you finally have clarity).
In this case, a full testing rewrite is a high-consequence, hard-to-reverse move, so ALAP makes sense. But introducing tests for new code while still shipping features is a lower-risk, reversible change that can be done ASAP.
The smart move is probably a blend: act fast on the safe stuff, and take your time on the risky stuff.
9. Set Up Fail-Safes
Even a good decision can go wrong. That’s why smart teams plan for failure: not because they expect it, but because they want to catch it early if it happens.
There are three simple tools you can use:
Alarm wires: Set a clear signal that tells you when things are off track. For example: “If we don’t see a drop in regression bugs within six weeks, we review the plan.”
Delegation: Assign responsibility to someone to monitor progress, run experiments, or own part of the plan.
Self-imposed limits: Set constraints to reduce risk. For example: “We’ll only write tests for new modules, no big rewrites allowed.”
In this story, the team could say:
“We’ll start adding tests for new features, and track how many bugs come in. If nothing improves by next quarter, we revisit.”
It’s a way of saying: we trust the plan, but we’re also keeping one hand on the brake.
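An alarm wire like the one above can even be automated. Here is a sketch of a weekly check that flags the plan for review when regression bugs haven’t dropped; the baseline, target, and data source are all hypothetical:

```python
# Alarm-wire sketch: flag the plan for review if regression bugs
# haven't dropped within the agreed window. Numbers are illustrative.

BASELINE_BUGS_PER_WEEK = 8
TARGET_DROP = 0.25  # we expect at least a 25% drop by the deadline

def alarm_tripped(recent_weekly_counts: list[int]) -> bool:
    """True when the recent average hasn't improved enough,
    meaning the team should revisit the plan."""
    average = sum(recent_weekly_counts) / len(recent_weekly_counts)
    return average > BASELINE_BUGS_PER_WEEK * (1 - TARGET_DROP)

print(alarm_tripped([8, 7, 9]))  # no improvement: review the plan
print(alarm_tripped([5, 4, 6]))  # clear drop: stay the course
```

Wiring a check like this into a weekly report removes the temptation to quietly move the goalposts when the deadline arrives.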
10. Learn from the Decision
This final step is often skipped, which is a shame, because it’s how teams get better over time.
Whether the decision goes well or not, you should:
Write down what happened: What problem were we solving? What options did we consider? What did we choose and why?
Make it visible: Share it in the team space so it doesn’t live in someone’s head or get lost in chat history.
Review it later: Come back in 1–3 months. What worked? What didn’t? What surprised us?
Even a “failed” decision can be a good one if you made it with care, and if it helped the team learn and improve.
This isn’t about blaming or proving someone wrong, it’s about building better decision-making muscles for the next time.
After going through the full framework, the team made a decision they all felt confident in: not perfect, but realistic and sustainable. They decided to start writing automated tests for all new features going forward, so they wouldn’t keep adding to the mess. At the same time, they agreed to invest some focused time each sprint to cover the most critical flows with tests, bit by bit. And finally, they set a new rule: every bug fix must include a test that would’ve caught the issue, turning each bug into an opportunity to strengthen the system. It wasn’t a full rewrite, and it wouldn’t fix things overnight, but it was a clear, practical move in the right direction.
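In practice, the “every bug fix must include a test” rule means writing a failing test that reproduces the bug first, then fixing the code so it passes. A sketch of what that looks like, with an invented `parse_price` bug purely for illustration:

```python
# Regression-test sketch for the "every bug fix includes a test" rule.
# The function and the bug are invented for illustration.

def parse_price(raw: str) -> int:
    """Parse a price like '12.50' into cents.
    Fixed bug: inputs with a leading currency symbol used to crash."""
    cleaned = raw.strip().lstrip("€$£")
    euros, _, cents = cleaned.partition(".")
    return int(euros) * 100 + int(cents or 0)

def test_parse_price_handles_currency_symbol():
    # This test failed before the fix; it pins the bug down so the
    # same regression can never ship again unnoticed.
    assert parse_price("€12.50") == 1250

test_parse_price_handles_currency_symbol()
print("regression covered")
```

Each fix shipped this way converts a production incident into a permanent guard rail, which is exactly how the team’s coverage grows without a big-bang rewrite.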
TL;DR (but not really)
Sorry folks, decision making isn’t one of those “3 tricks to change your life” kind of topics. It’s messy, uncomfortable, and takes time to get good at, like debugging someone else’s code at 6pm on a Friday.
So here’s my advice: grab a coffee, block 15 quiet minutes, and read the whole thing. Don’t just scroll; actually think about the decisions you have made (or avoided), and how these tools could have helped.