When Systems Fail: Fixing Processes, Not Just Bugs
At Amazon, we're deeply committed to moving fast. But true velocity isn't just about coding quickly; it's about building systems that allow us to move swiftly without compromising quality. This requires learning from every mistake and fixing root causes, not just symptoms.
Here's a recent example from my work: Our automated scanners flagged several potential issues across our codebase. An engineer quickly fixed a few instances, and the immediate alerts were resolved. Success, right? Not quite.
Looking deeper, I noticed these issues were in code that hadn't been used for some time. Our scanners were being "too thorough," flagging issues in code that was essentially dead weight. Instead of celebrating the quick fix, I chose to address the underlying process problem.
I wrote simple tooling to conduct a comprehensive scan of our codebase. With hundreds of engineers working for many years, we had accumulated a significant amount of unused code. This unused code didn't directly affect customers, but it created several hidden costs:
- Engineers spent time investigating non-issues
- Attention was diverted from real customer problems
- Cognitive load increased when maintaining the system
- New team members struggled to understand what was actually important
With a concerted effort across the organization, we identified and safely deprecated substantial portions of unused code. This not only eliminated false positives from our scanners but also simplified our systems and allowed engineers to focus on what truly matters.
The lessons here extend beyond this specific example:
1. When fixing bugs, always ask: "Is this a symptom of a broader process issue?"
2. Look for patterns across individual incidents
3. Build tools to systematically identify and address root causes
4. Consider the hidden costs of complexity in your systems
5. Connect every process improvement back to customer impact
Moving fast in the long term requires pausing occasionally to fix your foundations. Every bug is an opportunity not just to patch a hole, but to strengthen your entire system.
What process improvements have you made after encountering recurring bugs? How do you balance quick fixes with systematic improvements?
*I'm sharing practical lessons from my experience as a Senior Principal Technologist at Amazon. Follow me for more posts on using GenAI in technical practice, leading technical teams, making data-driven decisions, and building better systems.*
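The post does not say how its scanning tooling worked, so the following is only a minimal sketch of one possible approach, assuming a Python codebase. It lists functions that are defined but never referenced by name anywhere in the scanned sources. The function name `find_unreferenced_functions` and the sample files are hypothetical. The check is a heuristic: it misses dynamic dispatch, reflection, and callers outside the scanned files, so its output is a list of candidates for human review rather than a deletion list.

```python
import ast


def find_unreferenced_functions(sources: dict[str, str]) -> list[tuple[str, str]]:
    """Return (filename, function_name) pairs that are defined but never
    referenced by name in any of the given sources (heuristic only)."""
    defined: list[tuple[str, str]] = []
    referenced: set[str] = set()
    for filename, text in sources.items():
        tree = ast.parse(text, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined.append((filename, node.name))
            elif isinstance(node, ast.Name):
                # Plain name usage, e.g. a call like used()
                referenced.add(node.id)
            elif isinstance(node, ast.Attribute):
                # Attribute usage, e.g. module.used or obj.method
                referenced.add(node.attr)
    return sorted((f, n) for f, n in defined if n not in referenced)


# Hypothetical two-file codebase used only for illustration.
sample = {
    "a.py": "def used():\n    return 1\n\ndef unused():\n    return 2\n",
    "b.py": "from a import used\nprint(used())\n",
}
print(find_unreferenced_functions(sample))  # [('a.py', 'unused')]
```

Scanner findings that fall inside the reported candidates could then be triaged separately from findings in live code, which is the kind of false-positive reduction the post describes.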
How Engineers Drive System-Level Improvements
Summary
System-level improvements refer to changes that reshape how entire processes or organizations work, rather than just fixing individual problems. Engineers drive these improvements by designing smarter workflows, clarifying goals, and connecting technical decisions to broad outcomes that impact team productivity and customer satisfaction.
- Streamline complexity: Remove unnecessary steps and outdated code so teams can focus on solving real challenges instead of getting lost in busywork.
- Clarify objectives: Build a culture where everyone understands the purpose behind their work, enabling smarter decisions and faster progress across the board.
- Empower end-to-end ownership: Encourage engineers to take responsibility for entire product areas, including customer feedback and strategy, so quality stays high and innovation thrives.
-
I thought systems engineers were just glorified project managers.
↳ I assumed they were unnecessary overhead.
↳ I believed they only slowed down the development process.
↳ I was convinced our team could handle everything without them.
Boy, was I wrong. Let me take you back to the project that changed my mind...
We were developing a cutting-edge automotive safety system. Deadlines were looming, budgets were tight, and interdepartmental conflicts were rife. It was a perfect storm of chaos.
Our VP suggested bringing in a systems engineer. I rolled my eyes. "Great," I thought. "Another 'expert' to tell us how to do our jobs."
But here's what actually happened:
1. The systems engineer mapped out the entire project ecosystem.
2. Cross-functional communication improved dramatically.
3. Potential risks were identified and mitigated before they became issues.
4. Integration challenges were solved proactively.
The result? We delivered the project 6 weeks early and 12% under budget.
But don't just take my word for it. Let's look at some hard data:
- A study by the International Council on Systems Engineering found that projects with effective systems engineering are 50% more likely to meet their objectives.
- The National Defense Industrial Association reported that high-performing projects using systems engineering had a 57% success rate, compared to just 15% for those with low systems engineering capability.
- NASA credits systems engineering for reducing their project failure rate from 1 in 4 to less than 1 in 100.
The numbers don't lie. Systems engineers are the unsung heroes of complex projects. They're the glue that holds interdisciplinary teams together, the visionaries who see the big picture, and the problem-solvers who tackle challenges before they become showstoppers.
My skepticism has transformed into advocacy. Now, I wouldn't dream of starting a complex project without a systems engineer on board.
Have you had a similar experience? Did a systems engineer save your project from disaster? Share your stories below. Let's start a conversation about the hidden superpowers of systems engineering in the automotive industry.
#SystemsEngineering #AutomotiveInnovation #ProjectSuccess #EngineeringLeadership
-
The best systems need the least management. Yet we keep adding steps, checkpoints, and approvals.
I used to believe great companies were built on comprehensive processes. My first startup had detailed procedures for everything: each sales interaction, support ticket, and feature release followed a precise playbook. As we scaled, our process documentation grew faster than our revenue. Team velocity slowed. Innovation suffered. Talented people spent more time following protocols than solving problems.
The turning point came when we rebuilt our approach around outcomes instead of activities:
1️⃣ We replaced activity metrics ("number of calls made") with outcome metrics ("deals progressed")
2️⃣ We stopped documenting how tasks should be done and started defining what success looked like
3️⃣ We built automated guardrails instead of manual checkpoints
4️⃣ We focused quality control on system inputs and outputs, not every step in between
The results were transformative. Teams moved faster. Quality improved. People stayed energized.
Business process exists to manage risk and ensure quality, both valid concerns. But most companies implement these controls at the tactical level when they belong at the systems level.
Think of it like this: You can micromanage a road trip by dictating every turn, or you can set a destination, provide a reliable vehicle with good brakes, and trust the driver to navigate. The difference is critical. Tactical processes control behaviors while systems-level thinking shapes environments.
Some practical shifts to consider:
1️⃣ Replace decision chains with clear boundaries and after-action reviews
2️⃣ Substitute detailed instructions with clear success criteria
3️⃣ Trade activity monitoring for outcome measurement
4️⃣ Swap manual checks for automated testing
5️⃣ Replace rigid workflows with principles and guardrails
Design systems that make quality inevitable, not processes that make errors impossible. Operational excellence is fundamentally about outcome clarity, not process quantity.
#startups #founders #growth #ai
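As one illustration of an "automated guardrail" on system outputs, here is a minimal sketch, not the author's actual setup. It assumes a release step that reads a few outcome metrics and blocks only when one crosses a limit, in place of a manual sign-off at every step. The `Guardrail` class, the `violations` function, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """An upper bound on one outcome metric (hypothetical structure)."""
    metric: str
    max_value: float


def violations(metrics: dict[str, float], guardrails: list[Guardrail]) -> list[str]:
    """Return a human-readable message for each guardrail that is breached
    or whose metric is missing; an empty list means the change may proceed."""
    problems: list[str] = []
    for g in guardrails:
        value = metrics.get(g.metric)
        if value is None:
            # A missing measurement is treated as a failure, not a pass.
            problems.append(f"{g.metric}: missing")
        elif value > g.max_value:
            problems.append(f"{g.metric}: {value} exceeds {g.max_value}")
    return problems


# Hypothetical limits and measurements, for illustration only.
limits = [Guardrail("error_rate", 0.01), Guardrail("p95_latency_ms", 500.0)]
observed = {"error_rate": 0.02, "p95_latency_ms": 300.0}
print(violations(observed, limits))  # ['error_rate: 0.02 exceeds 0.01']
```

The design choice mirrors the post's argument: the check constrains the outcome (error rate, latency) and says nothing about how the team produced the change.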
-
How I made my engineering team 10x more productive, without hiring a single person. I just started beating the drum. Not of velocity. Not of deadlines. But of clarity. Every day, I told the story of what mattered: Who the customer was. What problem we were solving. Why it mattered now. I repeated it in standups, roadmap reviews, code reviews, and 1:1s. I made the vision visible until the team could repeat it without me. And then something changed. Engineers stopped waiting for perfect specs. They started asking better questions. They scoped more intentionally. They stopped building “just in case” solutions and started delivering exactly what was needed. We didn’t change the process. We didn’t add new tools. We just made clarity the norm. The result? Fewer delays. Smarter trade-offs. Less rework. Faster progress. And a team that wasn’t just moving faster, but building what mattered. You don’t need more people to scale. You need more clarity. Especially for engineers. Because when the goal is fuzzy, even the best teams slow down. But when clarity is built into the culture, the whole system speeds up. How do you make clarity unavoidable inside your team?
-
Rethinking Requirements in Hardware Engineering
Requirements management isn’t just about checklists: it’s the difference between effective collaboration and costly missteps. Here are once-unconventional approaches to requirements now embraced by top teams:
1. From “Requirements” to “Design Criteria”
Early systems engineers were part engineer, part lawyer. Someone had to create “techno-legal documents” to manage external contracts. These evolved into requirements. Many cultural issues stem from using requirements incorrectly, as a weapon rather than a tool for collaboration. Not all requirements need to be treated as commandments. Reframing lower-level requirements as design criteria reduces resistance among engineers, empowering them to see requirements as flexible guidelines open to questioning and adjustment. This is what you want to inspire.
2. Culture of Ownership and Accountability Drives Agility
A strong requirements culture is built when engineers “own” their work. Engineers must take responsibility for the requirements they design against, creating a culture of ownership, responsibility, and systems-mindedness. Assigning a clear, single-point owner for each requirement, even across domains, encourages each engineer to think critically about their area’s requirements, establishing ownership and trust in the process. Encouraging information flow between teams helps engineers see how their work impacts others, leading to fewer misalignments and stronger system integration. Requirements should be viewed as evolving assets, not static documents. You want engineers to push back on requirements and eliminate unnecessary systems rather than add more requirements, complexity, or systems.
3. Requirements as Conversations, Not Just Checklists
Requirements aren’t just specs or checklists; they’re starting points for cross-functional discussions. Every problem is a systems problem, and to solve complex challenges, engineers must be systems thinkers first and domain experts second. In traditional settings, requirements stay isolated in documents. But when teams understand why requirements exist, where they come from, and who owns them, and engage in continuous dialogue, they blur the lines between domains and foster a systems-oriented mindset. This collaborative environment accelerates problem-solving, enabling engineers to align quickly and tackle challenges together. Instead of siloed requirements for each subsystem, drawing dotted lines and encouraging information flow between teams helps engineers understand how their work affects others. This cross-functional awareness leads to fewer misalignments and stronger system integration. When you see engineers make sacrifices in their own area to benefit the overall system, you know you are on the right track.
There you have it. The full guide goes into specifics on how to start implementing these ideas in tools.
-
Our engineering team works directly with customers in dedicated Slack channels. No middlemen 🙅♂️ When customers and developers can connect directly, it creates a cycle of immediate improvement. Our engineers witness problems firsthand, fix them faster, and build features based on actual usage patterns rather than guesswork about what customers might want. Once, a customer struggled with an EIP-712 transaction. The error wasn't obvious from our logs, but within minutes, our engineer identified the issue and helped them resolve it. We also spotted a recurring bug and built a minor feature update based on our findings. Not every company can or should expose its engineering team this way. It requires mature developers who communicate well, set clear boundaries around response times, and provide explicit documentation of feature requests versus quick fixes. The payoff is huge, though: engineers build with real users in mind rather than abstract personas. Our transaction success rates and reliability metrics have improved since implementing this system. Direct customer-to-engineer communication shouldn't be a revolutionary concept. It should be standard for infrastructure companies that are really serious about building solutions.
-
Engineering productivity is a commonly ignored area until the signals start showing up: release velocity drops, builds slow down, and delivery speed becomes a bottleneck. Your first reaction is often to add more engineers or push harder on KPIs, but true impact comes from removing friction, not overlooking it.
Over time, you will see real productivity gains when teams focus on improving developer flow, not just output. A few levers that consistently move the needle:
🔹 AI-assisted code reviews and documentation: cutting repetitive review cycles and improving clarity.
🔹 Intelligent test generation & auto-healing pipelines: reducing flaky tests and CI/CD bottlenecks.
🔹 Self-service environments & one-click setups: empowering developers to experiment and debug faster.
🔹 Observability baked into development: shifting feedback loops left with real-time insights.
🔹 Culture of enablement: where engineers are trained to use tools smartly.
Productivity isn’t just about doing things faster; it’s about building systems, tools, and cultures that make 'fast' feel natural. Give engineers their time back to innovate, and they’ll give you high-quality, scalable products.
What’s one AI-driven or tooling shift that made your engineering team genuinely faster or happier this year?
#EngineeringProductivity #DevEx #DeveloperExperience #SoftwareLeadership #AIDrivenDevelopment #EngineeringCulture #TechLeadership
-
We have zero product managers at Orbital. And product quality has never been higher. So what do we do instead? Every engineer we've hired in the last three years has taken on PM responsibilities. Here's how we set up our engineers to also own product domains:
1️⃣ Assign clear product areas
Each engineer owns a feature or a product domain, like enrichment or inbound in our case. That means they’re not just responsible for building it, but for driving the strategy behind it.
2️⃣ Let them advocate for customers
Last week, an engineer challenged me. "We don't have scheduling in our outbound feature. I watched customer calls, and I know it's critical." He wasn't told to do this. He saw the gap because he had deep customer context and authority. Then he brought it to me.
3️⃣ Normalize product thinking
When a demo was failing yesterday, an engineer jumped in and fixed it within 60 seconds. Not because it was her job, but because she owned that piece of the product. (Shoutout to Tulasi) We expect engineers to operate like domain experts and mini-PMs. That means end-to-end ownership: competitor research, prioritizing fixes vs. features, identifying edge cases before they become bugs, and pushing for UX improvements when something feels off.
4️⃣ Create space for contention
As a founder, don’t assume you have all the answers. We encourage engineers to challenge roadmap decisions if they have better info. And they often do. That pushback leads to better decisions based on data and customer insights.
You don’t need a big team or polished processes to build a great product. You need engineers who think beyond the code and care deeply about the customer experience. That’s what keeps quality high, even when you don’t have PMs.
-
I’ve spent nearly a decade mentoring, guiding, and managing engineers as a Principal Engineering Manager. And whenever someone asks me: “How do I grow to the next level?” Most people expect answers like:
→ System design
→ Code quality and other factors
Those are hygiene. The real shift happens in how you choose to create impact. And over the years, I’ve seen two clear types of growth paths:
1. Incremental Engineers
They’re reliable. They’re consistent. They keep the system afloat when no one else is looking. They reduce tech debt quietly. They make alerts go from noisy to signal-rich. They take a flaky test case and make it production-ready. This has quiet impact, but it builds over time. You get compound credibility with this type of working style. You trust them with anything because they’ve earned it, over 100 small wins, not one loud one.
2. Transformational Engineers
These folks move like product managers with a deep tech context. They see the bottleneck, and instead of working around it, they rewire the system. They introduce a new infra layer. They sunset a decade-old tool. They change how an entire team ships software. Their strength is conviction + velocity. They’re not afraid to take bets. They know when to ship fast, when to pause, and how to bring people along.
But you don’t have to be just one. The best engineers I’ve worked with knew when to go heads-down and when to go head-first.
→ When to say “I’ll fix it”
→ And when to say “Let’s rethink it”
But if you’re always fixing bugs, adding logs, and cleaning old scripts, you're growing sideways. And if you're always chasing big changes without owning the maintenance, you're risking team trust. Real engineering growth comes from mastering both mindsets.
→ Incremental for stability.
→ Transformational for velocity.
And the judgment to know when to switch. That’s what takes you from respected to irreplaceable.
-
Another Wake-Up Call: This Time From Sydney.
A high-voltage power line fell on a Sydney train carriage. And the entire city felt the impact. Trains stopped. Commuters were evacuated. Entire lines were suspended.
But for those of us in electrical engineering, the bigger question is this: How does a critical failure like this even happen? And more importantly, why are we still surprised when it does? This wasn’t just a fluke. It was a failure across systems: design, maintenance, and coordination. And as engineers, we don’t just look at what failed. We ask, what should have been in place to prevent it?
Having worked in power system protection within high-voltage transmission networks, I know how delicate the balance is between stability and chaos. One miscalculation. One delayed inspection. One point of communication failure, and the system becomes vulnerable.
Here’s what this incident reinforces:
• Protection systems must anticipate worst-case scenarios, not just typical faults.
• Aging infrastructure must be monitored, assessed, and upgraded before the cost becomes human.
• There’s no room for siloed systems. Power and transport networks must operate with shared risk models and integrated safety protocols.
• Edge conditions are the new normal. Our systems must evolve to reflect that.
It’s easy to say, “Thankfully no one was hurt.” But as engineers, we’re not in the business of hope. We’re in the business of prevention. This is why we must build resilient, responsive, and intelligent systems, with protection strategies that reflect the complexity of the world we live in today.
So I ask you: Are our current design standards still enough? Are our emergency plans actually executable? And are we, as professionals, pushing for continuous improvement, or just reacting when things break?
Let’s not wait for the next incident to have this conversation. Let’s lead it now, because every system we design carries the weight of public trust.
Hanane Oudli 🌍 #EIT #ElectricalEngineering #PowerSystems #Engineering #EngineeringLeadership