Book Review: The Art of Troubleshooting by Jason Maxham

Last time I visited my mother’s house she put me to work fixing up an older model big screen TV system. I got down to business armed with nothing more than “it doesn’t turn on” and “I’m fed up with it”.

Standing there staring at the blank screen got me thinking about the process of troubleshooting itself. I live far away from her so I wondered, how could I teach her to fish instead of just feeding her for one day, so to speak? Actually…I’m no TV mechanic either, so what makes me qualified to tackle this problem? Most of my relevant knowledge has nothing to do with this setup but rather with how to follow a process for fixing mechanical systems in general.

Turns out it’s not as easy to communicate the troubleshooting process at the macro level as you might think. Answering specific ‘how to’ type questions is no problem nowadays with some help from Google, but it is more challenging to describe why you chose one specific course of action over the infinite number of alternatives.

Why did you inspect that thing first? How much time should you spend inspecting before you change something & test the system? How did you come up with the right question to ask? And who else thinks about weird things like this?

Enter Jason Maxham, experienced software engineer and author of the book The Art of Troubleshooting, and blog of the same name. Using lessons learned while working at a startup company experiencing tremendous growth and from interviews with 10 expert troubleshooters, Jason set out on a mission to systematically name names and define the universally applicable general principles of troubleshooting.

I found his book well worth the read as a source of quality food for thought as well as specific techniques to improve your mental problem solving processes. At 356 pages it’s not a quick read but Jason didn’t waste words and managed to maintain a utilitarian perspective throughout.

To top it off, Jason shared his work online for free! (See links above) Clearly, I think highly of this book and am surprised that it took randomly stumbling across it via online search to find it. So I’d like to introduce you to some of the key ideas here.

Inventing, engineering, manufacturing, and troubleshooting are each their own unique discipline, and together they make up the development process.

Though each discipline is different, they all require related skills. Techniques learned in the field of troubleshooting (the focus of Jason’s book) can be gainfully applied in the other three fields.

This is because the underlying principles of fixing things are always the same. That’s ‘principles’ in plural because no single troubleshooting process can be both universally applicable and specific enough to be useful.

Sure, a quick Google search for ‘universal troubleshooting process’ yields 4.1 million fancy flow charts. But when you start asking the right questions you find that all of those charts fail to fully meet both criteria. The more universal the process, the less useful it becomes, and vice versa. The only truly universal process is “find the problem…then fix it”, and that is hardly useful.

Instead, Jason offers a set of ‘universal strategies’ from which you, the troubleshooter, must select the most applicable. You have to make a judgement call as to which strategy is best suited for a particular problem, and that is why troubleshooting is an art. (Indeed, the importance of your subjective opinion is why there is an ‘A’ (art) in educational ‘STEAM’ programs.)

These strategies are intended to be actionable, proven recipes for framing your perspective & asking the right questions. They are provided in no specific order, and Jason managed to fit all of them onto a single-page summary that you can find at the link here, and that also hangs over my desk.

Not that I think you need to consult the summary every time you need to troubleshoot. Rather, a thorough review once in a while can put you in the right state of mind for working through tough issues. Often, all it takes is knowledge of the existence of a technique (once hard to articulate, now clearly stated) to give you an idea to solve your problem.

I find it interesting that the same list is intended to be useful across disciplines. According to Jason, these techniques work for any machine when you define a ‘machine’ as anything that can accomplish work & malfunction, and that can also be composed of smaller machines. In that context his advice can be applied to fixing cars, debugging software, or determining the root cause of issues in an R&D lab. As mentioned above, the underlying principles of fixing things are all the same!
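
For the software-minded, that definition of a ‘machine’ maps neatly onto a recursive data structure. Here is a minimal sketch in Python. The `Machine` class, its field names, and the TV parts are all made up for illustration; none of them come from Jason’s book.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Machine:
    """Anything that can do work, can malfunction, and may contain smaller machines."""

    name: str
    working: bool = True  # whether this machine's own function is OK (assumed flag)
    parts: list[Machine] = field(default_factory=list)

    def faults(self) -> list[str]:
        """Return a path to every broken machine, searching sub-machines recursively."""
        found = [] if self.working else [self.name]
        for part in self.parts:
            # Prefix each fault found in a part with this machine's name.
            found += [f"{self.name} > {path}" for path in part.faults()]
        return found


# Hypothetical setup: a system made of smaller machines, one of which is broken.
tv_system = Machine("TV system", parts=[
    Machine("television"),
    Machine("receiver", parts=[
        Machine("power cable", working=False),  # the unplugged cable
        Machine("HDMI cable"),
    ]),
])

print(tv_system.faults())  # ['TV system > receiver > power cable']
```

The point is only that the same walk works whether the parts are cables, engine components, or software modules; a real system rarely hands you a convenient `working` flag, which is where the strategies come in.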

Since all machines are designed by humans, Jason goes on to say that all machine problems are human problems. Machines are designed, utilized, and maintained by humans, so the ultimate cause of failure is of human origin.

Car engine seize from an overdue oil change? The designer could have made servicing the machine easier (for the human error of laziness), included a better notification system (for the human error of forgetfulness), or designed the vehicle so as to minimize service frequency (for both). You get the idea.

In that light, even having a great toolbox of troubleshooting strategies is not enough to achieve mastery of the process. Though it may seem unrelated to the world of machines, it turns out that your ability to troubleshoot problems is limited not only by your technical knowledge but by your attitude, behaviours, and mindset as well.

Jason goes on to describe in detail what virtues an expert troubleshooter should possess: skepticism, listening, curiosity, being organized and systematic, creativity, presence, and the ability to set boundaries, deliberately choose your commitments, & set expectations. All of these are attributes that are within your ability to control.

Jason ends his story with a final call to action: share your findings.

Even after you solve the problem, the job is not complete until you share what you have learned.  Your efforts won’t benefit your team or society at large until your results are communicated. Staying silent will result in seeing a problem repeated, and as Jason says, a good firefighter would prefer not to see a house engulfed in flames.

(This note in particular rings true to my ears. A recurring theme of this blog is things I wish I had learned sooner, that is, mistakes I may not have made had folks before me shared their findings first!)

So next time you are stuck on a tricky problem, consider the advice from Jason’s The Art of Troubleshooting. Try out a few of the proven strategies, apply yourself to possessing the virtues of a good troubleshooter, and when you finally work through the problem don’t forget to share your findings!

Oh yeah, what about my mom’s broken big screen TV? Turned out to be a random cable unplugged. To her credit, this system had lots of peripherals hooked up and there were dusty, similar-looking cables everywhere. For future reference, see strategies: “Can I reduce complexity (by turning off unneeded features)” & “is it plugged in?”
