MessiandNeymar

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Tuesday, February 28, 2012

James Hamilton studies some failures

Posted on 2:32 PM by Unknown

Regular readers of my blog will know that I'm an enthusiastic proponent of the study of failure. When something unexpectedly goes wrong, there is always something to learn.

Thankfully, James Hamilton is a big supporter of that point of view as well, and happens to have written several wonderful essays over the past few weeks on the topic.

Firstly, Hamilton wrote about the Costa Concordia grounding, and then followed that up with a second essay responding to some of the feedback he got. This is obviously still an active investigation and we are continuing to learn a lot from it. Hamilton's essay has some wonderful visual charts illustrating the accident, and speculating on some of what was occurring, together with a massive amount of supporting information discussing what is currently known.

My favorite part of Hamilton's essay, though, is his conclusion:

What I take away from the data points presented here is that experience, ironically, can be our biggest enemy. As we get increasingly proficient at a task, we often stop paying as much attention. And, with less dedicated focus on a task, over time, we run the risk of a crucial mistake that we probably wouldn’t have made when we were effectively less experienced and perhaps less skilled. There is danger in becoming comfortable.
Very true, and very important words. Not to reduce it to the overly-mundane, but I recently got a traffic ticket for rolling through a stop sign and I had my opportunity for that once-a-decade visit to Traffic School. Although the fines and wasted time were an annoyance, it was clear by the end of Traffic School that in fact my 35 years of driving experience have become somewhat of an enemy; there were many specific details about how to drive safely and legally that I was no longer paying attention to, which the course materials recalled to the front of my mind.

There is, indeed, danger in becoming comfortable.

Secondly, Hamilton wrote about another fascinating incident: the loss of the Russian space mission Phobos-Grunt.

As Hamilton notes, there is a very interesting report on this incident in the IEEE Spectrum magazine: Did Bad Memory Chips Down Russia’s Mars Probe?.

But, as Hamilton observes, although the analysis of memory chips and radiation effects and system faults is fascinating and valuable, there is a further, deeper sort of failure:

Upon double failure of the flight control systems, the spacecraft autonomously goes into “safe mode” where the vehicle attempts to stay stable in low-earth orbit and orients its solar cells towards the sun so that it continues to have sufficient power.

...

Unfortunately there was still one more failure, this one a design fault. When the spacecraft goes into safe mode, it is incapable of communicating with earth stations, probably due to spacecraft orientation. Essentially if the system needs to go into safe mode while it is still in earth orbit, the mission is lost because ground control will never be able to command it out of safe mode.

...

Systems sufficiently complex enough to require deep vertical technical specialization risk complexity blindness. Each vertical team knows their component well but nobody understands the interactions of all the components.

Kudos to Hamilton for the well-researched and thoughtful observations, and for providing all the great pointers for those of us who, like him, love studying failures and their causes.

What failure will we be studying next? Well, it sure looks like there's a lot to learn from this one: The Air Force Still Doesn’t Know What’s Choking Its Stealth Fighter Pilots.

America’s newest stealth fighters have a major problem: their pilots can’t breathe, due to some sort of malfunction in the planes’ oxygen-generation systems. For months, the Air Force has been studying the problem, which temporarily grounded the entire fleet of F-22 Raptors and may have contributed to a pilot’s death. Today, the Air Force admitted they still don’t know exactly what’s causing the issue.
It looks like this question has been under study for several years, and may still take some time to resolve. The Wired article has a number of pointers to previous articles about the problem. I'll keep an eye on this one, eager to learn from the detailed analysis of the failures.
Email ThisBlogThis!Share to XShare to FacebookShare to Pinterest
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Shelter
    I meant to post this as part of my article on Watership Down , but then totally forgot: Shelter In Shelter you experience the wild as a moth...
  • The Legend of 1900: a very short review
    Fifteen years late, we stumbled across The Legend of 1900 . I suspect that 1900 is the sort of movie that many people despise, and a few peo...
  • Rediscovering Watership Down
    As a child, I was a precocious and voracious reader. In my early teens, ravenous and impatient, I raced through Richard Adams's Watershi...
  • Must be a heck of a rainstorm in Donetsk
    During today's Euro 2012 match between Ukraine and France, the game was suspended due to weather conditions, which is a quite rare occur...
  • Beethoven and Jonathan Biss
    I'm really enjoying the latest Coursera class that I'm taking: Exploring Beethoven’s Piano Sonatas . This course takes an inside-out...
  • Starting today, the games count
    In honor of the occasion: The Autumn Wind is a pirate, Blustering in from sea, With a rollocking song, he sweeps along, Swaggering boisterou...
  • Parbuckling
    The enormous project to right and remove the remains of the Costa Concordia is now well underway. There's some nice reporting on the NP...
  • For your weekend reading
    I don't want you to be bored this weekend, so I thought I'd pass along some articles you might find interesting. If not, hopefully y...
  • Are some algorithms simply too hard to implement correctly?
    I recently got around to reading a rather old paper: McKusick and Ganger: Soft Updates: A Technique for Eliminating Most Synchronous Writes ...
  • Don't see me!
    When she was young, and she had done something she was embarrassed by or felt guilty about, my daughter would sometimes hold up her hand to ...

Blog Archive

  • ►  2013 (165)
    • ►  September (14)
    • ►  August (19)
    • ►  July (16)
    • ►  June (17)
    • ►  May (17)
    • ►  April (18)
    • ►  March (24)
    • ►  February (19)
    • ►  January (21)
  • ▼  2012 (335)
    • ►  December (23)
    • ►  November (30)
    • ►  October (33)
    • ►  September (34)
    • ►  August (29)
    • ►  July (39)
    • ►  June (27)
    • ►  May (48)
    • ►  April (32)
    • ►  March (30)
    • ▼  February (10)
      • James Hamilton studies some failures
      • Online cryptography class delayed again
      • Download the Universe
      • How do we peer-review code?
      • Time for the next generation
      • ABC follows up on the NYT Foxconn story
      • Pricing strategies and bots
      • Code Reviews
      • Crowdsourcing the forecast
      • Yeah, I know how that feels
Powered by Blogger.

About Me

Unknown
View my complete profile