MessiandNeymar


Tuesday, February 28, 2012

James Hamilton studies some failures

Posted on 2:32 PM by Unknown

Regular readers of my blog will know that I'm an enthusiastic proponent of the study of failure. When something unexpectedly goes wrong, there is always something to learn.

Thankfully, James Hamilton is a big supporter of that point of view as well, and happens to have written several wonderful essays over the past few weeks on the topic.

Firstly, Hamilton wrote about the Costa Concordia grounding, and then followed that up with a second essay responding to some of the feedback he got. This is obviously still an active investigation and we are continuing to learn a lot from it. Hamilton's essay has some wonderful visual charts illustrating the accident, and speculating on some of what was occurring, together with a massive amount of supporting information discussing what is currently known.

My favorite part of Hamilton's essay, though, is his conclusion:

What I take away from the data points presented here is that experience, ironically, can be our biggest enemy. As we get increasingly proficient at a task, we often stop paying as much attention. And, with less dedicated focus on a task, over time, we run the risk of a crucial mistake that we probably wouldn’t have made when we were effectively less experienced and perhaps less skilled. There is danger in becoming comfortable.
Very true, and very important words. Not to reduce it to the overly mundane, but I recently got a traffic ticket for rolling through a stop sign, which earned me that once-a-decade visit to Traffic School. Although the fines and wasted time were an annoyance, it was clear by the end of Traffic School that my 35 years of driving experience have indeed become something of an enemy; there were many specific details about driving safely and legally that I was no longer paying attention to, which the course materials recalled to the front of my mind.

There is, indeed, danger in becoming comfortable.

Secondly, Hamilton wrote about another fascinating incident: the loss of the Russian space mission Phobos-Grunt.

As Hamilton notes, there is a very interesting report on this incident in IEEE Spectrum magazine: Did Bad Memory Chips Down Russia’s Mars Probe?

But, as Hamilton observes, although the analysis of memory chips and radiation effects and system faults is fascinating and valuable, there is a further, deeper sort of failure:

Upon double failure of the flight control systems, the spacecraft autonomously goes into “safe mode” where the vehicle attempts to stay stable in low-earth orbit and orients its solar cells towards the sun so that it continues to have sufficient power.

...

Unfortunately there was still one more failure, this one a design fault. When the spacecraft goes into safe mode, it is incapable of communicating with earth stations, probably due to spacecraft orientation. Essentially if the system needs to go into safe mode while it is still in earth orbit, the mission is lost because ground control will never be able to command it out of safe mode.

...

Systems sufficiently complex enough to require deep vertical technical specialization risk complexity blindness. Each vertical team knows their component well but nobody understands the interactions of all the components.

Kudos to Hamilton for the well-researched and thoughtful observations, and for providing all the great pointers for those of us who, like him, love studying failures and their causes.

What failure will we be studying next? Well, it sure looks like there's a lot to learn from this one: The Air Force Still Doesn’t Know What’s Choking Its Stealth Fighter Pilots.

America’s newest stealth fighters have a major problem: their pilots can’t breathe, due to some sort of malfunction in the planes’ oxygen-generation systems. For months, the Air Force has been studying the problem, which temporarily grounded the entire fleet of F-22 Raptors and may have contributed to a pilot’s death. Today, the Air Force admitted they still don’t know exactly what’s causing the issue.
It looks like this question has been under study for several years, and may still take some time to resolve. The Wired article has a number of pointers to previous articles about the problem. I'll keep an eye on this one, eager to learn from the detailed analysis of the failures.

Online cryptography class delayed again

Posted on 7:11 AM by Unknown

Prof. Boneh's online cryptography class has been delayed again. The announcement says "We now expect that the course will start either late in February or early in March," further explaining that "There have naturally been legal and administrative issues to be sorted out in offering Stanford classes freely to the outside world, and it's just been taking time."

Still keeping my fingers crossed...


Monday, February 27, 2012

Download the Universe

Posted on 1:25 PM by Unknown

What a wonderful idea!


How do we peer-review code?

Posted on 8:12 AM by Unknown

There's a fascinating article in the current issue of Nature magazine online: The case for open computer programs, by Darrel C. Ince, Leslie Hatton & John Graham-Cumming.

The article deals with the problem of successfully and adequately peer-reviewing scientific research in this age of experiments which are supported by extensive computation.

However, there is the difficulty of reproducibility, by which we mean the reproduction of a scientific paper’s central finding, rather than exact replication of each specific numerical result down to several decimal places.

There are some philosophy-of-science issues that are debated in the article, but in addition one of the core questions is this: when attempting to reproduce the results of another's experiment, the reviewers may need to reproduce the computational aspects as well as the data-collection aspects. Is the reproduction of the computational aspects of the experiment best performed by:

  1. taking the original experiment's literal program source code, possibly code-reviewing it, and then re-building and re-running it on the new data set, or
  2. taking a verbal specification of the original experiment's computations, possibly design-reviewing that specification, and then re-implementing and re-running it on the new data set?

Hidden within the discussion is the challenge that, in order for the first approach to be possible, the original experiment must disclose and share its source code, which is currently not a common practice. The authors catalog a variety of current positions on the question, noting specifically that “Nature does not require authors to make code available, but we do expect a description detailed enough to allow others to write their own code to do similar analysis.”

The authors find pros and cons to both approaches. Regarding the question of trying to reproduce a computation from a verbal specification, they observe that:

Ambiguity in program descriptions leads to the possibility, if not the certainty, that a given natural language description can be converted into computer code in various ways, each of which may lead to different numerical outcomes. Innumerable potential issues exist, but might include mistaken order of operations, reference to different model versions, or unclear calculations of uncertainties. The problem of ambiguity has haunted software development from its earliest days.
which is certainly true. It is very, very hard to reproduce a computation given only a verbal description of it.
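As a toy illustration (entirely my own, not drawn from the Nature article), consider the verbal specification "compute the average ratio of measurement to baseline." It can be coded two defensible ways that produce different numbers:

```python
# Hypothetical data; the point is the ambiguity, not the values.
measurements = [2.0, 3.0, 10.0]
baselines = [1.0, 2.0, 4.0]

# Reading 1: the mean of the per-sample ratios.
mean_of_ratios = sum(m / b for m, b in zip(measurements, baselines)) / len(measurements)

# Reading 2: the ratio of the two means.
ratio_of_means = (sum(measurements) / len(measurements)) / (sum(baselines) / len(baselines))

print(mean_of_ratios)  # 2.0
print(ratio_of_means)  # ~2.142857
```

Both implementations are faithful to the sentence, yet they disagree, and a team re-implementing from the description alone has no way to know which one the original authors meant.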

Meanwhile, they observe that computer programming is also very hard, and there may be errors in the original experiment's source code, which could be detected by code review:

First, there are programming errors. Over the years, researchers have quantified the occurrence rate of such defects to be approximately one to ten errors per thousand lines of source code.

Second, there are errors associated with the numerical properties of scientific software. The execution of a program that manipulates the floating point numbers used by scientists is dependent on many factors outside the consideration of a program as a mathematical object.

...

Third, there are well-known ambiguities in some of the internationally standardized versions of commonly used programming languages in scientific computation.
which is also certainly true.
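The floating-point point, in particular, is easy to demonstrate: floating-point addition is not associative, so something as innocuous as the order of evaluation (a choice the compiler or programmer makes, not the mathematics) can change a result. A minimal sketch:

```python
# Floating-point addition is not associative: regrouping the same
# three terms changes the answer.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # cancellation happens first, so the 1.0 survives
right = a + (b + c)  # 1.0 is below the rounding granularity of 1e16 and is lost

print(left)   # 1.0
print(right)  # 0.0
```

Two programs that both "sum the values" can thus legitimately disagree, which is exactly why a program's behavior cannot be treated purely as a mathematical object.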

The authors conclude that high-quality science would be best-served by encouraging, even requiring, published experimental science to disclose and share the code that the experimenters use for the computational aspects of their finding.

Seems like a pretty compelling argument to me.

One worry I have, which doesn't seem to be explicitly discussed in the article, is that programming is hard, so if experimenters routinely disclose their source code, others attempting to reproduce those results might simply take the existing source code and re-use it without thoroughly studying it. That could produce a worse outcome: an undetected bug in the original program would propagate into the second reproduction, and might gain further validity. Had the second team instead re-written the code from first principles, their independent implementation might well not have contained the same bug, and the likelihood of finding the problem would be greater.

Anyway, it's a great discussion and I'm glad to see it going on!


Sunday, February 26, 2012

Time for the next generation

Posted on 9:33 AM by Unknown

My daughter, who is studying computer programming using the Processing language, happened to be home (briefly) over the weekend, and one of her requests was about how she could start learning about Linux.

So off we went to the Ubuntu site, where a quick click on "Run it alongside Windows" took us to the Wubi installer.

Forty-five minutes later, she was up and running Linux, poking about, asking questions, and generally online.

Make way, world, here comes the next generation!


Thursday, February 23, 2012

ABC follows up on the NYT Foxconn story

Posted on 4:53 PM by Unknown

Remember that New York Times article on the Foxconn factories that I blogged about last month?

Well, the ABC Nightline staff have followed up on that story, and David Pogue of the New York Times covers the ABC Nightline findings in a follow-up article on the New York Times website.

Sounds like there's still a lot to learn; I'm pleased that the media organizations are really devoting some resources to trying to do some serious journalism here.

UPDATE: Mike Daisey, whose monologue about Apple and Foxconn helped bring the story to wide attention, follows up some more, on his personal blog.


Pricing strategies and bots

Posted on 2:53 PM by Unknown

From an intriguing post by Carlos Bueno: How Bots Seized Control of My Pricing Strategy:

we have a delightful futuristic absurdity: a computer program, pretending to be human, hawking a book about computers pretending to be human, while other computer programs pretend to have used copies of it. A book that was never actually written, much less printed and read.
The mind reels.

If this happens to be the first time that you've thought about pricing bots, drop everything else you're doing and go read Michael Eisen's great essay: Amazon’s $23,698,655.93 book about flies.
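The feedback loop Eisen documented is simple enough to sketch: one seller's bot undercut the competitor by a fixed factor each day, the other priced a fixed factor above it, and since the product of the two factors exceeded 1, the price grew exponentially. A toy simulation (the multipliers 0.9983 and 1.270589 are the ones Eisen reports; the starting price is my own invention):

```python
# Two repricing bots locked in a daily feedback loop.
# Multipliers are from Eisen's essay; the starting price is hypothetical.
price_a = price_b = 100.0
for day in range(30):
    price_a = 0.9983 * price_b     # seller A: slightly undercut seller B
    price_b = 1.270589 * price_a   # seller B: price above A, counting on its reputation
    # combined daily growth factor: 0.9983 * 1.270589 ~ 1.268

print(f"after 30 days, seller B is asking ${price_b:,.2f}")
```

With a combined daily factor of about 1.268, a $100 book passes $100,000 in a month, which is how a fly-genetics text ended up listed at $23 million.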


Code Reviews

Posted on 7:17 AM by Unknown

Matt Welsh writes a great essay about all the wonderful aspects of code reviews.

I thoroughly agree. Code reviews are just about the best tool available to teams trying to improve their software.

Matt's essay discusses the many benefits of code reviews (yes, there are lots of benefits, for the reviewers, the reviewee, and for the overall organization), and suggests a number of useful tools and techniques for accomplishing them effectively.

What Matt doesn't discuss, unfortunately, is why it's so hard to get good code review practices established in an organization.

I've been in a lot of software development situations, with a lot of phenomenally great programmers. Sometimes there is a healthy code review culture, sometimes there isn't. And I still, after 30 years, don't understand why that is.

Invariably, executives sing the praises of code reviews and teams experience their benefits, yet in practice it is very hard to employ them consistently and thoroughly.

Kudos to Google for making it happen, somehow; Welsh's article doesn't really say why Google has succeeded at this when so many other organizations fail.

Of course, there are many things at which Google succeeds while other organizations fail, so this result is neither surprising nor particularly illuminating in that respect.

Still, I suspect I'll wonder until the day I retire why organizations seem to grasp at so many other aspects of the software development process (team structures, agile methods, scrums and Kanban walls and burndown charts, IDEs, CI systems, project management automation, etc.) yet don't employ the basic and incredibly powerful technique of universal code review.

Does your organization have a code review culture? If so, how did it come about and how is it sustained?


Wednesday, February 22, 2012

Crowdsourcing the forecast

Posted on 3:30 PM by Unknown

Here's a great story by Farhad Manjoo on Slate about how Weather Underground are rolling out their new locally-aware weather forecasting system.

Weather Underground’s system takes most of this NWS data into account, and then it adds even more. In particular, the site has assembled a huge network of constantly updating automated weather stations. These stations are owned and maintained by weather enthusiasts—people who love to track precipitation in their own backyards. They agree to share their data with Weather Underground because the site offers free archiving; you can see what your station was reporting months or years ago, easily, from anywhere.

Yeah, I know how that feels

Posted on 3:14 PM by Unknown

Just when you think you've really found something significant this time...
