MessiandNeymar

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Thursday, March 8, 2012

Are some algorithms simply too hard to implement correctly?

Posted on 6:48 PM by Unknown

I recently got around to reading a rather old paper: McKusick and Ganger: Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem.

The paper describes a sophisticated enhancement to the 4.4BSD FFS (Fast FileSystem) algorithms that not only improves performance dramatically, but also eliminates the need to run fsck after every system crash.

The core of the algorithm is to carefully track dependencies between different update requests that the filesystem makes:

With soft updates, the filesystem uses delayed writes (i.e., write-back caching) for metadata changes, tracks dependencies between updates, and enforces these dependencies at write-back time. Because most metadata blocks contain many pointers, cyclic dependencies occur frequently when dependencies are recorded only at the block level. Therefore, soft updates tracks dependencies on a per-pointer basis, which allows blocks to be written in any order. Any still-dependent updates in a metadata block are rolled-back before the block is written and rolled-forward afterwards. Thus, dependency cycles are eliminated as an issue. With soft updates, applications always see the most current copies of metadata blocks and the disk always sees copies that are consistent with its other contents.

It's one of those algorithms that's fairly easy to state but devilishly intricate to implement properly, and the McKusick and Ganger paper is all about implementation.

The core concept of dependency tracking and write ordering is very sound and reliable, and the resulting filesystem has seen enormous practical success. At my day job, we use a similar, though vastly simpler, dependency tracking and write ordering algorithm in the server database, and it, too, has been reliable and has seen enormous practical success.

However, as Val Aurora points out, it's an interesting observation that the basic soft updates algorithm is simultaneously well-known and infrequently used:

If a file system discussion goes on long enough, someone will bring up soft updates eventually, usually in the form of, "Duhhh, why are you Linux people so stupid? Just use soft updates, like BSD!" Generally, there will be no direct reply to this comment and the conversation will flow silently around it, like a stream around an inky black boulder. Why is this? Is it pure NIH (Not Invented Here) on the part of Linux developers (and Solaris and OS X and AIX and...) or is there something deeper going on? Why are soft updates so famous and yet so seldom implemented?

As Aurora notes, the details involved in the implementation turn out to be quite intricate:

I've read this paper at least 15 times, and each time I when get to page 7, I'm feeling pretty good and thinking, "Yeah, okay, I must be smarter now than the last time I read this because I'm getting it this time," - and then I turn to page 8 and my head explodes.

I understand the exploding head issue; the McKusick and Granger paper is dense and detailed. And yet, it seems to me to be almost perfectly written:

  • The concepts are explained clearly, accurately, and thoroughly.
  • The implementation complexities are covered
  • The paper describes the data structures as well as their use
It's really hard to imagine the Soft Updates algorithm being any more clearly described or documented; the paper is nearly ideal. Yet, as Aurora concludes:
when it come to implementation, only programmers with deep, encyclopedic knowledge of the file system's on-disk format can derive the correct ordering rules and construct the associated data structures and web of dependencies. The close coupling of on-disk data structures and their updates and the user-visible data structures and their updates results in weird, counter-intuitive behavior that must be covered up with additional code.

Overall, soft updates is a sophisticated, insightful, clever idea - and an evolutionary dead end. Journaling and copy-on-write systems are easier to implement, require less special-purpose code, and demand far less of the programmer writing to the interface.

How has that assessment fared over the three years since Aurora wrote her article? According to this article from last winter, Linux filesystem engineers remain convinced that Soft Updates are too complex to implement, and other techniques, such as the Journaling or copy-on-write approaches mentioned by Aurora, are still the preferred techniques. As Ted Ts'o observes

The *BSD's didn't get advanced features such as Extended Attribute until some 2 or 3 years after Linux. My theory why is that it required someone as smart as Kirk McKusick to be able to modify UFS with Soft Updates to add support for Extended Attributes and ACL's.
I am comfortable conceding that, if Ted Ts'o feels that a particular filesystem algorithm is too complex, it is too complex.

So it seems that the time for Soft Updates has come and gone. If nothing else, it is extremely informative to study this work, ant it is also interesting to reflect on the rise and fall of the Soft Updates algorithm.

What are other good examples of algorithms that arose, came to be considered to be state of the art, then faded as newer algorithms came along? It's clear that Cryptography is full of such algorithms, but Soft Updates are interesting in that they're from an area that is perhaps less well-known.

Email ThisBlogThis!Share to XShare to FacebookShare to Pinterest
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Shelter
    I meant to post this as part of my article on Watership Down , but then totally forgot: Shelter In Shelter you experience the wild as a moth...
  • The Legend of 1900: a very short review
    Fifteen years late, we stumbled across The Legend of 1900 . I suspect that 1900 is the sort of movie that many people despise, and a few peo...
  • Rediscovering Watership Down
    As a child, I was a precocious and voracious reader. In my early teens, ravenous and impatient, I raced through Richard Adams's Watershi...
  • Must be a heck of a rainstorm in Donetsk
    During today's Euro 2012 match between Ukraine and France, the game was suspended due to weather conditions, which is a quite rare occur...
  • Beethoven and Jonathan Biss
    I'm really enjoying the latest Coursera class that I'm taking: Exploring Beethoven’s Piano Sonatas . This course takes an inside-out...
  • Starting today, the games count
    In honor of the occasion: The Autumn Wind is a pirate, Blustering in from sea, With a rollocking song, he sweeps along, Swaggering boisterou...
  • Parbuckling
    The enormous project to right and remove the remains of the Costa Concordia is now well underway. There's some nice reporting on the NP...
  • For your weekend reading
    I don't want you to be bored this weekend, so I thought I'd pass along some articles you might find interesting. If not, hopefully y...
  • Are some algorithms simply too hard to implement correctly?
    I recently got around to reading a rather old paper: McKusick and Ganger: Soft Updates: A Technique for Eliminating Most Synchronous Writes ...
  • Don't see me!
    When she was young, and she had done something she was embarrassed by or felt guilty about, my daughter would sometimes hold up her hand to ...

Blog Archive

  • ►  2013 (165)
    • ►  September (14)
    • ►  August (19)
    • ►  July (16)
    • ►  June (17)
    • ►  May (17)
    • ►  April (18)
    • ►  March (24)
    • ►  February (19)
    • ►  January (21)
  • ▼  2012 (335)
    • ►  December (23)
    • ►  November (30)
    • ►  October (33)
    • ►  September (34)
    • ►  August (29)
    • ►  July (39)
    • ►  June (27)
    • ►  May (48)
    • ►  April (32)
    • ▼  March (30)
      • 30 days has September, April, June, and November ...
      • Online reviews
      • I don't know what the game will be like ...
      • Every day you get in your life is precious
      • Last year, it took a full hour!
      • Bill Slawski's Top Ten SEO Patents list
      • Cliff Click's deep dive into Constant Propagation ...
      • A great example of the power of generators
      • I have no idea what it will be like to play ...
      • I'm really enjoying Dan Boneh's online cryptograph...
      • BATS IPO Algorithmic Snafu?
      • The complicated task of trying NOT to keep up with...
      • It's not just a game...
      • Internet scale, administrations, and operations
      • Followup, followup, followup
      • Perfectly named for your career
      • rands on hacking
      • Mapping the past
      • Mark your calendars!
      • The Google I/O 2012 website is live
      • Software in the courts
      • Great gaming post
      • Link dumping, March 2012 edition
      • Are some algorithms simply too hard to implement c...
      • The Stanford online cryptography class is up and r...
      • A fine use of the web
      • RSA 2012 has come and gone
      • How will the field of software analysis emerge?
      • Satan's Trifecta
      • I love style guides
    • ►  February (10)
Powered by Blogger.

About Me

Unknown
View my complete profile