MessiandNeymar

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Tuesday, August 7, 2012

MVCC theory

Posted on 9:20 AM by Unknown

If you're one of those people who is fascinated by the internals of concurrency control algorithms, here's some fun new references that you might enjoy.

  • I recently came across a great new paper from a team lead by Per-Ake Larson of Microsoft: High-Performance Concurrency Control Mechanisms for Main-Memory Databases

    These guys are really pushing the edge, trying for both the highest-concurrency AND the highest raw performance, because they're looking at fully memory-resident databases. The paper is quite clear and easy to read, and I really enjoyed the way that they dug into some of the intricacies and edge cases. Plus, it has a great set of references to chase.

    Decades of research has shown that multiversion concurrency control (MVCC) methods are more robust and perform well for a broad range of workloads. This led us to investigate how to construct MVCC mechanisms optimized for main memory settings. We designed two MVCC mechanisms: the first is optimistic and relies on validation, while the second one is pessimistic and relies on locking. The two schemes are mutually compatible in the sense that optimistic and pessimistic transactions can be mixed and access the same database concurrently. We systematically explored and evaluated these methods, providing an extensive experimental evaluation of the pros and cons of each approach.
  • Following some of the references from the above paper, I came across an older paper by Alexander Thomasian; it came out about a decade ago, but it was new to me: Concurrency Control: Methods, Performance, and Analysis.. The paper is behind the ACM paywall, but you might be able to find it online elsewhere.

    This paper spends a lot of time trying to derive analytical formulae for evaluating the performance impacts of concurrency control, which I think is rather a waste of time, in general, since the assumptions turn out to be rather ridiculous. I also hate the way the paper uses its own jargon, like saying "txn" instead of "transaction".

    But there are some nice sections, for example Section 2.6 which surveys a collection of "Methods to Improve Locking Performance".

    The concurrent processing of long read-only queries for decision support applications and short update txns introduces conflicts between the two workloads, especially when the former follows a strict 2PL paradigm.
  • A much more interesting paper is Michael Cahill's PhD thesis from 2009: Serializable Isolation for Snapshot Databases

    The point of his thesis is this: Snapshot Isolation, by itself, does not provide Serializable transactions. However, it is possible to augment Snapshot Isolation with additional locks, in a manner which provides serializability.

    Despite the attractive properties of SI, it has been known since SI was formalized in (Berenson et al., 1995) that SI permits non-serializable executions. In particular, it is possible for an SI-based concurrency control to interleave some transactions, where each transaction preserves an integrity constraint when run alone, but where the final state after the interleaved execution does not satisfy the constraint. This occurs when concurrent transactions modify different items that are related by a constraint, and it is called the “Write Skew” anomaly.

    ...

    In this thesis, we instead focus [on] changing the DBMS internals to guarantee that every execution is serializable regardless of the semantics of the transactions, while still having the attractive properties of SI, in particular much better performance than is allowed by strict two-phase locking.

    Note that, while working for SleepyCat/Oracle about 8 years ago, Cahill wrote a Snapshot Isolation implementation for BerkeleyDB, which I believe is the implementation that is now shipped as the "transactional" BDB edition, so he certainly knows what he's talking about here. Cahill's thesis ended up winning an award for the Australian dissertation of the year. Cahill also discusses an alternate implementation, for the InnoDB engine (again, an Oracle product) of MySQL.

Pretty geeky stuff, I know; my wife will certainly give me a hard time for this post. But if this is your cup of tea, drink deeply!

Email ThisBlogThis!Share to XShare to FacebookShare to Pinterest
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Shelter
    I meant to post this as part of my article on Watership Down , but then totally forgot: Shelter In Shelter you experience the wild as a moth...
  • The Legend of 1900: a very short review
    Fifteen years late, we stumbled across The Legend of 1900 . I suspect that 1900 is the sort of movie that many people despise, and a few peo...
  • Rediscovering Watership Down
    As a child, I was a precocious and voracious reader. In my early teens, ravenous and impatient, I raced through Richard Adams's Watershi...
  • Must be a heck of a rainstorm in Donetsk
    During today's Euro 2012 match between Ukraine and France, the game was suspended due to weather conditions, which is a quite rare occur...
  • Beethoven and Jonathan Biss
    I'm really enjoying the latest Coursera class that I'm taking: Exploring Beethoven’s Piano Sonatas . This course takes an inside-out...
  • Starting today, the games count
    In honor of the occasion: The Autumn Wind is a pirate, Blustering in from sea, With a rollocking song, he sweeps along, Swaggering boisterou...
  • Parbuckling
    The enormous project to right and remove the remains of the Costa Concordia is now well underway. There's some nice reporting on the NP...
  • For your weekend reading
    I don't want you to be bored this weekend, so I thought I'd pass along some articles you might find interesting. If not, hopefully y...
  • Are some algorithms simply too hard to implement correctly?
    I recently got around to reading a rather old paper: McKusick and Ganger: Soft Updates: A Technique for Eliminating Most Synchronous Writes ...
  • Don't see me!
    When she was young, and she had done something she was embarrassed by or felt guilty about, my daughter would sometimes hold up her hand to ...

Blog Archive

  • ►  2013 (165)
    • ►  September (14)
    • ►  August (19)
    • ►  July (16)
    • ►  June (17)
    • ►  May (17)
    • ►  April (18)
    • ►  March (24)
    • ►  February (19)
    • ►  January (21)
  • ▼  2012 (335)
    • ►  December (23)
    • ►  November (30)
    • ►  October (33)
    • ►  September (34)
    • ▼  August (29)
      • Stuff I'm reading this weekend
      • No obligations, no gestures, no smiles, and no ins...
      • Hey Marseilles
      • 2012 World Chess Olympiad is underway
      • MongoDB 2.2 released
      • Post 1000
      • Maybe it's just a macaque
      • A steady murmur about HFT
      • Trying to digest the Apple/Samsung verdict
      • Fun photos of the GitHub office space
      • America's Cup mishap pictures
      • A nice short explanation of how a copy-on-write BT...
      • A compendium of database-y stuff
      • Ready Player One: a very short review
      • Reuters blogs outage continues
      • Hard problems, studied over many years
      • Backpacking 2012: Rancheria Creek, Hetch Hetchy Re...
      • Some boats in a race
      • Mumbling
      • MVCC theory
      • Kaspersky profile
      • Turtles all the way down
      • Knight Capital Group, continued
      • It's not just a game ...
      • Notstop looking
      • Knight trading debacle, redux
      • HFT again
      • Massachusetts WinFall lottery gambling associations
      • Olympian drama
    • ►  July (39)
    • ►  June (27)
    • ►  May (48)
    • ►  April (32)
    • ►  March (30)
    • ►  February (10)
Powered by Blogger.

About Me

Unknown
View my complete profile