MessiandNeymar

Tuesday, April 10, 2012

HumptyDumpty in NoSQL land

Posted on 10:20 PM by Unknown

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.” -- Lewis Carroll's Through the Looking Glass

I've recently been trying to understand more about these "NoSQL" systems, and how they work.

One interesting question is what they mean by "consistency". There is lots of talk about consistency, and eventual consistency, and the CAP theorem, and things like that.

And it's all very vague.

For example, search Google for "HBase strong consistency", and you'll find lots of pages that say things like:

if you search online posts related to HBase and Cassandra comparisons, you will regularly find the HBase community explaining that they have chosen CP, while Cassandra has chosen AP – no doubt mindful of the fact that most developers need consistency (the C) at some level.

Indeed, HBase's own documentation says:

Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This makes it very suitable for tasks such as high-speed counter aggregation.

So I guess that the HBase development team is choosing to define "strongly consistent" as "not 'eventually consistent'". Which isn't very much of a definition, in my opinion.

If you search still more, you'll find more detailed information, such as this HBase page on ACID semantics, which admits that:

HBase is not an ACID compliant database.

and then proceeds to completely re-define the famous ACID properties that Jim Gray set forth nearly 35 years ago.

It's very instructive to compare the original relational database definitions of the ACID properties versus the HBase definitions.

First, here are the classic relational DBMS definitions, from the above Wikipedia article:

Atomicity

Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes.

Consistency

The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof.

Isolation

Isolation refers to the requirement that no transaction should be able to interfere with another transaction. One way of achieving this is to ensure that no transactions that affect the same rows can run concurrently, since their sequence, and hence the outcome, might be unpredictable. This property of ACID is often partly relaxed due to the huge speed decrease this type of concurrency management entails.

Durability

Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently. If the database crashes immediately thereafter, it should be possible to restore the database to the state after the last transaction committed.
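To make the relational definitions concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are all made up for illustration; nothing here comes from the Wikipedia article. The point is that the transaction covers two rows, and when the second statement violates a declared rule (the CHECK constraint), the first statement is undone as well.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside one transaction (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    # alice only has 100, so debiting 500 must fail after bob was credited.
    ok = transfer(conn, "alice", "bob", 500)
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    return ok, balances
```

Calling `demo()` attempts an overdraft: the credit to bob is applied first, the debit from alice then fails the constraint, and the rollback leaves both rows exactly as they were. That is "all or nothing" at the level of the whole transaction, not of an individual statement.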

Now, here are the HBase definitions, from the HBase ACID semantics page:

Definitions

For the sake of common vocabulary, we define the following terms:

Atomicity

an operation is atomic if it either completes entirely or not at all

Consistency

all actions cause the table to transition from one valid state directly to another (eg a row will not disappear during an update, etc)

Isolation

an operation is isolated if it appears to complete independently of any other concurrent transaction

Durability

any update that reports "successful" to the client will not be lost

Visibility

an update is considered visible if any subsequent read will see the update as having been committed

These aren't even remotely close to the same definitions!
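One way to see the gap is a toy model, sketched below. This is not HBase's client API; `ToyRowStore`, `multi_row_update`, and the data are hypothetical names invented for this example. The sketch assumes a store where atomicity is promised only for a single operation on a single row, which is how I read the word "operation" in the definition quoted above. Each `put` either applies fully or not at all, yet a sequence of puts has no transaction around it.

```python
import threading


class ToyRowStore:
    """Hypothetical in-memory store: each put() is all-or-nothing for ONE row."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, row_key, columns):
        # Validate before touching the row, then apply every column under
        # the lock, so a reader never observes a half-written row.
        if any(value is None for value in columns.values()):
            raise ValueError(f"invalid value for row {row_key!r}")
        with self._lock:
            merged = dict(self._rows.get(row_key, {}))
            merged.update(columns)
            self._rows[row_key] = merged

    def get(self, row_key):
        with self._lock:
            return dict(self._rows.get(row_key, {}))


def multi_row_update(store, updates):
    """Apply several puts in order; nothing groups them into a transaction."""
    for row_key, columns in updates:
        store.put(row_key, columns)


def demo_partial_update():
    store = ToyRowStore()
    store.put("alice", {"balance": 100})
    store.put("bob", {"balance": 50})
    try:
        # The first put succeeds; the second is rejected.
        multi_row_update(
            store,
            [("bob", {"balance": 550}), ("alice", {"balance": None})],
        )
    except ValueError:
        pass
    return store.get("alice"), store.get("bob")
```

After `demo_partial_update()`, bob's row holds the new balance while alice's row is unchanged. Every individual operation was "atomic" in the per-operation sense, but the combined update was not, and a relational transaction would have undone bob's change as well.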

It's not at all clear what the NoSQL community is trying to do by re-defining all these words, and it's doubly not clear why the entire computing industry appears to be going along with it.

Why not define new terminology? Why change the meanings of words that have had precise definitions for about as long as general purpose computers have been in use?
