MessiandNeymar


Wednesday, August 22, 2012

A compendium of database-y stuff

Posted on 1:14 PM by Unknown

I found myself wandering through some rather random database-y stuff recently; here's some of the fun stuff I've been reading this week:

  • Cassandra query performance. Aaron Morton, a Cassandra committer, posted a slide deck entitled: Cassandra SF 2012 - Technical Deep Dive: query performance. It's a very interesting presentation, although by the time you're a few dozen slides in, and he's described 8 or 10 different ways to tweak the Cassandra commit and cache algorithms, you may be as freaked out as I was.
  • OpenLDAP MDB. I think they're positioning this as an embedded data store:
    MDB is an ultra-fast, ultra-compact key-value data store. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and is only limited to the size of the virtual address space.
    Read more at: The MDB site.

    It's cool to look at what people do to make an embedded data store as absolutely blindingly fast as possible.

  • Distributed transactions for Google App Engine. If I'm understanding this correctly, these guys built a consistent distributed data store on top of a collection of independent local data stores.

    It's an extremely interesting paper, and a 1-hour video presentation by the author is also available from that page.

    Our contribution is that we provide transactional semantics without restriction (1): we create a kind of transaction that works across objects not in the same Entity Group. We call these transactions "Distributed Transactions" (DTs) in order to distinguish them from the original GAE "Local (to one Entity Group) Transactions".

    When using our Distributed Transactions the set of objects operated upon must be specified directly by their Keys, as one does with an object store, not by predicates on their properties, as does with a general relational query.

  • HBase Replication: Operational Overview. Available here.

    HBase's replication support continues to evolve. It still looks like a very complex system that is quite hard to monitor and diagnose. The basic diagnostic tool appears to be some sort of massive data diff across your running system(s):

    A standard way to verify is to run the verifyrep mapreduce job, that comes with HBase. It should be run at the master cluster and require slave clusterId and the target table name. One can also provide additional arguments such as start/stop timestamp and column families. It prints out two counters namely, GOODROWS and BADROWS, signifying the number of replicated and unreplicated rows, respectively.
  • MemSQL Architecture. Yet another in-memory database that claims to be the fastest one of all. In a recent blog entry describing their benchmarking, the developers say:
    MemSQL is an in-memory database that stores all the contents of the database in RAM but backs up to disk. MongoDB and MySQL store their data on disk, though can be configured to cache the contents of the disk in RAM and asynchronously write changes back to disk. This fundamental difference influences exactly how MemSQL, MongoDB and MySQL store their data-structures: MemSQL uses lock-free skip lists and hash tables to store its data, whereas MongoDB and MySQL use disk-optimized B-trees.

    Todd Hoff reports his notes from interviewing the MemSQL team here, saying:

    On the first hearing of this strange brew of technologies you would not be odd in experiencing a little buzzword fatigue. But it all ends up working together. The mix of lock-free data structures, code generation, skip lists, and MVCC makes sense when you consider the driving forces of data living in memory and the requirement for blindingly fast execution of SQL queries.
  • PVLAN, VXLAN and cloud application architectures. I don't know very much about modern networking technologies, so most of this page went way over my head, but it sounds like the evolution in the networking world is as fast as it is in the data world.
    In short – you need multiple isolated virtual network segments with firewalls and load balancers sitting between the segments and between the web server(s) and the outside world.

    VXLAN, NVGRE or NVP, combined with virtual appliances, are an ideal solution for this type of application architectures. Trying to implement these architectures with PVLANs would result in a total spaghetti mess of isolated and community VLANs with multiple secondary VLANs per tenant.

  • Why BTrees beat Hashing for sharding. I've written about the Galaxy project before; it's new but quite intriguing. In this post, they try to do some benchmarking as well as some analytical reasoning.
    not only have we got O(1) (amortized) worst-case performance in terms of network roundtrips, but the actual constant is much less than 1 (because C>>2b), which is what we'd get when using a distributed hash-table. This is perfect scalability, and it is a direct result of the properties of the B+-tree (and other similar tree data structures), and is not true for all data structures.
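One way to get a feel for the tree-versus-hash question is key locality: an ordered partitioning keeps neighboring keys on the same shard, while hashing scatters them. The sketch below is only a toy illustration of that locality effect. It is not the Galaxy benchmark, and it does not reproduce the amortized roundtrip analysis quoted above. The shard count, the key range, and every function name here are assumptions made up for the example.

```python
import bisect
import hashlib

NUM_SHARDS = 8  # assumed shard count, purely illustrative

# Exclusive upper bounds of each shard's key range: 125, 250, ..., 1000.
# This stands in for the separator keys a B+-tree-style index would hold.
BOUNDARIES = [125 * (i + 1) for i in range(NUM_SHARDS)]


def hash_shard(key, num_shards=NUM_SHARDS):
    """Place a key by hashing it, as a hash-partitioned store would."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def range_shard(key, boundaries=BOUNDARIES):
    """Place a key by finding the ordered range that contains it."""
    return bisect.bisect_right(boundaries, key)


def shards_touched(shard_fn, lo, hi):
    """Return the set of shards a scan over keys [lo, hi) has to contact."""
    return {shard_fn(k) for k in range(lo, hi)}


if __name__ == "__main__":
    print("range-partitioned:", sorted(shards_touched(range_shard, 100, 120)))
    print("hash-partitioned: ", sorted(shards_touched(hash_shard, 100, 120)))
```

A scan over keys 100 through 119 stays on a single shard under the ordered scheme, whereas the hashed scheme spreads the same twenty keys over several shards, each of which would cost a network roundtrip.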
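The memory-mapped idea from the MDB item can also be sketched in a few lines. This is a minimal illustration of reading a sorted, fixed-width key-value file through a memory map, using only the Python standard library. It is not MDB's file format or API; the record layout (an 8-byte key plus a 64-bit value), the short ASCII keys, and the function names are all assumptions for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: 8-byte null-padded key, unsigned 64-bit value.
RECORD = struct.Struct("<8sQ")


def pad_key(key):
    """Encode a short ASCII key into the fixed 8-byte form used on disk."""
    return key.encode("ascii").ljust(8, b"\0")


def write_records(path, items):
    """Write the items to disk sorted by encoded key, one record each."""
    with open(path, "wb") as f:
        for raw_key, value in sorted((pad_key(k), v) for k, v in items.items()):
            f.write(RECORD.pack(raw_key, value))


def lookup(mm, key):
    """Binary-search the mapped file; pages are faulted in by the OS on access."""
    target = pad_key(key)
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        found_key, value = RECORD.unpack_from(mm, mid * RECORD.size)
        if found_key == target:
            return value
        if found_key < target:
            lo = mid + 1
        else:
            hi = mid
    return None


def demo():
    """Build a tiny store in a temp directory and query it through mmap."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.bin")
        write_records(path, {"apple": 1, "pear": 2, "zebra": 3})
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return lookup(mm, "pear"), lookup(mm, "kiwi")


if __name__ == "__main__":
    print(demo())
```

The lookup never issues an explicit read call: it indexes into the mapping and lets the operating system's page cache do the work, which is the property the MDB description is pointing at.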