Monday, October 15, 2012

Drilling down, spreading the load

Posted on 6:20 PM by Unknown

A few related, but independent, postcards from the bleeding edge crossed my eyeballs the last few days:

  • The column-oriented gang at Google just keep cranking things out. There was the Dremel work of last summer, and an entirely unrelated (I think) project called Supersonic that claims to provide
    "an ultra-fast, column oriented query engine library written in C++. It provides a set of data transformation primitives which make heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper pipelined CPUs."

    And now we have PowerDrill, introduced with the wonderful title "Processing a trillion cells per mouse click":

    "we present the column-oriented datastore developed as one of the central components of PowerDrill. It combines the advantages of columnar data layout with other known techniques (such as using composite range partitions) and extensive algorithmic engineering on key data structures. The main goal of the latter being to reduce the main memory footprint and to increase the efficiency in processing typical user queries. In this combination we achieve large speed-ups. These enable a highly interactive Web UI where it is common that a single mouse click leads to processing a trillion values in the underlying dataset."
  • Meanwhile, I'm quite enjoying the slide deck posted by the Tokutek team from their tutorial session at the XLDB conference: Data Structures and Algorithms for Big Databases
    The tutorial was organized as follows:
    • Module 0: Tutorial overview and introductions. We describe an observed (but not necessary) tradeoff in ingestion, querying, and freshness in traditional databases.
    • Module 1: I/O model and cache-oblivious analysis.
    • Module 2: Write-optimized data structures. We give the optimal trade-off between inserts and point queries. We show how to build data structures that lie on this tradeoff curve.
    • Module 2 continued: Write-optimized data structures perform writes much faster than point queries; this asymmetry affects the design of an ACID compliant database.
    • Module 3: Case study – TokuFS. How to design and build a write-optimized file system.
    • Module 4: Page-replacement algorithms. We give relevant theorems on the performance of page-replacement strategies such as LRU.
    • Module 5: Index design, including covering indexes.
    • Module 6: Log-structured merge trees and fractional cascading.
    • Module 7: Bloom filters.
    The slides are superb, but I bet it was even better to attend the presentation; it seems that they packed pretty much every important topic from the last 10 years of file structure design into a single presentation!
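
For anyone who hasn't met the topic of Module 7: a Bloom filter is a compact bit array that answers "have I possibly seen this key?" with no false negatives but an occasional false positive, which is why storage engines use it to skip disk reads for keys that are definitely absent. The Python sketch below is only an illustration and is not taken from the Tokutek slides; the class name BloomFilter, the default sizes, and the SHA-256-based double hashing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical; not from the Tokutek slides)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        # Both defaults are arbitrary illustrative values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive the bit positions from two 64-bit slices of one SHA-256
        # digest (double hashing: h1 + i * h2). This hashing scheme is an
        # assumption chosen for simplicity.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = BloomFilter()
for name in ["dremel", "supersonic", "powerdrill"]:
    bf.add(name)

print(bf.might_contain("powerdrill"))  # an added key is always reported as possibly present
print(bf.might_contain("tokufs"))      # usually False, but a false positive is possible
```

The design trade-off is between memory and accuracy: more bits per key (and a suitable number of hash functions) lowers the false-positive rate, while a filter that is too small fills up and starts answering "possibly present" for almost everything.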