MessiandNeymar


Wednesday, March 6, 2013

A couple of interesting papers

Posted on 11:02 AM by Unknown

In between compiles, I've been spending some time with:

  • Nobody ever got fired for buying a cluster
    We claim that a single "scale-up" server can process each of these jobs and do as well or better than a cluster in terms of performance, cost, power, and server density. Is it time to consider the "common case" for "big data" analytics to be the single-server rather than the cluster case? If so, this has implications for data center hardware as well as software architectures.
  • Optimizing Google’s Warehouse Scale Computers: The NUMA Experience
    It is overwhelmingly challenging to diagnose and attribute this performance swing to individual microarchitectural factors. Effects such as the contention for cache/bandwidth with various corunning applications on a server, non-uniform memory accesses (NUMA), and I/O interference among other factors all carry implications on the effectiveness of the policies used for cluster-level scheduling, machine-level resource management, and the execution configurations of the Gmail servers.
  • Themis: An I/O-Efficient MapReduce
    Given that many MapReduce jobs are I/O-bound, an efficient MapReduce system must aim to minimize the number of I/O operations it performs. Fundamentally, every MapReduce system must perform at least two I/O operations per record when the amount of data exceeds the amount of memory in the cluster. We refer to a system that meets this lower-bound as having the “2-IO” property. Any data processing system that does not have this property is doing more I/O than it needs to. Existing MapReduce systems incur additional I/O operations in exchange for simpler and more fine-grained fault tolerance.

    In this paper, we present Themis, an implementation of MapReduce designed to have the 2-IO property.

  • MinuteSort with Flat Datacenter Storage
    The sorts were accomplished using a heterogeneous cluster consisting of 256 computers and 1,033 disks, divided broadly into two classes: storage nodes and compute nodes. Notably, no compute node in our system uses local storage for data; we believe FDS is the first system with competitive sort performance that uses remote storage. Because files are all remote, our 1,470 GB runs actually transmitted 4.4 TB over the network in under a minute.
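
As a quick sanity check on the FDS numbers quoted above, here is a back-of-the-envelope sketch in Python. It uses only the figures from the quote (1,470 GB sorted, 4.4 TB over the network, 256 computers, 1,033 disks) plus two assumptions of mine: decimal units (1 TB = 1,000 GB), and a full 60 seconds of run time, which makes the rates lower bounds since the runs finished in under a minute. The function and key names are made up for illustration.

```python
# Figures taken from the MinuteSort quote above.
SORTED_GB = 1470.0      # data sorted per run
NETWORK_TB = 4.4        # data transmitted over the network per run
COMPUTERS = 256
DISKS = 1033

# Assumptions (not from the quote): decimal units, and the full minute used.
GB_PER_TB = 1000.0
RUN_SECONDS = 60.0


def minute_sort_summary():
    """Return rough derived figures for one sort run (hypothetical helper)."""
    network_gb = NETWORK_TB * GB_PER_TB
    aggregate = network_gb / RUN_SECONDS  # lower bound: runs took < 60 s
    return {
        # How many times larger the network traffic is than the sorted data.
        "network_amplification": network_gb / SORTED_GB,
        # Minimum aggregate network rate across the whole cluster.
        "min_aggregate_gb_per_s": aggregate,
        # Plain average over all machines; real nodes are heterogeneous.
        "min_avg_gb_per_s_per_computer": aggregate / COMPUTERS,
        # Sorted data divided evenly across every disk in the cluster.
        "sorted_gb_per_disk": SORTED_GB / DISKS,
    }


if __name__ == "__main__":
    for name, value in minute_sort_summary().items():
        print(f"{name}: {value:.2f}")
```

The amplification ratio comes out at about 3, which would be consistent with each byte crossing the network roughly three times, though the quote itself doesn't break that down. The per-computer figure is only an average; the cluster is heterogeneous, so individual nodes surely differ.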


