Thursday, June 6, 2013

Network failures and system resilience

Posted on 9:56 AM by Unknown

On his Aphyr blog, Kyle Kingsbury has been doing some superb work.

First, there's his recent article surveying the ways that networks can fail, defeating our well-intentioned-but-inadequate attempts to survive such failures: The Network is Reliable

This post is meant as a reference point–to illustrate that, according to a wide range of accounts, partitions occur in many real-world environments. Processes, servers, NICs, switches, local and wide area networks can all fail, and the resulting economic consequences are real. Network outages can suddenly arise in systems that are stable for months at a time, during routine upgrades, or as a result of emergency maintenance. The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss. Split-brain is not an academic concern: it happens to all kinds of systems–sometimes for days on end. Partitions deserve serious consideration.

And don't stop there; make sure you read Kingsbury's series of articles on system architectures for handling network partitions:

  • Call me maybe: Carly Rae Jepsen and the perils of network partitions
    This article is part of Jepsen, a series on network partitions. We're going to learn about distributed consensus, discuss the CAP theorem's implications, and demonstrate how different databases behave under partition.
  • Call me maybe: Postgres
    Previously on Jepsen, we introduced the problem of network partitions. Here, we demonstrate that a few transactions which “fail” during the start of a partition may have actually succeeded.
  • Call me maybe: Redis
    Previously on Jepsen, we explored two-phase commit in Postgres. In this post, we demonstrate Redis losing 56% of writes during a partition.
  • Call me maybe: MongoDB
    Previously in Jepsen, we discussed Redis. In this post, we'll see MongoDB drop a phenomenal amount of data.
  • Call me maybe: Riak
    Previously in Jepsen, we discussed MongoDB. Today, we'll see how last-write-wins in Riak can lead to unbounded data loss.
  • Call me maybe: Final Thoughts
    Previously in Jepsen, we discussed Riak. Now we'll review and integrate our findings.
  • Asynchronous replication with failover
    In response to my earlier post on Redis inconsistency, Antirez was kind enough to help clarify some points about Redis Sentinel's design.
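To make the Riak item above a little more concrete, here is a minimal sketch of how last-write-wins conflict resolution can discard writes that clients were told had succeeded. This is not Kingsbury's test code and it does not use any Riak API; the `Versioned` type, the `lww_merge` function, and the timestamps are all hypothetical names and values chosen for illustration. The assumption is simply that each side of a partition keeps accepting read-modify-write updates to the same key, and that on healing the store keeps only the version with the newest timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A stored value tagged with the (wall-clock) time of its last write."""
    value: frozenset
    timestamp: float


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep the version with the larger timestamp, drop the other."""
    return a if a.timestamp >= b.timestamp else b


def simulate_partition():
    """Two replicas diverge during a partition, then are merged with LWW."""
    base = Versioned(frozenset(), 0.0)

    # Side A of the partition acknowledges three writes (hypothetical timestamps).
    side_a = base
    for ts, item in [(1.0, 1), (2.0, 2), (3.0, 3)]:
        side_a = Versioned(side_a.value | {item}, ts)

    # Side B acknowledges two writes to the same key in the meantime.
    side_b = base
    for ts, item in [(1.5, 4), (2.5, 5)]:
        side_b = Versioned(side_b.value | {item}, ts)

    # The partition heals; only the newest version survives.
    healed = lww_merge(side_a, side_b)

    acknowledged = {1, 2, 3, 4, 5}
    lost = acknowledged - healed.value
    return healed.value, lost


if __name__ == "__main__":
    survivors, lost = simulate_partition()
    print("survived:", sorted(survivors))
    print("lost:", sorted(lost))
```

Every write on the losing side is dropped, no matter how many there were, which is one way to read the "unbounded" in Kingsbury's summary. A merge that took the union of both sides would keep all five elements in this toy case.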

I've really enjoyed these articles and learned a lot from them; I hope Kingsbury continues to write and publish more great work!
