Dropbox’s Magic Pocket: Power of Software Defined Storage

Background

Dropbox is one of the poster boys of modern-day tech start-ups, similar to the Ubers and the Netflixes of the world: companies founded by engineers who use their engineering prowess to help consumers around the world address day-to-day challenges through novel uses of technology. So, when I was informed that not only would Dropbox be presenting at SFD15, but we’d also get to tour their state-of-the-art data center, I was ecstatic (perhaps ecstatic is an understatement!). I work with various technology vendors, from large ones like Microsoft, Amazon, VMware, Cisco and NetApp to little-known start-ups, and most of them mention Dropbox in event keynote speeches and case studies as a perfect example of how a born-in-the-cloud organisation can use modern technology efficiently. Heck, they are even referenced in some of the AWS training courses I’ve come across on Pluralsight, which talk about Dropbox’s ingenious way of using AWS S3 storage behind the scenes to store file data content.

So, when I learned that they had designed and built their own Software Defined Storage solution to bring most of their data storage back from AWS on to their own data centres, I was quite curious to find out more about the platform and the reasoning behind the move back to on-premises. Given it’s the first time their engineering team has openly discussed things, I was looking forward to talking to them at the event.

This post summarises what I learnt from the Dropbox team.

Introduction

I don’t think it’s necessary to introduce Dropbox to anyone these days. If, however, you’ve been under a rock for the past 4 years, Dropbox is the pioneering tech organisation from Silicon Valley that built an online content sharing and collaboration platform that synchronises content between your various end user devices automatically while letting you access it on any device, anywhere. In the process of data synchronisation and content sharing, they are dealing with:

  • 500+ million users
  • 500+ Petabytes of data storage
  • 750+ billion API calls handled

When they first went live, Dropbox used AWS’s S3 storage (PaaS) to store the actual user file data behind the scenes, while their own web servers hosted the metadata about those files and users. However, as their data storage requirements grew, the need to change this architecture started to outweigh the benefits of leveraging AWS cloud storage, such as agility and ease of use. As such, Dropbox decided to bring this file storage back in to their own on-premises data centers. Dropbox states two reasons behind this decision: performance requirements and raw storage costs. Given their unique use case for block storage at extremely high scale, Dropbox planned to save a significant amount of operational cost by designing a tailor-made cloud storage solution of their own, engineered to provide maximum performance at the lowest unit cost. As a private company that is about to go public through an IPO, saving costs was obviously high on their agenda.

Magic Pocket: Software Architecture

While the name originally came from an old internal nickname for Dropbox itself, Magic Pocket (MP) now refers to their custom-built, internally hosted, software defined cloud storage infrastructure, which Dropbox uses to host the majority of their users’ file data. It is multiple exabytes in size, with data fully replicated for availability, and offers high data durability (12 x 9’s) and high availability (4 x 9’s).
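
To put those "nines" in perspective, here is a rough back-of-the-envelope calculation in Python. The helper names are my own, and I am assuming that durability is quoted as a per-object, per-year survival probability, which the Dropbox team did not spell out.

```python
# Hypothetical helper names; only the "nines" figures come from the text above.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def failure_fraction(nines: int) -> float:
    """Fraction allowed to fail at N nines, e.g. 4 nines -> 0.0001."""
    return 10.0 ** -nines


def downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines."""
    return MINUTES_PER_YEAR * failure_fraction(nines)


def expected_annual_losses(object_count: int, nines: int) -> float:
    """Expected objects lost per year.

    ASSUMPTION: durability is a per-object, per-year survival probability.
    """
    return object_count * failure_fraction(nines)


print(f"4 nines availability: {downtime_minutes_per_year(4):.2f} minutes/year")
print(f"12 nines durability over 10^12 objects: "
      f"{expected_annual_losses(10**12, 12):.2f} expected losses/year")
```

In other words, 4 x 9's of availability allows for a little under an hour of downtime a year, and under the assumption above, 12 x 9's of durability means you would need on the order of a trillion stored objects before expecting to lose one in a year.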

Within the MP architecture, files are split in to blocks that are replicated across geographic boundaries within Dropbox’s internal infrastructure (back end storage nodes) for durability and availability. The data stored in the MP infrastructure consists of 4MB blocks that are immutable by design. Changes to the data in the blocks are tracked through a File Journal that is part of the metadata held on the Dropbox application servers. Due to the temporal locality of the data, the bulk of the static data, which is cold, is stored on high capacity, high latency but cheap spinning drives, while metadata, cache data & DBs are kept on high performance, low latency but expensive SSDs.
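
A minimal Python sketch of the idea of immutable 4MB blocks plus a journal of changes might look like the following. This is purely illustrative: keying blocks by their SHA-256 hash, the `FileJournal` class and the in-memory `store` dictionary are all my own assumptions, not details shared by Dropbox in the session.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as described above


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut file content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_key(block: bytes) -> str:
    """ASSUMPTION: a block is addressed by the SHA-256 hash of its content."""
    return hashlib.sha256(block).hexdigest()


class FileJournal:
    """Hypothetical journal: each file version is a list of block keys."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[List[str]]] = {}

    def commit(self, path: str, data: bytes, store: Dict[str, bytes]) -> List[str]:
        keys = []
        for block in split_into_blocks(data):
            key = block_key(block)
            # Immutable blocks: an existing block is never overwritten, and a
            # changed block simply gets a new key.
            store.setdefault(key, block)
            keys.append(key)
        self._versions.setdefault(path, []).append(keys)
        return keys

    def history(self, path: str) -> List[List[str]]:
        return self._versions.get(path, [])
```

With this model, editing the tail end of a large file only adds one new block to the store; the new version recorded in the journal keeps pointing at the unchanged blocks of the previous version.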

Unlike most enterprise focused Software Defined Storage (SDS) solutions, which utilise some kind of quorum style consensus or distributed coordination to ensure data availability and integrity, MP utilises a simple, centralised, sharded MySQL cluster, which is a bit of a surprise. Data redundancy, however, is provided through…yeah you guessed it! Customised erasure coding, similar to many other enterprise SDS solutions. Data is typically replicated in 1GB chunks (known as buckets) that consist of random, often contiguous 4MB blocks. A bucket is either replicated or erasure coded across multiple physical servers (storage nodes), and a set of 1 or more buckets replicated to a set of nodes makes up a volume. This architecture is somewhat similar to how the enterprise SDS vendor Hedvig stores their data in the back end.
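
Dropbox did not go in to the details of their customised erasure coding, so the Python sketch below uses the simplest possible stand-in: a single XOR parity fragment, which can rebuild any one lost fragment. Real schemes such as Reed-Solomon tolerate more failures; the function names here are hypothetical and the point is only to show why erasure coding costs less raw capacity than keeping full copies.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: List[bytes]) -> bytes:
    """Single parity fragment: the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def recover_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the one lost data fragment from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


def storage_overhead(data_fragments: int, extra_fragments: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_fragments + extra_fragments) / data_fragments


fragments = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(fragments)
rebuilt = recover_missing([fragments[0], fragments[2]], parity)
print(rebuilt == fragments[1])
print(storage_overhead(3, 1), "vs", storage_overhead(1, 2), "for three full copies")
```

Three data fragments plus one parity fragment store roughly 1.33 bytes per byte of user data, against 3 bytes for three full copies, which is the kind of saving that matters at Dropbox's scale.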

In Dropbox’s SDS architecture, a pocket is similar to a fault domain in other enterprise SDS solutions and is a geographical zone (US East, US West & US Central for example). Each zone has a cluster of storage servers and other application servers, and data blocks are replicated across multiple zones for availability. Pretty standard stuff so far.

Dropbox has a comprehensive, geographically dispersed edge network across the world through which all customer Dropbox application connections are funnelled. The client connectivity path is Application (on user device) -> local PoP (proxy servers in an edge location) -> Block server -> Magic Pocket infrastructure servers -> Storage nodes. While the proxy servers in edge locations don’t cache any data and can almost be thought of as typical web servers that the clients connect through, the other servers, such as Block / MP / Storage node servers, are ordinary x86 servers housed within Dropbox’s own DCs. These servers are multi-sourced as per best practice and somewhat customised for Dropbox’s specific requirements, especially the storage node servers, which are customised, high density machines with capacity for around 1PB of raw data each using local disks. All servers run a generic version of Ubuntu on bare metal rather than as VMs.

Inside each zone, application servers such as the Block and Magic Pocket app & DB servers act as gateways for storage requests coming through the edge servers. These also host the metadata mapping for block placement (block index) in the back end and run sharded MySQL clusters (on SSD storage) to store this information. Cross zone replication is also initiated asynchronously within this tier.

A cell is a logical grouping of physical storage servers (a cluster of storage nodes) that forms the core of Dropbox’s proprietary storage back end, which is worth a closer look. Each storage server (node) has very large local disks, totalling around 1PB of storage, and these nodes are used as dumb nodes for block level data storage. The Replication table, which runs in memory as a small MySQL DB, stores the mapping of logical Bucket <-> Volume <-> Storage nodes. This is also part of the metadata stack and is stored on app / DB servers with SSD storage.
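
To make the chain of lookups a little more concrete, here is a toy Python model of the metadata tier: a sharded block index (block -> bucket) followed by the replication table (bucket -> volume -> storage nodes). Plain dictionaries stand in for the MySQL tables, and the class name, the hash-based shard selection and all the identifiers are my own assumptions rather than anything Dropbox described.

```python
import hashlib
from typing import Dict, List


class MetadataTier:
    """Toy stand-in for the block index and replication table."""

    def __init__(self, num_shards: int = 4) -> None:
        # Block index: one dict per shard, mapping block key -> bucket id.
        self.block_index: List[Dict[str, str]] = [{} for _ in range(num_shards)]
        # Replication table: bucket -> volume -> storage nodes.
        self.bucket_to_volume: Dict[str, str] = {}
        self.volume_to_nodes: Dict[str, List[str]] = {}

    def _shard(self, block_key: str) -> Dict[str, str]:
        # ASSUMPTION: shards are chosen by hashing the block key.
        digest = hashlib.sha256(block_key.encode()).digest()
        return self.block_index[digest[0] % len(self.block_index)]

    def record_block(self, block_key: str, bucket_id: str) -> None:
        self._shard(block_key)[block_key] = bucket_id

    def locate(self, block_key: str) -> List[str]:
        """Follow block -> bucket -> volume -> storage nodes."""
        bucket_id = self._shard(block_key)[block_key]
        volume_id = self.bucket_to_volume[bucket_id]
        return self.volume_to_nodes[volume_id]


meta = MetadataTier()
meta.volume_to_nodes["vol-1"] = ["node-a", "node-b", "node-c"]  # made-up names
meta.bucket_to_volume["bucket-7"] = "vol-1"
meta.record_block("block-123", "bucket-7")
print(meta.locate("block-123"))
```

The storage nodes themselves never need to know which file a block belongs to, which fits the description of them as dumb nodes.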

The Master is the software component within each cell that acts as a janitor and performs back end tasks such as storage node monitoring, creating storage buckets and other background maintenance operations. However, the Master is not on the data plane, so it doesn’t affect immediate data read / write operations. There’s a 1:1 mapping between Master and cell. The Volume manager (another software component) can be thought of as the data mover / heavy lifter responsible for handling instructions from the Master and performing the corresponding operations on the storage nodes. The Volume manager runs on the actual storage nodes (storage servers) in the back end.

The front end (interface to the SDS platform) supports simple operations such as Put, Get and Repair. (Details of how this works can be found here)
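
As a very rough illustration of what Put, Get and Repair mean in practice, here is a toy, in-memory Python model of a cell. Everything in it is hypothetical: the `ToyCell` class, the "first healthy nodes" placement policy and the way a repair is triggered are simplifications of my own, not Dropbox's implementation.

```python
from typing import Dict, List, Set


class ToyCell:
    """In-memory stand-in for a cell: each storage node is a dict of blocks."""

    def __init__(self, node_names: List[str], copies: int = 2) -> None:
        self.nodes: Dict[str, Dict[str, bytes]] = {n: {} for n in node_names}
        self.failed: Set[str] = set()
        self.copies = copies

    def _healthy(self) -> List[str]:
        return [n for n in self.nodes if n not in self.failed]

    def put(self, key: str, block: bytes) -> List[str]:
        """Write the block to `copies` healthy nodes and return their names."""
        targets = self._healthy()[: self.copies]
        if len(targets) < self.copies:
            raise RuntimeError("not enough healthy nodes")
        for name in targets:
            self.nodes[name][key] = block
        return targets

    def get(self, key: str) -> bytes:
        """Read the block from the first healthy node that holds it."""
        for name in self._healthy():
            if key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def repair(self, failed_node: str) -> int:
        """Mark a node as failed and re-copy its blocks; returns blocks copied."""
        self.failed.add(failed_node)
        repaired = 0
        for key in self.nodes[failed_node]:
            holders = [n for n in self._healthy() if key in self.nodes[n]]
            spares = [n for n in self._healthy() if key not in self.nodes[n]]
            if holders and spares and len(holders) < self.copies:
                self.nodes[spares[0]][key] = self.nodes[holders[0]][key]
                repaired += 1
        return repaired


cell = ToyCell(["n1", "n2", "n3"], copies=2)
cell.put("block-1", b"hello")
cell.repair("n1")           # simulate losing a storage node
print(cell.get("block-1"))  # still readable from a surviving node
```

The real platform obviously does this across zones and with erasure coding in the mix, but the principle is the same: losing a node triggers a background re-copy while reads carry on from the surviving copies.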

Magic Pocket: Storage Servers

Dropbox’s customised, high density storage servers make up the actual back end storage infrastructure. Typically, each server has a 40Gb NIC and around 90 high capacity enterprise SATA drives as local disks, totalling around 1PB of raw space per node, and runs bare metal Ubuntu Linux with the Magic Pocket SDS application code. Their life cycle management is heavily automated using proprietary, custom-built tools. This set up creates a very large fault domain per storage node given the huge capacity of each, but the wider SDS application and the network load balancing capabilities architected in to the application itself mitigate or design against the complete failure of a server or a cell. We were treated to seeing how this works in action when the engineering team decided to randomly pull networking cables out while we were touring the DC, and then also cut the power to a full rack, all of which had zero impact on the normal operations of Dropbox’s service. That was pretty cool to see.

My thoughts

Companies like Dropbox inspire me to think outside of the box about what is possible and how to address modern day business requirements through innovative uses of technology. Similar to the session on the Open19 project (an open hardware initiative in the same vein as the Open Compute Project) from the LinkedIn engineering team during the SFD12 event last year, this session has also hugely inspired me about the power of software & hardware engineering and the impact initiatives like this can have on the wider IT community that we all live and breathe.

As for the Magic Pocket SDS & HW architecture… I am a big fan, and it’s great to see organisations such as Dropbox and Netflix (CDN architecture), who epitomise the extreme ends of certain use cases, publicly opening up about the back end IT infrastructure powering their solutions so that the other 99% of enterprise IT folks can learn from and adapt those blueprints where relevant.

It is also important to remember, though, that for normal organisations with typical enterprise IT requirements, such custom-built solutions are neither practical nor required; often, their needs can be met with a similarly architected, commercially available Software Defined Storage solution tailored to their requirements. The most important part is to realise the power of Software Defined Storage here. If Dropbox can meet their extreme storage requirements through a Software Defined Storage solution that operates at a lower cost than a proprietary storage solution, the average corporate or enterprise storage use case has no excuse to keep buying expensive SAN / NAS hardware with a premium price tag. Most enterprise SDS solutions (VMware vSAN, Nutanix, Hedvig, Scality…etc.) have a software and hardware architecture very similar to Dropbox’s and carry a lower price point than expensive hardware centric storage solutions from the big vendors like EMC, NetApp, HPE, IBM…etc. So why not look in to an SDS solution if your SAN / NAS is up for renewal? You can very likely save significant costs and, at the same time, benefit from software defined innovation, which tends to come quicker when there’s no proprietary hardware baggage.

Given Dropbox’s unique scale and storage size, they’ve made a conscious decision to move the majority of their storage away from AWS (S3 storage), as they’ve gone past the point where using cloud storage was economical or performant enough. But it is also important to remember that they only got to that point through the growth of their business which, at the beginning, was only enabled by the agility provided by the very same AWS S3 cloud storage platform they decided to move away from. Most organisations out there are nowhere near Dropbox’s level of scale, and it’s therefore important to remember that for typical requirements, you can benefit significantly from the clever use of cloud technologies, especially PaaS technologies such as AWS S3, AWS Lambda, Microsoft O365 and Azure SQL, which provide a ready-to-use technology platform without you having to build it all from scratch. In most cases, that freedom and speed of access can be a worthy trade-off for a slightly higher cost.

Keen to get your thoughts – get involved via comments button below!

Image credit goes to Dropbox!

Chan

Storage Field Day 15 – Watch Live Here

Following on from my previous post about the vendor line-up and my plans during the event, this post is to share the exact vendor presentation schedule and some additional details.

Watch LIVE!

Below is the live streaming link to the event if you’d like to join us LIVE on the day. While the time difference might make it a little tricky for some, it is well worth taking part, as all viewers will also have the chance to ask the vendors questions live, just like the delegates on site. Just do it, you won’t be disappointed!

Session Schedule

Given below is the session schedule for the event, starting from Wednesday the 7th. All times are in Pacific time (8 hours behind UK time).

Wednesday the 7th of March

  • 09:30 – 11:30 (5:30-7:30pm UK time) – WekaIO presents
  • 13:00 – 15:00 (9-11pm UK time) – IBM presents
  • 16:00 – 18:00 (12-2am 8th of March, UK time) – Dropbox presents

Thursday the 8th of March

  • 08:00-10:00 (4-6pm UK time) – Hedvig presents from their Santa Clara offices
  • 10:30-12:30 (6:30-8:30pm UK time) – NetApp presents from their Santa Clara offices
  • 13:30-15:30 (9:30-11:30pm UK time) – Western Digital/Tegile presents from Levi’s Stadium
  • 16:00-18:00 (12-2am 9th of March, UK time) – Datrium presents from Levi’s Stadium

Friday the 9th of March

  • 08:00-10:00 (4-6pm UK time) – StarWind presents in the Seattle Room
  • 11:00-13:00 (7-9pm UK time) – Cohesity presents at their San Jose offices
  • 14:00-16:00 (10pm-12am UK time) – Huawei presents at their Santa Clara offices

Storage Field Day 15 – Introduction

Having attended Storage Field Day 12 at the invitation of the chief organiser Stephen Foskett back in 2017, I was really looking forward to attending this popular event again. Luckily, my wishes have been answered this year and I’ve been invited to attend SFD15, with an exciting line-up of enterprise storage vendors. This post is a quick intro to the event and the schedule ahead.

SFD – Quick Introduction!

Storage Field Day is part of the popular, invitation-only Tech Field Day series of events organised and hosted by Gestalt IT (GestaltIT.com). The genius idea behind the event is to bring together innovative technology solutions from various vendors (the “Sponsors”), who present their solutions live to a room full of independent technology bloggers and thought leaders (the “delegates”) chosen from around the world based on their knowledge, community profile and thought leadership, in order to get their independent thoughts (good or bad) on the said solutions. The event is also streamed live worldwide for anyone to tune in to, and is often used by technology start-ups to announce their arrival to the mainstream markets.

Gestalt IT organises various field day events throughout the year, such as the Tech / Storage / Cloud / Mobility / Networking / Virtualisation / Unified Communications / Wireless Field Day events, with the respective technology vendor solutions showcased in each. They are run by the chief organiser Stephen Foskett (@SFoskett) and have always been extremely popular amongst vendors, as they provide an ideal opportunity to present new products and solutions to a number of thought leaders and community influencers from around the world and get their valuable thoughts & feedback. They are equally popular amongst the attending delegates, who get the opportunity not only to witness brand new technology at times, but also to critique it and give their feedback directly to these vendors.

During each day, the delegates and the organisers (Steve along with a few supporting crew members such as the camera crew) travel to each of the participating vendors’ offices (often in Silicon Valley), where the technology & business leaders present their solutions to the audience. Typically, the whole session is streamed live thanks to the SFD camera crew, and each session is also recorded and posted on various video sharing sites for subsequent viewing. I would seriously recommend having a look at their YouTube channel https://www.youtube.com/channel/UCSnTTyp4q7jMhwECxXzMAXQ/featured for past session recordings if you are a technology person with a keen interest in enterprise technology solutions.

SFD15 – Schedule & Vendor line-up

SFD15 is due to take place in Silicon Valley between the 7th and 9th of March 2018. From what I understand, the sponsor slots for SFD15 have been fiercely contested by vendors due to the event’s popularity, and I’ve noticed the participating vendor list changing a few times up until now, presumably due to increasing competition. The list of vendors confirmed (as of today, 17.02.2018) is as follows:

  • Cohesity:
    • Cohesity is a relatively new secondary storage vendor that I’ve been keeping a close eye on for a while now. I know their offering and their value proposition fairly well and am looking forward to understanding what’s new and their future plans.
  • Hedvig:
    • Hedvig is a Software Defined Storage company started about 5 years ago in Silicon Valley by an ex-AWS and Facebook engineer with distributed file system and database design experience, who co-invented Amazon Dynamo & Cassandra. I’ve reviewed Hedvig’s technology quite closely in the past and have liked what I’ve seen. So, I’m very keen to see what they have to say this time around about their latest offering and their future plans.
  • Datrium:
    • While I had come across these guys before, I didn’t know the details of their offering other than that they claim to be a (better) alternative to the typical converged and hyper-converged solutions out there. So I’m looking forward to finding out a bit more about their offering.
  • Western Digital:
    • This will likely be the Tegile offering that was acquired by Western Digital. Looking forward to this.
  • Huawei:
    • Huawei is the popular Chinese data center IT behemoth that is growing in popularity each year. Incidentally, Huawei is a vendor partner of my employer and has a growing presence in the UK public sector due to their lower infrastructure costs. While they manufacture everything from servers to Software Defined Networking controllers, I’d presume the focus of their session during SFD15 will be on their storage offering. I am personally not fully familiar with Huawei’s storage offerings, so this could be a great opportunity for me to right that wrong.
  • IBM:
    • No introduction necessary for this one. Not sure which IBM storage solution they’ll be covering during SFD15.
  • StarWind:
    • StarWind, an HCI solution vendor, presented at the last SFD12 I attended and it’s good to see them come back for SFD15. Keen to get an updated view on things.
  • WEKA.IO:
    • These guys claim to have the world’s fastest parallel file system and are totally new to me. So I am really looking forward to finding out the details of their offering and its value proposition.

My Plan

I will be travelling to San Jose from London Heathrow on the 6th with a view to catching up with this year’s delegates over the evening meal that day, which is always fun. There are some familiar faces from last year’s SFD12 as well as a few new faces (to me), so I’m looking forward to meeting these girls and boys. I am due to fly back to London on Monday the 12th, by which point I expect to have some intimate knowledge of these solutions. I am aiming to publish posts on most of the interesting solutions I come across, providing a deep dive and my independent thoughts on them during and after the event, all here on my blog.

If you are interested in attending a future TFD / SFD event, all the information you need to know can be found here at http://techfieldday.com/delegates/become-field-day-delegate/.

If you are interested in watching these sessions / presentations live, you can find the schedule and the live stream link information at http://techfieldday.com/event/sfd15/. I would seriously encourage you to watch the event live and do take part in the live questions as well.

 

SFD12 Posts

As mentioned above, I learnt a lot during my SFD12 participation last year about the storage industry in general, as well as about the direction of a number of storage vendors. If you are interested in finding out more, see my #SFD12 articles below