All posts by Chan

Lead Solutions Architect & Hybrid Cloud Practice Lead

NetApp United 2018 – No it’s not another football team!

I was glad to see an email from the NetApp United team this afternoon confirming that I’ve been selected as a member of the prestigious NetApp United (#NetAppUnited) team for 2018, which is a great honour indeed. Thanks NetApp!

Contrary to popular belief, NetApp United is NOT a football team but a global community of individuals united by a passion for great technology. Similar to the VMware vExpert and Dell EMC Elect programmes, NetApp United is a community programme run by NetApp (@PeytStefanova is the organiser in chief) to recognise global NetApp technology experts and community influencers, with a view to giving them a platform to share more of their thoughts, content and expertise publicly through various community channels. Like the community programmes from other vendors, NetApp United is all about giving back to the community, which is a good cause that I am happy to support.

Being recognised as a member of the NetApp United programme entitles you to a number of exclusive benefits, such as dedicated NetApp technology update sessions with product engineers, exclusive briefings about future and upcoming NetApp solutions and products, access to a private Slack channel where community members discuss all things technical and NetApp related, and other exclusive events at NetApp Insight in the US and EMEA. These perks are nice to have indeed, as they enable us to share some of that information with others, along with our own thoughts, which should be beneficial for current and future NetApp customers.

As I work for a global NetApp partner, I am looking forward to using the access to information I have as a part of this programme to better leverage our partnership with NetApp, as well as to educate our joint customers on future NetApp strategy. As I am also an independent contributor (outside of work), I intend to share some of the information (outside of NDA stuff) with my general audience to help you understand various NetApp solutions and strategy, along with my independent thoughts on them, which I think is important. I have been working with NetApp for a long time, initially as a customer and then as a partner, and I’ve always been a great fan of their core strategy, which was always about software despite them being a HW product manufacturer. They have some extremely awesome innovation already available in their portfolio and even better innovation in the making for the future (have a look at their recently concluded #SFD15 presentation about the Data Pipeline vision here), and I am looking forward to sharing some of these along with my thoughts with everyone.

The full list of all the NetApp United 2018 members can be found here. Congratulations to all those who got selected, and thank you NetApp & @PeytStefanova for the invitation and the recognition!



Cohesity: A secondary storage solution for the Hybrid Cloud?


A key part of my typical day job involves staying on top of new technologies and key developments in the world of enterprise IT, with an aim to spot commercially viable, disruptive technologies that are not just cool tech but also have a good business value proposition with a sustainable use case.

To this effect, I’ve been following Cohesity with interest since its arrival in the mainstream market back in 2015, keeping up to date on their platform developments across various feature upgrades such as v2.0, v3.0…etc. SFD15 gave me another opportunity to catch up with them and get an up to date view on their latest offerings & future direction. I liked what I heard from them! Their solution now looks interesting, their marketing message is a little sharper than it was a while ago and I like the direction they are heading in.

Cohesity: Overview

Cohesity claims to be a specialist, software defined, secondary storage vendor focused on the modernization of the secondary storage tier within the hybrid cloud. Such secondary storage requirements typically include copies of your primary / tier 1 data sets (such as test & dev VM data and reporting & analytics data) or file shares (CIFS, NFS…etc.). These types of data tend to be quite large and therefore typically cost more to store and process, so storing them on the same storage solution as your tier 1 data can be unnecessarily expensive. I can relate to this, having been an enterprise storage customer as well as a channel SE in my past lives, involved in sizing and designing various storage solutions for my customers. Most enterprise customers need separate, dedicated storage solutions to store such data outside of the primary storage cluster, but they are often stuck with the same, expensive primary storage vendors for choice. Cohesity offers a single, tailor made secondary data platform that spans both ends of the hybrid cloud to address all these secondary storage requirements. The platform can also act as a hybrid cloud backup storage target, with some added data management capabilities on top, so that customers can not only store data backups but also do interesting things with that backup data across the full hybrid cloud spectrum.

With what appears to be decent growth last year (600% revenue growth YoY) and some good customers already onboard, it appears that customers may be taking notice too.

Cohesity: Solution Architecture

A typical Cohesity software defined storage (SDS) solution on-premises comes as an appliance and can start with 3 nodes forming a cluster that provides linearly scalable growth. An appliance will typically be a 2U chassis that accommodates 4 nodes, and any commodity or OEM HW platform is supported. Storage itself consists of PCI-e flash (up to 2TB per node) + capacity disk, which is the typical storage architecture of most SDS vendors these days. Again, similar to most other SDS vendors, Cohesity uses erasure coding or RF2 data sharding across the Cohesity nodes (within each cluster) to provide data redundancy, as a part of the SpanFS file system. Note that given its main purpose as a secondary storage unit, it doesn’t have (or need) an all flash offering, though they may move in to the primary storage use case, at least indirectly, in the future.
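To put the erasure coding versus RF2 choice in to perspective, here is a minimal sketch of the raw-to-usable capacity arithmetic. The 4+2 erasure coding layout and the 96TB raw figure are my own illustrative assumptions and not figures Cohesity shared; only the RF2 (two copies) case comes from the description above.

```python
def usable_fraction(data_units: int, redundancy_units: int) -> float:
    """Fraction of raw capacity left for user data under a given layout."""
    return data_units / (data_units + redundancy_units)

# RF2: every block is stored twice (1 data copy + 1 extra copy).
rf2 = usable_fraction(1, 1)

# Hypothetical 4+2 erasure coding: 4 data stripes + 2 parity stripes.
ec_4_2 = usable_fraction(4, 2)

raw_tb = 96  # hypothetical raw capacity of a small cluster, in TB

print(f"RF2    : {rf2:.0%} usable, roughly {raw_tb * rf2:.0f} TB of {raw_tb} TB")
print(f"EC 4+2 : {ec_4_2:.0%} usable, roughly {raw_tb * ec_4_2:.0f} TB of {raw_tb} TB")
```

The point is simply that, for the same raw disk, an erasure coded layout tends to leave more usable space than keeping two full copies, which matters for large tier 2 data sets.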

The Cohesity storage solution can be deployed to remote and branch office locations as well as to cloud platforms using virtual Cohesity appliances that work hand in hand with the on-premises cluster. Customers can then enable cross cluster data replication and various other integration / interaction activities, in a similar way to how NetApp Data Fabric works for primary data, for example. Note however that Cohesity does not yet permit the configuration of a single cluster across platforms (where you can deploy nodes from the same cluster on premises as well as on the cloud, with erasure coding performing the data replication, in the way the Hedvig storage solution permits, for example), but it was hinted to us that this is in the works for a future release.

Cohesity also has some analytics capabilities built in to the platform, which can be handy. The analytics engine uses MapReduce natively to avoid the need to build external, analytics focused compute clusters (such as Hadoop clusters) and to move (duplicate) data sets in order to present them for analysis. The Analytics Workbench on the Cohesity platform currently permits external custom code to be injected in to the platform. This can be used to search for content inside various files held on the Cohesity platform, including pattern matching that enables customers to search for social security or credit card numbers, which would be quite handy for enforcing regulatory compliance. During the SFD15 presentation, it was explained to us that the capabilities of this platform are being rapidly enhanced to cover additional regulatory compliance policy enforcement, such as that required by GDPR. Additional information on Cohesity analytics capabilities can be found here. An additional video explaining how this works can also be found here.
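To illustrate the kind of pattern matching job described above, here is a minimal sketch in plain Python written in a map / reduce style. It does not use the Cohesity Analytics Workbench API (which I have not shown here), and the file names, sample contents and regular expressions are all hypothetical and deliberately simplistic.

```python
import re
from collections import Counter
from functools import reduce

# Simplistic, illustrative patterns only; real compliance scanning needs more care.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def map_file(name, text):
    """Map step: count pattern hits inside a single file."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[(name, label)] = hits
    return counts

def reduce_counts(left, right):
    """Reduce step: merge the per-file counters in to one report."""
    return left + right

# Hypothetical files standing in for data already held on the platform.
files = {
    "hr/notes.txt": "Employee SSN 123-45-6789 on file.",
    "sales/order.csv": "card,4111 1111 1111 1111\ncard,5500-0000-0000-0004",
    "readme.md": "Nothing sensitive here.",
}

report = reduce(reduce_counts, (map_file(n, t) for n, t in files.items()), Counter())

for (name, label), hits in sorted(report.items()):
    print(f"{name}: {hits} possible {label} match(es)")
```

The appeal of running something like this inside the storage platform is that the files never have to be copied out to a separate analytics cluster first.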

Outside of these, given that the whole Cohesity solution is backed by a software defined, distributed file system, they naturally have all the software defined goodness expected from any SDS solution within their platform, such as global deduplication, compression, replication, file indexing, snapshots, multi-protocol access, multi-tenancy and QoS.

My thoughts

I like Cohesity’s current solution and where they are potentially heading. However, the key to their success, in my view, will ultimately be their price point, which I am yet to see in order to make sense of where they belong amongst the competition.

From a technology and strategy standpoint, Cohesity’s key use cases are very valid and the way they aim to address those is pretty damn good. When you think about the secondary storage use case and the cost of serving out less performance hungry tier 2 data (often large and clunky in size) through an expensive tier 1 storage array (where you have to include larger SAN & NAS storage controllers + additional storage), I cannot help but think that Cohesity’s secondary storage play is quite relevant for many customers. Tier 1 storage solutions, be they classic SAN / NAS solutions or HCI solutions such as VMware vSAN or Nutanix, are typically priced to reflect their tier 1 use case. So, a cheaper, more appropriate secondary storage solution such as Cohesity could help many customers save lots of unnecessary SAN / NAS / HCI costs by letting them downsize their primary storage requirements. This may even enable more customers to embrace HCI solutions for their tier 1 workloads, resulting in even less of a need for expensive, hardware centric SAN / NAS solutions except where they are genuinely necessary. After all, we are all being taught the importance of rightsizing everything (thanks to the utility computing model introduced by the public clouds), so perhaps it’s about time that we all look to break down tier 1 and tier 2 data in to appropriately sized tier 1 and tier 2 storage solutions to reduce the TCO for the customer? It’s important to note though, that this rightsizing is only likely to appeal to customers with heavy storage use cases, such as typical enterprises and large corporate customers, rather than the average small to medium customer who requires a typical multipurpose storage solution to host some VMs + some file data. This is evident in the customer stats provided to us during SFD15, where 70% of their customers are enterprise customers.

Both of their key use cases, tier 2 data storage as well as backup storage, now look to incorporate cloud capabilities and allow customers to do more than just store tier 2 data and backups. This is good and very timely indeed. They seem to take a very data centric approach to their use cases, and their secret sauce behind most of the capabilities, the proprietary file system called SpanFS, looks and feels very much like NetApp’s cDOT architecture with some enhancements in parts. They are also partnering up with various primary storage vendors such as Pure to enable replication of backup snapshots from Pure to Cohesity. Meanwhile, additional features such as built in data protection for NAS from NetApp, EMC and Pure, direct integration with VMware vCF for data protection, and direct integration with Nutanix for AHV protection move them closer to Rubrik’s territory, which is interesting and ultimately provides customers with choice, which is a good thing.

From a hardware & OEM standpoint, Cohesity has already partnered up with both HPE and Cisco and has also made itself available on the HPE pricebook, so that customers can order the Cohesity solution using an HPE SKU, which is convenient. That said, I’d personally urge customers to order directly from Cohesity (using your trusted solutions provider) where possible, rather than ordering through an OEM vendor where the pricing may be fixed or engineered to position OEM HW when it’s not always required.

Given their mixed capabilities of tier 2 data storage, backup storage and ever-increasing data management across platforms, they are in co-opetition, if not competition, with a number of others, such as NetApp (who have a similar data management strategy in their “Data Pipeline” vision and who also remove the need for multiple storage silos in the DC for tier 2 data through features such as Clustered Data ONTAP & FlexClones), Veeam or even Pure Storage. Given their direct integration with various SW & HCI platforms, which removes the need for 3rd party backup vendors, they are likely to compete directly with Rubrik more and more in the future. Cohesity’s strategy is primarily focused on tier 2 data management, with a secondary focus on data backups and management of that data, whereas Rubrik’s strategy appears to be the same but with the opposite order of priorities (backup 1st, data management 2nd). Personally, I like both vendors and their solution positioning, as I can see the strategic value in both solution offerings for customers. But most importantly for Cohesity, there doesn’t appear to be any other storage vendor specifically focused on the secondary storage market like they are, so I can see a great future for them, as long as their price point remains relevant and the great innovation continues.

You can watch all the videos from #SFD15 recorded at the Cohesity HQ in Santa Clara here.

If you are an existing Cohesity user, I’d be very keen to get your thoughts and feedback in the comments section below.

A separate post will follow, looking at Cohesity’s SpanFS file system and their key use cases!


A look at the Hedvig distributed Hybrid Cloud storage solution

During the recently concluded Storage Field Day event (SFD15), I had the chance to travel to the software defined storage company Hedvig at their HQ in Santa Clara, where we were given a presentation on their solution offering by their engineering team (including the founder). Luckily, I already knew Hedvig due to my day job (which involves evaluating new, disruptive tech start-ups to form solutions reseller partnerships; I had already gone through this process with Hedvig a while back). However, I learnt about a number of new updates to their solution, and this post aims to cover their current solution offering and my thoughts on it in the context of the current enterprise storage market.

Hedvig: Company Overview

Similar to a number of new storage or backup start-ups that came out of stealth in recent times, Hedvig too was founded by an engineer with a technology background, back in 2012. The founder, Avinash Lakshman, came from a distributed software engineering background, having worked on large scale distributed storage applications such as Amazon Dynamo and Cassandra. While they came out of stealth in 2015, they did not appear to have an aggressive growth strategy backed by an aggressive (read “loud”) marketing effort, and looked rather content with natural, organic growth. At least that was my impression from seeing how they operated in the UK market. However, during the SFD15 presentation, we found out that they’ve somewhat revamped their logo and related marketing collateral. So perhaps they have started to address this already?

Hedvig: Solution Overview

At the outset, they are similar to most other software defined storage start-ups these days that layer their software tier on top of any commodity server hardware to build a comparatively low cost, software defined storage (SDS) solution. They also have genuine distributed capability, being able to distribute the SDS nodes not just within the data center but also across data centers as well as cloud platforms, though it’s important to note that most SDS vendors these days have the same capability or are in the process of adding it to their SDS platforms.

Hedvig has positioned itself as an SDS solution that is a perfect fit for traditional workloads such as VMs, backup & DR as well as modern workloads such as big data, HPC, object storage and various cloud native workloads. Their solution provides block & file storage capability like most other vendors in their category, as well as object storage, which is another potentially good differentiator, especially compared to some of the other HCI solutions out there that often only provide one type or the other.

The Hedvig storage platform typically consists of the Hedvig SW platform + commodity server hardware with local disks. Each server node can be a physical server or a VM on a cloud platform that runs the Hedvig software. The Hedvig software consists of:

  • Hedvig Storage Proxy
    • This is a piece of software deployed on the compute node (app server, container instance, hypervisor…etc.)
    • Presents file (NFS) & block (iSCSI) storage to compute environments and converts that to the Hedvig proprietary communication protocol with the storage service.
    • Also performs caching of reads (writes are redirected).
    • Performs dedupe up front and writes deduped blocks to the back end (storage nodes) only if necessary
    • Each hypervisor runs a proxy appliance VM / VSA (x2 as a HA pair) which will serve all local IO on that hypervisor
  • Hedvig API
    • Presents object storage via S3 or Swift and full RESTful API from the storage nodes to the storage proxy.
    • Runs on the storage nodes
  • Hedvig Storage Services
    • Manages the storage cluster activities and interfaces with server proxies
    • Runs on the storage nodes and is similar to the role of a typical storage processor / SAN or NAS controller
    • Each storage server has 2 parts
      • Data process
        • Local persistence
        • Replication
      • Metadata process
        • Communicate with each other
        • Distributed logic
        • Stored in a proprietary DB on each node
    • Each virtual disk provisioned in the front end is mapped 1:1 to a Hedvig virtual disk in the back end
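The proxy-side dedupe behaviour listed above (hash first, write to the back end only if necessary) can be sketched roughly as follows. This is my own simplified illustration and not Hedvig’s implementation: the class names, the use of SHA-256 fingerprints and the 4K block size are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch

class BackendStore:
    """Hypothetical stand-in for the storage nodes behind the proxy."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def has(self, digest):
        return digest in self.blocks

    def put(self, digest, data):
        self.blocks[digest] = data
        self.writes += 1

class DedupingProxy:
    """Hypothetical storage proxy that dedupes before sending data onwards."""
    def __init__(self, backend):
        self.backend = backend

    def write(self, data):
        recipe = []  # ordered fingerprints needed to rebuild the data
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only blocks the back end has never seen are actually written.
            if not self.backend.has(digest):
                self.backend.put(digest, block)
            recipe.append(digest)
        return recipe

backend = BackendStore()
proxy = DedupingProxy(backend)
payload = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
recipe = proxy.write(payload)
print(f"logical blocks: {len(recipe)}, back end writes: {backend.writes}")
```

In this toy run the third block is a repeat of the first, so only two of the three logical blocks ever reach the back end.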

The Hedvig storage nodes can be commodity or mainstream OEM vendor servers, as customers choose. They will consist of SSD + mechanical drives, which is typical for other SDS vendors too, and the storage nodes which run the storage services SW will typically be connected to each other using 10GbE (or higher) standard Ethernet networking.

Like most other SDS solutions, they have typical SDS features and benefits such as dedupe, compression, auto-tiering, caching, snapshots & clones, data replication…etc. Another potentially unique offering they have here is the ability to set storage policies at per virtual disk or per container granularity (in the back end), which is nice. Below are some of the key storage policy configuration items that can be set at per VM / vDisk granularity.

  • Replication Factor (RF) – Number of copies of the data to keep. Ranges from 1-6. Quorum = (RF/2)+1. This is somewhat similar to the VMware vSAN FTT if you are a vSAN person.
  • Replication policy – Agnostic, Rack aware and DC aware. Sets the scope of data replication for availability. Kind of similar to the concept of Fault Domains in vSAN, for example.
  • Dedupe – Global dedupe across the cluster. Happens at 512B or 4K block size and is done in-line. Important to note that dedupe happens at the storage proxy level, which ensures no unnecessary writes take place in the back end. This is another USP compared to other SDS solutions, which is also nice.
  • Compression
  • Client caching
  • …etc.
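As a quick worked example of the quorum formula in the list above, the sketch below tabulates the quorum for each permitted RF. I am assuming the RF/2 part is an integer division (so RF 3 gives a quorum of 2), which matches the RF 4 example used later in this post; the function name is my own.

```python
def quorum(rf: int) -> int:
    """Quorum = (RF/2)+1, assuming RF/2 is an integer (floor) division."""
    if not 1 <= rf <= 6:
        raise ValueError("RF is expected to be in the range 1-6")
    return rf // 2 + 1

for rf in range(1, 7):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, copies allowed to lag behind={rf - q}")
```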

Data replication, availability & IO operations

Hedvig stores data as containers across the cluster nodes to provide redundancy and enforces the availability related policy configuration items at container level. Each vDisk is broken down in to 16GB chunks and, based on the RF level assigned to the vDisk, Hedvig will ensure that RF copies of each chunk are maintained across a number of nodes (this is somewhat similar to the VMware vSAN maximum component size, which is set at 255GB). Each of these 16GB chunks is what is known as a container. Within each node, the Hedvig SW will group 3 disks in to a logical group called a storage pool, and each container that belongs to a storage pool will typically have its data striped across that storage pool’s disks. Storage pool and disk rebalancing occurs automatically during less busy times. Data replication will also take in to account latency considerations if the cluster spans multiple geo boundaries / DCs / cloud environments.
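To make the container arithmetic concrete, here is a small sketch that works out how many 16GB containers, and how many container replicas, a fully written vDisk would need. The 500GB vDisk and RF of 3 are hypothetical inputs, and the sketch ignores thin provisioning, dedupe and compression.

```python
import math

CONTAINER_GB = 16  # container (chunk) size described above

def container_layout(vdisk_gb: int, rf: int):
    """Return (containers, total container replicas) for a fully written vDisk."""
    containers = math.ceil(vdisk_gb / CONTAINER_GB)
    return containers, containers * rf

containers, replicas = container_layout(vdisk_gb=500, rf=3)
print(f"500GB vDisk at RF 3: {containers} containers, {replicas} replicas to place")
```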

Hedvig software maintains IO locality in order to ensure the best performance for read and write IOs, prioritising the servicing of IO from local & less busy nodes. One of the key things to note is that during a write, the Hedvig software doesn’t wait for acknowledgements from all the storage nodes, unlike some of its competitor solutions. As soon as the quorum is met (Quorum = RF/2 + 1, so if the RF is 4, with a remote node on the cloud or in a remote DC over a WAN link, that is as soon as the data is written to 3 local nodes), it will send the ACK back to the sender, and the rest of the data writing / offloading can happen in the background. This ensures faster write response times and is probably a key architectural element in how they enable truly distributed nodes in a cluster, which can often include remote nodes over a high latency link, without a specific performance hit to a write operation. This is another potential USP for them, at least architecturally on paper. In reality, however, it is only likely to be of benefit if you have a higher RF in a large cluster.
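The effect of acknowledging a write at quorum rather than waiting for every replica can be sketched with a toy latency model. The latency figures are made up for illustration, and the model simply assumes the write is acknowledged once the quorum-th fastest replica has responded.

```python
def quorum(rf: int) -> int:
    """Quorum = RF/2 + 1, assuming integer division."""
    return rf // 2 + 1

def ack_latency_ms(replica_latencies_ms):
    """Latency at which the quorum-th fastest replica has acknowledged."""
    rf = len(replica_latencies_ms)
    ordered = sorted(replica_latencies_ms)
    return ordered[quorum(rf) - 1]

# Hypothetical RF 4 layout: three local nodes plus one remote node over a WAN.
latencies = [1.0, 1.2, 1.5, 40.0]

quorum_ack = ack_latency_ms(latencies)
wait_for_all = max(latencies)

print(f"ACK at quorum      : {quorum_ack} ms")
print(f"ACK after all nodes: {wait_for_all} ms")
```

With these made-up numbers the slow remote copy no longer sits in the write path, which is the benefit the quorum approach is aiming for.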

Reads are also optimised through a combination of caching at the storage proxy level and actual block reads in the back end that prioritise local nodes (with a lower cost) over remote nodes. This is markedly different to how VMware vSAN works, for example, where it avoids client-side cache locality in order to avoid skewed flash utilisation across the cluster as well as frequent cache re-warming during vMotion…etc. Both architectural decisions have their pros and cons in my view, and I like Hedvig’s architecture as it optimises performance, which is especially important in a truly distributed cluster.
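A rough sketch of that read path, as I understand it, is below: check the proxy cache first, then fall back to the lowest cost replica that holds the block. The class, the node names and the numeric costs are all hypothetical, and a real proxy would also need cache eviction and invalidation, which are left out here.

```python
class ReadProxy:
    """Hypothetical proxy: serve from cache, else from the cheapest replica."""
    def __init__(self, replicas):
        # replicas: list of (cost, node_name, blocks) tuples, lower cost = closer
        self.cache = {}
        self.replicas = sorted(replicas, key=lambda replica: replica[0])

    def read(self, block_id):
        if block_id in self.cache:
            return self.cache[block_id], "cache"
        for _cost, node, blocks in self.replicas:
            if block_id in blocks:
                self.cache[block_id] = blocks[block_id]  # warm the cache
                return blocks[block_id], node
        raise KeyError(block_id)

replicas = [
    (50, "cloud-node", {"b1": b"data"}),  # remote copy, higher cost
    (1, "local-node", {"b1": b"data"}),   # local copy, lower cost
]
proxy = ReadProxy(replicas)

first = proxy.read("b1")   # served by the local node
second = proxy.read("b1")  # served from the proxy cache
print(first[1], second[1])
```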

A deep dive on this subject including the anatomy of a read and a write is available here.

Hedvig: Typical Use Cases

Hedvig, similar to most of its competition, aims to address a number of use cases.

Software Defined Primary Storage

Hedvig operates in traditional storage mode (dedicated storage server nodes providing storage to a number of external compute nodes such as VMware ESXi or KVM, or even a bare metal application server) or in Hyper-Converged mode, where both compute and storage are provided on a single node. They also state that these deployment architectures can be mixed in the same cluster, which is pretty cool.

  • Traditional SDS – Agent (storage proxy) runs on the application server accessing the storage and speaks storage protocols. The agent also hosts local metadata and provides local caching, amongst other things. Used in non-hypervisor deployments such as bare metal deployments of app servers.
  • HCI mode – Agent (storage proxy) runs on the hypervisor (as a control VM / VSA, similar to Nutanix). This is probably their most popular deployment mode.

Software Defined Hybrid Cloud Storage

Given the truly distributed nature of the Hedvig solution platform, they provide a nice hybrid cloud use case in which the storage cluster is extended across geographical boundaries, including cloud platforms (IaaS instances). Cloud platforms currently supported by Hedvig include AWS, Azure and GCP. Stretching a cluster over to a cloud platform involves IaaS VMs from the cloud platform being used as cluster nodes, with available block storage from the cloud platform providing virtual disks as local drives for each cloud node. When you define Hedvig virtual disks, you get to specify the data replication topology across the hybrid cloud. Important to note though that the clients accessing those disks are advised to run within the same data center / cloud platform / region, for obvious performance reasons.

Hedvig also now supports integrating with Docker for containerised workloads through their Docker volume plugin, as well as with the Kubernetes volume integration framework, similar to most of the other SDS solutions.

Hyper-Converged Backup

This is something they’ve recently introduced but, unless I’ve misunderstood, this is not so much a complete backup solution including offsite backups, but more of a snapshot capability at the array level (within the Hedvig layer). Again, this is similar to most array level snapshots from other vendors’ solutions and can be used for immediate restores without having to rely on a hypervisor snapshot, which would be inefficient. Using an external backup solution from a backup partner (such as Veeam, for example) to move those snapshot backups offsite is highly recommended, as with any other SDS solution.

My thoughts

I like the Hedvig solution and some of its neat little tricks, such as the clever use of the storage proxy agent to offload some of the back end storage operations to the front end (i.e. dedupe) and therefore potentially reduce back end IO, as well as the network performance penalty between the compute and storage layers, to a minimum. They are a good hybrid SDS solution that can cater for a mixed workload across the private data center as well as public cloud platforms. It’s NOT a specialised solution for a specific workload and doesn’t claim to provide sub millisecond latency; instead, it provides a good all-round storage solution that is architected from the ground up to be truly distributed. Despite its ability to be used in traditional storage as well as HCI mode, most of the real-life applications of its technology would likely be in the HCI world, with some kind of hypervisor like vSphere ESXi or KVM.

Looking at the organisation itself and their core solution, it’s obvious that they’ve tried to solve a number of hardware defined storage issues that were prevalent in the industry at the time of their inception (2012) through the clever use of software. That is commendable. However, the sad truth is that a lot has happened in the industry since then, and a number of other start-ups and established vendors have attempted to do the same, some with perhaps an unfair advantage due to having their own hypervisor too, which is a critical factor when it comes to your capabilities. Nutanix and VMware vSAN, for example, developed similar SDx design principles and tried to address most of the same technical challenges. I fear that those vendors were a little more aggressive in their plans and, in my view, managed to get their go to market process right, at a much bigger scale as well. Nutanix pioneered a new SDS use case (HCI) in the industry and capitalised on it before everyone else did, and VMware vSAN came out as a credible, and potentially better, challenger to dominate this space. While Hedvig is independent of any hypervisor platform and can therefore provide the same capabilities across multiple platforms, the reality is that not many customers would need that capability, as they’d be happy with a single hypervisor & storage platform. I also think Hedvig potentially missed a trick in their solution positioning, which could have created a differentiated message and won market share. As a result, their growth is nowhere near comparable to that of VMware vSAN or Nutanix, for example.

As much as I like the Hedvig technology, I fear for their future survival. Without some significant innovation and some business leadership involved in setting a differentiated strategy for their business, life will be somewhat difficult, especially if they are to make a commercial success of the company. Their technology is good and the engineering team seems credible, but the competition is fierce and the market is somewhat saturated with so many general purpose SDS solutions as well as specialist SDS solutions aimed at specific workloads. Most of their competitors also have far more resources at their disposal to throw at their solutions, including more comprehensive marketing engines. For these reasons, I fear that Hedvig may struggle to survive on their current path of a generalised SDS solution and would potentially be better off focusing on a specific use case / vertical …etc and concentrating all their innovation efforts on that.

The founder and CEO of the company still appears to be very much an engineer at heart, and having an externally sourced business leader with start-up experience to lead Hedvig in to the future may not be a bad thing for them in the long run either, in my view.

Keen to get your thoughts, especially if you are an existing Hedvig customer – Please comment below.

Slide credit goes to Hedvig and Tech Field Day team.

P.S. You can find all the TFD and SFD presentations about Hedvig via the link here.


VMware vExpert 2018

The latest batch of VMware vExperts for 2018 has just been announced and I’m glad to say I’ve made the cut for the 4th year, which was fantastic news personally. The vExpert programme is VMware’s global evangelism and advocacy programme and is held in high regard within the community due to the expertise of the selected vExperts and their contribution towards enabling and empowering customers around the world with their virtualisation and software defined datacentre projects through knowledge sharing. The candidates are judged on their contribution to the community through activities such as community blogs, personal blogs, participation in events, producing tools…etc. and, in general, on maintaining their expertise in related subject matters. vExperts typically get access to private betas, free licenses, early access product briefings, exclusive events, free access to VMworld conference materials and other opportunities to interact directly with VMware product teams, which is totally awesome and, in return, helps us to feed the information back to our customers…

It’s been a great honour to have been recognised by VMware again for this prestigious title and I’d like to thank VMware as well as congratulate the other fellow vExperts who have also made it this year. Let’s keep up the good work…!!

The full list of VMware vExperts 2018 can be found below

My vExpert profile link is below



Dropbox’s Magic Pocket: Power of Software Defined Storage


Dropbox is one of the poster boys of modern-day tech start-ups, similar to the Ubers and the Netflixes of the world: founded by engineers using their engineering prowess to help consumers around the world address various day to day challenges through the novel use of technology. So, when I was informed that not only would Dropbox be presenting at SFD15, but we’d also get to tour their state of the art data center, I was ecstatic (perhaps ecstatic is an understatement!). I work with various technology vendors, from large vendors like Microsoft, Amazon, VMware, Cisco, NetApp…etc to little known start-ups, and Dropbox’s name is often mentioned by most of these vendors in event keynote speeches, case studies…etc as a perfect example of how a born in the cloud organisation can use modern technology efficiently. Heck, they are even referenced in some of the AWS training courses I’ve come across on Pluralsight that talk about Dropbox’s ingenious way of using AWS S3 storage behind the scenes to store file data content.

So, when I learned that they had designed and built their own software defined storage solution to bring most of their data storage back from AWS on to their own data centres, I was quite curious to find out more about the said platform and the reasoning behind the move back to on-premises. Given it was the first time their engineering team had openly discussed things, I was looking forward to talking to them at the event.

This post summarises what I learnt from the Dropbox team.


I don’t think it’s necessary to introduce Dropbox to anyone these days. If, however, you’ve been under a rock for the past 4 years, Dropbox is the pioneering tech organisation from Silicon Valley that built an online content sharing and collaboration platform that allows you to synchronise content between various end user devices automatically while letting you access it on any device, anywhere. In the process of this data synchronisation and content sharing, they are dealing with:

  • 500+ million users
  • 500+ Petabytes of data storage
  • 750+ billion API calls handled

When they first went live, Dropbox used AWS’s S3 storage (PaaS) to store the actual user file data behind the scenes, while their own web servers were used to host the metadata about those files and users. However, as their data storage requirements grew, the need to change this architecture started to outweigh benefits such as the agility and ease provided by leveraging AWS cloud storage. As such, Dropbox decided to bring this file storage back in to their own on-premises data centers. Dropbox states two reasons behind this decision: performance requirements and raw storage costs. Given the unique use case they have for block storage at extremely high scale, Dropbox planned to save a significant amount of operational cost by designing a tailor-made cloud storage solution of their own, engineered to provide maximum performance at the lowest unit cost. As a private company that is about to go public through an IPO, saving costs was obviously high on their agenda.

Magic Pocket: Software Architecture

While the name originally came from an old internal nickname for Dropbox itself, Magic Pocket (MP) now refers to the custom built, internally hosted, software defined cloud storage infrastructure that Dropbox uses to host the majority of their users’ file data. It is multiple exabytes in size, with data fully replicated for availability, and offers high data durability (12 x 9’s) and high availability (4 x 9’s).
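To put those nines in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes the usual reading of such figures (availability as the fraction of a year the service is up, durability as the yearly chance of a given stored object surviving). The function names are my own and purely illustrative, not anything from Dropbox.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (4 -> 99.99%)."""
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


def expected_objects_lost_per_year(nines: int, stored_objects: int) -> float:
    """Expected yearly losses, assuming each object is lost with probability 10^-nines."""
    return stored_objects * 10 ** -nines


if __name__ == "__main__":
    print(f"4 x 9's availability: {allowed_downtime_minutes_per_year(4):.2f} min/year")
    print(f"12 x 9's durability, 10^12 objects: "
          f"{expected_objects_lost_per_year(12, 10**12):.2f} objects/year")
```

By that reading, 4 x 9’s leaves a budget of a little under an hour of downtime a year, while 12 x 9’s means you would expect to lose roughly one object a year out of a trillion stored.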

Within the MP architecture, files are split in to blocks and replicated across geo boundaries within their internal infrastructure (back end storage nodes) for durability and availability. The data stored in the MP infrastructure consists of 4MB blocks that are immutable by design. Changes to the data in the blocks are tracked through a File Journal that is part of the metadata held on the Dropbox application servers. Due to the temporal locality of the data, the bulk of the static data that is cold is stored on high capacity, high latency but cheap spinning drives, while metadata, cache data & DBs are kept on high performance, low latency but expensive SSDs.
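The immutable block idea is easier to picture with a small sketch. The Python below splits a file in to fixed size blocks and records a journal entry as an ordered list of block keys. Keying each block by its SHA-256 hash is an assumption on my part for illustration, as are all the function names; the session did not go in to how the File Journal is actually laid out.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB, the block size mentioned in the session


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file content in to fixed size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_key(block: bytes) -> str:
    """Assumed keying scheme: a block is identified by the hash of its content."""
    return hashlib.sha256(block).hexdigest()


def journal_entry(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """A file version is just the ordered list of keys of its immutable blocks."""
    return [block_key(b) for b in split_into_blocks(data, block_size)]


def blocks_to_upload(old_entry: list[str], new_entry: list[str]) -> set[str]:
    """Only blocks that did not exist in the previous version need storing."""
    return set(new_entry) - set(old_entry)


if __name__ == "__main__":
    v1 = journal_entry(b"aaaabbbbcc", block_size=4)
    v2 = journal_entry(b"aaaaXbbbcc", block_size=4)  # one byte changed
    print(len(blocks_to_upload(v1, v2)), "new block(s) out of", len(v2))
```

Because blocks are never modified in place, an edit simply produces new blocks plus a new journal entry, and the untouched blocks are shared between versions.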

Unlike most enterprise focused Software Defined Storage (SDS) solutions that utilise some kind of quorum style consensus or distributed coordination to ensure data availability and integrity, MP utilises a simple, centralised, sharded MySQL cluster, which is a bit of a surprise. Data redundancy is made available through…yeah you guessed it! Customised erasure coding, similar to many other enterprise SDS solutions. Data is typically replicated in 1GB chunks (known as buckets) that consist of random, often contiguous 4MB blocks. A bucket is replicated or erasure coded across multiple physical servers (storage nodes), and a set of 1 or more buckets replicated to a set of nodes makes up a volume. This architecture is somewhat similar to how the enterprise SDS vendor Hedvig stores data in the back end.
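For anyone new to erasure coding, the sketch below shows the simplest possible form of it in Python: a single XOR parity fragment, the same idea RAID 5 uses. This is NOT Dropbox’s customised scheme (which was not detailed), just a minimal illustration of how a lost fragment can be rebuilt from the survivors without keeping full copies. All names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data in to k equal fragments (zero padded) and append one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: list[bytes], missing_index: int) -> bytes:
    """Rebuild any single missing fragment by XOR-ing all the surviving ones."""
    survivors = [f for i, f in enumerate(fragments) if i != missing_index]
    return reduce(xor_bytes, survivors)


if __name__ == "__main__":
    stored = encode(b"magic pocket bucket", k=4)  # 4 data + 1 parity fragment
    print(recover(stored, missing_index=2) == stored[2])
```

With 4 data fragments and 1 parity fragment the raw capacity overhead is 1.25x, compared with 3x for keeping three full replicas, which is why erasure coding is attractive for cold data at this scale. Real schemes tolerate more than one simultaneous failure.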

In Dropbox’s SDS architecture, a pocket is similar to a fault domain in other enterprise SDS solutions and is a geographical zone (US east, US west & US Central for example). Each zone has a cluster of storage servers and other application servers and data blocks are replicated across multiple zones for availability. Pretty standard stuff so far.

Dropbox has a comprehensive Edge network, geographically dispersed across the world, through which all connectivity from customers’ Dropbox applications is funnelled. The client connectivity path is Application (on user device) > local PoP (proxy servers in an edge location) > Block server > Magic Pocket infrastructure servers > Storage nodes. While the proxy servers in edge locations don’t cache any data and can almost be thought of as typical web servers the clients connect through, the other servers such as the Block / MP / Storage node servers are ordinary x86 servers hosted within Dropbox’s own DCs. These servers are multi sourced as per best practice, and somewhat customised for Dropbox’s specific requirements, especially when it comes to storage node servers. Storage nodes are customised, high density servers with the capacity to hold around 1PB of raw data each using local disks. All servers run a generic version of Ubuntu on bare metal rather than as VMs.

Inside each zone, application servers such as the Block & Magic Pocket app & db servers act as gateways for storage requests coming through the edge servers. These also host the metadata mapping for block placement (block index) in the backend and run sharded MySQL clusters to store this information (running on SSD storage). Cross zone replication is also initiated asynchronously within this tier.

A cell is a logical grouping of physical storage servers (a cluster of storage nodes) and forms the core of Dropbox’s proprietary storage backend, which is worth a closer look. These servers have very large local disks, with each storage server (node) holding around 1PB of storage, and are used as dumb nodes for block level data storage. The Replication table, which runs in memory as a small MySQL DB, stores the mapping of logical Bucket <-> Volume <-> Storage nodes. This is also part of the metadata stack and is stored on app / db servers with SSD storage.
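The Bucket <-> Volume <-> Storage nodes mapping can be sketched as a couple of lookup tables. The Python below is only my mental model of it, using in-memory dictionaries instead of MySQL; the class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    volume_id: int
    storage_nodes: tuple[str, ...]  # nodes holding this volume's replicas / fragments


class ReplicationTable:
    """Toy stand-in for the mapping of bucket -> volume -> storage nodes."""

    def __init__(self) -> None:
        self._volumes: dict[int, Volume] = {}
        self._bucket_to_volume: dict[int, int] = {}

    def add_volume(self, volume: Volume) -> None:
        self._volumes[volume.volume_id] = volume

    def assign_bucket(self, bucket_id: int, volume_id: int) -> None:
        if volume_id not in self._volumes:
            raise KeyError(f"unknown volume {volume_id}")
        self._bucket_to_volume[bucket_id] = volume_id

    def nodes_for_bucket(self, bucket_id: int) -> tuple[str, ...]:
        """Where do I go to read this bucket?"""
        return self._volumes[self._bucket_to_volume[bucket_id]].storage_nodes

    def buckets_on_node(self, node: str) -> list[int]:
        """Which buckets are affected if this node dies?"""
        return sorted(b for b, v in self._bucket_to_volume.items()
                      if node in self._volumes[v].storage_nodes)


if __name__ == "__main__":
    table = ReplicationTable()
    table.add_volume(Volume(1, ("node-a", "node-b", "node-c")))
    table.assign_bucket(42, 1)
    print(table.nodes_for_bucket(42), table.buckets_on_node("node-b"))
```

The reverse lookup is the interesting one operationally: when a storage node fails, it tells you which buckets need re-replicating elsewhere.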

The Master is the software component within each cell that acts as a janitor and performs back end tasks such as storage node monitoring, creating storage buckets, and other background maintenance operations. However, the Master is not on the data plane so it doesn’t affect the immediate data read / write operations. There’s a 1:1 mapping between Master and cell. The Volume manager (another software component) can be thought of as the data mover / heavy lifter responsible for handling instructions from the Master and performing operations accordingly on the storage nodes. The Volume manager runs on the actual storage nodes (storage servers) in the back end.

The front end (interface to the SDS platform) supports simple operations such as Put, Get and Repair. (Details of how this works can be found here)
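To make the three operations concrete, here is a toy, in-memory Python sketch of Put, Get and Repair over a handful of storage nodes. Everything in it is an assumption for illustration: the class name, the hash based placement and the plain replication are mine, whereas the real platform places data via its own metadata tier and uses erasure coding.

```python
import hashlib


class ToyBlockStore:
    """Minimal Put / Get / Repair over in-memory 'storage nodes'."""

    def __init__(self, node_names: list[str], replicas: int = 2) -> None:
        self.nodes: dict[str, dict[str, bytes]] = {n: {} for n in node_names}
        self.replicas = replicas

    def _placement(self, key: str) -> list[str]:
        # Assumed placement: pick consecutive nodes starting from a hash derived slot.
        names = sorted(self.nodes)
        start = int(key, 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        for node in self._placement(key):
            self.nodes[node][key] = block
        return key

    def get(self, key: str) -> bytes:
        for node in self._placement(key):
            block = self.nodes[node].get(key)
            if block is not None:
                return block
        raise KeyError(key)

    def repair(self, key: str) -> int:
        """Re-create missing replicas from a surviving copy; returns how many were restored."""
        block = self.get(key)
        restored = 0
        for node in self._placement(key):
            if key not in self.nodes[node]:
                self.nodes[node][key] = block
                restored += 1
        return restored


if __name__ == "__main__":
    store = ToyBlockStore(["node-a", "node-b", "node-c"])
    key = store.put(b"hello magic pocket")
    lost = store._placement(key)[0]
    del store.nodes[lost][key]            # simulate a failed disk
    print(store.get(key), store.repair(key))
```

The point of the sketch is that a read still succeeds with one replica gone, and Repair quietly restores the redundancy afterwards.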

Magic Pocket: Storage Servers

Dropbox’s customized, high density storage servers make up the actual back end storage infrastructure. Typically each server has a 40Gb NIC and around 90 high capacity enterprise SATA drives as local disks, totalling around 1PB of raw space per node, and runs bare metal Ubuntu Linux with the Magic Pocket SDS application code. Their life cycle management is heavily automated using proprietary, custom built tools. This set up makes each storage node a significantly large fault domain given its huge capacity, but the wider SDS application and the network load balancing capabilities architected in to the application itself mitigate or design against the complete failure of a server or a cell. We were treated to seeing how this works in action when the engineering team decided to randomly pull networking cables out while we were touring the DC, and then also cut the power to a full rack, which had zero impact on the normal operations of Dropbox’s service. That was pretty cool to see.

My thoughts

Companies like Dropbox inspire me to think outside of the box when it comes to what is possible and how to address modern day business requirements using technology in innovative ways. Similar to the session on the Open19 project (an open hardware initiative in the same vein as the Open Compute Project) from the LinkedIn engineering team during the SFD12 event last year, this session has also hugely inspired me about the power of software & hardware engineering and the impact initiatives like this can have on the wider IT community that we all live and breathe.

As for the Magic Pocket SDS & HW architecture… I am a big fan, and it’s great to see organisations such as Dropbox and Netflix (CDN architecture), which epitomise the extreme ends of certain use cases, publicly opening up about the backend IT infrastructure that powers their solutions so that the other 99% of enterprise IT folks can learn from and adapt those blueprints where relevant.

It is also important to remember, though, that for normal organisations with typical enterprise IT requirements, such custom-built solutions will be neither practical nor required; often, their needs can be met with a similarly architected, commercially available Software Defined Storage solution tailored to their requirements. The most important part is to realise the power of Software Defined Storage here. If Dropbox can meet their extreme storage requirements through a Software Defined Storage solution that operates at a lower cost than a proprietary storage solution, the average corporate or enterprise storage use case does not have any excuse to keep buying expensive SAN / NAS hardware with a premium price tag. Most enterprise SDS solutions (VMware vSAN, Nutanix, Hedvig, Scality, etc.) have a software and hardware architecture very similar to Dropbox’s and carry a lower price point compared to expensive hardware centric storage solutions from the big vendors like EMC, NetApp, HPE, IBM, etc. So why not look in to an SDS solution if your SAN / NAS is up for renewal? You can very likely save significant costs and, at the same time, benefit from software defined innovation, which tends to come quicker when there’s no proprietary hardware baggage.

Given Dropbox’s unique scale and storage size, they’ve made a conscious decision to move the majority of their storage requirements away from AWS (S3 storage), as they’ve gone past the point where using cloud storage was economical or performant enough. But it is also important to remember that they only got to that point through the growth of their business which, at the beginning, was only enabled by the agility provided by the very same AWS S3 cloud storage platform they decided to move away from. Most organisations out there are nowhere near Dropbox’s level of scale, and therefore it’s important to remember that for your typical requirements, you can benefit significantly from the clever use of cloud technologies, especially PaaS technologies such as AWS S3, AWS Lambda, Microsoft O365 and Azure SQL that provide a ready to use technology platform without you having to build it all from scratch. In most cases, that freedom and speed of access can be a worthy trade-off for a slightly higher cost.

Keen to get your thoughts – get involved via comments button below!

Image credit goes to Dropbox!


Storage Field Day 15 – Watch Live Here

Following on from my previous post about the vendor line-up and my plans during the event, this post is to share the exact vendor presentation schedule and some additional details.

Watch LIVE!

Below is the live streaming link to the event on the day if you’d like to join us LIVE. While the time difference might make it a little tricky for some, it is well worth taking part in, as all viewers will also have the chance to ask the vendors questions live, just like the delegates on site. Just do it, you won’t be disappointed!

Session Schedule

Given below is the session schedule throughout the event, starting from Wednesday the 7th. All times are in Pacific time (-8 hours from UK time)

Wednesday the 7th of March

  • 09:30-11:30 (5:30-7:30pm UK time) – WekaIO presents
  • 13:00-15:00 (9-11pm UK time) – IBM presents
  • 16:00-18:00 (12-2am 8th of March, UK time) – Dropbox presents

Thursday the 8th of March

  • 08:00-10:00 (4-6pm UK time) – Hedvig presents from their Santa Clara offices
  • 10:30-12:30 (6:30-8:30pm UK time) – NetApp presents from their Santa Clara offices
  • 13:30-15:30 (9:30-11:30pm UK time) – Western Digital/Tegile presents from Levi’s Stadium
  • 16:00-18:00 (12-2am 9th of March, UK time) – Datrium presents from Levi’s Stadium

Friday the 9th of March

  • 08:00-10:00 (4-6pm UK time) – StarWind presents in the Seattle Room
  • 11:00-13:00 (7-9pm UK time) – Cohesity presents at their San Jose offices
  • 14:00-16:00 (10pm-12am UK time) – Huawei presents at their Santa Clara offices

Storage Field Day 15 – Introduction

Having attended the Storage Field Day 12 edition back in 2017, I was really looking forward to attending this popular event again. Luckily I’ve been invited again to attend the Storage Field Day 15 with an exciting line up of various enterprise storage vendors. This post is a quick intro about the event and the schedule ahead.

SFD – Quick Introduction!

Storage Field Day is a part of the popular, invitation-only Tech Field Day series of events organised and hosted by Gestalt IT. The genius idea behind the event is to bring together innovative technology solutions from various vendors (the “Sponsors”) who present their solutions live to a room full of independent technology bloggers and thought leaders (the “delegates”), chosen from around the world based on their knowledge, community profile and thought leadership, in order to get their independent thoughts (good or bad) on the said solutions. The event is also streamed live worldwide for anyone to tune in to and is often used by various technology start-ups to announce their arrival to the mainstream markets.

There are various field day events organised by Gestalt IT, such as the Tech / Storage / Cloud / Mobility / Networking / Virtualisation / Unified Communications / Wireless Field Day events that take place throughout the year, with the respective technology vendor solutions showcased in each. They are organised by the chief organiser Stephen Foskett (@SFoskett) and have always been extremely popular amongst the vendors, as they provide an ideal opportunity to present new products and solutions to a number of thought leaders and community influencers from around the world and get their valuable thoughts & feedback. They are equally popular amongst the attending delegates, who get the opportunity not only to witness brand new technology at times, but also to critique and give their feedback in front of these vendors.

During each day, the delegates and the organisers (Steve along with a few of the supporting crew members such as the camera crew) travel to each of the participating vendors’ offices (often in Silicon Valley) where the technology & business leaders present their solutions to the audience. Typically, the whole session is streamed live thanks to the SFD camera crew and each session is also recorded to be posted on to various video sharing sites for subsequent viewing. I would seriously recommend having a look at their YouTube channel for past session recordings if you are a technology person with a keen interest in enterprise technology solutions.

SFD15 – Schedule & Vendor line-up

SFD15 is due to take place in Silicon Valley between the 7th and 9th of March 2018. From what I understand, the sponsor slots for SFD15 have been fiercely contested by various vendors due to the event’s popularity, and I’ve noticed the participating vendor list changing a few times up until now, presumably due to that competition. The list of vendors confirmed (as of today, 17.02.2018) is as follows

  • Cohesity: Cohesity is a relatively new secondary storage vendor that I’ve been keeping a close eye on for a while now. I know their offering and their value proposition fairly well and am looking forward to understanding what’s new and their future plans.
  • Dropbox: This is big. For the first time ever, Dropbox are publicly announcing their home grown, highly customised storage infrastructure that runs hand in hand with their own Software Defined Storage solution, which together store all of our data behind the scenes. This is going to be epic. So DO NOT miss out on this one.
  • Hedvig: Hedvig is a Software Defined Storage company started about 5 years ago in Silicon Valley by an ex-engineer with distributed file system and database design experience at AWS and Facebook, who co-invented Amazon Dynamo & Cassandra. I’ve reviewed Hedvig’s technology quite closely in the past and have liked what I’ve seen. So, I’m very keen to see what they have to say this time around about their latest offering and future plans.
  • Datrium: While I had come across these guys before, I didn’t know the details of their offering other than that they claim to be a (better) alternative to the typical converged and Hyper-Converged solutions out there. So looking forward to finding out a bit more about their offering.
  • NetApp: NetApp is a regular sponsor at most Tech Field Day events and this time around, they are likely going to be presenting their overall storage strategy (a good one that I’m already familiar with) with possibly a major focus on their HCI offering (which I’m not fully sold on as a credible HCI offering compared to the competition, and think of more as a CI solution). So keen to find out a bit more to clarify on this front. NetApp is a company I work with very closely and have known for years, and it’s always a pleasure to be able to visit their HO in Sunnyvale.
  • HGST: This will likely be the Tegile offering that was acquired by Western Digital. Looking forward to this.
  • Huawei: Huawei is the popular Chinese data center IT behemoth that is growing in popularity each year. Incidentally, Huawei is a vendor partner of my company and I know they have a growing presence in the UK public sector due to their lower costs on infrastructure components. While they manufacture everything from servers to Software Defined Networking controllers, I’d presume the focus of their session during SFD15 would be on their storage offering. I am personally not fully familiar with Huawei’s storage offerings so this could be a great opportunity for me to right that wrong.
  • IBM: No introduction necessary for this one. Not sure which IBM storage solution they’ll be covering during SFD15.
  • StarWind: StarWind, an HCI solution vendor, presented at the last SFD12 I attended and it’s good to see them come back for SFD15. Keen to get an updated view on things.
  • WEKA.IO: These guys claim to have the world’s fastest parallel file system and are totally new to me. So I am really looking forward to finding out the details of their offering and its value proposition.

My Plan

I will be travelling to San Jose from London Heathrow on the 6th with a view to catching up with this year’s delegates over the evening meal on the 6th, which is always fun. There are some familiar faces from last year’s SFD12 as well as a few new faces (to me), so I’m looking forward to meeting these girls and boys. I am due to fly back to London on Monday the 12th and I expect to gather some intimate knowledge of these solutions. I am aiming to publish some posts on most of the interesting solutions I come across to provide a deep dive and my independent thoughts on them, during and after the event.

If you are interested in attending a future TFD / SFD event, all the information you need to know can be found here at

If you are interested in watching these sessions / presentations live, you can find the schedule and the live stream link information on the event page. I would seriously encourage you to watch the event live and do take part in the live questions as well.

SFD12 Posts

As mentioned above, I learnt a lot during my SFD12 participation last year about the storage industry in general as well as about the direction of a number of storage vendors. If you are interested in finding out more, see my #SFD12 articles below

VMworld 2017 US – VMware Strategy & My Thoughts

This is a quick post to summarise all the key announcements from the VMworld 2017 US event and share my thoughts and insights on the strategy and direction of VMware, the way I see it.

Key Announcements

A number of announcements were made during the week on products and solutions and below is a high level list of those to recap.

  • Announced the launch of the VMware Cloud Services which consists of 2 main components
    • VMware Cloud on AWS (VMC)
      • Consist of VMware vSphere + vSAN + NSX
      • Running on AWS data centers (bare metal)
      • A complete Public Cloud platform consisting of VMware Software Defined Data Center components
      • Available as a
    • A complete Hybrid-Cloud infrastructure security, management & monitoring & Automation solution made available through a Software as a Service (SaaS) platform
      • Work natively with VMware Cloud on AWS
      • Also work with legacy, on-premises VMware data center
      • Also work with native AWS, Azure and Google public cloud platforms
  • Next generation of network virtualisation solution based on NSX-T (aka NSX Multi hypervisor)
    • Version 2.0 announced
    • Supports vSphere & KVM
    • Likely going to be strategically more important to VMware than NSX-v (the vSphere specific NSX that is commonly used today by vSphere customers). Think what ESXi was for VMware when ESX was still around, during the early days!



  • Next version of vRealize Network Insight (version 3.5) released
    • Various cloud platform integrations
    • Additional on-premises 3rd party integrations (Check Point FW, HP OneView, Brocade MLX)
    • Support for additional NSX component integration (IPFIX, Edge dashboard, NSX-v DFW PCI dashboard)


  • VMware AppDefense
    • A brand new application security solution that is available via VMware Cloud Services subscription


  • VMware Pivotal Container Service (PKS) as a joint collaboration between VMware, Pivotal & Google (Kubernetes)
    • Kubernetes support across the full VMware stack including NSX & vSAN
    • Support for serverless capabilities using Functions as a Service (similar to AWS Lambda or Azure Functions)
    • Enabling persistent storage for stateful applications via the vSphere Cloud Provider, which provides access to vSphere storage powered by vSAN or traditional SAN and NAS storage
    • Automation and governance via vRealize Automation and provisioning of service provider clouds with vCloud Director
    • Monitoring and troubleshooting of virtual infrastructure via VMware vRealize Operations
    • Metrics monitoring of containerized applications via Wavefront


  • Workspace One enhancements and updates
    • Single UEM platform for Windows, macOS, Chrome OS, iOS and Android
    • Integration with unique 3rd party endpoint platform API’s
    • Offer cloud based peer-to-peer SW distribution to deploy large apps at scale
    • Support for managing Chrome devices
    • Provides customers the ability to enforce & manage O365 security policies and DLP alongside all of their applications and devices
    • Workspace One intelligence to provide Insights and automation to enhance user experience (GA Q4 FY18)
  • VMware Integrated OpenStack 4.0 announced
    • OpenStack Ocata integration
    • Additional features include
      • Containerized apps alongside traditional apps in production on OpenStack
      • vRealize Automation integration to enable OpenStack users to use vRealize Automation-based policies and to consume OpenStack components within vRealize Automation blueprints
      • Increased scale and isolation for OpenStack clouds enabled through new multi-VMware vCenter support
    • New pricing & Packaging tier (not free anymore)
  • VMware Skyline
    • A new proactive support offering aligned to global support services
    • Available to Premier support customers (North America initially)
    • Requires an appliance deployment on premise
    • Quicker time to incident resolution

Cross Cloud Architecture Strategy & My Thoughts

VMware announced the Cross Cloud Architecture (CCA) back at VMworld 2016, where they set out the vision for VMware to give customers the capability to run & manage any application, on any cloud, using any device. This was ambitious and was seen as the first step towards VMware recognising that running vSphere on premise should no longer be their main focus and that they want to provide customers with choice.

The platform choices were to be:

  • Continue to run vSphere on premise if that is what you want to do
  • OR, let customers run the same vSphere based SDDC stack on the cloud which can be spun up in minutes in a fully automated way (IaaS)
  • OR, run the same workload that used to run on a VMware SDDC platform on a native public cloud platform such as AWS or Azure or Google cloud or IBM Cloud

During that VMworld, VMware also demoed the capability of NSX to bridge all these various private and public cloud platforms by extending networks across all of them. Well, VMworld 2017 has shown the additional steps VMware have taken to make this cross cloud architecture even more of a reality. VMware Cloud on AWS (VMC) now lets you spin up a complete VMware based Software Defined Data Center running vSphere on vSAN connected by NSX through a simple web page, much like how the Azure and AWS native infrastructure platforms allow you to provision VM based infrastructure on demand. Based on some initial articles, this could even be cheaper than running vSphere on-premise, which is great news for customers. In addition to this price advantage, when you factor in the rest of the Total Cost of Ownership, such as no longer needing to maintain on premise skills to set up and manage the infrastructure platforms, the VMC platform is likely going to be extremely interesting to most customers. And most importantly, most customers will NOT need to go through a costly re-architecting of their monolithic application estate to fit a native cloud IaaS platform, which simplifies cloud migration of that estate. And if that is not enough, you can also carry on managing & securing that workload using the same VMware management and security toolset, even on the cloud.

When you then consider the announcement of VMware Cloud Services (VCS) offering as a SaaS solution, it now enables integrating a complete VMware hybrid cloud management toolset in to various platforms and workloads, irrespective of where they reside. VCS enables the discovery, monitoring, management and securing of those workloads across different platforms, all through a single pane of glass which is a pretty powerful message that no other public cloud provider can claim to provide in such a heterogeneous manner. This holistic management and security platform allows customers to provision, manage and secure any workload (Monolithic or Microservices based) on any platform (vSphere on premise, VMC on AWS, native AWS, native Azure, Native Google cloud) to be accessed on any device (workstation, laptop, Pad or a mobile). That to me is a true Cross Cloud vision becoming a reality and my guess is once the platform matures and capabilities increase, this is going to be very popular amongst almost all customers.

In addition to these CCA capabilities, VMware clearly appear to be shifting their focus from the infrastructure layer (read “virtual machine”) to the actual application layer, focusing more on enabling application transformation and application security, which is great to see. As many have already, VMware too are embracing the concept of containers, not only as a better application architecture but also as the best way to decouple the application from the underlying infrastructure, using containers as a shipping mechanism to move applications across to the public cloud (& back). The announcement of various integrations between their infrastructure stack and the Docker ecosystem, such as Kubernetes, testifies to this and would likely be welcomed by customers. I’d expect such integration to continue to improve across all of VMware’s SDDC infrastructure stack. With VMware solutions, you can now deploy container based applications on on-premise vSphere using VIC or Photon, or even on VMC or a native public cloud platform, store them on vSAN with volume plugins on premise or on cloud, extend the network to the container instance via NSX (on premise or on cloud), extend visibility in to the container instance via vRNI and vROPS (on premise or cloud) and also automate the provisioning or, most importantly, the migration of these container apps across on-premise or public cloud platforms as you see fit.

NSX cloud for example will let you extend all the unique capabilities of software defined networking such as micro-segmentation, security groups and overlay network extensions to not just within private data centers but also to native public cloud platforms such as AWS & Azure (roadmap) which enriches the capabilities of a public cloud platform and increases the security available within the network.

My Thoughts

All in all, it was a great VMworld where VMware genuinely showcased their Hybrid Cloud and Cross Cloud Architecture strategy. As a technologist who has been working with VMware for a while, it was pretty obvious to me that a software centric organisation like VMware, similar to the likes of Microsoft, was always going to embrace change, especially change driven by software such as the public cloud. However, most people, especially sales people in the industry I work in as well as some customers, were starting to worry about the future of VMware and their relevance in the increasingly cloudy world ahead. This VMworld has shown all of them that VMware has a very good working strategy to embrace that software defined cloud adoption and empower customers by giving them the choice to do the same, without any tie in to a specific cloud platform. The soaring, all time high VMware share price is a testament that analysts and industry experts agree with this too.

If I was a customer, I would want nothing more!

Keen to get your thoughts, please submit via comments below

Other Minor VMworld 2017 (Vegas) Announcements

  • New VMware & HPe partnership for DaaS
    • Include Workspace ONE to HPe DaaS
    • Include Unified Endpoint Management through Airwatch
  • Dell EMC to offer data protection to VMC (VMware Cloud on AWS)
    • Include Data Domain & Data protection app suite
    • Self-service capability
  • VCF related announcements
    • CenturyLink, Fujitsu & Rackspace to offer VCF + Services
    • New HCI and CI platforms (VxRack SDDC, HDS UCP-RS, Fujitsu PRIMEFLEX, QCT QxStack)
    • New VCF HW partners
      • Cisco
      • HDS
      • Fujitsu
      • Lenovo
  • vCloud Director v9 announced
    • GA Q3 FY18
  • New vSphere scale-out edition
    • Aimed at Big data and HPC workloads
    • Attractive price point
    • Big data specific features and resource optimisation within vSphere
    • Includes vDS
  • VMware Validated Design (VVD) 4.1 released
    • Include a new optional consolidated DC architecture for small deployments
  • New VMware and Fujitsu partnerships
    • Fujitsu Cloud Services to deliver VMware Cloud Services
  • DXC Technology partnership
    • Managed Cloud service with VMC
    • Workload portability between VMC, DXC DCs and customer’s own DCs
  • Re-announced VMware Pulse IoT Center  with further integration to VMware solutions stack to manage IoT components




Introduction To VMware AppDefense – Application Security as a Service

Yesterday at VMworld 2017 US, VMware announced the launch of AppDefense. This post is a quick introduction that looks a little more closely at what it is & my initial thoughts on it.

AppDefense – What is it?

AppDefense is a solution that uses the hypervisor to introspect the behaviour of applications inside guest VMs. It involves analysing the application's behaviour (within the guest VM) and establishing its normal operational behaviour (the intended state). Once that is verified to be accurate, it constantly measures the running state of those applications against the intended state & least privilege posture, and controls / remediates their behaviour should non-conformance be detected. The aim is to increase application security by detecting infiltrations at the application layer and automatically preventing the propagation of those infiltrations until they are remediated.

AppDefense is a cloud hosted (SaaS) solution that is hosted on AWS and managed by VMware, rather than an on-premises monitoring & management solution. It is a key part of the SaaS solution stack VMware also announced yesterday, VMware Cloud Services. (A separate detailed post about VMware Cloud Services will follow.)

If you know VMware NSX, you know that NSX provides a least privilege execution environment that prevents attacks, or the propagation of attacks, by enforcing least privilege at the network level (Micro-Segmentation). AppDefense adds an additional layer by enforcing the same least privilege model at the application layer as well, within the VM’s guest OS.

AppDefense – How does it work?

AppDefense identifies and enforces application security through the following high level steps (based on what I understand as of now).

  1. Application baselining (Intended State): Automatically identifying the normal behaviour of an application and producing a baseline for it based on its “normal” behavioural patterns (the intended state). This intended state can come from analysing normal, uninfected application behaviour within the guest or even from external application state definition platforms such as Puppet…etc. Pretty cool that is, I think!
  2. Detection: It will then constantly monitor the application behaviour against this baseline to see if there are any deviations which could amount to potentially malicious behaviour. If any are detected, AppDefense will either block those alien application activities or automatically isolate the application using hypervisor constructs, in a similar manner to how NSX & 3rd party AV tools auto isolate guests through guest introspection and heuristic analysis. AppDefense uses an in-memory process anomaly detector rather than taking a hash of the VM file set (which is often how 3rd party security vendors work), which is going to be a unique selling point in comparison to typical AV tools. An example demo shown by VMware was of an application server that ordinarily talks to a DB server over a SQL Server ODBC connection. Once protected by AppDefense, any other form of direct connectivity from that app server to the DB server (say a PowerShell query or a script running on the app server) is automatically blocked, even if it happens to be on the same port that is already permitted. That was pretty cool if you ask me.
  3. Automated remediation: Similar to above, it can then take remediation action to automatically prevent propagation.
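
To make the three steps above a little more concrete, here is a minimal Python sketch of the general baseline / detect / remediate idea. It is purely illustrative and is not how AppDefense is actually implemented: the Behaviour record, the function names, the host and process names and the isolate threshold are all my own assumptions, loosely modelled on the ODBC demo described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    """One observed action: which process talked to which host on which port."""
    process: str
    destination: str
    port: int


def learn_baseline(observations):
    """Step 1: treat everything seen during a known-good period as the intended state."""
    return frozenset(observations)


def detect(baseline, observed):
    """Step 2: return every observed behaviour that is not part of the intended state."""
    return [b for b in observed if b not in baseline]


def remediate(deviations, isolate_threshold=3):
    """Step 3: pick an action. The threshold is an arbitrary, illustrative value."""
    if not deviations:
        return "allow"
    if len(deviations) >= isolate_threshold:
        return "isolate"
    return "block"


# Hypothetical names: the app server normally reaches the DB over ODBC on port 1433.
baseline = learn_baseline([Behaviour("appserver.exe", "db01", 1433)])
observed = [
    Behaviour("appserver.exe", "db01", 1433),
    Behaviour("powershell.exe", "db01", 1433),  # same port, different process
]
deviations = detect(baseline, observed)
print(deviations)
print(remediate(deviations))  # prints "block"
```

The point of keying the baseline on the process as well as the port is that the PowerShell connection is flagged even though port 1433 is already permitted, which is the behaviour the demo was showing.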


AppDefense Architecture

AppDefense, despite being a SaaS application, will work with cloud (VMware Cloud on AWS) as well as on-premises environments. An on-premises proxy appliance will act as the broker. Future roadmap items include extending these capabilities to non-vSphere as well as bare metal workloads on-premises. An agent is deployed into the VMs (guest agent) and runs inside a secure memory space to ensure its authenticity.

For the on-premises version, vCenter is the only mandatory prerequisite, whereas NSX Manager and vRA are optional and only required for remediation and provisioning. (There are no current plans for the Security Manager to be available on-site, yet.)

AppDefense Integration with 3rd parties*

  • IBM Security:
    • AppDefense plans to integrate with IBM’s QRadar security analytics platform, enabling security teams to understand and respond to advanced and insider threats that cut across both on-premises and cloud environments like IBM Cloud. IBM Security and VMware will collaborate to build this integrated offering as an app delivered via the IBM Security App Exchange, providing mutual customers with greater visibility and control across virtualized workloads without having to switch between disparate security tools, helping organizations secure their critical data and remain compliant.
  • RSA:
    • RSA NetWitness Suite will be interoperable with AppDefense, leveraging it for deeper application context within an enterprise’s virtual datacenter, response automation/orchestration, and visibility into application attacks. RSA NetWitness Endpoint will be interoperable with AppDefense to inspect unique processes for suspicious behaviors and enable either a Security Analyst or AppDefense Administrators to block malicious behaviors before they can impact the broader datacenter.
  • Carbon Black:
    • AppDefense will leverage Carbon Black reputation feeds to help secure virtual environments. Using Carbon Black’s reputation classification, security teams can triage alerts faster by automatically determining which behaviors require additional verification and which behaviors can be pre-approved. Reputation data will also allow for auto-updates to the manifest when upgrading software to drastically reduce the number of false positives that can be common in whitelisting.
  • SecureWorks:
    • SecureWorks is developing a new solution that leverages AppDefense. The new solution will be part of the SecureWorks Cloud Guardian™ portfolio and will deliver security detection, validation, and response capabilities across a client’s virtual environment. This solution will leverage SecureWorks’ global Threat Intelligence, and will enable organizations to hand off the challenge of developing, tuning and enforcing the security policies that protect their virtual environments to a team of experts with nearly two decades of experience in managed services.
  • Puppet:
    • Puppet Enterprise is integrated with AppDefense, providing visibility and insight into the desired configuration of VMs, assisting in distinguishing between authorized changes and malicious behavior

*Credit: VMware AppDefense release news

Having spoken to the product managers, my guess is these partnerships will grow as the product goes through its evolution to include many more security vendors.


Comparison to competition

In comparison to other 3rd party AV tools whose heuristic analysis does similar anomaly detection within the guests, VMware AppDefense is supposed to have a number of unique selling points: the ability to understand distributed application behaviours better than the competition in order to reduce false positives, the ability to not just detect but also orchestrate remediation (through the use of vRA and NSX), and a near future roadmap to use machine learning to enhance anomaly detection within the guest, which is pretty cool.

Understanding the “Intended state”

The intended state can be built from information collected from various data center state definition tools such as vCenter, Puppet, vRealize Automation & other configuration management solutions, as well as developer workflow tools such as Ansible, Jenkins…etc.

The AppDefense agent (which runs in the guest OS) runs in a protected memory space within the guest (via the hypervisor) to store the security controls in a tamper proof manner (secure runtime). Any attempts to intrude into this space are detected and actioned upon automatically. While this is secure, it’s not guaranteed at the HW layer (think HyTrust, which uses Intel CPU capabilities such as TXT to achieve a HW root of trust), though I suspect this will inevitably come down the line.


AppDefense – My (initial) Thoughts

I like the sound of it and its capabilities based on what I’ve seen today. Obviously it’s a SaaS based application, and some people may not like using that to monitor and enforce security, especially if you have an on-premises environment that you’d like to monitor and manage security on, but if you can get over that mindset, this could potentially be quite good. And obviously if you use VMware Cloud Services, especially VMware Cloud on AWS for example, this would have direct integration with that platform to enforce application level security, which could be quite handy. As with all products however, the devil is normally in the detail, and as this version has only just been released, the details available are too scarce to form a detailed & accurate opinion. I will be aiming to test this out in detail in the near future, both with VMware Cloud on AWS as well as an on-premises VMware SDDC stack, and provide some detailed insights. Furthermore, it’s a version 1.0 product and realistically, most production customers will likely wait until it’s battle hardened and richer in capabilities, such as the use of a hardware root of trust, before using it for key production workloads.

However until then, it’s great to see VMware focusing more on security in general and building in native, differentiated security capabilities focused on the application layer, which is just as important as security at the infrastructure layer. I’m sure the product will evolve to incorporate things such as AI & machine learning to provide more sophisticated preventive measures in the future. The ability to take static application / VM state definitions from external platforms like Puppet is really useful, and I suspect that is where this will be popular with customers, at least initially.

Slide credits go to VMware!



VMworld 2017 – vSAN New Announcements & Updates

During VMworld 2017 Vegas, a number of vSAN related product announcements will have been made, and I was privy to some of those a little earlier than the general public due to being a vSAN vExpert. I’ve summarised those below. The embargo on disclosing the details lifts at 3pm PST, which is when this blog post is scheduled to go live automatically. So enjoy! 🙂

vSAN Customer Adoption

As some of you may know, the popularity of vSAN has been growing for a while now as a preferred alternative to legacy SAN vendors when it comes to storing vSphere workloads. The below stats somewhat confirm this growth. I can testify to this personally too, as I’ve seen a similar increase in the number of our own customers that now consider vSAN the default choice for storage.

Key new Announcements

New vSAN based HCI Acceleration kit availability

This is a new ready node program being announced with some OEM HW vendors to provide distributed data center services for edge computing platforms. Consider this to be somewhere in between the vSAN RoBo solution and the full blown main data center vSAN solution. Highlights of the offering are as follows:

  • 3 x Single socket servers
  • Includes vSphere STD + vSAN STD (vCenter is excluded)
  • Launch HW partners limited to Fujitsu, Lenovo, Dell & Supermicro only
  • 25% default discount on list price (on both HW & SW)
  • $25K starting price



  • My thoughts: Potentially a good move and an interesting option for those customers who have a main DC elsewhere or are primarily cloud based (including VMware Cloud on AWS). The practicality of vSAN RoBo was always hampered by the fact that it’s limited to 25 VMs on 2 nodes. This should slightly increase market adoption, however the key deciding factor will be the pricing. Noticeably, HPE are absent from the initial launch but I’m guessing they will eventually sign up. Note that you need to have an existing vCenter license elsewhere as it’s not included by default.

vSAN Native Snapshots Announced

A tech preview of native vSAN data protection capabilities through snapshots has been announced and will likely be generally available in FY18. vSAN native snapshots will have the following characteristics.

  • Snapshots are all policy driven
  • 5 mins RPO
  • 100 snapshots per VM
  • Support data efficiency services such as dedupe as well as protection services such as encryption
  • Archival of snapshots will be available to secondary object or NAS storage (no specific vendor support required) or even Cloud (S3?)
  • Replication of snapshots will be available to a DR site.

  • My thoughts: This was a hot request and something that was a long time coming. Most vSAN solutions need a 3rd party data center backup product today, and SAN vendors often provided this type of snapshot based backup solution from the array (NetApp SnapManager suite for example), which vSAN couldn’t match. Well, it can now, and since it’s done at the SW layer, it’s array independent and you can replicate or archive the snapshots anywhere, even to the cloud. This would be more than sufficient for lots of customers with a smaller or point use case, who would no longer need to buy backup licenses elsewhere to protect that vSphere workload. This is likely going to be popular. I will be testing this out in our lab as soon as the beta code is available to ensure the snaps don’t carry a performance penalty.
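
As a quick back-of-the-envelope check on the 5 min RPO and 100 snapshots per VM figures listed above, the small Python sketch below works out how far back you could roll at a given snapshot interval. It assumes (and this is my assumption, not something VMware have stated) that the 100 snapshot limit is a hard cap on a single, evenly spaced schedule; the function names are made up for illustration.

```python
from datetime import timedelta

MAX_SNAPSHOTS_PER_VM = 100           # figure from the announcement
MIN_INTERVAL = timedelta(minutes=5)  # 5 min RPO from the announcement


def retention_window(interval, max_snapshots=MAX_SNAPSHOTS_PER_VM):
    """How far back the oldest snapshot reaches for an evenly spaced schedule."""
    if interval < MIN_INTERVAL:
        raise ValueError("interval is shorter than the announced 5 minute RPO")
    return interval * max_snapshots


def shortest_interval_for(window, max_snapshots=MAX_SNAPSHOTS_PER_VM):
    """Tightest snapshot interval that still covers the whole window."""
    return max(MIN_INTERVAL, window / max_snapshots)


print(retention_window(timedelta(minutes=5)))    # 8:20:00
print(shortest_interval_for(timedelta(days=7)))  # 1:40:48
```

In other words, under those assumptions, snapping every 5 minutes only gets you a little over 8 hours of history, whereas keeping a full week of snapshots means an effective RPO of roughly 1 hour 40 minutes, so archival to secondary storage will matter for longer retention.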


vSAN on VMware Cloud on AWS Announced

Well, this is not massively new, but vSAN is a key part of VMware Cloud on AWS, and the vSAN storage layer provides all the on-premises vSAN goodness while also providing DR to VMware Cloud capability (using snap replication) and orchestration via SRM.


vSAN Storage Platform for Containers Announced

Similar to the NSX-T announcement with K8s (Kubernetes) support, vSAN also provides persistent storage to both K8s and Docker container instances in order to run stateful containers.

This capability came from the VMware open source project code named Project Hatchway, and it’s freely available via GitHub now.

  • My thoughts: I really like this one and the approach VMware are taking to make the product set more and more microservices (container based application) friendly. This capability came from an open source VMware project called Project Hatchway and will likely be popular with many. The code was supposed to be available on GitHub as this is an open source project, but I have not been able to find anything within the VMware repos on GitHub yet.
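
For those wondering what consuming vSAN backed persistent storage from Kubernetes might look like, here is a rough Python sketch that generates a StorageClass and a PersistentVolumeClaim as JSON (which kubectl accepts as well as YAML). This is based on the upstream Kubernetes in-tree vSphere volume plugin (the kubernetes.io/vsphere-volume provisioner with its diskformat and storagePolicyName parameters) and I am assuming, rather than confirming, that this is the path Project Hatchway uses. The helper functions and the names vsan-gold, gold and db-data are hypothetical.

```python
import json


def vsphere_storage_class(name, storage_policy, disk_format="thin"):
    """Build a StorageClass that maps to a vSphere storage policy (SPBM)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/vsphere-volume",
        "parameters": {
            "diskformat": disk_format,
            "storagePolicyName": storage_policy,  # hypothetical policy name
        },
    }


def persistent_volume_claim(name, storage_class, size="2Gi"):
    """Build a claim that a stateful container can mount as its volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


storage_class = vsphere_storage_class("vsan-gold", storage_policy="gold")
claim = persistent_volume_claim("db-data", storage_class["metadata"]["name"])
print(json.dumps(storage_class, indent=2))
print(json.dumps(claim, indent=2))
```

The nice thing about this model is that the storage policy stays a vSphere side construct, so the same policy driven approach vSAN already uses for VMs would carry over to container volumes.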


So, all in all, there were not very many large or significant announcements for vSAN from VMworld 2017 Vegas (yet), but this is to be expected as the latest version, vSAN 6.6.1, was only recently released with a ton of updates. The key takeaway for me is that the popularity of vSAN is obviously growing (well, I knew this already anyway) and that the current and future announcements are going to make vSAN a fully fledged SAN / NAS replacement for vSphere storage, with more and more native security, efficiency and availability services, which is great for customers.