A look at the Hedvig distributed Hybrid Cloud storage solution

During the recently concluded Storage Field Day event (SFD15), I had the chance to travel to the Software Defined Storage company Hedvig at their HQ in Santa Clara, where we were given a presentation by their engineering team (including the founder) on their solution offering. Luckily, I knew Hedvig already through my day job (which involves evaluating new, disruptive tech start-ups to form solutions reseller partnerships; I had already gone through this process with Hedvig a while back). However, I learnt about a number of new updates to their solution, and this post aims to cover their current solution offering and my thoughts on it in the context of the current enterprise storage market.

Hedvig: Company Overview

Similar to a number of new storage and backup start-ups that came out of stealth in recent times, Hedvig was founded by an engineer with a technology background, back in 2012. The founder, Avinash Lakshman, came from a distributed software engineering background, having worked on large-scale distributed storage systems such as Amazon Dynamo and Cassandra. While they came out of stealth in 2015, they did not appear to have an aggressive growth strategy backed by an aggressive (read "loud") marketing effort behind them, and looked rather content with natural, organic growth. At least that was my impression from seeing how they operated in the UK market. However, during the SFD15 presentation, we found out that they've somewhat revamped their logo and related marketing collateral, so perhaps they may well have started to address this already.

Hedvig: Solution Overview


At the outset, they are similar to most other software defined storage start-ups these days that leverage commodity server hardware underneath their software tier to build a comparatively low cost, software defined storage (SDS) solution. They also have genuine distributed capability, being able to distribute the SDS nodes not just within the data center, but also across data centers as well as cloud platforms, though it's important to note that most SDS vendors these days have the same capability or are in the process of adding it to their platforms.

Hedvig has positioned themselves as an SDS solution that is a good fit for traditional workloads such as VMs, backup & DR, as well as modern workloads such as Big Data, HPC, object storage and various cloud native workloads. Their solution provides block & file storage capability like most other vendors in their category, as well as object storage, which is another potentially good differentiator, especially compared to some of the other HCI solutions out there that often only provide one type or the other.

The Hedvig storage platform typically consists of the Hedvig software platform plus commodity server hardware with local disks. Each server node can be a physical server or a VM on a cloud platform that runs the Hedvig software. The Hedvig software consists of:

  • Hedvig Storage Proxy
    • This is a piece of software deployed on the compute node (app server, container instance, hypervisor…etc.)
    • Presents file (NFS) & block (iSCSI) storage to compute environments and converts that to Hedvig's proprietary communication protocol with the storage service.
    • Also performs caching of reads (writes are redirected).
    • Performs dedupe up front and writes deduped blocks to the back end (storage nodes) only if necessary.
    • Each hypervisor runs a proxy appliance VM / VSA (x2 as a HA pair) which will serve all local IO on that hypervisor
  • Hedvig API
    • Presents object storage via S3 or Swift and full RESTful API from the storage nodes to the storage proxy.
    • Runs on the storage nodes
  • Hedvig Storage Services
    • Manages the storage cluster activities and interfaces with the storage proxies
    • Runs on the storage nodes; similar in role to a typical storage processor / SAN or NAS controller
    • Each storage server has 2 parts
      • Data process
        • Local persistence
        • Replication
      • Metadata process
        • Communicate with each other
        • Distributed logic
        • Stored in a proprietary DB on each node
    • Each virtual disk provisioned in the front end is mapped 1:1 to a Hedvig virtual disk in the back end
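The proxy-side dedupe behaviour described above (dedupe up front, writing blocks to the storage nodes only if necessary) can be sketched roughly as follows. This is a toy illustration in Python, not Hedvig's actual implementation; the class, block size and fingerprinting scheme are my own assumptions.

```python
import hashlib

class StorageProxySketch:
    """Toy model of proxy-side inline dedupe: only blocks whose
    fingerprint hasn't been seen before are shipped to the back end."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.seen = set()          # fingerprints of blocks already persisted
        self.backend_writes = 0    # blocks actually sent to storage nodes

    def write(self, data: bytes):
        # Split the incoming write into fixed-size blocks.
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.seen:
                # New block: send it to the storage nodes.
                self.seen.add(fp)
                self.backend_writes += 1
            # Duplicate block: only metadata is updated, no back end IO.
```

The point of doing this at the proxy (compute side) rather than on the storage nodes is that duplicate blocks never cross the network at all, which is the back end IO and bandwidth saving Hedvig highlights.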

The Hedvig storage nodes can be commodity or mainstream OEM vendor servers, as customers choose. They consist of SSDs plus mechanical drives, which is typical of other SDS vendors too, and the storage nodes running the Storage Services software are typically connected to each other using 10GbE (or higher) standard Ethernet networking.

Like most other SDS solutions, they have the typical SDS features and benefits such as dedupe, compression, auto-tiering, caching, snapshots & clones, data replication…etc. Another potentially unique offering they have here is the ability to set storage policies at per virtual disk or per container granularity (in the back end), which is nice. Below are some of the key storage policy configuration items that can be set at per VM / vDisk granularity.

  • Replication Factor (RF) – Number of copies of the data to keep. Ranges from 1-6. Quorum = (RF/2)+1. This is somewhat similar to VMware vSAN FTT if you are a vSAN person.
  • Replication policy – Agnostic, Rack aware and DC aware – kind of similar to the concept of Fault Domains in vSAN, for example. Sets the scope of data replication for availability.
  • Dedupe – Global dedupe across the cluster. Happens at 512B or 4K block size and is done in-line. Important to note that dedupe happens at the storage proxy level, which ensures no unnecessary writes take place in the back end. This is another USP compared to other SDS solutions, which is also nice.
  • Compression
  • Client caching
  • …etc.
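To make the per-vDisk policy idea concrete, here is a minimal Python sketch of such a policy object, including the Quorum = (RF/2)+1 rule from the list above. The class and field names are hypothetical; only the RF range (1-6) and the quorum formula come from the presentation.

```python
from dataclasses import dataclass

@dataclass
class VDiskPolicy:
    """Hypothetical per-vDisk storage policy mirroring the knobs listed above."""
    replication_factor: int = 3           # RF: number of data copies, 1-6
    replication_policy: str = "agnostic"  # "agnostic" | "rack_aware" | "dc_aware"
    dedupe: bool = True                   # in-line, at the storage proxy
    compression: bool = False
    client_caching: bool = True

    def __post_init__(self):
        if not 1 <= self.replication_factor <= 6:
            raise ValueError("RF must be between 1 and 6")

    @property
    def quorum(self) -> int:
        # Quorum = (RF / 2) + 1, using integer division
        return self.replication_factor // 2 + 1
```

So a vDisk with RF=4 only needs 3 replica acknowledgements before a write is considered durable, which matters later when we look at the write path.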

Data replication, availability & IO operations

Hedvig stores data as containers across the cluster nodes to provide redundancy and enforce the availability-related policy configuration items at container level. Each vDisk is broken down into 16GB chunks, and based on the RF level assigned to the vDisk, the software will ensure that the RF number of copies is maintained across a number of nodes (this is somewhat similar to the VMware vSAN component size, which is set at 256GB). Each of these 16GB chunks is what is known as a container. Within each node, the Hedvig software groups 3 disks into a logical group called a storage pool, and each container that belongs to that storage pool will typically stripe its data across that storage pool's disks. Storage pool and disk rebalancing occurs automatically during less busy times. Data replication will also take into account latency considerations if the cluster spans multiple geo boundaries / DCs / cloud environments.
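The chunking and replica-placement mechanics above can be sketched as follows. Note this is a deliberately naive illustration: the 16GB container size and the RF-copies-on-distinct-nodes rule come from the presentation, but the round-robin placement function is my own stand-in for Hedvig's real (rack/DC/load-aware) placement logic.

```python
CONTAINER_SIZE_GB = 16  # each vDisk is carved into 16 GB containers

def containers_for_vdisk(vdisk_size_gb: int) -> int:
    """Number of 16 GB containers needed to hold a vDisk
    (the last one may be partially filled)."""
    return -(-vdisk_size_gb // CONTAINER_SIZE_GB)  # ceiling division

def place_replicas(container_id: int, rf: int, nodes: list) -> list:
    """Naive round-robin placement of RF copies on distinct nodes.
    The real placement also weighs racks, DCs, latency and node load."""
    if rf > len(nodes):
        raise ValueError("not enough nodes for the requested RF")
    start = container_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]
```

For example, a 100GB vDisk maps to 7 containers, and each container gets its RF copies spread over distinct nodes.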

The Hedvig software maintains IO locality in order to ensure the best performance for read and write IOs, prioritising servicing IO from local & less busy nodes. One of the key things to note is that during a write, the Hedvig software doesn't wait for acknowledgements from all the storage nodes, unlike some of its competitor solutions. As soon as the quorum is met (Quorum = RF/2 + 1; so if the RF is 4, with a remote node on the cloud or in a remote DC over a WAN link, as soon as the data is written to 3 local nodes), it will send the ACK back to the sender, and the rest of the data writing / offloading happens in the background. This ensures faster write response times, and is probably a key architectural element in how they enable truly distributed nodes in a cluster, which can often include remote nodes over a higher latency link, without a specific performance hit to a write operation. This is another potential USP for them, at least architecturally on paper; in reality, however, it will likely only benefit you if you have a higher RF in a large cluster.
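The effect of quorum-based acknowledgement on write latency is easy to model. In the sketch below (my own illustration, not Hedvig code), the client-observed write latency is the latency of the quorum-th fastest replica, so a single slow WAN replica doesn't delay the ACK the way a wait-for-all scheme would.

```python
def quorum_write_latency(replica_latencies_ms, rf=None):
    """Model a quorum-acknowledged write: the client gets an ACK as soon
    as (RF // 2) + 1 replicas have confirmed, rather than all RF.

    replica_latencies_ms: per-replica write latency, e.g. local nodes
    in single-digit ms and a remote cloud/WAN replica much slower.
    Returns the client-observed write latency in ms."""
    rf = rf if rf is not None else len(replica_latencies_ms)
    quorum = rf // 2 + 1
    # The write completes when the quorum-th fastest replica confirms;
    # the remaining (slower) replicas catch up in the background.
    return sorted(replica_latencies_ms)[quorum - 1]
```

With RF=4 and three local replicas at 1-3ms plus one WAN replica at 80ms, the client sees a ~3ms write, while a wait-for-all design would see 80ms.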

Reads are also optimised through a combination of caching at the storage proxy level as well as actual block reads in the back end prioritising local nodes (with a lower cost) over remote nodes. This is markedly different to how VMware vSAN works, for example, where client-side cache locality is avoided in order to prevent skewed flash utilisation across the cluster as well as frequent cache re-warming during vMotion…etc. Both architectural decisions have their pros and cons in my view, and I like Hedvig's approach, as it optimises performance, which is especially important in a truly distributed cluster.
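The "prefer local, less busy replicas" read path can be sketched as a simple cost function. The scoring below is entirely my own illustration under that assumption; the actual Hedvig scoring logic is internal to their software.

```python
def choose_read_replica(replicas):
    """Pick the replica with the lowest effective read cost:
    prefer local nodes, with busy-ness breaking ties.

    `replicas` is a list of dicts with hypothetical fields:
    {"name": str, "local": bool, "load": int}."""
    def cost(r):
        # Remote nodes get a large fixed penalty; load differentiates
        # between otherwise-equal candidates.
        return (0 if r["local"] else 100) + r["load"]
    return min(replicas, key=cost)
```

So given a busy local replica, an idle local replica and a remote replica, the read is served from the idle local node.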

A deep dive on this subject including the anatomy of a read and a write is available here.

Hedvig: Typical Use Cases

Hedvig, similar to most of its competition, aims to address a number of use cases.

Software Defined Primary Storage

Hedvig operates in a traditional storage mode (dedicated storage server nodes providing storage to a number of external compute nodes such as VMware ESXi, KVM or even a bare metal application server) or in a Hyper-Converged mode where both compute and storage are provided on a single node. They also state that these deployment architectures can be mixed in the same cluster, which is pretty cool.

  • Traditional SDS – Agent (storage proxy) running on the application server, accessing the storage and speaking storage protocols. The agent also hosts local metadata and provides local caching, amongst other things. Used in non-hypervisor deployments such as bare metal app servers.
  • HCI mode – Agent (storage proxy) running on the hypervisor (as a control VM / VSA, similar to Nutanix). This is probably their most popular deployment mode.

Software Defined Hybrid Cloud Storage

Given the truly distributed nature of the Hedvig platform, they offer a nice Hybrid cloud use case where the storage cluster can be extended across geographical boundaries, including cloud platforms (IaaS instances). Cloud platforms currently supported by Hedvig include AWS, Azure and GCP. Stretching a cluster over to a cloud platform involves IaaS VMs from the cloud platform being used as cluster nodes, with block storage from the cloud platform providing virtual disks as local drives for each cloud node. When you define Hedvig virtual disks, you can specify the data replication topology across the hybrid cloud. It's important to note, though, that clients accessing those disks are advised to run within the same data center / cloud platform / region, for obvious performance reasons.

Hedvig also now supports containerised workloads through their Docker volume plugin and integration with the Kubernetes volume framework, similar to most of the other SDS solutions.

Hyper-Converged Backup

This is something they've recently introduced, but unless I've misunderstood, it is not so much a complete backup solution including offsite backups, but more of a snapshot capability at the array level (within the Hedvig layer). Again, this is similar to most other array level snapshots from other vendors' solutions, and can be used for immediate restores without having to rely on a hypervisor snapshot, which would be inefficient. An external backup solution from a backup partner (such as Veeam, for example) to offsite those snapshot backups is highly recommended, as with any other SDS solution.

My thoughts

I like the Hedvig solution and some of its neat little tricks, such as the clever use of the storage proxy agent to offload some of the back end storage operations to the front end (i.e. dedupe) and thereby keep back end IO, as well as the network performance penalty between the compute and storage layers, to a minimum. They are a good hybrid SDS solution that can cater for a mixed workload across the private data center as well as public cloud platforms. It's NOT a specialised solution for a specific workload and doesn't claim to provide sub-millisecond latency; instead, it provides a good all-around storage solution that is architected from the ground up to be truly distributed. Despite its ability to be used in traditional storage as well as HCI mode, most of the real-life applications of its technology, however, would likely be in an HCI world, with some kind of hypervisor like vSphere ESXi or KVM.

Looking at the organisation itself and their core solution, it's obvious that they've tried to solve a number of hardware defined storage issues that were prevalent in the industry at the time of their inception (2012), through the clever use of software. That is commendable. However, the sad truth is that, since then, a lot has happened in the industry, and a number of other start-ups and established vendors have also attempted to do the same, some with perhaps an unfair advantage due to having their own hypervisor too, which is a critical factor when it comes to such capabilities. Nutanix and VMware vSAN, for example, developed similar SDx design principles and tried to address most of the same technical challenges. Those vendors were a little more aggressive in their plans and managed to get their go to market process right, in my view, at a much bigger scale as well. Nutanix pioneered a new SDS use case (HCI) in the industry and capitalised on it before everyone else did, and VMware vSAN came out as a credible, and potentially better, challenger to dominate this space. While Hedvig is independent of any hypervisor platform and therefore provides the same capabilities across multiple platforms, the reality is that not many customers need that capability, as they'd be happy with a single hypervisor & storage platform. I also think Hedvig potentially missed a trick in their solution positioning in the market to create a differentiated message and win market share. As a result, their growth is nowhere near comparable to that of VMware vSAN or Nutanix, for example.

As much as I like the Hedvig technology, I fear for their future survival. Without some significant innovation and some business leadership involved in setting a differentiated strategy for the business, life would be somewhat difficult, especially if they are to make a commercial success out of it as a company. Their technology is good and the engineering team seems credible, but the competition is strong and the market is somewhat saturated with many general purpose SDS solutions as well as specialist SDS solutions aimed at specific workloads. Most of their competition also have far more resources at their disposal to throw at their solutions, including more comprehensive marketing engines. For these reasons, I fear that Hedvig may struggle to survive on their current path of a generalised SDS solution, and would potentially be better off focusing on a specific use case / vertical…etc. and concentrating all their innovation efforts on that.

The founder and CEO of the company still appears to be very much an engineer at heart, and having an externally sourced business leader with start-up experience to lead Hedvig into the future may not be a bad thing for them in the long run either, in my view.

Keen to get your thoughts, especially if you are an existing Hedvig customer – Please comment below.

Slide credit goes to Hedvig and Tech Field Day team.

P.S. You can find all the TFD and SFD presentations about Hedvig via the link here.

Chan

Time of the Hybrid Cloud?

Hybrid Cloud

A little blog post on something slightly less technical but equally important today. Not a marketing piece, but just my thoughts on something I came across that I thought would be worth writing about.

Background

I came across an interesting article this morning based on Gartner research on last year's global IT spend, where it was revealed that global IT spend was down by about $216 Billion during 2015. However, during the same year, data center IT spend was up by 1.8% and is forecast to go up by 3% within 2016. Everyone from IT vendors to resellers to every IT sales person you come across these days, on Internet blogs / news / LinkedIn or out in the field, seems to believe (and make you believe) that the customer owned data center is dead for good and everything is, or should be, moving to the cloud (Public cloud, that is). If all that is true, it made me wonder how the data center spend went up, when in fact it should have gone down. One might think this data center spend was fuelled by the growth in public cloud infrastructure expansion, due to increased demand on Public cloud platforms like Microsoft Azure and Amazon AWS. Makes total sense, right? Perhaps at the outset. But upon closer inspection, there's a slightly more complicated story, the way I see it.

 

Part 1 – Contribution from the Public cloud

Public cloud platforms like AWS are growing fast and aggressively, and there's no denying that. They address a need in the industry to be able to use a global, shared platform that can scale infinitely on demand, and due to the sheer economy of scale these shared platform providers have, customers benefit from cheaper IT costs. Having to spec up a data center for your occasional peak requirements (which may only be hit once a month) and pay for it all upfront, regardless of actual utilisation, can be an expensive exercise for many; with a Public cloud platform, the upfront cost is lower and you pay per usage, which makes it an attractive platform. Sure, there are more benefits to using a public cloud platform than just the cost factor, but essentially "the cost" has always been the key underpinning driver for enterprises adopting public cloud since its inception. Most new start-ups (the Netflixes of the world), and even some established enterprise customers who don't have the baggage of legacy apps (by legacy apps, I'm referring to client-server type applications typically run on the Microsoft Windows platform), are by default electing to predominantly use a cheaper Public cloud platform like AWS to host their business application stack, without owning their own data center kit. This will continue to be the case for those customers, and will therefore continue to drive the expansion of Public cloud platforms like AWS. And I'm sure a significant portion of the growth of the data center spend in 2015 would have come from this increase in pure Public cloud usage, causing the cloud providers to buy yet more data center hardware.

 

Part 2 – Contribution from the “Other” cloud

The point is, however, that not all the data center spend increase within 2015 would have come from just Public cloud platforms like AWS or Azure buying extra kit for their data centres. When you look at numbers from traditional hardware vendors, HP's numbers appear to be up by around 25% for the year, and others such as Dell, Cisco and EMC also appear to have grown their sales in 2015, which would have contributed towards this increased data center spend. It is no secret that none of these public cloud platforms use traditional data center hardware vendors' kit in their Public cloud data centres; they often use commodity hardware or even build servers & networking equipment themselves (a lot cheaper). So where would the increased sales for these vendors have come from? My guess is that they likely came from enterprise customers deploying Hybrid Cloud solutions that involve customers' own hardware being deployed in their own / co-location / off-prem / hosted data centres (the customer still owns the kit), along with an enterprise friendly Public cloud platform (mostly Microsoft Azure or VMware vCloud Air) acting as just another segment of their overall data center strategy. If you consider most of the established enterprise customers, the chances are that they have lots of legacy applications: typical WINTEL applications conforming to the client-server architecture. These apps would have started life in the enterprise in the Windows NT / 2000 days and have grown with the business over time. They are typically not cloud friendly (the industry buzz word is "Cloud Native"), and moving them as-is onto a Public cloud platform like AWS or Azure is often commercially or technically not feasible for most enterprises. (I've been working in the industry since the Windows 2000 days, and I can assure you that these types of apps still make up a significant number out there.)
And this "baggage" often prevents many enterprises from purely using just Public cloud (sure, there are other things like compliance that get in the way of Public cloud too, but over time, Public cloud platforms will naturally begin to cater properly for compliance requirements…etc., so those obstacles would be short lived). While a small number of those enterprises will have the engineering budget and resources necessary to re-design and re-develop these legacy app stacks into more modern & cloud native stacks, most will not have that luxury. Such redevelopment work is often expensive and, most importantly, time consuming and disruptive.

So, for most of these customers, the immediate tactical solution is to resort to a Hybrid cloud solution, where the legacy "baggage" app stack lives in a legacy data center and all newly developed apps will likely be built as cloud native (designed and developed from the ground up) on an enterprise friendly Public cloud platform such as Microsoft Azure or VMware vCloud Air. An overarching IT operations management platform (industry buzz word: "Cloud Management Platform") will then manage both the customer owned (private) portion and the Public portion of the Hybrid cloud solution seamlessly (with caveats, of course). I think this is what has been happening in 2015, and it may also explain the growth of legacy hardware vendor sales at the same time. Since I work for a fairly large global reseller, I've witnessed this increased hardware sales first hand from the traditional data center hardware vendor partners (HP, Cisco…etc.) through our business too, which adds up. I believe this adoption of Hybrid cloud solutions will continue throughout 2016 and possibly beyond for a good while, at least until such time that all legacy apps are eventually phased out, but that could be a long way away.

 

Summary

So there you have it. In my view, Public cloud will continue to grow, but if you think it will replace customer owned data center kit anytime soon, that's probably unlikely. At least 2015 has proved that both Public cloud and Private cloud platforms (through the guise of Hybrid cloud) have grown together, and my thought is that this will continue to be the case for a good while. Who knows, I may well be proven wrong, and within 6 months AWS, Azure & Google Public clouds will devour all private cloud platforms and everybody will be happy on just Public cloud :-). But common sense suggests otherwise. I can see a lot more Hybrid cloud deployments in the immediate future (at least a few years), using mainly Microsoft Azure and VMware vCloud Air. Based on the technologies available today, these 2, in my view, stand out as probably the best suited Public cloud platforms for strong Hybrid cloud compatibility, given their already popular presence in the enterprise data center (for hosting legacy apps efficiently) as well as each having a good overarching cloud management platform that customers can use to manage their Hybrid Cloud environments.

 

Thoughts and comments are welcome….!!