NetApp Data Fabric: A la Hybrid Cloud! – An update from NetApp Insight 2018


History

For those of you who have genuinely been following NetApp over the years, you may already know that NetApp, contrary to the popular perception of it as a storage hardware company, has always been a software company at its core. Unlike most of their competitors back in the day, such as EMC or even HPE, who focused primarily on raw hardware capabilities and purpose-built storage offerings specific to each use case, NetApp always had a single storage solution (the FAS platform) with fit-for-purpose hardware. Their real strength, however, was the software they developed on top (Data ONTAP), which offered so many data services that their competitors would often need 2 or 3 different solutions altogether to achieve the same. That software-driven innovation kept them punching well above their weight, in the same league as their much bigger competitors.

Over the last few years, however, NetApp did expand their storage offerings to include some additional purpose-built storage solutions, out of necessity, to address many niche customer use cases. They built the E series for raw performance use cases with minimal data services and the EF series for extreme all-flash performance, and acquired SolidFire, which was also a very software-driven, scalable storage solution built on commodity hardware. The key to most of these storage offerings was still the software defined storage & software defined data management capabilities of each platform, and the integration of all of them through software technologies such as SnapMirror and SnapVault to move data seamlessly between these various platforms.

In an increasingly software defined world (public & private clouds are all powered primarily through software), leading with software defined data storage and data management services opened up, as it turned out, many additional possibilities for NetApp to expand beyond just these data center solutions.

NetApp Data Fabric

NetApp Data Fabric is a vision NetApp set out a while ago: an extension of ONTAP & their various other software-centric storage capabilities beyond customer data centers into other compute platforms such as public clouds and 3rd party co-lo facilities.

The idea was that customers can seamlessly move data across all these infrastructure platforms as and when needed, without having to modify (think “convert”) the data. NetApp’s Data Fabric, at its core, aims to address the data mobility problem caused by platform locking of data by providing a common layer of core NetApp technologies to host data across all those tiers in a similar manner. In addition, it aims to provide a common set of tools to manage that data on any platform throughout its lifetime: from initial creation at the Edge location, to processing at the Core (DC) and / or on various cloud platforms, to long term and archival storage on the Core and / or public cloud platforms. In a way, this gives customers platform neutrality when it comes to their data which, let’s admit it, is the lifeblood of most digital (that means all) businesses of today.

New NetApp Data Fabric

Insight 2018 showcased how NetApp has extended the initial scope of their Data Fabric vision beyond Hybrid Cloud to new platforms such as Edge locations too, connecting customers’ data from Edge to Core (DC) to Cloud and providing data portability. In addition, NetApp also launched a number of new data services to help manage and monitor this data as it moves from one pillar to another across the Data Fabric. NetApp CEO George Kurian described this new Data Fabric as a way of “Simplifying and integrating orchestration of data services across the Hybrid Cloud providing data visibility, protection and control amongst other features”. In a way, it’s very similar to VMware’s “Any App, Any device, Any cloud” vision but, in the case of NetApp, the focus is all about the data & data services.

The new NetApp Data Fabric consists of the following key data storage components at each of its pillars.

NetApp Hybrid Cloud Data Storage
  • Private data center
    • NetApp FAS / SolidFire / E / EF / StorageGRID series storage platforms & AltaVault backup appliance. Most of these components now integrate directly with public cloud platforms.
  • Public Cloud
    • NetApp Cloud Volumes – SaaS solution that provides file services (NFS & SMB) on the cloud using NetApp ONTAP.
    • Azure NetApp Files – PaaS solution running on physical NetApp FAS storage in Azure DCs. Directly integrated into Azure Resource Manager for native storage provisioning and management.
    • Cloud Volumes ONTAP – NetApp ONTAP virtual appliance that runs the same ONTAP code on the cloud. Can be used for production workloads, DR, file shares and DB storage, the same as on-premises. Includes Cloud Tiering and Trident container support as well as SnapLock for WORM (write once, read many) data retention.
  • Co-Lo (Adjacent to public clouds)
    • NetApp Private Storage – Dedicated, physical NetApp FAS (ONTAP) or FlexArray storage solution owned by the customer that is physically adjacent to major cloud platform infrastructures. The storage unit is hosted in an Equinix data center with a direct, low latency 10GbE link to Azure, AWS and GCP cloud back ends. Workloads such as VMs and applications deployed in the native cloud platform can consume data directly over this low latency link.
  • Edge locations
    • NetApp HCI – Recently repositioned as a “Hybrid Cloud Infrastructure” rather than a “Hyper-Converged Infrastructure”, this solution provides a native NetApp compute + storage solution that is tightly integrated with some of the key data services and the monitoring and management solutions from the Data Fabric (described below).

Data Fabric + NetApp Cloud Services

While the core storage infrastructure components of the Data Fabric enable data mobility without the need to transform data at each hop, customers still need tools to provision, manage and monitor this data on each pillar of the Data Fabric. Furthermore, customers also need these tools to manage data on the non-NetApp platforms that are linked to the Data Fabric storage pillars described above (such as native cloud platforms).

Insight 2018 (US) revealed the launch of some of these brand new data services & tools from NetApp, most of which are SaaS solutions hosted and managed by NetApp themselves on a cloud platform. Some of these cloud services are fully live and GA while others are not live just yet, but customers can trial them all for free today.

Given below is a full list of the announced NetApp Cloud Services, which fall into 2 categories. By design, these are tightly integrated with all the data storage pillars of the NetApp Data Fabric as well as with 3rd party storage and compute platforms such as AWS, Azure and 3rd party data center components.

NetApp Hybrid Cloud Data Services (New)

  • NetApp OnCommand Cloud Manager – Deploy and manage Cloud Volumes ONTAP as well as discover and provision on-premises ONTAP clusters. Available as a SaaS offering or as on-premises software.
  • NetApp Cloud Sync – A NetApp SaaS offering that enables easier, automated data migration & synchronisation across NetApp and non-NetApp storage platforms across the hybrid cloud. Currently supports syncing data across AWS (S3, EFS), Azure (Blob), GCP (Storage bucket), IBM (Object storage) and NetApp StorageGRID.
  • NetApp Cloud Secure – A NetApp SaaS security tool that aims to identify malicious data access across all Hybrid Cloud storage solutions. Connects to various storage back ends via a data collector and supports NetApp Cloud Volumes, ONTAP, StorageGRID, Microsoft OneDrive, AWS, Google G Suite, HPE Command View, Dropbox, Box, Workplace and Office 365 as end points to be monitored. Not live yet and more details here.
  • NetApp Cloud Tiering – Based on ONTAP FabricPool, enables direct tiering of infrequently used data from an ONTAP solution (on-premises or on cloud) seamlessly to Azure Blob, AWS S3 and IBM Cloud Object Storage. Not a live solution just yet but a technical preview is available.
  • NetApp SaaS Backup – A NetApp SaaS backup solution for backing up Office 365 (Exchange Online, SharePoint Online, OneDrive for Business, MS Teams and O365 Groups) as well as Salesforce data. Formerly known as NetApp Cloud Control. Can back up data to native storage or to Azure Blob or AWS S3. Additional info here.
  • NetApp Cloud Backup – Another NetApp SaaS offering, purpose built for backing up NetApp Cloud Volumes (described above).

NetApp Cloud Management & Monitoring (New)

  • NetApp Kubernetes Service – New NetApp SaaS offering to provide enterprise Kubernetes as a service. Built around the NetApp acquisition of StackPointCloud. Integrated with other NetApp Data Fabric components (NetApp’s own solutions) as well as public cloud platforms (Azure, AWS and GCP) to enable container orchestration across the board. Integrates with NetApp Trident for persistent storage volumes.
  • NetApp Cloud Insights            – Another NetApp SaaS offering built around ActiveIQ, that provides a single monitoring tool for visibility across the hybrid cloud and Data Fabric components. Uses AI & ML for predictive analytics, proactive failure prevention, dynamic topology mapping and can also be used for resource rightsizing and troubleshooting with infrastructure correlation capabilities.

My thoughts

In the world of Hybrid Cloud, customer data, from VMs to file data, can now be stored in many different ways across various data centers, Edge locations and public cloud platforms, all underpinned by different sets of technologies. This presents an inevitable problem for customers: their data requires transformation each time it gets moved or copied from one pillar to another (known as platform locking of data). It also makes it difficult to move that data seamlessly across those platforms during its lifetime, should you want to benefit from every pillar of the Hybrid Cloud and the different benefits inherent to each. NetApp’s new strategy of providing a common software layer to store, move and manage customer data seamlessly across all these platforms can resonate well with customers. By continuing to focus on the customer’s data, NetApp are focusing on the most important asset that organisations of today, and most definitely the organisations of tomorrow, have. So enabling their customers to avoid unnecessary hurdles when moving this asset from one platform to another is only going to go down well with enterprise customers.

This strategy is very similar to VMware’s, for example (Any App, Any Device, Any Cloud), which aims to address the same problem, albeit from a more application-centric perspective. To their credit, NetApp is the only “Legacy Storage vendor” with this all-encompassing strategy of a common data storage layer across the full hybrid cloud spectrum, whereas most of their competition are still focused on their data centre solutions, with limited or minor integration to cloud through extending backup and DR capabilities at best.

Only time will tell how successful this strategy will be for NetApp, and I suspect most of that success or failure will depend on continued execution: building additional data and data management services and positioning them to address various Hybrid Cloud use cases. But the initial feedback from customers appears to be positive, which is good to see. Being focused on software innovation has always given NetApp an edge over their competitors, and continuing that strategy, especially in an increasingly software defined world, is only bound to bring good things in my view.

Slide credit to NetApp & Tech Field Day!

Continuation of Any Cloud, Any Device & Any App strategy – An update from VMworld 2018 Europe

The beginning

As an avid technologist, I’ve always had a thing for disruptive technologies, especially those that are not just cool tech but also provide genuine business benefits. Some of these benefits are obvious at first, but some are often not even anticipated until after a technology innovation has been achieved.

VMware’s inception, through the emulation of x86 computing components in software, was one of these moments, where the power of software driven computing started a whole new shift in the IT industry. In an age of hardware-centric IT, this software defined computing technology paved the way to genuine cost savings through the consolidation of many servers on to a handful. For me, back then a lowly server engineer, the introduction to this technology was one of those “goose bump” moments, especially when I thought about where this innovation could take us once extended beyond just computing.

Fast forward about 12 more years and the software defined capabilities had extended beyond compute into storage and networking too, paving the way for brand new possibilities such as cloud computing. Recognising the commoditisation of this software defined approach by various other vendors, VMware strategically changed direction to focus on building tools and solutions that give customers the choice to run any application, on any cloud platform, accessible from any end user device (PC & Mobile). This strategy was launched back in 2015 and I’ve blogged about it here.

Continuation of a solid strategy

Following on from vSphere, vSAN and NSX as the pillars of the core software defined data center (SDDC), the last couple of years showed how this vision from VMware was becoming a reality through the launch of various new solutions as well as the modernisation of existing ones. IBM Cloud (based on SDDC) & VMware Cloud on AWS (based on SDDC) were launched to harness cloud computing capabilities for customers without having to re-platform their workloads, saving transformation costs. Along with over 2000 VMware Cloud Provider partner platforms (built on SDDC), all of which run these very same technologies underneath, this common architecture enabled customers to move their workloads from on-premises to any of these platforms relatively easily. The introduction of technologies such as VMware HCX last year made it even easier, through one click migration of these workloads as well as the ability to move a running workload on to a cloud platform with zero downtime (Cloud Motion).

In addition to the core infrastructure components, the existing infrastructure management and monitoring toolset deployed on-premises (vRealize suite) was also revamped over the last few years so that it can manage and monitor environments across all these cloud platforms. vRealize suite is now one of the best Cloud Management Platforms, able to provision workloads on-premises & on native cloud platforms such as AWS and Azure through a single pane of glass.

NSX capabilities were also extended to cloud platforms to effectively bring cloud platforms closer to on-premises data centers via network adjacency providing customers easy migration and fall back choices while maintaining networking integrity across both platforms. With these updates, the vision of “Any Cloud” became more of a reality, though most of the use cases were limited to IaaS capabilities across the cloud platforms.

During last year, VMware also launched a number of fully managed, born in the cloud SaaS applications under the category of VMware Cloud Services (v1.0), aimed at extending these “Any Cloud” capabilities to cover non-IaaS platforms. These SaaS offerings enabled customers to provision, manage and run cloud native workloads on non-vSphere based cloud platforms such as Azure and native AWS. They extended the “Any Cloud” capabilities right into various PaaS platforms too, providing better value to customers. A list of these new solutions and updates is in my previous post here.

The last few years also showed us how VMware intended to achieve the “Any Device” vision through the Workspace ONE platform & AirWatch. Incremental feature upgrades ensured support for a wide array of end user computing and mobile devices to consume various enterprise IT services in a consistent, secure manner, regardless of where the applications & the data are hosted (on-premises or cloud). These updates include support for key non-vSphere based cloud platforms and even competitive technologies such as Citrix, giving customers plenty of choice to use any device to access applications hosted via all major avenues such as Mobile / PC / VDI / Citrix / Microsoft RDS.

The “Any App” vision of enabling customers to deploy and run any application was all about providing support for traditional (VM) based apps, micro-services based apps (containers) and SaaS apps. A partnership with Google was formed for the implementation, and new products such as PKS were launched to provision, manage and run container workloads via an enterprise grade Kubernetes platform, both on-premises and on cloud platforms, making the Any App strategy a reality too.

Update in 2018!

2018’s VMworld (Europe) messaging was very much an incremental continuation of this same multi-platform, multi-app and multi-device strategy, adding capabilities for core use cases. Some of the new updates also showed how VMware are adding new use cases, such as Edge computing and IoT solutions, into the mix.

Some of the key updates to note from VMworld 2018 include:

  • Heptio acquisition: To strengthen VMware’s Kubernetes platform offerings (complements the on-premises focused PKS as well as VKE, the SaaS offering for managed Kubernetes).
  • VMware Cloud PKS: PKS as a Service (managed by VMware) on AWS, with support coming for VMware Cloud on AWS, Azure, GCP and vSphere.
  • Project Dimension: Fully managed VMware Cloud Foundation solution for on-premises with a Hybrid Cloud control plane. Beta announced!
  • Launch of VCF 3.5: Latest version of Cloud Foundation with incremental updates and cloud integration via HCX.
  • CloudHealth in VCS: Integration of the recently acquired CloudHealth into the VMware Cloud Services (SaaS offering) portfolio, which now delivers cloud platform cost monitoring and resource management as a SaaS offering with better cloud scalability than vROps.
  • Pulse IoT Center aaS: IoT infrastructure management solution, previously available as an on-premises solution, now available as a service. Beta announced!
  • New SaaS solutions: Additional solutions were announced, such as Cloud Assembly (vRA aaS), Service Broker & Code Stream, to enhance DevOps app delivery & management.
  • VMware Blockchain: Enterprise blockchain service, inherently more secure than public blockchain, that is integrated with underlying VMware tools and technologies for enterprises to consume.

Amongst these, there were also other minor incremental updates to existing tools and solutions, to name a few: vRealize suite 2018; Log Intelligence; Wavefront updates to provide application telemetry data (similar to AppDynamics) from container based deployments; vSphere & vSAN incremental updates; availability of vSphere Platinum edition (with AppDefense bundled in), which learns (good app behaviour), locks (the state in) and adapts security (based on changes to the application); adaptive micro-segmentation via integrating NSX & AppDefense; increased geo availability of VMware Cloud on AWS (Ireland, Tokyo, N. California, Ohio, GovCloud West); and availability of AWS RDS on vSphere on-premises.

In addition to the above, which build on the previously established Any Cloud, Any Device & Any App strategy, VMware are also embracing different target markets such as Telco clouds, offering industry specific solutions through their VeloCloud technologies in preparation for the imminent 5G revolution. Large telco Vodafone are helping VMware co-engineer and test these solutions to ensure their business relevance.

So all in all, there weren’t any attention grabbing headline announcements at this year’s VMworld event; the focus was rather on providing evidence of the execution of the strategy set back in 2015/2016. VMware’s increasing pivot to cloud based solutions is becoming more and more obvious, as almost all the net new products and solutions announced at the 2017 and 2018 VMworlds are SaaS offerings managed by VMware. This is a powerful message and customers seem to be taking note too, if the record breaking 12,000 attendees of VMworld 2018 Europe is anything to go by.

As I mentioned at the beginning of this post, as these technology updates and new innovations continue, no doubt additional use cases will be realised and associated business requirements that were not previously envisioned will be established. In an age where rapid advancements in technology often drive new business requirements retrospectively, I like how VMware are pushing ahead with a coherent technology strategy focused on giving customers the choice to benefit from innovations across these technology platforms.

Tech Field Day 17

This post was republished to ChansBlog at 19:48:55 12/10/2018

Having attended Storage Field Day 15 back in March, I’ve been lucky enough to be invited to attend not only Tech Field Day 17 but also the Tech Field Day Extra at NetApp Insight 2018 (US) this month. This post is a quick intro to the event and the schedules ahead.

Watch LIVE!

Below is the live streaming link to the event on the day if you’d like to join us LIVE. While the time difference might make it a little tricky for some, it is well worth taking part in, as all viewers will also have the chance to ask the vendors questions live, just like the delegates on site. Just do it, you won’t be disappointed!

TFD – Quick Introduction!

Tech Field Day is an invitation-only series of events organised and hosted by Gestalt IT (GestaltIT.com) to bring together innovative technology solutions from various vendors (the “Sponsors”), who present their solutions to a room full of independent technology bloggers and thought leaders (the “delegates”) chosen from around the world based on their knowledge, community profile and thought leadership, in order to get their independent thoughts (good or bad) on the said solutions. The event is also streamed live worldwide for anyone to tune in to, and is often used by technology start-ups coming out of stealth to announce their arrival to the mainstream markets. It’s organised by the chief organiser Stephen Foskett (@Sfoskett) and has always been extremely popular amongst vendors, as it provides an ideal opportunity for them to present their new products and solutions. It is equally popular amongst the attending delegates, who get the opportunity not only to witness brand new technology at times, but also to critique it and give their feedback directly to these vendors.

TFD17 – Schedule & Vendor line-up

TFD17 is due to take place in Silicon Valley between the 17th and 19th of October 2018. The planned vendor line-up and timings are as follows:

Wednesday the 17th of October

1pm-3pm (9-11pm UK time)

Thursday the 18th of October

8am-10am (4-6pm UK time)

11am-1pm (7-9pm UK time)

3-5pm (11pm-1am* UK time)

Friday 19th of October

11am-1pm (7-9pm UK time)

TFD Extra – Schedule TBC (NetApp Insight 2018 US)

  • Monday the 22nd of October:
    • NetApp Insight general events
  • Tuesday the 23rd of October:
    • 8:30-10am Vegas time / 4:30-6pm UK time : General session keynote
    • Morning: Analysts summit general session
    • Afternoon: TFD Extra session
  • Wednesday the 24th of October:
    • 8:30-10am Vegas time / 4:30-6pm UK time : General session
    • Morning: TFD Extra session
    • Afternoon: TFD Extra session
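
If you want to double check the UK times quoted in these schedules, the conversion can be sketched in a few lines of Python. This is just a minimal sketch: the helper name `pacific_to_uk` is my own, and it assumes that both Silicon Valley and Las Vegas follow US Pacific time and that Python 3.9 or later (with time zone data available) is being used.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library from Python 3.9

# Assumption: Las Vegas and Silicon Valley both follow US Pacific time.
PACIFIC = ZoneInfo("America/Los_Angeles")
UK = ZoneInfo("Europe/London")


def pacific_to_uk(year, month, day, hour, minute=0):
    """Convert a Pacific wall-clock time to the matching UK wall-clock time."""
    local = datetime(year, month, day, hour, minute, tzinfo=PACIFIC)
    return local.astimezone(UK)


# General session keynote: 8:30-10am Vegas time on Tuesday the 23rd of October 2018
keynote_start = pacific_to_uk(2018, 10, 23, 8, 30)
keynote_end = pacific_to_uk(2018, 10, 23, 10, 0)
print(keynote_start.strftime("%H:%M"), "to", keynote_end.strftime("%H:%M"), "UK time")
```

Using named time zones rather than a fixed 8 hour offset means the daylight saving rules on both sides are taken into account for the given date.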

Previous Field Day event Posts

I learnt a lot during my previous SFD15 participation earlier this year, about the storage industry in general as well as about the direction of a number of storage vendors. If you are interested in finding out more, see my #SFD15 articles below.

VMware Cloud on Azure? Really?

I work for a global channel partner of Microsoft, VMware & AWS, and one of my teammates recently asked me whether VMware Cloud on Azure (a similar solution to VMware Cloud on AWS) would become a reality. It turned out that this was on the back of a statement from VMware CEO Pat Gelsinger, where he supposedly mentioned “We have interest from our customers to expand our relationships with Google, Microsoft and others” & “We have announced some incremental expansions of those agreements“, which seems to have been represented in a CNBC article as VMware Cloud coming to Azure (insinuating the reality of vSphere on Azure bare metal servers).

I sent my response back to the teammate outlining what I think and the reasoning behind it, but I thought it would be good to get the thoughts of the wider community too, as it’s a very relevant question for many, especially if you work in the channel, work for the said vendors, or are a customer currently using the said technologies or planning to move to VMware Cloud on AWS.

Some context first,

I’ve been following the whole VMware Cloud on Azure discussion since it first broke out last year. Ever since VMware Cloud on AWS (VMWonAWS) was announced, there has been some noise from Microsoft, specifically Corey Sanders (Corporate Vice President of Azure), about their own plans to build a VMWonAWS-like solution inside Azure data centers. Initially it looked like just a publicity stunt from MSFT to steal the thunder from AWS during the announcement of VMWonAWS, but later on details emerged that, unlike VMWonAWS, this was not a jointly engineered solution between VMware & Microsoft but a standalone vSphere solution running on FlexPod (NetApp storage and Cisco UCS servers), managed by a VMware vCAN partner who happened to host their solution in the same Azure DC, with L3 connectivity to Azure Resource Manager. Unlike VMWonAWS, there was no back door connectivity to the core Azure services, only public API integration via the internet. Also unlike VMWonAWS, it was not supposed to run vSphere on native Azure bare metal servers.

All the details around this were available in 2 main blog posts, one from Corey @ MSFT (here) and another from Ajay Patel (SVP, cloud products at VMware) here, but the contents of these 2 articles have since either been changed to something completely different or had the original details removed. Before Corey’s post was modified a number of times, it mentioned that they initially started working with the vCAN partner but later engaged VMware directly for discussions around potentially tighter integration, and Ajay’s post (prior to being removed) corroborated the same. But none of that info is there anymore, and while the 2 companies are no doubt talking behind the scenes about some collaboration, I am not sure whether it’s safe for anyone to assume they are working on a VMWonAWS-like solution for Azure. VMWonAWS is a genuinely integrated solution thanks to months and months of joint engineering, and while VMware may have incentives to do something similar with Azure, it’s difficult to see the commercial or PR benefit of such a joint solution to Microsoft, as it would ruin their existing messaging around Azure Stack, which is supposed to be their only & preferred Hybrid Cloud solution.

My thoughts!

In my view, what Pat Gelsinger meant above when he said “we have interest from our customers to expand our relationship with Microsoft and others” is likely something totally different to building a VMware Cloud on Azure that runs the vSphere stack on native Azure hardware. VMware’s vision has always been Any Cloud, Any App, Any Device, which they announced at VMworld 2016 (read the summary http://chansblog.com/vmworld-2016-us-key-annoucements-day-1/), and the aspiration (based on my understanding at least) was to be the glue between all cloud platforms and on-premises, which is a great one. So when it comes to Azure, the only known plans (which are probably what Pat was alluding to above) were the 2 things below:

  • To use NSX to bridge on-premises (& other cloud platforms) to Azure by extending network adjacency right into the Azure edge, in a similar way to how you can stretch networks to VMWonAWS. NSX-T version 2.2.0, which GA’d on Wednesday the 6th of June, now supports creating VMware virtual networks in Azure and managing those networks within your NSX data center inventory. All the details can be found here. What Pat was probably doing was setting the scene for this announcement, but it was not news, as it had been on the roadmap ever since VMworld 2016. This probably should not be taken to mean that VMware on Azure bare metal is a reality, at least at this stage.
  • In addition to that, the VMware Cloud Services (VCS – a SaaS platform announced at VMworld 2017 – more details here) will have more integration with native AWS, native Azure and GCP, which is also what Pat is hinting at when he says more integration with Azure, but that too was always on the roadmap.

At least that’s my take on VMware’s plans and their future strategy. Things can change in a flash, as the IT market is full of changes these days with so many competitors as well as co-petitors. But I just can’t see, at least in the immediate future, there being a genuine VMware Cloud on Azure solution that runs vSphere on bare metal Azure hardware, similar to VMWonAWS, despite what that article from CNBC seems to insinuate.

What do you all think? Any insiders with additional knowledge or anyone with a different theory? Keen to get people’s thoughts!

Chan

VMware vSAN vExperts 2018

I’ve just found out that I’ve been selected to be a vSAN vExpert again this year which was great news indeed. The complete list of vSAN vExperts 2018 can be found at https://blogs.vmware.com/vmtn/2018/06/vexpert-vsan-2018-announcement.html

The vSAN vExpert programme is a sub-programme of the wider VMware vExpert programme where, out of those already selected as vExperts, people who have shown specific speciality and thought leadership around vSAN & related Hyper-Converged technologies are recognised specifically for their efforts. The vSAN vExpert programme only started back in 2016 and, while I missed out during the first year, I was also a vSAN vExpert in 2017, so it’s quite nice to have been selected again for 2018.

As part of the vSAN vExpert programme, selected members are typically entitled to a number of benefits such as NFR license keys for the full vSAN suite for lab and demo purposes, access to the vSAN product management team at VMware, exclusive webinars & NDA meetings, and access to preview builds of new software. They also get a chance to provide feedback to the product management team on behalf of their clients, which is great for me as a technologist working in the channel.

I have been a big advocate of Software Defined everything for about 15 years now as, the way I see it, the power in most technologies is often derived from software. Public cloud is the biggest testament to this we can see today. So when HCI became a “thing”, I was naturally a big promoter of the concept, and Software Defined Storage (SDS), which made HCI what it is, was something I’ve always seen the value in. While many other SDS technologies have started to appear since then, vSAN has always been unique in that it’s more tightly coupled to the underlying hypervisor than any other HCI / SDS solution, and this architectural difference was the main reason why I’ve always liked, and therefore promoted, the vSAN technology from its beta days. vSAN revenue numbers have grown massively for VMware since its first launch with vSAN 5.5, and the vSAN business unit within VMware is now a self-sufficient business in its own right. Since I am fortunate to be working for a VMware solutions provider partner here in the UK, I have seen first hand the number of vSAN solutions we’ve sold to our own customers grow by over 900% year on year between 2016 and 2017, which fully aligns with the wider industry adoption of vSAN as a preferred storage option for most vSphere solutions.

This is only likely to increase, and some of the hardware innovation coming down the line, such as Storage Class Memory integration and NVMe over Fabric technologies, will further enhance the performance and reliability of genuinely distributed software defined storage technologies such as vSAN. So being recognised as a thought leader and a community evangelist for vSAN by VMware is a great honour, as I can continue to share my thoughts and updates on the product development with the wider community for other people to benefit from.

So thank you VMware for the honour again this year, and congratulations to all the others who have also been selected to be vSAN vExperts 2018. Keep sharing your knowledge and thought leadership content…!

Chan

NetApp & Next Generation Storage Technologies

There are some exciting technology developments taking place in the storage industry, some behind closed doors but some that are publicly announced and already commercially available, which most of you may already have come across. Some of these are organic developments that build on existing technologies but some are inspired by megascalers like AWS, Azure, GCP and various other cloud platforms. I was lucky enough to be briefed on some of these when I was at SFD12 last year in Silicon Valley, by SNIA – The Storage Networking Industry Association, which I’ve previously blogged about here.

This time around, I was part of the Storage Field Day (SFD15) delegate panel that got a chance to visit NetApp at their HQ in Sunnyvale, CA to find out more about some of the exciting new product offerings on NetApp’s roadmap, either in the works or just starting to come out, incorporating some of these new storage technologies. This post aims to provide a summary of what I learnt there and my respective thoughts.

Introduction

It is no secret that Flash media has changed the dynamics of the storage market over the last decade due to its inherent performance characteristics. While the earliest incarnations of flash media were prohibitively expensive to be used in mass quantities, the invention of SSDs commoditised the use of flash media across the entire storage industry. For example, most tier 1 workloads in enterprises today are held on an SSD backed storage system where SSD drives form the whole or a key part of the storage media stack.

When you look at some of the key storage solutions in use today, there are 2 key, existing solid state technologies that stand out: DRAM & SSD. DRAM (strictly speaking volatile memory rather than flash) is the fastest media that is most easily accessible by the data processing compute subsystem, while SSDs fall in to the next best place when it comes to speed of access and the level of performance (IOPS & bandwidth). As such, most enterprise storage solutions in the world, be that the ones aimed at customer data centers or those on the megascalers’ cloud platforms, utilise one or both of these media types to either accelerate (caching) or simply store tier 1 data sets.

It is important to note that, while SSDs benefit from overall higher performance and lower latency compared to mechanical drives due to the internal architecture of the SSDs themselves (flash storage cells that don’t require spinning magnetic media), both SSDs and classic mechanical (spinning) drives are typically attached to & accessed by the compute subsystem via the same SATA or SAS interface subsystem, with the same interface speed & latency. Often the internal performance of an SSD was not realised to its full potential, especially in an aggregated scenario like that of an enterprise storage array, due to these interface controller access speed and latency limitations, as illustrated in the diagram below.

One of the more recent technology developments in the storage and compute industry, namely “Non-Volatile Memory Express” (NVMe), aims to address these SAS & SATA interface driven performance and latency limitations through the introduction of a new, high performance host controller interface that has been engineered from the ground up to be able to fully utilise flash storage drives. This new NVMe storage architecture is designed to be future proof and to be compatible with various future drive technologies, NAND based as well as non-NAND based.

NVMe SSD drives connected via these NVMe interfaces will not only outperform traditional SSD drives attached via SAS or SATA but, most importantly, will enable future capabilities such as utilising Remote Direct Memory Access (RDMA) for super high storage performance, extending the storage subsystem over a fabric of interconnected storage and compute nodes. A good introduction to the NVMe technology and its benefits over SAS / SATA interfaces can be viewed here.

Another much talked about development on the same front is the subject of the Storage Class Memory (SCM) – Also known as Persistent Memory (PMEM). SCM is an organic successor to the NAND technology based SSD drives that we see in mainstream use in flash accelerated as well as all flash storage arrays today.

At a theoretical level, SCM can come in 2 main types as shown in the above diagram (from a really great IBM research paper published in 2013).

  • M-Type SCM (Synchronous) = Incorporates non-volatile memory based storage in to the memory access subsystem (DDR) rather than the SCSI block based storage subsystem accessed through PCIe, achieving DRAM like throughput and latency benefits for persistent storage. Typically takes the form of an NVDIMM (attached to the memory bus, similar to traditional DRAM), which is the fastest and best performing option next to DRAM itself. It uses memory slots and appears to the system either as a caching layer or as pooled memory (extended DRAM space) depending on the NVDIMM type (NVDIMMs come in 3 types: NVDIMM-N, NVDIMM-F and NVDIMM-P. A good explanation is available here).
  • S-Type SCM (Asynchronous) = Incorporates non-volatile memory based storage but attached via the PCIe connector to the storage subsystem. While this is theoretically slower than the above, it’s still significantly faster than the NAND based SSD drives that are in common use today, including those attached via the NVMe host controller interface. Intel and Samsung have both already launched S-type SCM drives, Intel with their 3D XPoint architecture and Samsung with Z-SSD, but the drive models currently available are aimed more at consumer / workstation rather than server workloads. Server based implementations of similar SCM drives will likely arrive around 2019. (Along with supporting software included within server operating systems such as hypervisors – vSphere 7 anyone?)

The idea of SCM is to address the latency and performance gap between memory and storage that has existed in every computer system since the advent of x86 computing. Typically, access latency for DRAM is around 60ns, while the next best option today, NVMe SSD drives, have a typical latency of around 20-200µs. SCM fits in between these 2, at a typical latency between 60ns and 20µs depending on the type of SCM, with significantly higher bandwidth than SSD drives. It is important to note, however, that while most ordinary workloads do not need this type of super latency sensitive, extremely high bandwidth storage performance, the next generation data technologies involving Artificial Intelligence techniques such as machine learning and real-time analytics, which rely on processing extremely large swathes of data very quickly, would absolutely benefit from, and in most instances necessitate, these next gen storage technologies to be fully effective.
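
To put those latency figures side by side, the short Python sketch below simply takes the typical numbers quoted above and works out how many times slower each tier is than DRAM. The tier labels, the dictionary and the helper function are my own illustrative choices, and the numbers are the rough ranges from this post rather than vendor measurements.

```python
# Typical access latency ranges in nanoseconds, as quoted in the text above.
# These are rough, illustrative figures and not benchmark results.
LATENCY_NS = {
    "DRAM": (60, 60),
    "SCM": (60, 20_000),
    "NVMe SSD": (20_000, 200_000),
}


def slowdown_vs_dram(tier):
    """Return (best, worst) latency of a tier as a multiple of DRAM latency."""
    dram_ns = LATENCY_NS["DRAM"][0]
    low, high = LATENCY_NS[tier]
    return low / dram_ns, high / dram_ns


for tier in LATENCY_NS:
    best, worst = slowdown_vs_dram(tier)
    print(f"{tier:9s} {best:8.0f}x to {worst:8.0f}x DRAM latency")
```

Even at the fast end of its range, an NVMe SSD is a few hundred times slower than DRAM, which is the gap SCM is meant to fill.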

NetApp’s NVMe & SCM vision

NetApp was one of the first classic storage vendors to incorporate flash in to their storage systems in an efficient manner, to accelerate workloads that are typically stored on spinning disks. This started with the concept of NVRAM that was included in their flagship FAS storage solutions as an acceleration layer. Then came Flash Cache (PAM cards), which were flash media attached via the PCIe subsystem to act as a caching layer for reads, which was also popular. Since the advent of all flash storage arrays, NetApp went another step by introducing all flash storage in to their portfolio through the likes of the All Flash FAS platform, which was engineered and tuned for all flash media, as well as the EF series.

NetApp innovation and constant improvement process hasn’t stopped there. During SFD15 event, we were treated to the next step of this technology evolution by NetApp when they discussed how they plan to incorporate the above mentioned NVMe and SCM storage technologies in to their storage portfolio, in order to provide next gen storage capabilities to serve next gen use cases such as AI, big data and real-time analytics. Given below is a holistic roadmap plan of where NetApp see NVMe and SCM technologies fitting in to their roadmap, based on the characteristics, benefits and costs of each technology.

The planned use of NVMe is clearly in 2 different points of the host->storage array communication path.

  • NVMe SSD drives : NVMe SSD drives in a storage array, attached via NVMe host controller interface in order to be able to fully utilise the latency and throughput potential of the SSD drives themselves by the storage processor (in the controllers). This will provide additional performance characteristics to the existing arrays.
  • NVMe-OF : NVMe over fabric, where the storage is attached to the storage consumer nodes (servers) via an ultra-low latency NVMe fabric. NVMe-OF enables the use of RDMA capabilities to reduce the distance between the IO generator and the IO processor, thereby significantly reducing the latency. NVMe-OF is therefore widely touted to be the next big thing in the storage industry, and a number of specialist start-ups like Excelero have already come to market with specialist solutions; you can find out more about it in my blog here. An example of an NVMe-OF storage solution available from NetApp is the new NetApp EF570 all flash array. This product is already shipping and more details can be found here or here. This platform offers some phenomenal performance numbers at ultra-low latency, built around their trusted, mature, feature rich yet simple EF storage platform, which is also a bonus.

The planned (or experimented) use of SCM is in 2 specific areas of the storage stack, driven primarily by the costs of the media vs the need for acceleration.

  • Storage controller side caching: NetApp mentioned that some of the experiments they are working on, with prototype solutions already built, are looking at using SCM media on the storage controllers as another tier to accelerate performance, in the same way PAM cards or Flash Cache were used on the older FAS systems. This is a relatively straightforward upgrade and would be especially effective in an All Flash FAS solution with SSD drives in the back end, where a traditional Flash Cache card based on NAND cells would be less effective.
  • Server (IO generator) side caching: This use case looks at using the SCM media on the host compute systems that generate the IO to act as a local cache but, most importantly, used in conjunction with the storage controllers rather than in isolation, performing tiering and snapshots from the host cache to a backend storage system like an All Flash FAS.
    • NetApp are experimenting on this front primarily using their recent acquisition of Plexistor and its proprietary software, which combines DRAM and SCM in to a single, byte addressable address space (accessed via memory semantics, which is much faster than SCSI / NVMe addressable storage) and presents that to the applications as a cache, while also presenting the backend NetApp storage array, such as an All Flash FAS, as a persistent storage tier. The applications achieve significantly lower latency and ultra-high throughput this way by caching the hot data using the Plexistor file system, which incidentally bypasses the complex Linux IO stack (Comparison below). The Plexistor tech is supposed to provide enterprise grade features as a part of the same software stack, though the specifics of what those enterprise grade features are were lacking (Guessing the typical availability and management capabilities natively available within OnTAP?)

Based on some of the initial performance benchmarks, the effect of this is significant, as can be seen below when compared to a normal

My thoughts

As an IT strategist and an Architect at heart with a specific interest in storage, who can see super data (read “extremely large quantities of data”) processing becoming a common use case soon across most industries due to the introduction of big data, real-time analytics and the accompanying Machine Learning tech, I can see value in this strategy from NetApp. Most importantly, the fact that they are looking at using these advanced technologies in harmony with some of the proven, tried and tested data management platforms they already have, in the likes of the OnTAP software, could be a big bonus. The acquisition of Plexistor was a good move for NetApp, and integrating their tech and having a shipping product would be super awesome if and when that happens, but I would dare say the use cases would be somewhat limited initially given the Linux dependency. Others are taking note, and the HCI vendor Nutanix’s acquisition of PernixData kind of hints at Nutanix also having a similar strategy to that of Plexistor and NetApp.

While the organic growth of the current product portfolio through incorporating new tech such as NVMe is fairly straightforward and helps NetApp stay relevant, it remains to be seen how well acquisition driven integration, such as that of Plexistor with SCM technologies in to the NetApp platform, will pan out to become a shipping product. NetApp has historically had issues around the efficiency of this integration process, which in the past has been known to be slow, but under the new CEO George Kurian, who brought in a more agile software development methodology and therefore a more frequent feature & update release cycle, things may well be different this time around. The evidence seen during SFD15 pretty much suggests the same to me, which is great.

Slide credit to NetApp!

Thanks

Chan

NetApp United 2018 – No it’s not another football team!

I was glad to see an email from the NetApp united team this afternoon confirming that I’ve been selected as a member of the prestigious NetApp United (#NetAppUnited) team for 2018 which is a great honour indeed. Thanks NetApp!

Contrary to popular belief – NetApp United is NOT a football team but a global community of individuals united by their passion for great technology. Similar to the VMware vExpert and Dell EMC Elect programmes, NetApp United is a community programme run by NetApp (@PeytStefanova is the organiser in chief) to recognise global NetApp technology experts and community influencers, with a view to giving them a platform to share more of their thoughts, content and influence, and ultimately more of their expertise, publicly through various community channels. Similar to the community programmes from other vendors, NetApp United is all about giving back to the community, which is a good cause and one I am happy to support.

Being recognised as a member of the NetApp United programme entitles you to a number of exclusive benefits such as dedicated NetApp technology update sessions with product engineers, exclusive briefings about future and upcoming NetApp solutions and products, access to a private Slack channel for the community members to discuss all things technical and related to NetApp, and other exclusive events at NetApp Insight events in the US and EMEA. All of these perks are nice to have indeed as they enable us to share some of that information with others out there, as well as provide our own thoughts, which would be beneficial for current or future NetApp customers.

As I work for a global NetApp partner, I am looking forward to using the access to information I have as a part of this programme to better leverage our partnership with NetApp as well as to educate our joint customers on future NetApp strategy. As I am also an independent contributor (outside of work), I intend to share some of the information (outside of NDA stuff) with my general audience to help you understand various NetApp solutions and strategy, along with my independent thoughts on them, which I think is important. I have been working with NetApp for a long time, initially as a customer and then as a partner, and I’ve always been a great fan of their core strategy, which was always about software despite them being a HW product manufacturer. They have some extremely awesome innovation already available in their portfolio and even better innovation in the making for the future (have a look at the recently concluded #SFD15 presentation from them about the Data Pipeline vision here) and I am looking forward to sharing some of these along with my thoughts with everyone.

The full list of all the NetApp United 2018 members can be found here. Congratulations to all those who got selected and Thank you NetApp & @PeytStefanova for the invitation and the recognition!

Cheers

Chan

Cohesity: A secondary storage solution for the Hybrid Cloud?

Background

A key part of my typical day job involves staying on top of new technologies and key developments in the world of enterprise IT, with an aim to spot commercially viable, disruptive technologies that are not just cool tech but also have a good business value proposition with a sustainable use case.

To this effect, I’ve been following Cohesity since its arrival to the mainstream market back in 2015, keeping up to date on some of their platform developments with various feature upgrades such as v2.0, v3.0…etc with interest. SFD15 gave me another opportunity to catch up with them and get an up to date view on their latest offerings & the future direction. I liked what I heard from them! Their solution now looks interesting, their marketing message is a little sharper than it was a while ago and I like the direction they are heading in.

Cohesity: Overview


Cohesity claims to be a specialist, software defined, secondary storage vendor who specializes in the modernization of the secondary storage tier within the hybrid cloud. Such secondary storage requirements typically include copies of your primary / tier 1 data sets (such as test & dev VM data and reporting & analytics data) or file shares (CIFS, NFS…etc.). These types of data tend to be quite large and therefore typically cost more to store and process. Storing them on the same storage solution as your tier 1 data can therefore be unnecessarily expensive, which I can relate to as an enterprise storage customer as well as a channel SE in my past lives, involved in sizing and designing various storage solutions for my customers. Often, most enterprise customers need separate, dedicated storage solutions to store such data outside of the primary storage cluster, but they are stuck with the same, expensive primary storage vendors for choice. Cohesity offers a single, tailor made secondary data platform that spans both ends of the hybrid cloud to address all these secondary storage requirements. They also provide the ability to act as a hybrid cloud backup storage target, with some added data management capabilities on top, so that not only can they store data backups but they can also do interesting things with that backup data across the full hybrid cloud spectrum.

With what appears to be decent growth last year (600% revenue growth YoY) and some good customers already onboard, it appears that customers may be taking notice too.

Cohesity: Solution Architecture


A typical Cohesity software defined storage (SDS) solution on-premises comes as an appliance and can start with 3 nodes to form a cluster that provides linearly scalable growth. An appliance will typically be a 2U chassis that accommodates 4 nodes, and any commodity or OEM HW platform is supported. Storage itself consists of PCI-e Flash (up to 2TB per node) + capacity disk, which is the typical storage architecture of every SDS manufacturer these days. Again, similar to most other SDS vendors, Cohesity uses Erasure Coding or RF2 data sharding across the Cohesity nodes (within each cluster) to provide data redundancy, as a part of the SpanFS file system. Note that given its main purpose as a secondary storage unit, it doesn’t have (or need) an All Flash offering, though they may move in to the primary storage use case, at least indirectly, in the future.
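
To get a feel for what the choice between RF2 and erasure coding means for usable capacity, here is a minimal Python sketch of the general arithmetic. It is not Cohesity’s sizing method: the raw capacity figure and the 2 data + 1 parity stripe layout are assumptions I picked purely for illustration, and real sizing would also need to account for metadata, spare capacity and data reduction.

```python
def usable_replicated(raw_tb, replication_factor=2):
    """Usable capacity when every block is stored replication_factor times."""
    return raw_tb / replication_factor


def usable_erasure_coded(raw_tb, data_stripes, parity_stripes):
    """Usable capacity when each stripe holds data_stripes data blocks
    plus parity_stripes parity blocks."""
    return raw_tb * data_stripes / (data_stripes + parity_stripes)


raw = 96.0  # hypothetical raw capacity across the cluster, in TB

print("RF2 usable TB:   ", usable_replicated(raw))
print("EC 2+1 usable TB:", usable_erasure_coded(raw, 2, 1))  # assumed layout
```

The general point is that replication costs a full extra copy per replica, while erasure coding only costs the parity fraction of each stripe, usually at the price of more work on writes and rebuilds.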

The Cohesity storage solution can be deployed to remote and branch office locations as well as to cloud platforms, using virtual Cohesity appliances that work hand in hand with the on-premises cluster. Customers can then enable cross cluster data replication and various other integration / interaction activities, in a similar way to how NetApp Data Fabric works for primary data, for example. Note however that Cohesity does not yet permit the configuration of a single cluster across platforms (where you can deploy nodes from the same cluster on premises as well as in the cloud, enabling Erasure Coding to perform data replication in the way the Hedvig storage solution permits, for example), but it was hinted to us that this is in the works for a future release.

Cohesity also have some analytics capabilities built in to the platform which can be handy. The analytics engine uses MapReduce natively to avoid the need to build external, analytics focused compute clusters (such as Hadoop clusters) and to move (duplicate) data sets in order to present them for analysis. The Analytics Workbench on the Cohesity platform currently permits external custom code to be injected in to the platform. This can be used to search for content inside various files held on the Cohesity platform, including pattern matching that enables customers to search for social security or credit card numbers, which would be quite handy for enforcing regulatory compliance. During the SFD15 presentation, it was explained to us that the capabilities of this platform are being rapidly extended to support additional regulatory compliance policy enforcement such as that required by GDPR. Additional information on Cohesity Analytics capabilities can be found here. An additional video explaining how this works can also be found here.
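
To illustrate the kind of pattern matching described above, here is a minimal Python sketch that scans a block of text for US social security style numbers and for card-like numbers. To be clear, this is not the Cohesity Analytics Workbench API, which isn’t shown in this post; the regular expressions, the `luhn_ok` helper and the `scan` function are all hypothetical names of my own, and real compliance scanning would need much stricter rules.

```python
import re

# Illustrative patterns only (assumptions, not Cohesity's rules).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(number):
    """Luhn checksum, used to weed out digit runs that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text):
    """Return the SSN-like and card-like strings found in a block of text."""
    ssns = SSN_RE.findall(text)
    cards = [m.group().strip() for m in CARD_RE.finditer(text)
             if luhn_ok(m.group())]
    return {"ssn": ssns, "card": cards}


sample = "SSN 123-45-6789 was stored next to card 4111 1111 1111 1111 in notes"
print(scan(sample))
```

On a platform that can run custom code next to the data, a function along these lines would be applied to the contents of each file, with only the hits (file name and offset, say) reported back rather than the data itself being moved.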

Outside of these, given the whole Cohesity solution is backed by a distributed file system that is software defined, they naturally have all the software defined goodness expected from any SDS solution such as global deduplication, compression, replication, file indexing, snapshots, multi protocol access, Multi tenancy and QoS within their platform.

My thoughts

I like Cohesity’s current solution and where they are potentially heading. However, the key to their success, in my view, will ultimately be their price point, which I have yet to see and so cannot yet make sense of where they belong amongst the competition.

From a technology and strategy standpoint, Cohesity’s key use cases are very valid and the way they aim to address those is pretty damn good. When you think about the secondary storage use case and the cost of serving out less performance hungry tier 2 data (often large and clunky in size) through an expensive tier 1 storage array (where you have to include larger SAN & NAS storage controllers + additional storage), I cannot help but think that Cohesity’s secondary storage play is quite relevant for many customers. Tier 1 storage solutions, classic SAN / NAS solutions as well as HCI solutions such as VMware vSAN or Nutanix, are typically priced to reflect their tier 1 use case. So a cheaper, more appropriate secondary storage solution such as Cohesity could help save lots of unnecessary SAN / NAS / HCI costs for many customers by allowing them to downsize their primary storage solution requirements. This may even enable more and more customers to embrace HCI solutions for their tier 1 workloads too, resulting in even less of a need for expensive, hardware centric SAN / NAS solutions except where they are genuinely necessary. After all, we are all being taught the importance of rightsizing everything (thanks to the utility computing model introduced by the public clouds), so perhaps it’s about time that we all look to break down tier 1 and tier 2 data in to appropriately sized tier 1 and tier 2 storage solutions to benefit from the reduced TCO for the customer? It’s important to note though that this rightsizing is only likely to appeal to customers with heavy storage use cases, such as typical enterprises and large corporate customers, rather than the average small to medium customer who requires a typical multipurpose storage solution to host some VMs + some file data. This is evident in the customer stats provided to us during SFD15, where 70% of their customers are enterprise customers.

Both of their key use cases, tier 2 data storage as well as backup storage, now look to incorporate cloud capabilities and allow customers to do more than just store tier 2 data and backups. This is good and very timely indeed. They seem to take a very data centric approach to their use cases, and the secret sauce behind most of the capabilities, the proprietary file system called SpanFS, looks and feels very much like NetApp’s cDOT architecture with some enhancements in parts. They are also partnering up with various primary storage vendors such as Pure to enable replication of backup snapshots from Pure to Cohesity. Meanwhile, additional features like built in protection of NAS data from NetApp, EMC and Pure, direct integration with VMware vCF for data protection and direct integration with Nutanix for AHV protection kind of move them closer to Rubrik’s territory, which is interesting and ultimately provides customers with choice, which is a good thing.

From a hardware & OEM standpoint, Cohesity has partnered up with both HPe and Cisco already and has also made itself available on the HPe pricebook so that customers can order the Cohesity solution using a HPe SKU, which is convenient, though I’d personally urge customers to order directly from Cohesity (using your trusted solutions provider) where possible, rather than ordering through an OEM vendor where the pricing may be fixed or engineered to position OEM HW when it’s not always required.

Given their mixed capabilities of tier 2 data storage, backup storage and ever-increasing data management capabilities across platforms, they are coopeting, if not competing, with a number of others such as NetApp, who has a similar data management strategy in their “Data pipeline” vision (and who also removes the need to have multiple storage silos in the DC for tier 2 data due to features such as Clustered Data OnTAP & FlexClones), Veeam or even Pure Storage. Given their direct integration with various SW & HCI platforms, removing the need for 3rd party backup vendors, they are likely to be competing directly with Rubrik more and more in the future. Cohesity’s strategy is primarily focused on tier 2 data management, with a secondary focus on data backups and the management of that data, whereas Rubrik’s strategy appears to be the same but in the opposite order of priorities (backup 1st, data management 2nd). Personally, I like both vendors and their solution positioning as I can see the strategic value in both solution offerings for customers. But most importantly for Cohesity, there doesn’t appear to be any other storage vendor specifically focused on the secondary storage market like they are, so I can see a great future for them, as long as their price point remains relevant and that great innovation keeps continuing.

You can watch all the videos from #SFD15 recorded at the Cohesity HQ in Santa Clara here.

If you are an existing Cohesity user, I’d be very keen to get your thoughts, feedback using the comments section below.

A separate post will follow, looking at Cohesity’s SpanFS file system and their key use cases!

Chan

A look at the Hedvig distributed Hybrid Cloud storage solution

During the recently concluded Storage Field Day event (SFD15), I had the chance to visit the Software Defined Storage company Hedvig at their HQ in Santa Clara, where we were given a presentation of their solution offering by their engineering team (including the founder). Luckily, I knew Hedvig already due to my day job (which involves evaluating new, disruptive tech start-ups to form solutions reseller partnerships – I had already gone through this process with Hedvig a while back). However, I learnt about a number of new updates to their solution, and this post aims to cover their current solution offering and my thoughts on it in the context of the current enterprise storage market.

Hedvig: Company Overview

Similar to a number of new storage or backup start-ups that came out of stealth in recent times, Hedvig too was founded by an engineer with a technology background, back in 2012. The founder, Avinash Lakshman, came from a distributed software engineering background, having worked on large scale distributed storage applications such as Amazon Dynamo and Cassandra. While they came out of stealth in 2015, they did not appear to have an aggressive growth strategy backed by an aggressive (read “loud”) marketing effort and looked rather content with natural, organic growth. At least that was my impression seeing how they operated in the UK market anyway. However, during the SFD15 presentation, we found out that they’ve somewhat revamped their logo and related marketing collateral. So perhaps they may well have started to address this already?

Hedvig: Solution Overview


At the outset, they are similar to most other software defined storage start-ups these days, in that they run their software tier on top of any commodity server hardware to build a comparatively low cost, software defined storage (SDS) solution. They also have a genuine distributed capability, being able to distribute the SDS nodes not just within the data center but also across data centers as well as cloud platforms, though it’s important to note that most SDS vendors these days have the same capability or are in the process of adding it to their SDS platforms.

Hedvig has positioned itself as an SDS solution that is a perfect fit for traditional workloads such as VMs, backup & DR as well as modern workloads such as big data, HPC, object storage and various cloud native workloads too. Their solution provides block & file storage capability like most other vendors in their category, as well as object storage, which is another potentially (good) differentiator, especially compared to some of the other HCI solutions out there that often only provide one type or the other.

The Hedvig storage platform typically consists of the Hedvig SW platform + commodity server hardware with local disks. Each server node can be a physical server or a VM on a cloud platform that runs the Hedvig software. The Hedvig software consists of:

  • Hedvig Storage Proxy
    • This is a piece of software deployed on the compute node (app server, container instance, hypervisor…etc.)
    • Presents file (NFS) & block (iSCSI) storage to compute environments and converts that to the Hedvig proprietary protocol for communication with the storage service.
    • Also performs caching of reads (writes are redirected).
    • Performs dedupe up front and writes deduped blocks to the back end (storage nodes) only if necessary
    • Each hypervisor runs a proxy appliance VM / VSA (x2 as a HA pair) which will serve all local IO on that hypervisor
  • Hedvig API
    • Presents object storage via S3 or Swift and full RESTful API from the storage nodes to the storage proxy.
    • Runs on the storage nodes
  • Hedvig Storage Services
    • Manages the storage cluster activities and interfaces with the server proxies
    • Runs on the storage nodes and is similar in role to a typical storage processor / SAN or NAS controller
    • Each storage server has 2 parts
      • Data process
        • Local persistence
        • Replication
      • Metadata process
        • Communicate with each other
        • Distributed logic
        • Stored in a proprietary DB on each node
    • Each virtual disk provisioned in the front end is mapped 1:1 to a Hedvig virtual disk in the back end
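
To make the proxy-side dedupe idea from the list above a little more concrete, here is a minimal Python sketch. It is only a toy model under my own assumptions: the class name, the use of SHA-256 fingerprints and the in-memory set standing in for a cluster-wide dedupe index are all hypothetical and not taken from Hedvig's implementation. It simply shows how fingerprinting blocks at the proxy lets you skip back end writes for blocks that are already known.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K dedupe block size (assumed here for illustration)


class DedupingProxy:
    """Toy model of a storage proxy that fingerprints blocks before sending them on.

    Hypothetical names throughout; this is not Hedvig's API.
    """

    def __init__(self):
        self.known_fingerprints = set()  # stand-in for a cluster-wide dedupe index
        self.backend_writes = 0          # blocks actually sent to the storage nodes

    def write(self, data: bytes) -> list[str]:
        """Split data into fixed-size blocks and return one fingerprint per block."""
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.known_fingerprints:
                # Only blocks never seen before reach the back end.
                self.known_fingerprints.add(fp)
                self.backend_writes += 1
            fingerprints.append(fp)
        return fingerprints


proxy = DedupingProxy()
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # 3 identical blocks + 1 unique
refs = proxy.write(payload)
print(len(refs), "logical blocks,", proxy.backend_writes, "back end writes")
```

In this made-up example, four logical blocks result in only two back end writes, which is the kind of saving the proxy-level dedupe is aiming for.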

The Hedvig storage nodes can be commodity or mainstream OEM vendor servers, as customers choose. They will consist of SSD + mechanical drives, which is typical for other SDS vendors too, and the storage nodes which run the Storage Services SW will typically be connected to each other using 10GbE (or higher) standard Ethernet networking.

Like most other SDS solutions, they have typical SDS features and benefits such as dedupe, compression, auto-tiering, caching, snapshots & clones, data replication…etc. Another potentially unique offering they have here is the ability to set storage policies at per virtual disk or per container granularity (in the back end), which is nice. Below are some of the key storage policy configuration items that can be set at per VM / vDisk granularity.

  • Replication Factor (RF) – Number of copies of the data to keep. Ranges from 1 to 6. Quorum = (RF/2)+1. This is somewhat similar to the VMware vSAN FTT if you are a vSAN person.
  • Replication policy – Agnostic, Rack aware and DC aware – Kind of similar to the concept of Fault Domains in vSAN for example. Sets the scope of data replication for availability
  • Dedupe – Global dedupe across the cluster. Happens at 512B or 4K block size and is done in-line. Important to note that dedupe happens at the storage proxy level, which ensures no unnecessary writes take place in the back end. This is another USP compared to other SDS solutions, which is also nice.
  • Compression
  • Client caching
  • …etc.
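
As a rough illustration of how the per-vDisk policy items above hang together, here is a small Python sketch. The class and field names are my own invention for the sake of the example (not Hedvig's API or CLI), and I'm assuming the RF/2 in the quorum formula is an integer division, which matches the RF 4 = quorum of 3 case.

```python
from dataclasses import dataclass

VALID_REPLICATION_POLICIES = {"agnostic", "rack_aware", "dc_aware"}


@dataclass(frozen=True)
class VDiskPolicy:
    """Hypothetical per-vDisk storage policy, mirroring the items listed above."""

    replication_factor: int
    replication_policy: str = "agnostic"
    dedupe: bool = False
    compression: bool = False
    client_caching: bool = False

    def __post_init__(self):
        # RF range of 1 to 6 as quoted in the list above.
        if not 1 <= self.replication_factor <= 6:
            raise ValueError("replication factor must be between 1 and 6")
        if self.replication_policy not in VALID_REPLICATION_POLICIES:
            raise ValueError(f"unknown replication policy: {self.replication_policy}")

    @property
    def quorum(self) -> int:
        # Quorum = (RF/2)+1, assuming integer division for odd RF values.
        return self.replication_factor // 2 + 1


policy = VDiskPolicy(replication_factor=4, replication_policy="dc_aware", dedupe=True)
print(policy.quorum)
```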

Data replication, availability & IO operations

Hedvig stores data as containers across the cluster nodes to provide redundancy and enforces the policy configuration items regarding availability at container level. Each vDisk is broken down into 16GB chunks and, based on the RF level assigned to the vDisk, Hedvig ensures that RF copies of each chunk are maintained across a number of nodes (this is somewhat similar to the VMware vSAN component size, which is capped at 255GB). Each of these 16GB chunks is what is known as a container. Within each node, Hedvig SW will group 3 disks into a logical group called a storage pool, and each container that belongs to that storage pool will typically stripe its data across that storage pool’s disks. Storage pool and disk rebalancing occurs automatically during less busy times. Data replication will also take into account latency considerations if the cluster spans multiple geo boundaries / DCs / Cloud environments.
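
The container arithmetic is easy to sketch in Python. Note the placement function below is a deliberately naive round-robin that I've made up purely for illustration; the real placement logic (rack / DC awareness, latency, rebalancing) is not described in enough detail here to reproduce, and the node names are hypothetical.

```python
import math

CONTAINER_SIZE_GB = 16  # container (chunk) size quoted above


def container_count(vdisk_size_gb: float) -> int:
    """Number of 16GB containers needed to hold a vDisk of the given size."""
    return math.ceil(vdisk_size_gb / CONTAINER_SIZE_GB)


def place_replicas(container_id: int, nodes: list[str], rf: int) -> list[str]:
    """Pick rf distinct nodes for one container (naive round-robin, an assumption)."""
    if rf > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    start = container_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


nodes = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster
for cid in range(container_count(40)):            # a 40GB vDisk needs 3 containers
    print(cid, place_replicas(cid, nodes, rf=3))
```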

Hedvig software maintains IO locality in order to ensure the best performance for read and write IOs, prioritising servicing IO from local & less busy nodes. One of the key things to note is that during a write, the Hedvig software doesn’t wait for the acknowledgements from all the storage nodes, unlike some of its competitor solutions. As soon as the quorum is met (Quorum = RF/2 + 1, so if the RF is 4 with one remote node in the cloud or in a remote DC over a WAN link, that is as soon as the data is written to the 3 local nodes), it will send the ACK back to the sender and the rest of the data writing / offloading can happen in the background. This ensures faster write response times, and is probably a key architectural element in how they enable truly distributed nodes in a cluster, which can often include remote nodes over a high latency link, without a specific performance hit to a write operation. This is another potential USP for them, at least architecturally on paper; in reality, however, you are only likely to benefit if you have a higher RF in a large cluster.
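
The effect of acknowledging on quorum rather than on all replicas can be shown with a few lines of Python. The latency figures are made up for the example and the function names are hypothetical; the only thing taken from the above is the Quorum = RF/2 + 1 rule (with integer division assumed).

```python
def quorum_ack_latency_ms(replica_latencies_ms: list[float]) -> float:
    """Time until the write is ACKed when only a quorum of replicas must respond."""
    rf = len(replica_latencies_ms)
    quorum = rf // 2 + 1
    # The ACK goes out once the quorum-th fastest replica has responded.
    return sorted(replica_latencies_ms)[quorum - 1]


def wait_for_all_latency_ms(replica_latencies_ms: list[float]) -> float:
    """Time until the write is ACKed when every replica must respond."""
    return max(replica_latencies_ms)


# RF = 4: three local nodes plus one remote node over a WAN link (made-up numbers).
latencies = [1.0, 1.2, 1.5, 40.0]
print(quorum_ack_latency_ms(latencies))    # bounded by the slowest of the 3 local nodes
print(wait_for_all_latency_ms(latencies))  # bounded by the remote node
```

With a lower RF the gap narrows: with RF = 2 for example, the quorum is also 2, so the write still waits for both copies, which is why the benefit mostly shows up at higher RF values.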

Reads are also optimised through a combination of caching at the storage proxy level and actual block reads in the back end that prioritise local nodes (with a lower cost) over remote nodes. This is markedly different to how VMware vSAN works, for example, where it avoids client-side cache locality in order to avoid skewed flash utilisation across the cluster as well as frequent cache re-warming during vMotion…etc. Both architectural decisions have their pros and cons in my view, and I like Hedvig’s architecture as it optimises performance, which is especially important in a truly distributed cluster.
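
Here is a small Python sketch of that read path as I understand it: check a proxy-side cache first, and on a miss read from the replica with the lowest cost. Everything here is an assumption for illustration (the class name, the LRU eviction, the integer "cost" per node and the fetch callback); it is not how Hedvig actually implements it.

```python
from collections import OrderedDict


class ReadProxy:
    """Toy read path: proxy-side LRU cache in front of cost-ranked replicas."""

    def __init__(self, replica_costs: dict[str, int], cache_blocks: int = 2):
        self.replica_costs = replica_costs  # node name -> relative read cost
        self.cache_blocks = cache_blocks    # max number of cached blocks
        self.cache = OrderedDict()          # block id -> data, in LRU order
        self.backend_reads = []             # nodes that served back end reads

    def read(self, block_id: str, fetch) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        # Cache miss: go to the cheapest (typically local) replica.
        node = min(self.replica_costs, key=self.replica_costs.get)
        data = fetch(node, block_id)
        self.backend_reads.append(node)
        self.cache[block_id] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data


def fake_fetch(node: str, block_id: str) -> bytes:
    """Stand-in for a real back end read."""
    return f"{block_id}@{node}".encode()


costs = {"local-node": 1, "remote-dc-node": 10, "cloud-node": 50}
read_proxy = ReadProxy(costs)
read_proxy.read("blk-1", fake_fetch)  # miss: served by the lowest cost node
read_proxy.read("blk-1", fake_fetch)  # hit: served from the proxy cache
print(read_proxy.backend_reads)
```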

A deep dive on this subject including the anatomy of a read and a write is available here.

Hedvig: Typical Use Cases

Hedvig, similar to most of its competition, aims to address a number of use cases.

Software Defined Primary Storage

Hedvig operates in traditional storage mode (dedicated storage server nodes providing storage to a number of external compute nodes such as VMware ESXi or KVM or even a bare metal application server) or in Hyper-Converged mode where both compute and storage are provided on a single node. They also state that these deployment architectures can be mixed in the same cluster, which is pretty cool.

  • Traditional SDS – Agent (storage proxy) runs on the application server accessing the storage and speaks storage protocols. The agent also hosts local metadata and provides local caching amongst other things. Used in non-hypervisor deployments such as bare metal deployments of app servers.
  • HCI mode – Agent (storage proxy) runs on the Hypervisor (as a control VM / VSA – Similar to Nutanix). This is probably their most popular deployment mode.

Software Defined Hybrid Cloud Storage

Given the truly distributed nature of the Hedvig solution platform, they provide a nice Hybrid cloud use case where the complete solution extends the storage cluster across geographical boundaries including cloud platforms (IaaS instances). Cloud platforms currently supported by Hedvig include AWS, Azure and GCP. Stretching a cluster over to a cloud platform would involve IaaS VMs from the cloud platform being used as cluster nodes, with available block storage from the cloud platform providing virtual disks as local drives for each cloud node. When you define Hedvig virtual disks, you get to specify the data replication topology across the hybrid cloud. Important to note though that the clients accessing those disks are advised to run within the same data center / cloud platform / region for obvious performance reasons.

Hedvig also now supports integrating with Docker for containerised workloads through their Docker volume plugin, as well as integration with the Kubernetes volume framework, similar to most of the other SDS solutions.

Hyper-Converged Backup

This is something they’ve recently introduced but, unless I’ve misunderstood, this is not so much a complete backup solution including offsite backups, but more of a snapshot capability at the array level (within the Hedvig layer). Again, this is similar to most other array level snapshots from other vendors’ solutions and can be used for immediate restores without having to rely on a hypervisor snapshot, which would be inefficient. Using an external backup solution from a backup partner (such as Veeam for example) to offsite those snapshot backups is highly recommended, as with any other SDS solution.

My thoughts

I like the Hedvig solution and some of its neat little tricks, such as the clever use of the storage proxy agent to offload some of the backend storage operations to the front end (i.e. dedupe) and therefore potentially reduce back end IO, as well as the network performance penalty between the compute and storage layers, to a minimum. They are a good hybrid SDS solution that can cater for a mixed workload across the private data center as well as public cloud platforms. It’s NOT a specialised solution for a specific workload and doesn’t claim to provide sub millisecond latency; instead, it provides a good all-round storage solution that is architected from the ground up to be truly distributed. Despite its ability to be used in a traditional storage as well as HCI mode, most of the real-life applications of its technology would likely be in the HCI world, with some kind of a hypervisor like vSphere ESXi or KVM.

Looking at the organisation itself and their core solution, it’s obvious that they’ve tried to solve a number of hardware defined storage issues that were prevalent in the industry at the time of their inception (2012), through the clever use of software. That act is commendable. However, the sad truth is that, since then, a lot has happened in the industry and a number of other start-ups and established vendors have also attempted to do the same, some with perhaps an unfair advantage due to having their own hypervisor too, which is a critical factor when it comes to your capabilities. Nutanix and VMware vSAN, for example, developed similar SDx design principles and tried to address most of the same technical challenges. I fear that those vendors were a little more aggressive in their plans and, in my view, managed to get their go to market process right, at a much bigger scale as well. Nutanix pioneered a new SDS use case (HCI) in the industry and capitalised on it before everyone else did, and VMware vSAN came out as a credible, and potentially better, challenger to dominate this space. While Hedvig is independent of any hypervisor platform and therefore provides the same capabilities across multiple platforms, the reality is that not many customers would need that capability as they’d be happy with a single hypervisor & a storage platform. I also think Hedvig potentially missed a trick in their solution positioning in the market to create a differentiated message and win market share. As a result, their growth is nowhere near comparable to that of VMware vSAN or Nutanix for example.

As much as I like the Hedvig technology, I fear for their future survival. Without some significant innovation and some business leadership involved in setting a differentiated strategy for their business, life would be somewhat difficult, especially if they are to make a commercial success out of it as a company. Their technology is good and the engineering team seems credible, but the competition is high and the market is somewhat saturated with so many general purpose SDS solutions as well as specialist SDS solutions aimed at specific workloads. Most of their competitors also have many more resources at their disposal to throw at their solutions, including more comprehensive marketing engines too. For these reasons, I fear that Hedvig may struggle to survive on their current path as a generalised SDS solution and would potentially be better off focusing on a specific use case / vertical…etc. and concentrating all their innovation efforts on that.

The founder and CEO of the company still appears to be very much an engineer at heart, and having an externally sourced business leader with start-up experience to lead Hedvig into the future may not be a bad thing for them in the long run either, in my view.

Keen to get your thoughts, especially if you are an existing Hedvig customer – Please comment below.

Slide credit goes to Hedvig and Tech Field Day team.

P.S. You can find all the TFD and SFD presentations about Hedvig via the link here.

Chan

VMware vExpert 2018

The latest batch of VMware vExperts in 2018 has just been announced and I’m glad to say I’ve made the cut for the 4th year, which was fantastic news personally. The vExpert programme is VMware’s global evangelism and advocacy programme and is held in high regard within the community due to the expertise of the selected vExperts and their contribution towards enabling and empowering customers around the world with their virtualisation and software defined datacentre projects through knowledge sharing. The candidates are judged on their contribution to the community through activities such as community blogs, personal blogs, participation in events, producing tools…etc. and, in general, maintaining their expertise in related subject matters. vExperts typically get access to private betas, free licenses, early access product briefings, exclusive events, free access to VMworld conference materials, and other opportunities to directly interact with VMware product teams, which is totally awesome and in return helps us to feed the information back to our customers…

It’s been a great honour to have been recognised by VMware again for this prestigious title and I’d like to thank VMware as well as congratulate the other fellow vExperts who have also made it this year. Let’s keep up the good work…!!

The full list of VMware vExperts 2018 can be found below

My vExpert profile link is below

Cheers

Chan
