NetApp Data Fabric: A la Hybrid Cloud! – An update from NetApp Insight 2018


History

For those of you who have been following NetApp over the years, you may already know that, contrary to popular belief, NetApp has always been a software company at its core. Unlike most of their competitors back in the day, such as EMC or even HPe, who focused primarily on raw hardware capabilities and purpose-built storage offerings for each use case, NetApp always had a single storage solution (the FAS platform) with fit-for-purpose hardware. Their real strength, however, was the software they developed on top (Data ONTAP), which offered so many data services in one place that their competitors often needed two or three separate solutions to achieve the same result. That software-driven innovation kept them punching well above their weight, in the same league as their much bigger competitors.

Over the last few years, however, NetApp did expand their storage offerings to include some additional purpose-built storage solutions, out of necessity, to address niche customer use cases. They built the E series for raw performance use cases with minimal data services, the EF for extreme all-flash performance, and acquired SolidFire, another very software-driven, scalable storage solution built on commodity hardware. The key to most of these offerings was still the software defined storage and software defined data management capabilities of each platform, and the integration of all of them through software technologies such as SnapMirror and SnapVault to move data seamlessly between these various platforms.

In an increasingly software defined world (public and private clouds are all powered primarily by software), leading with software defined data storage and data management services opened up many additional possibilities for NetApp to expand beyond just these data centre solutions, as it turned out.

NetApp Data Fabric

NetApp set out the Data Fabric vision a while ago as an extension of ONTAP and their various other software-centric storage capabilities beyond the customer data centre, into other compute platforms such as public clouds and third-party co-location facilities.

The idea was that customers could seamlessly move data across all these infrastructure platforms as and when needed, without having to modify (think "convert") the data. NetApp's Data Fabric, at its core, aims to address the data mobility problem caused by platform locking of data, by providing a common layer of core NetApp technologies to host data in a consistent manner across all those tiers. In addition, it aims to provide a common set of tools to manage that data, on any platform, throughout its lifetime: from the initial creation of data at the Edge, to processing it at the Core (DC) and/or on various cloud platforms, through to long-term and archival storage at the Core and/or on public cloud platforms. In a way, this gives customers platform neutrality when it comes to their data which, let's admit it, is the lifeblood of most digital (that means all) businesses today.

New NetApp Data Fabric

Insight 2018 showcased how NetApp has extended the initial scope of their Data Fabric vision beyond the hybrid cloud to new platforms such as Edge locations, connecting customer data across Edge, Core (DC) and cloud platforms and providing data portability. In addition, NetApp also launched a number of new data services to help manage and monitor that data as it moves from one pillar to another across the Data Fabric. NetApp CEO George Kurian described this new Data Fabric as a way of "simplifying and integrating orchestration of data services across the Hybrid Cloud providing data visibility, protection and control amongst other features". In a way, it's very similar to VMware's "Any App, Any Device, Any Cloud" vision, but in NetApp's case the focus is all about the data and data services.

The new NetApp Data Fabric consists of the following key data storage components at each of its pillars.

NetApp Hybrid Cloud Data Storage
  • Private data center
    • NetApp FAS / SolidFire / E / EF / StorageGRID series storage platforms and the AltaVault backup appliance. Most of these components now integrate directly with public cloud platforms.
  • Public Cloud
    • NetApp Cloud Volumes – SaaS solution that provides file services (NFS & SMB) on the cloud using a NetApp FAS xxxx SAN/NAS array running Data ONTAP that is tightly integrated with the native cloud platform.
    • Azure NetApp Files – PaaS solution running on physical NetApp FAS storage in Azure DCs. Directly integrated into Azure Resource Manager for native storage provisioning and management.
    • Cloud Volumes ONTAP – A NetApp ONTAP virtual appliance that runs the same ONTAP code on the cloud. Can be used for production workloads, DR, file shares and DB storage, same as on-premises. Includes cloud tiering and Trident container support, as well as SnapLock for compliance (WORM).
  • Co-Lo (adjacent to public clouds)
    • NetApp Private Storage – Dedicated, physical NetApp FAS (ONTAP) or FlexArray storage owned by the customer, located physically adjacent to the major cloud platform infrastructures. The storage unit is hosted in an Equinix data centre with a direct, low-latency 10GbE link to the Azure, AWS and GCP cloud back ends. Workloads such as VMs and applications deployed in the native cloud platform can consume data directly over this low-latency link.
  • Edge locations
    • NetApp HCI – Recently repositioned as a "Hybrid Cloud Infrastructure" rather than a "Hyper-Converged Infrastructure", this solution provides a native NetApp compute + storage platform that is tightly integrated with some of the key data services and monitoring and management solutions from the Data Fabric (described below).

Data Fabric + NetApp Cloud Services

While the core storage infrastructure components of the Data Fabric enable data mobility without the need to transform data at each hop, customers still need tools to provision, manage and monitor that data on each pillar of the Data Fabric. Furthermore, customers also need those tools to manage data across non-NetApp platforms that are linked to the Data Fabric storage pillars described above (such as native cloud platforms).

Insight 2018 (US) revealed the launch of some of these brand new data services and tools from NetApp, most of which are SaaS solutions hosted and managed by NetApp themselves on a cloud platform. While some of these services are fully live and GA, others are not live just yet, but customers can trial them all for free today.

Given below is a full list of the announced NetApp Cloud services, which fall into two categories. By design, they are tightly integrated with all the data storage pillars of the NetApp Data Fabric as well as with third-party storage and compute platforms such as AWS, Azure and third-party data centre components.

NetApp Hybrid Cloud Data Services (New)

  • NetApp OnCommand Cloud Manager – Deploy and manage Cloud Volumes ONTAP as well as discover and provision on-premises ONTAP clusters. Available as SaaS or as on-premises software.
  • NetApp Cloud Sync – A NetApp SaaS offering that enables easier, automated data migration and synchronisation between NetApp and non-NetApp storage platforms across the hybrid cloud. Currently supports syncing data across AWS (S3, EFS), Azure (Blob), GCP (storage buckets), IBM (Object Storage) and NetApp StorageGRID.
  • NetApp Cloud Secure – A NetApp SaaS security tool that aims to identify malicious data access across all hybrid cloud storage solutions. Connects to various storage back ends via a data collector and supports NetApp Cloud Volumes, ONTAP, StorageGRID, Microsoft OneDrive, AWS, Google G Suite, HPe Command View, Dropbox, Box, Workplace and Office 365 as end points to be monitored. Not live yet; more details here.
  • NetApp Cloud Tiering – Based on ONTAP FabricPool, enables direct tiering of infrequently used data from an ONTAP system (on premises or in the cloud) seamlessly to Azure Blob, AWS S3 and IBM Cloud Object Storage. Not a live solution just yet, but a technical preview is available.
  • NetApp SaaS Backup – A NetApp SaaS backup solution for backing up Office 365 (Exchange Online, SharePoint Online, OneDrive for Business, MS Teams and O365 Groups) as well as Salesforce data. Formerly known as NetApp Cloud Control. Can back up data to native storage or to Azure Blob or AWS S3. Additional info here.
  • NetApp Cloud Backup – Another NetApp SaaS offering, purpose built for backing up NetApp Cloud Volumes (described above).

NetApp Cloud Management & Monitoring (New)

  • NetApp Kubernetes Service – A new NetApp SaaS offering that provides enterprise Kubernetes as a service. Built around NetApp's acquisition of StackPointCloud. Integrated with other NetApp Data Fabric components (NetApp's own solutions) as well as public cloud platforms (Azure, AWS and GCP) to enable container orchestration across the board. Integrates with NetApp Trident for persistent storage volumes (a hedged sketch of consuming Trident-backed storage from Kubernetes follows this list).
  • NetApp Cloud Insights – Another NetApp SaaS offering, built around Active IQ, that provides a single monitoring tool for visibility across hybrid cloud and Data Fabric components. Uses AI and ML for predictive analytics, proactive failure prevention and dynamic topology mapping, and can also be used for resource rightsizing and troubleshooting with infrastructure correlation capabilities.
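
On the Trident point above, here is a hedged sketch (using the official Kubernetes Python client) of what consuming Trident-backed persistent storage could look like from the application side. The storage class name "ontap-gold" and the claim name are purely illustrative assumptions on my part, not NetApp-documented defaults; real class names depend on how the Trident backends are configured.

```python
# Hypothetical sketch: request a persistent volume from a Trident-backed
# StorageClass using the official Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="demo-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="ontap-gold",   # assumed Trident-backed StorageClass name
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)

# Trident, acting as the provisioner behind the StorageClass, would carve the
# volume out of the backing NetApp system (e.g. Cloud Volumes ONTAP or an AFF/FAS).
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```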

My thoughts

In the world of hybrid cloud, customer data, from VMs to file data, can now be stored in many different ways across various data centres, Edge locations and public cloud platforms, all underpinned by different sets of technologies. This presents an inevitable problem for customers: their data requires transformation each time it is moved or copied from one pillar to another (known as platform locking of data). It also makes it difficult to move that data seamlessly across those platforms during its lifetime, should you want to benefit from every pillar of the hybrid cloud and the different advantages inherent to each. NetApp's new strategy, powered by a common software layer to store, move and manage customer data seamlessly across all these platforms, can resonate well with customers. By continuing to focus on the customer's data, NetApp are focusing on the most important asset that organisations of today, and most definitely the organisations of tomorrow, have. Enabling their customers to avoid unnecessary hurdles when moving this asset from one platform to another is only going to go down well with enterprise customers.

This strategy is very similar to VMware's, for example (Any App, Any Device, Any Cloud), which aims to address the same problem, albeit from a more application-centric perspective. To their credit, NetApp is the only "legacy storage vendor" with this all-encompassing strategy of a common data storage layer across the full hybrid cloud spectrum, whereas most of their competition are still focused on their data centre solutions, with limited cloud integration through extended backup and DR capabilities at best.

Only time will tell how successful this strategy will be for NetApp, and I suspect most of that success or failure will rest on continued execution: building additional data and data management services and positioning them to address various hybrid cloud use cases. The initial feedback from customers appears to be positive, which is good to see. A focus on software innovation has always given NetApp an edge over their competitors, and continuing that strategy, especially in an increasingly software defined world, is only bound to bring good things in my view.

Slide credit to NetApp & Tech Field Day!

NetApp & Next Generation Storage Technologies

There are some exciting technology developments taking place in the storage industry, some behind closed doors but some publicly announced and already commercially available, which most of you may already have come across. Some of these are organic developments that build on existing technologies, while others are inspired by hyperscalers like AWS, Azure, GCP and various other cloud platforms. I was lucky enough to be briefed on some of these when I was at SFD12 last year in Silicon Valley by SNIA, the Storage Networking Industry Association, which I've previously blogged about here.

This time around, I was part of the Storage Field Day (SFD15) delegate panel that got the chance to visit NetApp at their HQ in Sunnyvale, CA, to find out more about some of the exciting new product offerings in NetApp's roadmap, either in the works or just starting to come out, incorporating some of these new storage technologies. This post aims to provide a summary of what I learnt there and my respective thoughts.

Introduction

It is no secret that flash media has changed the dynamics of the storage market over the last decade due to its inherent performance characteristics. While the earliest incarnations of flash media were prohibitively expensive to use in large quantities, the arrival of SSDs commoditised the use of flash across the entire storage industry. For example, most tier 1 workloads in enterprises today are held on an SSD-backed storage system where SSD drives form the whole, or a key part, of the storage media stack.

When you look at some of the key storage solutions in use today, two existing technologies stand out: DRAM and NAND flash SSDs. DRAM is the fastest storage media that is most easily accessible by the data processing compute subsystem, while SSDs take the next place when it comes to speed of access and level of performance (IOPS and bandwidth). As such, most enterprise storage solutions in the world, be they aimed at customer data centres or at the megascalers' cloud platforms, utilise one or both of these media types to either accelerate (cache) or simply store tier 1 data sets.

It is important to note that, while SSDs benefit from overall higher performance and lower latency compared to mechanical drives due to the internal architecture of the SSDs themselves (flash storage cells with no spinning magnetic media), both SSDs and classic mechanical (spinning) drives are typically attached to, and accessed by, the compute subsystem via the same SATA or SAS interface subsystem, with the same interface speed and latency. Often the internal performance of an SSD is not realised to its full potential, especially in an aggregated scenario like an enterprise storage array, due to these interface controller speed and latency limitations, as illustrated in the diagram below.
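
To put rough numbers on that interface ceiling, here is a small back-of-the-envelope sketch. The figures are approximate assumptions on my part (roughly 600 MB/s usable on SATA III, roughly 1.1 GB/s on a single 12 Gb/s SAS lane, roughly 3.9 GB/s on a PCIe 3.0 x4 NVMe link, and a flash device internally capable of about 2.5 GB/s), purely to illustrate why the legacy interface, not the flash itself, becomes the limit.

```python
# Back-of-the-envelope sketch with assumed, approximate figures (MB/s).
SATA3_CEILING = 600       # ~usable bandwidth of a SATA III (6 Gb/s) interface
SAS3_CEILING = 1100       # ~usable bandwidth of a single 12 Gb/s SAS lane
PCIE3_X4_CEILING = 3900   # ~usable bandwidth of a PCIe 3.0 x4 NVMe interface

FLASH_CAPABLE = 2500      # assumed internal read throughput of a modern flash device

for name, ceiling in [("SATA III", SATA3_CEILING),
                      ("SAS-3", SAS3_CEILING),
                      ("NVMe (PCIe 3.0 x4)", PCIE3_X4_CEILING)]:
    delivered = min(FLASH_CAPABLE, ceiling)   # the interface caps what the drive can deliver
    print(f"{name:20s} ceiling {ceiling:>5} MB/s -> drive delivers ~{delivered} MB/s "
          f"({delivered / FLASH_CAPABLE:.0%} of its internal potential)")
```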

One of the more recent technology developments in the storage and compute industry, Non-Volatile Memory Express (NVMe), aims to address these SAS and SATA driven performance and latency limitations through a new, high-performance host controller interface that has been engineered from the ground up to fully utilise flash storage drives. The NVMe storage architecture is designed to be future proof and compatible with future drive technologies based on NAND as well as non-NAND storage media.

NVMe SSD drives connected via these NVMe interfaces will not only outperform traditional SSDs attached via SAS or SATA, but, most importantly, will enable future capabilities such as Remote Direct Memory Access (RDMA) for very high storage performance, extending the storage subsystem over a fabric of interconnected storage and compute nodes. A good introduction to the NVMe technology and its benefits over SAS / SATA interfaces can be viewed here.

Another much talked about development on the same front is the subject of the Storage Class Memory (SCM) – Also known as Persistent Memory (PMEM). SCM is an organic successor to the NAND technology based SSD drives that we see in mainstream use in flash accelerated as well as all flash storage arrays today.

At a theoretical level, SCM can come in 2 main types as shown in the above diagram (from a really great IBM research paper published in 2013).

  • M-Type SCM (Synchronous) = Incorporates non-volatile memory based storage into the memory access subsystem (DDR) rather than the SCSI block storage subsystem over PCIe, achieving DRAM-like throughput and latency benefits for persistent storage. Typically takes the form of an NVDIMM (attached to the memory bus, similar to traditional DRAM), which is the fastest and best performing option next to DRAM itself. It uses memory (DIMM) slots and appears to the system either as a caching layer or as pooled memory (extended DRAM space) depending on the NVDIMM type (NVDIMMs come in three types: NVDIMM-N, NVDIMM-F and NVDIMM-P. A good explanation is available here).
  • S-Type SCM (Asynchronous) = Incorporates non-volatile memory based storage attached via PCIe to the storage subsystem. While this is theoretically slower than the above, it is still significantly faster than the NAND-based SSD drives in common use today, including those attached via the NVMe host controller interface. Intel and Samsung have both already launched S-type SCM drives, Intel with their 3D XPoint architecture and Samsung with Z-SSD respectively, but the current drive models available are aimed more at consumer / workstation rather than server workloads. Server-based implementations of similar SCM drives will likely arrive around 2019 (along with supporting server-side software within operating systems such as hypervisors – vSphere 7 anyone?).

The idea of SCM is to address the latency and performance gap that has existed between memory and storage in every computer system since the advent of x86 computing. Typically, access latency for DRAM is around 60 ns, while the next best option today, NVMe SSD drives, have a typical latency of around 20-200 µs. SCM fits in between the two, at a typical latency of 60 ns-20 µs depending on the type of SCM, with significantly higher bandwidth than SSD drives. It is important to note, however, that most ordinary workloads do not need this kind of extremely latency-sensitive, high-bandwidth storage performance. Next-generation data technologies involving artificial intelligence techniques such as machine learning and real-time analytics, which rely on processing extremely large swathes of data very quickly, would absolutely benefit from, and in most instances necessitate, these next-gen storage technologies to be fully effective.
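
To visualise where SCM sits, the quick sketch below normalises the latency figures quoted above (around 60 ns for DRAM, roughly 60 ns to 20 µs for SCM, and roughly 20 to 200 µs for NVMe SSDs) against DRAM. These are the approximate, illustrative numbers from the paragraph above rather than measured values.

```python
# Illustrative latency bands (in nanoseconds), taken from the rough figures above.
tiers = {
    "DRAM": (60, 60),
    "SCM / PMEM": (60, 20_000),
    "NVMe SSD": (20_000, 200_000),
}

dram_ns = tiers["DRAM"][0]
for name, (lo, hi) in tiers.items():
    print(f"{name:10s} ~{lo:>7,} - {hi:>7,} ns  "
          f"({lo // dram_ns}x - {hi // dram_ns}x DRAM latency)")
```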

NetApp’s NVMe & SCM vision

NetApp was one of the first classic storage vendors to incorporate flash into their storage systems in an efficient manner, to accelerate workloads typically stored on spinning disks. This started with NVRAM, included in their flagship FAS storage systems as an acceleration layer. Then came Flash Cache (PAM cards), flash media attached via the PCIe subsystem to act as a caching layer for reads, which was also popular. Since the advent of all-flash storage arrays, NetApp went another step by introducing all-flash storage into their portfolio through the likes of the All Flash FAS platform, engineered and tuned for all-flash media, as well as the EF series.

NetApp's innovation and constant improvement haven't stopped there. During the SFD15 event, we were treated to the next step of this technology evolution when NetApp discussed how they plan to incorporate the above mentioned NVMe and SCM storage technologies into their storage portfolio, in order to provide next-gen storage capabilities for next-gen use cases such as AI, big data and real-time analytics. Given below is a holistic view of where NetApp sees NVMe and SCM technologies fitting into their roadmap, based on the characteristics, benefits and costs of each technology.

The planned use of NVMe clearly falls at two different points of the host-to-storage-array communication path.

  • NVMe SSD drives: NVMe SSD drives in a storage array, attached via the NVMe host controller interface, so that the storage processors (in the controllers) can fully utilise the latency and throughput potential of the SSD drives themselves. This provides additional performance headroom for existing arrays.
  • NVMe-oF: NVMe over Fabrics, which connects the storage to the consuming nodes (servers) via an ultra-low-latency NVMe fabric. NVMe-oF enables the use of RDMA capabilities to reduce the distance between the IO generator and the IO processor, thereby significantly reducing latency. NVMe-oF is therefore widely touted to be the next big thing in the storage industry, and a number of specialist start-ups like Excelero have already come to market with solutions; you can find out more about it in my blog here. An example of an NVMe-oF storage solution available from NetApp is the new NetApp EF570 all-flash array. This product is already shipping and more details can be found here or here. The platform offers some phenomenal performance numbers at ultra-low latency, built around their trusted, mature, feature-rich yet simple EF storage platform, which is also a bonus.

The planned (or currently experimental) use of SCM is in two specific areas of the storage stack, driven primarily by the cost of the media versus the need for acceleration.

  • Storage controller side caching: NetApp mentioned that some of the experiments they are working on, with prototype solutions already built, look at using SCM media in the storage controllers as another tier to accelerate performance, in the same way PAM / Flash Cache cards were used on older FAS systems. This is a relatively straightforward upgrade and would be especially effective in an All Flash FAS solution with SSD drives in the back end, where a traditional NAND-based Flash Cache card would be less effective.
  • Server (IO generator) side caching: This use case looks at using SCM media in the host compute systems that generate the IO to act as a local cache but, most importantly, used in conjunction with the storage controllers rather than in isolation, performing tiering and snapshots from the host cache to a back-end storage system such as an All Flash FAS.
  • NetApp are experimenting on this front primarily using their recent acquisition of Plexistor, whose proprietary software combines DRAM and SCM into a single, byte-addressable address space (accessed via memory semantics, which is much faster than SCSI / NVMe addressable block storage) and presents it to applications as a cache, while presenting a back-end NetApp storage array such as an All Flash FAS as the persistent storage tier. Applications achieve significantly lower latency and very high throughput this way, by caching hot data in the Plexistor file system, which incidentally bypasses the complex Linux IO stack (comparison below; a generic sketch of this byte-addressable access model follows this list). The Plexistor tech is supposed to provide enterprise-grade features as part of the same software stack, though the specifics of what those enterprise-grade features are were lacking (guessing the typical availability and management capabilities natively available within ONTAP?).
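
To make the "byte addressable via memory semantics" point a little more concrete, below is a minimal, generic sketch of how an application touches persistent memory exposed through a DAX-capable file system: it maps the file and then reads and writes bytes in place, with no read()/write() block IO path involved. The /mnt/pmem mount point is an assumption for illustration, and this is standard Linux PMEM plumbing rather than Plexistor's actual software.

```python
# Generic sketch of byte-addressable access to persistent memory via mmap.
# Assumes a DAX-capable filesystem mounted at /mnt/pmem (hypothetical path);
# this is NOT Plexistor's software, just the general access model it builds on.
import mmap
import os

PATH = "/mnt/pmem/cache.bin"   # assumed DAX-mounted location
SIZE = 4096

fd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, SIZE)

# With DAX, this mapping reaches the persistent media directly: loads and stores
# are memory operations, bypassing the kernel block IO stack entirely.
buf = mmap.mmap(fd, SIZE)
buf[0:11] = b"hello, pmem"      # a store, not a write() syscall
print(bytes(buf[0:11]))

buf.flush()                     # flush the mapping towards persistence
buf.close()
os.close(fd)
```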

Based on some of the initial performance benchmarks, the effect of this is significant, as can be seen in the comparison below.

My thoughts

As an IT strategist and an architect at heart, with a specific interest in storage, who can see super-scale data processing (read "extremely large quantities of data") becoming a common use case across most industries soon due to big data, real-time analytics and the accompanying machine learning tech, I can see value in this strategy from NetApp. Most importantly, the fact that they are looking to use these advanced technologies in harmony with the proven, tried and tested data management platforms they already have, in the likes of ONTAP, could be a big bonus. The acquisition of Plexistor was a good move for NetApp, and integrating their tech into a shipping product would be super awesome if and when that happens, though I would dare say the use cases will be somewhat limited initially given the Linux dependency. Others are taking note too: HCI vendor Nutanix's acquisition of PernixData hints at Nutanix having a similar strategy to that of Plexistor and NetApp.

While organic growth of the current product portfolio by incorporating new tech such as NVMe is fairly straightforward and helps NetApp stay relevant, it remains to be seen how well acquisition-driven integration, such as bringing Plexistor's SCM technologies into the NetApp platform, will pan out into a shipping product. NetApp has historically had issues with the efficiency of this integration process, which has been known to be slow, but under the new CEO George Kurian, who brought in a more agile software development methodology and therefore a more frequent feature and update release cycle, things may well be different this time around. The evidence seen during SFD15 pretty much suggests the same to me, which is great.

Slide credit to NetApp!

Thanks

Chan

NetApp United 2018 – No it’s not another football team!

I was glad to see an email from the NetApp united team this afternoon confirming that I’ve been selected as a member of the prestigious NetApp United (#NetAppUnited) team for 2018 which is a great honour indeed. Thanks NetApp!

Contrary to popular belief, NetApp United is NOT a football team but a global community of individuals united by a passion for great technology. Similar to the VMware vExpert and Dell EMC Elect programmes, NetApp United is a community programme run by NetApp (@PeytStefanova is the organiser in chief) to recognise global NetApp technology experts and community influencers, with a view to giving them a platform to share more of their thoughts, content, influence and, ultimately, more of their expertise publicly through various community channels. Similar to the other community programmes from other vendors, NetApp United is all about giving back to the community, which is a good cause that I was happy to support.

Being recognised as a member of the NetApp United programme entitles you to a number of exclusive benefits, such as dedicated NetApp technology update sessions with product engineers, exclusive briefings about future and upcoming NetApp solutions and products, access to a private Slack channel for community members to discuss all things technical and NetApp related, and other exclusive events at NetApp Insight in the US and EMEA. All of these perks are nice to have indeed, as they enable us to share some of that information with others out there, as well as provide our own thoughts, which should be beneficial for current or future NetApp customers.

As I work for a global NetApp partner, I am looking forward to using the access to information I get as part of this programme to better leverage our partnership with NetApp, as well as to educate our joint customers on future NetApp strategy. As I am also an independent contributor (outside of work), I intend to share some of that information (outside of NDA material) with my general audience to help you understand various NetApp solutions and strategies, along with my independent thoughts on them, which I think is important. I have been working with NetApp for a long time, initially as a customer and then as a partner, and I've always been a great fan of their core strategy, which has always been about software, despite being a hardware product manufacturer. They have some extremely awesome innovation already available in their portfolio and even better innovation in the making for the future (have a look at the recently concluded #SFD15 presentation from them about the Data Pipeline vision here), and I am looking forward to sharing some of it, along with my thoughts, with everyone.

The full list of all the NetApp United 2018 members can be found here. Congratulations to all those who got selected and Thank you NetApp & @PeytStefanova for the invitation and the recognition!

Cheers

Chan

Storage Futures With Intel Software From #SFD12

 

As part of the recently concluded Storage Field Day 12 (#SFD12), we travelled to one of the Intel campuses in San Jose to hear from the Intel storage software team about the future of storage from an Intel perspective. This was a great session presented by Jonathan Stern (Intel Solutions Architect) and Tony Luck (Principal Engineer), and this post summarises a few things I learnt during those sessions that I thought were quite interesting for everyone. (Prior to this session we also had a session from SNIA about the future of storage industry standards, but I think that deserves a dedicated post so I won't mention it here – stay tuned for a SNIA event specific post soon!)

The first session from Intel was on the future of storage, presented by Jonathan. It's probably fair to say Jonathan was by far the most engaging presenter out of all the SFD12 presenters. He covered somewhat of a deep dive on Intel's plans for storage, specifically on the software side of things, and the main focus was the Intel Storage Performance Development Kit (SPDK), which Intel seems to think is going to be a key part of future storage efficiency enhancements.

The second session, with Tony, was about Intel Resource Director Technology (which addresses shared resource contention inside an Intel processor's cache) and, in all honesty, was not something most of us storage or infrastructure folks need to know in detail. So my post below focuses on Jonathan's session only.

Future Of Storage

As far as Intel is concerned, there are 3 key areas when it comes to the future of storage that need to be looked at carefully.

  • Hyper-Scale Cloud
  • Hyper-Convergence
  • Non-Volatile memory

To put this into some context, see the revenue projections below from the 2015 Wikibon Server SAN research project, comparing:

  1. Traditional Enterprise storage such as SAN, NAS, DAS (Read “EMC, Dell, NetApp, HPe”)
  2. Enterprise server SAN storage (Read “Software Defined Storage OR Hyper-Converged with commodity hardware “)
  3. Hyperscale server SAN (Read “Public cloud”)

It is a known fact within the storage industry that public cloud storage platforms, underpinned by cheap commodity hardware and intelligent software, provide users with an easy to consume, easily available and, most importantly, non-CAPEX storage platform that most legacy storage vendors find hard to compete with. As such, the net new growth in global storage revenue from around 2012 has been predominantly within the public cloud (hyperscaler) space, while the rest of the storage market (non-public-cloud enterprise storage) has somewhat stagnated.

This somewhat stagnated market was traditionally dominated by a few storage stalwarts such as EMC, NetApp, Dell and HPe. However, the rise of server based SAN solutions, where commodity servers with local drives are combined with intelligent software to create a virtual SAN / storage pool (SDS/HCI technologies), has made matters worse for these legacy storage vendors, and such solutions are projected to eat further into the traditional enterprise storage landscape within the next 4 years. This is already evident in the recent popularity and growth of SDS/HCI solutions such as VMware VSAN, Nutanix, Scality and HedVig, while at the same time traditional storage vendors announce shrinking storage revenue. So much so that even some of the legacy enterprise storage vendors like EMC and HPe have come up with their own SDS / HCI offerings (EMC ViPR, HPe StoreVirtual, the announcement of a SolidFire based HCI solution…etc.) or partnered up with SDS/HCI vendors (EMC VxRail…etc.) to hedge their bets against a losing backdrop of traditional enterprise storage.

 

If you study the forecast further out, around 2020-2022, it is estimated that traditional enterprise storage revenue and market share will be squeezed even further by the more rapid growth of server based SAN solutions such as SDS and HCI. (Good luck to the legacy storage folks.)

An estimate from EMC suggests that by 2020 all primary storage for production applications will sit on flash based drives, which precisely coincides with the timelines in the above forecast, where the growth of enterprise server SAN storage is set to accelerate between 2019-2022. According to Intel, one of the main drivers behind this forecasted growth of enterprise server SAN solutions is the development of Non-Volatile Memory Express (NVMe) based technologies, which make it possible to achieve very low latency through direct attached (read "locally attached") NVMe drives, combined with clever, efficient software designed to harness that low latency. In other words, the drop in drive access latency will make enterprise server SAN solutions more appealing to customers, who will favour Software Defined, Hyper-Converged storage solutions over external, array based storage solutions in the immediate future, and the legacy storage market will continue to shrink further and further.

I can relate to this prediction somewhat as I work for a channel partner of most of these legacy storage vendors and I too have seen first hand the drop of legacy storage revenue from our own customers which reasonably backs this theory.

 

Challenges?

With the increasing push for hyper-convergence with data locality, latency becomes an important consideration. As such, Intel's (and the rest of the storage industry's) main focus going into the future is primarily on reducing the latency penalty incurred during a storage IO cycle as much as possible. The imminent release of next-gen storage media from Intel, as a better alternative to NAND (which comes with inherent challenges such as tail latency issues that are difficult to get around), was mentioned without any specific details. I'm sure that was a reference to the Intel 3D XPoint drives (only officially announced by Intel this week: http://www.intel.com/content/www/us/en/solid-state-drives/optane-solid-state-drives-dc-p4800x-series.html), and based on the published stats, the projected drive latencies are in the region of < 10μs (sequential IO) and < 200μs (random IO), which is super impressive compared to today's ordinary NAND based NVMe SSD drives. This however presents a concern: the current storage software stack, which processes IO through the CPU via costly context switching, also needs to be optimised in order to truly benefit from this massive drop in drive latency. In other words, the dependency on the CPU for IO processing needs to be removed or minimised through clever software optimisation (the CPU has long been the main IO bottleneck, due to how MSI-X interrupts are handled by the CPU during IO operations, for example). Without this, the software-induced latency would be much higher than the drive media latency during an IO processing cycle, which would still result in an overall higher latency. (My friend and fellow #SFD12 delegate Glenn Dekhayser described this in his blog as "the media we're working with now has become so responsive and performant that the storage doesn't want to wait for the CPU anymore!", which is very true.)


Storage Performance Development Kit (SPDK)

Some companies, such as Excelero, are also addressing this CPU dependency of the IO processing software stack by using NVMe drives and clever software to offload processing from the CPU to the NVMe drives through technologies such as RDDA (refer to the post I did on how Excelero gets around this CPU dependency by reprogramming the MSI-X interrupts to not go to the CPU). SPDK is Intel's answer to this problem. Whereas Excelero's RDDA architecture primarily avoids CPU dependency by bypassing the CPU for IO, Intel SPDK minimises the impact on CPU and memory bus cycles during IO processing by running storage applications in user mode rather than kernel mode, thereby removing the need for costly context switching and the related interrupt handling overhead. According to http://www.spdk.io/, "The bedrock of the SPDK is a user space, polled mode, asynchronous, lockless NVMe driver that provides highly parallel access to an SSD from a user space application."
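
To illustrate what "user space, polled mode, asynchronous" means in practice, here is a toy sketch. It is deliberately not real SPDK code, and the 20 µs device latency is an assumed figure: the application submits IO without blocking and then repeatedly polls the completion queue itself, so no interrupt, syscall or context switch is needed per IO.

```python
# Toy illustration of polled-mode, asynchronous IO (not actual SPDK code).
import collections
import time

class FakeNvmeQueuePair:
    """Toy stand-in for an NVMe submission/completion queue pair (illustrative only)."""
    DEVICE_LATENCY_S = 20e-6   # pretend the device completes a read in ~20 us (assumed)

    def __init__(self):
        self._inflight = collections.deque()

    def submit_read(self, lba, callback):
        # Real hardware would DMA the data and post a completion queue entry;
        # here we just record when this request would "complete".
        self._inflight.append((time.perf_counter() + self.DEVICE_LATENCY_S, lba, callback))

    def poll_completions(self):
        # Polled mode: the application checks the completion queue itself, so there
        # is no interrupt, no syscall and no context switch per IO.
        done = 0
        now = time.perf_counter()
        while self._inflight and self._inflight[0][0] <= now:
            _, lba, callback = self._inflight.popleft()
            callback(lba)
            done += 1
        return done

qp = FakeNvmeQueuePair()
results = []
for lba in range(8):
    qp.submit_read(lba, results.append)   # submit asynchronously, never block

while len(results) < 8:                    # busy-poll until all completions arrive
    qp.poll_completions()

print("completed LBAs:", results)
```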

With SPDK, Intel claims that you can reach around 3.6 million IOPS on a single Xeon CPU core before running out of PCIe lane bandwidth, which is pretty impressive. Below is an IO performance benchmark based on a simple test of CentOS Linux kernel IO performance (running across 2 x Xeon E5-2965 2.10 GHz CPUs, each with 18 cores, plus 1-8 x Intel P3700 NVMe SSD drives) versus SPDK with a single dedicated 2.10 GHz core allocated for IO out of the 2 x Xeon E5-2965. You can clearly see the significantly better IO performance with SPDK which, despite having just a single core, scales IO throughput linearly with the number of NVMe SSD drives thanks to the lack of context switching and the related overhead.

(In addition to this testing, Jonathan also mentioned that they've done another test on off-the-shelf Supermicro hardware, and with SPDK and 2 dedicated cores for IO they were able to get 5.6 million IOPS before running out of PCIe bandwidth, which was impressive.)
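
A quick back-of-the-envelope calculation shows why removing context switches matters at these rates. It uses the figures quoted above (3.6 million IOPS on one 2.10 GHz core) and an assumed, commonly cited cost of roughly 1.5 µs for a context switch plus interrupt handling:

```python
# Back-of-the-envelope: per-IO CPU budget at the quoted SPDK rate vs context-switch cost.
CORE_HZ = 2.10e9          # single Xeon core clock from the test above
IOPS = 3.6e6              # IOPS per core claimed with SPDK
CTX_SWITCH_S = 1.5e-6     # assumed ~1.5 us for a context switch + interrupt handling

budget_s = 1 / IOPS                  # time available per IO on one core
budget_cycles = budget_s * CORE_HZ   # CPU cycles available per IO

print(f"Per-IO budget: {budget_s * 1e9:.0f} ns (~{budget_cycles:.0f} cycles)")
print(f"One context switch would consume ~{CTX_SWITCH_S / budget_s:.1f}x that budget")
```

At roughly 278 ns (a few hundred CPU cycles) per IO, even a single context switch per IO would blow the budget several times over, which is exactly the overhead SPDK's user-space, polled-mode design removes.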

 

SPDK Applications & My Thoughts

SPDK is an end-to-end reference storage architecture and a set of drivers (C libraries and executables) to be used by OEMs and ISVs when integrating disk hardware. According to Intel's SPDK introduction page, the goal of SPDK is to highlight the outstanding efficiency and performance enabled by using Intel's networking, processing and storage technologies together. SPDK is freely available as an open source project through GitHub. It also provides NVMe-oF (NVMe over Fabrics) and iSCSI servers built using the SPDK architecture, on top of the user space drivers, that are even capable of servicing disks over the network. Now this can potentially revolutionise how the storage industry builds its next generation storage platforms. Consider, for example, any SDS or even a legacy SAN manufacturer using this architecture to optimise CPU usage on their next generation all-flash storage array. (Take the NetApp All Flash FAS platform for example: it is known to have a ton of software based data management services within ONTAP that currently compete for CPU cycles with IO, and often have to scale down data management tasks during heavy IO processing. With the Intel SPDK architecture, ONTAP could free up more CPU cycles for data management services, and even double up on various additional services, without any impact on critical disk IO. It's all hypothetical of course, as I'm just thinking out loud here, and it would require NetApp to run ONTAP on Intel CPUs and Intel NVMe drives…etc., but it's doable and makes sense, right? I mean, imagine the day when you can run "reallocate -p" during peak IO times without grinding the whole SAN to a halt :-).) I'm probably exaggerating its potential here, but the point is that SPDK driven IO efficiencies could apply to all storage array manufacturers (especially all-flash arrays), who could potentially start creating some super efficient, ultra low latency, NVMe drive based storage arrays that also include a ton of data management services that would previously have been too taxing on the CPU (think inline dedupe, inline compression, inline encryption, everything inline…etc.), on 24×7 by default rather than just during off-peak times, thanks to zero impact on disk IO.

Another great place to apply SPDK is within virtualisation, for VM IO efficiency. Using SPDK with QEMU as follows has resulted in some good IO performance for VMs.

 

Imagine, for example, a VMware VSAN driver built using the Intel SPDK architecture, running inside user space with a dedicated CPU core performing all IO: what would the possible IO performance be? VMware currently performs IO virtualisation in the kernel, but imagine if SPDK was used and IO virtualisation for VSAN was moved to user space; would it be worth the performance gain and reduced latency? (I did ask the question, and Intel confirmed there is no joint engineering currently taking place on this front between the two companies.) What about other VSA based HCI solutions? Take someone like Nutanix Acropolis, where Nutanix could happily re-write the IO virtualisation to happen within user space using SPDK for superior IO performance.

An Intel and Alibaba Cloud case study in which the use of SPDK was benchmarked showed the IOPS and latency improvements below.

NVMe over Fabrics is also supported with SPDK, and some use cases were discussed, specifically relating to virtualisation where VMs tend to move between hosts; a unified NVMe-oF API that talks to both local and remote NVMe drives is now available (with some parts of the SPDK stack becoming available in Q2 FY17).

Using SPDK seems quite beneficial for existing NAND media based NVMe storage, but most importantly for newer generation non-NAND media, to bring the total overall latency down. However, that does mean changing the architecture significantly to process IO in user mode as opposed to kernel mode, which I presume is how almost all storage systems, Software Defined or otherwise, work today, and I am unsure whether changing them to user mode with SPDK is going to be a straightforward process. It would be good to see some joint engineering, or other storage vendors evaluating the use of SPDK, to see whether the claimed latency and IO improvements are realistic in complex storage solution systems.

I like the fact that Intel has made SPDK open source to encourage others to freely utilise (and contribute back to) the framework, but I guess what I'm not sure about is whether it's tied to Intel NVMe drives and Intel processors.

If anyone wants to watch the recorded videos of our sessions from #SFD12, the links are as follows:

  1. Jonathan’s session on SPDK
  2. Tony’s session on RDT

Cheers

Chan

#SFD12 #TechFieldDay @IntelStorage

Storage Field Day (#SFD12) – Vendor line up

Following on from my previous post about a quick intro to Storage Field Day (#SFD12) that I was invited to attend in San Jose this week as an independent thought leader, I wanted to get a quick post out on the list of vendors we are supposed to be seeing. If you are new to what Tech Field Day / Storage Field Day events are, you’ll also find an intro in my above post.

The event starts tomorrow and I am currently waiting at LHR for my flight to SJC, and it's fair to say I am really looking forward to attending. Part of that excitement is due to being given the chance to meet a bunch of other key independent thought leaders, community contributors and technology evangelists from around the world, as well as the chance to meet Stephen Foskett (@SFoskett) and the rest of the #TFD crew from Gestalt IT (GestaltIT.com) at the event. But most of that excitement for me is simply due to the awesome (did I say aaawwwesommmmmmeee?) list of vendors that we are supposed to be meeting with to discuss their technologies.

The full list and event agenda is as follows.

Wednesday the 8th

  • Watch the live streaming of the event @ https://livestream.com/accounts/1542415/events/6861449/player?width=460&height=259&enableInfoAndActivity=false&defaultDrawer=&autoPlay=false&mute=false
  • 09:00 – MoSMB presentation
    • MoSMB is a fully compliant, lightweight adaptation of SMB3 made available as a proprietary offering by Ryussi Technologies. In effect it's an SMB3 server for Linux and Unix systems. They are not a technology I had come across before, so I'm really looking forward to getting to know more about them, their offerings and their partnership with Microsoft…etc.
  • 10:00 – StarWind Presents
    • Again, a technology new to me personally, which appears to be a hyper-converged appliance that seems to unify commodity server disks and flash across multiple hypervisors. Hyper-converged platforms are very much of interest to me and I know the industry leading offerings on this front, such as VMware VSAN and Nutanix, fairly well. So it's good to get to know these guys too and to understand their unique selling points / differentiators versus the big boys.
  • 13:00 – Elastifile Presents
    • The Elastic Cloud File System from Elastifile is supposed to provide an application-level distributed file / object system spanning private and public cloud, to deliver a hybrid cloud data infrastructure. This one is again new to me, so I'm keen to understand what makes them different to other similar distributed file / object storage solutions such as HedVig and Scality from my perspective. Expect my analysis blog post on this one after I've met up with them for my initial take!
  • 16:00 – Excelero Presents (hosted at Excelero office in the Silicon Valley)
    • These guys are a new vendor that is literally due to launch on the same day we speak to them. Effectively, they don't quite exist yet. So it's quite exciting to find out who they are and what they've got to offer in this increasingly growing, rapidly changing world of enterprise IT.
  • 19:00 – Dinner and Reception (Storage Cocktails?) with presenters and friends at Loft Bar and Bistro in San Jose
    • A good networking event with the day's presenters, for peer-to-peer discussion and further questions on what we've heard from them during the day.

Thursday the 9th of March

  • 08:00 (4pm UK time) – Nimble Storage Presents
    • Nimble are a SAN vendor that I am fairly familiar with and have known for a long time, and I also have a few friends who work at Nimble UK. To be fair, I was never a big fan of Nimble personally as a hybrid SAN vendor, as I was more of a NetApp, EMC, HPe 3Par kind of person for hybrid SAN offerings, which I've always thought offer the same if not better tech at roughly a similar price point, with the added benefit of being large, established vendors. Perhaps I can use this session to understand where Nimble is heading now as an organisation, what differentiators / USPs they may have compared to the big boys, and how they plan to stay relevant in an industry which is generally in decline as a whole.
  • 10:45 – NetApp Presents (At NetApp head office in Silicon Valley)
    • Now, I know a lot about NetApp :-). NetApp was my main storage skill in the past (and still is, to a good level), I have always been very close to most NetApp technologies from both a presales and a delivery perspective, and I was also awarded NetApp Partner System Engineer of the Year (2013) for UK & Ireland by NetApp. However, since the introduction of cDOT proper to their portfolio, I've felt they've lost a little market traction. I'm very keen to listen to NetApp's current messaging and understand where their heads are at, and how their new technology stack, including SolidFire, is going to be positioned against larger vendors such as Dell EMC and HPe 3Par, as well as against all the disruption from Software Defined storage vendors.
  • 12:45 (20:45 UK time) – Lunch at NetApp with Dave Hitz
    • Dave Hitz (@DaveHitz), a NetApp co-founder, is a legend… Nuff said!
  • 14:00 – Datera Presents
    • Datera is a high performance elastic block storage vendor and is again quite new to me. So looking forward to understanding more about what they have to offer.
  • 19:30 – San Jose Sharks hockey game at SAP Center
    • Yes, it's an evening watching a bit of ice hockey, which I've never done before. To be clear, ice hockey is not one of my favourite sports, but I'm happy to take part in the event :0).

Friday the 10th of March

  • 09:00 (17:00 UK time) – SNIA Presents (@Intel Head office)
    • The Storage Networking Industry Association is a non profit organisation made up of various technology vendor companies.
  • 10:30 (18:30 UK time) – Intel Presents (@Intel Head office)
    • I don’t think I need to explain / introduce Intel to anyone. If I must, they kinda make some processors :-). Looking forward to visiting Intel office in the valley.

All in all, it's an exciting line-up of old and new vendors that I'm looking forward to meeting.

Exciting stuff, can't wait…! Now off to board the flight. See you on the other side!

Chan

 

FlexPod: The Joint Wonder From NetApp & Cisco (often with VMware vSphere on Top)


While attending NetApp Insight 2015 in Berlin this week, I was reminded of the monumental growth in the number of customers who have been deploying FlexPod as their preferred converged solutions platform, which now celebrates its 5th year in operation. So I thought I'd do a very short post to give you my personal take on it and highlight some key materials.

FlexPod has been gaining lots of market traction as the converged solution platform of choice for many customers over the last 4 years. This has been due to the solid hardware technologies that underpin the solution (Cisco UCS compute + Cisco Nexus unified networking + the NetApp FAS range of clustered Data ONTAP SAN). Often, customers deploy FlexPod together with VMware vSphere or MS Hyper-V on top (other hypervisors are also supported), which together provide a complete, ready to go live, private and hybrid cloud platform that has been pre-validated to run most if not all typical enterprise data centre workloads. I have been a strong advocate of FlexPod (simply due to its technical superiority as a converged platform) for many of my customers since its inception.

Given below are some of the interesting FlexPod validated designs from Cisco & NetApp for Application performance, Cloud and automation, all in one place.

There are over 100 FlexPod validated designs available in addition to the above, and they can all be found below.

There is a certified, pre-validated, detailed FlexPod design and deployment guide for almost every data centre workload, and based on my first-hand experience, FlexPod with VMware vSphere has always been a very popular choice amongst customers, as things just work together beautifully. Given the joint vendor support available, sourcing support for all the tech in the solution from a single point is easy too. I also think customers prefer FlexPod over other similar converged solutions, say Vblock for example, due to the non-prescriptive nature of FlexPod, whereby you can tailor-make a FlexPod solution that meets your needs (a FlexPod partner can do this for a customer), which keeps the costs down too.

There are many FlexPod certified partners who can size, design, sell and implement a FlexPod solution for a customer, and my employer, Insight, is one of them (in fact we were amongst the first few partners to attain FlexPod partnership in the UK). So if you have any questions around the potential use of a FlexPod system, feel free to get in touch directly with me (contact details in the About Me section of this site) or through the FlexPod section of the Insight Direct UK website.

Cheers

Chan