VMware & DataGravity Solution – Data Management For the Digital Enterprise


Yesterday, I had the privilege of being invited to an exclusive VMware #vExpert-only webinar organised by the vExpert community manager, Corey Romero, and DataGravity, one of VMware's partner ISVs, to get a closer look at the DataGravity solution and its integration with VMware. My initial impression was that it's a good solution and a good match with VMware technology too, and I liked what I saw, so I decided to write a quick post to share what I've learned.

DataGravity Introduction

The DataGravity (DG from now on) solution is all about data management, and in particular data management in a virtualised datacenter. In a nutshell, DG provides a simple, virtualisation-friendly data management solution that, amongst many other things, focuses on the following key requirements, which are of primary importance to me.

  • Data awareness – Understand the different types of data available within VMs, structured or unstructured, along with various metadata about all that data. It automatically keeps track of data locations, status changes and various other metadata, including any sensitive content (e.g. credit card information), in the form of an easy-to-read, dashboard-style interface
  • Data protection & security – DG tracks sensitive data and provides a complete audit trail, including access history, to help remediate any potential loss or compromise of data

The DG solution is currently specific to VMware vSphere virtual datacenters and serves 4 key use cases, as shown below.

Talking about the data visualisation itself, DG claims to provide a 360-degree view of all the data that resides within your virtualised datacenter (on VMware vSphere), and having seen the UI in the live demo, I liked the visualisation, which very much resembles the interface of VMware's own vRealize Operations.

The unified, tile-based view of all the data in your datacenter, with a vROps-like context-aware UI, makes navigating through the information about your data pretty self-explanatory.

Some of the information that DG automatically tracks for all the data residing in the VMware datacenter is shown below.

Some of the cool capabilities DG has when it comes to data protection itself include behaviour-based data protection, where it proactively monitors user and file activities and mitigates potential attacks by sensing anomalous behaviour and taking preventive measures such as orchestrating protection points, alerting administrators, or even blocking user access automatically.

During a recovery scenario, DG claims to assemble the forensic information needed to perform a quick recovery, such as file catalogues and incremental version information, user activity information and other key metadata such as the last known good state of various files, which enables recovery in a few clicks.

Some Details

During the presentation, Dave Stevens (Technical Evangelist) took all the vExperts through the DG solution in some detail, including its integration with VMware vSphere, which I intend to share below for the benefit of all others (sales people: feel free to skip this section and read the next).

The whole DG solution is deployed as a simple OVA into vCenter and typically requires connecting the appliance to Microsoft Active Directory (for user access tracking) as a one-off initial task. It will then perform an automated initial discovery of data. The important thing to note here is that it DOES NOT use an agent in each VM; it simply uses VMware VADP, now known as the vSphere Storage APIs, to silently interrogate the data that lives inside the VMs in the datacenter with minimal overhead. Some specifics around that overhead are as follows, with a quick back-of-the-envelope calculation after the list.

  • File indexing is done at a DiscoveryPoint (snapshot), either on a schedule or user driven (no impact on real-time access from a performance point of view).
  • Real-time access tracking overhead is minimal to non-existent:
    • Real-time user activity tracking uses about 200 KB of memory
    • Network bandwidth is about 50 kbps per VM
    • Less than 1% of CPU
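
To put those per-VM figures in context, here's a quick back-of-the-envelope sketch (the per-VM numbers above are the vendor's; the fleet size is purely my own assumption for illustration) estimating the aggregate tracking overhead across a datacenter:

```c
#include <stdio.h>

/* Vendor-quoted per-VM overhead for real-time access tracking */
#define MEM_PER_VM_KB   200  /* ~200 KB of memory */
#define NET_PER_VM_KBPS  50  /* ~50 kbps of network bandwidth */

int main(void)
{
    int vms = 500; /* hypothetical fleet size */

    printf("Memory:  %.1f MB in total\n", vms * MEM_PER_VM_KB / 1024.0);
    printf("Network: %.1f Mbps in total\n", vms * NET_PER_VM_KBPS / 1000.0);
    return 0;
}
```

Even at 500 VMs, that works out to roughly 100 MB of memory and 25 Mbps of network across the whole estate, which supports the "minimal overhead" claim.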

From an integration point of view, while the DG solution integrates with vSphere VMs as above, irrespective of the underlying storage platform, it can also integrate with specific storage vendors (licensing prerequisites apply).

Once the initial data discovery is complete, further discoveries are done on an incremental basis. The management UI is a simple web interface which looks pretty neat.

Similar to the VMware vROps UI, for example, the whole UI is context aware, so depending on which object you select, you are presented with stats in the context of the selected object(s).

The usage tracking is quite granular and keeps track of all types of user access to data in the inventory, which is handy.


Searching for files is simple, and you can also search using tags, which are simple binary expressions. Tags can be grouped together into profiles to search against, which looks pretty simple and efficient.
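
As an illustration of the concept only (this is my own toy sketch, not DataGravity's implementation), you can think of a tag as a binary predicate over a file's content and a profile as a group of tags evaluated together:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A tag is a named binary predicate over a file's content/metadata. */
typedef struct {
    const char *name;
    bool (*matches)(const char *content);
} Tag;

static bool has_card_number(const char *c) { return strstr(c, "4111") != NULL; }
static bool has_ssn(const char *c)         { return strstr(c, "SSN:")  != NULL; }

/* A profile groups tags; here a file matches if ANY of its tags match. */
static bool profile_matches(const Tag *tags, int n, const char *content)
{
    for (int i = 0; i < n; i++)
        if (tags[i].matches(content))
            return true;
    return false;
}

int main(void)
{
    Tag sensitive[] = { {"credit-card", has_card_number}, {"ssn", has_ssn} };
    const char *file = "order ref 4111-1111-1111-1111";

    printf("sensitive? %s\n",
           profile_matches(sensitive, 2, file) ? "yes" : "no");
    return 0;
}
```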

I know I've mentioned this already, but the simple, intuitive user interface that lets you consume information about all your data in a single-pane-of-glass manner is very attractive.

Current Limitations

There are some current limitations to be aware of, however; some of the important ones include:

  • Currently it doesn't look inside structured data files (e.g. database files)
    • Covers about 600 various file types
  • File content analytics is available for Windows VMs only at present
    • Linux may follow soon?
  • VMC (VMware Cloud on AWS) & VCF (VMware Cloud Foundation) support is not there (yet)
    • Is this to be announced during a potential big event?
  • No current availability on other public cloud platforms such as AWS or Azure (yet)


My Thoughts

I like the solution and its capabilities for various reasons. Primarily, it's because a focus on the data that resides in your datacenter is more important now than it's ever been. Most organisations simply do not have a clue about the type of data they hold in the datacenter, typically scattered around various servers, systems, applications etc., often duplicated and, most importantly, left untracked as to its current relevance or even its actual usage (who accesses what). Often, most data generated by an organisation serves its initial purpose within a certain initial period, and after that it is simply kept on the systems forever, intentionally or unintentionally. This is a costly exercise, especially on the storage front, as you are typically filling your SAN storage with stale data. With a simple yet intelligent data management solution like DG, you now have the ability to automatically track data and its ageing across the datacenter, and to use that awareness of your data to potentially move stale data to a different, cheaper tier such as a public cloud storage platform.

Furthermore, a poor understanding of data governance, specifically not monitoring data access across the datacenter, is another issue: many organisations do not collectively know what type of data is available where within the datacenter and how secure that data is, including its access / usage history over its existence. Data security is probably the most important topic in the industry today. Organisations are increasingly becoming digital thanks to the digital revolution / Digital Enterprise phenomenon (in other words, every organisation is now becoming digital), and a guaranteed by-product of this is more and more DATA being generated, which often includes all, if not most, of an organisation's intellectual property. If there's no credible data management solution focused on securing such data, you are risking the livelihood of your organisation and its potential survival in a fiercely competitive global economy.

It is important to note that some regulatory compliance regimes have always enforced the use of data management & governance solutions such as DG to track such information about data and its security for certain types of data (i.e. PCI for credit card information etc.). But no such requirement existed for all types of data that live in your datacenter. This is about to change, at least here in Europe, thanks to the European GDPR (General Data Protection Regulation), which now legally obliges every organisation to be able to provide an auditable history of all types of data that they hold, and most organisations I know do not have a credible solution covering the whole datacenter to meet such demands regarding their data today.

A simple, easily integrated solution like DataGravity, with little overhead, that for the most part harnesses the capabilities of the underlying infrastructure to track and manage the data that lives on it, could be extremely attractive to many customers. Most customers out there today use VMware vSphere as their preferred virtualisation platform, and the obvious integration with vSphere will likely work in DG's favour. I have already signed up for an NFR licence so I can download and deploy this software in my own lab to understand in detail how things work, and I will aim to publish a detailed deep-dive post on that soon. In the meantime, I'd encourage anyone running a VMware vSphere based datacenter who is concerned about data management & security to check the DG solution out!!

I'm keen to get your thoughts if you are already using this in your organisation!


Cheers

Chan

Slide credit to VMware & DataGravity!

Storage Futures With Intel Software From #SFD12


As part of the recently concluded Storage Field Day 12 (#SFD12), we travelled to one of the Intel campuses in San Jose to hear from the Intel storage software team about the future of storage from an Intel perspective. This was a great session, presented by Jonathan Stern (Intel Solutions Architect) and Tony Luck (Principal Engineer), and this post summarises a few things I learnt during those sessions that I thought would be quite interesting for everyone. (Prior to this session we also had a session from SNIA about the future of storage industry standards, but I think that deserves a dedicated post, so I won't mention it here – stay tuned for a SNIA event specific post soon!)

The first session from Intel was on the future of storage, by Jonathan. It's probably fair to say Jonathan was by far the most engaging presenter out of all the SFD12 presenters, and he covered somewhat of a deep dive on Intel's plans for storage, specifically on the software side of things. The main focus was on the Intel Storage Performance Development Kit (SPDK), which Intel seems to think is going to be a key part of future storage efficiency enhancements.

The second session, with Tony, was about Intel Resource Director Technology (which addresses shared resource contention that happens inside an Intel processor, in the processor cache). In all honesty, that is not something most of us storage or infrastructure guys need to know in detail, so my post below focuses on Jonathan's session only.

Future Of Storage

As far as Intel is concerned, there are 3 key areas when it comes to the future of storage that need to be looked at carefully.

  • Hyper-Scale Cloud
  • Hyper-Convergence
  • Non-Volatile memory

To put this into some context, see the below revenue projections from the Wikibon Server SAN research project 2015, comparing the revenue projections for:

  1. Traditional Enterprise storage such as SAN, NAS, DAS (Read “EMC, Dell, NetApp, HPe”)
  2. Enterprise server SAN storage (Read “Software Defined Storage OR Hyper-Converged with commodity hardware “)
  3. Hyperscale server SAN (Read “Public cloud”)

It is a known fact within the storage industry that public cloud storage platforms, underpinned by cheap commodity hardware and intelligent software, provide users with an easy-to-consume, easily available and, most importantly, non-CAPEX storage platform that most legacy storage vendors find hard to compete with. As such, the net new growth in global storage revenue from around 2012 has been predominantly within the public cloud (hyperscaler) space, while the rest of the storage market (non-public-cloud enterprise storage) as a whole has somewhat stagnated.

This somewhat stagnant market was traditionally dominated by a few storage stalwarts such as EMC, NetApp, Dell, HPe etc. However, the rise of server-based SAN solutions, where commodity servers with local drives are combined with intelligent software to make a virtual SAN / storage pool (SDS/HCI technologies), has made matters worse for these legacy storage vendors, and such storage solutions are projected to eat further into the traditional enterprise storage landscape within the next 4 years. This is already evident in the recent popularity & growth of SDS/HCI solutions such as VMware vSAN, Nutanix, Scality and HedVig, while at the same time traditional storage vendors announce shrinking storage revenue. So much so that even some of the legacy enterprise storage vendors like EMC & HPe have come up with their own SDS / HCI offerings (EMC ViPR, HPe StoreVirtual, the announcement around a SolidFire based HCI solution etc.) or partnered up with SDS/HCI vendors (EMC VxRail etc.) to hedge their bets against a losing backdrop of traditional enterprise storage.


If you study the forecast further into the future, around 2020-2022, it is estimated that traditional enterprise storage revenue & market share will be squeezed even further by even more rapid growth of server-based SAN solutions such as SDS and HCI. (Good luck to the legacy storage folks!)

An estimate from EMC suggests that by 2020, all primary storage for production applications will sit on flash-based drives, which precisely coincides with the timelines in the above forecast, where the growth of enterprise server SAN storage is set to accelerate between 2019 and 2022. According to Intel, one of the main reasons behind this forecast revenue growth for enterprise server SAN solutions is the development of Non-Volatile Memory (NVMe) based technologies, which make it possible to achieve very low latency through direct attached (read "locally attached") NVMe drives, along with clever & efficient software designed to harness this low latency. In other words, the drop in drive access latency will make enterprise server SAN solutions more appealing to customers, who will favour Software Defined, hyper-converged storage solutions over external, array-based storage solutions in the immediate future, and the legacy storage market will continue to shrink further and further.

I can relate to this prediction somewhat, as I work for a channel partner of most of these legacy storage vendors, and I too have seen first-hand the drop in legacy storage revenue from our own customers, which reasonably backs this theory.


Challenges?

With the increasing push for hyper-convergence with data locality, latency becomes an important consideration. As such, Intel's (& the rest of the storage industry's) main focus going into the future is primarily on reducing the latency penalty incurred during a storage IO cycle as much as possible. The imminent release of Intel's next-gen storage media, as a better alternative to NAND (which comes with inherent challenges such as tail latency issues that are difficult to get around), was mentioned without any specific details. I'm sure that was a reference to the Intel 3D XPoint drives (only officially announced by Intel this week: http://www.intel.com/content/www/us/en/solid-state-drives/optane-solid-state-drives-dc-p4800x-series.html), and based on the published stats, the projected drive latencies are in the region of < 10μs (sequential IO) and < 200μs (random IO), which is super impressive compared to today's ordinary NAND-based NVMe SSD drives.

This however presents a concern: the current storage software stack, which processes IO through the CPU via costly context switching, also needs to be optimised in order to truly benefit from this massive drop in drive latency. In other words, the level of dependency on the CPU for IO processing needs to be removed or minimised through clever software optimisation (the CPU has long been the main IO bottleneck, due, for example, to how MSI-X interrupts are handled by the CPU during IO operations). Without this, the software-induced latency would be much higher than the drive media latency during an IO processing cycle, which would still contribute to a higher overall latency. (My friend & fellow #SFD12 delegate Glenn Dekhayser described this in his blog as "the media we're working with now has become so responsive and performant that the storage doesn't want to wait for the CPU anymore!", which is very true.)
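
To make that concrete with a rough, illustrative latency budget (the media figure is from Intel's published stats above; the software overhead figure is purely my own assumption for the sake of the example):

```c
#include <stdio.h>

int main(void)
{
    double media_us = 10.0; /* next-gen media latency, per Intel's stats */
    double sw_us    = 20.0; /* assumed interrupt-driven kernel IO-path cost */

    double total = media_us + sw_us;
    printf("Total IO latency: %.0f us, of which software is %.0f%%\n",
           total, 100.0 * sw_us / total);
    return 0;
}
```

On those assumed numbers, two-thirds of every IO is spent in software, which is exactly the imbalance the next section is about.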


Storage Performance Development Kit (SPDK)

Some companies, such as Excelero, are also addressing this CPU dependency of the IO processing software stack by using NVMe drives and clever software to offload processing from the CPU to the NVMe drives through technologies such as RDDA (refer to the post I did on how Excelero is getting around this CPU dependency by reprogramming the MSI-X interrupts to not go to the CPU). SPDK is Intel's answer to this problem, and whereas Excelero's RDDA architecture primarily avoids CPU dependency by bypassing the CPU for IO, Intel SPDK minimises the impact on CPU & memory bus cycles during IO processing by running storage applications in user mode rather than kernel mode, thereby removing the need for costly context switching and the related interrupt handling overhead. According to http://www.spdk.io/, "The bedrock of the SPDK is a user space, polled mode, asynchronous, lockless NVMe driver that provides highly parallel access to an SSD from a user space application."
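
To give a flavour of what that looks like in practice, below is a minimal sketch of SPDK's user-space, polled-mode NVMe driver in action, loosely based on the public hello_world example from spdk.io (treat this as illustrative rather than production code; exact function signatures vary between SPDK releases). Note there are no interrupts and no system calls in the IO path: the application submits the read and then polls for the completion itself.

```c
#include <stdio.h>
#include <stdbool.h>
#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;
static struct spdk_nvme_ns *g_ns;
static bool g_done;

static bool probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                     struct spdk_nvme_ctrlr_opts *opts)
{
    return true; /* attach to the first NVMe controller found */
}

static void attach_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
                      struct spdk_nvme_ctrlr *ctrlr,
                      const struct spdk_nvme_ctrlr_opts *opts)
{
    g_ctrlr = ctrlr;
    g_ns = spdk_nvme_ctrlr_get_ns(ctrlr, 1); /* first namespace */
}

static void read_done(void *arg, const struct spdk_nvme_cpl *cpl)
{
    g_done = true;
}

int main(void)
{
    struct spdk_env_opts opts;
    spdk_env_opts_init(&opts);
    if (spdk_env_init(&opts) < 0)
        return 1;

    /* The drive is unbound from the kernel and driven from user space */
    if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0 || !g_ns)
        return 1;

    struct spdk_nvme_qpair *qpair =
        spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);

    void *buf = spdk_dma_zmalloc(4096, 4096, NULL); /* DMA-able 4K buffer */

    /* Submit a 1-block read of LBA 0... */
    spdk_nvme_ns_cmd_read(g_ns, qpair, buf, 0, 1, read_done, NULL, 0);

    /* ...and poll for completion: no interrupt, no context switch */
    while (!g_done)
        spdk_nvme_qpair_process_completions(qpair, 0);

    printf("Read completed via the polled-mode user-space driver\n");
    spdk_dma_free(buf);
    spdk_nvme_ctrlr_free_io_qpair(qpair);
    return 0;
}
```

The polling loop is the point: a core dedicated to this loop never takes an interrupt or switches context, which is where the per-core IOPS numbers quoted below come from.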

With SPDK, Intel claims you can reach around 3.6 million IOPS per single Xeon CPU core before running out of PCI lane bandwidth, which is pretty impressive. Below is an IO performance benchmark comparing standard CentOS Linux kernel IO performance (running across 2 x Xeon E5-2965 2.10 GHz CPUs, each with 18 cores, plus 1-8 x Intel P3700 NVMe SSD drives) vs SPDK with a single dedicated 2.10 GHz core (out of the 2 x Xeon E5-2965) allocated for IO. You can clearly see the significantly better IO performance with SPDK: despite having just a single core, thanks to the lack of context switching and the related overhead, it scales IO throughput linearly with the number of NVMe SSD drives.

(In addition to this testing, Jonathan also mentioned that they've done another test with off-the-shelf Supermicro hardware, and with SPDK & 2 dedicated cores for IO they were able to get 5.6 million IOPS before running out of PCI bandwidth, which was impressive.)


SPDK Applications & My Thoughts

SPDK is an end-to-end reference storage architecture & a set of drivers (C libraries & executables) to be used by OEMs and ISVs when integrating disk hardware. According to Intel's SPDK introduction page, the goal of SPDK is to highlight the outstanding efficiency and performance enabled by using Intel's networking, processing and storage technologies together. SPDK is freely available as an open source product that can be downloaded through GitHub. It also provides NVMe-oF (NVMe over Fabrics) and iSCSI servers built on the SPDK architecture, on top of the user space drivers, which are even capable of servicing disks over the network. Now, this can potentially revolutionise how the storage industry builds its next generation storage platforms.

Consider, for example, any SDS or even a legacy SAN manufacturer using this architecture to optimise the CPU on their next generation All Flash storage array. Take the NetApp All Flash FAS platform: it is known to have a ton of software-based data management services within ONTAP that currently compete for CPU cycles with IO, and which often have to be scaled down during heavy IO processing. With the SPDK architecture, ONTAP could free up more CPU cycles to be used by more data management services, and even double up on various other additional services, without any impact on critical disk IO. It's all hypothetical of course, as I'm just thinking out loud here, and it would require NetApp to run ONTAP on Intel CPUs and Intel NVMe drives etc., but it's doable & makes sense, right? I mean, imagine the day you can run "reallocate -p" during peak IO times without grinding the whole SAN to a halt :-). I'm probably exaggerating its potential here, but the point is that SPDK-driven IO efficiencies could apply equally to all storage array manufacturers (especially all flash arrays), which could then start creating super efficient, ultra low latency, NVMe drive based storage arrays that also include a ton of data management services that would previously have been too taxing on the CPU (think inline dedupe, inline compression, inline encryption, everything inline etc.), on 24×7 by default rather than just during off-peak times, thanks to the near-zero impact on disk IO.

Another great place to apply SPDK is within virtualisation, for VM IO efficiency. Using SPDK with QEMU as follows has resulted in some good IO performance for VMs:


Imagine, for example, a VMware VSAN driver built using the Intel SPDK architecture, running inside user space with a dedicated CPU core performing all IO: what would the possible IO performance be? VMware currently performs IO virtualisation in the kernel, but imagine if SPDK was used and IO virtualisation for VSAN was moved to software running inside user space: would it be worth the performance gain and reduced latency? (I did ask the question, and Intel confirmed there is no joint engineering currently taking place on this front between the two companies.) What about other VSA-based HCI solutions? Take someone like Nutanix Acropolis, where Nutanix could happily re-write their IO virtualisation to happen within user space using SPDK for superior IO performance.

An Intel & Alibaba Cloud case study in which the use of SPDK was benchmarked showed the IOPS and latency improvements below.

NVMe over Fabrics is also supported with SPDK, and some use cases were discussed, specifically relating to virtualisation, where VMs tend to move between hosts; a unified NVMe-oF API that talks to local and remote NVMe drives is now available (with some parts of the SPDK stack becoming available in Q2 FY17).

Using SPDK seems quite beneficial for existing NAND-based NVMe storage, but most importantly for newer generation non-NAND media, to bring the total overall latency down. However, that does mean significantly changing the architecture to process IO in user mode as opposed to kernel mode, which I presume is how almost all storage systems, Software Defined or otherwise, work today, and I am unsure whether changing them to user mode with SPDK is going to be a straightforward process. It would be good to see some joint engineering, or other storage vendors evaluating the use of SPDK, to see whether the said latency & IO improvements are realistic in complex storage solution systems.

I like the fact that Intel has made SPDK open source to encourage others to freely utilise (& contribute back to) the framework, but I guess what I'm not sure about is whether it's tied to Intel NVMe drives & Intel processors.

If anyone wants to watch the recorded videos of our sessions from #SFD12, the links are as follows.

  1. Jonathan’s session on SPDK
  2. Tony’s session on RDT

Cheers

Chan

#SFD12 #TechFieldDay @IntelStorage

Excelero – The Latest Software Defined Storage Startup


As part of the recently concluded Storage Field Day 12 (#SFD12), I had the privilege of sitting in front of the engineering & CxO team of Silicon Valley's newest (and, should I say, one of the hottest) storage start-ups, Excelero, to find out more about their solution offering, on the launch day of the company itself. This post summarises what I've learnt, and my honest thoughts on what I saw and heard about their solution.

Apologies about the length of the post – Excelero has some really cool tech and I wanted to provide my thoughts in detail, in a proper context! 🙂

Enterprise Storage Market

So let's start with a bit of context first. If you look at the total storage market as a whole, it has been growing, thanks to the vast amounts of data being generated by everything we do (consumer as well as enterprise activities). This growth in storage requirements will, I believe, likely accelerate even faster in the future, as we are going to be generating even more data, and most of that data will likely end up on the megascalers' cloud storage platforms (public cloud). Due to this impact from public cloud platforms such as AWS, Azure, Google Cloud etc., which suck up most of those storage requirements, the traditional enterprise storage market (where customers typically used to own their own storage) has been very competitive and is perceived to be going through a bit of a downward trend. This has prompted a number of consolidation activities across the storage tech industry; the obvious elephant in the room is the Dell acquisition of EMC, while similar events include HPe's recent acquisition of Nimble Storage and Dell killing off its DSSD array plans. So, especially in this supposedly dwindling enterprise (non-cloud) storage market, continuous innovation is critical for these enterprise storage vendors in order to compete with public cloud storage platforms and claim a larger portion of the shrinking enterprise storage pie. This constant innovation, often software & hardware led rather than just hardware led, gives them the ability to provide their customers with faster, larger storage solutions and, most importantly, a differentiated storage solution offering, together with added data management technologies to meet various 21st-century business requirements.

Now, as someone who works in an organisation that partners with almost all of these storage tech companies (legacy and start-ups), I know fairly well that almost every single one of the storage tech vendors is prioritising the use of NVMe flash drives (in their various form factors) as a key focus area for innovation (in most cases, NVMe in itself is their only option, without any surrounding innovation, which is poor). This was also evident during the #SFD12 event I attended, as almost all the storage vendors that presented to us touted NVMe-based flash drives as their key future roadmap items. Even SNIA, the body for defining standards in the storage and networking industry, themselves included a number of new NVMe-focused standards in their immediate and future focus areas (separate blog post to cover the SNIA presentation content from #SFD12 – stay tuned!).

Talking about NVMe in particular, the forecast for the NVMe technology roadmap going into the future looks somewhat like the below (image credit to www.nvmexpress.org), where the future is all focused around NVMe over Fabrics technologies that can integrate multiple NVMe drives across various hosts via a high bandwidth NVMe fabric (read "low latency, high bandwidth network").


Introduction to Excelero

-Company Overview-


Excelero is a brand-new Israeli tech startup, founded in 2014 in Tel Aviv with a base in Silicon Valley, that just came out of stealth on the 8th of March 2017. Despite only just coming out of stealth, as you can see, they already have an impressive list of high-end customers.

Their offering is primarily an SDS solution with some real innovation in the form of an efficient (& patented) software stack, engineered to exploit the latest developments in storage hardware technologies such as NVMe drives, NVMe over Fabrics (NVMesh) and RDMA technology, all working in harmony to boost NVMesh performance unlike any other storage vendor that I know of (at this point in time). They have built a patented software stack focused around RDDA (Remote Direct Data Access, which is similar to RDMA in how it operates – more details below), which bypasses the CPU (and, to a degree, memory) on the storage controllers / nodes / servers when it comes to storage IO processing, such that it requires zero CPU power on the storage node to serve IO. If you are a storage person, you'd know that the biggest problem in the storage industry for scaling performance up is, and has been, the CPU power on the storage nodes / controllers, especially when you use NAND technologies such as SSD; this is why, if you look at every All Flash Array, you'll see a ton of CPU cores operating at a very high frequency in every controller node. Excelero conveniently gets around this issue by de-coupling storage management & data and, most importantly, by using RDDA technology to bypass the storage server CPU for disk IO (which is now offloaded to a dedicated RDMA-capable network card (RoCE / RNIC) such as a Mellanox ConnectX-3/4/5 card).

In a nutshell (and this is for the sales people that read this): Excelero is a scale-out, innovative, Software Defined Storage solution that is unlike any other in the market right now. They use next generation storage & networking hardware technologies to provide a low cost, extremely low latency, extremely high bandwidth storage solution for specific enterprise use cases that demand such a solution. Excelero claims they can produce 100 million IOPS, along with some obscene bandwidth throughput, with little to no CPU capacity used on the server side (storage nodes), and I believe them, especially after having seen a scaled-down demo in action. I would say they are probably one of the best, if not the best, solutions of this kind available in the market right now, and that is my honest view. Please read on to understand why!

-Use Cases-

Typical use cases Excelero initially aims to address are any enterprise business requirements looking for a high bandwidth, ultra low latency, block protocol storage solution. Some examples are as follows.


Technical Overview

Advance warning:

  • If you are a SALES GUY, FORGET THIS SECTION! I'm certain it ain't gonna work for you! Honestly, don't waste your time. Just skip this section and read the next one 🙂
  • If you are a techie however, do carry on.

First of all, it is important to understand some key concepts used within the Excelero solution, along with an overview of the architecture.

-NVMesh Architecture-

The NVMesh server SAN architecture that Excelero has built is a key part of the Excelero software stack, and given below is a high-level overview.


NVMesh is the technology with which Excelero aggregates various remote NVMe drives into a single pool of drives that is accessible by all participating storage nodes over the NVMe fabric network, and, most importantly, as local drives (with local drive characteristics such as latency).

  • NVMesh design goals:
    • Performance, Scalability, Integration, Flexibility, Efficiency, Ease of use
  • Components involved
    • Control path
      • Centralised management for provisioning, management & all control activities (all intelligence resides at this layer)
      • Runs as a Node.js application on top of MongoDB
      • Pools drives, provisions volumes and monitors
      • Transforms drives from raw storage into a pool
      • Also includes the topology manager, which
        • Runs on all the nodes as a distributed service
        • Implements the cluster management and HA
        • Performs volume lifecycle management
        • Uses multi-Raft protocols to avoid split-brain, plus RAIN data protection, across a large node cluster
        • Communicates with the target software that runs on the storage nodes
      • RESTful API for integration with automation and external orchestration platforms such as Mesos (Kubernetes support is on the roadmap)
      • Docker support with a persistent storage plugin available
    • Data path
      • A kernel module that manages drives and acts as a storage server for clients
      • Provides true convergence: it removes the target module from the data path (CPU), so that storage nodes can run applications and other data services without CPU conflict / impact (note that this is a new definition of hyper-converged compared to other HCI vendors such as VMware / Nutanix etc.)
      • Point to point communication with other storage nodes, management & clients
      • NVMesh client
        • Intelligent client block driver (Linux) where client side storage services are implemented
        • Kernel module that presents logical volumes via the above block driver API
        • No need for iSCSI or FC, as the Excelero solution does not use the SCSI protocol for communication
      • NVMesh target
    • Hardware components
      • Standard X86 servers with NVMe drives (more details on HW below) & RNIC cards
      • Next generation switches
    • Software components (Excelero Intellectual Property)

Note that the current version of NVMesh (1.1) is supported on Linux only and lacks specific data services such as de-duplication & compression, but these services are on the roadmap. Based on what was disclosed to us, some of these future improvements include QoS capabilities, additional drive media support, additional hardware architecture support, non-Linux OS support, additional deployment methods and reduced-power configurations, as well as integration with Software Defined Networking solutions, which sounds very promising as a total solution and was very good to hear.

-RDDA (Remote Direct Data Access)-

RDDA is the patented secret sauce that Excelero has developed and is the other most important part of their solution stack. It works hand in hand with the NVMesh software stack (described above), and its primary purpose is to avoid the use of CPU on the storage nodes when processing client IO.

The key points to note about the RDDA technology are:

  • Developed a while ago by Excelero to fill a gap that was in the market. (A new replacement for this is coming soon)
  • No CPU utilisation on the target side – This is achieved through the RDDA technology which bypasses the target side CPU
  • RDDA only works with NVMe & is highly optimised for NVMe & remote NVMe access (so if you use non-NVMe drives, which is also possible, there are no RDDA capabilities)
  • However, if used in converged mode (centralised storage), RDDA is not used, so there will be an impact on target-side CPU

Now, I'm not an expert on how a typical NVMe read / write occurs and all the typical sub-protocol-level steps involved. But from what I gathered based on their architect's description, given below are the high-level steps involved in Excelero's RDDA technology when it comes to local and remote NVMe writes and reads. I've included this in order to explain at a high level how this technology differs from others, and why the CPU is no longer necessary on the target side.

Pre-requisite knowledge: if you are unfamiliar with certain NVMe operations & concepts such as the submission queue, completion queue and doorbell, which are typically used in an NVMe I/O operation, refer to this article for a basic understanding first, in order to make better sense of the below. The image below is from that article.

Now that you supposedly understand the NVMe command process, let's have a look at Excelero's high-level implementation of RDDA, based on my understanding (the actual process may differ slightly or involve many more steps / validations etc.). A toy code sketch of the same flow follows the list.

  • Each client-side read or write always results in a single RDMA write sent from the client directly into the destination storage node. There are 3 pieces to this write that occur on the storage node:
    • A local write (if to a local NVMe disk) or a remote write (if reading or writing to a remote NVMe drive) into the NVMe drive's submission queue. If it is a write operation, the data to be written is placed into local or remote memory and referenced from this submission queue entry.
    • A message written into the RNIC's queue (memory): a memory buffer called a "bounce buffer", made to point at the completion queue of the NVMe drive, to be sent back to the client later (imagine a sort of pre-paid envelope). This is effectively a pre-prepared reply to the client (to be filled with data if it is a read request).
    • A ring of the doorbell on the NVMe drive (to start executing), after which the NVMe drive on the storage node starts executing the IO operation. (Note that so far there have been no CPU operations on the storage node, as no storage-side software has been used.)
      • If it is a read op, the drive fills the bounce buffer pre-prepared in step 2 above with the data read from the NVMe disk.
      • If it is a write op, the drive writes the data that was placed into memory (above) to the disk.
  • Once the above NVMe operation is complete, it generates 2 things:
    1. A small completion (the less important part)
    2. Most importantly, an MSI-X interrupt. Typically, MSI-X interrupts are targeted at the CPU, which is why CPU utilisation on the storage controllers goes up during IO cycles. Unlike a typical MSI-X interrupt, with Excelero RDDA the MSI-X interrupt is pre-programmed not to go to the CPU, but instead to ring the doorbell of the RNIC. Upon this doorbell, the RNIC sends the pre-prepared bounce buffer (step 2 above), which points at the completion queue of the NVMe drive, back to the client over the NVMe fabric – together with the read data if it was a read operation, or the completion queue details plus some other data if it was a write op – involving ZERO CPU on the storage node. This is the part of Excelero's patented technology that allows them to use NO CPU on the storage server side during any IO operation, no matter how big the IOPS / bandwidth.
You'll see from this high-level operation flow that during the whole IO process, no server-side CPU was ever required: all the IO processing was carried out by the RNICs and the NVMe drives, with the traffic transported across the NVMe fabric (network). Note, however, that on the client side, CPU is used as normal (similar to any other client accessing storage). Also note that there is no concept of caching writes in memory (no NVRAM), and therefore the acknowledgement is only sent back to the client once the data is written to the NVMe drive.

The current RNICs used within the Excelero solution are from Mellanox (to be specific, Mellanox ConnectX-3/4/5), but they've mentioned that it could also work on QLogic RNIC cards. Excelero also indicated that they are already working with another large networking vendor for additional RNIC card support, though they didn't mention exactly who that is (Cisco or Broadcom?).

-RAIN Architecture-

Excelero uses a concept called RAIN (similar to RAID) when it comes to provisioning volumes and providing high availability. Key details are as follows, with a toy illustration of the layout after the list.

  • A logical volume is a RAIN data protected volume which consists of one or more partial drives (i.e. a high performance volume may include a number of drives, but only part of each drive's capacity may be used)
  • The key differentiator here is that, as opposed to RAID, which is across local disks, RAIN is across disks from multiple hosts (similar to network RAID)
  • The current version of the solution supports RAIN10 (imagine RAID 10 over the network, but this time over the NVMe fabric)
  • Erasure coding will be available soon
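
As a rough illustration of the idea (again my own toy sketch, not Excelero's actual algorithm), a RAIN-10 style layout might stripe logical blocks across hosts and mirror each block onto a drive in a different host:

```c
#include <stdio.h>

#define HOSTS 4 /* hypothetical cluster: one NVMe drive slice per host */

/* Map a logical block to a primary and a mirror host:
   striped across hosts (RAID-0 like), mirrored onto the next
   host (RAID-1 like), so no block's two copies share a host. */
static void rain10_map(long lba, int *primary_host, int *mirror_host)
{
    *primary_host = (int)(lba % HOSTS);
    *mirror_host  = (*primary_host + 1) % HOSTS;
}

int main(void)
{
    for (long lba = 0; lba < 8; lba++) {
        int p, m;
        rain10_map(lba, &p, &m);
        printf("lba %ld -> primary host %d, mirror host %d\n", lba, p, m);
    }
    return 0;
}
```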

-Deployment Architecture-

The Excelero solution is supported both in local storage in application server mode (similar to hyper-converged, but without a hypervisor) and in converged (centralised storage) mode, and each method has certain limitations (such as RDDA not being applicable in a converged deployment). You can supposedly deploy them in mixed mode too, though I don't quite remember how that works.


Performance

Excelero boasts some obscene performance levels, producing 100 million IOPS with little to no CPU capacity used on the storage nodes, which totally seems believable. It is important to understand that Excelero doesn't necessarily promise to deliver more IO / bandwidth than an NVMe disk manufacturer claims is possible from a raw NVMe disk; rather, they ensure that the maximum possible capability can be extracted from an NVMe drive with NAND storage when used with their solution. In order to do that, the Excelero NVMesh & RDDA technologies combine to present remote NVMe drives as local drives, which makes a significant difference to the latency that is possible, with no CPU penalty on the server side. No other storage vendor can provide such a capability as far as I know (except perhaps HPe's prototype "Machine" project, which is supposedly looking at the use of photonics to reduce the distance between processors and persistent local and remote memory used as storage. However, that is not a shipping product, it's doubtful whether it will ever ship, and if it does, it's likely going to cost an obscenely high amount compared to the low-cost option available with Excelero).

In order to see what's possible, we were treated to a live demo of their platform running on a bunch of Supermicro commodity server hardware with lower-end Intel Xeon CPUs and a mix of Intel and Samsung NVMe drives, interlinked with Mellanox ConnectX-5 100Gb/s Ethernet RoCE adaptors and a Dell Z9100-ON network switch.

It's fair to say the demo results blew our socks off!!! And we did have some storage industry stalwarts in our delegate panel at #SFD12 who have been around the storage industry since its enterprise inception, and every one of those guys (and myself) was grinning from ear to ear when watching this demo and the stats that came out (that's a "GOOD" thing, in case it wasn't clear :-)).

When 4 x Intel 400GB NVMe drives from 4 different hosts were pooled into a single storage pool and a random read operation with 4K IOs was performed, the stats we saw were around 4.9 million IOPS @ 25GB/s of bandwidth, with less than or around 200µs of latency (consistently) – something previously almost unseen!

When the write test was shown (on the same disk pool across 4 servers) with 4K writes, the results were around 2.4 million IOPS @ < 200µs latency.

We were also shown the CPU stats during the IO operations: the CPU utilisation on each storage node hovered around 0.84% throughout the entire demo, proving that the RDDA technology clearly bypasses the server-side CPU, which was super impressive. These IOPS & latency figures were more than good numbers, and while I couldn't verify this, the Excelero team mentioned to us that the server configs used for this demo only cost them $13,000 each, which, if true, is a massive cost saving potentially available to all future Excelero customers.

With RDDA, IO processing is now offloaded from the storage node CPU to the RNIC & the Ethernet fabric, and while their saturation points are much higher than in a traditional architecture that relies on host CPU, the capabilities of the fabric and the RNIC cards are now likely to become the performance bottleneck in the future. So if someone is architecting a solution using Excelero, it is pretty important to make the appropriate design choices for the fabric and the RNIC cards to make the solution future-proof.


Solution Licensing & Costs

We didn't discuss much around costs and we were not privy to list pricing; however, it's very likely that the licensing & costs would be:

  • Flexible pricing, likely to cost similar to a matching All Flash Array, but with Excelero providing around 20-30% more performance
  • Can be licensed per NVMe drive or per server (you are not penalised for the capacity of the drive, which is good)
    • Hyper-converged: storage nodes where the storage is brought local to the application servers (no hypervisor involved)
    • Centralised (converged): separate pricing for storage nodes and client nodes


Customer Case Studies

Despite being a fresh start-up, they already have some impressive customers on board, such as NASA, PayPal, Dell, Micron, Broadcom, HPe and Intel, and most importantly they are an integral part of LinkedIn's Open19 project (a new open standard for servers, storage & networking – I will produce a separate article on Open19 in the future; much like the OCP project, it's going to help define the datacenter of tomorrow).

In the case of NASA's deployment of Excelero, we were given to understand that the solution is capable of around 140GB/s of write throughput across 128 servers, which is astonishing.

Some popular customer case studies that are publicly referenceable are as follows.


My Thoughts

Put simply, I like their solution offering….! A lot…!! – No wait… that's an understatement. I absolutely love their solution offering and the level of innovation they seem to have put into it. For certain workloads that are purely performance-centric and care less about advanced data services, I think the Excelero solution would be the one to beat in the industry, at least in the short term.

As of right now, if you look at all the hardware and software defined storage solutions that are generally available to purchase in the market, in my view Excelero has a unique offering when it comes to its target market. Some of that uniqueness comes from the points below:

  • They have the only virtual SAN (SDS) solution that will harness shared NVMe (NVMe over fabric) in the market today
  • Unified NVMe which enables sharing storage across a network but still accessed at local speeds and latencies
  • No CPU impact for storage IO on the storage node
  • Flexible architecture that provide hyper-converged as well as converged (disaggregated) architectures

There are a ton of SDS solutions out there in the market, some from legacy storage vendors such as Cisco, HPe and EMC, as well as from dedicated SDS start-ups (some no longer start-ups, such as Nutanix, VMware vSAN, HedVig etc.), and typically most of these solutions use industry standard disk drives with industry standard server hardware (x86), plus their own storage software stack on top, either as a dedicated (read "centralised") storage node or as a hyper-converged (read "de-centralised") offering. Excelero is no different from an architectural perspective to most of these SDS vendors. However, due to its additional unique capabilities such as RDDA & NVMesh, their SDS solution stack is likely going to be very attractive for most high-end, ultra-low-latency storage requirements, where no other SDS or even a purpose-built All Flash Array will be able to compete anymore. This is precisely Excelero's initial target market, and I would presume they will do very well within that segment, provided that they get their marketing message right.

However, given the current lack of advanced data services, it is unlikely that they'd replace more mature HCI or SDS offerings that are much richer in advanced data services, such as VMware VSAN or Nutanix, as well as All Flash Array offerings from the likes of NetApp (All Flash FAS or SolidFire), HPe 3Par All Flash and EMC XtremIO, when it comes to more common, mixed use cases such as virtualisation platforms or VDI. Having said that, you can argue that Excelero's offering is a version 1 product right now, and future versions adding these missing advanced data services would make it equally competitive or even better. Due to the lack of CPU dependency for storage IO processing, Excelero can afford to load its storage nodes with many advanced data services, all running inline and always on, without any impact on IO performance – a big headache that other All Flash Array or SDS vendors cannot avoid. So, in theory, Excelero's storage platform could in time be even superior to what it is today.

At present, Excelero has 2 Go To Market (GTM) routes.

  1. The obvious one is direct to customers in order to address their key use case (ultra low latency, high IOPS / Throughput use cases)
  2. The other route to market is working with OEM manufacturers

While Excelero will continue to enhance their core offering available under the 1st GTM route above, I can see many other OEM vendors, such as the legacy storage vendors, wanting to license Excelero's patented technology such as RDDA for use in their own storage systems, and this could well be quite a popular revenue stream for Excelero. In my view, having an awesome technology doesn't necessarily ensure the survival of a tech start-up, and this 2nd GTM route may well be the obvious starting point for them, though it potentially limits the technical advantage they have with GTM route 1.

Either way, they have an awesome, credible and, based on current customers, popular storage technology, and I sincerely hope the business leaders within the company will make the strategically best decisions on how to monetise this exciting technology. Given the lack of similar technology from other vendors at the moment, Excelero may have a slight advantage, but the competition is unlikely to sit and wait, as they too will likely work on similar but different architectures. Intel already hinted to us during our session at #SFD12 that there may well be a newer replacement for NAND-based drives coming out soon, and NVMesh & RDDA could be a thing of the past before you know it, so I hope my friends at Excelero act fast.

Finally, a group photo of the Excelero team that presented to us along with the #SFD12 delegates panel 🙂

Additional Reading

If you are keen to explore further into Excelero and their solutions, the obvious place to start is their website, but if you would like to watch the recording of the presentation they gave to the #SFD12 team, it's available here.


If you have any questions or thoughts, please feel free to submit a comment below

Slide credit goes to Excelero and Tech Field Day

Thanks

Chan

#SFD12 #Excelero


Storage Field Day (#SFD12) – Vendor line up

Following on from my previous post about a quick intro to Storage Field Day (#SFD12) that I was invited to attend in San Jose this week as an independent thought leader, I wanted to get a quick post out on the list of vendors we are supposed to be seeing. If you are new to what Tech Field Day / Storage Field Day events are, you’ll also find an intro in my above post.

The event starts tomorrow and I am currently at LHR waiting for my flight to SJC, and it's fair to say I am really looking forward to attending. Part of that excitement is due to being given the chance to meet a bunch of other key independent thought leaders, community contributors and technology evangelists from around the world, as well as the chance to meet Stephen Foskett (@SFoskett) and the rest of the #TFD crew from Gestalt IT (GestaltIT.com) at the event. But most of that excitement for me is simply due to the awesome (did I say aaawwwesommmmmmeee?) list of vendors that we are supposed to be meeting with to discuss their technologies.

The full list & event agenda is as follows.

Wednesday the 8th

  • Watch the live streaming of the event @ https://livestream.com/accounts/1542415/events/6861449/player?width=460&height=259&enableInfoAndActivity=false&defaultDrawer=&autoPlay=false&mute=false
  • 09:00 – MoSMB presentation
    • MoSMB is a fully compliant, lightweight adaptation of SMB3, made available as a proprietary offering by Ryussi Technologies. In effect, it's an SMB3 server for Linux & Unix systems. They are not a technology I had come across before, so I'm really looking forward to getting to know more about them, their offerings, their partnership with Microsoft etc.
  • 10:00 – StarWind Presents
    • Again, a new technology to me personally, which appears to be a hyper-converged appliance that seems to unify commodity server disks and flash across multiple hypervisors. Hyper-converged platforms are very much of interest to me, and I know the industry leading offerings on this front, such as VMware VSAN & Nutanix, fairly well. So it's good to get to know these guys too and understand their Unique Selling Points / differentiators versus the big boys.
  • 13:00 – Elastifile Presents
    • The Elastic Cloud File System from Elastifile is supposed to provide an application-level distributed file / object system spanning private cloud and public cloud, to provide a hybrid cloud data infrastructure. This one is again new to me, so I'm keen to understand what makes them different to other similar distributed object / storage solutions such as HedVig / Scality. Expect my analysis blog post on this one after I've met up with them!
  • 16:00 – Excelero Presents (hosted at Excelero office in the Silicon Valley)
    • These guys are a new vendor that is literally due to launch on the same day as we speak to them. Effectively, they don't quite exist yet. So it's quite exciting to find out who they are and what they've got to offer in this increasingly growing, rapidly changing world of enterprise IT.
  • 19:00 – Dinner and Reception (Storage Cocktails?) with presenters and friends at Loft Bar and Bistro in San Jose
    • A good networking event with the presenters from the day, for peer-to-peer networking and further questions on what we've heard from them during the day.

Thursday the 9th of March

  • 08:00 (4pm UK time) – Nimble Storage Presents
    • Nimble are a SAN vendor that I am fairly familiar with and have known for a long time, and I also have a few friends who work at Nimble UK. To be fair, I was never a very big fan of Nimble personally as a hybrid SAN vendor, as I was more of a NetApp, EMC, HPe 3Par kinda person for hybrid SAN offerings, which I've always thought offer the same if not better tech for roughly a similar price point, with the added benefit of being large, established vendors. Perhaps I can use this session to understand where Nimble is heading now as an organisation, what differentiators / USPs they may have compared to the big boys, and how they plan to stay relevant in an industry which is generally in decline as a whole.
  • 10:45 – NetApp Presents (At NetApp head office in Silicon Valley)
    • Now I know a lot about NetApp :-). NetApp was my main storage skill in the past (and still is, to a good level), and I have always been very close to most NetApp technologies, from both a presales and a delivery perspective; I was also awarded NetApp Partner System Engineer of the Year (2013) for UK & Ireland by NetApp. However, since the proper introduction of cDOT to their portfolio, I've always felt like they've lost a little market traction. I'm very keen to listen to NetApp's current messaging and understand where their heads are at, and how their new technology stack, including SolidFire, is going to be positioned against larger vendors such as Dell EMC and HPe 3Par, as well as all the disruption from Software Defined Storage vendors.
  • 12:45 (20:45 UK time) – Lunch at NetApp with Dave Hitz
    • Dave Hitz (@DaveHitz), one of the NetApp founders, is a legend… Nuff said!
  • 14:00 – Datera Presents
    • Datera is a high performance elastic block storage vendor, again quite new to me, so I'm looking forward to understanding more about what they have to offer.
  • 19:30 – San Jose Sharks hockey game at SAP Center
    • Yes, it's an evening watching a bit of ice hockey, which I've never done before. To be clear, ice hockey is not one of my favourite sports, but I'm happy to take part in the event :0).

Friday the 10th of March

  • 09:00 (17:00 UK time) – SNIA Presents (@Intel Head office)
    • The Storage Networking Industry Association is a non-profit organisation made up of various technology vendor companies.
  • 10:30 (18:30 UK time) – Intel Presents (@Intel Head office)
    • I don't think I need to explain / introduce Intel to anyone. If I must: they kinda make some processors :-). Looking forward to visiting the Intel office in the valley.

All in all, it's an exciting line-up of old and new vendors, which I'm looking forward to meeting.

Exciting stuff, can’t wait…! Now off to board the flight. See you on the other side!

Chan

 

The impact of digital revolution on software licensing – Or is that the other way around?

I happened to come across the post below which, after reading, got me thinking about a few things, so I thought it would be a good idea to write a quick post and get everyone else’s thoughts too.

http://diginomica.com/2017/02/20/sap-v-diageo-important-ruling-customers-indirect-access-issues/

The article was effectively about a court battle between SAP (an enterprise software vendor that, by their own admission, is “the market leader in enterprise application software”) and Diageo (the drinks manufacturing giant), in which SAP sued Diageo to secure additional licensing revenue for indirect use of the data produced by SAP software. If you didn’t read the full article via the link above, what it essentially means is that when the data SAP generates (for its legal, fee-paying customer, Diageo) is accessed by a third party, presumably one providing Diageo with a service, SAP must be paid additional licensing revenue for that indirect usage, and that this is Diageo’s responsibility.

In this case, SAP’s ability to claim additional licensing revenue from Diageo was written into its contract, which is why the judge ruled in SAP’s favour (according to the article). While admitting that I am NOT a legal eagle, this raises a question in my mind: if this is in fact the final verdict on the case (and I’m sure it will be challenged in an appeals court…etc.), is this approach fair? Especially in a world facing a massive digital revolution, where everything, from a small electronic device, to a large multi-node machine, to a piece of software, is connected through digital technologies to relay data from one to another, with the intention of processing and re-processing that data as it is passed through each piece of software (disparate consumption)?

In my line of work (IT), almost all IT systems are interconnected, and that interconnection is typically there for one system to consume the data produced by another; the number of hops in the chain can range from a couple of systems to a few dozen, depending on how digital each customer’s environment is. In a pre “Digital Enterprise” world, all these connection hops (i.e. IT systems) typically belonged to one department, one business unit or, at worst, one organisation, and were therefore licensed for use by that department / business unit / organisation (covering all of its users).

But the digital revolution currently sweeping across all industries will extend this interconnectivity of software systems beyond a single organisation, as multiple organisations collaborate through data sharing, often in real time and across various platforms, to create a truly digital enterprise. Some of this type of digital integration is already commonplace, especially amongst finance sector customers…etc. Such digital connectivity of software platforms across organisations will now likely become relevant to many other organisations that previously thought themselves mutually exclusive when it comes to business operations. So I guess my question is this: if the software underpinning those key digital connections takes the same attitude to licensing that SAP did in the case above, what are the implications for true digital connectivity across multiple software platforms? Are we fully aware of the exact small print of each and every piece of software we use within our business, to understand how each one defines its permitted usage, who is classed as a user, and where we can and cannot connect it to other systems? How do we know precisely that we are not violating such draconian licensing rules during this multi-platform, API-driven, digital interconnectivity?

What do you think? I’m just curious to get people’s views, as I’m sure there’s no right or wrong answer. Do you think such draconian licensing rules are wrong, or would you argue that, given a dwindling market for independent software vendors (courtesy of public cloud), they should be allowed to benefit not just from direct but also from indirect (2nd and 3rd level) interaction with the data initially produced by their software? If so, do you have sufficient licensing expertise in-house to ensure you are not violating the software licensing agreements you often sign up to without reading? And if you don’t have that in-house knowledge, do you have a trusted partner who can advise on such licensing matters in an increasingly complex, digitally interconnected world, including cloud platforms (PaaS, SaaS), and help you achieve a truly digitally connected enterprise without paying over the odds for software licenses?

Keen to see what others think!

Cheers

Chan

#DigitalEnterprise #Connected #IoT #DigitalRevolution #Licensing

HPE Ambassador Summit 2017 – A Quick Review

A quick post to summarise the HPE Ambassador Summit 2017, which I attended last week as an HPE partner ambassador from the UK.

Introduction

According to HPE, the World Wide Ambassador Summit (WWAS) is HPE Enterprise Group’s (‘EG’, as it’s referred to internally) highest-rated conference for sharing technology experience. It is an exclusive event for HPE champions, both from within the HPE organisation globally and from global partner organisations such as those in the channel. During the event, HPE ambassadors are given a chance to be in front of HPE senior executives and, most importantly, the core product engineers from the various business units responsible for current and future product development. It is a predominantly technical event, and the idea is to discuss HPE’s current and future strategy, share deep-dive product and technology information, and share product roadmap information (typically subject to Non-Disclosure Agreements) with HPE ambassadors from around the world. The Ambassador Summit is not similar to HPE Discover (which is more of a public, exhibition-type event covering some technical as well as commercial information) or even HPE TSS (the Technology and Solutions Summit, an event for HPE solutions architects and presales consultants, both from HPE and partners, to discuss technical details of HPE products and services). It is supposed to be a lot more technical than even the TSS event, is only open to a handful of HPE advocates within HPE itself and the partner community, and its content includes roadmap information reaching far into HPE’s future.

For the 2nd consecutive year, it was held at the Hilton Anatole conference centre in Dallas, Texas, where it’s earmarked to be hosted for one more year in 2018 before moving to a different location in Houston in 2019.

As for the attendees, there were supposed to be around 1,000 in total from around the world, covering all the geos, including around 400 HPE ambassadors as well as about 100 partner ambassadors globally. From the UK in particular, the following partner HPE ambassadors, representing UK HPE channel partners, attended this year (the list doesn’t include HPE employees).

2017 Ambassador Summit

The ambassador summit was eventful. There were various sessions covering most, if not all, of the HPE technologies, such as ProLiant, Synergy & Apollo servers, 3PAR storage, HPE networking, Aruba networking…etc., as well as software solutions such as CloudSystem, OneView…etc. There were also sessions on HPE partner technologies and their integration with HPE, such as Arista, Docker, Kubernetes, Mesos…etc., as well as SimpliVity, then a partner and now part of HPE (however, the SimpliVity sessions were limited to information about their current products rather than joint roadmap info with HPE, as the formal acquisition was yet to complete).

In total, apart from the opening keynote speeches delivered by HPE executives every day, I attended the following sessions during the days and evenings. I’ve only listed the session titles, as I am not at liberty to discuss the specific details mentioned in the sessions due to the NDA in place.

  • Composable Strategy & Futures
    • Presentation about the HPE Synergy composable compute, storage & networking platform and its current & future strategy
  • OCA Rollout, Roadmap & Hands-On Workshop 
    • Presentation of a potential new presales tool, somewhat similar to (but more advanced than) the OCS tool that is publicly accessible right now.
  • HPE Container Strategy & Roadmap
    • Clue is in the name
  • HPE Servers Gen10 Primer
    • Information about upcoming HPE ProLiant gen 10 hardware
  • Converged Edge Systems & Use Cases
    • Edge analytics and how HPE’s portfolio helps. Great session!
  • Deep Dive: CloudSystem Architecture
  • Composable Strategy & Futures
    • An interactive session about digital disruption, its impact on organisations and their IT, and how HPE’s composable infrastructure based on the Synergy platform will be shaped in future to keep up with demand.
  • Reference Architecture Strategy & Roadmap
    • All information available here
  • Accelerate your Docker Deployment with HPE
    • Clue is in the name
  • Docker Containers Demystified
    • Back to basics
  • Clouds, Containers, DevOps, Kubernetes, Docker, Mesosphere & Stackato
    • Complete, up to date guide on cloud native infrastructure and how HPE plays a vital role in it. Great session!
  • Virtualize your Data Center with Arista (Integration with NSX, OpenStack & Docker)
    • Delivered by Arista SE. Great session!
  • Arista CloudVision Hands-On Lab
    • Delivered by Arista SE team. Great session!
  • Enabling Hybrid Cloud with HPE ProLiant for Microsoft Azure Stack Solution
    • Azure Stack hardware information from HPE, though more details will be available after the Azure Stack Airlift event in March.

It was a good event from an organisational perspective, very similar to last year’s event. There were some valuable sessions with deep technical detail, as well as some mediocre sessions where the content was either poor or not relevant to the title, but this is somewhat to be expected in any event that consists of various presentations from various sources. All in all, it was becoming obvious that HPE is in the midst of a massive, mostly necessitated transformation, from a legacy hardware vendor to something that can stay relevant in the current world of IT, which is massively impacted by Software Defined Everything (SDx), including the public cloud (which is also SDx). From my perspective, it’s too early to say whether HPE is on the right track, but their messaging is good.

I personally would prefer HPE to put more focus on its software stack; last year’s ambassador event was full of sessions on how HPE software would transform and complement the hardware stack, and I loved most of that software as it was full of potential (CSA, Helion Stackato, Helion Eucalyptus…etc.). However, it is public knowledge that HPE has since spun off its non-core software into a new entity formed with Micro Focus, and while HPE shareholders retain majority rights in the new company, I’m unsure whether HPE will have direct control over Micro Focus’s future direction, which is a little worrying. So at present, HPE appears to be in somewhat of a transitional period, where its role in a world dominated by SDx is not yet fully defined, which is understandable. Having said that, HPE is focusing more and more on partnering with the disruptive cloud native software vendors that are currently shaping how applications are developed and how data centers are run in a cloud-like manner. These new partnerships include the likes of Docker for container integration (with HPE’s Synergy hardware platform), Google Kubernetes and Mesosphere for container orchestration and their integration with HPE composable infrastructure, as well as Arista & Nokia DCN integration with the HPE product ecosystem. This approach could well pay off for HPE in the short to medium term: these technologies are very popular right now, and most organisations looking at public cloud as the final destination of a strategic IT roadmap are exploring things like containerisation and cloud native app development as potential phases of that journey. They will benefit from HPE’s closer integration of its own hardware platform with these technologies, which should simplify their adoption in the DC.

In addition to these, HPE is also developing its existing partnerships with legacy datacenter and cloud vendors such as VMware (for VMware Cloud Foundation) & Microsoft (for Azure Stack), as well as increasing co-development with SUSE & Red Hat as technology partners. This is good news, as some customers will always want to retain their on-premise datacenter and the legacy application stack on it; the continued integration of HPE datacenter infrastructure with these vendors will only help those customers.

I am not at liberty to disclose anything specific about the information released to us during the event due to the governing NDA. However, it should be permissible to state that it was abundantly obvious that, unlike the old HP that focused mainly on x86 computing, the new HPE is looking at various different markets and various partnerships with software manufacturers to stay relevant to every IT use case in the data center, be that typical x86 computing (DC), cloud native app development and consumption (cloud), networking including wireless, or computing at the edge (IoT)…etc. That is good to see, though whether the strategy will work in a market somewhat diminished by public cloud remains to be seen.

A word must also be said about the level of hardware-centric knowledge and the continued innovation on the hardware front by HPE engineering, which remains ever so impressive. Some of these next-generation hardware solutions were on display during an engineering night held on one of the evenings, and other hardware vendors would no doubt have looked at them in awe. As a techie at heart who came from a hardware background, this was really pleasing to see. However, in an increasingly software-defined world of IT, I sincerely wish and hope HPE will put an equal (if not stronger) emphasis on the accompanying software to complement the genius of their hardware going forward. Without this dual approach, survival is likely going to be difficult for legacy hardware vendors such as HPE and Cisco…etc., and the same thought process likely played a part in Dell’s (hardware) merger with EMC & VMware (software) not long ago.

Final Thoughts

HPE WWAS is somewhat of a unique event that brings together global HPE champions from all geos with core HPE business leaders and product engineers for discussions around products and strategy. No other vendor (that I know of) hosts a similar event in person: comparable programmes such as VMware vExpert, the NetApp A-Team and Cisco Champions are more focused on granting advocates of their respective technologies a title and access to some restricted information through online events (i.e. WebEx sessions), amongst other things, but unlike HPE, no other vendor goes all the way to host a dedicated annual event to bring them all together. I imagine this is an expensive event for HPE to host logistically and for the attendees to attend, but the idea behind it is a good one and HPE should be applauded for continuing this format of the event.

The event overall was a good one for me. I’ve picked up a number of existing areas to follow up on, as well as a couple of HPE solutions I never knew existed. Above all else, I managed to network with some key people (technical & executive) at HPE, which is often worth the attendance on its own, and to discuss and provide feedback to HPE on their current and future strategy, which, as a partner and an ambassador, is an important job for me to do.

Massive thanks to HPE for the invitation and to the various HPE UK channel teams for hosting us, including @idoneus_brown, @Eugatwork & @AndyDSawyer. It was also great to catch up with my friends from all the other HPE channel partners in the UK. Our respective employers might be competitors out in the field, but it’s refreshing to see that us techies are still friends 🙂

Cheers

Chan

 

Storage Field Day (#SFD12) – A quick intro!

I’ve been fortunate enough to be invited to attend the popular Storage Field Day (#SFD12), to be held in March 2017 in Silicon Valley, so here’s a quick post to share my initial thoughts and a bit about the event itself.

Tech Field Day is a popular, invitee-only, independent IT influencer event organised and hosted by Gestalt IT (GestaltIT.com). The idea behind the event is to bring together innovative technology product vendors and independent thought leaders from around the globe, with an active community contribution, to share information and opinions interactively. There are various field day events, such as Tech / Storage / Cloud / Mobility / Networking / Virtualisation / Unified Communications / Wireless Field Day, that take place throughout the year with the respective technology vendors. It’s organised by the long-time leader Stephen Foskett (@SFoskett) and has always been an extremely popular event amongst vendors, as it provides an ideal opportunity to present their new products and solutions to a number of thought leaders and community influencers from around the world and get their valuable thoughts & feedback.

I’ve been wanting to attend as a delegate for a while now, but as the event is invitee-only for delegates, I wasn’t able to just sign up and attend. This time around, however, I was extremely lucky to have been invited to attend the next event, Storage Field Day (#SFD12), in San Jose on March 7th-10th, which I’m now looking forward to.

The details of the SFD12 event I’ll be attending, including the rest of the invited delegates as well as the presenting vendors, are all available here. I will aim to provide a summary outlining my thoughts on the various technologies & solutions we’ll be discussing, focusing not just on their technical value but also on the business value to potential customers, so stay tuned…!

In the meantime, if you would like to attend a future Tech Field Day event, all the information you need on how to apply is listed here. If you would like to see what typical event sessions look like, have a look at their YouTube feed here for past event recordings.

Thanks

Chan

 

VMware vExperts 2017 Announced!

The latest batch of VMware vExperts for 2017 was announced last week, on the 8th of February, and I’m glad to say I’ve made the cut for the 3rd year, which was fantastic news personally. The vExpert programme is VMware’s global evangelism and advocacy programme and is held in high regard within the community due to the expertise of the selected vExperts and their contribution towards enabling and empowering customers around the world with their virtualisation and software-defined datacentre projects through knowledge sharing. Candidates are judged on their contribution to the community through activities such as community blogs, personal blogs, participation in events, producing tools…etc., and in general on maintaining their expertise in the related subject matter. vExperts typically get access to private betas, free licenses, early-access product briefings, exclusive events, free access to VMworld conference materials and other opportunities to interact directly with VMware product teams, which is totally awesome and, in return, helps us feed the information back to our customers…

It’s been a great honour to be recognised by VMware again with this prestigious title, and I’d like to thank VMware as well as congratulate the other fellow vExperts who have also made it this year. Let’s keep up the good work…!!

The full list of VMware vExperts 2017 can be found below

https://communities.vmware.com/vexpert.jspa

My vExpert profile link is below

https://communities.vmware.com/docs/DOC-31313

Cheers

Chan

 

New Dedicated VSAN Management Plugin For vROps Released

Some of you may have seen the tweets and the article from the legendary Duncan Epping here about the release of the new VMware vSAN plugin for vROps (vRealize Operations Management Pack for vSAN, version 1.0).

If you’ve ever had the previous vSAN plugin for vROps deployed, you might know that it was not a dedicated plugin for vSAN alone, but the vRealize Operations Management Pack for Storage Devices (MPSD) as a whole, which included not just visibility into vSAN but also legacy storage stats such as FC, iSCSI and NFS for legacy storage units (those that used to connect to Cisco DCNM or Brocade fabric switches).

This vROps plugin for vSAN, however, is the first dedicated vSAN plugin for vROps (hence the version 1.0). According to the documentation, it has the following features:

  • Discovers vSAN disk groups in a vSAN datastore.
  • Identifies the vSAN-enabled cluster compute resource, host system, and datastore objects in a vCenter Server system.
  • Automatically adds related vCenter Server components that are in the monitoring state.

How to Install / Upgrade from the previous MPSD plugin

  1. Download the management pack (.pak file)
    1. https://solutionexchange.vmware.com/store/products/vmware-vrealize-operations-management-pack-for-vsan
  2. Log in to the vROps instance with administrative privileges and go to Administration -> Solutions
  3. Click add (the plus sign), select the .pak file and tick the two check boxes to replace the pack if it is already installed and to reset default content. Accept any warnings and click upload.
  4. Once the upload is complete and staged, verify the signature validity and click next to proceed
  5. Click next, accept the EULA and proceed. The management plugin will start to install.
  6. Now select the newly installed management plugin for vSAN and click configure. Within this window, connect to the vCenter server (you cannot reuse the credentials previously configured for MPSD). When creating the credentials, you need to specify an admin account for the vCenter instance. The connection can be verified using the test button.
  7. Once connected, wait for the data collection from the vSAN cluster to complete and verify that collection is showing (if you’d rather verify this from a script, see the sketch after this list)
  8. Go to Home and verify that the vSAN-dedicated dashboard items are now available in vROps
  9. By default, there will be 3 vSAN-specific dashboards available, as follows, under default dashboards
    1. vSAN Environment Overview – This section provides some vital high-level information on the vSAN cluster, including its type, total capacity, used capacity, any congestion, and average latency figures, along with any active alerts on the vSAN cluster. As you can see, I have a number of alerts due to using non-compliant hardware in my vSAN cluster.
    2. vSAN Performance
      1. This default dashboard provides various performance-related information / stats for the vSAN cluster and datastores, as well as the VMs residing on them. You can also check performance figures such as VM latency and IOPS levels based on the VMs you select in the tile view, along with the trend forecast, which I think is going to be really handy.
      2. Similarly, you can see performance at the vSAN disk group level, which shows information such as write buffer and read cache performance levels, current as well as forecasted, which are new and were not previously easy to get at.
      3. You can also view performance at the ESXi host level, which shows basic information such as current CPU and RAM utilisation, including current and future (forecast) trend lines in true vROps style, which is going to be really well received. Expect the content available on this page to be significantly extended in future iterations of this management pack.
    3. Optimize vSAN Deployments – This page provides a high-level comparison of vSAN and non-vSAN environments, which is especially handy if you have vSAN datastores alongside traditional iSCSI or NFS datastores, to see how, for example, IOPS and latency compare between VMs on vSAN and an NFS datastore presented to the same ESXi server (I have both)
  10. Under Environment -> vSAN and Storage Devices, additional vSAN hierarchy information such as vSAN-enabled clusters, fault domains (if relevant), disk groups and witness hosts (if applicable) is now visible for monitoring, which is really handy.
  11. In the inventory explorer, you can see the list of vSAN inventory items that data is being collected for.
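
If you’d rather verify the collection status (step 7) and the discovered vSAN objects (steps 10 and 11) from a script rather than the UI, the vROps Suite REST API can do a quick spot check. Below is a minimal Python sketch of the idea; the appliance FQDN, the credentials and especially the adapter kind key are assumptions you’d need to confirm against your own instance (GET /suite-api/api/adapterkinds lists the valid keys on your appliance).

```python
# Minimal sketch: list the objects a given vROps adapter is collecting data
# for, via the Suite REST API. The FQDN, credentials and adapter kind key
# below are assumptions - confirm them against your own vROps instance.
import requests

requests.packages.urllib3.disable_warnings()   # lab appliance, self-signed cert

VROPS = "https://vrops.lab.local"              # assumption: your appliance FQDN
USER, PASSWORD = "admin", "ChangeMe!"          # assumption: your credentials
ADAPTER_KIND = "VirtualAndPhysicalSANAdapter"  # assumption: check /suite-api/api/adapterkinds

# 1. Acquire an authentication token
resp = requests.post(f"{VROPS}/suite-api/api/auth/token/acquire",
                     json={"username": USER, "password": PASSWORD},
                     headers={"Accept": "application/json"}, verify=False)
resp.raise_for_status()
token = resp.json()["token"]

# 2. List the resources discovered by that adapter and their collection status
resp = requests.get(f"{VROPS}/suite-api/api/resources",
                    params={"adapterKind": ADAPTER_KIND},
                    headers={"Accept": "application/json",
                             "Authorization": f"vRealizeOpsToken {token}"},
                    verify=False)
resp.raise_for_status()

for res in resp.json().get("resourceList", []):
    key = res["resourceKey"]
    states = res.get("resourceStatusStates", [])
    status = states[0]["resourceStatus"] if states else "UNKNOWN"
    print(f"{key['resourceKindKey']:30} {key['name']:40} {status}")
```

If the pack is collecting properly, the cluster, disk group and witness objects mentioned in step 10 should appear here with a healthy collection status.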

All in all, this is a welcome addition and will only continue to be improved, with new monitoring features added as the versions progress. I really like the dedicated plugin factor, as well as the nice default dashboards included with this version, which will no doubt help customers truly use vROps as a single pane of glass for all things monitoring on the SDDC, including vSAN.

Cheers

Chan

VMware vROps 6.4 – Upgrade Process From 6.3

Most people know by now that VMware vRealize Operations Manager version 6.4 was released by VMware on the 15th of November 2016, amongst a number of other new products.

Since then, I’ve been meaning to upgrade my vROps appliance from the previous 6.3 version to 6.4, and tonight I finally managed to get around to doing it. I’ve documented below (briefly) the easiest upgrade steps to follow.

  1. Back up any customised content (such as custom views, definitions…etc.)
    1. This is required so that during the vROps product update you can select the “Reset default content, overwriting to a newer version provided by this update” option.
  2. Log in to the vROps appliance’s admin page
    1. https://<fqdn of the vROps appliance>/admin
  3. Take the vROps cluster offline
    1. Click on the “Take offline” button to take the vROps cluster offline
    2. Note that this obviously means downtime, and it may take some time to complete
    3. You need to repeat it for each node if you have an HA cluster of vROps nodes
    4. Verify it’s offline properly
  4. Take a snapshot backup
    1. Once offline, go to the vSphere Web Client and take a snapshot of the vROps 6.3 appliance VMs (a precautionary best practice; if you’d rather script this step, see the pyVmomi sketch after this list)
  5. Download the appropriate update files from My VMware for vROps 6.4 – I have the vROps appliance deployed rather than a Windows install, so I need the following 2 update files
    1. Virtual appliance OS system update PAK file: you need the file titled “vRealize_Operations_Manager-VA-OS-6.4.0.4635873.pak”
    2. vROps product update PAK file: I’m using the “vRealize_Operations_Manager-VA-6.4.0.4635873.pak” file as I don’t have Windows remote collectors configured (if you do, make sure you download the “vRealize_Operations_Manager-VA-WIN-6.4.0.4635873.pak” file instead)
  6. Appliance OS update
    1. Go to Software Update, click on Install a software update and locate the .pak file for the OS update (“vRealize_Operations_Manager-VA-OS-6.4.0.4635873.pak”)
    2. Click upload. Once complete, read the warning messages (about the appliance restart) and click next
    3. Accept the EULA, click next twice and finally click install. The vROps virtual appliance OS upgrade will begin.
    4. Once the OS install is complete, the appliance will restart and prompt you to log back in to the admin page.
    5. Log back in to the admin page and wait for the vROps services to come online.
    6. Go to the Software Update section and ensure that the OS update has completed successfully.
  7. vROps product update
    1. Go to Software Update, click on Install a software update and locate the .pak file for the product update (“vRealize_Operations_Manager-VA-6.4.0.4635873.pak”)
    2. Tick the “Reset default content” checkbox and click upload
    3. Once uploaded and staged on the appliance, ensure that the new version (to be upgraded to) is appearing correctly and click next
    4. Accept the EULA, click next twice and then click install
    5. During the install stage, you will be automatically logged out of the admin interface and prompted to log back in.
    6. Log back in to the admin interface and wait until the software update completes. This stage can be somewhat time-consuming.
    7. Once complete, go to the System Status section and verify that vROps version 6.4 is now showing successfully.
  8. That’s it. The vROps upgrade has now completed successfully for my cluster / node. If you have multiple nodes in your vROps cluster, you may need to repeat this process for each node.
  9. Also, do remember to remove the snapshot after a while, once all operations are normal and you can see all your views, dashboards…etc. (the sketch after this list includes the cleanup call).
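
For steps 4 and 9, I used the vSphere Web Client, but if you’d rather script the snapshot handling, here’s a minimal pyVmomi sketch of the same idea. The vCenter address, credentials and appliance VM name are placeholders for your own environment; treat this as a sketch of the approach rather than a polished tool (it doesn’t wait on the tasks, for example).

```python
# Minimal sketch: snapshot the vROps appliance VM before the upgrade (step 4)
# and remove the snapshot once everything checks out (step 9).
# The vCenter address, credentials and VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab vCenter with a self-signed cert
si = SmartConnect(host="vcenter.lab.local",            # placeholder
                  user="administrator@vsphere.local",  # placeholder
                  pwd="ChangeMe!", sslContext=ctx)

def find_vm(name):
    """Walk the vCenter inventory for a VM by display name."""
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next(vm for vm in view.view if vm.name == name)
    finally:
        view.Destroy()

vm = find_vm("vrops-01")  # placeholder: your vROps appliance VM name

# Step 4: take the precautionary snapshot (no memory state needed - the
# cluster is already offline at this point)
vm.CreateSnapshot_Task(name="pre-6.4-upgrade",
                       description="Before vROps 6.3 to 6.4 upgrade",
                       memory=False, quiesce=False)

# Step 9, run later once the upgraded cluster checks out: remove it again
for snap in (vm.snapshot.rootSnapshotList if vm.snapshot else []):
    if snap.name == "pre-6.4-upgrade":
        snap.snapshot.RemoveSnapshot_Task(removeChildren=False)

Disconnect(si)
```

One catch with running both halves in one go: the create call is an asynchronous task, so the removal loop may not see the snapshot if run immediately afterwards. In practice you’d run the two halves at different times (or poll the task state in between), which suits this upgrade anyway, since the snapshot should only be removed once you’re happy everything works.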

 

Cheers

Chan