As a part of the recently concluded Storage Field Day 12 (#SFD12), we traveled to one of the Intel campuses in San Jose to listen to the Intel Storage software team about future of storage from an Intel perspective. This was a great session that was presented by Jonathan Stern (Intel Solutions Architect / and Tony Luck (Principle Engineer) and this post is to summarise few things I’ve learnt during those sessions that I thought were quite interesting for everyone. (prior to this session we also had a session from SNIA that was talking about future of storage industry standards but I think that deserves a dedicated post so I won’t mention those here – stay tuned for a SNIA event specific post soon!)
First session from Intel was on the future of storage by Jonathan. It’s probably fair to say Jonathan was by far the most engaging presenter out of all the SFD12 presenters and he covered somewhat of a deep dive on the Intel plans for storage, specifically on the software side of things and the main focus was around the Intel Storage Performance Development Kit (SPDK) which Intel seem to think is going to be a key part of the future of storage efficiency enhancements.
The second session with Tony was about Intel Resource Director Technology (addresses shared resource contention that happens inside an Intel processor in processor cache) which, in all honesty was not something most of us storage or infrastructure guys need to know in detail. So my post below is more focused on Jonathan’s session only.
Future Of Storage
As far as Intel is concerned, there are 3 key areas when it comes to the future of storage that need to be looked at carefully.
- Hyper-Scale Cloud
- Hyper-Convergence
- Non-Volatile memory
To put this in to some context, see the below revenue projections from Wikibon Server SAN research project 2015 comparing the revenue projections for
- Traditional Enterprise storage such as SAN, NAS, DAS (Read “EMC, Dell, NetApp, HPe”)
- Enterprise server SAN storage (Read “Software Defined Storage OR Hyper-Converged with commodity hardware “)
- Hyperscale server SAN (Read “Public cloud”)
It is a known fact within the storage industry that public cloud storage platforms underpinned by cheap, commodity hardware and intelligent software provide users with an easy to consume, easily available and most importantly non-CAPEX storage platform that most legacy storage vendors find hard to compete with. As such, the net new growth in the global storage revenue as a whole from around 2012 has been predominantly within the public cloud (Hyperscaler) space while the rest of the storage market (non-public cloud enterprise storage) as a whole has somewhat stagnated.
This somewhat stagnated market was traditionally dominated by a few storage stalwarts such as EMC, NetApp, Dell, HPe…etc. However the rise of the server based SAN solutions where commodity servers with local drives combined with intelligent software to make a virtual SAN / storage pool (SDS/HCI technologies) has made matters worse for these legacy storage vendors and such storage solutions are projected to eat further in to the traditional enterprise storage landscape within next 4 years. This is already evident by the recent popularity & growth of such SDS/HCI solutions such as VMware VSAN, Nutanix, Scality, HedVig while at the same time, traditional storage vendors announcing reducing storage revenue. So much so that even some of the legacy enterprise storage vendors like EMC & HPe have come up with their own SDS / HCI offerings (EMC Vipr, HPe StoreVirtual, annoucement around SolidFire based HCI solution…etc.) or partnered up with SDS/HCI vendors (EMC VxRail, VxRail…etc.) to hedge their bets against a loosing back drop of traditional enterprise storage.
If you study the forecast in to the future, around 2020-2022, it is estimated that the traditional enterprise storage market revenue & market share will be even further squeezed by even more rapid growth of the server based SAN solutions such as SDS and HCI solutions. (Good luck to legacy storage folks)
An estimate from EMC suggest that by 2020, all primary storage for production applications would sit on flash based drives, which precisely co-inside with the timelines in the above forecast where the growth of Enterprise server SAN storage is set to accelerate between 2019-2022. According to Intel, one of the main reasons behind this forecasted increase of revenue (growth) on the enterprise server SAN solutions is estimated to be the developments of Non-Volatile Memory (NVMe) based technologies which makes it possible achieve very low latency through direct attached (read “locally attach”) NVMe drives along with clever & efficient software that are designed to harness this low latency. In other words, drop of latency when it comes to drive access will make Enterprise server SAN solutions more appealing to customers who will look at Software Defined, Hyper-Converged storage solutions in favour of external, array based storage solutions in to the immediate future and legacy storage market will continue to shrink further and further.
I can relate to this prediction somewhat as I work for a channel partner of most of these legacy storage vendors and I too have seen first hand the drop of legacy storage revenue from our own customers which reasonably backs this theory.
Challenges?
With the increasing push for Hyper-Convergence with data locality, the latency becomes an important consideration. As such, Intel’s (& the rest of the storage industry’s) main focus going in to the future is primarily around reducing the latency penalty applicable during a storage IO cycle, as much as possible. The imminent release of this next gen storage media from Intel as a better alternative to NAND (which comes with inherent challenges such as tail latency issues which are difficult to get around) was mentioned without any specific details. I’m sure that was a reference to the Intel 3D XPoint drives (Only just this week announced officially by Intel http://www.intel.com/content/www/us/en/solid-state-drives/optane-solid-state-drives-dc-p4800x-series.html) and based on the published stats, the projected drive latencies are in the region of < 10μs (sequential IO) and < 200μs (random IO) which is super impressive compared to today’s ordinary NVMe SSD drives that are NAND based. This however presents a concern as the current storage software stack that process the IO through the CPU via costly context switching also need to be optimised in order to truly benefit from this massive drop in drive latency. In other words, the level of dependency on the CPU for IO processing need to be removed or minimised through clever software optimisation (CPU has long been the main IO bottleneck due to how MSI-X interrupts are handled by the CPU during IO operations for example). Without this, the software induced latency would be much higher than the drive media latency during an IO processing cycle which will contribute to an overall higher latency still. (My friend & fellow #SFD12 delegate Glenn Dekhayser described this in his blog as “the media we’re working with now has become so responsive and performant that the storage doesn’t want to wait for the CPU anymore!” which is very true).
Furthermore,
Storage Performance Development Kit (SPDK)
Some companies such as Excelero are also addressing this CPU dependency of the IO processing software stack by using NVMe drives and clever software to offload processing from CPU to NVMe drives through technologies such as RDDA (Refer to the post I did on how Excelero is getting around this CPU dependency by reprogramming the MSI-X interrupts to not go to the CPU). SPDK is Intel’s answer to this problem and where as Excelero’s RDDA architecture primarily avoid CPU dependency by bypassing CPU for IOs, Intel SPDK minimizes the impact on CPU & Memory bus cycles during IO processing by using the user-mode for storage applications rather than the kernel mode, thereby removing the need for costly context switching and the related interrupt handling overhead. According to http://www.spdk.io/, “The bedrock of the SPDK is a user space, polled mode, asynchronous, lockless NVMe driver that provides highly parallel access to an SSD from a user space application.”
With SPDK, Intel claims that you can reach up to around 3.6million IOPS per single Xeon CPU core before it ran out of PCI lane bandwidth which is pretty impressive. Below is a IO performance benchmark based on a simple test of CentOS Linux kernel IO performance (Running across 2 x Xeon E5-2965 2.10 GHz CPUs each with 18 cores + 1-8 x Intel P3700 NVMe SSD drives) Vs SPDK with a single dedicated 2.10 GHz core allocated out of the 2 x Xeon E5-2965 for IO. You can clearly see the significantly better IO performance with SPDK, which, despite having just a single core, due to the lack of context switching and the related overhead, is linearly scaling the IO throughput in line with the number of NVMe SSD drives.
(In addition to these testing, Jonathan also mentioned that they’ve done another test with Supermicro off the shelf HW and with SPDK & 2 dedicated cores for IO, they were able to get 5.6 million IOPS before running out of PCI bandwidth which was impressive)
SPDK Applications & My Thoughts
SPDK is an end-to-end reference storage architecture & a set of drivers (C libraries & executables) to be used by OEMs and ISV’s when integrating disk hardware. According to Intel’s SPDK introduction page, the goal of the SPDK is to highlight the outstanding efficiency and performance enabled by using Intel’s networking, processing and storage technologies together. SPDK is available freely as an open source product that is available to download through GitHub. It also provide NVMeF (NVMe Over Fabric) and iSCSI servers to be built using the SPDK architecture, on top of the user space drivers that are even capable of servicing disks over the network. Now this can potentially revolutionise how the storage industry build their next generation storage platforms. Consider for example any SDS or even a legacy SAN manufacturer using this architecture to optimise the CPU on their next generation All Flash storage array? (Take NetApp All Flash FAS platform for example, they are known to have a ton of software based data management services available within OnTAP that are currently competing for CPU cycles with IO and often have to scale down data management tasks during heavy IO processing. With Intel DPDK architecture for example, OnTAP can free up more CPU cycles to be used by more data management services and even double up on various other additional services too without any impact on critical disk IO? I mean its all hypothetical of course as I’m just thinking out loud here. Of course it would require NetApp to run OnTAP on Intel CPUs and Intel NVMe drives…etc but it’s doable & makes sense right? I mean imagine the day where you can run “reallocate -p” during peak IO times without grinding the whole SAN to a halt? :-). I’m probably exaggerating its potential here but the point here though is that SDPK driven IO efficiencies can apply same to all storage array manufacturers (especially all flash arrays) where they can potentially start creating some super efficient, ultra low latency, NVMe drive based storage arrays and also include a ton of data management services that would have been previously too taxing on CPU (think inline de dupe, inline compression, inline encryption, everything inline…etc.) that’s on 24×7 by default, not just during off peak times due to zero impact on disk IO?
Another great place to apply SPDK is within virtualisation for VM IO efficiency. Using SPDK with QEMU as follows has resulted in some good IO performance to VM’s
I mean imagine for example, a VMware VSAN driver that was built using the Intel DPDK architecture running inside the user space using a dedicated CPU core that will perform all IO and what would be the possible IO performance? VMware currently performs IO virtualisation in kernel right now but imagine if SPDK was used and IO virtualisation for VSAN was changed to SW based, running inside the user-space, would it be worth the performance gain and reduced latency? (I did ask the question and Intel confirmed there are no joint engineering currently taking place on this front between 2 companies). What about other VSA based HCI solutions, especially take someone like Nutanix Acropolis where Nutanix can happily re-write the IO virtualisation to happen within user-space using SPDK for superior IO performance?
Intel & Alibaba cloud case study where the use of SPDK was benchmarked has given the below IOPS and latency improvements
NVMe over Fabric is also supported with SPDK and some use cases were discussed, specifically relating to virtualisation where VM’s tend of move between hosts and a unified NVMe-oF API that talk to local and remote NVMe drives being available now (some part of the SPDK stack becoming available in Q2 FY17)
Using the SPDK seems quite beneficial for existing NAND media based NVMe storage, but most importantly for newer generation non-NAND media to bring the total overall latency down. However that does mean changing the architecture significantly to process IO in user-mode as opposed to kernel-mode which I presume is how almost all storage systems, Software Defined or otherwise work and I am unsure whether changing them to be user-mode with SPDK is going to be a straight forward process. It would be good to see some joint engineering or other storage vendors evaluating the use of SPDK though to see if the said latency & IO improvements are realistic in complex storage solution systems.
I like the fact that Intel has made the SPDK OpenSource to encourage others to freely utilise (& contribute back to) the framework too but I guess what I’m not sure about is whether its tied to Intel NVMe drives & Intel processors.
If anyone wants to watch the recorded video of our session from # SFD12 the links are as follows
- Jonathan’s session on SPDK
- Tony’s session on RDT
Cheers
Chan
#SFD12 #TechFieldDay @IntelStorage