NVMe Key-Value Standard Q&A

Last month, Bill Martin, SNIA Technical Council Co-Chair, presented a detailed update on what’s happening in the development and deployment of the NVMe Key-Value standard. Bill explained where Key Value fits within an architecture, why it’s important, and the standards work that is being done between NVM Express and SNIA. The webcast was one of our highest rated. If you missed it, it’s available on-demand along with the webcast slides. Attendees at the live event had many great questions, which Bill Martin has answered here:

Q. Two of the most common KV storage mechanisms in use today are AWS S3 and RocksDB. How does NVMe KV standards align or differ from them? How difficult would it be to map between the APIs and semantics of those other technologies to NVMe KV devices?

Read More

25 Questions (and Answers) on Ethernet-attached SSDs

The SNIA Networking Storage Forum celebrated St. Patrick’s Day by hosting a live webcast, Ethernet-attached SSDs – Brilliant Idea or Storage Silliness?” Even though we didn’t serve green beer during the event, the response was impressive with hundreds of live attendees who asked many great questions – 25 to be exact. Our expert presenters have answered them all here:

Q. Has a prototype drive been built today that includes the Ethernet controller inside the NVMe SSD?

Read More

Are Ethernet-attached SSDs Brilliant?

Several solid state disk (SSD) and networking vendors have demonstrated ways to connect SSDs directly to an Ethernet network. They propose that deploying Ethernet SSDs will be more scalable, easier to manage, higher performance, and/or lower cost than traditional storage networking solutions that use a storage controller (or hyperconverged node) between the SSDs and the network.

Who would want to attach SSDs directly to the network? Are these vendors brilliant or simply trying to solve a problem that doesn’t exist? What are the different solutions that could benefit from Ethernet SSDs? Which protocols would one use to access them? How will orchestration be used to enable applications to find assigned Ethernet SSDs? How will Ethernet SSDs affect server subsystems such as Ethernet RAID/mirroring and affect solution management such as Ethernet SAN orchestration?  And how do Ethernet SSDs relate to computational storage?  

Read More

Hyperscalers Take on NVMe™ Cloud Storage Questions

Our recent webcast on how Hyperscalers, Facebook and Microsoft are working together to merge their SSD drive requirements generated a lot of interesting questions. If you missed “How Facebook & Microsoft Leverage NVMe Cloud Storage” you can watch it on-demand. As promised at our live event. Here are answers to the questions we received.

Q. How does Facebook or Microsoft see Zoned Name Spaces being used?

Read More

The Blurred Lines of Memory and Storage – A Q&A

The lines are blurring as new memory technologies are challenging the way we build and use storage to meet application demands. That’s why the SNIA Networking Storage Forum (NSF) hosted a “Memory Pod” webcast is our series, “Everything You Wanted to Know about Storage, but were too Proud to Ask.” If you missed it, you can watch it on-demand here along with the presentation slides  we promised.

Q. Do tools exist to do secure data overwrite for security purposes?

A. Most popular tools are cryptographic signing of the data where you can effectively erase the data by throwing away the keys. There are a number of technologies available; for example, the usual ones like BitLocker (part of Windows 10, for example) where the NVDIMM-P is tied to a specific motherboard. There are others where the data is encrypted as it is moved from NVDIMM DRAM to flash for the NVDIMM-N type. Other forms of persistent memory may offer their own solutions. SNIA is working on a security model for persistent memory, and there is a presentation on our work here.

Q. Do you need to do any modification on OS or application to support Direct Access (DAX)?

A. No, DAX is a feature of the OS (both Windows and Linux support it). DAX enables direct access to files stored in persistent memory or on a block device. Without DAX support in a file system, the page cache is generally used to buffer reads and writes to files, and DAX avoids that extra copy operation by performing reads and writes directly to the storage device.

Q. What is the holdup on finalizing the NVDIMM-P standard? Timeline?

A. The DDR5 NVDIMM-P standard is under development.

Q. Do you have a webcast on persistent memory (PM) hardware too?

A. Yes. The snia.org website has an educational library with over 2,000 educational assets. You can search for material on any storage-related topic. For instance, a search on persistent memory will get you all the presentations about persistent memory.

Q. Must persistent memory have Data Loss Protection (DLP)

A. Since it’s persistent, then the kind of DLP is the kind relevant for other classes of storage. This presentation on the SNIA Persistent Memory Security Threat Model covers some of this.

Q. Traditional SSDs are subject to “long tail” latencies, especially as SSDs fill and writes must be preceded by erasures. Is this “long-tail” issue reduced or avoided in persistent memory?

A. As PM is byte addressable and doesn’t require large block erasures, the flash kind of long tail latencies will be avoided. However, there are a number of proposed technologies for PM, and the read and write latencies and any possible long tail “stutters” will depend on their characteristics.

Q. Does PM have any Write Amplification Factor (WAF) issues similar to SSDs?

A. The write amplification (WA) associated with non-volatile memory (NVM) technologies comes from two sources.

  1. When the NVM material cannot be modified in place but requires some type of “erase before write” mechanism where the erasure domain (in bytes) is larger than the writes from the host to that domain.
  2. When the atomic unit of data placement on the NVM is larger than the size of incoming writes. Note the term used to denote this atomic unit can differ but is often referred to as a page or sector.

NVM technologies like the NAND used in SSDs suffer from both sources 1 and 2. This leads to very high write amplification under certain workloads, the worst being small random writes. It can also require over provisioning; that is, requiring more NVM internally than is exposed to the user externally.

Persistent memory technologies (for example Intel’s 3DXpoint) only suffer from source 2 and can in theory suffer WA when the writes are small. The severity of the write amplification is dependent on how the memory controller interacts with the media. For example, current PM technologies are generally accessed over a DDR-4 channel by an x86 processor. x86 processors send 64 bytes at a time down to a memory controller, and can send more in certain cases (e.g. interleaving, multiple channel parallel writes, etc.). This makes it far more complex to account for WA than a simplistic random byte write model or in comparison with writing to a block device.

Q. Persistent memory can provide faster access in comparison to NAND FLASH, but the cost is more for persistent memory. What do you think on the usability for this technology in future?

A. Very good. See this presentation “MRAM, XPoint, ReRAM PM Fuel to Propel Tomorrow’s Computing Advances” by analysts, Tom Coughlin and Jim Handy for an in-depth treatment.

Q. Does PM have a ‘lifespan’ similar to SSDs (e.g. 3 years with heavy writes, 5 years)?

A. Yes, but that will vary by device technology and manufacturer. We expect the endurance to be very high; comparable or better than the best of flash technologies.

Q. What is the performance difference between fast SSD vs “PM as DAX?”

A. As you might expect us to say; it depends. PM via DAX is meant as a bridge to using PM natively, but you might expect to have improved performance from PM over NVMe as compared with a flash based SSD, as the latency of PM is much lower than flash; micro-seconds as opposed to low milliseconds.

Q. Does DAX work the same as SSDs?

A. No, but it is similar. DAX enables efficient block operations on PM similar to block operations on an SSD.

Q. Do we have any security challenges with PME?

A. Yes, and JEDEC is addressing them. Also see the Security Threat Model presentation here.

Q. On the presentation slide of what is or is not persistent memory, are you saying that in order for something to be PM it must follow the SNIA persistent memory programming model? If it doesn’t follow that model, what is it?

A. No, the model is a way of consuming this new technology. PM is anything that looks like memory (it is byte addressable via CPU load and store operations) and is persistent (it doesn’t require any external power source to retain information).

Q. DRAM is basically a capacitor. Without power, the capacitor discharges and so the data is volatile. What exactly is persistent memory? Does it store data inside DRAM or it will use FLASH to store data?

A. The presentation discusses two types of NVDIMM; one is based on DRAM and a flash backup that provides the persistence (that is NVDIMM-N), and the other is based on PM technologies (that is NVDIMM-P) that are themselves persistent, unlike DRAM.

Q. Slide 15: If Persistent memory is fast and can appear as byte-addressable memory to applications, why bother with PM needing to be block addressed like disks?

A. Because it’s going to be much easier to support applications from day one if PM can be consumed like very fast disks. Eventually, we expect PM to be consumed directly by applications, but that will require them to be upgraded to take advantage of it.

Q. Can you please elaborate on byte and block addressable?

A. Block addressable is the way we do I/O; that is, data is read and written in large blocks of data, typically 4Kbytes in size. Disk interfaces like SCSI or NVMe take commands to read and write these blocks of data to the external device by transferring the data to and from CPU memory, normally DRAM. Byte addressable means that we’re not doing any I/O at all; the CPU instructions for loading & storing fast registers from memory are used directly on PM. This removes an entire software stack to do the I/O, and means we can efficiently work on much smaller units of data; down to the byte as opposed to the fixed 4Kb demanded by I/O interfaces. You can learn more in our presentation “File vs. Block vs. Object Storage.”

There are now 10 installments of the “Too Proud to Ask” webcast series, covering these topics:

If you have an idea for an “Everything You Wanted to Know about Storage, but were too Proud to Ask” presentation, please let comment on this blog and the NSF team will put it up for consideration.

Q&A – When Compute, Networking and Storage Intersect

In Part Vermillion of our SNIA Ethernet Storage Forum (ESF) “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series – we examined the terms and concepts are at the heart of where compute, networking and storage intersect. That’s why we called it “What if Programming and Networking Had a Storage Baby” If you missed the live webcast, you can watch it on-demand.

The discussion from our panel of experts generated a lot of good questions. As promised, here are answers to them all. Read More

The Too Proud to Ask Train Makes Another Stop: Where Does My Data Go?

By now, we at the SNIA Storage Ethernet Storage Forum (ESF) hope you are familiar with (perhaps even a loyal fan of) the “Everything You Wanted To Know About Storage But Were Too Proud To Ask,” popular webcast series. On August 1st, the “Too Proud to Ask” train will make another stop. In this seventh session, “Everything You Wanted to Know About Storage But Were Too Proud To Ask: Turquoise – Where Does My Data Go?, we will take a look into the mysticism and magic of what happens when you send your data off into the wilderness. Once you click “save,” for example, where does it actually go?

When we start to dig deeper beyond the application layer, we often don’t understand what happens behind the scenes. It’s important to understand multiple aspects of the type of storage our data goes to along with their associated benefits and drawbacks as well as some of the protocols used to transport it.

In this webcast we will explain:

  • Volatile v Non-Volatile v Persistent Memory
  • NVDIMM v RAM v DRAM v SLC v MLC v TLC v NAND v 3D NAND v Flash v SSDs v NVMe
  • NVMe (the protocol)

Many people get nervous when they see that many acronyms, but all too often they come up in conversation, and you’re expected to know all of them? Worse, you’re expected to know the differences between them, and the consequences of using them? Even worse, you’re expected to know what happens when you use the wrong one?

We’re here to help.

It’s an ambitious project, but these terms and concepts are at the heart of where compute, networking and storage intersect. Having a good grasp of these concepts ties in with which type of storage networking to use, and how data is actually stored behind the scenes.

Register today to join us for this edition of the “Too Proud To Ask” series, as we work towards making you feel more comfortable in the strange, mystical world of storage. And don’t let pride get in the way of asking any and all questions on this great topic. We will be there on August 1st to answer them!

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

 

 

 

Latency Budgets for Solid State Storage Access

New solid state storage technologies are forcing the industry to refine distinctions between networks and other types of system interconnects.   The question on everyone’s mind is: when is it beneficial to use networks to access solid state storage, particularly persistent memory?

It’s not quite as simple as a “yes/no” answer. The answer to this question involves application, interconnect, memory technology and scalability factors that can be analyzed in the context of a latency budget.

On April 19th, Doug Voigt, Chair SNIA NVM Programming Model Technical Work Group, returns for a live SNIA Ethernet Storage Forum webcast, “Architectural Principles for Networked Solid State Storage Access – Part 2where we will explore latency budgets for various types of solid state storage access.  These can be used to determine which combinations of interconnects, technologies and scales are compatible with Load/Store instruction access and which are better suited to IO completion techniques such as polling or blocking.

In this webcast you’ll learn:

  • Why latency is important in accessing solid state storage
  • How to determine the appropriate use of networking in the context of a latency budget
  • Do’s and don’ts for Load/Store access

This is a technical seminar built upon part 1 of this series. If you missed it, you can view it on demand at your convenience. It will give you a solid foundation on this topic, outlining key architectural principles that allow us to think about the application of  networked  solid state technologies more systematically.

I hope you will register today for the April 19th event. Doug and I will be on hand to answer questions on the spot.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Principles of Networked Solid State Storage – Q&A

At this month’s SNIA Ethernet Storage Forum Webcast, “Architectural Principles for Networked Solid State Storage Access,” Doug Voigt, Chair of the SNIA NVM Programming Technical Working Group, and a member of the SNIA Technical Council, outlined key architectural principles surrounding the application of  networked  solid state technologies. We had a flurry of questions near the end of the Webcast that we did not have enough time to answer. Here are Doug’s answers to all the questions we received during the event:

Q. Are there wait cycles in accessing persistent memory?

A. It depends entirely on which persistent memory (PM) technology is being accessed and how the memory interconnect is used.   Some technologies have write times that are quite different from read times.   When using tightly timed interconnects such as DDR with those technologies it may be difficult to avoid wait cycles.

Q. How do Pmalloc and malloc share the virtual address space of the application?

A. This is entirely up to the OS and other libraries operating within any constraints of the processor architecture-specific memory management units.   A good mental model would be fairly large regions of contiguous address space in both the physical and virtual domains, where each region will comprise a single type of memory. Capacity will be reserved for pmalloc and malloc in the appropriate regions.

Q. Always flush after doing your memory-mapped IO.   Is that simply good hygiene?

A. Not exactly. The term “Memory Mapped IO” is used to reference control plane (as opposed to data plane) access.   It is often reasonable to set up control plane memory as uncacheable. The need for strict order of access to physical control plane registers is so pervasive that caching is generally not useful. Uncacheable writes are always flushed by the processor, as opposed to the application.

Generally with memory mapped IO devices the data plane uses direct memory access (DMA).   With memory mapped files (as opposed to memory mapped IO) Load/Store (more commonly referred to as “Ld/St”), not DMA, is used in the data plane. Disabling caching in the data plane is generally a big performance sacrifice for small byte range access.

In the Ld/St datapath, strategically placed flushing is required to retain both performance and power failure recovery. The SNIA NVM Programming Model describes this type of functionality.

Q. Once NVDIMM support become pervasive with support from NVMe drives in the server box, should network storage be more focused on SAS Flash or just SAS HDDs?

A. Not necessarily.   NVMe over Fabric, Fibre Channel and iSCSI are also types of networked storage that will likely retain significant market share relative to SAS.

Q. Are the ‘Big Data’ Data Warehouse applications starting to use the persistence memory and domain technologies in their applications?

A. It is too early to see much of this yet. PM technologies might become a priority as a staging area for analytic applications with high ingest or checkpoint rates. NVDIMMs are likely to be too expensive to store anything “big” for quite a while.

Q. Also, is the persistence memory/domains being used in the Hyper-converged and Converged hardware infrastructures?

A. Persistent memory is quintessentially (Hyper-) converged.   It wouldn’t be unreasonable to expect some traction with hyper-converged solutions that experience high storage-performance demand.

Q. What distance would you associate with 10’s of microseconds?

A. In terms of transmission delay, 10’s of uS align with a campus or small city scale, but the distance itself is often not the primary factor.   Switching delays, transmission line properties and software overhead are generally bigger factors.

Q. So latency would be the binding factor for distances…not a question, an observation.

A. Yes, in effect, either through transmission or relay.   See above.

Q. Aren’t there multi-threaded SSDs?

A. Yes, but since the primary metric in this presentation is latency we ignore multi-threading.   It can enable more work to get done, but it generally increases latency rather than reducing it.

Q. Is Pmalloc universal usage?

A. The term is starting to be recognized among developers and has been used in research. Various similar names have been used in early research prototypessuch as pmalloc in Mnemosyne and nvmalloc in SCMFS.

Q. So how would PM help in a (stock broking) requirement, where we currently prophesize an RDMA or iWARP solution?

A. With PM the answer is always lower latency.   PM can be litegrated like memory or like flash. RDMA network paths for both of these options were discussed in the presentation. In either case, PM is low-latency enough that networking and software overheads will completely determine performance, even when using RDMA. The performance boost from PM is greatest when it is accessed locally.   If remote access is a requirement then the new work being done in the RDMA community should help.

Q. If data stored in memory requires to be copied to a different host, memory (for consistency) how does PM assist, or is there an extension to PM? Coherency between multiple hosts in a cluster, if you will?

A. PM technology does not help with this; the methods of managing consistency across hosts remain unchanged by PM.   All PM offers is low latency persistence.

Coordination across hosts or nodes in a cluster must use existing clustering techniques such as locking and quorums. In addition, the relative timescales of memory access and network communication suggest the application of asynchronous remote replication techniques used in today’s storage solutions.

Regarding coherency, PM brings nothing new to the known techniques for managing coherency.   Classical cluster architecture must be applied outside of symmetric multi-processing coherency domains. Within coherency domains, all of the logic is above the PM level in a processor side memory controller or a software emulation of the same algorithms.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.