Data Reduction: Don’t Be Too Proud to Ask

It’s back! Our SNIA Networking Storage Forum (NSF) webcast series “Everything You Wanted to Know About Storage but Were Too Proud to Ask” will return on August 18, 2020. After a little hiatus, we are going to tackle the topic of data reduction.

Everyone knows data volumes are growing rapidly (25-35% per year according to many analysts), far faster than IT budgets, which are constrained to flat or minimal annual growth rates. One of the drivers of such rapid data growth is storing multiple copies of the same data. Developers copy data for testing and analysis. Users email and store multiple copies of the same files. Administrators typically back up the same data over and over, often with minimal to no changes.

To avoid a budget crisis and paying more than once to store the same data, storage vendors and customers can use data reduction techniques such as deduplication, compression, thin provisioning, clones, and snapshots. 

On August 18th, our live webcast “Everything You Wanted to Know about Storage but Were Too Proud to Ask – Part Onyx” will focus on the fundamentals of data reduction, which can be performed in different places and at different stages of the data lifecycle. As with most technologies, there are several related ways to do this, with enough differences among them to cause confusion. For that reason, we’re going to be looking at:

Read More

Object Storage Questions: Asked and Answered

Last month, the SNIA Networking Storage Forum (NSF) hosted a live webcast, “Object Storage: What, How and Why.” As the title suggests, our NSF members and invited guest experts delivered foundational knowledge on object storage, explaining how object storage works, use cases, and standards. They even shared a little history on how object storage originated.  If you missed the live event, you can watch the on-demand webcast or find it on our SNIAVideo YouTube Channel.  

We received some great questions from our live audience. As promised, here are the answers to them all.

Read More

Why Object Storage is Important

Object storage is a secure, simple, scalable, and cost-effective means of embracing the explosive growth of unstructured data enterprises generate every day. Object storage adoption is on the rise. That’s why the SNIA Networking Storage Forum (NSF) is hosting “Object Storage: What, How and Why.” This webcast, with experts Chris Evans of Bookend LTD, Rick Vanover of Veeam, and Alex McDonald, Vice Chair of SNIA NSF and NetApp, will explain how object storage works, its benefits and why it’s important.

Like other storage technologies, object storage brings its own set of unique characteristics to the market. Join us on February 19th at 10:00 am PT/1:00 pm ET to learn:

Read More

The Blurred Lines of Memory and Storage – A Q&A

The lines are blurring as new memory technologies are challenging the way we build and use storage to meet application demands. That’s why the SNIA Networking Storage Forum (NSF) hosted a “Memory Pod” webcast in our series, “Everything You Wanted to Know about Storage, but were too Proud to Ask.” If you missed it, you can watch it on-demand here along with the presentation slides we promised.

Q. Do tools exist to do secure data overwrite for security purposes?

A. The most popular tools rely on cryptographic erasure: the data is stored encrypted, and you can effectively erase it by throwing away the keys. There are a number of technologies available, such as BitLocker (part of Windows 10), where the NVDIMM-P is tied to a specific motherboard. There are others where the data is encrypted as it is moved from NVDIMM DRAM to flash for the NVDIMM-N type. Other forms of persistent memory may offer their own solutions. SNIA is working on a security model for persistent memory, and there is a presentation on our work here.

Q. Do you need to do any modification on OS or application to support Direct Access (DAX)?

A. No, DAX is a feature of the OS (both Windows and Linux support it). DAX enables direct access to files stored in persistent memory or on a block device. Without DAX support in a file system, the page cache is generally used to buffer reads and writes to files, and DAX avoids that extra copy operation by performing reads and writes directly to the storage device.
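
For a sense of what this looks like in practice, here is a minimal C sketch of mapping a file on a DAX-enabled Linux file system (for example, ext4 or XFS mounted with -o dax). The path /mnt/pmem/data.bin is an assumption for illustration; MAP_SYNC (Linux 4.15 and later) requests a direct, page-cache-free mapping and fails if the file system cannot provide one:

```c
/* Minimal sketch: mapping a file on a DAX-mounted file system (Linux).
 * Assumes a file system mounted with "-o dax" on persistent memory. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Path is an assumption: a file on a DAX-capable file system. */
    int fd = open("/mnt/pmem/data.bin", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;
    if (ftruncate(fd, len) != 0) { perror("ftruncate"); return 1; }

    /* MAP_SYNC is only valid with MAP_SHARED_VALIDATE; mmap fails if
     * the file system cannot provide a direct (page-cache-free) DAX
     * mapping. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* CPU loads and stores now reach the media directly: no read()
     * or write() system calls, and no extra page-cache copy. */
    strcpy(p, "hello, persistent world");

    munmap(p, len);
    close(fd);
    return 0;
}
```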

Q. What is the holdup on finalizing the NVDIMM-P standard? Timeline?

A. The DDR5 NVDIMM-P standard is still under development at JEDEC; no firm timeline has been announced.

Q. Do you have a webcast on persistent memory (PM) hardware too?

A. Yes. The snia.org website has an educational library with over 2,000 educational assets. You can search for material on any storage-related topic. For instance, a search on persistent memory will get you all the presentations about persistent memory.

Q. Must persistent memory have Data Loss Protection (DLP)?

A. Since it’s persistent, the kind of DLP that applies is the same kind relevant to other classes of storage. This presentation on the SNIA Persistent Memory Security Threat Model covers some of this.

Q. Traditional SSDs are subject to “long tail” latencies, especially as SSDs fill and writes must be preceded by erasures. Is this “long-tail” issue reduced or avoided in persistent memory?

A. As PM is byte addressable and doesn’t require large block erasures, the long-tail latencies typical of flash are avoided. However, there are a number of proposed technologies for PM, and the read and write latencies, and any possible long-tail “stutters,” will depend on their characteristics.

Q. Does PM have any Write Amplification Factor (WAF) issues similar to SSDs?

A. The write amplification (WA) associated with non-volatile memory (NVM) technologies comes from two sources.

  1. When the NVM material cannot be modified in place but requires some type of “erase before write” mechanism where the erasure domain (in bytes) is larger than the writes from the host to that domain.
  2. When the atomic unit of data placement on the NVM is larger than the size of incoming writes. Note the term used to denote this atomic unit can differ but is often referred to as a page or sector.

NVM technologies like the NAND used in SSDs suffer from both sources 1 and 2. This leads to very high write amplification under certain workloads, the worst being small random writes. It can also require over-provisioning; that is, requiring more NVM internally than is exposed to the user externally.

Persistent memory technologies (for example, Intel’s 3D XPoint) only suffer from source 2 and can in theory suffer WA when the writes are small. The severity of the write amplification depends on how the memory controller interacts with the media. For example, current PM technologies are generally accessed over a DDR4 channel by an x86 processor. x86 processors send 64 bytes at a time down to a memory controller, and can send more in certain cases (e.g. interleaving, multiple-channel parallel writes, etc.). This makes it far more complex to account for WA than a simplistic random byte-write model or in comparison with writing to a block device.
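
As a back-of-the-envelope illustration of source 2, the sketch below assumes a hypothetical PM medium with a 256-byte internal write unit receiving 64-byte cache-line writes from the host; the numbers are illustrative, not the specification of any real device:

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions, not any real device's spec. */
    double media_unit = 256.0; /* bytes the media must write per update */
    double host_write = 64.0;  /* one x86 cache line sent by the host   */

    /* Worst case: every 64-byte store touches a different 256-byte
     * unit, so the media writes 256 bytes per 64 bytes of host data. */
    double wa_worst = media_unit / host_write;

    /* Best case: four sequential cache lines exactly fill one unit. */
    double wa_best = media_unit / (4.0 * host_write);

    printf("write amplification: worst %.1f, best %.1f\n",
           wa_worst, wa_best);
    return 0;
}
```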

Q. Persistent memory can provide faster access than NAND flash, but at a higher cost. What do you think about the usability of this technology in the future?

A. A very good question. See this presentation, “MRAM, XPoint, ReRAM PM Fuel to Propel Tomorrow’s Computing Advances,” by analysts Tom Coughlin and Jim Handy for an in-depth treatment.

Q. Does PM have a ‘lifespan’ similar to SSDs (e.g. 3 years with heavy writes, 5 years)?

A. Yes, but that will vary by device technology and manufacturer. We expect the endurance to be very high: comparable to or better than the best of flash technologies.

Q. What is the performance difference between fast SSD vs “PM as DAX?”

A. As you might expect us to say: it depends. PM via DAX is meant as a bridge to using PM natively, but you might expect improved performance from PM over NVMe compared with a flash-based SSD, as the latency of PM is much lower than that of flash; microseconds as opposed to low milliseconds.

Q. Does DAX work the same as SSDs?

A. No, but it is similar. DAX enables efficient block operations on PM similar to block operations on an SSD.

Q. Do we have any security challenges with PME?

A. Yes, and JEDEC is addressing them. Also see the Security Threat Model presentation here.

Q. On the presentation slide of what is or is not persistent memory, are you saying that in order for something to be PM it must follow the SNIA persistent memory programming model? If it doesn’t follow that model, what is it?

A. No, the model is a way of consuming this new technology. PM is anything that looks like memory (it is byte addressable via CPU load and store operations) and is persistent (it doesn’t require any external power source to retain information).

Q. DRAM is basically a capacitor. Without power, the capacitor discharges and so the data is volatile. What exactly is persistent memory? Does it store data inside DRAM or it will use FLASH to store data?

A. The presentation discusses two types of NVDIMM; one is based on DRAM and a flash backup that provides the persistence (that is NVDIMM-N), and the other is based on PM technologies (that is NVDIMM-P) that are themselves persistent, unlike DRAM.

Q. Slide 15: If Persistent memory is fast and can appear as byte-addressable memory to applications, why bother with PM needing to be block addressed like disks?

A. Because it’s going to be much easier to support applications from day one if PM can be consumed like very fast disks. Eventually, we expect PM to be consumed directly by applications, but that will require them to be upgraded to take advantage of it.

Q. Can you please elaborate on byte and block addressable?

A. Block addressable is the way we do I/O; that is, data is read and written in large blocks, typically 4 KB in size. Disk interfaces like SCSI or NVMe take commands to read and write these blocks of data to the external device by transferring the data to and from CPU memory, normally DRAM. Byte addressable means that we’re not doing any I/O at all; the CPU instructions for loading and storing fast registers from memory are used directly on PM. This removes an entire software stack needed to do the I/O, and means we can efficiently work on much smaller units of data, down to the byte, as opposed to the fixed 4 KB demanded by I/O interfaces. You can learn more in our presentation “File vs. Block vs. Object Storage.”
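
To make the contrast concrete, here is a minimal C sketch (the open file descriptor and the pre-mapped PM pointer are assumptions for illustration): changing a single byte on a block device means reading, modifying, and rewriting a whole 4 KB block through the I/O stack, while on byte-addressable PM it is a single CPU store:

```c
/* Two ways to change a single byte at offset `byte_off`. */
#include <stdint.h>
#include <unistd.h>

#define BLOCK 4096

/* Block addressable: one byte costs a read-modify-write of a whole
 * 4 KB block through the I/O stack (fd is an open block device/file). */
static void set_byte_block_device(int fd, off_t byte_off, uint8_t value)
{
    uint8_t buf[BLOCK];
    off_t blk = (byte_off / BLOCK) * BLOCK;

    if (pread(fd, buf, BLOCK, blk) != BLOCK)  /* read the full block */
        return;                               /* (error handling elided) */
    buf[byte_off % BLOCK] = value;            /* modify one byte in DRAM */
    (void)pwrite(fd, buf, BLOCK, blk);        /* write the block back */
}

/* Byte addressable: a plain CPU store into an already-mapped PM region
 * (pm is an assumption: a pointer returned by an earlier DAX mmap). */
static void set_byte_pm(uint8_t *pm, size_t byte_off, uint8_t value)
{
    pm[byte_off] = value;                     /* one store, no I/O stack */
}
```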

There are now 10 installments of the “Too Proud to Ask” webcast series, covering a wide range of storage topics.

If you have an idea for an “Everything You Wanted to Know about Storage, but were too Proud to Ask” presentation, please leave a comment on this blog and the NSF team will put it up for consideration.

Network Speeds Questions Answered

Last month, the SNIA Networking Storage Forum (NSF) hosted a webcast on how increases in networking speeds are impacting storage. If you missed the live webcast, New Landscape of Network Speeds, it’s now available on-demand. We received several interesting questions on this topic. Here are our experts’ answers:

Q. What are the cable distances for 2.5 and 5G Ethernet?

A. 2.5GBASE-T and 5GBASE-T Ethernet are designed to run on existing UTP cabling, so they should reach 100 meters on both Cat5e and Cat6 cabling. The reach of 5GBASE-T on Cat5e may be less under some conditions, for example if many cables are bundled tightly together. Cabling guidelines and field test equipment are available to aid in the transition.

Q. Any comments on why U.2 drives are so rare/uncommon in desktop PC usage? M.2 drives are very common in laptops and some desktops, but U.2’s larger capacity seems a better fit for desktops.

A. M.2 SSDs are more popular for laptops and tablets due to their small form factor and sufficient capacity. U.2 SSDs are used more often in servers, though some desktops and larger laptops also use a U.2 SSD for its larger capacity.

Q. What about using Active Copper cables to get a bit more reach over Passive Copper cables before switching to Active Optical cables?

A. Yes, active copper cables can provide longer reach than passive copper cables, but you have to look at the expense and power consumption. There may be many cases where using an active optical cable (AOC) will cost the same or less than an active copper cable.

Q. For 100Gb/s signaling (future standard) is it expected to work over copper cable (passive or active) or only optical?

A. Yes, though the maximum distances will be shorter. With 25Gb/s signaling the maximum copper cable length is 5m. With 50Gb/s signaling the longest copper cables are 3m long. With 100Gb/s we expect the longest copper cables will be about 2m long.

Q. So what do you see as the most prevalent LAN speed today and what do you see in next year or two?

A. For Ethernet, we see desktops mostly on 1Gb with some moving to 2.5G, 5Gb or 10Gb. Older servers are largely 10Gb but new servers are mostly using 25GbE or 50GbE, while the most demanding servers and fastest flash storage arrays have 100GbE connections. 200GbE will show up in a few servers starting in late 2019, but most 200GbE and 400GbE usage will be for switch-to-switch links during the next few years. In the world of Fibre Channel, most servers today are on 16G FC with a few running 32G and a few of the most demanding servers or fastest flash storage arrays using 64G. 128G FC for now will likely be just for switch-to-switch links. Finally for InfiniBand deployments, older servers are running FDR (56Gb/s) and newer servers are using EDR (100Gb/s). The very newest, fastest HPC and ML/AI servers are starting to use HDR (200Gb/s) InfiniBand.

If you’re new to SNIA NSF, we encourage you to check out the SNIA NSF webcast library. There you’ll find more than 60 educational, vendor-neutral on-demand webcasts produced by SNIA experts.

The Impact of New Network Speeds on Storage

In the last few years, Ethernet equipment vendors have announced big increases in line speeds, shipping 25, 50, and 100 gigabit-per-second (Gb/s) speeds and announcing 200/400 Gb/s. At the same time, Fibre Channel vendors have launched 32GFC, 64GFC and 128GFC technology while InfiniBand has reached 200Gb/s (called HDR) speed.

But who exactly is asking for these faster new networking speeds, and how will they use them? Are there servers, storage, and applications that can make good use of them? How are these new speeds achieved? Are new types of signaling, cables and transceivers required? How will changes in PCIe standards and bandwidth keep up? And do the faster speeds come with different distance limitations?

These are among the questions our panel of experts will answer at the next live SNIA Networking Storage Forum (NSF) webcast on May 21, 2019, “New Landscape of Network Speeds.” Join us to learn:

  • How these new speeds are achieved
  • Where they are likely to be deployed for storage
  • What infrastructure changes are needed to support them

Register today to save your spot. And don’t forget to bring your questions. Our experts will be available to answer them on the spot.

Everything You Wanted to Know about Memory

Many followers (dare we say fans?) of the SNIA Networking Storage Forum (NSF) are familiar with our popular webcast series “Everything You Wanted To Know About Storage But Were Too Proud To Ask.” If you’ve missed any of the nine episodes we’ve done to date, they are all available on-demand and provide a 101 lesson on a range of storage-related topics like buffers, storage controllers, iSCSI and more.

Our next “Too Proud to Ask” webcast on May 16, 2019 will be “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Taupe – The Memory Pod.” Traditionally, much of the IT infrastructure that we’ve built over the years can be divided fairly simply into storage (the place we save our persistent data), network (how we get access to the storage and get at our data) and compute (memory and CPU that crunches on the data). In fact, so successful has this model been that a trip to any cloud services provider allows you to order (and be billed for) exactly these three components.

The only purpose of storage is to persist the data between periods of processing it on a CPU. And the only purpose of memory is to provide a cache of fast accessible data to feed the huge appetite of compute. Currently, we build effective systems in a cost-optimal way by using appropriate quantities of expensive and fast memory (DRAM for instance) to cache our cheaper and slower storage. But fast memory has no persistence at all; it’s only storage that provides the application the guarantee that storing, modifying or deleting data does exactly that.

Memory and storage differ in other ways. For example, we load from memory to registers on the CPU, perform operations there, and then store the results back to memory by loading from and storing to byte addresses. This load/store technology is different from storage, where we tend to move data back and forth between memory and storage in large blocks, by using an API (application programming interface).

It’s clear the lines between memory and storage are blurring as new memory technologies are challenging the way we build and use storage to meet application demands. New memory technologies look like storage in that they’re persistent, if a lot faster than traditional disks or even Flash-based SSDs, but we address them in bytes, as we do memory like DRAM, if more slowly. Persistent memory (PM) lies between storage and memory in latency, bandwidth and cost, while providing memory semantics and storage persistence. In this webcast, our SNIA experts will discuss:

  • Fundamental terminology relating to memory
  • Traditional uses of storage and memory as a cache
  • How can we build and use systems based on PM?
  • Persistent memory over a network
  • Do we need a new programming model to take advantage of PM?
  • Interesting use cases for systems equipped with PM
  • How we might take better advantage of this new technology

Register today for this live webcast on May 16th. Our experts will be available to answer the questions that you should not be too proud to ask!

And if you’re curious to know why each of the webcasts in this series is associated with a different color (rather than a number), check out this SNIA NSF blog that explains it all.

Scale-Out File Systems FAQ

On February 28th, the SNIA Networking Storage Forum (NSF) took a look at what’s happening in Scale-Out File Systems. We discussed general principles, design considerations, challenges, benchmarks and more. If you missed the live webcast, it’s now available on-demand. We did not have time to answer all the questions we received at the live event, so here are answers to them all.

Q. Can scale-out file systems do Erasure coding?

A. Indeed, erasure coding is a common method to improve resilience.
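
As a minimal illustration of the idea, this C sketch implements single XOR parity, the degenerate case of erasure coding that tolerates one lost block; production scale-out file systems typically use more general Reed-Solomon codes (for example 8+2 or 8+3) that tolerate multiple simultaneous failures:

```c
/* Single XOR parity across NDATA blocks: lose any one block and it can
 * be rebuilt from the parity plus the survivors. Sizes are illustrative. */
#include <stdint.h>
#include <string.h>

#define NDATA 4   /* data blocks per stripe */
#define BLKSZ 8   /* bytes per block        */

/* parity[i] = XOR of byte i across all data blocks. */
static void make_parity(uint8_t data[NDATA][BLKSZ], uint8_t parity[BLKSZ])
{
    memset(parity, 0, BLKSZ);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < BLKSZ; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild one lost data block by XOR-ing the parity with the survivors. */
static void rebuild(uint8_t data[NDATA][BLKSZ], uint8_t parity[BLKSZ],
                    int lost)
{
    memcpy(data[lost], parity, BLKSZ);
    for (int d = 0; d < NDATA; d++)
        if (d != lost)
            for (int i = 0; i < BLKSZ; i++)
                data[lost][i] ^= data[d][i];
}
```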

Q. How does one address the problem of a specific disk going down? Where does scale-out architecture provide redundancy?

A. Disk failures are typically handled by RAID software. Some scale-out software also uses multiple replicas to mitigate the impact of disk failures.

Q. Are there use cases where a hybrid of these two styles is needed?

A. Yes. For example, in some environments the foundation layer might use dedicated storage servers to form a large storage pool, which is the first style, and then export LUNs or virtual disks to the compute nodes (either physical or virtual) that run the applications, which is the second style.

Q. Which scale-out file systems are available on Windows and Linux platforms?

A. Some of the scale-out file systems provide native client software across multiple platforms. Another approach is to use Samba to build SMB gateways to make the scale-out file system available to Windows computers.

Q. Is Amazon Elastic File System (EFS) on AWS a scale-out file system?

A. Please see:

https://docs.aws.amazon.com/efs/latest/ug/performance.html

“Amazon EFS file systems are distributed across an unconstrained number of storage servers, enabling file systems to grow elastically to petabyte scale and allowing massively parallel access from Amazon EC2 instances to your data. The distributed design of Amazon EFS avoids the bottlenecks and constraints inherent to traditional file servers.”

Q. Where are the most cost-effective price/performance uses of NVMe?

A. NVMe can support very high IOPS and very high throughput as well. The best use case would be to couple NVMe with high performance storage software that would not limit the NVMe.

The Ins and Outs of a Scale-Out File System Architecture

To meet the increasingly higher demand on both capacity and performance in large cluster computing environments, the storage subsystem has evolved toward a modular and scalable design. The scale-out file system has emerged as one implementation of the trend, in addition to scale-out object and block storage solutions.

What are the key principles when architecting a scale-out file system? Find out on February 28th when the SNIA Networking Storage Forum (NSF) hosts The Scale-Out File System Architecture Overview, a live webcast where we will present an overview of scale-out file system architectures. This presentation will provide an introduction to scale-out file systems and cover:

  • General principles when architecting a scale-out file system storage solution
  • Hardware and software design considerations for different workloads
  • Storage challenges when serving a large number of compute nodes, e.g. name space consistency, distributed locking, data replication, etc.
  • Use cases for scale-out file systems
  • Common benchmark and performance analysis approaches

Register today to save your spot. We hope you will join us.

RDMA for Persistent Memory over Fabrics – FAQ

In our most recent SNIA Networking Storage Forum (NSF) webcast, Extending RDMA for Persistent Memory over Fabrics, our expert speakers, Tony Hurson and Rob Davis, outlined extensions to RDMA protocols that confirm persistence and additionally can order successive writes to different memories within the target system. Hundreds of people have seen the webcast and have given it a 4.8 rating on a scale of 1-5! If you missed it, you can watch it on-demand at your convenience. The webcast slides are also available for download.

We had several interesting questions during the live event. Here are answers from our presenters:

Q. For the RDMA Message Extensions, does the client have to qualify a WRITE completion with only Atomic Write Response and not with Commit Response?

A. If an Atomic Write must be confirmed persistent, it must be followed by an additional Commit Request. Built-in confirmation of persistence was dropped from the Atomic Request because it adds latency and is not needed for some application streams.

Q. Why do you need confirmation for writes? From my point of view, the only thing required is ordering.

A. Agreed, but only if the entire target system is non-volatile! Explicit confirmation of persistence is required to cover the “gap” between the Write completing in the network and the data reaching persistence at the target.

Q. Where are these messages being generated? Does the NIC know when the data is flushed or committed?

A. They are generated by the application that has reserved the memory window on the remote node. It can write using RDMA writes to that window all it wants, but to guarantee persistence it must send a flush.
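
As a sketch of that flow, here is a hypothetical C outline. The function names are placeholders, not shipping verbs API calls, since the extension verbs were still working their way through the IBTA standards process at the time of this webcast:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder declarations only; NOT part of any shipping verbs API. */
void rdma_write(const void *buf, size_t len, uint64_t raddr, uint32_t rkey);
void rdma_commit(uint64_t raddr, size_t len, uint32_t rkey);
void wait_for_commit_response(void);

/* Replicate a local buffer into a reserved remote PM window and only
 * then treat it as durable. */
void replicate_to_remote_pm(const void *buf, size_t len,
                            uint64_t raddr, uint32_t rkey)
{
    /* 1. RDMA Write into the remote window. Completion means the data
     *    left the local NIC, not that it is persistent at the target. */
    rdma_write(buf, len, raddr, rkey);

    /* 2. Commit (flush) covering the written region; its response is
     *    the explicit confirmation that the data reached persistence. */
    rdma_commit(raddr, len, rkey);

    /* 3. Wait for the Commit Response before acknowledging durability
     *    (e.g., before replying to the application's own client).     */
    wait_for_commit_response();
}
```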

Q. How is RPM presented on the client host?

A. The application using it sees it as memory it can read and write.

Q. Does this RDMA commit response implicitly ACK any previous RDMA sends/writes to same or different MR?

A. Yes, the new Commit (and Verify and Atomic Write) Responses have the same acknowledgement coalescing properties as the existing Read Response. That is, a Commit Response is explicit (non-coalesced); but it coalesces/implies acknowledgement of prior Write and/or Send Requests.

Q. Does this one still have the current RDMA Write ACK?

A. See previous general answer. Yes. A Commit Response implicitly acknowledges prior Writes.

Q. With respect to the Race Hazard explained to show the need for an explicit completion response, wouldn’t this be the case even with volatile memory, if the data were to be stored there? Why is this completion status required only in the non-volatile case?

A. Most networked applications that write over the network to volatile memory do not require explicit confirmation at the writer endpoint that data has actually reached there. If so, additional handshake messages are usually exchanged between the endpoint applications. On the other hand, a writer to PERSISTENT memory across a network almost always needs assurance that data has reached persistence, thus the new extension.

Q. What if you are using multiple RNIC with multiple ports to multiple ports on a 100Gb fabric for server-to-server RDMA? How is order kept there…by CPU software or ‘NIC teaming plus’?

A. This would depend on the RNIC vendor and their implementation.

Q. What is the time frame for these new RDMA messages to be available in verbs API?

A. This depends on the IBTA standards approval process, which is not completely predictable; roughly sometime in the first half of 2019.

Q. Where could I find more details about the three new verbs (what are the arguments)?

A. Please poll/contact/Google the IBTA and IETF organizations towards the end of calendar year 2018, when first drafts of the extension documents are expected to be available.

Q. Do you see this technology used in a way similar to Hyperconverged systems now use storage or could you see this used as a large shared memory subsystem in the network?

A. High-speed persistent memory, in either NVDIMM or SSD form factor, has enormous potential in speeding up hyperconverged write replication. It will, however, require a substantial rewrite of such storage stacks, moving, for example, from traditional three-phase block storage protocols (command/data/response) to an RDMA write/confirm model. More generally, the RDMA extensions are useful for distributed shared PERSISTENT memory applications.

Q. What would be the most useful performance metrics to debug performance issues in such environments?

A. Within the RNIC, basic counts for the new message types would be a baseline. These plus total stall times encountered by the RNIC awaiting Commit Responses from the local CPU subsystem would be useful. Within the CPU platform basic counts of device write and read requests targeting persistent memory would be useful.

Q. Do all RDMA NICs have to update their firmware to support these new verbs? What is the expected performance improvement with the new Commit message?

A. Both answers would depend on the RNIC vendor and their implementation.

Q. Will the three new verbs be implemented in the RNIC alone, or will they require changes in other places (processor, memory controllers, etc.)?

A. The new Commit request requires the CPU platform and its memory controllers to confirm that prior write data has reached persistence. The new Atomic Write and Verify messages however may be executed entirely within the RNIC.

Q. What about the future of NVMe over TCP – this would be much simpler for people to implement. Is this a good option?

A. Again this would depend on the NIC vendor and their implementation. Different vendors have implemented various tests for performance. It is recommended that readers do their own due diligence.