Q&A – When Compute, Networking and Storage Intersect

In Part Vermillion of our SNIA Ethernet Storage Forum (ESF) “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series – we examined the terms and concepts are at the heart of where compute, networking and storage intersect. That’s why we called it “What if Programming and Networking Had a Storage Baby” If you missed the live webcast, you can watch it on-demand.

The discussion from our panel of experts generated a lot of good questions. As promised, here are answers to them all. Read More

What if Programming and Networking Had a Storage Baby? Say What?

The colorful “Everything You Wanted To Know About Storage But Were Too Proud To Ask,” popular webcast series marches on! In this 6th installment, Part – Vermillion – What if Programming and Networking Had a Storage Baby, we look into some of the nitties and the gritties of storage details that are often assumed.

When looking at data from the lens of an application, host, or operating system, it’s easy to forget that there are several layers of abstraction underneath each before the actual placement of data occurs. In this webcast we are going to scratch beyond the first layer to understand some of the basic taxonomies of these layers.

In this webcast we will show you more about the following:

  • Storage APIs and POSIX
  • Block, File, and Object storage
  • Byte Addressable and Logical Block Addressing
  • Log Structures and Journaling Systems

It’s an ambitious project, but these terms and concepts are at the heart of where compute, networking and storage intersect. Having a good grasp of these concepts ties in with which type of storage networking to use, and how data is actually stored behind the scenes.

Register today to join us on July 6th for this session. You can ask all the questions that, until now, you’ve been too proud to ask and we promise not to  to show you any baby pictures!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Latency Budgets for Solid State Storage Access

New solid state storage technologies are forcing the industry to refine distinctions between networks and other types of system interconnects.   The question on everyone’s mind is: when is it beneficial to use networks to access solid state storage, particularly persistent memory?

It’s not quite as simple as a “yes/no” answer. The answer to this question involves application, interconnect, memory technology and scalability factors that can be analyzed in the context of a latency budget.

On April 19th, Doug Voigt, Chair SNIA NVM Programming Model Technical Work Group, returns for a live SNIA Ethernet Storage Forum webcast, “Architectural Principles for Networked Solid State Storage Access – Part 2where we will explore latency budgets for various types of solid state storage access.  These can be used to determine which combinations of interconnects, technologies and scales are compatible with Load/Store instruction access and which are better suited to IO completion techniques such as polling or blocking.

In this webcast you’ll learn:

  • Why latency is important in accessing solid state storage
  • How to determine the appropriate use of networking in the context of a latency budget
  • Do’s and don’ts for Load/Store access

This is a technical seminar built upon part 1 of this series. If you missed it, you can view it on demand at your convenience. It will give you a solid foundation on this topic, outlining key architectural principles that allow us to think about the application of  networked  solid state technologies more systematically.

I hope you will register today for the April 19th event. Doug and I will be on hand to answer questions on the spot.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Would You Like Some Rosé with Your iSCSI?

Would you like some rosé with your iSCSI? I’m guessing that no one has ever asked you that before. But we at the SNIA Ethernet Storage Forum like to get pretty colorful in our “Everything You Wanted To Know about Storage But Were Too Proud To Ask” webcast series as we group common storage terms together by color rather than by number.

In our next live webcast, Part Ros̩ РThe iSCSI Pod, we will focus entirely on iSCSI, one of the most used technologies in data centers today. With the increasing speeds for Ethernet, the technology is more and more appealing because of its relative low cost to implement. However, like any other storage technology, there is more here than meets the eye.

We’ve convened a great group of experts from Cisco, Mellanox and NetApp who will start by covering the basic elements to make your life easier if you are considering using iSCSI in your architecture, diving into:

  • iSCSI definition
  • iSCSI offload
  • Host-based iSCSI
  • TCP offload

Like nearly everything else in storage, there is more here than just a protocol. I hope you’ll register today to join us on March 2nd and learn how to make the most of your iSCSI solution. And while we won’t be able to provide the rosé wine, our panel of experts will be on-hand to answer your questions.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Buffers, Queues, and Caches, Oh My!

Buffers and Queues are part of every data center architecture, and a critical part of performance – both in improving it as well as hindering it. A well-implemented buffer can mean the difference between a finely run system and a confusing nightmare of troubleshooting. Knowing how buffers and queues work in storage can help make your storage system shine.

However, there is something of a mystique surrounding these different data center components, as many people don’t realize just how they’re used and why. Join our team of carefully-selected experts on February 14th in the next live webcast in our “Too Proud to Ask” series, “Everything You Wanted to Know About Storage But Were Too Proud To Ask – Part Teal: The Buffering Pod” where we’ll demystify this very important aspect of data center storage. You’ll learn:

  • What are buffers, caches, and queues, and why you should care about the differences?
  • What’s the difference between a read cache and a write cache?
  • What does “queue depth” mean?
  • What’s a buffer, a ring buffer, and host memory buffer, and why does it matter?
  • What happens when things go wrong?

These are just some of the topics we’ll be covering, and while it won’t be exhaustive look at buffers, caches and queues, you can be sure that you’ll get insight into this very important, and yet often overlooked, part of storage design.

Register today and spend Valentine’s Day with our experts who will be on-hand to answer your questions on the spot!

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Clearing Up Confusion on Common Storage Networking Terms

Do you ever feel a bit confused about common storage networking terms? You’re not alone. At our recent SNIA Ethernet Storage Forum webcast “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Mauve,” we had experts from Cisco, Mellanox and NetApp explain the differences between:

  • Channel vs. Busses
  • Control Plane vs. Data Plane
  • Fabric vs. Network

If you missed the live webcast, you can watch it on-demand. As promised, we’re also providing answers to the questions we got during the webcast. Between these questions and the presentation itself, we hope it will help you decode these common, but sometimes confusing terms.

And remember, the “Everything You Wanted To Know About Storage But Were Too Proud To Ask” is a webcast series with a “colorfully-named pod” for each topic we tackle. You can register now for our next webcast: Part Teal, The Buffering Pod, on Feb. 14th.

Q. Why do we have Fibre and Fiber

A. Fiber Optics is the term used for the optical technology used by Fibre Channel Fabrics.   While a common story is that the “Fibre” spelling came about to accommodate the French (FC is after all, an international standard), in actuality, it was a marketing idea to create a more unique name, and in fact, it was decided to use the British spelling – “Fibre”.

Q. Will OpenStack change all the rules of the game?

A. Yes. OpenStack is all about centralizing the control plane of many different aspects of infrastructure.

Q. The difference between control and data plane matters only when we discuss software defined storage and software defined networking, not in traditional switching and storage.

A. It matters regardless. You need to understand how much each individual control plane can handle and how many control planes you have from a overall management perspective. In the case were you have too many control planes SDN and SDS can be a benefit to you.

Q. As I’ve heard that networks use stateless protocols, would FC do the same?

A.  Fibre Channel has several different Classes, which can be either stateful or stateless. Most applications of Fibre Channel are Class 3, as it is the preferred class for SCSI traffic, A connection between Fibre Channel endpoints is always stateful (as it involves a login process to the Fibre Channel fabric). The transport protocol is augmented by Fibre Channel exchanges, which are managed on a per-hop basis. Retransmissions are handled by devices when exchanges are incomplete or lost, meaning that each exchange is a stateful transmission, but the protocol itself is considered stateless in modern SCSI-transport Fibre Channel.

iSCSI, as a connection-oriented protocol, creates a nexus between an initiator and a target, and is considered stateful.  In addition, SMB, NFSv4, ftp, and TCP are stateful protocols, while NFSv2, NFSv3, http, and IP are stateless protocols.

Q. Where do CIFS/SMB come into the picture?

A. CIFFS/SMB is part of a network stack.   We need to have a separate talk about network stacks and their layers.   In this presentation, we were talking primarily about the physical layer of the networks and fabrics.   To overly simplify network stacks, there are multiple layers of protocols that run on top of the physical layer.   In the case of FC, those protocols include the control plane protocols (such as FC-SW), and the data plane protocols.   In FC, the most common data plane protocol is FCP (used by SCSI, FICON, and FC-NVMe).   In the case of Ethernet, those protocols also include the control plan (such as TCP/IP), and data plane protocols.   In Ethernet, there are many commonly used data plane protocols for storage (such as iSCSI, NFS, and CIFFS/SMB)

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Storage Basics Q&A and No One’s Pride was Hurt

In the first of our “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Chartreuse,” we covered the storage basics to break down the entire storage picture and identify the places where most of the confusion falls. It was a very well attended event and I’m happy to report, everyone’s pride stayed intact! We got some great questions from the audience, so as promised, here are our answers to all of them:

Q. What is parity? What is XOR?

A. In RAID, there are generally two kinds of data that are stored: the actual data and the parity data. The actual data is obvious; parity data is information about the actual data that you can use to reconstruct it if something goes wrong.

It’s important to note that this is not simply a copy of A and B, but rather a logical operation that is applied to the data. Commonly for RAID (other than simple mirroring) the method used is called an exclusive or, or XOR for short. The XOR function outputs true only when inputs differ (one is true, the other is false).

There’s a neat feature about XOR, and the reason it’s used by RAID. Calculate the value A XOR B (let’s call it AxB). Here’s an example on a pair of bytes.

A                                                                   10011100

B                                                                   01101100

A XOR B is AxB              11110000

Store all three values on separate disks. Now, if we lose A or B, we can use the fact that AxB XOR B is equal to A, and AxB XOR A is equal to B. For example, for A;

B                                                                   01101100

AxB                                                           11110000

A XOR AxB is A              10011100

We’ve regenerated the A we lost. (If we lose the parity bits, they can just be reconstructed from A and B.)

Q. What is common notation for RAID? I have seen RAID 4+1, and RAID (4,1). In the past, I thought this meant a total of 5 disks, but in your explanation it is only 4 disks.

A. RAID is notated by levels, which is determined by the way in which data is laid out on disk drives (there are always at least two). When attempting to achieve fault tolerance, there is always a trade-off between performance and capacity. Such is life.

There are 4 common RAID levels in use today (there are others, but these are the most common): RAID 0, RAID 1, RAID 5, and RAID 6. As a quick reminder from the webinar (you can see pictures of these in action there):

  • RAID 0: Data is striped across the disks without any parity. Very fast, but very unsafe (if you lose one, you lose all)
  • RAID 1: Data is mirrored between disks without any parity. Slowest, but you have an exact copy of the data so there is no need to recalculate anything to reconstruct the data.
  • RAID 5: Data is striped across multiple disks, and the parity is striped across multiple disks. Often seen as the best compromise: Fast writes and good safety net. Can withstand one disk loss without losing data.
  • RAID 6: Data is striped across multiple disks, and two parity bits are stored on all the disks. Same advantages of RAID 5, except now you can lose 2 drives before data loss.

Now, if you have enough disks, it is possible to combine RAID levels. You can, for instance, have four drives that combine mirroring and striping. In this case, you can have two sets of drives that are mirrored to each other, and the data is striped to each of those sets. That would be RAID 1+0, or often called RAID 10. Likewise, you can have two sets of RAID 5 drives, and you could stripe or mirror to each of those sets, and it would be RAID 50 or RAID 51, respectively.

Erasure Coding has a different notation, however. It does not use levels like RAID; instead, EC identifies the number of data bits and the number of parity bits.

So, with EC, you take a file or object and split it into ‘k’ blocks of equal size. Then, you take those k blocks and generate n blocks of the same size, such that any k out of n blocks suffice to reconstruct the original file. This results in a (n,k) notation for EC.

Since RAID is a subset of EC, RAID6 is the equivalent of EC or RAID(n,2) or n data disks and 2 parity disks. RAID(4,1) is RAID5 with 4 data and 1 parity, and so on.

Q. Which RAIDs are classified/referred to as EC? I have often heard people refer to RAID 5/6 as EC. Is this only limited to 5/6?

A. All RAID levels are types of EC. The math is slightly different; traditional RAID uses XOR, and EC uses Galois Fields or polynomial arithmetic.

Q. What’s the advantage of RAID5 over RAID1?

A. As noted above, there is a tradeoff between the amount of capacity that you need in order to stay fault tolerant, and the performance you wish to have in any system.

RAID 1 is a mirrored system, where you have a single block of data being written twice – one to each disk. This is done in parallel, so it doesn’t take any extra time to do the write, but there’s no speed-up either. One advantage, however, is that if a disk fails there is no need to perform any logical calculations to reconstruct data – you already have a copy of the intact data.

RAID 5 is more distributed. That is, blocks of data are written to multiple disks simultaneously, along with a parity block. That is, you are breaking up the writing obligations across multiple disks, as well as sending parity data across multiple disks. This significantly speeds up the write process, but more importantly it also distributes the recovery capabilities as well so that any disk can fail without losing data.

Q. So RAID improves WRITES? I guess because it breaks the data into smaller pieces that can be written in parallel. If this is true, then why will READ not benefit from RAID? Isn’t it that those pieces can be read and re-combined into a larger piece from parallel sources would be faster?

A. RAID and the “striping” of IO can improve writes by reducing serialization by allowing us to write anywhere. But a specific block can only be read from the disk it was written to, and if we’re already reading or writing to that disk and it’s busy – we must wait.

Q. Why is EC better for object stores than RAID?

A. Because there’s more redundancy, EC can be made to operate across unreliable and less responsive links, and at potentially geographic scales.

Q: Can you explain about the “RAID Penalty?” I’ve heard it called “Write Penalty” or “Read before Write penalty.”

A. When updating data that’s already been written to disk, there’s a requirement to recalculate the parity data used by RAID. For example, if we update a single byte in a block, we need to read all the blocks, recalculate the parity, and write back the updated data block and the parity block (twice in the case of dual parity RAID6).

There are some techniques that can be used to improve the performance impact. For example, some systems don’t update blocks in place, but use pointer-based systems and only write new blocks. This technique is used by flash-based SSDs as the write size is often 256KB or larger. This can be done in the drive itself, or by the RAID or storage system software. It is very important to avoid when using Erasure Coding as there are so many data blocks and parity blocks to recalculate and rewrite that it would become prohibitive to do an update.

Q.  What is the significance of RAIN? We have not heard much about it.

A.A Redundant Array of Independent Nodes works under the same principles of RAID – that is, each node is treated as a failure domain that must be avoided as a Single Point of Failure (SPOF).Where as RAID maintains an understanding of data placement on individual drives within a node, RAIN maintains an understanding of data placement on nodes (that contain drives) within a storage environment.

Q.  Is host same as node?

A.  At its core, a “node” is an endpoint. So, a host can be a node, but so can a storage device at the other end of the wire.

Q.  Does it really matter what Erasure Coding (EC) technologies are named or is EC just EC?

A.  A. Erasure Coding notation refers to the level of resilience involved. This notation underscores not only the write patterns for storage of data, but also the mechanisms necessary for recovery. What ‘matters’ really will depend upon the level of involvement for those particular tasks.

Q. Is the Volume Manager concept related to Logical Unit Numbering (LUNs)?

A.  It can be. A volume manager is an abstraction layer that allows a host operating system to create a Volume out of one or more media locations. These locations can be either logical or physical. A LUN is an aggregation of media on the target/storage side. You can use a Volume Manager to create a single, logical volume out of multiple LUNs, for instance.

A. For additional information on this, you may want to watch our SNIA-ESF webcast, “Life of a Storage Packet (Walk).”

Q. What’s the relationship between disk controller and volume manager?

A. Following on the last question, a disk controller does exactly what it sounds like – it controls disks. A RAID controller, likewise, controls disks and the read/write mechanisms. Some RAID controllers have additional software abstraction capabilities that can act as a volume manager as well.

We hope these answers clear things up a bit more. As you know, our “Everything You Wanted To Know About Storage, But Were Too Proud To Ask” is a series, since this Chartreuse event, we’ve done “Part Mauve – The Architecture Pod” where we explained channel vs. bus, control plane vs. data plane and fabric vs. network. Check it out on-demand and follow us on Twitter @SNIAESF for announcements on upcoming webcasts.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

 

The Everything You Want To Know About Storage Is On Again With Part Mauve – The Architecture Pod

The first installment of our “colorful” Webcast series, “Everything You Wanted To Know about Storage But Were Too Proud To Ask – Part Chartreuse,” covered the fundamental elements of storage systems. If you missed it, you can check it out on-demand. On November 1st, we’ll be back at it, focusing on the network aspect of storage systems with “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Mauve.”

As with any technical field, it’s too easy to dive into the jargon of the pieces and expect people to know exactly what you mean. Unfortunately, some of the terms may have alternative meanings in other areas of technology. In this Webcast, we look at some of those terms specifically and discuss them as they relate to storage networking systems.

In particular, you’ll find out what we mean when we talk about:

  • Channel versus Busses
  • Control Plane versus Data Plane
  • Fabric versus Network

Register now for Part Mauve of “Everything You Wanted To Know About Storage But Were Too Proud to Ask.

For people who are familiar with data center technology, whether it be compute, programming, or even storage itself, some of these concepts may seem intuitive and obvious… until you start talking to people who are really into this stuff. This series of Webcasts will help be your Secret Decoder Ring to unlock the mysteries of what is going on when you hear these conversations. We hope to see you there!

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

 

 

Everything You Wanted to Know about Storage, but were too Proud to Ask

Many times we know things without even realizing it, or remembering how we came to know them. In technology, this often comes from direct, personal experience rather than some systematic process. In turn, this leads to “best practices” that come from tribal knowledge, rather than any inherent codified set of rules to follow.

In the world of storage, for example, it’s very tempting to simply think of component parts that can be swapped out interchangeably. Change out your spinning hard drives for solid state, for example, you can generally expect better performance. Change the way you connect to the storage device, get better performance… or do you?

Storage is more holistic than many people realize, and as a result there are often unintended consequences for even the simplest of modifications. With the ‘hockey stick-like’ growth in innovation over the past couple of years, many people have found themselves facing terms and concepts in storage that they feel they should have understood, but don’t.

These series of webcasts are designed to help you with those troublesome spots: everything you thought you should know about storage but were afraid to ask.

Here, we’re going to go all the way back to basics and define the terms so that people can understand what people are talking about in those discussions. Not only are we going to define the terms, but we’re going to talk about terms that are impacted by those concepts once you start mixing and matching.

For example, when we say that we have a “memory mapped” storage architecture, what does that mean? Can we have a memory mapped storage system at the other end of a network? If so, what protocol should we use – iSCSI? POSIX? NVMe over Fabrics? Would this be an idempotent system or an object-based storage system?

Now, if that above paragraph doesn’t send you into fits of laughter, then this series of webcasts is for you (hint: most of it was complete nonsense… but which part? Come watch to find out!).

On September 7th, we will start with the very basics – The Naming of the Parts. We’ll break down the entire storage picture and identify the places where most of the confusion falls. Join us in this first webcast – Part Chartreuse – where we’ll learn:

  • What an initiator is
  • What a target is
  • What a storage controller is
  • What a RAID is, and what a RAID controller is
  • What a Volume Manager is
  • What a Storage Stack is

Too proud to ask

 

With these fundamental parts, we’ll be able to place them into a context so that you can understand how all these pieces fit together to form a Data Center storage environment. Future webcasts will discuss:

Part Mauve – Architecture Pod:

  • Channel v. bus
  • Control plane v. data plane
  • Fabric v. network

Part Teal – Buffering Pod:

  • Buffering v. Queueing (with Queue Depth)
  • Flow Control
  • Ring Buffers

Part Ros̩ РiSCSI Pod:

  • iSCSI offload
  • TCP offload
  • Host-based iSCSI

Part Sepia – Getting-From-Here-To-There Pod:

  • Encapsulation v. Tuning
  • IOPS v. Latency v. Jitter

Part Vermillion – The What-if-Programming-and-Networking-Had-A-Baby Pod:

  • Storage APIs v. POSIX
  • Block v. File v. Object
  • Idempotent
  • Coherence v. Cache Coherence
  • Byte Addressable v. Logical Block Addressing

Part Turquoise – Where-Does-My-Data-Go Pod:

  • Volatile v. Non-Volatile v Persistent Memory
  • NVDIMM v. RAM v. DRAM v. SLC v. MLC v. TLC v. NAND v. 3D NAND v. Flash v SSDs v. NVMe
  • NVMe (the protocol)

Part Cyan – Storage Management:

  • Management Processes: Discovery, Provisioning and Configuration
  • Software-Defined Storage
  • Storage Management Standards: SMI-S, SNIA Swordfish

Part Aqua – Storage Controllers:

  • Storage Controllers 101
  • SCSI Controllers
  • Fibre Channel Controllers
  • NVMe Controllers
  • Networking SDN and Storage SDN Controllers

Part Taupe – Memory Pod:

  • Memory Mapping
  • Physical Region Page (PRP)
  • Scatter Gather Lists
  • Offset

Part Burgundy – Orphans Pod

  • Doorbells
  • Controller Memory Buffers

Of course, you may already be familiar with some, or all, of these concepts. If you are, then these webcasts aren’t for you. However, if you’re a seasoned professional in technology in another area (compute, networking, programming, etc.) and you want to brush up on some of the basics without judgment or expectations, this is the place for you.

Oh, and why are the parts named after colors, instead of numbered? Because there is no order to these webcasts. Each is a standalone seminar on understanding some of the elements of storage systems that can help you learn about technology without admitting that you were faking it the whole time! If you are looking for a starting point – the absolute beginning place – please start with Part Chartreuse, “The Naming of the Parts.”

Update: This series is now available on-demand. You can also download the webcast slides. Happy viewing!

Watch: Chartreuse – The Basics  Download: Webcast slides

Watch: Mauve – The Architecture Pod Download: Webcast slides

Watch: Teal – The Buffering Pod  Download: Webcast slides

Watch: Rosé – This iSCSI Pod  Download: Webcast slides

Watch: Sepia – Getting from Here to There Download: Webcast slides

Watch: Vermillion – What if Programming and Networking Had a Storage Baby  Download: Webcast slides

Watch: Turquoise – Where Does My Data Go?  Download: Webcast slides

Watch: Cyan – Storage Management Download: Webcast slides

Watch: Aqua – Storage Controllers Download: Webcast slides  

Watch: Taupe – Memory    Download: Webcast slides

 

Principles of Networked Solid State Storage – Q&A

At this month’s SNIA Ethernet Storage Forum Webcast, “Architectural Principles for Networked Solid State Storage Access,” Doug Voigt, Chair of the SNIA NVM Programming Technical Working Group, and a member of the SNIA Technical Council, outlined key architectural principles surrounding the application of  networked  solid state technologies. We had a flurry of questions near the end of the Webcast that we did not have enough time to answer. Here are Doug’s answers to all the questions we received during the event:

Q. Are there wait cycles in accessing persistent memory?

A. It depends entirely on which persistent memory (PM) technology is being accessed and how the memory interconnect is used.   Some technologies have write times that are quite different from read times.   When using tightly timed interconnects such as DDR with those technologies it may be difficult to avoid wait cycles.

Q. How do Pmalloc and malloc share the virtual address space of the application?

A. This is entirely up to the OS and other libraries operating within any constraints of the processor architecture-specific memory management units.   A good mental model would be fairly large regions of contiguous address space in both the physical and virtual domains, where each region will comprise a single type of memory. Capacity will be reserved for pmalloc and malloc in the appropriate regions.

Q. Always flush after doing your memory-mapped IO.   Is that simply good hygiene?

A. Not exactly. The term “Memory Mapped IO” is used to reference control plane (as opposed to data plane) access.   It is often reasonable to set up control plane memory as uncacheable. The need for strict order of access to physical control plane registers is so pervasive that caching is generally not useful. Uncacheable writes are always flushed by the processor, as opposed to the application.

Generally with memory mapped IO devices the data plane uses direct memory access (DMA).   With memory mapped files (as opposed to memory mapped IO) Load/Store (more commonly referred to as “Ld/St”), not DMA, is used in the data plane. Disabling caching in the data plane is generally a big performance sacrifice for small byte range access.

In the Ld/St datapath, strategically placed flushing is required to retain both performance and power failure recovery. The SNIA NVM Programming Model describes this type of functionality.

Q. Once NVDIMM support become pervasive with support from NVMe drives in the server box, should network storage be more focused on SAS Flash or just SAS HDDs?

A. Not necessarily.   NVMe over Fabric, Fibre Channel and iSCSI are also types of networked storage that will likely retain significant market share relative to SAS.

Q. Are the ‘Big Data’ Data Warehouse applications starting to use the persistence memory and domain technologies in their applications?

A. It is too early to see much of this yet. PM technologies might become a priority as a staging area for analytic applications with high ingest or checkpoint rates. NVDIMMs are likely to be too expensive to store anything “big” for quite a while.

Q. Also, is the persistence memory/domains being used in the Hyper-converged and Converged hardware infrastructures?

A. Persistent memory is quintessentially (Hyper-) converged.   It wouldn’t be unreasonable to expect some traction with hyper-converged solutions that experience high storage-performance demand.

Q. What distance would you associate with 10’s of microseconds?

A. In terms of transmission delay, 10’s of uS align with a campus or small city scale, but the distance itself is often not the primary factor.   Switching delays, transmission line properties and software overhead are generally bigger factors.

Q. So latency would be the binding factor for distances…not a question, an observation.

A. Yes, in effect, either through transmission or relay.   See above.

Q. Aren’t there multi-threaded SSDs?

A. Yes, but since the primary metric in this presentation is latency we ignore multi-threading.   It can enable more work to get done, but it generally increases latency rather than reducing it.

Q. Is Pmalloc universal usage?

A. The term is starting to be recognized among developers and has been used in research. Various similar names have been used in early research prototypessuch as pmalloc in Mnemosyne and nvmalloc in SCMFS.

Q. So how would PM help in a (stock broking) requirement, where we currently prophesize an RDMA or iWARP solution?

A. With PM the answer is always lower latency.   PM can be litegrated like memory or like flash. RDMA network paths for both of these options were discussed in the presentation. In either case, PM is low-latency enough that networking and software overheads will completely determine performance, even when using RDMA. The performance boost from PM is greatest when it is accessed locally.   If remote access is a requirement then the new work being done in the RDMA community should help.

Q. If data stored in memory requires to be copied to a different host, memory (for consistency) how does PM assist, or is there an extension to PM? Coherency between multiple hosts in a cluster, if you will?

A. PM technology does not help with this; the methods of managing consistency across hosts remain unchanged by PM.   All PM offers is low latency persistence.

Coordination across hosts or nodes in a cluster must use existing clustering techniques such as locking and quorums. In addition, the relative timescales of memory access and network communication suggest the application of asynchronous remote replication techniques used in today’s storage solutions.

Regarding coherency, PM brings nothing new to the known techniques for managing coherency.   Classical cluster architecture must be applied outside of symmetric multi-processing coherency domains. Within coherency domains, all of the logic is above the PM level in a processor side memory controller or a software emulation of the same algorithms.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.