Introducing the Networking Storage Forum

At SNIA, we are dedicated to staying on top of storage trends and technologies to fulfill our mission as a globally recognized and trusted authority for storage leadership, standards, and technology expertise. For the last several years, the Ethernet Storage Forum has been working hard to provide high-quality educational and informational material related to all kinds of storage.

From our “Everything You Wanted To Know About Storage But Were Too Proud To Ask” series, to the absolutely phenomenal (and required viewing) “Storage Performance Benchmarking” series, to the “Great Storage Debates” series, we’ve produced dozens of hours of material.

Technologies have evolved and we’ve come to a point where there’s a need to understand how these systems and architectures work – beyond just the type of wire that is used. Today, there are new systems that are bringing storage to completely new audiences. From scale-up to scale-out, from disaggregated to hyperconverged, RDMA, and NVMe-oF – there is more to storage networking than just your favorite transport.

For example, when we talk about NVMe™ over Fabrics, the protocol is broader than just one way of accomplishing what you need. When we talk about virtualized environments, we need to examine the nature of the relationship between hypervisors and all kinds of networks. When we look at “Storage as a Service,” we need to understand how we can create workable systems from all the tools at our disposal.

Bigger Than Our Britches

As I said, SNIA’s Ethernet Storage Forum has been working to bring these new technologies to the forefront, so that you can see (and understand) the bigger picture. To that end, we realized that we needed to rethink the way that our charter worked, to be even more inclusive of technologies that were relevant to storage and networking.

So…

Introducing the Networking Storage Forum. In this group we’re going to continue producing top-quality, vendor-neutral material related to storage networking solutions. We’ll be talking about:

  • Storage Protocols (iSCSI, FC, FCoE, NFS, SMB, NVMe-oF, etc.)
  • Architectures (Hyperconvergence, Virtualization, Storage as a Service, etc.)
  • Storage Best Practices
  • New and developing technologies

… and more!

Generally speaking, we’ll continue to do the same great work that we’ve been doing, but now our name more accurately reflects the breadth of work that we do.

We’re excited to launch this new chapter of the Forum. Whether you work for a vendor, are a systems integrator, work at a university, or manage storage, we welcome you to join the NSF. We are an active group that honestly has a lot of fun. If you’re one of our loyal followers, we hope you will continue to keep track of what we’re doing. And if you’re new to this Forum, we encourage you to take advantage of the library of webcasts, white papers, and published articles that we have produced here. There’s a wealth of unbiased, educational information there that we don’t think you’ll find anywhere else!

If there’s something that you’d like to hear about – let us know! We are always looking to hear about headaches, concerns, and areas of confusion within the industry where we can shed some light. Stay current with all things NSF.

Oh What a Tangled Web We Weave: Extending RDMA for PM over Fabrics

For datacenter applications requiring low-latency access to persistent storage, byte-addressable persistent memory (PM) technologies like 3D XPoint and MRAM are attractive solutions. Network-based access to PM, labeled here Persistent Memory over Fabrics (PMoF), is driven by data scalability and/or availability requirements. Remote Direct Memory Access (RDMA) network protocols are a good match for PMoF, allowing direct RDMA data reads or writes from/to remote PM. However, the completion of an RDMA Write at the sending node offers no guarantee that data has reached persistence at the target.
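Readers who want to see where that gap shows up in code may find the following minimal libibverbs sketch useful. It is an illustration only, assuming the queue pair, memory registration, and remote address/rkey were set up elsewhere, and it shows the commonly described interim workaround: an RDMA Write followed by a small RDMA Read of the same region. Because the read cannot complete before preceding writes have been executed at the responder, its completion is often used as a signal that the written data has at least left the adapter and PCIe buffers; it is not a persistence guarantee, which is precisely the gap the proposed protocol extensions are intended to close.

    /* Illustrative only: write-then-read-back over an already-connected RC QP.
     * Assumes qp, mr, remote_addr, and rkey were established during setup. */
    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <stddef.h>

    int write_then_readback(struct ibv_qp *qp, struct ibv_mr *mr,
                            void *local_buf, size_t len,
                            uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge wsge = {
            .addr   = (uintptr_t)local_buf,
            .length = (uint32_t)len,
            .lkey   = mr->lkey,
        };
        struct ibv_sge rsge = wsge;
        rsge.length = 1;                      /* a tiny read-back is enough */

        struct ibv_send_wr write_wr = {0}, read_wr = {0}, *bad = NULL;

        write_wr.wr_id               = 1;     /* unsignaled write */
        write_wr.sg_list             = &wsge;
        write_wr.num_sge             = 1;
        write_wr.opcode              = IBV_WR_RDMA_WRITE;
        write_wr.wr.rdma.remote_addr = remote_addr;
        write_wr.wr.rdma.rkey        = rkey;
        write_wr.next                = &read_wr;

        read_wr.wr_id                = 2;     /* poll the CQ for this one */
        read_wr.sg_list              = &rsge;
        read_wr.num_sge              = 1;
        read_wr.opcode               = IBV_WR_RDMA_READ;
        read_wr.send_flags           = IBV_SEND_SIGNALED;
        read_wr.wr.rdma.remote_addr  = remote_addr;
        read_wr.wr.rdma.rkey         = rkey;

        /* When the read completion arrives, the write data has been pushed
         * past the target NIC, but it may still sit in volatile buffers
         * rather than in persistent memory. */
        return ibv_post_send(qp, &write_wr, &bad);
    }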

Join the Networking Storage Forum (NSF) on October 25, 2018 for our next live webcast, Extending RDMA for Persistent Memory over Fabrics. In this webcast, we will outline extensions to RDMA protocols that confirm such persistence and additionally can order successive writes to different memories within the target system. Learn:

  • Why we can’t treat PM just like traditional storage or volatile memory
  • What happens when you write to memory over RDMA
  • Which programming model and protocol changes are required for PMoF
  • How proposed RDMA extensions for PM would work

We believe this webcast will appeal to developers of low-latency and/or high-availability datacenter storage applications and be of interest to datacenter developers, administrators, and users. I encourage you to register today. Our NSF experts will be on hand to answer your questions. We look forward to you joining us on October 25th.

RoCE vs. iWARP Q&A

In our RoCE vs. iWARP webcast, experts from the SNIA Ethernet Storage Forum (ESF) had a friendly debate on two commonly known remote direct memory access (RDMA) protocols that run over Ethernet: RDMA over Converged Ethernet (RoCE) and the IETF-standard iWARP. It turned out to be another very popular addition to our “Great Storage Debate” webcast series. If you haven’t seen it yet, it’s now available on-demand along with a PDF of the presentation slides.

We received A LOT of questions related to Performance, Scalability and Distance, Multipathing, Error Correction, Windows and SMB Direct, DCB (Data Center Bridging), PFC (Priority Flow Control), lossless networks, Congestion Management, and more. Here are answers to them all.

RoCE vs. iWARP – The Next “Great Storage Debate”

By now, we hope you’ve had a chance to watch one of the webcasts from the SNIA Ethernet Storage Forum’s “Great Storage Debate” webcast series. To date, our experts have had friendly, vendor-neutral debates on File vs. Block vs. Object Storage, Fibre Channel vs. iSCSI, and FCoE vs. iSCSI vs. iSER. The goal of this series is not to have a winner emerge, but rather to educate attendees on how the technologies work, the advantages of each, and common use cases.

Our next great storage debate will be on August 22, 2018 where our experts will debate RoCE vs. iWARP. They will discuss these two commonly known RDMA protocols that run over Ethernet: RDMA over Converged Ethernet (RoCE) and the IETF-standard iWARP. Both are Ethernet-based RDMA technologies that can increase networking performance. Both reduce the amount of CPU overhead in transferring data among servers and storage systems to support network-intensive applications, like networked storage or clustered computing.

Join us on August 22nd, as we’ll address questions like:

  • Both RoCE and iWARP support RDMA over Ethernet, but what are the differences?
  • What are the use cases for RoCE and iWARP, and what differentiates them?
  • UDP/IP and TCP/IP: which RDMA standard uses which protocol, and what are the advantages and disadvantages?
  • What are the software and hardware requirements for each?
  • What are the performance/latency differences of each?

Get this on your calendar by registering now. Our experts will be on-hand to answer your questions on the spot. We hope to see you there!

Visit snia.org to learn about the work SNIA is doing to lead the storage industry worldwide in developing and promoting vendor-neutral architectures, standards, and educational services that facilitate the efficient management, movement, and security of information.

File, Block and Object Storage: Real-world Questions, Expert Answers

More than 1,200 people have already watched our Ethernet Storage Forum (ESF) Great Storage Debate webcast “File vs. Block vs. Object Storage.” If you haven’t seen it yet, it’s available on demand. This great debate generated many interesting questions. As promised, our experts have answered them all here.

Q. What about the encryption technologies on file storage? Do they exist, and how do they affect the performance compared to unencrypted storage?

A. Yes, encryption of file data at rest can be done by the storage software, operating system, or the drives themselves (self-encrypting drives). Encryption of file data on the wire can be done by the storage software, OS, or specialized network cards. These methods can usually also be applied to block and object storage. Encryption requires processing power so if it’s done by the main CPU it might affect performance. If encryption is offloaded to the HBA, drive, or SmartNIC then it might not affect performance.

Q. Regarding block size, I thought that block size settings were also used to tune and optimize file protocol transfer, for example in NFS, am I wrong?

A. That is correct: block size refers to the size of data in each I/O and can be applied to block, file, and object storage, though it may not be used very often for object storage. NFS and SMB both let you specify the block I/O size.

Q. What is the main difference between object and file? Is it true that File has a hierarchical structure, while object does not?

A. Yes, that is one important difference. Another difference is the access method–folder/file/offset for files and key-value for objects. File storage also often allows access to specific data within a file and in many cases shared writes to the same file, while object storage typically offers only shared reads and most object storage systems do not allow direct updates to existing objects.

Q. What is the best way to backup a local Object store system?

A. Most object storage systems have built-in data protection using either replication or erasure coding which often replicates the data to one or more remote locations. If you deploy local object storage that does not include any remote replication or erasure coding protection, you should implement some other form of backup or replication, perhaps at the hardware or operating system level.

Q. I feel that this discussion conflates object storage with cloud storage features, and presumes certain cloud features (for example security) that are not universally available or really part of Object Storage. This is a very common problem with discussions of objects — they typically become descriptions of one vendor’s cloud features.

A. Cloud storage can be block, file, and/or object, though object storage is perhaps more popular in public and private cloud than it is in non-cloud environments. Security can be required and deployed in both enterprise and cloud storage environments, and for block, file and object storage. It was not the intention of this webinar to conflate cloud and object storage; we leave that to the SNIA Cloud Storage Initiative (CSI).

Q. How do open source block, file and object storage products play into the equation?

A. Open source software solutions are available for block, file, and object storage. As is usually the case with open-source software, these solutions typically make storage (block, file or object) available at a lower acquisition cost than commercial storage software or appliances, but at the cost of higher complexity and higher integration/support effort by the end user. Thus, customers who care most about simplicity and minimizing their integration/support work tend to buy commercial appliances or storage software, while large customers who have enough staff to do their own storage integration, testing, and support may prefer open-source solutions so they don’t have to pay software license fees.

Q. How is data [0s and 1s in hard disk] converted to objects or vice versa?

A. In the beginning there were electrons, with conductors, insulators, and semi-conductors (we skipped the quantum physics level of explanation). Then there were chip companies, storage companies, and networking companies. Then The Storage Networking Industry Association (SNIA) came along… The short answer is that some software (running in the storage server, storage device, or the cloud) organizes the 0s and 1s into objects stored in a file system or object store. The software makes these objects (full of 0s and 1s) available via a key-value system and/or a RESTful API. You submit data (a stream of 1s and 0s) and get a key in return, or you submit a key and get the object (a stream of 1s and 0s) back.
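To make the submit-a-key, get-an-object flow concrete, here is a minimal, purely illustrative sketch using libcurl against a hypothetical HTTP object endpoint (the URL, bucket, and key are invented; real object stores such as S3 or Swift add authentication headers and their own URL and metadata conventions):

    #include <curl/curl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Hypothetical endpoint: the URL path acts as the object's key. */
        const char *url  = "http://objectstore.example.com/my-bucket/my-object-key";
        const char *data = "the stream of 0s and 1s to store";

        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        /* PUT: submit the data under the key. */
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, data);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long)strlen(data));
        CURLcode rc = curl_easy_perform(curl);
        printf("PUT: %s\n", curl_easy_strerror(rc));

        /* GET: present the same key and receive the object back
         * (the response body is written to stdout by default). */
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, NULL);
        curl_easy_setopt(curl, CURLOPT_HTTPGET, 1L);
        rc = curl_easy_perform(curl);
        printf("\nGET: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }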

Q. What is the difference (from an operating system perspective where the file/object resides) between a file in mounted NFS drive and object in, for example Google drive? Isn’t object storage (under the hood) just network file system with rest API access?

A. Correct–under the hood there are often similarities between file and object storage. Some object storage systems store the underlying data as file and some file storage systems store the underlying data as objects. However, customers and applications usually just care about the access method, performance, and reliability/availability, not the underlying storage method.

Q. I’ve heard that an Achilles’ Heel of Object is that if you lose the name/handle, then the object is essentially lost. If true, are there ways to mitigate this risk?

A. If you lose the name/handle or key-value, then you cannot access the object, but most solutions using object storage keep redundant copies of the name/handle to avoid this. In addition, many object storage systems also store metadata about each object and let you search the metadata, so if you lose the name/handle you can regain access to the object by searching the metadata.

Q. Why don’t you mention concepts like time to first byte for object storage performance?

A. Time to first byte is an important performance metric for some applications and that can be true for block, file, and object storage. When using object storage, an application that is streaming out the object (like online video streaming) or processing the object linearly from beginning to end might really care about time to first byte. But an application that needs to work on the entire object might care more about time to load/copy the entire object instead of time to first byte.

Q. Could you describe how storage supports data temperatures?

A. Data temperatures describe how often data is accessed, where “hot” data is accessed often, “warm” data occasionally, and “cold” data rarely. A storage system can tier data so the hottest data is on the fastest storage while the coldest data is on the least expensive (and presumably slowest) storage. This could mean using block storage for the hot data, file storage for the warm data, and object storage for the cold data, but that is just one option. For example, block storage could be for cold data while file storage is for hot data, or you could have three tiers of file storage.

Q. Fibre channel uses SCSI. Does NVMe over Fibre Channel use SCSI too? That would diminish NVMe performance greatly.

A. NVMe over Fabrics over Fibre Channel does not use the Fibre Channel Protocol (FCP) and does not use SCSI. It runs the NVMe protocol over an FC-NVMe transport on top of the physical Fibre Channel network. In fact, none of the NVMe over Fabrics options use SCSI.

Q. I get confused when someone says block size for block storage, and also block size for NFS storage and object storage. Does block size mean something different for each storage type?

A. In this case “block size” refers to the size of the data access and it can apply to block, file, or object storage. You can use 4KB “block size” to access file data in 4KB chunks, even though you’re accessing it through a folder/file/offset combination instead of a logical block address. Some implementations may limit which block sizes you can use. Object storage tends to use larger block sizes (128KB, 1MB, 4MB, etc.) than block storage, but this is not required.
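As a small illustration that block size here is an access granularity rather than a storage type, the sketch below (the file path is hypothetical) reads file data in fixed 4 KB chunks through the ordinary file-plus-offset interface, which is the file-storage analogue of reading one logical block at a time:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096   /* the size of each I/O, not a property of the storage */

    int main(void)
    {
        int fd = open("/data/example.bin", O_RDONLY);   /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        char buf[BLOCK_SIZE];
        off_t offset = 0;
        ssize_t n;

        /* Each pread() is one 4 KB I/O addressed by file + offset. */
        while ((n = pread(fd, buf, BLOCK_SIZE, offset)) > 0) {
            /* ... process n bytes ... */
            offset += n;
        }

        close(fd);
        return 0;
    }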

Q. One could argue that file system is not really a good match for big data. Would you agree?

A. It depends on the type of big data and the access patterns. Big data that consists of large SQL databases might work better on block storage if low latency is the most important criteria. Big data that consists of very large video or image files might be easiest to manage and protect on object storage. And big data for Hadoop or some machine learning applications might work best on file storage.

Q. It is my understanding that the unit for both File Storage & Object storage is File – so what is the key/fundamental difference between the two?

A. The unit for file storage is a file (folder/file/offset or directory/file/offset) and the unit for object storage is an object (key-value or object name). They are similar but not identical. For example file storage usually allows shared reads and writes to the same file, while object storage usually allows shared reads but not shared writes to the object. In fact many object storage systems do not allow any writes or updates to the middle of an object–they either allow only appends to the end of the object or don’t allow any changes to an object at all once it has been created.

Q. Why is key value store more efficient and less costly for PCIe SSD? Can you please expand?

A. If the SSD supports key-value storage directly, then the applications or storage servers don’t have to perform the key-value translation. They simply submit the key value and then write or read the related data directly from the SSDs. This reduces the cost of the servers and software that would otherwise have to manage the key-value translations, and could also increase object storage performance. (Key-value storage is not inherently more efficient for PCIe SSDs than for other types of SSDs.)

Interested in more SNIA ESF Great Storage Debates? Check out the other webcasts in this series, all available on-demand.

If you have an idea for another storage debate, let us know by commenting on this blog. Happy debating!

File vs. Block vs. Object Storage – Are Worlds Colliding?

When it comes to storage, a byte is a byte is a byte, isn’t it?

One of the enduring truths about simplicity is that scale makes everything hard, and with that comes complexity. And when we’re not processing the data, how do we store it and access it?

The only way to manage large quantities of data is to make it addressable in larger pieces, above the byte level. For that, we’ve designed sets of data management protocols that help us do several things: address large lumps of data by some kind of name or handle, organize it for storage on external storage devices with different characteristics, and provide protocols that allow us to programmatically write, find, and read it.

On April 17th, the SNIA Ethernet Storage Forum will host another of its “Great Debates” webcasts. This time, it’s “File vs. Block vs. Object Storage.” In this live webcast, our experts, Mark Carlson, Alex McDonald and Saqib Jang will compare three types of data organization: file, block and object storage, and the access methods that support them. Each has its own set of use cases, advantages and disadvantages. Each provides data management ranging from simple to sophisticated, and each makes different demands on storage devices and programming technologies.

Perhaps you’re comfortable with block and file, but are interested in investigating the more recent class of object storage and access. Perhaps you’re happy with your understanding of objects, but would really like to understand files a bit better. Or perhaps you want to understand how file, block and object are implemented on the underlying storage systems – and how one can be made to look like the other, depending on how the storage is accessed. Join us as we discuss and debate:

  • Storage devices
    • How different types of storage drive different management & access solutions
    • Which use cases tend to favor block, file or object
  • Block
    • Where everything is in fixed-size chunks
    • SCSI and SCSI-based protocols, and how FC and iSCSI fit in
  • Files
    • When everything is a stream of bytes
    • NFS and SMB
  • Objects
    • When everything is a BLOB
    • HTTP, key value and RESTful interfaces
  • Altogether…
    • When files, blocks and objects collide, it will rock your world!

I will be moderating this “friendly debate” where there won’t be winners or losers, just more information on these three popular data storage technologies. We hope you will register today to come join the debate on April 17th.

And if you missed our first hugely popular “Great Debate” – Fibre Channel vs. iSCSI, it’s now available on-demand.

SMB3 – These Questions Rock!

Earlier this month, the SNIA Ethernet Storage Forum hosted a live webcast on Server Message Block (SMB), “Rockin’ and Rollin’ with SMB3.” Presenting was Ned Pyle, Microsoft SMB Program Manager. If you missed the live event, I encourage you to watch it on-demand. We had a lot of questions from the big audience this event drew, so as promised, here are answers to them all.

Q. Other than that audit setup, is there a way to determine, via the OS, which SMB version is in use?

A. No. Network captures are the only way to tell; Windows doesn’t track this explicitly, other than the SMB1 auditing we added specifically for the task of evaluating SMB1 removal options.

Q. SMB 3.1.1 over Ethernet… can you discuss/compare with SMB 3.1.1 over Infiniband?

A. If the question is “what’s better, InfiniBand or Ethernet?”, my answer is always: it depends. I really don’t want to get into a competitive conversation under the guise of SNIA. I simply recommend looking at the vendor stories and making an informed decision. Overall, Ethernet-based options like RoCE and iWARP configurations are generally less expensive than InfiniBand ones. They all have tremendous performance. They all have their various ups and downs.

Q. Do you have statistics regarding SMB-Direct adoption?

A. It’s tricky, as our telemetry for Server usage is quite inaccurate due to firewall rules preventing servers from reaching the Internet. I can say indirectly that we know of thousands of customer deployments.

Q. What’s the name of the IO application?

A. DiskSpd

Q. I don’t believe your I/O data tests; wouldn’t you need to trunk 17 10-Gigabit network cards to achieve 168-gigabit I/O capability?

A. This was a misunderstanding; you thought I said 10Gb, but it was 100Gb. We used 100Gb RDMA NICs in this demo with RoCEv2. The bottleneck at that point was the storage; the network had plenty of bandwidth left over.

Q. These are great, but how many of these new features will end up locking out FOSS/GPL implementations of SMB such as SAMBA?

A. Absolutely not! We work with the Samba team and Linux to ensure that SMB can be broadly deployed with all of its capabilities inside open source software.

Q. NetApp supports CA shares (which uses transparent failover) in two use cases: SQL over SMB and Hyper-V over SMB3.

A. This sounds like someone from NetApp stating a fact, so I will simply say “good!” 🙂

Q. Can you please post links to the tools mentioned in this presentation and the I/O tests? Is there a comparison using Iometer?

A. Here you go:

  • https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223
  • https://github.com/Microsoft/diskspd
  • https://github.com/Microsoft/diskspd/tree/master/Frameworks/VMFleet

Q. You are forced to use SMB1 because of the Windows 2003 issue?

A. Windows Server 2003 and XP (and older, like Win2000) all use SMB1. If they are still around, you will need to leave SMB1 enabled on any machines talking to them.

Q. When will Microsoft officially drop support for SMB1?

A. Overall, for the protocol, there is no timeline. It is deprecated, however, so no further work will be done on SMB1 other than critical security patches. SMB1 will start being removed *by default* in a coming release of Windows Server and Windows 10 client. This doesn’t mean totally removed forever, but instead “missing by default,” where you must directly opt in to adding it back. It will be done on a per-SKU basis, so enterprises are likely to see it first, since they are better equipped to understand it and less likely to need SMB1.

Q. Is there a way to change block size in SMB3?

A. In SMB2_READ processing section 3.3.5.12 (https://msdn.microsoft.com/en-us/library/cc246729.aspx):

The server SHOULD<296> fail the request with STATUS_INVALID_PARAMETER if the Length field is greater than Connection.MaxReadSize.

If Connection.SupportsMultiCredit is TRUE the server MUST validate CreditCharge based on Length, as specified in section 3.3.5.2.5. If the validation fails, it MUST fail the read request with STATUS_INVALID_PARAMETER.
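To put a number on what the multi-credit rules mean in practice: per the credit-charge formula in MS-SMB2, a request’s CreditCharge is roughly the payload length divided by 65536, rounded up, so a single 1048576-byte (1 MiB) read or write consumes on the order of 16 credits, while a 65536-byte request consumes just 1.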

There is similar text for SMB2_WRITE in 3.3.5.13 (https://msdn.microsoft.com/en-us/library/cc246730.aspx).

Then, off to SMB2_NEGOTIATE in 3.3.5.4 (https://msdn.microsoft.com/en-us/library/cc246768.aspx) to discover:

  • MaxReadSize is set to the maximum size, in bytes, of the Length in an SMB2 READ Request (section 2.2.19) that the server will accept on the transport that established this connection. This value SHOULD<231> be greater than or equal to 65536.
  • MaxWriteSize is set to the maximum size, in bytes, of the Length in an SMB2 WRITE Request (section 2.2.21) that the server will accept on the transport that established this connection. This value SHOULD<232> be greater than or equal to 65536.
<231> Section 3.3.5.4: If the underlying transport is NETBIOS over TCP, Windows servers set MaxReadSize to 65536. Otherwise, MaxReadSize is set based on the following table.

Windows version \ Connection.Dialect                                             2.0.2    All other SMB2 dialects
Windows Vista SP1 \ Windows Server 2008                                          65536    N/A
Windows 7 \ Windows Server 2008 R2                                               65536    1048576
Windows 8 without [MSKB-2934016] \ Windows Server 2012 without [MSKB-2934016]    65536    1048576
All other SMB2 servers                                                           65536    8388608

<232> Section 3.3.5.4: If the underlying transport is NETBIOS over TCP, Windows servers set MaxWriteSize to 65536. Otherwise, MaxWriteSize is set based on the following table.

Windows version \ Connection.Dialect                                             2.0.2    All other SMB2 dialects
Windows Vista SP1 \ Windows Server 2008                                          65536    N/A
Windows 7 \ Windows Server 2008 R2                                               65536    1048576
Windows 8 without [MSKB-2934016] \ Windows Server 2012 without [MSKB-2934016]    65536    1048576
All other SMB2 servers                                                           65536    8388608

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Buffers, Queues and Caches Explained

Finely tuning buffers, queues and caches can make your storage system hum. And that’s exactly what we discussed in our recent SNIA Ethernet Storage Forum webcast, “Everything You Wanted to Know About Storage But Were Too Proud To Ask – Part Teal: The Buffering Pod.” If you missed it, it’s now available on-demand.

In this blog, you’ll find detailed answers from our panel of experts to all the great questions we received during the live event. I also encourage you to check out the other on-demand webcasts in this “Too Proud To Ask” series here and stay informed on upcoming events in this series by following us on Twitter @SNIAESF.

Q. Question on cache – What would be the right size of cache at each point (clients / Front-end connect / Storage controller / Back end connect / Physical storage).

A. Great question! The main consideration for cache sizing at any point is the workload. If the workload is conducive to cache benefits, then the more cache the merrier! However, when the workload is not conducive to cache, adding more cache capacity won’t be beneficial. For example, if the workload is 100% sequential reads of small 4K IOs, having the data pre-loaded into cache is going to be extremely helpful, and increasing the size of such a cache at the end-point will be good. If the workload is random and the IO size is changing, pre-fetching data into cache may not be a good idea. Similarly, with write cache, the benefit is realized two-fold: first, when the write is stored in cache and ack’ed back to the host (such a write is typically called “dirty” because it hasn’t been flushed back to the disk), and second, when the dirty write is overwritten by the host before it is flushed. Any other combination of workloads and IO will only get partial benefit from the cache. Sizing cache is a very difficult exercise and there are no universal answers. Every implementation has its own pluses and minuses.
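To put rough, purely illustrative numbers on it: the benefit of a read cache is often summarized as average access time = (hit ratio × cache latency) + (miss ratio × backend latency). A workload that hits a 0.1 ms cache 90% of the time in front of a 5 ms backend averages about 0.9 × 0.1 + 0.1 × 5 = 0.59 ms, while a workload that hits only 10% of the time averages about 4.51 ms, which is why adding cache capacity to a cache-unfriendly workload buys so little.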

Q. Isn’t a higher queue depth increasing latency as well, so applications would run slower as they are waiting longer for IO to complete?

A. The answer to this is very dependent on the environment. In general, having more outstanding operations will increase the load on the interconnects and storage media, which will result in the per-I/O latency increasing. The alternative is having a small queue depth, which may produce consistently lower per-I/O latency at the expense of less throughput and fewer IOPS. There are numerous techniques for dealing with mixed storage traffic, low-latency and high-throughput, such as multiple queues, out-of-order completions, immediate and delayed data transfers in-line, ready-to-transfer, and policies. The NVM media latency roadmap is also helping with these types of latency vs. throughput decisions by enabling devices that achieve full throughput at very low queue depths.
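As a rough rule of thumb, Little’s Law ties the two together: achievable IOPS ≈ outstanding I/Os ÷ average latency. For example (illustrative numbers), a queue depth of 32 at an average latency of 200 µs sustains roughly 160,000 IOPS, while a queue depth of 1 at 100 µs tops out around 10,000 IOPS, even though each individual I/O completes faster.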

Q. Does SCSI protocol have a max queue depth of 32?

A. No, the SCSI Architecture Model allows for up to 64 bits for the command identifier field and each of the SCSI transports (iSCSI, SAS, …) defines a maximum within that range. There may be implementation-dependent SCSI endpoints that define smaller ranges.

Q. How would a distributed software defined storage technology deal with queue depth and how can this be advantageous or not advantageous?

A. Interesting question. Distributed software defined storage is by definition made up of multiple autonomous layers of software components orchestrated to provide stable storage. These types of systems will have many outstanding operations (queue depth) at multiple-stages and layers. It’s also not uncommon to see SDS file systems front-ended with block-based protocols, such as iSCSI, which enable the initiators to build up large queue depths of operations.

Q. Are queue depth and buffer the same?

A. No, queue refers to command and response queues, buffers refer to in-flight data buffers. Command and response queues often contain pointers to these buffers embedded in the read or write commands.
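As a simplified, purely illustrative sketch (not the on-the-wire layout of any particular protocol such as SCSI or NVMe), a command queue entry typically carries a descriptor that points at the data buffer rather than containing the data itself, and the completion queue entry carries only the matching identifier and status:

    #include <stdint.h>

    struct data_buffer {            /* an in-flight data buffer */
        uint64_t addr;              /* where the payload lives in memory */
        uint32_t length;            /* payload size in bytes */
    };

    struct command_entry {          /* one slot in the command (submission) queue */
        uint16_t command_id;        /* lets the completion be matched to this command */
        uint8_t  opcode;            /* e.g., read or write */
        uint64_t lba;               /* target logical block address */
        struct data_buffer buf;     /* descriptor for the data, not the data */
    };

    struct completion_entry {       /* one slot in the response (completion) queue */
        uint16_t command_id;        /* which command this completes */
        uint16_t status;            /* success or error code */
    };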

Q. Are caches and buffers made of the same silicon that makes up SSD disks? Which one is faster?

A. As a general idea, yes; SSDs, RAM, caches, and buffers are all made from silicon. If we dig a little deeper, device caches and buffers are typically made of high-speed static random access memory (SRAM), which is faster than the slower and cheaper dynamic RAM (DRAM) used for main memory. Modern SSDs utilize an even slower memory, commonly known as Flash memory, and we differentiate that type of storage by its structure: Single-Level Cell (SLC), Multi-Level Cell (MLC), etc. There are some SSDs made out of DRAM, too, and then there are some newer technologies, like NVDIMM, 3D XPoint, etc. So, while the underlying physical material is still the same silicon, it’s the architecture that makes all the difference.

Q. In PFC, if there are pending items in P1, can P2 or P3, etc. go ahead?

A. Yes. Priority Flow Control (PFC, also called Per Priority Pause, though rarely) is designed specifically to only pause traffic on one priority, allowing the remaining priority Classes of Service to work according to their configurations. So, for example, if PFC were to pause Priority Queue 1, and Priority Queue 3 also had a “no-drop” configuration but was not having any issues, PFC on Queue 1 would be triggered but PFC on Queue 3 would not. In reality, having more than one no-drop lane on a link is very, very rare, but it does illustrate that PFC operates on a per-priority basis, not on the whole link.
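For reference, here is a simplified sketch of the IEEE 802.1Qbb PFC frame payload (field names are descriptive rather than taken verbatim from the standard); it shows why pausing is inherently per priority, since there is one enable bit and one pause-quanta value for each of the eight Classes of Service:

    #include <stdint.h>

    /* Simplified view of a PFC (Priority Flow Control) MAC Control frame payload.
     * Such frames are sent to the reserved MAC address 01-80-C2-00-00-01
     * with EtherType 0x8808; opcode 0x0101 identifies PFC. */
    struct pfc_payload {
        uint16_t opcode;                  /* 0x0101 for PFC */
        uint16_t priority_enable_vector;  /* bit N set = pause priority N */
        uint16_t pause_quanta[8];         /* per-priority pause time, one per CoS */
    };

    /* Pausing only priority 1 means setting bit 1 of the enable vector and a
     * nonzero quanta in pause_quanta[1]; priorities 0 and 2-7 keep flowing. */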

Q. Do all Ethernet based NVMe-oF (NVMe over Fabrics) implementations require some form of Data Center Bridging (DCB)? Or, are there versions of Ethernet based NVMe-oF (RoCE & iWARP) that run over standard Ethernet without needing DCB?

A. Yes, both iWARP and RoCE can be run without DCB. To maintain peak performance either DCB or other flow control mechanisms like ECN are recommended.

Q. Do server devices automatically honor the pause frame or does it require configuration?

A. I am assuming “server devices” refers to Ethernet ports on a server. It depends on the default settings of the NIC or LOM or those loaded by the driver during initialization. Generally speaking NIC devices that support PFC also support DCBX (Data Center Bridging Exchange). DCBX is a protocol that allows an end device, like a NIC, to get its proper configuration settings from the switch. That means that in an environment where PFC needs to be assigned to a specific Class of Service (CoS), the switch will send the NIC the proper settings during the setup configuration.

Q. Is it mandatory for all devices in network, host and storage to have same speed ports?

A. No.

Q. What are the theoretical devices for modeling and analyzing cache, buffer or queue behaviors?

A. Computers with software  🙂

Q. What if I have really large sized writes and they fill up the cache quickly? Is there a way to bypass the large sized writes?

A. The time of the presentation limited the amount of material we were able to share. One of the subjects we didn’t talk about was the cache software algorithm. Most storage vendors manage the cache by not letting extremely large IOs be cached. Back in the spinning storage era, an IO of 2MB would typically be considered too large to be cached and would be sent directly to disk.

Q. What will be the use of cache in all flash storage please? As flash is the highest performance disk.

A. See the answer to the question above, “Are caches and buffers made of the same silicon that makes up SSD disks? Which one is faster?” Hardware caches and buffers are typically made of the fastest memory, then comes RAM, and last are the SSDs, aka flash disks. Therefore, storing data on a faster layer is still beneficial to performance.

Q. Does the LUN Queue Depth includes the Queue Depth discussed here?

A. Yes, SCSI LUN queue depth enables the initiator(s) to have multiple outstanding I/O operations in flight.

Q. Will you use a queuing algorithm to manage IO queue? If your answer is yes, which algorithm will you use?

A. There are several storage protocols that define mechanisms for a target to dynamically adjust the queue depth available to the initiator through various forms of credit exchanges. Having these types of mechanisms enables the target to implement multi-initiator load balancing across targets.

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Storage Expert Takes on Hyperconverged Questions

Last month, we were fortunate enough to have Greg Schulz, analyst and founder of Server Storage IO, as a guest speaker at our SNIA Ethernet Storage Forum webcast, “What Does Hyperconverged Mean to Storage.” If you missed it, it’s now available on-demand. Greg fielded many great questions during the live event, but we didn’t have time to get to them all. So here they are:

Q. What is the difference between Converged Infrastructure (CI) and Hyperconverged Infrastructure (HCI)?

A. HCI is aggregated. You scale compute and storage in lock step. Converged is disaggregated. You can scale the compute independently of the storage. There are some software solutions that can support both hyper-converged (aggregated) and converged (disaggregated) deployments.

Q. What is your definition of “Little Data”?

A. Little Data is anything that’s not Big Data. It encompasses traditional databases, traditional structured, semi-structured and even some unstructured data.

Q. With convergence, what is the impact on the IT organization?

A. There is an opportunity for organizations to converge how they manage data infrastructure resources and services delivery. In other words, the technology can be leveraged to help the organization itself converge. Another impact is how converged solutions are protected and backed up, and how BC/BR/DR and related management are done. Traditionally there are separate IT teams for compute, storage, and networking, especially in a large organization. New technology solutions may allow an organization to converge those teams.

Q. Is there a hybrid strategy? Where a complete information system is composed of HCI/CI building blocks? If yes, what management tools would span these components?

A. Sure, why not? Certainly you can converge your environment into a particular CI/HCI solution or approach, likewise, different CI/HCI solutions can co-exist along with other solutions in a given environment in hybrid ways. Have a hybrid strategy that looks at how technologies and solutions adapt to your needs and environment. Focus on how it’s going to work for you, vs. you having to work for them.

Q. What does FUZE stand for?

A. FUZE is not an acronym. It is the actual fuzing, as in melding and bringing things together – literally fuzing things together.

Q. Do HCI vendors re-balance (compute, I/O, storage) automatically as more nodes are added?

A. Solutions vary in how they rebalance the workloads. Some are dynamic while others rebalance on intervals; it varies how, when, and what they rebalance. So, as you add capacity and make changes, you need to make sure resources are properly allocated to address performance.

Q. Can’t you offload those CPU cycles caused by I/O to another CPU?

A. That’s an interesting question. Yes, move the application to another CPU. There is software that will leverage the resources on another CPU. Most HCI and CI solutions are running on a stack that requires hardware somewhere.

Q. This discussion has touched on compute and storage scaling. What about network between compute in the CI/HCI infrastructure and external to other compute, databases, or end-users?

A. Both CI and HCI need to connect to other resources, but in most cases the highest levels of network traffic are inside the CI or HCI stack because the compute and storage resources are contained within. Their connections to outside clients or servers for data exchange, application integration, or client access are important but usually not very demanding on network bandwidth. (External connections for storage remote replication or backup could be bandwidth-intensive.)

Q. How can current Enterprise Storage products blend with either CI or HCI? Enterprise Storage is basically a centralized storage architecture, whereas HCI is built mostly on a distributed storage architecture. So how can Enterprise Storage vendors show use cases to the customer for selling their Enterprise Storage either as part of an HCI solution or alongside HCI?

A. Generally enterprise storage products can be included in CI but are not blended with HCI. For example Dell EMC, Cisco (with NetApp and other storage vendors), IBM and Oracle offer CI solutions that include enterprise storage arrays in the rack. Most HCI platforms do not interoperate with enterprise storage arrays because the HCI platforms include their own storage. They can co-exist with enterprise storage arrays and that’s how most customers deploy them—some workloads run on the HCI infrastructure while others continue to use enterprise storage arrays.

Q. One of the HCI selling points is simplicity and cost reduction compared to buying a la carte. It seems from what is being presented that may not be the case. Can you elaborate on where HCI may become more complex or costly?

A. It comes down to value. You can buy all the components yourself and glue them all together, and you may come up with a lower total cost, but what is the value of your time? What is the cost of staff time to evaluate, test, deploy, and maintain it? The total value must be considered. It’s possible that HCI will be more costly than a disaggregated deployment that separates compute and storage, but this depends heavily on the workload and the specific vendor’s product implementation.

Q. Current HCI “full stack” solutions claim compute and storage convergence, but what about the network? Given the east/west traffic introduced by HCI solutions, what networking solutions should customers be looking at?

A. Most of the common HCI solutions are packaged with server, storage, compute and most have networking included as well—typically the network adapters and sometimes also the switches. Some even have a backend software defined networking (SDN) capability as part of their stack.

Q. Related to HCI answer, what about vendors who allow for storage growth and/or server (compute) and storage additions. This allows for aggregated and dis-aggregated…yes?

A. Most HCI vendors require compute and storage to be added simultaneously, though many support different nodes with different ratios of compute and storage. This allows customers to change the ratio of compute and storage by adding different node types. And yes, some HCI vendors also support both a hyper-converged and a disaggregated model, with the disaggregated model allowing compute and storage to be added separately.

Q. What are the tools available to make HCI work in a hybrid load environment, with different workload requirements, e.g.: VDI and Databases?

A. There are tools for moving and migrating applications, workloads, systems and VMs into CI/HCI environments, likewise for tuning, optimizing, gaining insight, analytics and reporting. Most of the CI/HCI solutions have tools built into them for optimizing PACE (Performance, Availability, Capacity, Economics) attributes along with server compute, memory, storage, and I/O resources. Some CI/HCI solutions are optimized for VDI/workspaces, while others are able to support general workloads including databases, and some even support HPC/SC or other specialized workloads.

Q. Does network performance affect HCI or CI performance?

A. Sometimes. Most hybrid HCI nodes are happy with the bandwidth of 10GbE, but if the nodes are all-flash or have many disks, then a faster speed may be required to avoid a network bottleneck. Network latency could affect HCI or CI performance in some cases, especially with all-flash storage. Of course a reliable network helps ensure reliable CI/HCI operations.

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

We’ve Been Thinking…What Does Hyperconverged Mean to Storage?

Here at the SNIA Ethernet Storage Forum (ESF), we’ve been discussing how hyperconverged adoption will impact storage. Converged Infrastructure (CI), Hyperconverged Infrastructure (HCI), along with Cluster or Cloud In a Box (CIB) are popular trend topics that have gained both industry and customer adoption. As part of data infrastructures, CI, HCI, and CIB enable simplified deployment of resources (servers, storage, I/O networking, hypervisor, application software) across different environments.

But what do these approaches mean for the storage environment? What are the key concerns and considerations related specifically to storage? How will the storage be connected to (or included in) the platform? Who will protect and backup the data? And most importantly, how do you know that you’re asking the right questions in order to get to the right answers?

Find out on March 15th in a live SNIA-ESF webcast, “What Does Hyperconverged Mean to Storage.” We’ve invited expert Greg Schulz, founder and analyst of Server StorageIO, to answer the questions we’ve been debating. Join us, as Greg will move beyond the hype (pun intended) to discuss:

  • What are the storage considerations for CI, CIB and HCI
  • Why fast applications and fast servers need fast I/O
  • Networking and server-storage I/O considerations
  • How to avoid aggravation-causing aggregation (bottlenecks)
  • Aggregated vs. disaggregated vs. hybrid converged
  • Planning, comparing, benchmarking and decision-making
  • Data protection, management and east-west I/O traffic
  • Application and server north-south I/O traffic

Register today and please bring your questions. We’ll be on-hand to answer them during this event. We hope to see you there!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.