Optimizing NVMe over Fabrics Performance with Different Ethernet Transports: Host Factors

NVMe over Fabrics technology is gaining momentum in data centers, but there are three kinds of Ethernet-based NVMe over Fabrics transports: iWARP, RoCEv2 and TCP.

How do we optimize NVMe over Fabrics performance with different Ethernet transports? That will be the discussion topic at our SNIA Networking Storage Forum Webcast, “Optimizing NVMe over Fabrics Performance with Different Ethernet Transports: Host Factors” on September 16, 2020.

Setting aside the considerations of network infrastructure, scalability, security requirements and the complete solution stack, this webcast will explore the performance of different Ethernet-based transports for NVMe over Fabrics at the detailed benchmark level. We will show three key performance indicators: IOPS, Throughput, and Latency, with different workloads including Sequential Read/Write, Random Read/Write, and 70% Read/30% Write, all with different data sizes. We will compare the results of the three Ethernet-based transports: iWARP, RoCEv2 and TCP.
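As a refresher on how these indicators relate to one another, here is a minimal sketch of the basic arithmetic connecting IOPS, throughput, block size, and latency. The numbers are illustrative assumptions of our own, not results from the webcast:

```python
# Illustrative numbers only; not benchmark results from the webcast.
block_size_kib = 4            # e.g., a 4 KiB random read workload
iops = 500_000                # hypothetical measured IOPS

# Throughput is just IOPS multiplied by the I/O size.
throughput_mib_s = iops * block_size_kib / 1024
print(f"Throughput: {throughput_mib_s:.0f} MiB/s")   # -> about 1953 MiB/s

# At a fixed queue depth, average latency follows from Little's Law:
# outstanding I/Os = IOPS x latency.
queue_depth = 32
avg_latency_ms = queue_depth / iops * 1000
print(f"Average latency: {avg_latency_ms:.3f} ms")    # -> 0.064 ms
```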

Read More

What Are the Networking Requirements for HCI?

Hyperconverged infrastructures (also known as “HCI”) are designed to be easy to set up and manage. All you need to do is add networking. In practice, the “add networking” part has been more difficult than most anticipated. That’s why the SNIA Networking Storage Forum (NSF) hosted a live webcast “The Networking Requirements for Hyperconverged Infrastructure” where we covered what HCI is, storage characteristics of HCI, and important networking considerations. If you missed it, it’s available on-demand.

We had some interesting questions during the live webcast and, as we promised during the live presentation, here are answers from our expert presenters. Read More

Experts Answer Virtualization and Storage Networking Questions

The SNIA Networking Storage Forum (NSF) kicked off the New Year with a live webcast “Virtualization and Storage Networking Best Practices.” We guessed it would be popular and boy, were we right! Nearly 1,000 people registered to attend. If you missed out, it’s available on-demand. You can also download a copy of the webcast slides.

Our experts, Jason Massae from VMware and Cody Hosterman from Pure Storage, did a great job sharing insights and lessons learned on best practices on configuration, troubleshooting and optimization. Here’s the blog we promised at the live event with answers to all the questions we received.

Q. What is a fan-in ratio?

A. A fan-in ratio (sometimes also called an “oversubscription ratio”) refers to the number of host links relative to the number of links to a storage device. A very simple example helps illustrate the principle:

Say you have a Fibre Channel network (the actual speed or protocol does not matter for our purposes here). You have 60 hosts, each with a 4GFC link, going through a series of switches, connected to a storage device, just like in the diagram below:

This is a 10:1 oversubscription ratio; the “fan-in” part refers to the aggregate host bandwidth in comparison to the target bandwidth.
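To make the arithmetic concrete, here is a minimal sketch of the calculation. The host side matches the example above; the target side (three 8GFC ports) is an assumption we have added for illustration, since the original diagram is not reproduced here:

```python
# Fan-in (oversubscription) ratio sketch.
# Host side from the example above; target side is an assumed configuration.
host_count = 60
host_link_gbps = 4                 # 4GFC per host
target_port_count = 3              # assumed
target_port_gbps = 8               # assumed 8GFC per target port

host_bandwidth = host_count * host_link_gbps               # 240 Gb/s
target_bandwidth = target_port_count * target_port_gbps    # 24 Gb/s

ratio = host_bandwidth / target_bandwidth
print(f"Fan-in ratio: {ratio:.0f}:1")                       # -> Fan-in ratio: 10:1
```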

Block storage protocols like Fibre Channel, FCoE, and iSCSI have much lower fan-in ratios than file storage protocols such as NFS, and deterministic storage protocols (like FC) have lower ratios than non-deterministic ones (like iSCSI). The true arbiter of the appropriate fan-in ratio is the application. Highly transactional applications, such as databases, often require very low ratios.

Q. If there’s a mismatch in the MTU between server and switch will the highest MTU between the two get negotiated or else will a mismatch persist?

A. No, the lowest value will be used, but there’s a caveat. The switch and the network in the path(s) can have the MTU set higher than the hosts, but the hosts cannot have a higher MTU than the network. For example, if your hosts are set to 1500 and all the network switches in the path are set to 9K, then the hosts will communicate at 1500.

However, what can, and usually does, happen is that someone sets the host(s) or target(s) to 9K but never changes the rest of the network. When this happens, you end up with unreliable connectivity or even a loss of connectivity. Take a look at the graphic below:

A large ball can’t fit through a hole smaller than itself. Consequently, a 9K frame cannot pass through a port set to 1500. Unless you and your network admin both understand and use jumbo frames, there’s no reason to implement them in your environment.
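A quick way to reason about this: the usable end-to-end MTU is simply the smallest MTU configured anywhere in the path. A minimal sketch, with made-up hop values:

```python
# The effective path MTU is the minimum MTU of any device along the path.
# Hop values below are illustrative only.
path_mtus = {
    "host NIC": 1500,
    "access switch": 9000,
    "core switch": 9000,
    "storage target": 9000,
}

effective_mtu = min(path_mtus.values())
print(f"Effective path MTU: {effective_mtu}")   # -> 1500

# A 9000-byte jumbo frame sent by a misconfigured host would exceed this
# and be dropped (or fragmented, where fragmentation is allowed).
```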

Q. Can you implement port binding when using two NICs for all traffic including iSCSI?

A. Yes, you can use two NICs for all traffic including iSCSI; many organizations use this configuration. The key is making sure you have enough bandwidth to support all the traffic/IO that will use those NICs. You should, at the very least, use 10Gb NICs, and faster if possible.

Remember, all your management, VM and storage traffic are now using the same network devices. If you don’t plan accordingly, everything in your virtual environment can be impacted. Some hypervisors offer granular network controls that let you manage which type of traffic uses which NIC, configure failover behavior, and set QoS limits on the different traffic types. With these, you can ensure storage traffic gets the required bandwidth or priority in a dual-NIC configuration.
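As a rough way to sanity-check the “enough bandwidth” point, here is a minimal sketch of a bandwidth budget for a dual-10Gb configuration. The per-traffic estimates are assumptions for illustration only; substitute figures from your own environment:

```python
# Hypothetical per-host traffic estimates in Gb/s; adjust for your environment.
traffic_gbps = {
    "management": 0.5,
    "live migration": 4.0,
    "VM networking": 3.0,
    "iSCSI storage": 8.0,
}

nic_count = 2
nic_speed_gbps = 10
capacity = nic_count * nic_speed_gbps     # 20 Gb/s total with both NICs up
demand = sum(traffic_gbps.values())       # 15.5 Gb/s

print(f"Demand {demand} Gb/s vs. capacity {capacity} Gb/s "
      f"(only {nic_speed_gbps} Gb/s if one NIC fails)")
```

Note that in this made-up example the demand fits within two NICs but not within one, which is exactly why QoS or traffic prioritization matters for the failover case.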

Q. I’ve seen HBA drivers that by default set their queue depth to 128 but the target port only handles 512. So two HBAs would saturate one target port which is undesirable. Why don’t the HBA drivers ask what the depth should be at installation?

A. There are a couple of possible reasons for this. One is that many users do not know what queue depth even means and are likely to make a poor decision (higher is better, right?!). So vendors tend to set these things at defaults and let people change them if needed, which usually means they have a reason to change them. Furthermore, every storage array handles these things differently, which can make them harder to size. It is usually better to provide consistency: having things set uniformly makes it easier to support and gives more consistent expectations, even across storage platforms.

Second, many environments are large, which means people usually are not clicking and typing through installations. Things are templatized, sysprepped, automated, etc. During or after the deployment, their automation tools can configure things uniformly in accordance with their needs.

In short, it is like most things: provide defaults to keep one-off installations simple (and decrease the risk from people who may not know exactly what they are doing), let installations complete without having to research a ton of settings that may not ultimately matter, and still give experienced/advanced users, or automators, ways to make changes.
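To see how default queue depths add up against a single target port, here is a minimal sketch of the aggregate arithmetic. The counts and depths are illustrative assumptions, not vendor defaults; consult your HBA and array documentation for real values:

```python
# Illustrative numbers only; check your HBA and array documentation.
hosts = 16                      # hosts zoned to the same target port
hbas_per_host = 2
hba_queue_depth = 128           # outstanding commands per initiator
target_port_queue_depth = 2048

outstanding = hosts * hbas_per_host * hba_queue_depth   # 4096

if outstanding > target_port_queue_depth:
    print(f"Oversubscribed: {outstanding} possible outstanding I/Os "
          f"vs. {target_port_queue_depth} target port queue depth")
else:
    print(f"Within limit: {outstanding} of {target_port_queue_depth}")
```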

Q. A number of white papers show the storage uplinks on different subnets. Is there a reason to have each link on its own subnet/VLAN or can they share a common segment?  

A. One reason is to reduce the number of logical paths. Especially in iSCSI, the number of paths can easily exceed supported limits if every port can talk to every target. Using multiple subnets or VLANs can cut this number in half, and all you really lose is logical redundancy, which doesn’t really matter. Also, if storage traffic is split across two subnets or VLANs and someone makes some kind of catastrophic change to one subnet or VLAN (or some device in it causes other issues), it is less likely to affect both. This gives some management “oops” protection: a single change won’t bring all storage connectivity down.
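To illustrate how quickly logical paths add up, and how splitting across two subnets helps, here is a minimal sketch. The port and datastore counts are assumptions for illustration only:

```python
# Rough sketch of per-host path counts in iSCSI; numbers are illustrative.
initiator_ports = 2
target_ports = 8
datastores = 32           # or LUNs, depending on the environment

# Every initiator port able to reach every target port:
paths_single_subnet = initiator_ports * target_ports * datastores        # 512

# Two subnets/VLANs, each initiator port paired with half the target ports:
paths_two_subnets = initiator_ports * (target_ports // 2) * datastores   # 256

print(paths_single_subnet, paths_two_subnets)
```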


Networking for Hyperconvergence

“Why can’t I add a 33rd node?”

One of the great advantages of Hyperconverged infrastructures (also known as “HCI”) is that, relatively speaking, they are extremely easy to set up and manage. In many ways, they’re the “Happy Meals” of infrastructure, because you have compute and storage in the same box. All you need to do is add networking.

In practice, though, many consumers of HCI have found that the “add networking” part isn’t quite as much of a no-brainer as they thought it would be. Because HCI hides a great deal of the “back end” communication, it’s possible to severely underestimate or misunderstand the requirements necessary to run a seamless environment. At some point, “just add more nodes” becomes a more difficult proposition.

That’s why the SNIA Networking Storage Forum (NSF) is hosting a live webcast “Networking Requirements for Hyperconvergence” on February 5, 2019. At this webcast, we’re going to take a look behind the scenes, peek behind the GUI, so to speak. We’ll be talking about what goes on back there, and shine the light behind the bezels to see:

  • The impact of metadata on the network
  • What happens as we add additional nodes
  • How to right-size the network for growth
  • Networking best practices to make your HCI work better
  • And more…

Now, not all HCI environments are created equal, so we’ll say in advance that your mileage will vary. However, understanding some basic concepts of how storage networking impacts HCI performance may be particularly useful when planning your HCI environment, or contemplating whether or not it is appropriate for your situation in the first place.

Register here to save your spot for February 5th. Our experts will be on hand to answer your questions.

This webcast is the second installment of our Storage Networking series. Our first was “Networking Requirements for Ethernet Scale-Out Storage.” It’s available on-demand, as are all our educational webcasts. I encourage you to peruse the more than 60 vendor-neutral presentations in the NSF webcast library at your convenience.

Too Proud to Ask Webcast Series Continues – Getting from Here to There Pod

As part of the SNIA Ethernet Storage Forum’s successful “Everything You Wanted To Know About Storage But Were Too Proud To Ask” series, we’ve discussed numerous topics about storage devices, protocols, and networks. As we examine some of these topics further, we begin to tease out some subtle nuances; subtle, yet important nevertheless.

On May 9th we’ll take on the terms and concepts that affect Storage Architectures as a whole in “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Sepia – Getting from Here to There.” In particular, we’ll be looking at those aspects that can help or hinder storage systems inside the network:

  • Encapsulation vs. Tunneling
  • IOPS vs. Latency vs. Jitter
  • Quality of Service (QoS)

Each of these topics has a profound impact on storage designs and performance, but they are often misunderstood. We’re going to help you become clear on all of these very important storage concepts so that you can grok storage just a little bit more.

We hope you will join us on May 9th at 10:00 am PT and that you won’t be “too proud” to ask our experts your questions! Register today.

Think there may be other storage topics you feel you should understand better? Check out the rest of the webcasts in this series here.

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.