Ethernet RDMA Protocols Support for NVMe over Fabrics – Your Questions Answered

Our recent SNIA Ethernet Storage Forum Webcast on How Ethernet RDMA Protocols iWARP and RoCE Support NVMe over Fabrics generated a lot of great questions. We didn't have time to get to all of them during the live event, so as promised, here are the answers. If you have additional questions, please comment on this blog and we'll get back to you as soon as we can.

Q. Are there still actual (memory based) submission and completion queues, or are they just facades in front of the capsule transport?

A. On the host side, they're "facades," as you call them. When running NVMe/F, host reads and writes do not actually use NVMe submission and completion queues; that data comes and goes through RNIC RDMA queues. On the target side, there could be real NVMe submission and completion queues in play. But the more accurate answer is that it is "implementation dependent."

Q. Who places the command from NVMe queue to host RDMA queue from software standpoint?

A. This is managed by the kernel host software in code written to the NVMe/F specification. The idea is that any existing application that thinks it is writing to the existing NVMe host software will in fact cause the submission queue entry (SQE) to be encapsulated and placed in an RDMA send queue.

Q. You say “most enterprise switches” support NVMe/F over RDMA, I guess those are ‘new’ ones, so what is the exact question to ask a vendor about support in an older switch?

A. For iWARP, any switch that can handle Internet traffic will do. Mellanox and Intel have different answers for RoCE / RoCEv2. Mellanox says that for RoCE, it is recommended, but not required, that the switch support Priority Flow Control (PFC). Most new enterprise switches support PFC, but you should check with your switch vendor to be sure. Intel believes RoCE was architected around DCB. The name itself, RoCE, stands for “RDMA over Converged Ethernet,” i.e., Ethernet with DCB. Intel believes RoCE in general will require PFC (or some future standard that delivers equivalent capabilities) for efficient RDMA over Ethernet.

Q. Can you comment on when one should use RoCEv2 vs. iWARP?

A. We gave a high-level overview of some of the deployment considerations on slide 30. We refer you to some of the vendor links on slide 32 for “non-vendor neutral” perspectives.

Q. If you take RDMA out of equation, what is the key advantage of NVMe/F over other protocols? Is it that they are transparent to any application?

A. NVMe/F allows the application to bypass the SCSI stack and use native NVMe commands across a network. Most other block storage protocols require the SCSI protocol layer, translating the NVMe commands into SCSI commands. With NVMe/F you also gain parallelism, simplicity of the command set, a separation between administrative sessions and data sessions, and a reduction in the latency and processing required for NVMe I/O operations.

Q. Is ROCE v1 compatible with ROCE v2?

A. Yes. Adapters speaking RoCEv2 can also maintain RDMA connections with adapters speaking RoCEv1 because RoCEv2 ports are backwards interoperable with RoCEv1. Most of the currently shipping NICs supporting RoCE support both RoCEv1 and RoCEv2.

Q. Are RoCE and iWARP the only ways to use Ethernet as a fabric for NVMe/F?

A. Initially yes; only iWARP and RoCE are supported for NVMe over Ethernet. But the NVM Express Working Group is also targeting FCoE. We should have probably been clearer about that, though it is noted on slide 11.

Q. What about doing NVMe over Fibre Channel? Is anyone looking at, or doing this?

A. Yes. This is not in scope for the first spec release, but the NVMe WG is collaborating with the FCIA on this. So NVMe over Fibre Channel is expected as another standard in the near future, to be promoted by T11.

Q. Do RoCE and iWARP both use just IP addresses for management or is there a higher level addressing mechanism, and management?

A. RoCEv2 uses the RoCE Connection Manager, and iWARP uses TCP connection management. They both use IP for addressing.

Q. Are there other fabrics to run NVMe over fabrics? Can you do this over OmniPath or Infiniband?

A. InfiniBand is in scope for the first spec release. Also, there is a related effort by the FCIA to support NVMe over Fibre Channel in a standard that will be promoted by T11.

Q. You indicated the NVMe stack is in the kernel while RDMA is a user-level verb. How are NVMe SQ/CQ entries transferred from NVMe to RDMA and vice versa? Also, could smaller transfers in NVMe (e.g. an SGL of 512B) be combined into larger sizes before being sent to RDMA entries, and vice versa?

A. NVMe/F supports multiple scatter gather entries to combine multiple non-contiguous transfers; nevertheless, the protocol doesn't support chaining multiple NVMe commands in the same command capsule. A command capsule contains only a single NVMe command. Please also refer to slide 18 from the presentation.

Q. 1) How do implementers and adopters today test NVMe deployments? 2) Besides latency, what other key performance indicators do implementers and adopters look for to determine whether the NVMe deployment is performing well or not?

A. 1) Like any other datacenter specification, testing is done by debugging, interop testing and plugfests. Local NVMe is well supported and can be tested by anyone. NVMe/F can be tested using pre-standard drivers or solutions from various vendors. UNH-IOL is an organization with an excellent reputation for helping here. 2) Latency, yes. But also sustained bandwidth, IOPS, and CPU utilization, i.e., the "usual suspects."

Q. If RoCE CM supports ECN, why can’t it be used to implement a full solution without requiring PFC?

A. Explicit Congestion Notification (ECN) is an extension to TCP/IP defined by the IETF. The first point is that it is a standard for congestion notification, not congestion management. The second point is that it operates at L3/L4; it does nothing to help make the L2 subnet "lossless." Intel and Mellanox agree that, generally speaking, all RDMA protocols perform better in a "lossless," engineered fabric utilizing PFC (or some future standard that delivers equivalent capabilities). Mellanox believes PFC is recommended but not strictly required for RoCE, so RoCE can be deployed with PFC, ECN, or both. In contrast, Intel believes that for RoCE / RoCEv2 to deliver the "lossless" performance users expect from an RDMA fabric, PFC is in general required.

Q. How involved are Ethernet RDMA efforts with the SDN/OCP community? Is there a coming example of RoCE or iWARP on an SDN switch?

A. Good question, but neither RoCEv2 nor iWARP look any different to switch hardware than any other Ethernet packets. So they’d both work with any SDN switch. On the other hand, it should be possible to use SDN to provide special treatment with respect to say congestion management for RDMA packets. Regarding the Open Compute Project (OCP), there are various Ethernet NICs and switches available in OCP form factors.

Q. Is there a RoCE v3?

A. No. There is no RoCEv3.

Q. iWARP and RoCE both fall back to TCP/IP in the lowest communication sense? So they are somewhat compatible?

A. They can speak sockets to each other. In that sense they are compatible. However, for the usage model we’re considering here, NVMe/F, RDMA is required. Because of L3/L4 differences, RoCE and iWARP RNICs cannot speak RDMA to each other.

Q. So in case of RDMA (ROCE or iWARP), the NVMe controller’s fabric port is Ethernet?

A. Correct. But it must be RDMA-enabled Ethernet.

Q. What if I am using soft RoCE, do I still need an RNIC?

A. Functionally, soft RoCE or soft iWARP should work on a regular NIC. Whether the performance is sufficient to keep up with NVMe SSDs without the hardware offloads is a different matter.

Q. How would the NVMe controller know that a command is placed in the submission queue by the Fabric host driver? Is the fabric host driver responsible for notifying the NVMe controller through a remote doorbell trigger, or should the Fabric target driver trigger the doorbell?

A. No separate notification by the host is required. The fabric host driver simply sends a command capsule to notify its companion subsystem driver that there is a new command to be processed. The way that the subsystem side notifies the backend NVMe drive is out of the scope of the protocol.

Q. I am chair of the ETSI NFV working group on NFV acceleration. We are working on virtual RDMA and how a VM can benefit from hardware-independent RDMA. One cornerstone of this is a virtual-RDMA pseudo device. But there is not yet consensus on the minimal set of verbs to be supported: Do you think this minimal verb set can be identified? Last, the transport address space is not consistent between IB and Ethernet. How can transport-independent RDMA be supported?

A. The NVM Express Working Group is working on exactly these questions. They have to define a "minimal verb set" since NVMe/F generates the verbs. Similarly, I'd suggest looking to the spec to see how they resolve the transport address space differences.

Q. What’s the plan for Linux submission of NVMe over Fabric changes? What releases are being targeted?

A. The Linux Driver WG in the NVMe WG expects to submit code upstream within a quarter of the spec being finalized. At this time it looks like the most likely Linux target will be kernel 4.6, but it could end up being kernel 4.7.

Q. Are NVMe SQ/CQ transferred transparently to RDMA Queues or can they be modified?

A. The method defined in the NVMe/F specification entails a transparent transfer. If you wanted to modify an SQE or CQE, do so before initiating an NVMe/F operation.

Q. How common are rNICs for recent servers? i.e. What’s a quick check I can perform to find out if my NIC is an rNIC?

A. rNICs are offered by nearly all major server vendors. The best way to check is to ask your server or NIC vendor if your NIC supports iWARP or RoCE.

Q. This is most likely out of the scope of this talk, but could you perhaps share, at about a 30K-foot level, the differences between "NVMe controller" hardware versus "NVMe/F" hardware? It's most likely a combination of RNIC + NVMe controller, but it would be great to get your take on this.

A. A goal of the NVMe/F spec is that it work with all existing NVMe controllers and all existing RoCE and iWARP RNICs. So on even a very low level, we can say "no difference." That said, of course, nothing stops someone from combining NVMe controller and RNIC hardware into one solution.

Q. Are there any example Linux targets in the distros that exercise RDMA verbs? An iWARP or iSER target in a distro?

A. iSER allows this using a LIO or TGT SCSI target.

Q. Is there a standard or IP for RDMA NIC?

A. The various RNICs are based on IBTA, IETF, and IEEE standards, as shown on slide 26.

Q. What is the typical additional latency introduced comparing NVMe over Fabric vs. local NVMe?

A. In the 2014 IDF demo, the prototype NVMe/F stack matched the bandwidth of local NVMe with a latency penalty of only 8 µs over a local iWARP connection. Other demonstrations have shown an added fabric latency of 3 µs to 15 µs. The goal for the final spec is under 10 µs.

Q. How well is NVMe over RDMA supported for Windows?

A. It is not currently supported, but then the spec isn't even finished. Contact Microsoft if you are interested in their plans.

Q. RDMA over Ethernet would not support Layer 2 switching? How do you deal with TCP overhead?

A. L2 switching is supported by both iWARP and RoCE. Both flavors of RNICs have MAC addresses, etc. iWARP does have to deal with TCP/IP, but it does so in hardware with a TCP/IP Offload Engine (TOE). The TOE used in an iWARP RNIC is significantly constrained compared to a general-purpose TOE and therefore can operate with very high performance. See the Chelsio website for proof points. RoCE does not use TCP, so it does not need to deal with TCP overhead.

Q. Does RDMA not work with fibre channel?

A. They are totally different Transports (L4) and Networks (L3). That said, the FCIA is working with NVMe, Inc. on supporting NVMe over Fibre Channel in a standard to be promoted by T11.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Upcoming Plugfests at SDC

This year’s SNIA Storage Developer Conference (SDC) will take place in Santa Clara, CA Sept. 15-18.   In addition to an exciting agenda with great speakers, there is an opportunity for vendors to participate in SNIA Plugfests. Two Plugfests that I think are worth noting are: SMB2/SMB3 and iSCSI.

These Plugfests enable a vendor to bring their implementations of SMB2/SMB3 and/or iSCSI to test, identify, and fix bugs in a collaborative setting with the goal of providing a forum in which companies can develop interoperable products. SNIA provides and supports networks and infrastructure for the Plugfest, creating a collaborative framework for testing. Plugfest participants work together to define the testing process, assuring that objectives are accomplished.

Still Time to Register

Great news! There is still time to register. Setup for the Plugfest begins on September 13, 2014 and testing begins on September 14th.

Register here for the SMB2/SMB3 Plugfest

Register here for the iSCSI Plugfest

What to Expect at a Plugfest

Learn more about what takes place at the Plugfests by watching the  video interview of Jeremy Allison, Co-Creator of Samba, as he candidly talks about what to expect at an SDC Plugfest.

Learn more about the Plugfest registration process. If you have additional questions, please contact Arnold Jones (arnold@snia.org).

 

Use Cases for iSCSI and FCoE – Your Questions Answered

We had a tremendous response to our recent Webcast “Use Cases for iSCSI and FCoE – Where Each Makes Sense.” We had a lot of questions that we didn’t have time to address, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.

Q. You stated that FCoE requires End to End DCB connectivity.   That is not entirely true if you have native Fibre Channel storage.  

Once native FC is added, it is a hybrid FCoE/native FC network, not a simple FCoE network. To be clearer, I could have stated that for FCoE, all Ethernet links traversed must be DCB enabled.

Q. Any impact on the protocol choice if you bring SDN solutions with overlay networks using VXLAN or NVGRE within virtual switching in hypervisors into the picture?

An excellent question, but complicated enough that it probably deserves a discussion on its own. Overlay networks encapsulate Ethernet frames into routable packets. With strict adherence to ISO layering, that means L2 constructs like Data Center Bridging become "invisible" until decap. You lose the "lossless," low-latency behavior that FCoE expects and that iSCSI may be taking advantage of, depending on your implementation. That doesn't really favor one protocol over the other, but FCoE may lose the advantages it has over iSCSI when confined to a single L2 subnet. Unfortunately, the real answer to your question requires that you investigate in detail how the system software you are using handles encapsulated storage packets for both block storage protocols. Microsoft's Hyper-V is different from VMware's vSphere, and each flavor of SDN could be different as well. Proceed with caution.

Q. Have you heard of any enterprise customers who are interested in NIC Partitioning to separate iSCSI, FCoE, and typical network traffic?   If so, can you provide information about those customers’ use cases?

We have not come across many customers that are interested in large-scale deployments yet.

Q. What are the use cases for using standalone FCoE switches in SAN keeping aside Cisco UCS and Blade Servers?

There are two ways to look at this:

1) To use FCoE as an end-to-end (initiating server to target storage array) solution instead of, or to replace, Fibre Channel. Although not very prevalent to date, this option is chosen to create a single converged LAN/SAN network that essentially retains the native FC constructs. The potential benefit is a reduction in the amount of equipment required and in the resources needed to deploy and administer two separate networks. This can be done in a phased approach that uses multiprotocol switches, where every port can be used as Ethernet, FC, or both. This provides future proofing, reduced qualification costs, and lower OPEX by no longer requiring the purchase of multiple switches of different protocols.

2) To continue the use of FC for connectivity from the Top of Rack switch to the storage arrays, but use FCoE connectivity for server access. This is much more prevalent, and even when deployed outside of the Cisco UCS blade servers, is used to increase flexibility in highly virtualized server environments or multi-tenancy, where workloads/VMs from the same physical servers need to connect to different storage types.

Q. How do iSCSI and FCoE switches handle redundancy?   With FC, it is a best practice to implement dual fabrics with each storage system and server with paths down each.

Physical topology can be identical.   A storage system has one set of targets (either IP addresses or FCoE targets) on one switch and other targets on the other switch.   The initiators are configured to see any targets available on that leg.

To prevent Ethernet broadcast storms, technologies like per VLAN Spanning Tree and link aggregation are used.   TRILL can also be used.   For more details, I recommend reading this blog post by J Metz of Cisco.   http://blogs.cisco.com/datacenter/understanding-fcoe-and-trill-the-easy-way/

Q. Doesn’t increasing CPU mean software processing for FCoE and iSCSI at both endpoints can reduce costs considerably (i.e. no full HBA functionality needed at the endpoints)?

Absolutely.   If you have CPU cycles to spare at both endpoints, there is no reason to take on the extra cost of offload.   However, remember the principle behind Moore’s law also works on things like network adapters and HBAs.   It isn’t unreasonable to think that full offload capabilities will be included by default in a few years as technology progresses.   And even if they aren’t, the actual application of Moore’s law will push the difference in CPU utilization to be trivial.

Q. How do large data centers configure and manage iSCSI?   Is it by configuring the initiators and targets? My understanding is that most installations don’t use iSNS.   Is this true?

It is true that most implementations of iSCSI don’t use iSNS.   iSCSI initiators are simply configured with the target address by the administrator.   In the FC world, SNS is simply there, but the iSCSI equivalent, iSNS, has always been optional.   (SNS stands for Simple Name Service.   It is a service that helps initiators find targets.)

Q. I have been doing a lot of testing to compare iSCSI to FC and noticed that as we move from traditional storage to SSD-based storage the IOPS increase faster for FCoE. For example, 18K+ for FCoE vs. 12K for iSCSI. Have you seen similar results?

I have seen some similar results. However, I’ve also seen some that don’t necessarily line up with that.   I haven’t had the time to research this topic.   Sounds like a good topic for a future post.

Q. Do you have any information about the number of customers who use FCoE Boot and iSCSI Boot?

Unfortunately I don't. I do have anecdotal evidence suggesting that customers using full-offload are more likely to boot from SAN. Since more full-offload FCoE adapters are in use than full-offload iSCSI adapters today, it makes sense that more are booting over FCoE than iSCSI, but again, I don't have any evidence to support that.

Q. What about iSCSI over RoCE?

There are three network/fabric technologies that use RDMA: InfiniBand, iWARP, and RoCE. You can run iSCSI over any of these using the open-source iSER code supported by the Open Fabrics Alliance (https://www.openfabrics.org). iSER has been written to OFA's "verbs" for RDMA (rather than to the more familiar "sockets"). However, note that of these three underlying transports, only iWARP is truly routable in general. So technically you could implement iSER on InfiniBand or RoCE, but it may not do for you what you expect iSCSI to do, i.e., go anywhere the internet goes.

Q. How does FCIP compare with iSCSI for long distance requirements?

FC networks rely on guaranteed packet delivery to deliver low-latency, predictable performance. IP networks are best-effort networks, allowing for dropped packets with transmission retries. Given the possibility of packet loss and the resulting latency, FCIP has experienced limited adoption. It is useful where required, but typically not a core part of the infrastructure. If cost is a concern and long distance is required as part of the solution, then iSCSI is the better choice as it is designed to allow for lossy networks.

Q. Slide 22 – Was that hardware based iSCSI or software based iSCSI?

What was shown in the chart was software-based iSCSI; however, you would see similar results with hardware-based iSCSI.

Q. What about FC vs FCoE performance? Any numbers?

Both Fibre Channel and FCoE can achieve line rate.   Here’s an example of testing Yahoo! did on an 8Gb FC HBA and a 10 GbE CNA that showed exactly that result: http://www.intel.com/content/www/us/en/network-adapters/10-gigabit-network-adapters/10-gbe-ethernet-yahoo-case-study.html .   So as Fibre Channel moves to 16 Gbps, it will outperform a 10GbE CNA, at least for peak performance.   However, the tables turn with a 40 GbE CNA, several of which are in production now.

Q. Do you see SR-IOV used currently or in the future to separate FCoE or iSCSI from standard LAN traffic?

So far we have seen that with the exception of a few operating systems (e.g., AIX), SR-IOV support today is network only.   Additionally, most customers want guaranteed bandwidth for storage and they wouldn’t be willing to run it on the same port as heavy NIC traffic.

Q. Are you aware of any FCoE targets for Windows?

I’m not aware of any right now.

Q. What is the max IOPS (at 4K) you can push thru 10G FCoE and iSCSI? Max latency (at 512 bytes)?

Latency is not determined by the pipe.

Q. Does FCoE really require a CNA? What about software only FCoE drivers?

Open FCoE does exist, but most FCoE implementations today use CNAs.   I do expect the adoption of FCoE software solutions to increase fairly substantially.   A lot of it comes down to the choice of booting via FCoE or another method.

Q. Do you think that the difference in FCoE/iSCSI usage for different App tiers can be related to the performance of the protocols?

Objectively, no.   Either protocol implemented can be configured to hit or exceed a performance number.   In my opinion, market perception of the protocols has more to do with the tier assignment than anything technical.

Q. Doesn't 32Gb FC make it competitive with 40GbE FCoE?

From a purely technical perspective it helps, but FCoE is often deployed to reduce costs by simplifying cabling and switching by converging IP and storage onto the same fabric.   32Gb FC is slower than 40Gb and does nothing to reduce costs.   Unless 32Gb FC is significantly less expensive than 40 Gb Ethernet on a per port basis, market forces are going to push towards Ethernet.   There are still plenty of cases where organizations may deploy 32Gb FC instead of FCoE, but again, those criteria will mostly be non-technical.

Thanks to all my SNIA-ESF colleagues and Dell’Oro Group for helping me with these answers. If you missed the original Webcast, you can watch it on-demand here. You can also download a copy of the slides.

Why the FCoE – iSCSI Debate Continues

Why the FCoE – iSCSI Debate Continues

This is my first blog post for SNIA-ESF.  As a Principal Storage Architect, I have been doing extensive research on the factors that are driving FCoE vs. iSCSI choices over the last several years. The more I dive into the topic, the more intriguing the debate becomes. In fact, this blog is a preview of an upcoming white paper I’m writing and a Webcast SNIA is hosting on February 18th. If you agree this debate is interesting, I encourage you to attend. Details on the Webcast are at the end of this post.

A Look Back at FCoE and iSCSI History

There are two entrenched standards for block storage protocols over Ethernet networks. FCoE was ratified in 2009, while iSCSI was ratified in 2004. Of course, various vendors and early adopters supported these protocols before ratification, so the history of each protocol is a couple of years longer than it looks. While iSCSI simply encapsulates the SCSI protocol in IP, FCoE operates lower in the network stack and, to do so, required many enhancements to Ethernet. While iSCSI runs on any IP network (mostly Ethernet these days), FCoE requires Data Center Bridging and Converged Network Adapters all running at 10 Gbps or faster.

All of the Data Center Bridging enhancements that make FCoE possible, like lossless Ethernet, benefit all of the protocols using Ethernet as the transport protocol.   DCB doesn’t just make FCoE possible, but it improves iSCSI at the same time   (see the SNIA-ESF blog, How DCB Makes iSCSI Better). So given that modern servers, networks, and storage may all be connected by hardware capable of running FCoE, that same network is also able to run iSCSI, as well as other network traffic.   Nothing precludes them from running simultaneously on the same network either.   The leading storage vendors that offer both FCoE and iSCSI target systems allow administrators to present the same LUN over either protocol with little effort, so a transition from one protocol to the other is not difficult.

Strengths and Weaknesses

So which network protocol is the right choice?

Each protocol has strengths and weaknesses when judged relative to each other. FCoE has higher throughput at lower host CPU utilization than iSCSI, and FCoE doesn't have to process the TCP/IP stack as iSCSI does. iSCSI is relatively simple to set up and troubleshoot when compared to FCoE because zoning is not a factor and IP connectivity (although not optimized for storage traffic) is likely in place already. Also, while FCoE has a comprehensive set of existing tools available to ease troubleshooting, there aren't as many qualified people to use them in most enterprises. Ease of use, plus the ability to use low-cost NICs and switches, gives iSCSI a cost advantage. (However, if you check out our SNIA-ESF webcast, "How VN2VN Will Help Accelerate Adoption of FCoE," you'll hear about new technologies that reduce the costs of deploying FCoE.) FC, and by extension FCoE, is perceived to be enterprise-grade and suitable for all workloads; and while iSCSI is being widely adopted at the enterprise level, it is still perceived by some not to be ready for Tier-1 applications. The graph below is excerpted from the report "Intel 10GbE Adapter Performance Evaluation" prepared by Demartek for Intel in September 2010. This data is consistent with the rest of the report findings and is only intended to be representative of the results from comparative iSCSI and FCoE testing. The report is interesting reading and I recommend looking at it for more information. This graph shows IOPS and CPU utilization for JetStress tests running against NetApp storage over multi-path iSCSI and FCoE. Note that latencies were all similar and that running the tests against EMC storage showed similar results.

[Figure: FCoE vs. iSCSI IOPS and CPU utilization, Demartek JetStress test results]

Many other factors must be considered, but according to industry pundits – as well as my own personal experience – in the majority of cases either protocol is adequate for the task at hand, and that is to effectively transfer block data across an Ethernet network.

Maximizing Throughput

The reality is, most servers, applications, and storage arrays simply won't take advantage of FCoE's superior performance, or of any storage protocol running over 10GbE. iSCSI and NAS protocols are very fast and are typically sufficient to meet most application requirements. But this is not meant to be a SAN vs. NAS post – besides, years of history, thousands of happy end users, and billions in continued investment show that both work well enough to meet most business needs. The commonly deployed storage systems and hosts are simply not configured with enough hardware to saturate multiple 10 gigabit network links. While this is rare today, it is going to become more common to see systems capable of saturating 10GbE pipes in the near future, especially as flash memory, either in all-flash arrays or tiered storage systems, finds more application. (Hear more on the impact of flash in our SNIA-ESF webcast, "Flash – Plan for the Disruption.") At least as it relates to spinning-media disk systems, network bandwidth increases faster than storage system throughput can keep up. So consider the storage system to be the bottleneck or limiting factor when evaluating storage network performance. After all, in most data center environments, the ratio of servers and applications to storage systems is high, so it's reasonable to expect the storage system to be the bottleneck. The absolute throughput of FCoE and iSCSI, when pushing a storage system to its limits, is not sufficient alone to be used as the sole basis for the decision between the two protocols, except for a few edge cases. Bottom line: whether the storage system is the bottleneck or the network is the bottleneck, the performance relationship between FCoE and iSCSI does not change.

These edge cases tend to be extremely IO-intensive database workloads and big data applications, such as Hadoop. Citing the graph above, FCoE is about 15-20% faster on identical hardware than iSCSI. Granted, this is a single graph of a single test, but the data is consistent across tests performed by IBM using Emulex network interfaces. If absolute throughput and efficiency (both network and CPU) are the only criteria when deciding between block protocols, FCoE looks like the choice. Since these cases are rare – because complexity, supportability, and even politics are almost always considered – the decision is not so obvious. Again, although beyond the scope of this article, NAS protocols should also be considered when determining the proper protocol for an application.

Is There a Clear Winner?

While FCoE can claim technical superiority, iSCSI has the edge in cost and supportability.   The number and range of systems supporting iSCSI connectivity is greater, particularly at the entry level.   What’s more, the availability of people that can troubleshoot end-to-end connectivity for iSCSI is also much greater.   (The “ping” command diagnoses most iSCSI connectivity problems.)   Also, do a resume search on Monster or LinkedIn and the number of people that can configure VLANs dwarfs the number that can properly zone a Fibre Channel network.   Greater familiarity reduces the support and operating cost of iSCSI.

IDC predicts that FCoE revenue will ramp very quickly through 2016. (If available to you, see the IDC Worldwide Enterprise Storage Systems 2012-2016 Forecast Update.)   As customers decide to transition existing Fibre Channel networks to an Ethernet infrastructure, deploying FCoE would be a comfortable choice due to existing IT expertise and functional expectations of the Fibre Channel protocol.

Both iSCSI and FCoE are capable storage protocols, and choosing one over the other will likely be dependent upon budget, IT skill set, and application requirements.

Don’t forget to join us on Feb. 18th

Again, I encourage you to attend our February 18th Webcast, “Use Cases for iSCSI and FCoE –Where Each Makes Sense.”   Analysts from Dell’Oro Group will share their latest market research on this topic and I’ll dive into use cases for both iSCSI and FCoE. It’s a live event, so please come with your toughest questions. I hope you’ll join us!


2013 in Review and the Outlook for 2014 – A SNIA ESF Perspective

Technology continues to advance rapidly. Making sense of it all can be a challenge. At the SNIA Ethernet Storage Forum, we focus on storage technologies and solutions enabled by and associated with Ethernet networks. Last year, we modified the charters of our two Special Interest Groups (SIGs) to address topics about file protocols and storage over Ethernet. The File Protocols SIG includes the prior focus on Network File System (NFS) related topics and adds discussions around Server Message Block (SMB / CIFS). We had our first webcast last November on the topic of SMB 3.0 and it was our best attended webcast ever. The Storage over Ethernet SIG focuses on general Ethernet storage topics as well as more information about technologies like FCoE, iSCSI, Data Center Bridging, and virtual networking for storage. I encourage you to check out other articles on these hot topics in this SNIA-ESF blog to hear from our member experts as well as guest posts from leading analysts.

2013 was a busy year and we are already kickin' it in 2014. This should be an exciting year in IT. Data storage continues to be a hot sector, especially in the areas of All-Flash and Hybrid arrays. This year, we expect to see new standards coming out of the T11 committee for Fibre Channel and possibly FCoE, as well as progress in high speed Ethernet networks. Lower cost network interconnects will facilitate adoption of high speed networks in the small to midsize business segment. And a new conversation around "Software Defined…" should push a lot of ink in trade rags and other news sources. Oh, and don't forget about the "Internet of Things", mobile solutions, and all things Cloud.

The ESF will be addressing the impact on Ethernet storage solutions from these hot technologies. Next month, on February 18th, experts from the ESF, along with industry analysts from Dell’Oro Group will speak to the benefits and best practices of deploying FCoE and iSCSI storage protocols. This presentation “Use Cases for iSCSI and Fibre Channel: Where Each Makes Sense” will be part of an upcoming BrightTalk Summit on Storage Networking. I encourage you to register for this session. Additionally, we will be publishing a couple of white papers on file-based storage and a review of FCoE and iSCSI in storage applications.

Finally, SNIA will be kicking off its first year of the new user conference, Data Storage Innovation Conference. This will be one of the few storage focused user conferences in the market and should be quite interesting.

We're excited about our growing membership and our plans for 2014. Our goal is to advance the application of innovative technologies, and we encourage you to send us mail or comment below with topics that are of interest to you.

Here’s to an exciting 2014!

Ethernet Storage Forum – 2012 Year in Review and What to Expect in 2013

As we come to a close of the year 2012, I want to share some of our successes and briefly highlight some new changes for 2013. Calendar year 2012 has been eventful and the SNIA-ESF has been busy. Here are some of our accomplishments:

  • 10GbE – With virtualization and network convergence, as well as the general availability of LOM and 10GBASE-T cabling, we saw this as a "breakout year" for 10GbE. In July, we published a comprehensive white paper titled "10GbE Comes of Age." We then followed up with a Webcast, "10GbE – Key Trends, Predictions and Drivers." We ran this live once in the U.S. and once in the U.K., and combined, the Webcast has been viewed by over 400 people!
  • NFS – has also been a hot topic. In June we published a white paper, "An Overview of NFSv4," highlighting the many improved features NFSv4 has over NFSv3. A Webcast to help users upgrade, "NFSv4 – Plan for a Smooth Migration," has also been well received, with over 150 viewers to date. A 4-part Webcast series on NFS is now planned. We kicked the series off last month with "Reasons to Start Working with NFSv4 Now" and will continue on this topic during the early part of 2013. Our next NFS Webcast will be "Advances in NFS – NFSv4.1 and pNFS." You can register for that here.
  • Flash – The availability of solid state devices based on NAND flash is changing the performance efficiencies of storage. Our September Webcast “Flash – Plan for the Disruption” discusses how Flash is driving the need for 10GbE and has already been viewed by more than 150 people.

We have also expanded our membership, welcoming Tonian and LSI to the ESF. We expect with this new charter to see an increase in membership participation as we drive incremental value and establish ourselves as a leadership voice for Ethernet Storage.

As we move into 2013, we expect two hot trends to continue – the broader use of file protocols in datacenter applications, and the continued push toward datacenter consolidation with the use of Ethernet as a storage network. In order to better address these two trends, we have modified our charter for 2013. Our NFS SIG will be renamed the File Protocol SIG and will focus on promoting not only NFS, but also SMB / CIFS solutions and protocols. The iSCSI SIG will be renamed to the Storage over Ethernet SIG and will focus on promoting data center convergence topics with Ethernet networks, including the use of block and file protocols, such as NFS, SMB, FCoE, and iSCSI, over the same wire. This modified charter will allow us to have a richer conversation around storage trends relevant to your IT environment.

So, here is to a successful 2012, and excitement for the coming year.

Will Ethernet storage move to 10GBASE-T?

10GBASE-T is a technology that runs 10Gb Ethernet over familiar Category 6/6a cables for distances up to 100m and is terminated by the ubiquitous RJ-45 jack. Until now, most datacenter copper cabling has been special Direct Attach cable for distances up to 7m, terminated by an SFP+ connector. To work, data center switches need matching SFP+ connectors, meaning new switches are required for any data center making the move from 1GbE to 10GbE. 10GBASE-T is generating a lot of interest in 2012 as the first single-chip implementations at lower power (fanless) and lower cost (competitive with Direct Attach NICs) come to market. A data center manager now has an evolutionary way to incorporate 10GbE that exploits the cabling and switches already in place. The cost savings from preserving existing cabling alone can be tremendous.

But is 10GBASE-T up to the task of carrying storage traffic? The bit-error rate technical tests of 10GBASE-T look promising. 10GBASE-T is meeting the 10⁻¹² BER requirements of all the relevant Ethernet and storage specifications. We expect NAS and iSCSI to move rapidly to take advantage of the deployment cost savings offered by 10GBASE-T. Admins responsible for NAS and iSCSI storage over Ethernet should find 10GBASE-T meets their reliability expectations.

But what about Fibre Channel over Ethernet (FCoE)? Note that storage admins responsible for FC and/or FCoE are among the most risk-averse people on the planet. They especially need to be confident that any new technology, no matter how compelling its benefits, doesn't appreciably increase the risk of data loss. For this reason, they are adopting FCoE very slowly, though the economics make FCoE very compelling. So a broad market transition to FCoE over 10GBASE-T is likely to take some time regardless.

Cisco announced in June 2012 a new 5000-series Nexus switch supporting up to 68 ports of “FCoE-ready” 10GBASE-T. Cisco has made the investment to support storage protocols, including FCoE, over 10GBASE-T in this switch and is committed to working with the industry to do the testing to prove its robustness. In fact, some eager end-users are getting ahead of this testing, and, based on results from their own stress tests, moving now to storage over 10GBASE-T deployments, including FCoE.

Every major speed and capabilities transition for Ethernet has engendered skeptics. The transition to running storage protocols over 10GBASE-T is no different. General consensus is that the “jury is out” for FCoE over 10GBASE-T. The interoperability and stress testing to prove reliability isn’t complete. And storage admins will generally want to see reports from multiple deployments before they move. But the long-term prognosis for storage – NAS, iSCSI, and FCoE — over 10GBASE-T is looking very encouraging.

Deploying SQL Server with iSCSI – Answers to your questions

by: Gary Gumanow

Last Wednesday (2/24/11), I hosted an Ethernet Storage Forum iSCSI SIG webinar with representatives from Emulex and NetApp to discuss the benefits of iSCSI storage networks in SQL application environments. You can catch a recording of the webcast on BrightTalk here.

The webinar was well attended, and while we received many great questions during the webinar, we just didn't have time to answer all of them, which brings us to this blog post. We have included answers to those unanswered questions below.
We’ll be hosting another webinar real soon, so please check back for upcoming ESF iSCSI SIG topics. You’ll be able to register for this event shortly on BrightTalk.com.

Let’s get to the questions. We took the liberty of editing the questions for clarity. Please feel free to comment if we misinterpreted the question.

Question: Is TRILL needed in the data center to avoid pausing of traffic while extending the number of links that can be used?

Answer: The Internet Engineering Task Force (IETF) has developed a new shortest path frame Layer 2 (L2) routing protocol for multi-hop environments. The new protocol is called Transparent Interconnection of Lots of Links, or TRILL. TRILL will enable multipathing for L2 networks and remove the restrictions placed on data center environments by STP single-path networks.

Although TRILL may serve as an alternative to STP, it doesn’t require that STP be removed from an Ethernet infrastructure. Hybrid solutions that use both STP and TRILL are not only possible but also will be the norm for at least the near-term future. TRILL will also not automatically eliminate the risk of a single point of failure, especially in hybrid environments.

Another area where TRILL is not expected to play a role is the routing of traffic across L3 routers. TRILL is expected to operate within a single subnet. While the IETF draft standard document mentions the potential for tunneling data, it is unlikely that TRILL will evolve in a way that will expand its role to cover cross-L3 router traffic. Existing and well-established protocols such as Multiprotocol Label Switching (MPLS) and Virtual Private LAN Service (VPLS) cover these areas and are expected to continue to do so.

In summary, TRILL will help multipathing for L2 networks.

Question: How do you calculate bandwidth when you only have IOPS?
Answer:
The mathematical formula to calculate bandwidth is a function of IOPS and I/O size. The formula is simply IOPS x I/O size. Example: 10,000 IOPS x 4K block size (4096 bytes) = 40.9 MB/sec.

Question: When deploying FCoE, must all 10GbE switches support Data Center Bridging (DCB) and FCoE? Or can some pass through FCoE?
Answer:
Today, in order to deploy FCoE, all switches in the data path must support both FCoE forwarding and DCB. Future standards include proposals to allow pass through of FCoE commands without having to support Fibre Channel services. This will allow for more cost effective networks where not all switch layers are needed to support the FCoE storage protocol.
Question: iSCSI performance is comparable to FC and FCoE. Do you expect to see iSCSI overtake FC in the near future?
Answer:
FCoE deployments are still very small compared to traditional Fibre Channel and iSCSI. However, industry projections by several analyst firms indicate that Ethernet storage protocols, such as iSCSI and FCoE, will overtake traditional Fibre Channel due to increased focus on shared data center infrastructures to address applications, such as private and public clouds. But, even the most aggressive forecasts don’t show this cross over for several years from now.
Customers looking to deploy new data centers are more likely today to consider iSCSI than in the past. Customers with existing Fibre Channel investments are likely to transition to FCoE in order to extend the investment of their existing FC storage assets. In either case, transitioning to 10Gb Ethernet with DCB capability offers the flexibility to do both.

Question: With 16Gb/s FC ratified, what product considerations would be considered by disk manufacturers?
Answer:
We can’t speak to what disk manufacturers will or won’t do regarding 16Gb/s disks. But, the current trend is to move away from Fibre Channel disk drives in favor of Serial Attached SCSI (SAS) and SATA disks as well as SSDs. 16Gb Fibre Channel will be a reality and will play in the data center. But, the prediction of some vendors is that the adoption rate will be much slower than previous generations.
Question: Why move to 10GbE if you have 8Gb Fibre Channel? The price is about the same, right?
Answer:
If your only network requirement is block storage, then Fibre Channel provides a high performance network to address that requirement. However, if you have a mixture of networking needs, such as NAS, block storage, and LAN, then moving to 10GbE provides sufficient bandwidth and flexibility to support multiple traffic types with fewer resources and with lower overall cost.
Question: Is the representation of the number of links accurate when comparing Ethernet to Fibre Channel? Your overall bandwidth of the wire may be close, but when including protocol overheads, the real bandwidth isn't an accurate comparison. Example: FC protocol overhead is only 5% vs. TCP at 25%. iSCSI framing adds another 4%. So your math on how many FC cables equal 10 Gbps cables is not a fair comparison.

Answer: As pointed out in the question, comparing protocol performance requires more than just a comparison of the wire rates of the physical transports. Based upon protocol efficiency, one could conclude that the comparison between FC and TCP/IP is unfair as designed, because Fibre Channel should have produced greater data throughput from a comparable wire rate. However, the data in this case shows that iSCSI offers comparable performance in a real-world application environment, rather than just a benchmark test. The focus of the presentation was iSCSI; FCoE and FC were only meant to provide reference points. The comparisons were not intended to be exact nor precise. 10GbE and iSCSI offer the performance to satisfy business-critical requirements. Customers looking to deploy a storage network should consider a proof of concept to ensure that a new solution can satisfy their specific application requirements.

Question: Two FC switches were used during this testing. Was it to solve an operation risk of no single point of failure?
Answer:
The use of two switches was due to hardware limitation. Each switch had 8-ports and the test required 8 ports at the target and the host. Since this was a lab setup, we weren’t configuring for HA. However, the recommendation for any production environment would be to use redundant switches. This would apply for iSCSI storage networks as well.
Question: How can iSCSI match all the distributed management and security capabilities of Fibre Channel / FCoE such as FLOGI, integrated name server, zoning etc?
Answer:
The feature lists between the two protocols don’t match exactly. The point of this presentation was to point out that iSCSI is closing the performance gap and has enough high-end features to make it enterprise-ready.
Question: How strong is the possibility that 40G Ethernet will be bypassed, with a move directly from 10G to 100G?
Answer: Vendors are shipping products today that support 40Gb Ethernet so it seems clear that there will be a 40GbE. Time will tell if customers bypass 40GbE and wait for 100GbE.

Thanks again for checking out our blog. We hope to have you on our next webinar live, but if not, we’ll be updating this blog frequently.

Gary Gumanow – iSCSI SIG Co-chairman, ESF Marketing Chair

SQL Server “rocks” with iSCSI – Emulex and NetApp tell why

The leading storage network technology for mission critical applications  today  is Fibre Channel (FC). Fibre Channel is a highly reliable and high performing network technology for block storage applications. But, for organizations that can’t afford single purpose networks or the added complexity of managing more than one network technology, FC may not be ideal. With the introduction of Fibre Channel over Ethernet (FCoE), the ability to deploy your FC storage resources over a shared Ethernet network is now possible. But, FCoE isn’t the only available option for block storage over Ethernet.

Initially used primarily by small and medium sized businesses or for less demanding applications, iSCSI is now finding broad application by larger enterprises for mission critical applications. Some of the drivers for increased iSCSI adoption in the enterprise include lower cost for 10Gb Ethernet components as well as the drive toward cloud based infrastructures which benefit from increased flexibility and scalability associated with IP network protocols.

On February 24th, the SNIA Ethernet Storage Forum will present a live webcast to discuss the advantages of iSCSI storage for business applications and will show test results demonstrating the performance of SQL Server deployed with 10GbE iSCSI. Hosted by Gary Gumanow, co-chair of the iSCSI SIG and ESF board member, this presentation will include content experts from Emulex and NetApp along with a live Q&A.

Guest Speakers

Steve Abbott – Sr. Product Marketing Manager, Emulex

Wei Liu – Microsoft Alliance Engineer, NetApp

Date & Time: February 24th, 11am PT

Register today at http://www.brighttalk.com/webcast/25316

SNIA ESF

The SNIA Ethernet Storage Forum is dedicated to educating the IT community on the advantages and best use of Ethernet storage. This presentation is the first in a series of marketing activities that will primarily focus on data center applications during the calendar year 2011.

Ethernet and IP Storage – Today’s Technology Enabling Next Generation Data Centers

I continue to believe that IP based storage protocols will be preferred for future data center deployments. The future of IT is pointing to cloud based architectures, whether internal or external. At the core of the cloud is virtualization. And I believe that Ethernet and IP storage protocols offer the greatest overall value to unlock the potential of virtualization and clouds. Will other storage network technologies work? Of course. But, I’m not talking about whether a network “works”. I’m suggesting that a converged network environment with Ethernet and IP storage offers the best combined value for virtual environments and cloud deployments. I’ve written and spoken about this topic before. And I will likely continue to do so. So, let me mention a few reasons to choose IP storage, iSCSI or NAS, for use in cloud environments.

Mobility. One of the many benefits of server virtualization is the ability to non-disruptively migrate applications from one physical server to another to support load balancing, failover or redundancy, and servicing or updating of hardware. The ability to migrate applications is best achieved with networked storage since the data doesn’t have to move when a virtual machine (VM) moves. But, the network needs to maintain connectivity to the fabric when a VM moves. Ethernet offers a network technology capable of migrating or reassigning network addresses, in this case IP addresses, from one physical device to another. When a VM moves to another physical server, the IP addresses move with it. IP based storage, such as iSCSI, leverages the built in capabilities of TCP/IP over Ethernet to migrate network port addresses without interruption to applications.

Flexibility. Most data centers require a mixture of applications that access either file or block data. With server virtualization, it is likely that you’ll require access to file and block data types on the same physical server for either the guest or parent OS. The ability to use a common network infrastructure for both the guest and parent can reduce cost and simplify management. Ethernet offers support for multiple storage protocols. In addition to iSCSI, Ethernet supports NFS and CIFS/SMB resulting in greater choice to optimize application performance within your budget. FCoE is also supported on an enhanced 10Gb Ethernet network to offer access to an existing FC infrastructure. The added flexibility to interface with existing SAN resources enhances the value of 10Gb as a long-term networking solution.

Performance. Cost. Ubiquity. Other factors that enhance Ethernet storage, and therefore IP storage adoption, include a robust roadmap, favorable economics, and near universal adoption. The Ethernet roadmap includes 40Gb and 100Gb speeds, which will support storage traffic and will be capable of addressing any foreseeable application requirements. Ethernet today offers considerable economic value as port prices continue to drop. Although Gb speeds offer sufficient bandwidth for most business applications, the cost per Gb of bandwidth with 10 Gigabit Ethernet (10GbE) is now lower than with GbE and therefore offers upside in cost and efficiency. Finally, nearly all new digital devices, including mobile phones, cameras, laptops, servers, and even some home appliances, are being offered with Ethernet connectivity, whether wired or over WiFi. Consolidating onto a single network technology means that the networking infrastructure to the rest of the world is essentially already deployed. How good is that?

Some may view moving to a shared network as kind of scary. The concerns are real. But, Ethernet has been a shared networking platform for decades and continues to offer enhanced features, performance, and security to address its increased application. And just because it can share other traffic, doesn’t mean that it must. Physical isolation of Ethernet networks is just as feasible as any other networking technology. Some may choose this option. Regardless, selecting a single network technology, even if not shared across all applications, can reduce not only capital expense, but also operational expense. Your IT personnel can be trained on a single networking technology versus multiple specialized single purpose networks. You may even be able to reduce maintenance and inventory costs to boot.

Customers looking to architect their network and storage infrastructure for today and the future would do well to consider Ethernet and IP storage protocols. The advantages are pretty compelling.