Networking Questions for Ethernet Scale-Out Storage

Unlike traditional local or scale-up storage, scale-out storage imposes different and more intense workloads on the network. That’s why the SNIA Networking Storage Forum (NSF) hosted a live webcast “Networking Requirements for Ethernet Scale-Out Storage.” Our audience had some insightful questions. As promised, our experts are answering them in this blog.

Q. How does scale-out flash storage impact Ethernet networking requirements?

A.  Scale-out flash storage demands higher bandwidth and lower latency than scale-out storage using hard drives. As noted in the webcast, it’s more likely to run into problems with TCP Incast and congestion, especially with older or slower switches. For this reason it’s more likely than scale-out HDD storage to benefit from higher bandwidth networks and modern datacenter Ethernet solutions–such as RDMA, congestion management, and QoS features.

Q. What are your thoughts on NVMe-oF TCP/IP and availability?

A.  The NVMe over TCP specification was ratified in November 2018, so it is a new standard. Some vendors already offer this as a pre-standard implementation. We expect that several of the scale-out storage vendors who support block storage will support NVMe over TCP as a front-end (client connection) protocol in the near future. It’s also possible some vendors will use NVMe over TCP as a back-end (cluster) networking protocol.

Q. Which is better: RoCE or iWARP?

A.  SNIA is vendor-neutral and does not directly recommend one vendor or protocol over another. Both are RDMA protocols that run on Ethernet, are supported by multiple vendors, and can be used with Ethernet-based scale-out storage. You can learn more about this topic by viewing our recent Great Storage Debate webcast “RoCE vs. iWARP” and checking out the Q&A blog from that webcast.

Q. How would you compare use of TCP/IP and Ethernet RDMA networking for scale-out storage?

A.  Ethernet RDMA can improve the performance of Ethernet-based scale-out storage for the front-end (client) and/or back-end (cluster) networks. RDMA generally offers higher throughput, lower latency, and reduced CPU utilization when compared to using normal (non-RDMA) TCP/IP networking. This can lead to faster storage performance and leave more storage node CPU cycles available for running storage software. However, high-performance RDMA requires choosing network adapters that support RDMA offloads and in some cases requires modifications to the network switch configurations. Some other types of non-Ethernet storage networking also offer various levels of direct memory access or networking offloads that can provide high-performance networking for scale-out storage.

Q. How does RDMA networking enable latency reduction?

A. RDMA typically bypasses the kernel TCP/IP stack and offloads networking tasks from the CPU to the network adapter. In essence, it shortens the total path length, which in turn reduces latency. Most RDMA NICs (rNICs) perform some level of networking acceleration in an ASIC or FPGA, including retransmissions, reordering, TCP operations, flow control, and congestion management.
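
To make the kernel-bypass point concrete, below is a minimal sketch in C using the Linux libibverbs API of posting a one-sided RDMA Write entirely from user space. It assumes a queue pair that has already been connected and a buffer that has already been registered; the helper name and parameters are illustrative only and are not taken from the webcast or any particular product.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA Write on an already-connected queue pair.
 * The adapter moves 'len' bytes from the registered local buffer into
 * the remote buffer identified by (remote_addr, rkey) without running
 * the kernel TCP/IP stack or involving the remote CPU.
 */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *local_buf, uint32_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,  /* local source buffer   */
        .length = len,                   /* bytes to transfer     */
        .lkey   = mr->lkey,              /* key from ibv_reg_mr() */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;                 /* app-chosen completion id */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE; /* one-sided write          */
    wr.send_flags          = IBV_SEND_SIGNALED; /* ask for a completion     */
    wr.wr.rdma.remote_addr = remote_addr;       /* advertised by the target */
    wr.wr.rdma.rkey        = rkey;              /* remote access key        */

    /* Hand the work request directly to the adapter from user space. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

Because the work request goes straight to the adapter and the data movement is handled by the NIC, the host CPU never copies the payload or runs the TCP/IP stack for this transfer, which is where the latency and CPU savings come from.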

Q. Do all scale-out storage solutions have a separate cluster network?

A.  Logically all scale-out storage systems have a cluster network. Sometimes it runs on a physically separate network and sometimes it runs on the same network as the front-end (client) traffic. Sometimes the client and cluster networks use different networking technologies.

RDMA for Persistent Memory over Fabrics – FAQ

In our most recent SNIA Networking Storage Forum (NSF) webcast, Extending RDMA for Persistent Memory over Fabrics, our expert speakers, Tony Hurson and Rob Davis, outlined extensions to RDMA protocols that confirm persistence and additionally can order successive writes to different memories within the target system. Hundreds of people have seen the webcast and have given it a 4.8 rating on a scale of 1-5! If you missed it, you can watch it on-demand at your convenience. The webcast slides are also available for download.

We had several interesting questions during the live event. Here are answers from our presenters:

Q. For the RDMA Message Extensions, does the client have to qualify a WRITE completion with only Atomic Write Response and not with Commit Response?

A. If an Atomic Write must be confirmed persistent, it must be followed by an additional Commit Request. Built-in confirmation of persistence was dropped from the Atomic Request because it adds latency and is not needed for some application streams.

Q. Why do you need confirmation for writes? From my point of view, the only thing required is ordering.

A. Agreed, but only if the entire target system is non-volatile! Explicit confirmation of persistence is required to cover the “gap” between the Write completing in the network and the data reaching persistence at the target.

Q. Where are these messages being generated? Does the NIC know when the data is flushed or committed?

A. They are generated by the application that has reserved the memory window on the remote node. It can write using RDMA writes to that window all it wants, but to guarantee persistence it must send a flush.
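
As an illustration of that write-then-flush flow, here is a hedged C sketch. The Commit/Flush verb discussed in the webcast was still being standardized at the time, so `post_commit_placeholder()` below is a hypothetical stand-in (declared but not implemented) that only marks where the persistence confirmation would go; the RDMA Write portion uses standard libibverbs calls, and a connected queue pair plus registered memory are assumed.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL stand-in for the proposed RDMA Commit (flush) request.
 * No such call exists in shipping libibverbs at the time of writing;
 * this declaration only marks where the persistence confirmation would
 * be posted once the IBTA extension is available.
 */
extern int post_commit_placeholder(struct ibv_qp *qp, uint64_t remote_addr,
                                   uint32_t rkey, uint32_t length);

/* Sketch of the flow described above: RDMA Writes land data in the
 * remote memory window, but only the Commit response tells the writer
 * that the data has actually reached persistence at the target.
 */
static int write_then_commit(struct ibv_qp *qp, struct ibv_mr *mr,
                             void *buf, uint32_t len,
                             uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* ordinary RDMA Write */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    int rc = ibv_post_send(qp, &wr, &bad_wr);
    if (rc)
        return rc;

    /* The Write completion alone does not guarantee persistence at the
     * target; the flush/Commit request closes that gap.
     */
    return post_commit_placeholder(qp, remote_addr, rkey, len);
}
```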

Q. How is RPM presented on the client host?

A. The application using it sees it as memory it can read and write.

Q. Does this RDMA commit response implicitly ACK any previous RDMA sends/writes to same or different MR?

A. Yes, the new Commit (and Verify and Atomic Write) Responses have the same acknowledgement coalescing properties as the existing Read Response. That is, a Commit Response is explicit (non-coalesced); but it coalesces/implies acknowledgement of prior Write and/or Send Requests.

Q. Does this one still have the current RDMA Write ACK?

A. See the previous answer. Yes, a Commit Response implicitly acknowledges prior Writes.

Q. With respect to the Race Hazard explained to show the need for an explicit completion response, wouldn’t this be the case even with non-volatile memory, if the data were to be stored in non-volatile memory? Why is this completion status required only in the non-volatile case?

A. Most networked applications that write over the network to volatile memory do not require explicit confirmation at the writer endpoint that the data has actually arrived. When such confirmation is needed, additional handshake messages are usually exchanged between the endpoint applications. On the other hand, a writer to PERSISTENT memory across a network almost always needs assurance that data has reached persistence, thus the new extension.

Q. What if you are using multiple RNIC with multiple ports to multiple ports on a 100Gb fabric for server-to-server RDMA? How is order kept there…by CPU software or ‘NIC teaming plus’?

A. This would depend on the RNIC vendor and their implementation.

Q. What is the time frame for these new RDMA messages to be available in verbs API?

A. This depends on the IBTA standards approval process, which is not completely predictable; roughly sometime in the first half of 2019.

Q. Where could I find more details about the three new verbs (what are the arguments)?

A. Please poll/contact/Google the IBTA and IETF organizations towards the end of calendar year 2018, when first drafts of the extension documents are expected to be available.

Q. Do you see this technology used in a way similar to Hyperconverged systems now use storage or could you see this used as a large shared memory subsystem in the network?

A. High-speed persistent memory, in either NVDIMM or SSD form factor, has enormous potential in speeding up hyperconverged write replication. It will, however, require a substantial rewrite of such storage stacks, moving, for example, from traditional three-phase block storage protocols (command/data/response) to an RDMA write/confirm model. More generally, the RDMA extensions are useful for distributed shared PERSISTENT memory applications.

Q. What would be the most useful performance metrics to debug performance issues in such environments?

A. Within the RNIC, basic counts for the new message types would be a baseline. These plus total stall times encountered by the RNIC awaiting Commit Responses from the local CPU subsystem would be useful. Within the CPU platform basic counts of device write and read requests targeting persistent memory would be useful.

Q. Do all RDMA NICs have to update their firmware to support these new verbs? What is the expected performance improvement with the new Commit message?

A. Both answers would depend on the RNIC vendor and their implementation.

Q. Will the three new verbs be implemented in the RNIC alone, or will they require changes in other places (processor, memory controllers, etc.)?

A. The new Commit request requires the CPU platform and its memory controllers to confirm that prior write data has reached persistence. The new Atomic Write and Verify messages however may be executed entirely within the RNIC.

Q. What about the future of NVMe over TCP – this would be much simpler for people to implement. Is this a good option?

A. Again, this would depend on the NIC vendor and their implementation. Different vendors have run various performance tests, so it is recommended that readers do their own due diligence.

Oh What a Tangled Web We Weave: Extending RDMA for PM over Fabrics

For datacenter applications requiring low-latency access to persistent storage, byte-addressable persistent memory (PM) technologies like 3D XPoint and MRAM are attractive solutions. Network-based access to PM, labeled here Persistent Memory over Fabrics (PMoF), is driven by data scalability and/or availability requirements. Remote Direct Memory Access (RDMA) network protocols are a good match for PMoF, allowing direct RDMA data reads or writes from/to remote PM. However, the completion of an RDMA Write at the sending node offers no guarantee that data has reached persistence at the target.

Join the Networking Storage Forum (NSF) on October 25, 2018 for our next live webcast, Extending RDMA for Persistent Memory over Fabrics. In this webcast, we will outline extensions to RDMA protocols that confirm such persistence and additionally can order successive writes to different memories within the target system. Learn:

  • Why we can’t treat PM just like traditional storage or volatile memory
  • What happens when you write to memory over RDMA
  • Which programming model and protocol changes are required for PMoF
  • How proposed RDMA extensions for PM would work

We believe this webcast will appeal to developers of low-latency and/or high-availability datacenter storage applications and be of interest to datacenter developers, administrators and users. I encourage you to register today. Our NSF experts will be on hand to answer your questions. We look forward to your joining us on October 25th.

RoCE vs. iWARP Q&A

In our RoCE vs. iWARP webcast, experts from the SNIA Ethernet Storage Forum (ESF) had a friendly debate on two commonly known remote direct memory access (RDMA) protocols that run over Ethernet: RDMA over Converged Ethernet (RoCE) and the IETF-standard iWARP. It turned out to be another very popular addition to our “Great Storage Debate” webcast series. If you haven’t seen it yet, it’s now available on-demand along with a PDF of the presentation slides.

We received A LOT of questions related to Performance, Scalability and Distance, Multipathing, Error Correction, Windows and SMB Direct, DCB (Data Center Bridging), PFC (Priority Flow Control), lossless networks, Congestion Management, and more. Here are answers to them all. Read More

Fibre Channel vs. iSCSI – The Great Debate Generates Questions Galore

The SNIA Ethernet Storage Forum recently hosted the first of our “Great Debates” webcasts on Fibre Channel vs. iSCSI. The goal of this series is not to have a winner emerge, but rather provide vendor-neutral education on the capabilities and use cases of these technologies so that attendees can become more informed and make educated decisions. And it worked! Over 1,200 people have viewed the webcast in the first three weeks! And the comments from attendees were exactly what we had hoped for:

“A good and frank discussion about the two technologies that don’t always need to compete!”

“Really nice and fair comparison guys. Always well moderated, you hit a lot of material in an hour. Thanks for your work!”

“Very fair and balanced overview of the two protocols.”

“Excellent coverage of the topic. I will have to watch it again.”

If you missed the webcast, you can watch it on-demand at your convenience and download a copy of the slides.

The debate generated many good questions and our expert speakers have answered them all: Read More

2017 Ethernet Roadmap for Networked Storage

When SNIA’s Ethernet Storage Forum (ESF) last looked at the Ethernet Roadmap for Networked Storage in 2015, we anticipated a world of rapid change. The list of advances in 2016 is nothing short of amazing:

  • New adapters, switches, and cables have been launched supporting 25, 50, and 100Gb Ethernet speeds including support from major server vendors and storage startups
  • Multiple vendors have added or updated support for RDMA over Ethernet
  • The growth of NVMe storage devices and release of the NVMe over Fabrics standard are driving demand for both faster speeds and lower latency in networking
  • The growth of cloud, virtualization, hyper-converged infrastructure, object storage, and containers is increasing the popularity of Ethernet as a storage fabric

The world of Ethernet in 2017 promises more of the same. Now we revisit the topic with a look ahead at what’s in store for Ethernet in 2017.   Join us on December 1, 2016 for our live webcast, “2017 Ethernet Roadmap to Networked Storage.”

With all the incredible advances and learning vectors, SNIA ESF has assembled a great team of experts to help you keep up. Here are some of the things to keep track of in the upcoming year:

  • Learn what is driving the adoption of faster Ethernet speeds and new Ethernet storage models
  • Understand the different copper and optical cabling choices available at different speeds and distances
  • Debate how other connectivity options will compete against Ethernet for the new cloud and software-defined storage networks
  • And finally look ahead with us at what Ethernet is planning for new connectivity options and faster speeds such as 200 and 400 Gigabit Ethernet

The momentum is strong with Ethernet, and we’re here to help you stay informed of the lightning-fast changes. Come join us to look at the future of Ethernet for storage and join this SNIA ESF webcast on December 1st. Register here.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Find out How iSCSI is Evolving

The next Ethernet Storage Forum Webcast, “Evolution of iSCSI including iSER, iSCSI over RDMA Ethernet,” will focus on developments with iSCSI – the Internet Protocol standard for transferring SCSI commands across an Ethernet network, enabling hosts to link to storage devices wherever they may be. At this Webcast on May 24th, I will be joined by Fred Knight, Standards Technologist at NetApp, and Andy Banta, Storage Janitor at SolidFire/NetApp, who will discuss the evolution of iSCSI up to iSER, which takes advantage of Ethernet RDMA fabric technologies to enhance performance. Register now to hear:

  • A brief history of iSCSI
  • How iSCSI works
  • IETF refinements to the specification
  • Enhancing iSCSI  performance with iSER

The Webcast will be live, so please bring your questions for Andy and Fred. We hope to see you there!

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

How Ethernet RDMA Protocols iWARP and RoCE Support NVMe over Fabrics

NVMe (Non-Volatile Memory Express) over Fabrics is of tremendous interest among storage vendors, flash manufacturers, and cloud and Web 2.0 customers. Because it offers efficient remote and shared access to a new generation of flash and other non-volatile memory storage, it requires fast, low latency networks, and the first version of the specification is expected to take advantage of RDMA (Remote Direct Memory Access) support in the transport protocol.

Many customers and vendors are now familiar with the advantages and concepts of NVMe over Fabrics but are not familiar with the specific protocols that support it. Join us on January 26th for this live Webcast that will explore and compare the Ethernet RDMA protocols and transports that support NVMe over Fabrics and the infrastructure needed to use them. You’ll hear:

  • Why NVMe Over Fabrics requires a low-latency network
  • How the NVMe protocol is mapped to the network transport
  • How RDMA-capable protocols work
  • Comparing available Ethernet RDMA transports: iWARP and RoCE
  • Infrastructure required to support RDMA over Ethernet
  • Congestion management methods

The event is live, so please bring your questions. We look forward to answering them.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

Ethernet Roadmap for Networked Storage Q&A

Almost 200 people attended our joint Webcast with the Ethernet Alliance: “The 2015 Ethernet Roadmap for Networked Storage.” We had a lot of great questions during the live event, but we did not have time to answer them all. As promised, we’ve compiled answers to all of the questions that came in. If you think of additional questions, please feel free to comment on this blog.

Q. What did you mean by parity of flash with HDD?

A. We were referring to the O’Reilly article in “Network Computing.”   O’Reilly is predicting parity in BOTH capacity and price in 2016.

Q. When do we expect IEEE standards ratification for 25G speed?

A. 2016.   You can see the exact schedule here.

Q. Do you envision the Enterprise, Cloud Providers, HPC, Financials getting rid of their 10/40GbE infrastructure and replacing that with 25/100GbE infrastructure in 2017? Will these customers deploy 100GbE/25GbE switch in the leaf layer in 2017?

A. Deployment will occur over a multi-year time span overall, if only because switch infrastructure is expensive to upgrade, as reflected in the Crehan Research forecast. New deployments will likely move to 25/100GbE as new switches with 100GbE downstream ports become available in 2016. Because the Cloud Service Providers are currently the most aggressive in driving new infrastructure purchases, they represent the largest early volumes for 25/100GbE. Enterprise is still in the midst of the transition from 1GbE to 10GbE.

Q. What are some of the developments on spanning-tree derivatives vs. Dijkstra-based derivatives such as OSPF and FSPF for switches?

A. Beyond the scope of this presentation on Ethernet. Ethernet is defined by the IEEE for L1 and L2 in the ISO model. Your questions are at L3 and L4, which are handled by organizations like the IETF.

Q. With all the speeds possible who is working on flow control?

A. Flow control at the 802.1 level is supported in the Layer 1/2 PHY & MAC by setting upper bounds on the delay through each layer which allows higher layers to comprehend the delays & response times to pause frames. Each new speed & PHY in 802.3 is accompanied by delay constraint specifications to support this.

Q.   Do you have an overlay graphic that shows the Ethernet RDMA roadmap?   If so, is Ethernet storage the primary driver for that technology?

A. Beyond the scope of this presentation on Ethernet. Ethernet is defined by the IEEE for L1 and L2 in the ISO model. Your questions are at L3 and L4, which are handled by organizations like the IETF and the InfiniBand Trade Association.

Q. The adoption of faster and new Ethernet always has to do with the costs of acquiring new technology. How long do you think it will take to adopt/acquire faster Ethernet in datacenters now that the development is happening much faster than the last 20 years?

A. Please see the chart on slide 7 where Crehan Research predicts how fast the technology will diffuse into deployments.

Q. What do you expect as cost comparison between Ethernet and InfiniBand going forward?
Also, what work is being done to reduce latency?

A. Beyond the scope of this presentation.   Latency is primarily a consequence of design methodologies and semiconductor process technology, and thus under the control of the silicon device manufacturers.   Some vendors prioritize latency more than others.

Q. What’s the technical limitation as speeds go higher and higher?

A. A number of factors limit speeds going faster and faster, but the main problem is that materials attenuate signals as they travel at higher frequencies.

Q. Will 1GbE used for manageability purposes disappear from public cloud? If so, what is the expected time frame?

A. This is a choice for end users.   Most equipment is managed on a separate network for security concerns, but users can eliminate these management networks at any time.

Q. What are the relative market size predictions for the expanding number of standards (25G, 50G, 100G, 200G, etc.)?

A. See the Crehan Research forecast in the presentation.

Q. What is the major difference between SMF & MMF for the not so initiated?

A. The SMF has a 9um core while the MMF has a 50um core. Different lasers are used for each fiber type; at speeds above 10GbE, MMF typically reaches 100 meters while SMF reaches from 500m to 10km.

Q. Will 25G be available through both copper and fibre connectivity?

A. Yes. IEEE 802.3 work is currently underway to specify 25Gb/s on twinax (“direct attach copper”) to 5 meters, printed circuit backplane up to ~1m, twisted-pair copper to 30m, and multimode fiber to 100m. There is no technology barrier to 25G on SMF, just that a standards project to specify it has not started yet.

Q. This is interesting from a hardware viewpoint, but has nothing to do with storage yet.   Are we going to get to how this relates to storage other than saying flash drives are fast and only Ethernet can keep up?

A. Beyond the scope of this presentation on Ethernet.   Ethernet is defined by the IEEE for L1 and L2 in the ISO model.   Your questions are directed at the higher layers.   The key point of this webcast is that storage networking engineers need to pay much more attention to the Ethernet roadmap than they have historically, primarily because of NVM.

Q. How does “SFP 28” fit in this mix?   Is it required for 25G?

A. SFP28 connectors and modules are required for 25GbE because they give better performance than SFP+, which only works up to 10GbE.

Q. Can you provide the quick difference between copper & optical on speed & costs?

A. Copper and optical Ethernet links are usually standardized at the same speed. The 400GbE standards do not define a copper link, but an active Direct Attach Cable (DAC) will probably support 400GbE. Cost depends on volume and many other factors and is beyond the scope of this presentation. Copper is usually a fraction of the cost of optical links.

Q. Do you think people will try to use multiple CAT 5e to get more aggregate bandwidth to the access points to avoid having to run Fibre to them?

A. IEEE is defining 2.5GBASE-T and 5GBASE-T to enable Cat5e to support faster wireless access points.

Q. When are higher speeds and PoE going to reach the point when copper based Ethernet will become a viable heat source for buildings thus helping the environment?

A. 🙂   IEEE is defining 4 wire PoE to deliver at least 60W to end devices.   You can find out more here.

Q. What are the use cases for 2.5Gb and 5.0Gb Base-T?

A. The leading use case for 2.5G/5GBASE-T is to provide the uplink for wireless LAN access points that support 802.11ac and future wireless technology.   Wireless LAN technology has advanced to the point where >1Gb/s BW is needed upstream from the AP, and 2.5G/5G provide a higher speed uplink while preserving the user’s investment in Cat5e/Cat6 cabling.

Q. Why not have only CFP2 sockets right away with things disabled for lower speeds for all the intervening years leading to full-fledged CFP2?

A. CFP2 is defined for 100GbE, and 8 ports can be used on a 1U switch. 100GbE switches are shifting to QSFP28 so that 32 ports of 100GbE are supported in a 1U switch at low cost. The CFP2 is much more expensive than QSFP28 and will not be used for lower speeds because of the high cost.

Benefits of RDMA in Accelerating Ethernet Storage Q&A

At our recent live Webcast “Benefits of RDMA in Accelerating Ethernet Storage Connectivity” experts from Emulex, Intel and Microsoft had an insightful discussion on the ways RDMA is having an impact on Ethernet storage. The live event was attended by nearly 200 people and feedback was overwhelmingly positive, with several attendees thanking us for our vendor-neutral presentation and one attendee commenting that it was, “Probably the most clearly comprehensible yet comprehensive webinar I’ve attended in some time.” If you missed the Webcast, it’s now available on demand. We did not have time to get to everyone’s questions, so as promised, below are answers to all of them. If you have additional questions, please ask them in the comments section in this blog and we’ll get back to you as soon as possible.

Q.  Is RDMA over RoCEv2 in production?

A. The IBTA released the RoCEv2 specification in September 2014. In order to support that specification, changes may be required across the RDMA stack, including firmware, drivers, and operating systems. Schedules for implementation of that specification will vary by operating system. For example, the OpenFabrics Alliance (OFA) has not yet released an OpenFabrics Enterprise Distribution (OFED) version that implements the standard, although one is in process now. Once OFA completes its OFED stack implementation, the Linux distribution vendors will then incorporate and support the updated OFED stack. Implementations provided prior to full OFA and distro vendor support would be preliminary, potentially incompatible with the OFED release, and would require confirmation from the distro vendor about the nature and level of support being provided.

Q. I would have liked a list of Windows applications that take advantage of SMB Direct – both in a Hyper-V host or bare metal.

A.  In Windows, any file-based application can make use of SMB3 and SMB Direct due to the native file-based programming interface support. No application changes are required. For certain enterprise applications such as Hyper-V and SQL Server, SMB3 is officially supported, and more information can be found in the product catalog at www.microsoft.com.

Q. Are there any particular benefits in using one network protocol over another for SMB Direct/RDMA (iWARP vs. RoCE vs. IB)?

A.  There are no hard and fast rules; any adapter or protocol can be suitable for many scenarios. Of the Ethernet-based protocols we considered in today’s webcast:

  • iWARP offers the benefit of operation over TCP with its reliability and routability, well-suited to a broad range of installed infrastructure.
  • RoCE offers a lightweight, efficient protocol when a DCB-enabled switched fabric is available. RoCE, however, is not routable.
  • RoCEv2 offers similar properties to RoCE, with the possibility to scale to larger routed and DCB-enabled fabrics.

Q. Who are the vendors offering iWARP capable RNICs?

A. Chelsio Communications has production iWARP adapters today, and both Intel and QLogic have publicly committed to future iWARP controllers.

Q. How much testing has been done with SMB3, and in particular SMB direct, over WAN connections?

A. The SMB2 protocol was originally designed to adapt to WAN scenarios, and supports a credit-based management of large amounts of data to be outstanding, to make best use of WAN-type long pipes. The SMB3 protocol retains these design attributes, and the SMB Direct protocol also supports similar deep pipelining. The iWARP protocol, being layered on standard TCP, is well suited to such deployments, and RoCE WAN adapters are potentially available. Please contact the respective technology vendors for information on any available testing results.

Q. I would love a future webcast on RDMA-enabled distributed filesystems.

A. Thanks for the suggestion! We’re always looking for ideas for future webcasts and SNIA-ESF will consider this as a potential follow-on.

Q.  Is Live Migration the scenario where “packet size” is 1MB?

A.  All SMB Direct scenarios have workloads that range anywhere up to 8MB. For large file copies, most SMB3 clients request from 1MB to 8MB per operation; for Hyper-V live migration, transfers are typically similar during the bulk transfer phase.

Q. SMB3 is being compared to FC for enterprise. If Ethernet based protocols are of interest, wouldn’t FCoE give the same performance as FC (same stack) vs. SMB3?

A. SMB3 with SMB Direct enables many workloads not possible with Fibre Channel over Ethernet, and performance comparisons are therefore difficult. Perhaps another SNIA webcast could investigate this!

Q.  Regarding your SMB direct example with lots of small operations, how do you deal with the overhead of registering and unregistering buffers for the RDMA operations?

A. As answered later in the session, the registration and unregistration is not a protocol matter, but in the case of the Windows implementation, it is strictly performed for the specific buffers of each operation, which is critical for security, data integrity, and system protection. The standard “Fast Register Work Request” method is used, and careful implementation has shown that the overhead does not negatively impact performance, even for small I/O (4KB/operation). Check out Jose Barreto’s blog, which contains many benchmark results.
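
For readers unfamiliar with what per-operation registration involves, the sketch below shows the general shape using the Linux libibverbs calls `ibv_reg_mr()` and `ibv_dereg_mr()`. This is only an illustration of the concept; the Windows SMB Direct implementation uses its own kernel interfaces (and the fast-register work request method mentioned above) rather than these user-space calls.

```c
#include <infiniband/verbs.h>
#include <stddef.h>

/* Register a buffer for exactly one I/O, use it, then unregister it.
 * Registration pins the pages and hands the adapter keys scoped to this
 * buffer only, which is what provides the security and data-integrity
 * properties described above, at the cost of extra per-operation work.
 */
static int do_one_io(struct ibv_pd *pd, void *io_buf, size_t io_len)
{
    struct ibv_mr *mr = ibv_reg_mr(pd, io_buf, io_len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr)
        return -1;

    /* ... post the RDMA operation(s) for this I/O using mr->lkey and
     *     mr->rkey, then wait for their completions (omitted) ...
     */

    /* Unregister as soon as the I/O completes so the peer can no longer
     * touch this memory.
     */
    return ibv_dereg_mr(mr);
}
```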

Q. But isn’t Live Migration done in 1MB “chunks”? So not “small” I/Os?

A. As answered later in the session, Hyper-V Live Migration is done in several phases: the first phase is the initial bulk copy of memory, done in large chunks, and immediately after it a second phase copies the individual pages that were dirtied by the live-running VM. These operations are typically 4KB. Note: The faster the initial phase goes, the less work there is in the second phase, but in both phases, the faster, the better, and RDMA accelerates both.

Q. Are iSER and iWARP alternatives to one another?

A.  iWARP is an RDMA protocol; iSER is a mapping of iSCSI onto RDMA that can run over iWARP as well as RoCE or InfiniBand.

Q. What’s Intel’s roadmap for RoCE and/or iWARP?

A.  Intel is committed to iWARP and plans to incorporate it in future server chipsets and SOCs. See http://www.intel.com/content/www/us/en/ethernet-products/accelerating-ethernet-iwarp-video.html for more information.

Q. Is there any transport other than IB being used to create a reliable transport for RoCEv2? In theory, is it possible?

A. RoCE was developed to leverage InfiniBand as much as possible. For that reason, the InfiniBand transport was chosen when the RoCE standard was developed. As the RoCEv2 standard was developed, the underlying InfiniBand network protocol was replaced with IPv4/IPv6 to provide Layer 3 routability, and UDP was added to provide stateless encapsulation (and indication) of the InfiniBand transport header that was retained. While it may be possible to develop a reliable transport to replace InfiniBand, the RoCE standards body has elected not to go that route as of this writing.
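
As a rough illustration of the encapsulation described above, the sketch below lays out the RoCEv2 header stack as a C struct. The header sizes and the UDP destination port (4791, the IANA-assigned RoCEv2 port) are standard, but the struct itself is purely illustrative and omits most field detail.

```c
#include <stdint.h>

/* Simplified, illustrative view of RoCEv2 encapsulation: the retained
 * InfiniBand transport header (BTH) rides inside a UDP datagram
 * (destination port 4791), which rides on ordinary IP and Ethernet,
 * giving RoCEv2 its Layer 3 routability. Field detail is omitted; see
 * the IBTA RoCEv2 annex for the authoritative wire format.
 */
#define ROCEV2_UDP_DPORT 4791u  /* IANA-assigned UDP port for RoCEv2 */

struct rocev2_frame_sketch {
    uint8_t eth_hdr[14]; /* Ethernet II header (dst/src MAC, EtherType)      */
    uint8_t ip_hdr[20];  /* IPv4 header (or a 40-byte IPv6 header)           */
    uint8_t udp_hdr[8];  /* UDP header; dst port 4791 identifies RoCEv2      */
    uint8_t ib_bth[12];  /* InfiniBand Base Transport Header, kept from RoCE */
    /* ... extended transport headers, payload, and ICRC follow ...          */
};
```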