Expanding Your Data Center with FCoE – Q&A

At our recent live ESF Webcast, “Expert Insights: Expanding the Data Center with FCoE,” we examined the current state of FCoE and looked at how this protocol can expand the agility of the data center. If you missed it, it’s now available on-demand. We did not have time to address all the questions during the live event, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.

Q. You mentioned using 40 and 100G for inter-switch links.   Are there use cases for end point (FCoE target and initiator) 40 and 100G connectivity?

A. Today most end points support only 10G, but we are starting to see 40G server offerings enter the market, and storage vendors are actively designing 40G products into their arrays.

Q. What about interoperability between FCoE switch vendors?

A. Each switch vendor has its own support matrix, and each would need to be examined independently.

Q. Is FCoE supported on copper cable?

A. Yes, FCoE supports “Twin Ax” copper, which is widely used for server to top of rack switch connections of up to seven meters. In fact, Converged Network Adapters are now available that support 10GBASE-T copper cables with the familiar RJ-45 jack. At least one major switch vendor has qualified FCoE running over 10GBASE-T to 30 meters.

Q. What distance does FCoE support?

A. Distance limits are dependent on the hardware in use and the buffering available for Priority Flow Control. The lengths can vary from 3m up to over 80km. Top of rack switches would fall into the 3m range while larger class switch/directors would support longer lengths.
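To give a feel for why buffering matters, here is a rough sketch of the data that can still be in flight on a link after a pause is requested. It is only a back-of-the-envelope model: the 5 microseconds per kilometer propagation figure and the function name `pfc_headroom_bytes` are assumptions for illustration, and real switches also have to budget for frame sizes and response times.

```python
# Rough PFC headroom estimate: bytes that can still arrive after a PAUSE is sent.
# Assumptions (not from the webcast): ~5 microseconds/km propagation in fiber,
# and no allowance for frame size or switch/NIC response time.

PROPAGATION_US_PER_KM = 5.0  # assumed one-way delay in optical fiber


def pfc_headroom_bytes(link_gbps: float, distance_km: float) -> float:
    """Bytes in flight over one round trip at the given link rate."""
    round_trip_s = 2 * distance_km * PROPAGATION_US_PER_KM * 1e-6
    bits_in_flight = link_gbps * 1e9 * round_trip_s
    return bits_in_flight / 8


for km in (0.003, 0.3, 10, 80):
    print(f"{km:>7} km at 10G: ~{pfc_headroom_bytes(10, km):,.0f} bytes")
```

Under these assumptions the required buffer grows linearly with both distance and link speed, which is consistent with longer reaches being found on larger switches.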

Q. Can FCoE take part in management/orchestration by OpenStack Neutron?

A. As of this writing there are no OpenStack extensions in Neutron for FCoE-specific plugins.

Q. So how is this FC-BB-6 different than FIP snooping?

A. FIP Snooping is a part of FC-BB-5 (Appendix D), which allows switch devices to identify an FCoE Frame format and create a forwarding ACL to a known FCF. FC-BB-6 creates additional architectural elements for deployments, including a “switch-less” environment (VN2VN), and a distributed switch architecture with a controlling FCF. Each of these cases is independent from the other, and you would choose one instead of the others. You can learn more about VN2VN from our SNIA-ESF Webcast, “How VN2VN Will Help Accelerate Adoption of FCoE.”

Q. You mentioned DCB at the beginning of the presentation. Are there other purposes for DCB? Seems like a lot of change in the network to create a DCB environment for just FCoE. What are some of the other technologies that can take advantage of DCB?

A. First, DCB is becoming ubiquitous. Unlike the early days of the standard, when only a few switches supported it, today most enterprise switches support the DCB protocols. As for other use cases, iSCSI benefits from DCB because a lossless fabric eliminates dropped packets, and with them the TCP/IP backoff that dropped packets trigger, which smooths out response time for iSCSI traffic. There is also a protocol known as RoCE, or RDMA over Converged Ethernet. RoCE requires the lossless fabric DCB creates to achieve consistent low latency and high bandwidth. It is basically the InfiniBand API running over Ethernet. Microsoft’s latest version of its file serving protocol, SMB Direct, and Hyper-V Live Migration can utilize RoCE. There is also an extension to iSCSI known as iSER, which replaces TCP/IP with RDMA for the iSCSI datamover, enabling all iSCSI reads and writes to be done as RDMA operations using RoCE.

Q. Great point about RoCE. iSCSI RDMA (iSER) requires DCB if the adapters support RoCE, right?

A. Agreed. Please see the answer above to the DCB question.

Q. Did that Boeing Aerospace diagram still have traditional FC links, and if yes, where?

A. There was no Fibre Channel storage attached in that environment. The green line in the legend was there simply to show that Fibre Channel would have its own color should there be any links.

Q. What is the price of a 10Gbps CNA compared to a 10Gbps NIC?

A. Price is dependent on vendor and economics. But, there are several approaches to delivering the value of FCoE which can influence pricing:

  • Purpose-built silicon that offloads the FC and Ethernet protocol functions offers a number of advantages, including high performance, low CPU overhead, and advanced features, though even this depends on the vendor’s implementation. These added features come with the expectation of additional cost. However, the processing of the protocols has to be done somewhere, and if you need your server CPUs to process applications instead of network protocols, then the value is justified.
  • With the introduction of Open FCoE drivers with DCB-supported NICs, new options are available for customers to deploy the value of FCoE at the host. Open FCoE moves the FC processing onto the host CPU, and standard 10GbE NICs with DCB support can be used to manage the Ethernet transport functions. Where you have excess CPU capacity on your server, you might be in a position to reduce costs by deploying a software driver with a 10GbE or faster NIC enhanced with the limited set of hardware offloads necessary to achieve full performance with Open FCoE. However, Open FCoE isn’t available with every OS or every NIC, so you need to consider OS support and availability.
  • A third consideration is that most enterprise servers include some form of advanced 10GbE networking on the motherboard that either supports purpose-built silicon or DCB-enabled silicon. So, depending upon which server and OS you deploy, you may have several options via embedded silicon.

What’s Happening with 25GbE

In July 2014, IEEE 802.3 voted to form a Study Group for 25Gb/s Ethernet. There has been a lot of attention in the networking press lately about 25Gb/s Ethernet, but many people are asking what it is and how we got here. After all, 802.3 has already completed standards for 40Gb/s and 100Gb/s and is currently working on 400Gb/s, so from a pure speed perspective, starting a 25Gb/s project now does look like a step backwards.

(Warning: the following discussion contains excessive physical layer jargon.)

The Sweet Spot

25GbE as a port speed is attractive because it makes use of 25Gb/s per lane signaling technology that has been in development for years in the industry, culminating in the recent completion of 802.3bj, the standard for 100GbE over backplane or twinax copper that utilizes four parallel lanes of 25Gb/s signaling to achieve the 100Gb/s port speed.  Products implementing 25Gb/s signaling in CMOS technology are just starting to come to market, and the rate will likely be a sweet spot for many years, as higher rate signaling of 40Gb/s or 50Gb/s is still in early technology development phases.  The ability to implement this high speed I/O in CMOS is important because it allows combining high-speed I/O with many millions of logic gates needed to implement Ethernet switches, controllers, FPGAs, and microprocessors.  Thus specifying a MAC rate of 25Gb/s to utilize 25Gb/s serdes technology can enable product developers to optimize for both the lowest cost/bit and the highest overall bandwidth utilization of the switching fabric.

4-Lane to 1-Lane Evolution

To see how we got here and why 25Gb/s is interesting, it is useful to back up a couple of generations and look at 10Gb/s and 40Gb/s Ethernet. The earliest implementations of 10GbE relied on rather wide parallel electrical interfaces: XGMII and the 16-bit interface. Very soon after, however, 4-lane serdes-based interfaces became the norm, starting with XAUI (for chip-to-chip and chip-to-optical module use), which was then adapted to longer reaches on twinax and backplane (10GBASE-CX4 and 10GBASE-KX4). Before 10GbE reached higher volumes (~2009), 10Gb/s on a single electrical serial lane was specified and shown to be technically feasible. XFI was the first, followed by 10GBASE-KR (backplane) and SFI (as an optical module interface and for direct attach twinax cable using the SFP+ pluggable form factor). KR and SFI started to ramp around 2009 and still account for the highest volume share of 10GbE ports in datacenter applications. The takeaway, in my opinion, is that single-lane interfaces helped the 10GbE volume ramp by reducing interconnect cost.

Now look forward to 40GbE and 100GbE. The initial standard, 802.3ba, was completed in 2010. So during the time that this specification was being developed, 10Gb/s serial interfaces were gaining traction, and consensus formed around the use of multiple 10Gb/s lanes in parallel to make the 40GbE and 100GbE electrical interfaces. For example, there is a great similarity between 10GBASE-KR and one lane of the 40GBASE-KR4 four-lane interface. In a similar fashion, 10Gb/s SFI for twinax and optics in the SFP+ form factor is similar to a lane of the 40GbE equivalent interfaces for twinax and optics in the QSFP+ form factor.

But how does this get to 25Gb/s?

Due to the similarity in technology needed to make 10GbE and 40GbE, it has become a common feature in Ethernet switch and NIC chips to implement a four-lane port for 40GbE that can be configured to use each lane separately, yielding four 10GbE ports.

From there it is a natural extension that 100GbE ports being implemented using 802.3bj technology (4x25Gb/s) also can be configured to support four independent ports operating at 25Gb/s.   This is such a natural conclusion that multiple companies are implementing 25GbE even though it is not a standard.
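The lane arithmetic behind this is simple enough to sketch in a few lines. The `QuadLanePort` class below is purely illustrative (it is not any vendor's API); it just models a four-lane port that can run either as one aggregated port or as four independent ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadLanePort:
    """A four-lane port described only by its per-lane signaling rate (Gb/s)."""
    lane_gbps: int
    lanes: int = 4

    def aggregated(self) -> list[int]:
        """All lanes bonded into a single port."""
        return [self.lane_gbps * self.lanes]

    def breakout(self) -> list[int]:
        """Each lane run as an independent port."""
        return [self.lane_gbps] * self.lanes


# 10Gb/s lanes give 40GbE or 4x10GbE; 25Gb/s lanes give 100GbE or 4x25GbE.
for port in (QuadLanePort(10), QuadLanePort(25)):
    print(port.aggregated(), port.breakout())
```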

In some environments, the existence of a standard is not a priority. For example, when a large-scale datacenter of compute, storage and networking is architected, owned and operated by one entity, that entity validates the necessary configuration to meet its requirements. For the broader market, however, there is typically a requirement for multi-vendor interoperability across a diverse set of configurations and uses. This is where Ethernet and IEEE 802.3 have provided value to the industry for over 30 years.

Where’s the Application?

Given the nature of their environment, it is the Cloud datacenter operators that are poised to be the early adopters of 25GbE. Will it also find a home in more traditional enterprise and storage markets? Time will tell, but in many environments ease of use, long shelf life, and multi-vendor interoperability are the priorities. For any environment, having the 25GbE specification maintained by IEEE 802.3 will facilitate those needs.

Upcoming Webcast: Is FCoE the Answer to Data Center Agility?

Fibre Channel over Ethernet (FCoE) has been growing in popularity year after year. From access layer, to multi-hop and beyond, FCoE has established itself as a true solution in the data center.

Interested in learning how the Data Center is expanding with FCoE? Join us on August 20th, at 4:00 pm ET, 1:00 pm PT for our live Webcast, “Expanding the Data Center with FCoE.” It continues our conversation from our February Webcast, “Use Cases for iSCSI and FCoE,” which is now available on demand. This live SNIA Webcast examines the current state of FCoE and looks at how this protocol can expand the agility of the data center.

We’ll take an unbiased look at the data center using FCoE, covering:

  • The history and evolution of convergence
  • Using FCoE as a storage overlay
  • Single-hop, multi-hop and beyond
  • 40G/100G – Where does it fit
  • Futures:
    • OpenStack
    • Defining Network Functions Virtualization (NFV)
    • Mapping NFV to FCoE
  • Real-world Use Cases

This will be a vendor-neutral live presentation. Please join us on August 20th and bring your questions for our expert panel. Register now.

Object Storage 101 – Questions and Answers

At our recent live ESF Webcast, “Object Storage 101,” we talked about the what, how, and why behind object storage technologies. Over 200 people attended the event. If you missed it, it’s now available on-demand. It was an interactive session and we did not have time to address all the questions, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.

Q. Would Object Storage be a feasible solution for only the nearline storage tier?

Typically, yes. If we think about the latency needed for real-time transactions, these are best served using a cache storage tier such as NAND or large arrays of RAM. Object stores are excellent methods to store and retrieve large data sets within single or multiple containers. Note: most systems support offset reads, so you don’t need to access an entire object to get to the section of interest.

Q. Where is the index to find the location of an object that is stored? Is it stored locally, stored in a distributed fashion, or replicated across each cluster?

The Index or Metadata describing stored objects, if used, is typically replicated throughout the system. Also, if the Metadata is lost, it can typically be rebuilt as a maintenance function.

Q. How is the object stored/broken up? Aside from being stored by metadata (like name, size, etc) … what is the process of the fragmentation…breaking it up …as described during this erasure coding segment?   Once it’s assigned some unique identifier … ie. an x-ray picture…. how is it addressed? (if not by block/bit/byte/level)?

Currently, Objects are stored using one of two methods of data protection: Replication or Erasure Coding. Some systems use both. That said, there are several algorithms used today to Erasure Code protect Objects. When using Reed-Solomon methods, you need to specify the number of “Data” fragments and the number of “Parity” fragments that will be created. The size of each “Data” fragment is approximately the Object size divided by the number of “Data” fragments requested. Each “Parity” fragment will be the same size as each of the “Data” fragments. The protected Object size is the sum of the “Data” fragments plus the “Parity” fragments. Each of these fragments (Data and Parity) is stored on a different server in order to avoid a single point of failure. The application that creates and accesses the Object is responsible for keeping track of the ID of the Object and the Namespace the ID was stored in. Typically the Application will create the ID; however, when an Application “Puts” an Object using an existing ID, the older stored Object with that same ID is overwritten. Typically, access into an Object Store is through a RESTful interface, using commands like “Put, Get, Delete, List” over HTTP.
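As a rough illustration of the sizing rules above, the sketch below computes fragment and protected sizes for a data-plus-parity policy. The function name `ec_layout` and the 1 GB example object are made up for this post, and the math ignores per-fragment headers and padding, which vary from system to system.

```python
import math


def ec_layout(object_bytes: int, data_fragments: int, parity_fragments: int) -> dict:
    """Approximate fragment and stored sizes for a data+parity policy.

    Ignores per-fragment headers and padding rules, which vary by system.
    """
    fragment_bytes = math.ceil(object_bytes / data_fragments)
    total_fragments = data_fragments + parity_fragments
    return {
        "fragment_bytes": fragment_bytes,
        "total_fragments": total_fragments,
        "stored_bytes": fragment_bytes * total_fragments,
        "overhead": total_fragments / data_fragments,
    }


# Hypothetical 1 GB object protected with 8 data and 4 parity fragments
layout = ec_layout(1_000_000_000, data_fragments=8, parity_fragments=4)
print(layout)
```

The `overhead` value is the ratio of stored bytes to original bytes, which is a convenient number to compare against the cost of keeping full replicas.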

Q. Will Object storage drive network scale—further adoption of 10GE and 40GE or is 1GE enough?

Yes. If we think about the interconnection between the Control Plane and Data Plane of these systems (Orchestration and Object Storage Devices), the better the connectivity, the higher the performance.

Q. Is the number of fragments set or configurable?   What are the trade-offs of requiring fewer fragments for recovery besides perhaps processing overhead?   Are there any gotchas to watch out for/consider?

Yes. Storage policies are configurable. The number of “Parity” fragments defines the data loss risk. The more “Parity” fragments requested, the lower this risk, but this increases the storage resources needed for the Object. Eliminating single points of failure is a key consideration. For example, if your Object Storage system has 10 servers, a storage policy using 9 of 12 will place 2 fragments of the Object on each of 2 servers. In this case any single server failure would not cause data loss but may cause higher latency. However, if enough servers fail that fewer than 9 fragments remain reachable (for example, 3 servers that include at least one holding 2 fragments), you would lose access to your data until the servers were recovered. If the drives of the failed servers were not recovered, then data loss would occur.
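The 10-server example can be checked with a short sketch. It assumes a simple round-robin placement (real systems choose placement in their own ways), and the helper names are hypothetical. For each number of failed servers it reports the worst case, meaning the failure combination that removes the most fragments.

```python
from itertools import combinations


def place_round_robin(total_fragments: int, servers: int) -> list[int]:
    """Return the number of fragments held by each server (assumed round-robin)."""
    counts = [0] * servers
    for fragment in range(total_fragments):
        counts[fragment % servers] += 1
    return counts


def worst_case_survivors(counts: list[int], failed_servers: int) -> int:
    """Fewest fragments still reachable over every choice of failed servers."""
    total = sum(counts)
    return min(
        total - sum(counts[i] for i in failed)
        for failed in combinations(range(len(counts)), failed_servers)
    )


NEEDED, TOTAL, SERVERS = 9, 12, 10  # the "9 of 12" policy on 10 servers
counts = place_round_robin(TOTAL, SERVERS)
for failures in (1, 2, 3):
    left = worst_case_survivors(counts, failures)
    print(failures, "failed ->", left, "fragments left, readable:", left >= NEEDED)
```

Whether a given multi-server failure actually blocks access depends on which servers are involved, since the servers holding two fragments cost more when they go down.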

Q. Is erasure encoding used instead of Hash tagging?

No. Hash Tagging is a method of generating a unique number given a specific input of data; this number is used to find the location of the Object to be stored. Erasure Coding is the method used to create the fragments. So think of the Hash tag as the seed to the address needed to find the fragments.
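To make the "seed to the address" idea a little more concrete, here is a minimal sketch that hashes an object ID and uses the result to pick a starting server for its fragments. The SHA-256 hash, the modulo scheme, the server names and the `locate` function are all assumptions chosen for illustration; production object stores generally use more elaborate placement schemes.

```python
import hashlib


def locate(object_id: str, servers: list[str], fragments: int) -> list[str]:
    """Map an object ID to one server per fragment (simple modulo scheme)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as the seed for the starting server.
    start = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(fragments)]


servers = [f"server-{n}" for n in range(12)]
print(locate("xray-0001", servers, fragments=12))
```

Because the hash is deterministic, any node that knows the ID and the server list can recompute where the fragments live without consulting a central index.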

Q. How large are the fragments?

A rough estimate is the Object size divided by the number of fragments needed to re-hydrate the object (e.g. a 1GByte Object stored using an 8 of 12 policy would have a fragment size of 1GByte/8 = ~125MByte).

Q. What do you see as the requirement for the interconnect between the Object storage arrays/boxes to be? Very large pipes as in multiple 40G links or something lower?

It depends on the use case or Service Level Objective for the system. If your system design uses a Proxy service and Erasure Coding, then your back end network throughput (the network connecting the Proxy and the Object Storage Devices, i.e. the Storage Servers) will aggregate (multiply). In this case the network throughput is based on the number of “Data” fragments being used. If you use Replication, then the back end network throughput will not aggregate. This multiplication factor, if present, is key to an efficient network strategy. In Non-Proxy based Object Storage designs or replication based Object Storage systems, the network strategy will scale with network bandwidth up to the limit of the HDDs’ ability to serve data.

Q. What about access control and security at the object level?   Is that typically part of the model?

Typically, access control methods are at the gateway or entry point of a Namespace. The access method used is up to the vendor of the Object Store.

Q. What is the presentation mode at the host level? i.e. a drive mapping or similar

Typically, presentation methods are a RESTful API via HTTP. These use “PUT, GET, DELETE, LIST” semantics.
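For readers who think in code, the four verbs can be mimicked with a toy in-memory class. This is only a sketch of the semantics, not a real HTTP client or any vendor's API; the class name `ToyObjectStore` and the object IDs are invented for illustration.

```python
from typing import Dict, List


class ToyObjectStore:
    """In-memory stand-in for one namespace; IDs map to opaque bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        # PUT with an existing ID replaces the previously stored object.
        self._objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]

    def list_ids(self) -> List[str]:
        return sorted(self._objects)


store = ToyObjectStore()
store.put("xray-0001", b"first upload")
store.put("xray-0001", b"second upload")  # same ID, so the first is overwritten
print(store.list_ids(), store.get("xray-0001"))
```

Note that there is no directory tree or drive letter here: the application holds on to the ID and asks for the object by that ID.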

Q. Can you explain the differences/similarities between object storage, CDMI and software defined storage?

Object Storage defines a system (Software + Hardware) used to store Objects. CDMI defines a method used to access/connect your application to an Object Storage system. Software Defined Storage describes using standard high volume servers with software for the purpose of storing data.

Q. Why can’t a traditional approach be used to Object Storage for its durability?

Traditional storage approaches such as direct attached storage (RAID Sets) do not scale. Once you run out of space, managing additional storage on separate systems becomes the issue.

Q. Aren’t all types of data going to need the accessibility required by users? For example, isn’t everything going to need to be placed in an object store?

There is a lot of debate on this issue. The goal of an Object Store is two fold. 1) Drive down the cost/Byte and 2) keep content readily accessible.

Q. How do we avoid losing the Metadata from the data? Also, is there something like sub-metadata, where a small amount of Metadata is contained within the data and the larger Metadata is stored somewhere else?

Some Object storage systems support Extended File Attributes, which is a file system feature that allows the Applications to store “Metadata” about an Object which is then bound to the Object within the storage environment. These Extended File Attributes (XATTR’s) can be queried separately and can be used by your application as you see fit. The management of the XATTR’s is handled by the local file system and accessed by the Object Storage software via the RESTful API using HTTP.

Q. Is maintaining multiple copies mainly for durability or can it be used for performance enhancement (parallel access), or is that irrelevant?

Absolutely!   Management of copies/replicas can serve multiple purposes.  Replication across racks, datacenters, geographies, etc. can provide resiliency against failures at those levels.   Replication can also be used to provide object access in close proximity to the requester.   In the X-ray example discussed in the Webcast, we might set up a replica local to the medical practice for the first 90 days, in order to provide a low latency (time to first byte) copy during the initial treatment.   Additional copies can be kept at remote sites in order to provide fault tolerance.

Q. Is there a standard methodology for migrating from a file-system based methodology to an object store?

The short answer is no.   In general an application that is currently developed to use file or block based storage will need to be re-architected in order to take advantage of an object storage system/service.   There is, however, a growing category of products referred to as “cloud gateways” that can provide a bridge to object storage by presenting a filesystem to the existing application, while writing and reading via a RESTful API to a backend object storage system/service.

Q. Is it safe to say that in order to use object storage the application needs to be “object storage aware”? Unlike a traditional storage where the application doesn’t necessarily need to be familiar with the storage or file system since that is handled at a lower layer.

Yes, however as indicated in the question regarding migration of applications above, it is possible to implement a “cloud gateway” solution that will provide the translation from RESTful API to a CIFS/NFS fileshare, thus not requiring any application changes.   I would disagree with the premise that traditional applications don’t need to be familiar with the underlying storage.   Traditional file-based applications must understand the location (fileserver, folder, filename, etc.) in order to gain access to the appropriate data.

Q. I’m hearing a lot of ‘what’ and ‘how’ but not so much ‘why’ about object storage.  Can we hear some real-world examples of applications in industry today that are running better because of object storage?

An example of an application running today with object storage behind it, and why: Web Based Media Asset Management/Distribution. This particular use case tends to deal with billions of files/objects that can vary in size from very small thumbnail images to massive 4k HD movie files. The ability to deliver these to multiple platforms (phone, laptop, set top box, etc.) across multiple geographies is something that is well suited for object storage. Traditional file and/or block based storage environments may hit scale limitations in dealing with the number of files/objects. In addition, maintaining a single namespace across multiple locations/datacenters is exceedingly complex for storage environments other than object stores.

Q. Replicating an object two or three times would exponentially increase storage costs, wouldn’t it?   The more copies the higher the costs?

Certainly more copies would use more storage, and as a result most object stores provide different durability schemes based upon the performance/availability tradeoffs the data owner is willing to make. Recovering a single object from a replica is significantly faster than rebuilding an object from geo-distributed EC fragments. Also, as discussed in the question above related to replicas to drive performance, replication can serve the purpose of placing objects as close to the consumer as possible, minimizing time to first byte and increasing the overall throughput of an application.

Q. If I have an app that accesses a CIFS share, is there a way to translate it into object store?

Please see answer to question: “Is there a standard methodology for migrating from a file-system based methodology to an object store?” Short answer: Yes, via a “cloud gateway” product.

Q. Is there a confluence point of Object and File based storage – specifically in NAS where object storage can be multi-protocol (NFS, and REST)?

While there are some object storage solutions that provide their own native cloud-gateway capability (NAS protocol to the application, RESTful API to the object store), there are very few that provide a “file/object duality” capability allowing applications to manipulate an object as both an object and a file.

Ethernet Meets Enterprise Storage – Finally

Presumptuous, yes, because Ethernet has been a mainstay in enterprises since its early days over 40 years ago.   It initially grew to prominence as the local area network (LAN) connection in the enterprise. More recent advances have enabled Ethernet to become a standard for mission critical storage connectivity for block, file and object storage in many enterprises.

Block storage in large enterprises has long been focused on Fibre Channel due to its performance capabilities. In order to bring the same performance benefits to Ethernet, the IEEE 802.1 Data Center Bridging Task Group proposed a number of new standards to enhance Ethernet reliability. For example, 802.1Qbb Priority-based Flow Control (PFC) provides a link level flow control mechanism to ensure lossless transmission under congestion, 802.1Qaz Enhanced Transmission Selection (ETS) provides a management framework for prioritized bandwidth, and the Data Center Bridging Exchange Protocol (DCBX) enables these features to be used consistently between neighbors on the network. Collectively, these and other enhancements have brought those enterprise-class storage networking features to the Ethernet platform.

In addition, the InterNational Committee for Information Technology Standards (INCITS) T11 Fibre Channel committee developed a specification for Fibre Channel over Ethernet (FCoE) in its FC-BB-5 standard in 2009, which allows the Fibre Channel protocol to run directly on top of Ethernet, eliminating the TCP/IP stack and allowing for efficient performance of the Fibre Channel protocol. FCoE also depends on the Data Center Bridging standards from IEEE 802.1 in order to ensure the “losslessness” and flow control needed by Fibre Channel.

An alternative to FCoE, iSCSI, was designed to run over standard Ethernet with TCP/IP and was designed to tolerate the “lossy” aspects of Ethernet. Its architecture and the additional layers of encapsulation involved can impact latency and performance. However, more recent innovations in iSCSI have enabled it to run over a DCB Ethernet network, which enables iSCSI to inherit some of the enterprise storage features which have always been inherent in Fibre Channel. For more on this, read last year’s blog “How DCB Makes iSCSI Better” from Allen Ordoubadian.

In 2013, INCITS submitted the FC-BB-6 standard for review which introduced, among other things, the VN2VN standard.   The VN2VN proposal will allow FCoE to work in a standard DCB switching environment without the presence of a Fibre Channel Forwarder (FCF).   An FCF allows for bridging between servers which are communicating with FCoE and storage devices which are communicating with traditional Fibre Channel.   As DCB switches and FCoE storage become more prevalent, the FC-BB-6 standard will allow for end-to-end FCoE connectivity in either a point to point (P2P) or DCB mesh environment. This will result in lower cost for FCoE environments. Products are beginning to appear which support VN2VN and over the next 18 months it is likely that all major vendors will support it. Check out our ESF Webcast “How VN2VN Will Help Accelerate Adoption of FCoE” for more details.

The availability of CNAs with processing capability allows for offloading storage protocol processing from the host processor, though some CNAs use host-based storage protocol initiators in system software and do selective stateless offloads in the data path.   Both FCoE and iSCSI require the storage protocol to be encapsulated in a frame to be sent across the Ethernet network.   In an enterprise environment, especially a virtual server environment, CPU utilization is tracked closely and target CPU thresholds are often set.   Anything which can minimize spikes in CPU utilization can allow for more workloads to be placed on servers and allows for predictable energy consumption.

For file storage, Ethernet has traditionally been the connectivity option of choice for file servers used as “shares” for centralized employee document storage. In the 21st century, usage of network attached storage (NAS) with the Network File System (NFS) has increased for enterprise databases and Hadoop clusters, especially with the availability of 10Gb Ethernet.   New features in NFS 4 and later introduced security and stateful protocol support after development of NFS was taken over by the Internet Engineering Task Force (IETF).

Object storage has been around for nearly 20 years as a repository for storing data as objects, which include not only the original file but also a globally unique identifier and metadata that describes the object and various parameters about it. It has been used to store many forms of unstructured data, but found niches in certain areas, such as legal documents with retention policies and archiving photos and videos. More recently, there seems to be a resurgence in object storage as the amount of unstructured data generated by enterprises continues to skyrocket. Open source object storage in Ceph and OpenStack is also helping to drive the adoption. SNIA ESF is hosting a live Webcast on object storage on June 11, 2014, called “Object Storage 101.” I encourage you to register for this presentation for an unbiased look at the what, how and why of object storage technologies.

When combined with the advances in link speed, throughput capabilities, latency and input/output operations per second (IOPS) in modern 10Gb/s and 40Gb/s Ethernet, these existing and emerging Ethernet standards and storage architectures are having a profound effect on the viability of Ethernet as an enterprise class storage networking platform. Vendors and customers are seeing the advantage in one wire, the Ethernet cable, carrying all LAN, WAN and storage traffic.

New ESF Live Webcast – Object Storage 101

Understanding the what, how and why behind object storage technologies.

Object storage systems are gaining quite a bit of attention as workloads continue to push scalability and availability limits of massive unstructured data repositories. For some emerging workloads, object counts are measured in the hundreds of billions and capacities start in petabytes!

Need a tutorial on object storage? Join us on June 11th at 2:00 p.m. ET, 11:00 a.m. PT for our live Webcast, “Object Storage 101” as we take an unbiased look at the what, how and why behind object storage technologies. In this object storage primer, we’ll cover:

  • What is object storage
  • Where is it being deployed successfully
  • Key attributes of today’s object storage solutions
  • How object storage differs from traditional file or block technologies
  • Common enterprise use-cases and deployment approaches
  • Key considerations before deploying an object store

This will be a vendor-neutral live and lively discussion. Register now and please bring your questions for our expert panel.

The IETF, Consensus and NFSv4

The Internet Engineering Task Force is one of the older – and more unusual – internet organizations. It first met in 1986, and has regularly met since then several times a year. The last meeting was the March 2-7, 2014 IETF89 in London,  and I was fortunate to be in attendance.

What Makes the IETF Unique

What’s unusual about the IETF? From my perspective as someone who spends most of his working day dealing with more traditional standards bodies, two things stand out.

One, (in its own words) “it exists as a collection of happenings, but is not a corporation and has no board of directors, no members, and no dues.” The non-members divide themselves into loosely organized groups that agree on an agenda, discuss the stuff of the internet on mailing lists, generate documents that reflect consensus, and then agree to them as standards.

Two, the London IETF89 meeting was not a conference. The IETF doesn’t do conferences; there are no formal papers given by luminaries or industry experts. There is an agenda, agreed beforehand by consensus (there’s that word again), and then a few brief presentations on topics of interest. There are questions from the floor, discussions, and agreement of one form or another. I didn’t see a single formal vote; just that ill-defined and unquantifiable consensus where the outcome is just, well, agreed on.

Why the IETF Works

Revolution! Anarchy! This is unusual for a standards body, and it sounds like a recipe for disaster. But strangely, it isn’t, and from what I saw of the process, I think I see why.

It’s because it’s attended by software and network engineers who see code as the concrete representation of a good idea. They value running code, or stuff that works. That’s a powerful advantage over academic discussions, or codifying and formalizing a good (sometimes not-so-good) idea that no-one has yet implemented or is ever likely to.

Why face to face though? I reckon that even revolutionaries and anarchists need validation and a sense of community, and there was much of that in evidence in the corridors and public spaces outside of the formal meeting. Everyone talks like there’s no tomorrow. Ideas everywhere, grounded in what can be shown to actually work.

I attended, amongst others, the NFSv4 workgroup meetings. The agenda and notes from the meeting give some flavor of this consensus, and I am truly impressed by the process. I’m also thankful that there is some organization; Sorin Faibish (EMC) took notes, Tom Haynes (NetApp) chaired the meeting and kept it moving along, and all in all it was a great illustration of the best the industry can do.

As to the technical content… well, you can read the minutes. There are notes on security discussions led by Andy Adamson, on features proposed for NFSv4.2, and getting an RFC in place that accurately reflects implementations of earlier versions of NFSv4 and more. I’ll be blogging about this and more over the next few months. In the meanwhile, in the spirit of the IETF that favors working code over ideas and the concrete over the abstract, I’ll be presenting “Practical Steps to Implementing pNFS and NFSv4.1” at DSIcon on April 22-24 in Santa Clara, CA. OK, this one’s a conference, and anarchy will be in short supply, but we can still have great discussions and arguments in the corridors and public spaces outside of the formal meetings. I look forward to seeing you there!

Relentless Advance Of Ethernet – And Ethernet Storage Networking

As one Cisco colleague once said to me, “After the nuclear holocaust, there will be two things left: cockroaches and Ethernet.”   Not sure I like Ethernet’s unappealing company in that statement, but the truth it captures is that Ethernet, now entering its fifth decade (wow!), is ubiquitous and still continuing to advance at a breathtaking pace.   And as it advances, it advances the capabilities of storage networking based on the Ethernet backbone, be it file storage like NFS or SMB or block storage like iSCSI or FCoE.

The most recent evidence of Ethernet’s continuing and relentless evolution is the 28 March 2014 announcement from the Ethernet Alliance congratulating the IEEE on the formation of its IEEE P802.3bs™ Task Force:

The new group is chartered with the development of the IEEE P802.3bs 400 Gigabit Ethernet (GbE) project, which will define Ethernet Media Access Control (MAC) parameters, physical layer specifications, and management parameters for the transfer of Ethernet format frames at 400 Gb/s. As the leading voice of the Ethernet ecosystem, the Ethernet Alliance is ideally positioned to support this latest move towards standardizing and advancing 400Gb/s technologies through efforts such as the launch of the Ethernet Alliance’s own 400 GbE Subcommittee.

Ethernet is in production today from multiple vendors at 40GbE and supports all storage protocols, including FCoE, at those speeds. Market forecasters expect the first 100GbE adapters to appear in 2015. Obviously, it is too early to forecast when 400GbE will arrive, but the train is assuredly in motion. And support for all the key storage protocols we see today on 10GbE and 40GbE will naturally extend to 100GbE and 400GbE. Jim O’Reilly makes similar points in his recent InformationWeek article, “Ethernet: The New Storage Area Network,” where he argues, “Ethernet wins on schedule, cost, and performance.”

Beyond raw transport speed, the rich Ethernet infrastructure offers techniques to catapult your performance even beyond the fastest single-pipe speed.   The Ethernet world has established techniques for what is alternately referred to as link aggregation, channel bonding, or teaming.   The levels available are determined by the capabilities provided in system software and what switch vendors will support.   And those capabilities, in turn, are determined by what they respectively see as market demand.   VMware, for example, today will let you bond eight 10GbE channels into a single 80GbE pipe.   And that’s today with mainstream 10GbE technology.

Ethernet will continue to evolve in many different ways to support the needs of the industry.   Serving as a backbone for all storage networking traffic is just one of many such roles for Ethernet.   In fact, precisely because of the increasing breadth of usage models Ethernet supports, it will also continue to offer cost advantages.   The argument here is a very simple volume argument:

[Figure: Total Server-class Adapter and LOM Market Ports]

Enough said, except to also note that volume is what funds speed roadmaps.

 

 

Use Cases for iSCSI and FCoE – Your Questions Answered

We had a tremendous response to our recent Webcast “Use Cases for iSCSI and FCoE – Where Each Makes Sense.” We had a lot of questions that we didn’t have time to address, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.

Q. You stated that FCoE requires End to End DCB connectivity.   That is not entirely true if you have native Fibre Channel storage.  

Once native FC is added, it is a hybrid FCoE/native FC network, not a simple FCoE network.   To be clearer I could’ve stated that for FCoE all Ethernet links traversed must be DCB enabled.

Q. Any impact on the protocol choice if you bring SDN solutions with overlay networks using VXLAN or NVGRE within virtual switching in hypervisors into the picture?

An excellent question, but complicated enough that it probably deserves a discussion on its own. Overlay networks encapsulate Ethernet frames into routable packets. Under strict adherence to OSI layering, that means L2 constructs like Data Center Bridging become “invisible” until decapsulation. You lose the “lossless,” low-latency behavior that FCoE expects and that iSCSI may be taking advantage of, depending on your implementation. That doesn’t really favor one protocol over the other, but FCoE may lose advantages it has over iSCSI when confined to a single L2 subnet. Unfortunately, the real answer to your question requires that you investigate in detail how the system software you are using handles encapsulated storage packets for both block storage protocols. Microsoft’s Hyper-V is different from VMware’s vSphere, and each flavor of SDN could be different as well. Proceed with caution.

Q. Have you heard of any enterprise customers who are interested in NIC Partitioning to separate iSCSI, FCoE, and typical network traffic?   If so, can you provide information about those customers’ use cases?

We have not come across many customers that are interested in large-scale deployments yet.

Q. What are the use cases for using standalone FCoE switches in SAN keeping aside Cisco UCS and Blade Servers?

There are two ways to look at this:

1) To use FCoE as an end-to-end (initiating server to target storage array) solution instead of, or to replace, Fibre Channel. Although this option is not very prevalent to date, it is chosen to create a single converged LAN/SAN network that essentially retains the native FC constructs. The potential benefit is a reduction in the amount of equipment required and in the resources needed to deploy and administer two separate networks. This can be done in a phased approach that uses multiprotocol switches whose ports can each be used as Ethernet, FC, or both. This provides future proofing, reduced qualification costs, and lower OPEX by no longer requiring the purchase of multiple switches of different protocols.

2) To continue the use of FC for connectivity from the Top of Rack switch to the storage arrays, but use FCoE connectivity for server access. This is much more prevalent, and even when deployed outside of the Cisco UCS blade servers, is used to increase flexibility in highly virtualized server environments or multi-tenancy, where workloads/VMs from the same physical servers need to connect to different storage types.

Q. How do iSCSI and FCoE switches handle redundancy?   With FC, it is a best practice to implement dual fabrics with each storage system and server with paths down each.

Physical topology can be identical.   A storage system has one set of targets (either IP addresses or FCoE targets) on one switch and other targets on the other switch.   The initiators are configured to see any targets available on that leg.

To prevent Ethernet broadcast storms, technologies like per VLAN Spanning Tree and link aggregation are used.   TRILL can also be used.   For more details, I recommend reading this blog post by J Metz of Cisco.   http://blogs.cisco.com/datacenter/understanding-fcoe-and-trill-the-easy-way/
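As a rough illustration of the dual-leg layout described in this answer, the Python sketch below models two fabrics, each with its own targets and initiator ports, and checks that every host still sees a target if either leg fails. All host, port, and target names are hypothetical, and the model deliberately ignores zoning, VLANs, and multipathing software.

```python
# Hypothetical dual-fabric layout: each leg has its own switch, its own
# storage targets, and the initiator ports cabled to that switch.
FABRICS = {
    "A": {"targets": {"array1-ctrl-a"}, "initiators": {"host1-p0", "host2-p0"}},
    "B": {"targets": {"array1-ctrl-b"}, "initiators": {"host1-p1", "host2-p1"}},
}


def host_of(initiator_port):
    """Return the host name from a port label such as 'host1-p0'."""
    return initiator_port.rsplit("-", 1)[0]


def surviving_hosts(failed_fabric):
    """Hosts that still see at least one target when one leg is down."""
    hosts = set()
    for name, leg in FABRICS.items():
        if name == failed_fabric or not leg["targets"]:
            continue
        hosts.update(host_of(port) for port in leg["initiators"])
    return hosts


def is_redundant():
    """True if every host keeps a path whichever single leg fails."""
    all_hosts = {
        host_of(port) for leg in FABRICS.values() for port in leg["initiators"]
    }
    return all(surviving_hosts(name) == all_hosts for name in FABRICS)


print(is_redundant())
```

Removing one host's port from either leg in `FABRICS` makes `is_redundant()` return `False`, which is the kind of gap a dual-fabric design is meant to rule out.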

Q. Doesn’t increasing CPU mean software processing for FCoE and iSCSI at both endpoints can reduce costs considerably (i.e. no full HBA functionality needed at the endpoints)?

Absolutely.   If you have CPU cycles to spare at both endpoints, there is no reason to take on the extra cost of offload.   However, remember the principle behind Moore’s law also works on things like network adapters and HBAs.   It isn’t unreasonable to think that full offload capabilities will be included by default in a few years as technology progresses.   And even if they aren’t, the actual application of Moore’s law will push the difference in CPU utilization to be trivial.

Q. How do large data centers configure and manage iSCSI?   Is it by configuring the initiators and targets? My understanding is that most installations don’t use iSNS.   Is this true?

It is true that most implementations of iSCSI don’t use iSNS. iSCSI initiators are simply configured with the target address by the administrator. In the FC world, SNS is simply there, but the iSCSI equivalent, iSNS, has always been optional. (SNS stands for Simple Name Server. It is a service that helps initiators find targets.)
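To make the static approach concrete, here is a minimal Python sketch of what "configured with the target address by the administrator" amounts to: a lookup table with no name service involved. The host names, IQNs, and addresses are hypothetical placeholders (the addresses come from the documentation range 192.0.2.0/24); the only real-world constant is 3260, the well-known iSCSI TCP port.

```python
from dataclasses import dataclass

ISCSI_DEFAULT_PORT = 3260  # well-known TCP port for iSCSI


@dataclass(frozen=True)
class Portal:
    """One target portal as an administrator would enter it by hand."""
    address: str
    iqn: str
    port: int = ISCSI_DEFAULT_PORT


# Hypothetical static configuration: initiator host -> target portals.
STATIC_TARGETS = {
    "db-server-1": [
        Portal("192.0.2.10", "iqn.2014-01.com.example:array1"),
        Portal("192.0.2.11", "iqn.2014-01.com.example:array1"),
    ],
    "web-server-1": [
        Portal("192.0.2.10", "iqn.2014-01.com.example:array2"),
    ],
}


def portals_for(initiator):
    """Return the statically configured portals for one initiator."""
    return [
        f"{p.address}:{p.port} {p.iqn}"
        for p in STATIC_TARGETS.get(initiator, [])
    ]


for line in portals_for("db-server-1"):
    print(line)
```

With iSNS, the table would instead be populated by querying a name service, which is the step most installations skip.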

Q. I have been doing a lot of testing to compare iSCSI to FC and noticed that as we move from traditional storage to SSD-based storage the IOPS increase faster for FCoE. For example, 18K+ for FCoE vs. 12K for iSCSI. Have you seen similar results?

I have seen some similar results. However, I’ve also seen some that don’t necessarily line up with that.   I haven’t had the time to research this topic.   Sounds like a good topic for a future post.

Q. Do you have any information about the number of customers who use FCoE Boot and iSCSI Boot?

Unfortunately I don’t. I do have anecdotal evidence that customers using full-offload adapters are more likely to boot from SAN. Since more full-offload FCoE adapters are in use than full-offload iSCSI adapters today, it makes sense that more are booting over FCoE than iSCSI, but again, I don’t have any evidence to support that.

Q. What about iSCSI over RoCE?

There are three network/fabric technologies that use RDMA: InfiniBand, iWARP, and RoCE. You can run iSCSI over any of these using the open-source iSER code supported by the Open Fabrics Alliance (https://www.openfabrics.org). iSER has been written to OFA’s “verbs” for RDMA (rather than to the more familiar “sockets”). However, note that of these three underlying transports, only iWARP is truly routable in general. So technically you could implement iSER on InfiniBand or RoCE, but it may not do for you what you expect iSCSI to do for you, i.e., go anywhere the internet goes.

Q. How does FCIP compare with iSCSI for long distance requirements?

FC networks rely on guaranteed packet delivery to deliver low-latency, predictable performance. IP networks are best-effort networks that allow for dropped packets with transmission retries. Given the possibility of added latency and packet loss, FCIP has experienced limited adoption. It is useful where required, but it is typically not a core part of the infrastructure. If cost is a concern and long distance is required as part of the solution, then iSCSI is the better choice, as it is designed to allow for lossy networks.

Q. Slide 22 – Was that hardware based iSCSI or software based iSCSI?

What was shown in the chart was software-based iSCSI; however, you would see similar results with hardware-based iSCSI.

Q. What about FC vs FCoE performance? Any numbers?

Both Fibre Channel and FCoE can achieve line rate.   Here’s an example of testing Yahoo! did on an 8Gb FC HBA and a 10 GbE CNA that showed exactly that result: http://www.intel.com/content/www/us/en/network-adapters/10-gigabit-network-adapters/10-gbe-ethernet-yahoo-case-study.html .   So as Fibre Channel moves to 16 Gbps, it will outperform a 10GbE CNA, at least for peak performance.   However, the tables turn with a 40 GbE CNA, several of which are in production now.
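For a back-of-the-envelope view of why the ordering flips as speeds change, the sketch below computes nominal data rates from signalling rate and line encoding. The baud rates and encodings are assumptions taken from the published FC and Ethernet specifications, not from the Yahoo! study, and the calculation ignores frame headers and FCoE encapsulation overhead.

```python
# Nominal signalling rate (Gbaud) and line encoding (data bits, coded bits).
# Assumed from the FC and Ethernet specifications, not measured values.
LINKS = {
    "8Gb FC": (8.5, 8, 10),       # 8b/10b encoding
    "10GbE": (10.3125, 64, 66),   # 64b/66b encoding
    "16Gb FC": (14.025, 64, 66),  # 64b/66b encoding
    "40GbE": (41.25, 64, 66),     # four 10.3125 Gbaud lanes, 64b/66b
}


def data_rate_gbps(gbaud, data_bits, coded_bits):
    """Usable bit rate after removing line-encoding overhead."""
    return gbaud * data_bits / coded_bits


def ranked_links():
    """Link names ordered from slowest to fastest nominal data rate."""
    return sorted(LINKS, key=lambda name: data_rate_gbps(*LINKS[name]))


for name in ranked_links():
    print(f"{name:8s} {data_rate_gbps(*LINKS[name]):5.1f} Gb/s")
```

Under these assumptions the ordering is 8Gb FC (about 6.8 Gb/s), 10GbE (10 Gb/s), 16Gb FC (about 13.6 Gb/s), then 40GbE (40 Gb/s), which matches the progression described in the answer.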

Q. Do you see SR-IOV used currently or in the future to separate FCoE or iSCSI from standard LAN traffic?

So far we have seen that with the exception of a few operating systems (e.g., AIX), SR-IOV support today is network only.   Additionally, most customers want guaranteed bandwidth for storage and they wouldn’t be willing to run it on the same port as heavy NIC traffic.

Q. Are you aware of any FCoE targets for Windows?

I’m not aware of any right now.

Q. What is the max IOPS (at 4K) you can push thru 10G FCoE and iSCSI? Max latency (at 512 bytes)?

Latency is not determined by the pipe.

Q. Does FCoE really require a CNA? What about software only FCoE drivers?

Open FCoE does exist, but most FCoE implementations today use CNAs.   I do expect the adoption of FCoE software solutions to increase fairly substantially.   A lot of it comes down to the choice of booting via FCoE or another method.

Q. Do you think that the difference in FCoE/iSCSI usage for different App tiers can be related to the performance of the protocols?

Objectively, no.   Either protocol implemented can be configured to hit or exceed a performance number.   In my opinion, market perception of the protocols has more to do with the tier assignment than anything technical.

Q. Doesn’t 32 GbFC make it competitive with 40GbE FCoE?

From a purely technical perspective it helps, but FCoE is often deployed to reduce costs by simplifying cabling and switching by converging IP and storage onto the same fabric.   32Gb FC is slower than 40Gb and does nothing to reduce costs.   Unless 32Gb FC is significantly less expensive than 40 Gb Ethernet on a per port basis, market forces are going to push towards Ethernet.   There are still plenty of cases where organizations may deploy 32Gb FC instead of FCoE, but again, those criteria will mostly be non-technical.

Thanks to all my SNIA-ESF colleagues and Dell’Oro Group for helping me with these answers. If you missed the original Webcast, you can watch it on-demand here. You can also download a copy of the slides.

Why the FCoE – iSCSI Debate Continues

This is my first blog post for SNIA-ESF.  As a Principal Storage Architect, I have been doing extensive research on the factors that are driving FCoE vs. iSCSI choices over the last several years. The more I dive into the topic, the more intriguing the debate becomes. In fact, this blog is a preview of an upcoming white paper I’m writing and a Webcast SNIA is hosting on February 18th. If you agree this debate is interesting, I encourage you to attend. Details on the Webcast are at the end of this post.

A Look Back at FCoE and iSCSI History

There are two entrenched standards for block storage protocols over Ethernet networks. FCoE was ratified in 2009, while iSCSI was ratified in 2004. Of course, various vendors and early adopters supported these protocols before ratification, so the history of each protocol is a couple of years longer than its ratification date suggests. While iSCSI simply encapsulates the SCSI protocol in IP, FCoE operates lower in the network stack, and doing so required many enhancements to Ethernet. While iSCSI runs on any IP network (mostly Ethernet these days), FCoE requires Data Center Bridging and Converged Network Adapters all running at 10 Gbps or faster.

All of the Data Center Bridging enhancements that make FCoE possible, like lossless Ethernet, benefit all of the protocols using Ethernet as the transport protocol.   DCB doesn’t just make FCoE possible, but it improves iSCSI at the same time   (see the SNIA-ESF blog, How DCB Makes iSCSI Better). So given that modern servers, networks, and storage may all be connected by hardware capable of running FCoE, that same network is also able to run iSCSI, as well as other network traffic.   Nothing precludes them from running simultaneously on the same network either.   The leading storage vendors that offer both FCoE and iSCSI target systems allow administrators to present the same LUN over either protocol with little effort, so a transition from one protocol to the other is not difficult.

Strengths and Weaknesses

So which network protocol is the right choice?

Each protocol has strengths and weaknesses when judged relative to the other. FCoE has higher throughput at lower host CPU utilization than iSCSI, and FCoE doesn’t have to process the TCP/IP stack as iSCSI does. iSCSI is relatively simple to set up and troubleshoot when compared to FCoE because zoning is not a factor and IP connectivity (although not optimized for storage traffic) is likely in place already. Also, while FCoE has a comprehensive set of existing tools available to ease troubleshooting, there aren’t as many qualified people to use them in most enterprises. Ease of use, plus the ability to use low-cost NICs and switches, gives iSCSI a cost advantage. (However, if you check out our SNIA-ESF webcast, “How VN2VN Will Help Accelerate Adoption of FCoE,” you’ll hear about new technologies that reduce the costs of deploying FCoE.) FC, and by extension FCoE, are perceived to be enterprise-grade, suitable for all workloads; and while iSCSI is being widely adopted at the enterprise level, it is still perceived by some not to be ready for Tier-1 applications.

The graph below is excerpted from the report “Intel 10GbE Adapter Performance Evaluation” prepared by Demartek for Intel in September 2010. This data is consistent with the rest of the report findings and is only intended to be representative of the results from comparative iSCSI and FCoE testing. The report is interesting reading and I recommend looking at it for more information. This graph shows IOPS and CPU utilization for JetStress tests running against NetApp storage over multi-path iSCSI and FCoE. Note that latencies were all similar and running the tests against EMC storage showed similar results.

[Figure: IOPS and CPU utilization for JetStress tests over multi-path iSCSI and FCoE]

Many other factors must be considered, but according to industry pundits, as well as my own personal experience, in the majority of cases either protocol is adequate for the task at hand, and that is to effectively transfer block data across an Ethernet network.

Maximizing Throughput

The reality is, most servers, applications, and storage arrays simply won’t take full advantage of FCoE’s superior performance, or of any storage protocol running over 10GbE. iSCSI and NAS protocols are very fast and are typically sufficient to meet most application requirements. But this is not meant to be a SAN vs. NAS post; besides, years of history, thousands of happy end users, and billions of continued investment show that both work well enough to meet most business needs.

The commonly deployed storage systems and hosts are simply not configured with enough hardware to saturate multiple 10 gigabit network links. While this is rare today, it is going to become more common to see systems capable of saturating 10GbE pipes in the near future, especially as flash memory, either in all-flash arrays or tiered storage systems, finds more application. (Hear more on the impact of flash in our SNIA-ESF webcast, “Flash – Plan for the Disruption.”) At least as it relates to spinning-media disk systems, network bandwidth increases faster than storage system throughput can keep up. So consider the storage system to be the bottleneck or limiting factor when evaluating storage network performance. After all, in most data center environments, the ratio of servers and applications to storage systems is high, so it’s reasonable to expect the storage system to be the bottleneck.

The absolute throughput of FCoE and iSCSI, when pushing a storage system to its limits, is not sufficient to be used as the sole basis for the decision between the two protocols, except for a few edge cases. Bottom line: whether the storage system is the bottleneck or the network is the bottleneck, the performance relationship between FCoE and iSCSI does not change.

These edge cases tend to be extremely IO-intensive database workloads and big data applications, such as Hadoop. Citing the graph above, FCoE is about 15-20% faster on identical hardware than iSCSI. Granted, this is a single graph of a single test, but the data is consistent across tests performed by IBM using Emulex network interfaces. If absolute throughput and efficiency (both network and CPU) are the only criteria when deciding between block protocols, FCoE looks like the choice. Since these cases are rare, because complexity, supportability, and even politics are almost always considered, the decision is not so obvious. Again, although it is beyond the scope of this article, NAS protocols should also be considered when determining the proper protocol for an application.
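The bottleneck argument in this section can be sketched in a few lines of Python: delivered throughput is capped by whichever is smaller, what the storage system can supply or what the network links can carry. The array figures below are made-up placeholders for illustration, and 1250 MB/s is simply 10 Gb/s divided by 8, with no protocol overhead modeled.

```python
TEN_GBE_MBYTES_PER_S = 10_000 / 8  # 10 Gb/s expressed in MB/s, no overhead


def delivered_mbytes_per_s(storage_mbps, links, link_mbps=TEN_GBE_MBYTES_PER_S):
    """Throughput is limited by the slower of storage and network."""
    return min(storage_mbps, links * link_mbps)


def bottleneck(storage_mbps, links, link_mbps=TEN_GBE_MBYTES_PER_S):
    """Name the limiting component for a given configuration."""
    return "storage" if storage_mbps <= links * link_mbps else "network"


# Hypothetical arrays, each attached with two 10GbE links.
for label, storage_mbps in [("spinning-disk array", 900), ("all-flash array", 3000)]:
    rate = delivered_mbytes_per_s(storage_mbps, links=2)
    print(f"{label}: {rate:.0f} MB/s, limited by {bottleneck(storage_mbps, links=2)}")
```

In this toy model the disk-based array never fills its links, so the choice of block protocol cannot change the delivered rate, while the flash array is the case where network and protocol efficiency start to matter.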

Is There a Clear Winner?

While FCoE can claim technical superiority, iSCSI has the edge in cost and supportability.   The number and range of systems supporting iSCSI connectivity is greater, particularly at the entry level.   What’s more, the availability of people that can troubleshoot end-to-end connectivity for iSCSI is also much greater.   (The “ping” command diagnoses most iSCSI connectivity problems.)   Also, do a resume search on Monster or LinkedIn and the number of people that can configure VLANs dwarfs the number that can properly zone a Fibre Channel network.   Greater familiarity reduces the support and operating cost of iSCSI.

IDC predicts that FCoE revenue will ramp very quickly through 2016. (If available to you, see the IDC Worldwide Enterprise Storage Systems 2012-2016 Forecast Update.)   As customers decide to transition existing Fibre Channel networks to an Ethernet infrastructure, deploying FCoE would be a comfortable choice due to existing IT expertise and functional expectations of the Fibre Channel protocol.

Both iSCSI and FCoE are capable storage protocols, and choosing one over the other will likely depend on budget, IT skill set, and application requirements.

Don’t forget to join us on Feb. 18th

Again, I encourage you to attend our February 18th Webcast, “Use Cases for iSCSI and FCoE – Where Each Makes Sense.” Analysts from Dell’Oro Group will share their latest market research on this topic and I’ll dive into use cases for both iSCSI and FCoE. It’s a live event, so please come with your toughest questions. I hope you’ll join us!

Update: