At our recent SNIA Networking Storage Forum (NSF) webcast, “How Fibre Channel Hosts and Targets Really Communicate,” our Fibre Channel (FC) experts explained exactly how Fibre Channel works, starting with the basics of the FC networking stack, link initialization, port types, and flow control, and then diving into the details of host/target logins and host/target IO. It was a great tutorial on Fibre Channel. If you missed it, you can view it on-demand. The audience asked several questions during the live event. Here are answers to them all:
Q. What is the most common problem that we face in the FC protocol?
A. Much the same as with any other network protocol, congestion is the most common problem found in FC SANs. It can take several forms, including, but not limited to, host oversubscription and unbalanced “fan-in/fan-out” ratios of host ports to storage ports, and it is probably the single largest generator of support cases. Another common issue is the “host cannot see target” class of problem.
Q. What are typical latencies for N-to-N (node-to-node) Port and N-F-N (one switch between)?
A. Latencies vary from switch type to switch type and also depend on the type of forwarding being done. Port-to-port on a single switch, I would say, is generally in the range of 1 µs to 5 µs.
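As a back-of-the-envelope illustration, here is a minimal sketch of how those numbers combine for an N-to-N versus N-F-N path. The cable lengths and the 3 µs per-switch figure are assumptions chosen from within the range quoted above, and the ~5 µs/km propagation delay is the usual rule of thumb for optical fiber.

```python
# Back-of-the-envelope one-way latency estimate for a host-to-target path.
# Assumed values: ~5 us/km propagation delay in optical fiber and a
# per-switch forwarding latency of 3 us (within the 1-5 us range above).

PROPAGATION_US_PER_KM = 5.0

def path_latency_us(fiber_km: float, switch_hops: int, switch_latency_us: float = 3.0) -> float:
    """One-way latency: fiber propagation plus per-switch forwarding delay."""
    return fiber_km * PROPAGATION_US_PER_KM + switch_hops * switch_latency_us

# Direct N-to-N link over 100 m of fiber:
print(f"N-to-N: {path_latency_us(0.1, 0):.1f} us")   # ~0.5 us
# N-F-N with one switch and 2 x 100 m links:
print(f"N-F-N:  {path_latency_us(0.2, 1):.1f} us")   # ~4.0 us
```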
Q. Has the Fabric Shortest Path First (FSPF) always been there, or is there a minimum FC speed at which it was introduced? Also, how is the FSPF path determined? Is it via shortest path only, or does it also take into account the speeds of the switches along the path?
A. While Fibre Channel has existed since 1993, at a 133 Mbit/s speed, FSPF was developed by the INCITS T11 Technical Committee and was published in 2000 as a cost-based link state routing protocol. Costs are based on link speeds: the higher the link speed, the lower the cost, per the formula cost = 10^12 / bandwidth (in bps). There have been implementations that allow the network administrator to artificially set a link cost and force traffic onto a particular path, but the better approach is simply to let FSPF do its normal work. And yes, the link costs are considered for all of the intermediate devices along the path.
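As a quick illustration of that cost formula, here is a minimal sketch that computes FSPF link costs for a few nominal speeds. Real switch implementations may round these values or base them on the actual baud rate, so treat this as an illustration rather than vendor behavior.

```python
# Sketch of the FSPF link-cost formula described above:
# cost = 10^12 / link bandwidth (bps). Lower cost = preferred path.

def fspf_link_cost(bandwidth_gbps: float) -> float:
    """Return the FSPF cost for a link of the given nominal speed."""
    return 1e12 / (bandwidth_gbps * 1e9)

for speed in (8, 16, 32, 64):
    print(f"{speed} GFC link -> cost {fspf_link_cost(speed):.2f}")

# A 64GFC link (cost ~15.62) is preferred over an 8GFC link (cost 125)
# when both lead toward the same destination.
```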
Q. All of this FSPF happens without even us noticing, right? Or do we need to manually configure?
A. Yes, all of the FSPF routing happens without any manual configuration. Most users don’t even realize there is an underlying routing protocol.
Q. Is it a best practice to have all ports in the system run at the same speed? We have storage connected at 32Gb interfaces and a hundred clients with 16Gb interfaces. Would this make the switch’s job easier?
A. It’s virtually impossible to have all ports of an FC SAN (or any network of size) connect at the same speed. In fact, the more common environment is one where multiple generations of server and storage technology have been “organically grown over time” in the datacenter. Even if uniform speeds were somehow achieved, there could still be congestion caused by hosts and targets requesting data from multiple sources simultaneously. So, having a uniform speed doesn’t really fix anything, even if it might make some things a bit better. That said, it is always helpful to make certain that your HBA device drivers and firmware versions are up to date.
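To put some rough numbers on the fan-in point, here is a small sketch using the figures from the question (100 clients at 16Gb, storage at 32Gb). The number of storage ports is an assumption added purely for illustration, not a recommendation.

```python
# Rough fan-in/oversubscription arithmetic for the scenario in the question:
# 100 host ports at 16GFC feeding storage ports at 32GFC.

host_ports = 100
host_speed_gbps = 16
storage_ports = 4            # assumed value for the example
storage_speed_gbps = 32

host_bandwidth = host_ports * host_speed_gbps            # 1600 Gbps potential demand
storage_bandwidth = storage_ports * storage_speed_gbps   # 128 Gbps available

ratio = host_bandwidth / storage_bandwidth
print(f"Oversubscription ratio: {ratio:.1f}:1")           # 12.5:1

# Even with every port running at the same speed, a ratio like this means
# the fabric can still congest when many hosts hit the same targets at once.
```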
Q. From your experience, is there any place where the IO has gone wrong?
A. It’s not entirely clear what “IO gone wrong” means here. All frames that traverse the SAN are cyclic redundancy check (CRC) protected. The check might happen on each hop, or it might happen only at the end devices, but frames that are found to be corrupted should never be incorporated into the LUN.
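To illustrate the idea behind that CRC protection, here is a minimal sketch. FC-FS defines the exact 32-bit CRC parameters (bit ordering, initial value, complement) used on the wire; the zlib.crc32 call below is only a stand-in to show the detect-and-discard concept, not a reproduction of the FC frame CRC calculation.

```python
import zlib

def append_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC to a payload, as a transmitter would."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC at the receiver and compare against the trailer."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

frame = append_crc(b"example SCSI write data")
print(crc_ok(frame))                 # True: frame arrived intact

corrupted = bytearray(frame)
corrupted[3] ^= 0xFF                 # flip some bits "in transit"
print(crc_ok(bytes(corrupted)))      # False: receiver discards the frame
```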
Q. Is there a fabric notification feature for these backpressure events?
A. Yes, the recent standards have several mechanisms for notification, collectively called Fabric Performance Impact Notifications (FPIN). These include ELS (extended link service) notifications sent through software to identify congestion, link integrity, and SCSI command delivery issues. In Gen 7/64Gb platforms there is also an in-band hardware signal for credit stall and oversubscription conditions. Today both RHEL and AIX support the receipt of FPIN link-integrity notifications and integrate them into their respective MPIO interfaces, allowing them to load balance around, or avoid, a “sick but not dead” link. Additional operating systems are on the way, and the first of the array vendors to support this are expected “soonish.” While there is no “silver bullet” that solves every congestion problem, FPIN is a tool that engages the whole ecosystem instead of leaving the “switch in the middle” to interpret data on its own, which is a huge potential benefit.
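As a purely illustrative sketch of the “sick but not dead” idea, the snippet below models how a multipathing layer could de-prefer a path that keeps reporting FPIN link-integrity events. The class names, descriptor categories, and penalty logic are assumptions made for illustration; they are not the FC-LS-5 encoding or any operating system’s actual MPIO implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Hypothetical model of reacting to FPIN events in a multipathing layer.
# The descriptor categories mirror the ones named in the answer above;
# everything else is an assumption for illustration only.

class FpinType(Enum):
    LINK_INTEGRITY = auto()
    DELIVERY = auto()
    PEER_CONGESTION = auto()
    CONGESTION = auto()

@dataclass
class Path:
    name: str
    penalty: int = 0          # higher penalty = less preferred for new IO

@dataclass
class Multipather:
    paths: list[Path] = field(default_factory=list)

    def on_fpin(self, path_name: str, event: FpinType) -> None:
        """Penalize a path that keeps reporting problems instead of failing it outright."""
        for path in self.paths:
            if path.name == path_name and event is FpinType.LINK_INTEGRITY:
                path.penalty += 10    # marginal link: de-prefer, don't drop

    def pick_path(self) -> Path:
        """Send new IO down the least-penalized path."""
        return min(self.paths, key=lambda p: p.penalty)

mp = Multipather([Path("hba0:port0"), Path("hba1:port0")])
mp.on_fpin("hba0:port0", FpinType.LINK_INTEGRITY)
print(mp.pick_path().name)   # hba1:port0, steering IO away from the flaky link
```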
Q. There is so much good information here. Are the slides available?
A. Yes, the session has been recorded and is available on-demand, along with the slides, in the SNIA Educational Library, where you can also search a wealth of educational content on storage.