To fully unlock the potential of NVMe® IP-based SANs, we first need to address the manual, error-prone process currently used to establish connectivity between NVMe Hosts and NVM subsystems. Several leading companies in the industry have joined together through NVM Express to collaborate on innovations that simplify and automate this discovery process.
This was the topic of discussion at our recent SNIA Networking Storage Forum webcast “NVMe-oF: Discovery Automation for IP-based SANs” where our experts, Erik Smith and Curtis Ballard, took a deep dive into the work being done to address these issues. If you missed the live event, you can watch it on demand here and get a copy of the slides. Erik and Curtis did not have time to answer all the questions during the live presentation. As promised, here are answers to them all.
Q. Is the Centralized Discovery Controller (CDC) highly available, and is this visible to the hosts? Do they see a pair of CDCs on the network and retry requests to a secondary if a primary is not available?
A. Each CDC instance is intended to be highly available. How this is accomplished will be specific to the vendor and deployment type. For example, a CDC running inside an ESX-based VM can leverage VMware’s high availability (HA) and fault tolerance (FT) functionality. For most implementations, the HA functionality for a specific CDC is expected to be implemented using methods that are not visible to the hosts. In addition, each host will be able to access multiple CDC instances (e.g., one per “IP-Based SAN”). This ensures any problems encountered with any single CDC instance will not impact all paths between the host and storage. One point to note: it is not expected that there will be multiple CDC instances visible to each host via a single host interface. Although this is allowed per the specification, it makes it much harder for administrators to effectively manage connectivity.
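To make the “multiple CDC instances, one per IP SAN” point a bit more concrete, here is a purely illustrative sketch of a host querying a separate CDC on each of its two fabrics, so a problem with one CDC only affects discovery on that IP SAN. The addresses and interface names are invented, and the --host-iface option is assumed to be available in your nvme-cli build.

```python
# Illustrative only: query one CDC per IP SAN so that a problem with either CDC
# affects discovery on that fabric alone. Addresses and interface names are
# invented; --host-iface is assumed to be supported by the installed nvme-cli.
import subprocess

CDCS = {
    "IP SAN A": {"traddr": "192.168.10.5", "iface": "eth2"},
    "IP SAN B": {"traddr": "192.168.20.5", "iface": "eth3"},
}

for san, cdc in CDCS.items():
    cmd = [
        "nvme", "discover", "--transport=tcp",
        f"--traddr={cdc['traddr']}", "--trsvcid=8009",
        f"--host-iface={cdc['iface']}",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"[{san}] discovery log:\n{out.stdout}")
    except subprocess.CalledProcessError as err:
        # A failure reaching one CDC leaves discovery on the other IP SAN intact.
        print(f"[{san}] discovery failed: {err.stderr.strip()}")
```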
Q. First: isn’t the CDC the perfect Denial-of-Service (DoS) attack target? Being the ‘name server’ of NVMe-oF, when the CDC is compromised, no storage is available anymore. Second: the CDC should run as a multi-instance cluster to achieve high availability or, even better, be distributed like the Name Server in Fibre Channel (FC).
A. With regard to denial-of-service attacks: both FC’s Name Server and NVMe-oF’s CDC are susceptible to this type of problem, and both have the ability to mitigate these types of concerns. FC can fence or shut down a port that has a misbehaving end device attached to it. The same can be done with Ethernet, especially when the “underlay config service” mentioned during the presentation is in use. In addition, the CDC’s role is slightly different from FC’s Name Server. If a denial-of-service attack were successfully executed against a CDC instance, existing host-to-storage connections would remain intact. Hosts that are rebooted, or disconnected and reconnected, could have a problem connecting to the CDC and reconnecting to storage via the IP SAN that is experiencing the DoS attack.
For the second concern, it’s all about the implementation. Nothing in the standard prevents the CDC from running in an HA or FT mode. When the CDC is deployed as a VM, hypervisor-resident tools can be leveraged to provide HA and FT functionality. When the CDC is deployed as a collection of microservices running on the switches in an IP SAN, the services will be distributed. One implementation available today uses distributed microservices to enable scaling to meet or exceed what is possible with an FC SAN today.
Q. Would the Dell/SFSS CDC be offered as generic open source in the public domain so other third parties don’t have to develop their own CDC? If third parties do not use Dell’s open CDC and instead develop their own, how would multi-CDC control work for a customer with multi-vendor storage arrays and multiple CDCs?
A. The Dell/SFSS CDC will not be open source. However, the reason Dell worked with HPE and other storage vendors on the 8009 and 8010 specifications is to ensure that whichever CDC instance a customer chooses to deploy, all storage vendors will be able to discover and interoperate with it. Dell’s goal is to create an NVMe IP-Based SAN ecosystem. As a result, Dell will work to make its CDC implementation (SFSS) interoperate with every NVMe IP-Based product, regardless of the vendor. The last thing anyone wants is for customers to have to worry about basic compatibility.
Q. Does this work only for IPv4? We’re moving towards IPv6-only environments.
A. Both IPv4 and IPv6 can be used.
Q. Will the discovery domains for this be limited to a single Ethernet broadcast domain or will there be mechanisms to scale out discovery to alternate subnets, like we see with DHCP & DHCP Relay?
A. By default
mDNS is constrained to a single broadcast domain. As a result, when you deploy
a CDC as a VM, if the IP SAN consists of multiple broadcast domains per IP SAN
instance (e.g., IP SAN A = VLANs/subnets 1, 2 and 3; IP SAN B = VLANs/subnets
10, 11 and 13) then you’ll need to ensure an interface from the VM is attached
to each VLAN/subnet. However, creating a VM interface for each subnet is a sub-optimal
user experience and as a result, there is the concept of an mDNS proxy in the
standard. The mDNS proxy is just an mDNS responder that resides on the switch
(similar concept to a DHCP proxy) that can respond to mDNS requests on each
broadcast domain and point the end devices to the IP Address of the CDC (which could
be on a different subnet). When you are selecting a switch vendor to use for
your IP SAN, ensure you ask if they support an mDNS proxy. If they do not, you
will need to do extra work to get your IP SAN configured properly. When you
deploy the CDC as a collection of services running on the switches in an IP
SAN, one of these services could be an mDNS responder. This is how Dell’s SFSS
will be handling this situation.
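For readers who want to see what the mDNS step looks like in practice, below is a minimal sketch (not part of any product) that browses for the DNS-SD service type used to advertise NVMe-oF discovery controllers, assumed here to be _nvme-disc._tcp, using the Python zeroconf package. Run on a host in the same broadcast domain as the CDC, or one served by an mDNS proxy, it should report the CDC’s address and discovery port.

```python
# Minimal sketch: browse for NVMe-oF discovery controllers advertised via mDNS/DNS-SD.
# Assumes the python-zeroconf package is installed and that the CDC (or an mDNS
# proxy on the switch) advertises the "_nvme-disc._tcp.local." service type.
import time
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

SERVICE_TYPE = "_nvme-disc._tcp.local."  # assumed NVMe-oF discovery service type

class CdcListener(ServiceListener):
    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        info = zc.get_service_info(type_, name)
        if info:
            print(f"Found discovery controller {name}: "
                  f"{info.parsed_addresses()} port {info.port}")

    def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        pass

    def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        print(f"Discovery controller {name} is no longer advertised")

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, CdcListener())
try:
    time.sleep(5)  # give responders on the local broadcast domain time to answer
finally:
    zc.close()
```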
One final point about IP SANs that span multiple subnets: historically, these types of configurations have been difficult to administer because of the need to configure and maintain static route table entries on each host. NVM Express has done an extensive amount of work with 8010 to ensure that we can eliminate the need to configure static routes. For more information about the solution to this problem, take a look at nvme-stas on GitHub.
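As a rough illustration of the host side once the CDC’s address is known, the sketch below uses nvme-cli’s connect-all to discover and connect in one pass. The CDC address is invented and 8009 is the conventional NVMe/TCP discovery port; how cross-subnet reachability is handled without static routes is exactly what 8010 and nvme-stas address, so treat this as a sketch rather than a deployment recipe.

```python
# Illustrative only: retrieve the discovery log from the CDC and connect to every
# subsystem it reports for this host. The CDC address below is invented; 8009 is
# the conventional NVMe/TCP discovery service port.
import subprocess

CDC_ADDR = "192.168.10.5"  # example CDC address on IP SAN A

subprocess.run(
    ["nvme", "connect-all", "--transport=tcp",
     f"--traddr={CDC_ADDR}", "--trsvcid=8009"],
    check=True,
)
# "nvme list-subsys" can then be used to confirm which controllers were created.
```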
Q. A question for Erik: If mDNS turned out to be a problem, how did you work around it?
A. mDNS is actually a problem right now because none of the implementations in the first release support it. In the second release this limitation is resolved. In any case, the only issue I am expecting with mDNS is in environments that don’t want to use it (for one reason or another) or can’t use it (because the switch vendor does not support an mDNS proxy). In these situations, you can administratively configure the IP address of the CDC on the host and storage subsystem.
Q. Just a comment: in my humble opinion, slide 15 is the most important slide for helping people see, at a glance, what these things are. Nice slide.
A. Thanks! Slide 15 was one of the earliest diagrams we created to communicate the concept of what we’re trying to do.
Q. With Fibre Channel there are really only two HBA vendors and two switch vendors, so interoperability, even with vendor-specific implementations, is manageable. For Ethernet, there are many NIC and Switch vendors. How is interoperability going to be ensured in this more complex ecosystem?
A. The FC discovery protocol is very stable. We have not seen any basic interop issues related to login and discovery for years. However, back in the early days (’98-’02), we needed to validate each HBA/driver/FW version, and do so for each OS we wanted to support. With NVMe/TCP, each discovery client is software based and OS specific (not HBA specific). As a result, we will only have two host-based discovery client implementations for now (ESX and Linux; see nvme-stas) plus a discovery client for each storage OS. To date, we have been pleasantly surprised at the lack of interoperability issues we’ve seen as storage platforms have started integrating with CDC instances, although it is likely we will see some issues as additional storage vendors start to integrate with CDC instances from different vendors.
Q. A lot of companies will want to use NVMe-oF via IP/Ethernet in a micro-segmented network. There are a lot of L3/routing steps to reach the target. This presentation did not go into this part of scalability, only into scalability of discovery. Today, all networks are L3 with smaller and smaller subnets and a lot of L3 points.
A. This is a fair point. We didn’t have time to go into the work we have done to address the types of routing issues we’re anticipating and what we’ve done to mitigate them. However, nvme-stas, the Dell-sponsored open-source discovery client for Linux, demonstrates how we use the CDC to work around these types of issues. Erik is planning to write about this topic on his blog, brasstacksblog.typepad.com. You can follow him on Twitter (@provandal) to make sure you don’t miss it.