SNIA Networking Storage Forum – New Name, Expanded Charter

Anyone who follows technology knows that it is a fast-paced world with rapid changes and constant innovations. SNIA, together with its members, technical work groups, Forums, and Initiatives, continues to embrace, educate, and develop standards to make technology more available and better understood.

At the SNIA Networking Storage Forum, we’ve been at the forefront of diving into technology topics that extend beyond traditional networked storage, providing education on AI, edge, acceleration and offloads, hyperconverged infrastructure, programming frameworks, and more. We still care about and spend a lot of time on networked storage and storage protocols, but we felt it was time that the name of the group better reflected the broad range of timely topics we’re covering. Read More

Q&A for Accelerating Gen AI Dataflow Bottlenecks

Generative AI is front page news everywhere you look. With advancements happening so quickly, it is hard to keep up. The SNIA Networking Storage Forum recently convened a panel of experts from a wide range of backgrounds to talk about Gen AI in general and specifically discuss how dataflow bottlenecks can constrain Gen AI application performance well below optimal levels. If you missed this session, “Accelerating Generative AI: Options for Conquering the Dataflow Bottlenecks,” it’s available on-demand at the SNIA Educational Library.

We promised to provide answers to our audience questions, and here they are.

Q: If ResNet-50 is a dinosaur from 2015, which model would you recommend using instead for benchmarking?

A: Setting aside the unfair aspersions being cast on the venerable ResNet-50, which is still used for inferencing benchmarks 😊, we suggest checking out the MLCommons website. In the benchmarks section you’ll see multiple use cases covering both Training and Inference. Several benchmarks are available that can provide more information about the ability of your infrastructure to effectively handle your intended workload. Read More

Hidden Costs of AI Q&A

At our recent SNIA Networking Storage Forum webinar, “Addressing the Hidden Costs of AI,” our expert team explored the impacts of AI, including sustainability and areas where there are potentially hidden technical and infrastructure costs. If you missed the live event, you can watch it on-demand in the SNIA Educational Library. Questions from the audience ranged from training Large Language Models to fundamental infrastructure changes from AI and more. Here are answers to the audience’s questions from our presenters.

Q: Do you have an idea of where the best tradeoff is for high IO speed cost and GPU working cost? Is it always best to spend maximum and get highest IO speed possible?

A: It depends on what you are trying to do. If you are training a Large Language Model (LLM), then you’ll have a large collection of GPUs communicating with one another regularly (e.g., All-reduce) and doing so at throughput rates of up to 900GB/s per GPU! For this kind of use case, it makes sense to use the fastest network option available. Any money saved by using a cheaper, slightly less performant transport will be more than offset by the cost of GPUs that sit idle while waiting for data.
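The tradeoff above can be sketched with a back-of-envelope calculation. All of the figures below (GPU count, hourly cost, idle fractions) are hypothetical placeholders chosen for illustration, not measurements or vendor pricing; the point is only that idle-GPU cost scales with the fraction of time GPUs stall waiting for data.

```python
# Back-of-envelope sketch: does a faster fabric pay for itself?
# All numbers below are hypothetical, chosen only to illustrate the tradeoff.

def idle_gpu_cost(num_gpus, gpu_cost_per_hour, idle_fraction, hours):
    """Cost of GPU time wasted waiting on data."""
    return num_gpus * gpu_cost_per_hour * idle_fraction * hours

# Scenario: 512 GPUs at a notional $2/hr, one month (720 h) of training.
slow_fabric_idle = idle_gpu_cost(512, 2.0, 0.30, 720)  # 30% stalled on I/O
fast_fabric_idle = idle_gpu_cost(512, 2.0, 0.05, 720)  # 5% stalled on I/O

savings = slow_fabric_idle - fast_fabric_idle
print(f"Idle-GPU cost, slower fabric: ${slow_fabric_idle:,.0f}")
print(f"Idle-GPU cost, faster fabric: ${fast_fabric_idle:,.0f}")
print(f"Monthly savings from the faster fabric: ${savings:,.0f}")
```

With these assumed numbers, the savings from reduced GPU idle time dwarf plausible differences in fabric cost, which is why the fastest interconnect usually wins for large-scale training.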

If you are more interested in Fine Tuning an existing model or using Retrieval Augmented Generation (RAG) then you won’t need quite as much network bandwidth and can choose a more economical connectivity option.

It’s worth noting Read More

Throughput, IOPs, and Latency Q&A

Throughput, IOPs, and latency are three terms often referred to as storage performance metrics. But the exact definitions of these terms and how they differ can be confusing. That’s why the SNIA Networking Storage Forum (NSF) brought back our popular webinar series, “Everything You Wanted to Know About Storage, But Were Too Proud to Ask,” with a live webinar, “Everything You Wanted to Know about Throughput, IOPs, and Latency But Were Too Proud to Ask.”

The live session was a hit, with over 850 views in the first 48 hours. If you missed the live event, you can watch it on-demand. Our audience asked several interesting questions; here are our answers to them.

Q: Discussing congestion and mechanisms at play in RoCEv2 (DCQCN and delay-change control) would be more interesting than legacy BB_credit handling in FC SAN… Read More

Here’s Everything You Wanted to Know About Throughput, IOPs, and Latency

Any discussion about storage systems is incomplete without the mention of Throughput, IOPs, and Latency. But what exactly do these terms mean, and why are they important? To answer these questions, the SNIA Networking Storage Forum (NSF) is bringing back our popular webinar series, “Everything You Wanted to Know About Storage, But Were Too Proud to Ask.”

Collectively, these three terms are often referred to as storage performance metrics. Performance can be defined as the effectiveness of a storage system in addressing the I/O needs of an application or workload. Different application workloads have different I/O patterns, and with them come different bottlenecks, so there is no “one-size-fits-all” storage system. These storage performance metrics help with storage solution design and selection based on application/workload demands.
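The relationship between these metrics can be shown in a few lines of code. This sketch uses the standard definitions: throughput is IOPS multiplied by I/O size, while latency is a per-operation measure that is not derivable from the other two alone (though at queue depth 1 it bounds IOPS). The workload numbers are illustrative, not from any particular system.

```python
# Sketch of how the three metrics relate (standard definitions assumed):
#   throughput (MB/s) = IOPS * I/O size (MB)
# Latency is per-operation; at queue depth 1, IOPS ≈ 1 / latency.

def throughput_mb_s(iops, io_size_kb):
    """Throughput implied by an IOPS rate at a given I/O size."""
    return iops * io_size_kb / 1024  # KB -> MB

# Roughly the same throughput, but very different workloads:
small_random = throughput_mb_s(100_000, 4)   # 100K IOPS of 4 KB I/Os
large_seq = throughput_mb_s(400, 1024)       # 400 IOPS of 1 MB I/Os
print(f"Small random I/O: {small_random:.1f} MB/s")
print(f"Large sequential I/O: {large_seq:.1f} MB/s")

# Queue-depth-1 relationship between latency and IOPS:
latency_s = 0.2 / 1000  # 0.2 ms per operation
print(f"QD1 IOPS at 0.2 ms latency: ~{1 / latency_s:.0f}")
```

This is why a single headline number is misleading: a system quoted at 100K IOPS and one quoted at 400 MB/s may be delivering nearly the same bytes per second to very different workloads.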

Join us on February 7, 2024, for “Everything You Wanted to Know About Throughput, IOPS, and Latency, But Were Too Proud to Ask.” In this webinar, we’ll cover: Read More

Addressing the Hidden Costs of AI

The latest buzz around generative AI ignores the massive costs to run and power the technology. Understanding what the sustainability and cost impacts of AI are and how to effectively address them will be the topic of our next SNIA Networking Storage Forum (NSF) webinar, “Addressing the Hidden Costs of AI.” On February 27, 2024, our SNIA experts will offer insights on the potentially hidden technical and infrastructure costs associated with generative AI. You’ll also learn best practices and potential solutions to be considered as they discuss: Read More

NVMe®/TCP Q&A

The SNIA Networking Storage Forum (NSF) had an outstanding response to our live webinar, “NVMe/TCP: Performance, Deployment, and Automation.” If you missed the session, you can watch it on-demand and download a copy of the presentation slides at the SNIA Educational Library. Our live audience gave the presentation a 4.9 rating on a scale of 1-5, and they asked a lot of detailed questions, which our presenter, Erik Smith, Vice Chair of SNIA NSF, has answered here.

Q: Does the Centralized Discovery Controller (CDC) layer also provide drive access control or is it simply for discovery of drives visible on the network?

A: As defined in TP8010, the CDC only provides transport layer discovery. In other words, the CDC will allow a host to discover transport layer information (IP, Port, NQN) about the subsystem ports (on the array) that each host has been allowed to communicate with. Provisioning storage volumes to a particular host is additional functionality that could be added to an implementation of the CDC. For example, Dell has a CDC implementation that we refer to as SmartFabric Storage Software (SFSS).
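The transport layer information mentioned above can be pictured as a small record. This is an illustrative sketch only, not the TP8010 wire format; the field names follow the conventions used by nvme-cli discovery output (traddr, trsvcid, subnqn), and the address and NQN values are made up for the example.

```python
from dataclasses import dataclass

# Illustrative only: the transport-layer fields a host learns via discovery.
# Field names follow nvme-cli conventions; this is not the TP8010 wire format.

@dataclass(frozen=True)
class DiscoveryLogEntry:
    traddr: str    # subsystem IP address
    trsvcid: str   # transport service ID (TCP port, typically 4420)
    subnqn: str    # NVMe Qualified Name of the subsystem

# What a host might receive for one subsystem port it is allowed to reach
# (hypothetical address and NQN):
entry = DiscoveryLogEntry(
    traddr="192.168.10.21",
    trsvcid="4420",
    subnqn="nqn.2014-08.org.nvmexpress:uuid:example-subsystem",
)
print(entry.traddr, entry.trsvcid, entry.subnqn)
```

Note that nothing in this record says which volumes (namespaces) the host may access; that is exactly the provisioning functionality that sits outside the CDC's discovery role.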

Q: Can you provide some examples of companies that provide CDC and drive access control functionalities? Read More

Automating Discovery for NVMe IP-based SANs

NVMe® IP-based SANs (including transports such as TCP, RoCE, and iWARP) have the potential to provide significant benefits in application environments ranging from the Edge to the Data Center. However, before we can fully unlock the potential of the NVMe IP-based SAN, we first need to address the manual and error-prone process that is currently used to establish connectivity between NVMe Hosts and NVM subsystems. This process requires administrators to explicitly configure each Host to access the appropriate NVM subsystems in their environment. In addition, any time an NVM Subsystem interface is added or removed, a Host administrator may need to explicitly update the configuration of impacted hosts to reflect the change.

Due to the decentralized nature of this configuration process, using it to manage connectivity for more than a few Host and NVM subsystem interfaces is impractical and adds complexity when deploying an NVMe IP-based SAN in environments that require a high degree of automation.
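The scaling problem described above can be made concrete. Under the assumption that every host must be explicitly configured for every NVM subsystem interface it uses, the number of manually maintained connectivity entries grows multiplicatively; the host and interface counts below are hypothetical.

```python
# Sketch: why decentralized, per-host configuration does not scale.
# Assumption: each host is explicitly configured for each NVM subsystem
# interface it uses, so manual entries grow as hosts x interfaces.

def manual_config_entries(num_hosts, num_subsystem_interfaces):
    """Connectivity entries an admin must create and keep up to date."""
    return num_hosts * num_subsystem_interfaces

for hosts, ifaces in [(4, 2), (50, 8), (500, 16)]:
    print(f"{hosts} hosts x {ifaces} subsystem interfaces = "
          f"{manual_config_entries(hosts, ifaces)} entries to maintain")
```

Every one of those entries must also be revisited whenever a subsystem interface is added or removed, which is the operational burden a centralized discovery service aims to eliminate.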

Read More

Beyond NVMe-oF Performance Hero Numbers

When it comes to selecting the right NVMe over Fabrics™ (NVMe-oF™) solution, one should look beyond test results that demonstrate NVMe-oF’s dramatic reduction in latency and consider the other, more important, questions such as “How does the transport really impact application performance?” and “How does the transport holistically fit into my environment?”

To date, the focus has been on specialized fabrics: RDMA (e.g., RoCE) because it provides the lowest possible latency, and Fibre Channel because it is generally considered to be the most reliable. However, with the introduction of NVMe-oF/TCP, this conversation must be expanded to also include considerations regarding scale, cost, and operations. That’s why the SNIA Networking Storage Forum (NSF) is hosting a webcast series that will dive into answering these questions beyond the standard answer “it depends.”

Read More