Throughput, IOPs, and latency are three terms often referred to as storage performance metrics. But the exact definitions of these terms and how they differ can be confusing. That’s why the SNIA Networking Storage Forum (NSF) brought back our popular webinar series, “Everything You Wanted to Know About Storage, But Were Too Proud to Ask,” with a live webinar, “Everything You Wanted to Know about Throughput, IOPs, and Latency But Were Too Proud to Ask.”
The live session was a hit, with over 850 views in the first 48 hours. If you missed the live event, you can watch it on-demand. Our audience asked several interesting questions; here are our answers to them.
Q: Discussing congestion and mechanisms at play in RoCEv2 (DCQCN and delay-based congestion control) would be more interesting than legacy BB_credit handling in FC SAN…
A: FC’s BB_Credit mechanism was chosen for the sake of example, but many of the concepts discussed, such as oversubscription, congestion, and congestion spreading, apply to other transport protocols as well. For example, RoCE can benefit from explicit congestion control mechanisms such as DCQCN, or a combination of PFC and ECN; there are also vendor-specific approaches, such as RTTCC, that can be used to minimize the dependency on explicit congestion control.
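To make the oversubscription concept mentioned above concrete, here is a minimal Python sketch using purely illustrative numbers (not figures from the webinar) that computes the oversubscription ratio of a switch stage and the effective per-port bandwidth when every edge port transmits at once:

```python
# Illustrative sketch: oversubscription ratio of a switch/fabric stage.
# All port counts and speeds below are hypothetical examples.

def oversubscription_ratio(edge_ports: int, edge_gbps: float,
                           uplink_ports: int, uplink_gbps: float) -> float:
    """Ratio of maximum possible ingress bandwidth to available uplink bandwidth."""
    ingress = edge_ports * edge_gbps
    uplink = uplink_ports * uplink_gbps
    return ingress / uplink

# Example: 48 x 25 GbE host-facing ports sharing 6 x 100 GbE uplinks.
ratio = oversubscription_ratio(48, 25, 6, 100)
print(f"Oversubscription ratio: {ratio:.1f}:1")   # 2.0:1

# If every host drives line rate simultaneously, each effectively gets
# edge_gbps / ratio of uplink bandwidth (25 / 2.0 = 12.5 Gb/s here), and
# the excess queues up -- which is where congestion control mechanisms
# such as those named above come into play.
```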
Q: Does NVMe have or need the equivalent of SCSI ALUA?
A: Yes, the NVMe equivalent of SCSI’s ALUA is called Asymmetric Namespace Access (ANA).
Q: Will CXL fundamentally change IO performance and how soon will CXL become prevalent?
A: CXL runs over PCIe, and as a result will be subject to many of the same constraints, especially distance. Most of the CXL use cases will mainly be applicable at rack scale, therefore we do not anticipate it being used for IO between a compute node and external storage.
Q: Do people use a single storage system for various phases of AI/ML use cases (e.g., data collection and training) — or do they use different storage systems (e.g., one storage system for the data collection phase and a different storage system for AI/ML training)?
A: Typically, the same storage system is used for ingestion and checkpointing. In either case, high performance is key: the more time the system spends on data transfer in either of these phases, the longer the GPUs remain idle and, consequently, the longer training takes to complete.
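As a rough illustration of why checkpoint bandwidth matters, the following sketch (all numbers are hypothetical assumptions, not measurements) estimates how long GPUs sit idle while a synchronous checkpoint is written:

```python
# Illustrative sketch: GPU time lost to synchronous checkpointing.
# All values below are hypothetical assumptions for the example.

checkpoint_size_gb  = 2_000    # total checkpoint size (model + optimizer state), GB
storage_write_gbps  = 100      # sustained write bandwidth of the storage system, GB/s
checkpoint_interval = 3_600    # seconds of training between checkpoints

write_time = checkpoint_size_gb / storage_write_gbps           # 20 s per checkpoint
idle_fraction = write_time / (write_time + checkpoint_interval)

print(f"Checkpoint write time: {write_time:.0f} s")
print(f"GPU time lost to checkpointing: {idle_fraction:.1%}")

# Halving the storage bandwidth doubles the write time and roughly doubles
# the fraction of GPU time lost, which is why ingest/checkpoint bandwidth
# is sized against the GPU cluster rather than the other way around.
```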
Q: In the increasing world of hyperconverged (“shared nothing”) architecture where storage is spread across different nodes, what is the best approach to increase IOPs especially for applications like video and voice where real-time response times are important?
A: Using a scale-out approach (i.e., adding nodes that provide storage services) is the most common way to increase overall system performance for these kinds of deployments. Ensuring the network is properly sized and not congested, and that each node has appropriate compute resources, also helps.
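As a back-of-the-envelope illustration of the scale-out approach, here is a sketch with assumed per-node IOPS, replication factor, and network ceiling (none of these are vendor figures) showing how aggregate IOPS grows with node count until the fabric becomes the bottleneck:

```python
# Illustrative sketch: aggregate small-block IOPS in a scale-out cluster.
# per_node_iops, replication factor, and network ceiling are assumptions.

def cluster_iops(nodes: int, per_node_iops: float,
                 replication: int, network_iops_ceiling: float) -> float:
    """Aggregate client-visible IOPS, capped by the network fabric."""
    # Each client write lands on `replication` nodes, so raw capability
    # is divided by the replication factor for write-heavy workloads.
    raw = nodes * per_node_iops / replication
    return min(raw, network_iops_ceiling)

for n in (4, 8, 16, 32):
    print(n, f"{cluster_iops(n, 200_000, 3, 1_500_000):,.0f} IOPS")

# Adding nodes scales IOPS close to linearly until the fabric ceiling is
# reached -- which is why the answer above stresses sizing the network
# alongside adding storage nodes.
```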
Q: Why is small file I/O overhead so large? Metadata generally is still much smaller compared to files of a few KB or larger.
A: While metadata is a small percentage of the data transfer phase for any file transfer, and roughly the same percentage regardless of file size, small files require the same number of host interactions as large files to initiate and complete the I/O. This host interaction, a command to start the operation followed by the device's completion notification, is what drives up the percentage of overhead for small file transfers. For example, if a file is only 128 bytes long but there is a 32-byte request and a 32-byte completion, the overhead is 33% of the time required for the file transfer. For a 4 KB file transfer, this overhead drops to 1.5% of the time required to transfer the file.
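The percentages in this answer come from straightforward arithmetic; the short sketch below reproduces them, assuming (as in the example above) a fixed 32-byte request and 32-byte completion per operation:

```python
# Reproduce the per-I/O overhead example: a fixed 32-byte request plus a
# 32-byte completion per operation, regardless of payload size.

REQUEST = 32      # bytes
COMPLETION = 32   # bytes

def overhead_fraction(payload_bytes: int) -> float:
    fixed = REQUEST + COMPLETION
    return fixed / (fixed + payload_bytes)

for size in (128, 4 * 1024, 1024 * 1024):
    print(f"{size:>8} B payload -> {overhead_fraction(size):.1%} overhead")

# 128 B -> 33.3% ; 4 KiB -> 1.5% ; 1 MiB -> ~0.0%
# The fixed cost per operation is why many small files cost far more
# per byte transferred than a few large files.
```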
Q: Is SPC 1/2 still a relevant benchmark for IO performance? If not, what is the gold standard today?
A: Yes, the Storage Performance Council benchmarks SPC-1 and SPC-2 are still relevant and continue to be updated. You can also use tools like Vdbench and FIO for performance characterization. That said, the best benchmark to use is the one that most closely matches the application you expect to be running on the infrastructure you are testing. As AI becomes increasingly important, you can also check out MLPerf.
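For readers who want to try FIO themselves, here is a minimal Python sketch that runs a single 4 KiB random-read job and pulls throughput, IOPS, and latency out of FIO's JSON output. The target path is a placeholder, and the JSON field names can vary slightly between FIO versions:

```python
# Minimal sketch: run a 4 KiB random-read fio job and report the three
# metrics discussed in the webinar. /path/to/testfile is a placeholder;
# JSON field names may differ slightly across fio versions.
import json
import subprocess

cmd = [
    "fio", "--name=randread", "--filename=/path/to/testfile",
    "--rw=randread", "--bs=4k", "--iodepth=32", "--direct=1",
    "--ioengine=libaio", "--size=1G", "--runtime=30", "--time_based",
    "--output-format=json",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
job = json.loads(result.stdout)["jobs"][0]["read"]

iops = job["iops"]
throughput_mib_s = job["bw"] / 1024            # fio reports bw in KiB/s
mean_latency_us = job["clat_ns"]["mean"] / 1000

print(f"IOPS:        {iops:,.0f}")
print(f"Throughput:  {throughput_mib_s:,.1f} MiB/s")
print(f"Latency:     {mean_latency_us:,.1f} us (mean completion latency)")
```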