An FAQ to Make Your Storage System Hum

In our most recent “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series – Part Sepia – Getting from Here to There, we discussed terms and concepts that have a profound impact on storage design and performance. If you missed the live event, I encourage you to check it out on-demand. We had many great questions on encapsulation, tunneling, IOPS, latency, jitter and quality of service (QoS). As promised, our experts have gotten together to answer them all.

Q. Is there a way to measure jitter?

A. Jitter can be measured directly as a statistical function of the latency, typically as the variance or standard deviation of the latency. For example, a storage device might show an average latency of 5ms with a standard deviation of 1.5ms, which means roughly 95% of the transactions have a latency between 2ms and 8ms (average latency plus or minus two standard deviations). However, many storage customers measure jitter indirectly by showing the 99.9%, 99.99%, or 99.999% latency. For example, if my storage system has a 99.99% latency of 8ms, it means 99.99% of transactions have latency <=8ms and 1 in 10,000 transactions have latency >8ms. Percentile latency is an indirect measure of jitter, but it is often easier to calculate or understand than the actual jitter.
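
For readers who want to see the direct calculation, here is a minimal Python sketch using a hypothetical list of per-transaction latencies (the numbers are made up purely for illustration):

```python
import statistics

# Per-transaction latencies in milliseconds (hypothetical sample data).
latencies_ms = [4.2, 5.1, 6.8, 3.9, 5.5, 7.2, 4.8, 5.0, 6.1, 4.4]

mean_ms = statistics.mean(latencies_ms)
stdev_ms = statistics.stdev(latencies_ms)       # direct measure of jitter
variance_ms2 = statistics.variance(latencies_ms)

# Roughly 95% of transactions fall within two standard deviations of the
# mean (assuming a roughly normal latency distribution).
low, high = mean_ms - 2 * stdev_ms, mean_ms + 2 * stdev_ms

print(f"average latency: {mean_ms:.2f} ms")
print(f"jitter (std dev): {stdev_ms:.2f} ms, variance: {variance_ms2:.2f} ms^2")
print(f"~95% of transactions between {low:.2f} ms and {high:.2f} ms")
```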

Q. Can jitter be easily characterized for storage, media, and networks? How, and what tools are available for doing this?

A. Jitter is usually easy to measure on a network using standard network monitoring and reporting tools. It may or may not be easy to measure on storage systems or storage media, depending on the tools available (either built into the storage OS or in an external management or monitoring tool). If you can record the latency of each transaction or packet, then it’s easy to calculate and show the jitter using standard statistical measures such as the variance or standard deviation of the latency. What most customers do instead is measure the 99.9%, 99.99%, or 99.999% latency. This is an indirect measure of jitter, but it is often much easier to report and understand than the actual jitter.
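
As a rough sketch of that indirect approach, the hypothetical Python snippet below reports percentile latency from a list of recorded per-transaction latencies (the helper and sample data are illustrative, not from any particular tool):

```python
import math

def percentile_latency(latencies_ms, pct):
    """Return the latency value at the given percentile (e.g. 99.99)."""
    ordered = sorted(latencies_ms)
    # Index of the smallest sample that pct% of samples do not exceed.
    idx = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[max(idx, 0)]

# Hypothetical recorded latencies (ms): mostly 5 ms, a few slow outliers.
samples = [5.0] * 9_990 + [9.5] * 10

print(percentile_latency(samples, 99.0))    # 5.0 ms
print(percentile_latency(samples, 99.99))   # 9.5 ms -> the outliers show up here
```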

Q. Generally, IOPS numbers are published for a particular block size, like an 8k read/write size, but in reality I/O requests per second could be of mixed sizes. What is your perspective on this?

A. Most IOPS benchmarks test only one I/O size at a time. Most individual real workloads (for example databases) also use only one I/O size. It is true that a storage controller or HDD/SSD might need to support multiple workloads simultaneously, each with a different I/O size. While it is possible to run benchmarks with a mix of different I/O sizes, it’s rarely done because then there are too many workload combinations to test and publish. Some storage devices do not perform well if they must handle both small random and large sequential workloads simultaneously, so a smart storage controller might assign different workload types to different disk groups.
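
If you do need a ballpark number for a mixed workload, one rough approach is to weight each I/O size’s service time by its share of the mix. The sketch below is purely illustrative; the block sizes, fractions, and standalone IOPS figures are hypothetical:

```python
# Hypothetical mixed workload: fraction of I/Os at each block size, with the
# IOPS each size achieves when run on its own (numbers are made up).
workload_mix = [
    # (block size in KiB, fraction of I/Os, standalone IOPS at that size)
    (8,    0.70, 80_000),
    (64,   0.25, 30_000),
    (1024, 0.05, 2_000),
]

# A crude blended estimate: weight each size's service time by its share of
# the mix. Real devices rarely blend this cleanly, which is one reason mixed
# benchmarks are hard to publish.
blended_service_time = sum(frac / iops for _, frac, iops in workload_mix)
blended_iops = 1 / blended_service_time
blended_mibps = sum(frac * bs for bs, frac, _ in workload_mix) * blended_iops / 1024

print(f"estimated blended IOPS: {blended_iops:,.0f}")
print(f"estimated blended throughput: {blended_mibps:,.0f} MiB/s")
```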

Q. One often misconfigured parameter is queue depth. Can you talk about how this relates to IOPS, latency and jitter?

A. Queue depth indicates how many tasks or I/Os can be lined up for a particular resource, such as a storage controller, network interface, or CPU. A higher queue depth helps keep the resource highly utilized because it always has a new task to start as soon as it finishes its current task(s). This can result in higher IOPS because the CPU is less likely to sit idle waiting for new tasks to be put into its queue. But it could also increase latency, because longer queues mean each task spends more time waiting in the queue. It’s easy to misconfigure queue depth because it needs to be deep enough to keep the resource (CPU/controller/interface) busy, but not so deep that each transaction spends a long time in the queue.
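
The relationship between the three is captured by Little’s Law: outstanding I/Os (queue depth) ≈ IOPS × average latency. The toy Python model below only illustrates that trade-off, assuming a hypothetical device that can work on 32 I/Os in parallel; past that limit, extra queue depth adds waiting time rather than IOPS:

```python
# Little's Law ties the three together: outstanding I/Os (queue depth)
# equals IOPS multiplied by average latency. All numbers are hypothetical.

def iops_for(queue_depth, latency_s):
    """IOPS achievable with a given queue depth and per-I/O latency."""
    return queue_depth / latency_s

service_time = 0.0002      # 200 microseconds of actual work per I/O
max_parallel = 32          # toy device can work on 32 I/Os at once

for qd in (1, 4, 16, 64, 256):
    # Toy model: once the queue exceeds the device's parallelism, extra
    # queue depth no longer adds IOPS; the I/Os simply wait longer.
    effective_latency = service_time * max(1, qd / max_parallel)
    print(f"QD={qd:>3}  IOPS={iops_for(qd, effective_latency):>9,.0f}  "
          f"avg latency={effective_latency * 1e3:.2f} ms")
```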

Q. Can you please repeat all your examples of tunneling? GRE, MPLS, what others? How can it be IPv4 via IPv6?

A. VXLAN, LISP, GRE, MPLS, IPSEC. Any time you encapsulate one protocol inside another, send it across the network, and decapsulate it at the other end to recover the original frame, that process is tunneling. In the case we showed of IPv6 over IPv4, you take an original IPv6 frame, with its IPv6 header of source and destination addresses, and send it over an IPv4-enabled network by encapsulating the IPv6 frame with an IPv4 header, thereby “tunneling” IPv6 over the IPv4 network.
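
As a quick illustration of that IPv6-in-IPv4 (6in4) case, here is a minimal sketch using Scapy (assuming Scapy is installed; the addresses are example/documentation values):

```python
# A minimal sketch of 6in4 tunneling (IPv6 encapsulated in IPv4) using Scapy.
from scapy.all import IP, IPv6, ICMPv6EchoRequest

# The original IPv6 frame: IPv6 header plus payload.
inner = IPv6(src="2001:db8::1", dst="2001:db8::2") / ICMPv6EchoRequest()

# Encapsulate: prepend an IPv4 header so the packet can cross an IPv4-only
# network. IP protocol number 41 means "IPv6 encapsulated in IPv4".
outer = IP(src="192.0.2.1", dst="198.51.100.1", proto=41) / inner

outer.show()              # outer IPv4 header carrying the untouched IPv6 packet

# At the far end of the tunnel, decapsulation is just stripping the outer
# IPv4 header to recover the original IPv6 packet.
decapsulated = outer[IPv6]
```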

Q. I think it’d be possible to configure QoS to a point that exceeds the system capacity. Are there any safeguards on avoiding this scenario?

A. Some types of QoS allow over-provisioning and others do not. For example, a QoS that imposes only maximum limits (and no minimum guarantees) on workloads might not prevent the combined workloads from exceeding system capacity. If the QoS allows over-provisioning, then you should use system monitoring and alerts to warn you when system capacity has been exceeded, or when any workload is not getting its minimum guaranteed performance.
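
One simple safeguard, sketched below with hypothetical names and numbers, is an admission check that refuses a new minimum guarantee when the sum of committed minimums would exceed system capacity:

```python
# Admission-check sketch: before accepting a new workload, verify that the
# sum of minimum guarantees still fits within system capacity.
SYSTEM_CAPACITY_IOPS = 100_000

workloads = {            # workload name -> (min guaranteed IOPS, max limit IOPS)
    "oltp-db":   (40_000, 60_000),
    "analytics": (20_000, 80_000),
    "backups":   (10_000, 30_000),
}

def can_admit(new_min_iops):
    """Reject a new minimum guarantee that would over-provision the system."""
    committed = sum(min_iops for min_iops, _ in workloads.values())
    return committed + new_min_iops <= SYSTEM_CAPACITY_IOPS

print(can_admit(25_000))   # True  -> 70,000 + 25,000 fits within 100,000
print(can_admit(40_000))   # False -> would exceed capacity; alert instead
```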

Q. Is there any research being done on using storage analytics along with artificial intelligence (AI) to assist with QoS?  

A. There are a number of storage analytics products, both third party and storage vendor specific that help with QoS. Whether any of these tools may be described as using AI is debatable, since we’re in the early days of using AI to do much in the storage arena. There are many QoS research projects, and no doubt they will eventually make their way into commercially available products if they prove useful.

Q. Are there any methods (measurements) to calculate IOPS/MBps in tier-capable storage? Would it be the wrong metric if we estimate based on a middle tier, for example tier 2 (between tiers 1 and 3)?

A. This question needs refinement, since tiering is sometimes a cache model rather than a data movement model. And knowing the answer may not actually help! Vendors do have tools (normally internal, since they are quite complex) that can help with the planning of tiered storage.

By now, we hope you’re not “too proud” to ask some of these storage networking questions. We’ve produced four other webcasts in this “Everything You Wanted To Know About Storage” series to date. They are all available on-demand. And you can register here for our next one on July 6th, where we’ll bring in experts to discuss:

  • Storage APIs and POSIX
  • Block, File, and Object storage
  • Byte Addressable and Logical Block Addressing
  • Log Structures and Journaling Systems

The Ethernet Storage Forum team and I hope to see you there!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.
