An FAQ to Make Your Storage System Hum

In our most recent “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series – Part Sepia – Getting from Here to There, we discussed terms and concepts that have a profound impact on storage design and performance. If you missed the live event, I encourage you to check it out on-demand. We had many great questions on encapsulation, tunneling, IOPS, latency, jitter and quality of service (QoS). As promised, our experts have gotten together to answer them all.

Q. Is there a way to measure jitter?

A. Jitter can be measured directly as a statistical function of the latency, typically as the Variance or Standard Deviation of the latency. For example, a storage device might show an average latency of 5ms with a standard deviation of 1.5ms. This means roughly 95% of the transactions have a latency between 2ms and 8ms (average latency plus/minus two standard deviations). However, many storage customers measure jitter indirectly by reporting the 99.9%, 99.99%, or 99.999% latency. For example, if my storage system has a 99.99% latency of 8ms, it means 99.99% of transactions have latency <=8ms and 1 in 10,000 transactions have latency >8ms. Percentile latency is an indirect measure of jitter but is often easier to calculate and understand than the actual jitter.
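
To make the arithmetic concrete, here is a minimal Python sketch (ours, not from the webcast) that computes the direct jitter measures and an indirect percentile latency from a list of per-transaction latencies; the sample values are invented:

```python
import statistics

# Hypothetical per-transaction latencies in milliseconds (illustrative values)
latencies_ms = [4.2, 5.1, 6.3, 4.8, 5.0, 7.9, 5.5, 4.9, 5.2, 6.1]

mean = statistics.mean(latencies_ms)          # average latency
stdev = statistics.stdev(latencies_ms)        # jitter as standard deviation
variance = statistics.variance(latencies_ms)  # jitter as variance

# Indirect jitter: percentile latency (nearest-rank approximation)
p99 = sorted(latencies_ms)[int(0.99 * (len(latencies_ms) - 1))]

print(f"avg={mean:.2f}ms stdev={stdev:.2f}ms var={variance:.2f} p99={p99:.2f}ms")
```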

Q. Can jitter be easily characterized for storage, media, and networks? How, and what tools are available for doing this?

A. Jitter is usually easy to measure on a network using standard network monitoring and reporting tools. It may or may not be easy to measure on storage systems or storage media, depending on the tools available (either built into the storage OS or using an external management or monitoring tool). If you can record the latency of each transaction or packet, then it’s easy to calculate and show the jitter using standard statistical measures such as Variance or Standard Deviation of the latency. What most customers do is just measure the 99.9%, 99.99%, or 99.999% latency. This is an indirect measure of jitter but is often much easier to report and understand than the actual jitter.

Q. Generally IOPS numbers are published for a particular block size, like an 8K read or write size, but in reality I/O requests could be of mixed sizes. What is your perspective on this?

A. Most IOPS benchmarks test only one I/O size at a time. Most individual real workloads (for example databases) also use only one I/O size.  It is true that a storage controller or HDD/SSD might need to support multiple workloads simultaneously, each with a different I/O size.  While it is possible to run benchmarks with a mix of different I/O sizes, it’s rarely done because then there are too many workload combinations to test and publish. Some storage devices do not perform well if they must handle both small random and large sequential workloads simultaneously, so a smart storage controller might assign different workload types to different disk groups.

Q. One often misconfigured parameter is queue depth. Can you talk about how this relates to IOPS, latency and jitter?

A. Queue depth indicates how many tasks or I/Os can be lined up for a particular resource, such as a storage controller, network interface, or CPU. Having a higher queue depth ensures the resource stays highly utilized because it always has a new task to do as soon as it finishes its current task(s). This can result in higher IOPS because the CPU is less likely to sit idle waiting for new tasks to be put into its queue. But it could also increase latency because longer queues mean each task spends more time waiting in a queue. It’s easy to misconfigure queue depth because it needs to be deep enough to keep the resource (CPU/controller/interface) busy but not so deep that each transaction spends a long time in the queue.
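
The tradeoff the answer describes follows Little’s Law, a general queueing-theory identity: average queue depth = IOPS × average latency. A quick back-of-the-envelope check in Python (with made-up numbers):

```python
# Little's Law: average outstanding I/Os (queue depth) = IOPS x average latency.
# Illustrative numbers, not measurements from any specific device.
iops = 100_000       # I/Os per second the device can sustain
latency_s = 0.0005   # 0.5 ms average latency per I/O

queue_depth_needed = iops * latency_s
print(queue_depth_needed)  # 50 outstanding I/Os keep the device fully busy
```

Read the other way around, pushing queue depth well beyond this point cannot raise IOPS any further once the device is saturated; it only inflates the time each task waits in the queue.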

Q. Can you please repeat all your examples of tunneling? GRE, MPLS, what others? How can it be IPv4 via IPv6?

A. VXLAN, LISP, GRE, MPLS, IPSEC. Any time you encapsulate one protocol, send it over another, and decapsulate at the other end to recover the original frame, that process is tunneling. In the IPv6-over-IPv4 case we showed, you take an original IPv6 frame, complete with its IPv6 source and destination addresses, and to send it over an IPv4-enabled network you encapsulate the IPv6 frame with an IPv4 header, “tunneling” IPv6 over the IPv4 network.
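
As a rough sketch of the encapsulation step (illustrative only, not a working tunnel endpoint), the scapy packet library can stack an IPv6 packet inside an IPv4 header; IP protocol number 41 is the standard value for IPv6-in-IPv4 tunneling, and the addresses below are documentation examples:

```python
from scapy.all import IP, IPv6, ICMPv6EchoRequest

# Original IPv6 packet (the inner frame); addresses are documentation examples
inner = IPv6(src="2001:db8::1", dst="2001:db8::2") / ICMPv6EchoRequest()

# Encapsulate: prepend an IPv4 header. IP protocol 41 marks the payload as IPv6.
outer = IP(src="192.0.2.1", dst="198.51.100.1", proto=41) / inner

outer.show()  # the far end strips the IPv4 header and forwards the inner IPv6 frame
```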

Q. I think it’d be possible to configure QoS to a point that exceeds the system capacity. Are there any safeguards on avoiding this scenario?

A. Some types of QoS allow over-provisioning and others do not. For example, a QoS that imposes only maximum limits (and no minimum guarantees) on workloads might not prevent the combined demand of many workloads from exceeding system capacity. If the QoS allows over-provisioning, then you should use system monitoring and alerts to warn you when system capacity has been exceeded, or when any workloads are not getting their minimum guaranteed performance.
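
As a trivial sketch of such a safeguard, an admission check can warn when a new minimum guarantee would push total commitments past capacity; the function name and numbers here are hypothetical:

```python
# Hypothetical admission check for QoS minimum guarantees, in IOPS
SYSTEM_CAPACITY_IOPS = 500_000

def can_admit(existing_guarantees, new_guarantee):
    """Warn when total guaranteed IOPS would exceed system capacity."""
    total = sum(existing_guarantees) + new_guarantee
    if total > SYSTEM_CAPACITY_IOPS:
        print(f"WARNING: {total} IOPS guaranteed vs {SYSTEM_CAPACITY_IOPS} capacity")
        return False
    return True

print(can_admit([200_000, 150_000], 250_000))  # over-provisioned -> False
```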

Q. Is there any research being done on using storage analytics along with artificial intelligence (AI) to assist with QoS?  

A. There are a number of storage analytics products, both third party and storage vendor specific that help with QoS. Whether any of these tools may be described as using AI is debatable, since we’re in the early days of using AI to do much in the storage arena. There are many QoS research projects, and no doubt they will eventually make their way into commercially available products if they prove useful.

Q. Are there any methods (measurements) to calculate IOPS/MBps in tier capable storage? Would it be wrong metric if we estimate based on medium level, example tier 2 (between 1 and 3)?

A. This question needs refinement, since tiering is sometimes a cache model rather than a data movement model. And knowing the answer may not actually help! Vendors do have tools (normally internal, since they are quite complex) that can help with the planning of tiered storage.

By now, we hope you’re not “too proud” to ask some of these storage networking questions. We’ve produced four other webcasts in this “Everything You Wanted To Know About Storage” series to date. They are all available on-demand. And you can register here for our next one on July 6th where we’ll bring in experts to discuss:

  • Storage APIs and POSIX
  • Block, File, and Object storage
  • Byte Addressable and Logical Block Addressing
  • Log Structures and Journaling Systems

The Ethernet Storage Forum team and I hope to see you there!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

What if Programming and Networking Had a Storage Baby? Say What?

The colorful and popular “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series marches on! In this 6th installment, Part Vermillion – What if Programming and Networking Had a Storage Baby, we look into some of the nitty-gritty details of storage that are often assumed.

When looking at data from the lens of an application, host, or operating system, it’s easy to forget that there are several layers of abstraction underneath each before the actual placement of data occurs. In this webcast we are going to scratch beyond the first layer to understand some of the basic taxonomies of these layers.

In this webcast we will show you more about the following:

  • Storage APIs and POSIX
  • Block, File, and Object storage
  • Byte Addressable and Logical Block Addressing
  • Log Structures and Journaling Systems

It’s an ambitious project, but these terms and concepts are at the heart of where compute, networking and storage intersect. Having a good grasp of these concepts ties in with which type of storage networking to use, and how data is actually stored behind the scenes.

Register today to join us on July 6th for this session. You can ask all the questions that, until now, you’ve been too proud to ask and we promise not to show you any baby pictures!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

SMB3 – These Questions Rock!

Earlier this month, the SNIA Ethernet Storage Forum hosted a live webcast on Server Message Block (SMB), “Rockin’ and Rollin’ with SMB3.” Presenting was Ned Pyle, Microsoft SMB Program Manager. If you missed the live event, I encourage you to watch it on-demand. We had a lot of questions from the big audience this event drew, so as promised, here are answers to them all.

Q. Other than that audit setup, is there a way to determine, via the OS, which SMB version is in use?

A. No. Network captures alone will tell you, but Windows doesn’t track this explicitly, other than for SMB1, where we added auditing specifically for the task of identifying removal options.

Q. SMB 3.1.1 over Ethernet… can you discuss/compare with SMB 3.1.1 over Infiniband?

A. If the question is ‘what’s better, Infiniband or Ethernet?’, my answer is always: it depends. I really don’t want to get into a competitive conversation under the auspices of SNIA. I simply recommend looking at the vendor stories and making an informed decision. Overall, Ethernet-based RDMA configurations like RoCE and iWARP are generally less expensive than Infiniband ones. They all have tremendous performance. They all have their various ups and downs.

Q. Do you have statistics regarding SMB-Direct adoption?

A. It’s tricky, as our telemetry for Server usage is quite inaccurate due to firewall rules preventing servers from reaching the Internet. I can say indirectly that we know of thousands of customer deployments.

Q. What’s the name of the IO application?

A. DiskSpd

Q. I don’t believe your I/O data tests, wouldn’t you need to trunk 17 10 Gigabit Network Cards to achieve 168 gigabit I/O capability?

A. This was a misunderstanding: you thought I said 10Gb, but it was 100Gb. We used 100Gb RDMA NICs in this demo with RoCEv2. The bottleneck was the storage at that point; the network had plenty of bandwidth left over.

Q. These are great, but how many of these new features will end up locking out FOSS/GPL implementations of SMB such as SAMBA?

A. Absolutely not! We work with the Samba team and Linux to ensure that SMB can be broadly deployed with all of its capabilities inside open source software.

Q. NetApp supports CA shares (which uses transparent failover) in two use cases: SQL over SMB and Hyper-V over SMB3.

A. This sounds like someone from NetApp stating a fact, so I will simply say “good!” 🙂

Q. Can you please post links to the tools mentioned in this presentation, and the I/O tests? Is there a comparison using Iometer?

A. Here you go:

  • https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223
  • https://github.com/Microsoft/diskspd
  • https://github.com/Microsoft/diskspd/tree/master/Frameworks/VMFleet

Q. You are forced to use SMB1 because of the Windows 2003 issue?

A. Windows Server 2003 and XP (and older, like Win2000) all use SMB1. If they are still around, you will need to leave SMB1 enabled on any machines talking to them.

Q. When will Microsoft officially drop support for SMB1?

A. Overall for the protocol, there is no timeline. It is deprecated, however, so no further work will be done on SMB1 other than critical security patches. SMB1 will start being removed *by default* in a coming release of Windows Server and Windows 10 client. This doesn’t mean totally removed forever, but instead “missing by default,” where you must directly opt in to adding it back. It will be done on a per-SKU basis, so that enterprises are likely to see it first, since they are better equipped to understand it and less likely to need SMB1.

Q. Is there a way to change block size in SMB3 ?

A. In SMB2_READ processing section 3.3.5.12 (https://msdn.microsoft.com/en-us/library/cc246729.aspx):

The server SHOULD<296> fail the request with STATUS_INVALID_PARAMETER if the Length field is greater than Connection.MaxReadSize.

If Connection.SupportsMultiCredit is TRUE the server MUST validate CreditCharge based on Length, as specified in section 3.3.5.2.5. If the validation fails, it MUST fail the read request with STATUS_INVALID_PARAMETER.

There is similar text for SMB2_WRITE in 3.3.5.13 (https://msdn.microsoft.com/en-us/library/cc246730.aspx).

Then, off to SMB2_NEGOTIATE in 3.3.5.4 (https://msdn.microsoft.com/en-us/library/cc246768.aspx) to discover:

  • MaxReadSize is set to the maximum size, in bytes, of the Length in an SMB2 READ Request (section 2.2.19) that the server will accept on the transport that established this connection. This value SHOULD<231> be greater than or equal to 65536. Connection.MaxReadSize MUST be set to MaxReadSize.
  • MaxWriteSize is set to the maximum size, in bytes, of the Length in an SMB2 WRITE Request (section 2.2.21) that the server will accept on the transport that established this connection. This value SHOULD<232> be greater than or equal to 65536. Connection.MaxWriteSize MUST be set to MaxWriteSize.
<231> Section 3.3.5.4: If the underlying transport is NETBIOS over TCP, Windows servers set MaxReadSize to 65536. Otherwise, MaxReadSize is set based on the following table.

Windows version \ Connection.Dialect | 2.0.2 | All other SMB2 dialects
Windows Vista SP1 \ Windows Server 2008 | 65536 | N/A
Windows 7 \ Windows Server 2008 R2 | 65536 | 1048576
Windows 8 without [MSKB-2934016] \ Windows Server 2012 without [MSKB-2934016] | 65536 | 1048576
All other SMB2 servers | 65536 | 8388608

<232> Section 3.3.5.4: If the underlying transport is NETBIOS over TCP, Windows servers set MaxWriteSize to 65536. Otherwise, MaxWriteSize is set based on the following table.

Windows version \ Connection.Dialect | 2.0.2 | All other SMB2 dialects
Windows Vista SP1 \ Windows Server 2008 | 65536 | N/A
Windows 7 \ Windows Server 2008 R2 | 65536 | 1048576
Windows 8 without [MSKB-2934016] \ Windows Server 2012 without [MSKB-2934016] | 65536 | 1048576
All other SMB2 servers | 65536 | 8388608
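
Restated as code, the length check the spec mandates is straightforward. The following is just a Python paraphrase of the quoted MS-SMB2 rule, not Windows’ actual implementation; the NTSTATUS value is the standard one:

```python
STATUS_SUCCESS = 0x00000000
STATUS_INVALID_PARAMETER = 0xC000000D  # standard NTSTATUS code

def validate_smb2_read(length, connection_max_read_size):
    """Paraphrase of MS-SMB2 3.3.5.12: fail a READ whose Length
    exceeds the MaxReadSize negotiated for this connection."""
    if length > connection_max_read_size:
        return STATUS_INVALID_PARAMETER
    return STATUS_SUCCESS

# A client asking for 8MB from a server that negotiated 1MB gets rejected.
print(hex(validate_smb2_read(8 * 1024 * 1024, 1048576)))  # 0xc000000d
```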

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Buffers, Queues and Caches Explained

Fine-tuning buffers, queues and caches can make your storage system hum. And that’s exactly what we discussed in our recent SNIA Ethernet Storage Forum webcast, “Everything You Wanted to Know About Storage But Were Too Proud To Ask – Part Teal: The Buffering Pod.” If you missed it, it’s now available on-demand.

In this blog, you’ll find detailed answers from our panel of experts to all the great questions we received during the live event. I also encourage you to check out the other on-demand webcasts in this “Too Proud To Ask” series here and stay informed on upcoming events in this series by following us on Twitter @SNIAESF.

Q. Question on cache – What would be the right size of cache at each point (clients / Front-end connect / Storage controller / Back end connect / Physical storage).

A. Great question! The main consideration for cache sizing at any point is the workload. If the workload is conducive to cache benefits, then the more cache the merrier! However, when the workload is not conducive to cache, adding more cache capacity won’t be beneficial. For example, if the workload is 100% sequential reads of small 4K IOs, having the data pre-loaded into cache is going to be extremely helpful, and increasing the size of such a cache at the end-point will be good. If the workload is random, and the IO size is changing, pre-fetching data into cache may not be a good idea. Similarly, with write cache, the benefit is realized two-fold: first, when the write is stored in cache and ack’ed back to the host (such a write is typically called “dirty,” because it hasn’t been flushed back to the disk) and second, when the dirty write is overwritten by the host before it is flushed. Any other combination of workloads and IO will only get partial benefit from the cache. Sizing cache is a very difficult exercise and there are no universal answers. Every implementation has its own pluses and minuses.
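
To make the two-fold write-cache benefit concrete, here is a toy Python write-back cache; the class and its flush policy are invented for illustration:

```python
class WriteBackCache:
    """Toy write-back cache: acks writes immediately, coalesces overwrites."""
    def __init__(self):
        self.dirty = {}        # block address -> data not yet flushed
        self.disk_writes = 0   # how many writes actually reached the disk

    def write(self, block, data):
        # Benefit 1: the write is acknowledged from cache, no disk I/O yet.
        # Benefit 2: overwriting a dirty block replaces it in place, so the
        # earlier version is never written to disk at all.
        self.dirty[block] = data

    def flush(self):
        self.disk_writes += len(self.dirty)
        self.dirty.clear()

cache = WriteBackCache()
cache.write(7, b"v1")
cache.write(7, b"v2")      # overwrite before flush: v1 never hits the disk
cache.flush()
print(cache.disk_writes)   # 1 disk write for 2 host writes
```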

Q. Isn’t a higher queue depth increasing latency as well, so applications would run slower as they are waiting longer for IO to complete?

A. The answer to this is very dependent on the environment. In general, having more outstanding operations will increase the load on the interconnects and storage media, which will result in the per-I/O latency increasing. The alternative is having a small queue depth, which may produce consistently lower per-I/O latency at the expense of less throughput and fewer IOPS. There are numerous techniques for dealing with mixed storage traffic, low-latency and high throughput, such as multi-queues, out-of-order completions, immediate and delayed data transfers in-line, ready to transfer, and policies. The NVM media latency roadmap is also helping with these types of latency vs. throughput decisions by enabling devices that achieve full throughput at very low queue depths.

Q. Does SCSI protocol have a max queue depth of 32?

A. No, the SCSI Architecture Model allows for up to 64 bits for the command identifier field and each of the SCSI transports (iSCSI, SAS, …) defines a maximum within that range. There may be implementation-dependent SCSI endpoints that define smaller ranges.

Q. How would a distributed software defined storage technology deal with queue depth and how can this be advantageous or not advantageous?

A. Interesting question. Distributed software defined storage is by definition made up of multiple autonomous layers of software components orchestrated to provide stable storage. These types of systems will have many outstanding operations (queue depth) at multiple-stages and layers. It’s also not uncommon to see SDS file systems front-ended with block-based protocols, such as iSCSI, which enable the initiators to build up large queue depths of operations.

Q. Are queue depth and buffer the same?

A. No. Queues refer to command and response queues; buffers refer to in-flight data buffers. Command and response queues often contain pointers to these buffers embedded in the read or write commands.
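
A minimal sketch of that distinction, loosely modeled on how a submission queue entry references a data buffer (the field names are invented):

```python
from collections import deque

data_buffer = bytearray(4096)     # the in-flight data buffer (the "buffer")

command_queue = deque()           # the "queue" holds commands, not the data
command_queue.append({
    "opcode": "WRITE",
    "lba": 2048,
    "length": 4096,
    "buffer": data_buffer,        # the command carries a reference to the buffer
})

cmd = command_queue.popleft()     # the device consumes the command...
payload = cmd["buffer"]           # ...then transfers data from the referenced buffer
```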

Q. Are caches and buffers made of the same silicon that makes up SSD disks? Which one is faster?

A. As a general idea, yes: SSDs, RAM, caches, and buffers are all made from silicon. If we dig a little deeper, device caches and buffers are typically made of high-speed static random access memory (SRAM), which is faster than the slower and cheaper dynamic RAM (DRAM) used for main memory. Modern SSDs utilize an even slower memory, commonly known as Flash memory, and we differentiate that type of storage by its structure: Single-Level Cell (SLC), Multi-Level Cell (MLC), etc. There are also some SSDs that are made out of DRAM, and then there are some newer technologies, like NVDIMM, 3D XPoint, etc. So, while the underlying physical material is still the same silicon, it’s the architecture that makes all the difference.

Q. In PFC.. If there are pending items in P1… can P2 or P3 etc. go ahead?

A. Yes. Priority Flow Control (PFC, also called Per Priority Pause, though rarely) is designed specifically to only pause traffic on one priority, allowing the remaining priority Classes of Service to work according to their configurations. So, for example, if PFC were to pause Priority Queue 1, and Priority Queue 3 also had a “no-drop” configuration but was not having any issues, PFC on Queue 1 would be triggered but PFC on Queue 3 would not. In reality, having more than one no-drop lane on a link is very, very rare, but it does illustrate that PFC operates on a per-priority basis, not on the whole link.
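
A toy model of that behavior: each priority gets its own queue, and pausing one priority leaves the others free to transmit (all names invented):

```python
from collections import deque

queues = {p: deque() for p in range(8)}   # one queue per 802.1p priority
paused = {1}                              # PFC pause received for priority 1 only

for p in (1, 3):
    queues[p].append(f"frame-on-priority-{p}")

def transmit():
    for p, q in queues.items():
        while q and p not in paused:      # paused priorities hold their frames
            print("sent:", q.popleft())

transmit()   # only the priority-3 frame is sent; priority 1 waits for un-pause
```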

Q. Do all Ethernet based NVMe-oF (NVMe over Fabrics) implementations require some form of Data Center Bridging (DCB)? Or, are there versions of Ethernet based NVMe-oF (RoCE & iWARP) that run over standard Ethernet without needing DCB?

A. Yes, both iWARP and RoCE can be run without DCB. To maintain peak performance either DCB or other flow control mechanisms like ECN are recommended.

Q. Do server devices automatically honor the pause frame or does it require configuration?

A. I am assuming “server devices” refers to Ethernet ports on a server. It depends on the default settings of the NIC or LOM or those loaded by the driver during initialization. Generally speaking NIC devices that support PFC also support DCBX (Data Center Bridging Exchange). DCBX is a protocol that allows an end device, like a NIC, to get its proper configuration settings from the switch. That means that in an environment where PFC needs to be assigned to a specific Class of Service (CoS), the switch will send the NIC the proper settings during the setup configuration.

Q. Is it mandatory for all devices in network, host and storage to have same speed ports?

A. No.

Q. What are the theoretical devices for modeling and analyzing cache, buffer or queue behaviors?

A. Computers with software  🙂

Q. What if I have really large sized writes and they fill up the cache quickly? Is there a way to bypass the large sized writes?

A. The time of the presentation limited the amount of material we were able to share. One of the subjects we didn’t talk about was the cache software algorithm. Most storage vendors manage the cache by not letting extremely large IOs be cached. Back in the spinning storage era, an IO of 2MB would typically be considered too large to be cached, and would be sent directly to disk.
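
That policy amounts to a size check before admission to the cache; this sketch uses the 2MB threshold from the example above, with hypothetical helper functions:

```python
LARGE_IO_BYTES = 2 * 1024 * 1024   # example threshold from the answer above

def write_to_cache(data):
    print(f"cached {len(data)} bytes")

def write_direct_to_disk(data):
    print(f"bypassed cache, wrote {len(data)} bytes to disk")

def handle_write(data):
    if len(data) >= LARGE_IO_BYTES:
        write_direct_to_disk(data)   # too large: don't pollute the cache
    else:
        write_to_cache(data)         # small writes get the write-back benefit

handle_write(bytearray(4096))              # cached
handle_write(bytearray(LARGE_IO_BYTES))    # bypassed
```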

Q. What will be the use of cache in all flash storage please? As flash is the highest performance disk.

A. See the answer to the question above, “Are caches and buffers made of the same silicon that makes up SSD disks? Which one is faster?” Hardware caches and buffers are typically made out of the fastest memory, then comes RAM, and last are the SSDs, aka flash disks. Therefore, storing data on a faster layer is still beneficial to performance.

Q. Does the LUN Queue Depth include the Queue Depth discussed here?

A. Yes, SCSI LUN queue depth enables the initiator(s) to have multiple outstanding I/O operations in flight.

Q. Will you use a queuing algorithm to manage IO queue? If your answer is yes, which algorithm will you use?

A. There are several storage protocols that define mechanisms for a target to dynamically adjust the queue depth available to the initiator through various forms of credit exchanges. These types of mechanisms enable the target to implement load balancing across multiple initiators.
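
A bare-bones sketch of such a credit exchange: the target hands out credits, an initiator may only have that many commands in flight, and the target can shrink the grant to rebalance across initiators. All class and method names here are hypothetical:

```python
class Target:
    def __init__(self, total_credits):
        self.total = total_credits

    def grant(self, num_initiators):
        # Naive load balancing: split available credits evenly
        return self.total // num_initiators

class Initiator:
    def __init__(self, credits):
        self.credits = credits    # max outstanding commands allowed
        self.in_flight = 0

    def submit(self):
        if self.in_flight < self.credits:
            self.in_flight += 1
            return True
        return False              # must wait for a completion (credit returned)

target = Target(total_credits=64)
init = Initiator(target.grant(num_initiators=4))   # 16 credits per initiator
print([init.submit() for _ in range(17)][-1])      # 17th submission refused -> False
```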

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Storage Expert Takes on Hyperconverged Questions

Last month, we were fortunate enough to have Greg Schulz, analyst and founder of Server Storage IO, as a guest speaker at our SNIA Ethernet Storage Forum webcast, “What Does Hyperconverged Mean to Storage.” If you missed it, it’s now available on-demand. Greg fielded many great questions during the live event, but we didn’t have time to get to them all. So here they are:

Q. What is the difference between Converged Infrastructure (CI) and Hyperconverged Infrastructure (HCI)?

A. HCI is aggregated. You scale compute and storage in lock step. Converged is disaggregated. You can scale the compute independently of the storage. There are some software solutions that can support both hyper-converged (aggregated) and converged (disaggregated) deployments.

Q. What is your definition of “Little Data”?

A. Little Data is anything that’s not Big Data. It encompasses traditional databases, traditional structured, semi-structured and even some unstructured data.

Q. With convergence, what is the impact on the IT organization?

A. There is an opportunity for organizations to converge how they manage data infrastructure resources and services delivery. In other words, the technology can be leveraged to help an organization itself converge. Another impact is how converged solutions are protected and backed up, and how BC/BR/DR and related management are done. Traditionally there are separate IT teams for compute, storage, and networking, especially in a large organization. New technology solutions may allow an organization to converge those teams.

Q. Is there a hybrid strategy? Where a complete information system is composed of HCI/CI building blocks? If yes, what management tools would span these components?

A. Sure, why not? Certainly you can converge your environment into a particular CI/HCI solution or approach; likewise, different CI/HCI solutions can co-exist along with other solutions in a given environment in hybrid ways. Have a hybrid strategy that looks at how technologies and solutions adapt to your needs and environment. Focus on how it’s going to work for you, vs. you having to work for them.

Q. What does FUZE stand for?

A. FUZE is not an acronym. It is the actual fuzing, as in melding and bringing things together – literally fuzing things together.

Q. Do HCI vendors re-balance (compute, I/O, storage) automatically as more nodes are added?

A. Solutions vary in how they rebalance workloads. Some are dynamic while others rebalance on intervals; it varies how, when and what they rebalance. So, as you add capacity and make changes, you need to make sure resources are properly allocated to address performance.

Q. Can’t you offload those CPU cycles caused by I/O to another CPU?

A. That’s an interesting question. Yes, move the application to another CPU. There is software that will leverage the resources on another CPU. Most HCI and CI solutions are running on a stack that requires hardware somewhere.

Q. This discussion has touched on compute and storage scaling. What about network between compute in the CI/HCI infrastructure and external to other compute, databases, or end-users?

A. Both CI and HCI need to connect to other resources, but in most cases the highest levels of network traffic are inside the CI or HCI stack because the compute and storage resources are contained within. Their connections to outside clients or servers for data exchange, application integration, or client access are important, but usually not very demanding on network bandwidth. (External connections for storage remote replication or backup could be bandwidth-intensive.)

Q. How can the current Enterprise Storage Products blend with either CI or HCI? Enterprise Storage is basically centralized storage architecture however the HCI is built mostly on ‘distributed storage architecture’. So how can current Enterprise Storage show use cases to the customer to sell their Enterprise Storage either as part of the HCI solution or exist along with HCI?

A. Generally enterprise storage products can be included in CI but are not blended with HCI. For example Dell EMC, Cisco (with NetApp and other storage vendors), IBM and Oracle offer CI solutions that include enterprise storage arrays in the rack. Most HCI platforms do not interoperate with enterprise storage arrays because the HCI platforms include their own storage. They can co-exist with enterprise storage arrays and that’s how most customers deploy them—some workloads run on the HCI infrastructure while others continue to use enterprise storage arrays.

Q. One of the HCI selling points is simplicity and cost reductions from a la carte. It seems that from what is being presented, that may not be the case. Can you elaborate on where HCI may become more complex, costly?

A. It comes down to value. You can buy all the components yourself, glue them all together, and perhaps come up with a lower total cost, but what is the value of your time? What is the cost of staff time to evaluate, test, deploy and maintain? The total value must be considered. It’s possible that HCI will be more costly than a disaggregated deployment that separates compute and storage, but this depends heavily on the workload and the specific vendor’s product implementation.

Q. Current HCI “full stack” solutions claim compute and storage convergence, but what about the network? Given the east/west traffic introduced by HCI solutions, what networking solutions should customers be looking at?

A. Most of the common HCI solutions are packaged with server, storage, and compute resources, and most have networking included as well—typically the network adapters and sometimes also the switches. Some even have a backend software defined networking (SDN) capability as part of their stack.

Q. Related to HCI answer, what about vendors who allow for storage growth and/or server (compute) and storage additions. This allows for aggregated and dis-aggregated…yes?

A. Most HCI vendors require compute and storage to be added simultaneously, though many support different node types with different ratios of compute and storage. This allows customers to change the ratio of compute and storage by adding different node types. And yes, some HCI vendors also support both a hyper-converged and a disaggregated model, with the disaggregated model allowing compute and storage to be added separately.

Q. What are the tools available to make HCI work in a hybrid load environment, with different workload requirements, e.g.: VDI and Databases?

A. There are tools for moving and migrating applications, workloads, systems and VMs into CI/HCI environments, likewise for tuning, optimizing, gaining insight, analytics and reporting. Most of the CI/HCI solutions have tools built into them for optimizing PACE (Performance, Availability, Capacity, Economics) attributes along with server compute, memory, storage, and I/O resources. Some CI/HCI solutions are optimized for VDI/workspaces, while others are able to support general workloads including databases, and some even support HPC/SC or other specialized workloads.

Q. Does network performance affect HCI or CI performance?

A. Sometimes. Most hybrid HCI nodes are happy with the bandwidth of 10GbE, but if the nodes are all-flash or have many disks, then a faster speed may be required to avoid a network bottleneck. Network latency could affect HCI or CI performance in some cases, especially with all-flash storage. Of course a reliable network helps ensure reliable CI/HCI operations.

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Q&A on All Things iSCSI

In the recent SNIA Ethernet Storage Forum iSCSI pod webcast from our “Everything You Wanted To Know About Storage But Were Too Proud To Ask” series, we discussed all things iSCSI. If you missed the live event, it’s now available on-demand. As promised, we’ve compiled all the webcast questions with answers from our panel of experts. If you have additional questions, please feel free to ask them in the comment field of this blog. I also encourage you to check out the other on-demand webcasts in this “Too Proud To Ask” series here and stay informed on upcoming events in this series by following us on Twitter @SNIAESF.

Q. What does SPDK stand for?

A. SPDK stands for Storage Performance Development Kit. It is comprised of tools and libraries for developers to write high performance and scalable storage applications in user-mode. For details, see www.spdk.io.

Q. Can you elaborate on SPDK use? A quick search seems to indicate it is a “half-baked” solution, and available only on Linux systems.

A. SPDK isn’t a solution, per se – it’s a development kit, intended to provide common building blocks (NVMe drivers, NVMe over Fabrics targets & host/initiator, etc.) for solutions developers who care about latency, license (BSD) and efficiency.

Q. Is iSCSI ever going to be able to work with object storage?

A. iSCSI is a block storage protocol, while object storage is normally accessed using a RESTful API such as Amazon’s S3 API or the Swift API. For this reason, iSCSI is unlikely to be used for direct access to object storage. However, an object storage system controller could use iSCSI (or other block protocols) to access remote storage enclosures or for data replication. There also could be storage systems that support both iSCSI/block and object storage access simultaneously.

Q. Does a high-density virtualized workload represent something better served with a full or partial offload solution?

A. The type of workload that is better served with full or partial offload will really depend more on what that workload is doing. If you are processing a lot of very large data segments, LSO or LRO might be very helpful. If you have a lot of smaller data sets, you might be able to benefit from checksum or chimney offload. Unfortunately, the best way to see is to test things out (but not on production, obviously).

Q. How does one determine if TOE NIC cards are worth the cost?

A. This is a really tough question to answer without context. The best way to look at it is to do some digging into what your CPU and memory utilization and I/O patterns look like on your servers, and try to map that to TCP connections. If you have a lot of iSCSI I/O and a large number of TCP connections on a server, that might be a candidate for TOE. That’s just a technical response, but then comes the really tricky part – the quantitative measurement of how many dollars it is worth… that’s way more challenging. For example, if I have a regular 10G NIC that costs $200 and a TOE card that costs 3x that and only saves 5% CPU, then it may not have enough value. On the other hand, if that 5% CPU can be used by your application to transact enough business to pay for the extra $400, then it’s worth it. Sorry to say that I have seen no scientific way to enumerate that value outside of specific hands-on testing of the solution with and without TOE NICs.

Q. What is the difference between a stateless and stateful TCP offload? Are RSS and TSS (receive-side and transmission-side scaling) offloads a type of TCP offload or are they operating at a lower level like Layer 2?

A. Stateless offloading is basically any offload function that can be done without the NIC needing to maintain a connection state table; checksum offloads are an example. Stateful offloading is any offloading that requires the NIC to maintain a full connection state table. Receive Side Scaling distributes inbound connections across the CPUs of a multi-CPU server, so that different connections land on different CPUs. There are also other performance enhancements such as RPS, RFS, XPS and others. These are more about how to get data from the network to the CPU and are not specifically TCP functions; they have to do with distributing processing rather than with the TCP stack itself.
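
Conceptually, RSS maps a hash of the connection tuple to a CPU/queue index. Real NICs use a Toeplitz hash with a programmable key; Python’s built-in hash() below is just a stand-in to show the idea:

```python
NUM_CPUS = 8

def rss_queue(src_ip, src_port, dst_ip, dst_port):
    # Real RSS uses a Toeplitz hash over these fields; hash() is a stand-in
    return hash((src_ip, src_port, dst_ip, dst_port)) % NUM_CPUS

# Every packet of a given TCP connection lands on the same CPU queue,
# while different connections spread across the CPUs.
print(rss_queue("192.0.2.10", 49152, "198.51.100.5", 3260))  # iSCSI port 3260
```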

Q. Is using the host CPU to run iSCSI really a downside?

A. There may be applications where this is a problem, but you’re generally right; it’s not too much of an issue today. But there are iSCSI-based storage solutions coming up where a consistent hundreds of nanoseconds to low microseconds of latency from the device is possible – and that’s very fast indeed. So an iSCSI stack in these circumstances needs to ensure that its consumption of CPU doesn’t increase the latency (even very efficient stacks can add hundreds of microseconds to milliseconds of latency), or cause contention for the CPU (busy CPUs mean you may queue for compute resources).

Q. Is the term “onload” for iSCSI new – never heard this before?

A. It was intended as a quick shorthand word to stand in contrast to iSCSI offload. It will probably not catch on!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.


Would You Like Some Rosé with Your iSCSI?

Would you like some rosé with your iSCSI? I’m guessing that no one has ever asked you that before. But we at the SNIA Ethernet Storage Forum like to get pretty colorful in our “Everything You Wanted To Know about Storage But Were Too Proud To Ask” webcast series as we group common storage terms together by color rather than by number.

In our next live webcast, Part Rosé – The iSCSI Pod, we will focus entirely on iSCSI, one of the most used technologies in data centers today. With the increasing speeds for Ethernet, the technology is more and more appealing because of its relatively low cost to implement. However, like any other storage technology, there is more here than meets the eye.

We’ve convened a great group of experts from Cisco, Mellanox and NetApp who will start by covering the basic elements to make your life easier if you are considering using iSCSI in your architecture, diving into:

  • iSCSI definition
  • iSCSI offload
  • Host-based iSCSI
  • TCP offload

Like nearly everything else in storage, there is more here than just a protocol. I hope you’ll register today to join us on March 2nd and learn how to make the most of your iSCSI solution. And while we won’t be able to provide the rosé wine, our panel of experts will be on-hand to answer your questions.

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

We’ve Been Thinking…What Does Hyperconverged Mean to Storage?

Here at the SNIA Ethernet Storage Forum (ESF), we’ve been discussing how hyperconverged adoption will impact storage. Converged Infrastructure (CI), Hyperconverged Infrastructure (HCI), along with Cluster or Cloud In a Box (CIB) are popular trend topics that have gained both industry and customer adoption. As part of data infrastructures, CI, HCI, and CIB enable simplified deployment of resources (servers, storage, I/O networking, hypervisor, application software) across different environments.

But what do these approaches mean for the storage environment? What are the key concerns and considerations related specifically to storage? How will the storage be connected to (or included in) the platform? Who will protect and backup the data? And most importantly, how do you know that you’re asking the right questions in order to get to the right answers?

Find out on March 15th in a live SNIA-ESF webcast, “What Does Hyperconverged Mean to Storage.” We’ve invited expert Greg Schulz, founder and analyst of Server StorageIO, to answer the questions we’ve been debating. Join us, as Greg will move beyond the hype (pun intended) to discuss:

  • What are the storage considerations for CI, CIB and HCI
  • Why fast applications and fast servers need fast I/O
  • Networking and server-storage I/O considerations
  • How to avoid aggravation-causing aggregation (bottlenecks)
  • Aggregated vs. disaggregated vs. hybrid converged
  • Planning, comparing, benchmarking and decision-making
  • Data protection, management and east-west I/O traffic
  • Application and server north-south I/O traffic

Register today and please bring your questions. We’ll be on-hand to answer them during this event. We hope to see you there!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Buffers, Queues, and Caches, Oh My!

Buffers and queues are part of every data center architecture, and a critical part of performance – both improving it and hindering it. A well-implemented buffer can mean the difference between a finely run system and a confusing nightmare of troubleshooting. Knowing how buffers and queues work in storage can help make your storage system shine.

However, there is something of a mystique surrounding these different data center components, as many people don’t realize just how they’re used and why. Join our team of carefully-selected experts on February 14th in the next live webcast in our “Too Proud to Ask” series, “Everything You Wanted to Know About Storage But Were Too Proud To Ask – Part Teal: The Buffering Pod” where we’ll demystify this very important aspect of data center storage. You’ll learn:

  • What are buffers, caches, and queues, and why you should care about the differences?
  • What’s the difference between a read cache and a write cache?
  • What does “queue depth” mean?
  • What’s a buffer, a ring buffer, and host memory buffer, and why does it matter?
  • What happens when things go wrong?

These are just some of the topics we’ll be covering, and while it won’t be an exhaustive look at buffers, caches and queues, you can be sure that you’ll get insight into this very important, and yet often overlooked, part of storage design.

Register today and spend Valentine’s Day with our experts who will be on-hand to answer your questions on the spot!

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.

Clearing Up Confusion on Common Storage Networking Terms

Do you ever feel a bit confused about common storage networking terms? You’re not alone. At our recent SNIA Ethernet Storage Forum webcast “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Mauve,” we had experts from Cisco, Mellanox and NetApp explain the differences between:

  • Channel vs. Busses
  • Control Plane vs. Data Plane
  • Fabric vs. Network

If you missed the live webcast, you can watch it on-demand. As promised, we’re also providing answers to the questions we got during the webcast. Between these questions and the presentation itself, we hope it will help you decode these common, but sometimes confusing terms.

And remember, the “Everything You Wanted To Know About Storage But Were Too Proud To Ask” is a webcast series with a “colorfully-named pod” for each topic we tackle. You can register now for our next webcast: Part Teal, The Buffering Pod, on Feb. 14th.

Q. Why do we have Fibre and Fiber?

A. Fiber optics is the term used for the optical technology used by Fibre Channel fabrics. While a common story is that the “Fibre” spelling came about to accommodate the French (FC is, after all, an international standard), in actuality it was a marketing idea to create a more unique name, and in fact it was decided to use the British spelling – “Fibre”.

Q. Will OpenStack change all the rules of the game?

A. Yes. OpenStack is all about centralizing the control plane of many different aspects of infrastructure.

Q. The difference between control and data plane matters only when we discuss software defined storage and software defined networking, not in traditional switching and storage.

A. It matters regardless. You need to understand how much each individual control plane can handle and how many control planes you have from an overall management perspective. In cases where you have too many control planes, SDN and SDS can be a benefit to you.

Q. As I’ve heard that networks use stateless protocols, would FC do the same?

A. Fibre Channel has several different Classes, which can be either stateful or stateless. Most applications of Fibre Channel are Class 3, as it is the preferred class for SCSI traffic. A connection between Fibre Channel endpoints is always stateful (as it involves a login process to the Fibre Channel fabric). The transport protocol is augmented by Fibre Channel exchanges, which are managed on a per-hop basis. Retransmissions are handled by devices when exchanges are incomplete or lost, meaning that each exchange is a stateful transmission, but the protocol itself is considered stateless in modern SCSI-transport Fibre Channel.

iSCSI, as a connection-oriented protocol, creates a nexus between an initiator and a target, and is considered stateful. In addition, SMB, NFSv4, FTP, and TCP are stateful protocols, while NFSv2, NFSv3, HTTP, and IP are stateless protocols.

Q. Where do CIFS/SMB come into the picture?

A. CIFS/SMB is part of a network stack. We need to have a separate talk about network stacks and their layers. In this presentation, we were talking primarily about the physical layer of the networks and fabrics. To overly simplify network stacks, there are multiple layers of protocols that run on top of the physical layer. In the case of FC, those protocols include the control plane protocols (such as FC-SW) and the data plane protocols. In FC, the most common data plane protocol is FCP (used by SCSI, FICON, and FC-NVMe). In the case of Ethernet, those protocols also include the control plane (such as TCP/IP) and data plane protocols. In Ethernet, there are many commonly used data plane protocols for storage (such as iSCSI, NFS, and CIFS/SMB).

Update: If you missed the live event, it’s now available on-demand. You can also download the webcast slides.