Introducing the Networking Storage Forum

At SNIA, we are dedicated to staying on top of storage trends and technologies to fulfill our mission as a globally recognized and trusted authority for storage leadership, standards, and technology expertise. For the last several years, the Ethernet Storage Forum has been working hard to provide high quality educational and informational material related to all kinds of storage.

From our “Everything You Wanted To Know About Storage But Were Too Proud To Ask” series, to the absolutely phenomenal (and required viewing) “Storage Performance Benchmarking” series to the “Great Storage Debates” series, we’ve produced dozens of hours of material.

Technologies have evolved and we’ve come to a point where there’s a need to understand how these systems and architectures work – beyond just the type of wire that is used. Today, there are new systems that are bringing storage to completely new audiences. From scale-up to scale-out, from disaggregated to hyperconverged, RDMA, and NVMe-oF – there is more to storage networking than just your favorite transport.

For example, when we talk about NVMe™ over Fabrics, the protocol is broader than just one way of accomplishing what you need. When we talk about virtualized environments, we need to examine the nature of the relationship between hypervisors and all kinds of networks. When we look at “Storage as a Service,” we need to understand how we can create workable systems from all the tools at our disposal.

Bigger Than Our Britches

As I said, SNIA’s Ethernet Storage Forum has been working to bring these new technologies to the forefront, so that you can see (and understand) the bigger picture. To that end, we realized that we needed to rethink the way that our charter worked, to be even more inclusive of technologies that were relevant to storage and networking.

So…

Introducing the Networking Storage Forum. In this group we’re going to continue producing top-quality, vendor-neutral material related to storage networking solutions. We’ll be talking about:

  • Storage Protocols (iSCSI, FC, FCoE, NFS, SMB, NVMe-oF, etc.)
  • Architectures (Hyperconvergence, Virtualization, Storage as a Service, etc.)
  • Storage Best Practices
  • New and developing technologies

… and more!

Generally speaking, we’ll continue to do the same great work that we’ve been doing, but now our name more accurately reflects the breadth of work that we do.

We’re excited to launch this new chapter of the Forum. If you work for a vendor, are a systems integrator, university or someone who manages storage, we welcome you to join the NSF. We are an active group that honestly has a lot of fun. If you’re one of our loyal followers, we hope you will continue to keep track of what we’re doing. And if you’re new to this Forum, we encourage you to take advantage of the library of webcasts, white papers, and published articles that we have produced here. There’s a wealth of un-biased, educational information there, we don’t think you’ll find anywhere else!

If there’s something that you’d like to hear about – let us know! We are always looking to hear about headaches, concerns, and areas of confusion within the industry where we can shed some light. Stay current with all things NSF:

 

 

Centralized vs. Distributed Storage FAQ

To date, thousands have watched our “Great Storage Debate” webcast series. Our most recent installment of this friendly debate (where no technology actually emerges as a “winner”) was Centralized vs. Distributed Storage. If you missed it, it’s now available on-demand. The live event generated several excellent questions which our expert presenters have thoughtfully answered here:

Q. Which performs faster, centralized or distributed storage?

A. The answer depends on the type of storage, the type of connections to the storage, and whether the compute is distributed or centralized. The stereotype is that centralized storage performs faster if the compute is local, that is if it’s in the same data center as the centralized storage.

Distributed storage is often using different (less expensive) storage media and designed for slower WAN connections, but it doesn’t have to be so. Distributed storage can be built with the fastest storage and connected with the fastest networking, but it rarely is used that way. Also it can outperform centralized storage if the compute is distributed in a similar way to the distributed storage, letting each compute node access the data from a local node of the distributed storage.

Q. What about facilities costs in either environment? Ultimately the data has to physically “land” somewhere and use power/cooling/floor space. There is an economy of scale in centralized data centers, how does that compare with distributed?

A. One big difference is in the cost of power between various data centers. Typically, data centers tend to be the places where businesses have had traditional office space and accommodation for staff. Unfortunately, these are also areas of power scarcity and are consequently expensive to run. Distributed data centers can be in much cheaper locations; there are a number for instance in Iceland where geothermally generated electricity is very cheap, and environmental cooling is effectively free. Plus, the thermal cost per byte can be substantially lower in distributed data centers by efficiently packing drives to near capacity with compressed data. Learn more about data centers in Iceland here.

Another difference is that distributed storage might consume less space if its data protection method (such as erasure coding) is more efficient than the data protection method used by centralized storage (typically RAID or triple replication). While centralized storage can also use erasure coding, compression, and deduplication, it’s sometimes easier to apply these storage efficiency technologies to distributed storage.

Q. What is sharding?

A. Sharding is the process of breaking up, typically a database, into a number of partitions, and then putting these pieces or shards on separate storage devices or systems. The partitioning is normally a horizontal partition; that is, the rows of the database remain complete in a shard and some criteria (often a key range) is used to make each shard. Sharding is often used to improve performance, as the data is spread across multiple devices which can be accessed in parallel.

Sharding should not be confused with erasure coding used for data protection. Although this also breaks data into smaller pieces and spreads it across multiple devices, each part of the data is encoded and can only be understood once a minimum number of the fragments have been read and the data has been reconstituted on some system that has requested it.

Q. What is the preferred or recommended choice of NVMe over Fabrics (NVME-oF) for centralized vs. distributed storage systems for prioritized use-case scenarios such as data integrity, latency, number of retries for read-write/resource utilization?

A. This is a straightforward cost vs. performance question. This kind of solution only makes sense if the compute is very close to the data; so either a centralized SAN, or a (well-defined) distributed system in one location with co-located compute would make sense. Geographically dispersed data centers or compute on remote data adds too much latency, and often bandwidth issues can add to the cost.

Q. Is there a document that has catalogued the impact of latency on the many data types? When designing storage I would start with how much latency an application can withstand.

A. We are not aware of any single document that has done so, but many applications (along with their vendors, integrators, and users) have documented their storage bandwidth and latency needs. Other documents show the impact of differing storage latencies on application performance. Generally speaking one could say the following about latency requirements, though exceptions exist to each one:

  • Block storage wants lower latency than file storage, which wants lower latency than object storage
  • Large I/O and sequential workloads tolerate latency better than small I/O and random workloads
  • One-way streaming media, backup, monitoring and asynchronous replication care more about bandwidth than latency. Two-way streaming (e.g. videoconferencing or IP telephony), database updates, interactive monitoring, and synchronous replication care more about latency than bandwidth.
  • Real-time applications (remote control surgery, multi-person gaming, remote AR/VR, self-driving cars, etc.) require lower latency than non-real-time ones, especially if the real-time interaction goes both ways on the link.

One thing to note is that many factors affect performance of a storage system. You may want to take a look at our excellent Performance Benchmark webinar series to find out more.

Q. Computation faces an analogous debate between distributed compute vs. centralized compute. Please comment on how the computation debate relates to the storage debate. Typically, distributed computation will work best with distributed storage. Ditto for centralized computation and storage. Are there important applications where a user would go for centralized compute and distributed storage? Or distributed compute and centralized storage?

A. That’s a very good question, to which there is a range of not so very good answers! Here are some application scenarios that require different thinking about centralized vs. distributed storage.

Video surveillance is best with distributed storage (and perhaps a little local compute to do things like motion detection or object recognition) with centralized compute (for doing object identification or consolidation of multiple feeds). Robotics requires lots of distributed compute; think self-driving cars, where the analysis of a scene and the motion of the vehicle needs to be done locally, but where all the data on traffic volumes and road conditions needs multiple data sources to be processed centrally. There are lots of other (often less exciting but just as important) applications that have similar requirements; retail food sales with smart checkouts (that part is all local) and stock management systems & shipping (that part is heavily centralized).

In essence, sometimes it’s easier to process the data where it’s born, rather than move it somewhere else. Data is “sticky”, and that sometimes dictates that the compute should be where the data lies. Equally, it’s also true that the only way of making sense of distributed data is to centralize it; weather stations can’t do weather forecasting, so it needs to be unstuck, collected up & transmitted, and then computed centrally.We hope you enjoyed this un-biased, vendor-neutral debate. You can check out the others in this series below:

Follow us @SNIAESF for more upcoming webcasts.

RoCE vs. iWARP Q&A

In our RoCE vs. iWARP webcast, experts from the SNIA Ethernet Storage Forum (ESF) had a friendly debate on two commonly known remote direct memory access (RDMA) protocols that run over Ethernet: RDMA over Converged Ethernet (RoCE) and the IETF-standard iWARP. It turned out to be another very popular addition to our “Great Storage Debate” webcast series. If you haven’t seen it yet, it’s now available on-demand along with a PDF of the presentation slides.

We received A LOT of questions related to Performance, Scalability and Distance, Multipathing, Error Correction, Windows and SMB Direct, DCB (Data Center Bridging), PFC (Priority Flow Control), lossless networks, and Congestion Management, and more. Here are answers to them all.   Read More

We’re Debating Again: Centralized vs. Distributed Storage

We hope you’ve been following the SNIA Ethernet Storage Forum (ESF) “Great Storage Debates” webcast series. We’ve done four so far and they have been incredibly popular with 4,000 live and on-demand views to date and counting. Check out the links to all of them at the end of this blog.

Although we have “versus” in the title of these presentations, the goal of this series is not to have a winner emerge, but rather provide a “compare and contrast” that educates attendees on how the technologies work, the advantages of each, and to explore common use cases.

That’s exactly what we plan to do on September 11, 2018 when we host “Centralized vs. Distributed Storage.” In the history of enterprise storage there has been a trend to move from local storage to centralized, networked storage. Customers found that networked storage provided higher utilization, centralized and hence cheaper management, easier failover, and simplified data protection amongst many advantages, which drove the move to FC-SAN, iSCSI, NAS and object storage.

Recently, however, distributed storage has become more popular where storage lives in multiple locations, but can still be shared over a LAN (Local Area Network) and/or WAN (Wide Area Network). The advantages of distributed storage include the ability to scale out capacity. Conversely, in the hyperconverged use case, enterprises can use each node for both compute and storage, and scale-up as more resources are needed.

What does this all mean?

Register for this live webcast to find out, where my ESF colleagues and I will discuss:

  • Pros and cons of centralized vs. distributed storage
  • Typical use cases for centralized and distributed storage
  • How SAN, NAS, parallel file systems, and object storage fit in these different environments
  • How hyperconverged has introduced a new way of consuming storage

It’s sure to be another un-biased, vendor-neutral look at a storage topic many are debating within their own organizations. I hope you’ll join us on September 11th. In the meantime, I encourage you to watch our on-demand debates:

Learn about the work SNIA is doing to lead the storage industry worldwide in developing and promoting vendor-neutral architectures, standards, and educational services that facilitate the efficient management, movement, and security of information by visiting snia.org.

 

 

RoCE vs. iWARP – The Next “Great Storage Debate”

By now, we hope you’ve had a chance to watch one of the webcasts from the SNIA Ethernet Storage Forum’s “Great Storage Debate” webcast series. To date, our experts have had friendly, vendor-neutral debates on File vs. Block vs. Object Storage, Fibre Channel vs. iSCSI, and FCoE vs. iSCSI vs. iSER. The goal of this series is not to have a winner emerge, but rather educate the attendees on how the technologies work, advantages of each, and common use cases.

Our next great storage debate will be on August 22, 2018 where our experts will debate RoCE vs. iWARP. They will discuss these two commonly known RDMA protocols that run over Ethernet: RDMA over Converged Ethernet (RoCE) and the IETF-standard iWARP. Both are Ethernet-based RDMA technologies that can increase networking performance. Both reduce the amount of CPU overhead in transferring data among servers and storage systems to support network-intensive applications, like networked storage or clustered computing.

Join us on August 22nd, as we’ll address questions like:

  • Both RoCE and iWARP support RDMA over Ethernet, but what are the differences?
  • Use cases for RoCE and iWARP and what differentiates them?
  • UDP/IP and TCP/IP: which RDMA standard uses which protocol, and what are the advantages and disadvantages?
  • What are the software and hardware requirements for each?
  • What are the performance/latency differences of each?

Get this on your calendar by registering now. Our experts will be on-hand to answer your questions on the spot. We hope to see you there!

Visit snia.org to learn about the work SNIA is doing to lead the storage industry worldwide in developing and promoting vendor-neutral architectures, standards, and educational services that facilitate the efficient management, movement, and security of information.

 

A Q&A from the FCoE vs. iSCSI vs. iSER Debate

It’s become quite clear to those of us in the SNIA Ethernet Storage Forum (ESF) that everyone loves a great debate. We’ve proved that with our “Great Storage Debates” webcast series which has had over 3,500 views in just a few months! Last month we had another friendly debate on FCoE vs. iSCSI vs. iSER. If you missed the live event, you can watch it now on-demand and download a pdf of the webcast slides.  Our live audience asked a lot of interesting questions. As promised, here are answers to them all.

Q. How often are iSCSI offload adapters used in customer environments as compared to software initiators?   Can these adapters be used for all IP traffic or do they only run iSCSI?

A. iSCSI offload adapters are ideally suited for enabling high-performance storage access at up to 100Gbps data rates for business-critical applications, for example, latency-sensitive transactional applications and large-file business intelligence applications. iSCSi offload adapters typically also support offload of other storage protocols such as NVMe-oF, iSER, FCoE as well as regular Ethernet traffic using offload or non-offload means.

Q. What you’ve missed with iSCSI is Jumbo Frames. That payload size is one of the biggest advantages over Fibre Channel. The biggest problem with both FCoE and iSCSI is they build the networks too complex, with too many hops, without true redundant isolation. Best Practices with block based FC is to keep the host and storage as close to each other as possible. And to have separate isolated redundant networks/fabric.

A. The Jumbo Frame (JF) argument is quite contentious among iSCSI storage and network administrators, even beyond anything to do with Fibre Channel.

Considering that the performance advantages of JFs are minimal – only 3%-5% performance boost over default MTU sizes of 1500. In mixed workload environments (which dominate the Data Center application deployments), JFs simply do not provide the kind of benefits that people expect in real-world scenarios. The only time JFs can “push the needle,” so to speak, is when you have massively scaled systems with 100s or 1000s of devices, but this raises other issues.

One of those issues is that every device in the system needs to have JFs enabled. This can be something of a problem when systems get as large as they need to be in order to take advantage of JFs. Ensuring that every device is configured properly – especially over time, and especially when considering how iSCSI devices are added to networked environments – is a job that requires the coordination of the server/virtualization teams, the networking teams, and the storage teams. By and large, many people find QoS to be a more productive means of performance improvement for iSCSI systems than JFs.

Fibre Channel, on the other hand, has a maximum frame size of 2112 bytes. FCoE, then, only requires “baby jumbo” frames, for which the configuration is pushed from the switch to the end devices (~2.5k). What FC has that iSCSI does not have is the concept of “sequences” and “exchanges,” which ensure that the long-flow of frames (regardless of their size) are sent as an entity. So, regardless of what the frame size is (2.5k or 9k), the data flow is sent with consistency and low-jitter because of the way that the sequences and exchanges are handled.

The concern about “too complex” and “too many hops” is an interesting one, as Fibre Channel (and, correspondingly, FCoE) are deliberately kept as simple and straightforward as possible. A FC network, for instance, rarely goes beyond 2 hops (“hops” in FC are measured as the links between switches, whereas in Ethernet “hops” are measured as the switches themselves).

Logically, then, there is usually, at most, an edge-core-edge topology with a predeterministic path to be followed thanks to Fibre Channel’s FSPF routing algorithm.

iSCSI topologies, on the other hand, can be complex (as Ethernet topologies sometimes can be). For larger iSCSI environments, it is often recommended to isolate the storage traffic out into its own, simplified topology. iSCSI SANs that have grown organically, however, can sometimes struggle to be reined in over time.

Best practices for all storage is to keep it as close to the host/source as is reasonably possible, not just block. In backup scenarios for example, you want the storage far enough away to be safe from any catastrophe, but close enough to ensure recovery objectives. The design principle of keeping storage as close to the host is a common best practice, and as mentioned in the webinar it is important that architectural principles ensure high availability (HA) to compensate for the rigidity that block storage systems require to compensate for weaker ULP recovery mechanisms.

 Q.  Most servers today have enough compute power to not need offload adapters.

A.  This statement might be true in some situations, but definitely not most. With more and more virtual machines being deployed on physical systems and new storage technologies such as SSDs, and NVMe devices which greatly lower latencies, servers are often CPU bound when moving or retrieving data from storage. Offloading storage related activities to an adapter frees the CPU and increases overall server performance.

Q. In which industry is each protocol (i.e. FCOE or ISCSI and iSER) widely used and where?

A. iSCSI is the most widely-supported Ethernet SAN protocol  with native initiator support integrated into all the major operating systems and hypervisors, built-in RDMA for high performance offloaded implementations supporting up to 100Gbps and support across major storage platforms and  is thus ideally suited for deployment across cloud and enterprise data center environments.

Q. Do iSCSI offload adapters provide the IPSec encryption, or is this done in software only solutions? Please answer from both initiator and target perspective.

A. Yes, iSCSI protocol offload adapters can optionally provide offload of IPSec encryption for both iSCSI (as well as NVMe-oF) initiator and target operation at data rates of up to 100 Gigabits-per-second. This results in overall higher server and target efficiency including power, cooling, memory, and CPU savings.

Q. Does iSER support direct or is a switch between them required?

A. A switch is not required.

Q. J, you left out the centralized management that Fibre Channel provides for FCoE as a positive.

A. I got there eventually! But you are correct, the Fibre Channel tools for a centralized management plane with the name server – regardless of the number of switches in the fabric – is a tremendous positive for FCoE/FC solutions at scale.

Q. Is multipath possible on the initiator with ISER and will it scale with high IOPs?

A. Yes. Mulitpath is possible on the initiator with iSER and scales with high IOPs.

Q. FCoE has been around for a while, but I noticed that some storage vendors are dropping support for it. Do you still see a big future for FCoE?

A. As a protocol, FCoE has always been able to be used wherever and whenever needed. Almost all converged infrastructure systems use FCoE, for instance. Given that the key advantage of FCoE has been traffic/protocol consolidation, there is an extremely strong use case for FCoE at “the first hop” – that is, from the server to the first network switch.

Q. What is the MTU for iSER ?

A. iSER as a protocol that sits above the Layer 2 Data Link Layer, which is where the MTU is set. As a result, iSER will accept/accommodate any MTU setting that is configured at that layer. Please see the answer earlier about Jumbo Frames for more information.

Ready for more great storage debates? Our next one will be RoCE vs. iWARP on August 22, 2018. Save you place by registering here.

And you can check out our previous debates “File vs. Block vs. Object Storage” and “Fibre Channel vs. iSCSI” on-demand at your convenience too. Happy debating!

Storage Controllers – Your Questions Answered

The term controller is used constantly, but often has very different meanings. When you have a controller that manages hardware, there are very different requirements than a controller that manages an entire system-wide control plane. You can even have controllers managing other controllers. It can all get pretty confusing very quickly. That’s why the SNIA Ethernet Storage Forum (ESF) hosted our 9th “Too Proud to Ask” webcast. This time it was “Everything You Wanted to Know about Storage but were Too Proud to Ask: Part Aqua – Storage Controllers.” Our experts from Microsemi, Cavium, Mellanox and Cisco did a great job explaining the differences between the many types of controllers, but of course there were still questions. Here are answers to all that we received during the live event which you can now view on-demand.

Q.Is there a standard for things such as NVMe over TCP/IP?

A. NVMe™ is in the process of standardizing a TCP transport. It will be called NVMe over TCP (NVMe™/TCP) and the technical proposal should be completed and public later in 2018.

Q. What are the length limits on NVMe over fibre?

A. There are no length limits. Multiple Fibre Channel frames can be combined to create any length transfer needed. The Fibre Channel Industry Association has a very good presentation on Long-Distance Fibre Channel, which you can view here.

Q. What does the term “Fabrics” mean in the storage context?

A. Fabrics typically applies to the switch or switches interconnecting the hosts and storage devices. Specifically, a storage “fabric” maintains some knowledge about itself and the devices that are connected to it, but some people use it to mean any networked devices that provide storage. In this context, “Fabrics” is also shorthand for “NVMe over Fabrics,” which refers to the ability to run the NVMe protocol over an agnostic networking transport, such as RDMA-based Ethernet, Fibre Channel, and InfiniBand (TCP/IP coming soon).

Q. How does DMA result in lower power consumption?

A. DMA is typically done using a harder DMA engine on the controller. This offloads the transfer from the host CPU which is typically higher power than the logic of the DMA engine.

Q. How does the latency of NVMe over Fibre compare to NVMe over PCIe?

A. The overall goal of having NVMe transported over any fabric is not to exceed 20us of latency above and beyond a PCIe-based NVMe solution. Having said that, there are many aspects of networked storage that can affect latency, including number of hops, topology size, oversubscription ratios, and cut-through/store-and-forward switching. Individual latency metrics are published by specific vendors. We recommend you contact your favorite Fibre Channel vendor for their numbers.

Q. Which of these technologies will grow and prevail over the next 5-10 years…

A. That is the $64,000 question, isn’t it? J The basic premise of this presentation was to help illuminate what controllers are, and the different types that exist within a storage environment. No matter what specific flavor becomes the most popular, these basic tenets will remain in effect for the foreseeable future.

Q. I am new to Storage matters, but I have been an IT tech for almost 10 years. Can you explain Block vs. File IO?

A. We’re glad you asked! We highly recommend you take a look at another one of our webinars, Block vs. File vs. Object Storage, which covers that very subject!

If you have an idea for another topic you’re “Too Proud to Ask” about, let us know by commenting in this blog.

File, Block and Object Storage: Real-world Questions, Expert Answers

More than 1,200 people have already watched our Ethernet Storage Forum (ESF) Great Storage Debate webcast “File vs. Block vs. Object Storage.” If you haven’t seen it yet, it’s available on demand. This great debate generated many interesting questions. As promised, our experts have answered them all here.

Q. What about the encryption technologies on file storage? Do they exist, and how do they affect the performance compared to unencrypted storage?

A. Yes, encryption of file data at rest can be done by the storage software, operating system, or the drives themselves (self-encrypting drives). Encryption of file data on the wire can be done by the storage software, OS, or specialized network cards. These methods can usually also be applied to block and object storage. Encryption requires processing power so if it’s done by the main CPU it might affect performance. If encryption is offloaded to the HBA, drive, or SmartNIC then it might not affect performance.

Q. Regarding block size, I thought that block size settings were also used to tune and optimize file protocol transfer, for example in NFS, am I wrong?

A. That is correct, block size refers to the size of data in each I/O and can be applied to block, file and object storage, though it may not be used very often for object storage. NFS and SMB both let you specific block I/O size.

Q. What is the main difference between object and file? Is it true that File has a hierarchical structure, while object does not?

A. Yes that is one important difference. Another difference is the access method–folder/file/offset for files and key-value for objects.   File storage also often allows access to specific data within a file and in many cases shared writes to the same file, while object storage typically offers only shared reads and most object storage systems do not allow direct updates to existing objects.

Q. What is the best way to backup a local Object store system?

A. Most object storage systems have built-in data protection using either replication or erasure coding which often replicates the data to one or more remote locations. If you deploy local object storage that does not include any remote replication or erasure coding protection, you should implement some other form of backup or replication, perhaps at the hardware or operating system level.

Q. I feel that this discussion conflates object storage with cloud storage features, and presumes certain cloud features (for example security) that are not universally available or really part of Object Storage.   This is a very common problem with discussions of objects — they typically become descriptions of one vendor’s cloud features.

A. Cloud storage can be block, file, and/or object, though object storage is perhaps more popular in public and private cloud than it is in non-cloud environments. Security can be required and deployed in both enterprise and cloud storage environments, and for block, file and object storage. It was not the intention of this webinar to conflate cloud and object storage; we leave that to the SNIA Cloud Storage Initiative (CSI).

Q. How do open source block, file and object storage products play into the equation?

A. Open source software solutions are available for block, file and object storage. As is usually the case with other open-source, these solutions typically make storage (block, file or object) available at a lower acquisition cost than commercial storage software or appliances, but at the cost of higher complexity and higher integration/support effort by the end user. Thus customers who care most about simplicity and minimizing their integration/support work tend to buy commercial appliances or storage software, while large customers who have enough staff to do their own storage integration, testing and support may prefer open-source solutions so they don’t have to pay software license fees.

Q. How is data [0s and 1s in hard disk] converted to objects or vice versa?

A. In the beginning there were electrons, with conductors, insulators, and semi-conductors (we skipped the quantum physics level of explanation). Then there were chip companies, storage companies, and networking companies. Then The Storage Networking Industry Association (SNIA) came along… The short answer is some software (running in the storage server, storage device, or the cloud) organizes the 0s and 1s into objects stored in a file system or object store. The software makes these objects (full of 0s and 1s) available via a key-value systems and/or a RESTful API. You submit data (stream of 1s and 0s) and get a key-value in return. Or you submit a key-value and get the object (stream of 1s and 0s) in return.

Q. What is the difference (from an operating system perspective where the file/object resides) between a file in mounted NFS drive and object in, for example Google drive? Isn’t object storage (under the hood) just network file system with rest API access?

A. Correct–under the hood there are often similarities between file and object storage. Some object storage systems store the underlying data as file and some file storage systems store the underlying data as objects. However, customers and applications usually just care about the access method, performance, and reliability/availability, not the underlying storage method.

Q. I’ve heard that an Achilles’ Heel of Object is that if you lose the name/handle, then the object is essentially lost.   If true, are there ways to mitigate this risk?

A. If you lose the name/handle or key-value, then you cannot access the object, but most solutions using object storage keep redundant copies of the name/handle to avoid this. In addition, many object storage systems also store metadata about each object and let you search the metadata, so if you lose the name/handle you can regain access to the object by searching the metadata.

Q. Why don’t you mention concepts like time to first byte for object storage performance?

A. Time to first byte is an important performance metric for some applications and that can be true for block, file, and object storage. When using object storage, an application that is streaming out the object (like online video streaming) or processing the object linearly from beginning to end might really care about time to first byte. But an application that needs to work on the entire object might care more about time to load/copy the entire object instead of time to first byte.

Q. Could you describe how storage supports data temperatures?

A. Data temperatures describe how often data is accessed, where “hot” data is accessed often, “warm” data occasionally, and “cold” data rarely. A storage system can tier data so the hottest data is on the fastest storage while the coldest data is on the least expensive (and presumably slowest) storage. This could mean using block storage for the hot data, file storage for the warm data, and object storage for the cold data, but that is just one option. For example, block storage could be for cold data while file storage is for hot data, or you could have three tiers of file storage.

Q. Fibre channel uses SCSI. Does NVMe over Fibre Channel use SCSI too? That would diminish NVMe performance greatly.

A. NVMe over Fabrics over Fibre Channel does not use the Fibre Channel Protocol (FCP) and does not use SCSI. It runs the NVMe protocol over a FC-NVMe transport on top of the physical Fibre Channel network.   In fact none of the NVMe over Fabrics options use SCSI.

Q. I get confused when some one says block size for block storage, also block size for NFS storage and object storage as well. Does block size means different for different storage type?

A. In this case “block size” refers to the size of the data access and it can apply to block, file, or object storage. You can use 4KB “block size” to access file data in 4KB chunks, even though you’re accessing it through a folder/file/offset combination instead of a logical block address. Some implementations may limit which block sizes you can use. Object storage tends to use larger block sizes (128KB, 1MB, 4MB, etc.) than block storage, but this is not required.

Q. One could argue that file system is not really a good match for big data. Would you agree?

A. It depends on the type of big data and the access patterns. Big data that consists of large SQL databases might work better on block storage if low latency is the most important criteria. Big data that consists of very large video or image files might be easiest to manage and protect on object storage. And big data for Hadoop or some machine learning applications might work best on file storage.

Q. It is my understanding that the unit for both File Storage & Object storage is File – so what is the key/fundamental difference between the two?

A. The unit for file storage is a file (folder/file/offset or directory/file/offset) and the unit for object storage is an object (key-value or object name). They are similar but not identical. For example file storage usually allows shared reads and writes to the same file, while object storage usually allows shared reads but not shared writes to the object. In fact many object storage systems do not allow any writes or updates to the middle of an object–they either allow only appends to the end of the object or don’t allow any changes to an object at all once it has been created.

Q. Why is key value store more efficient and less costly for PCIe SSD? Can you please expand?

A. If the SSD supports key-value storage directly, then the applications or storage servers don’t have to perform the key-value translation. They simply submit the key value and then write or read the related data directly from the SSDs. This reduces the cost of the servers and software that would otherwise have to manage the key-value translations, and could also increase object storage performance. (Key-value storage is not inherently more efficient for PCIe SSDs than for other types of SSDs.)

Interested in more SNIA ESF Great Storage Debates? Check out:

If you have an idea for another storage debate, let us know by commenting on this blog. Happy debating!

FCoE vs. iSCSI vs. iSER: Get Ready for Another Great Storage Debate

As a follow up our first two hugely successful “Great Storage Debate” webcasts, Fibre Channel vs. iSCSI and File vs. Block vs. Object Storage, the SNIA Ethernet Storage Forum will be presenting another great storage debate on June 21, 2018. This time we’ll take on FCoE vs. iSCSI vs. iSER.

For those of you who’ve seen these webcasts, you know that the goal of these debates is not to have a winner emerge, but rather provide unbiased education on the capabilities and use cases of these technologies so that attendees can become more informed and make educated decisions.

Here’s what you can expect from this session: One of the features of modern data centers is the ubiquitous use of Ethernet. Although many data centers run multiple separate networks (Ethernet and Fibre Channel (FC)), these parallel infrastructures require separate switches, network adapters, management utilities and staff, which may not be cost effective.

Multiple options for Ethernet-based SANs enable network convergence, including FCoE (Fibre Channel over Ethernet) which allows FC protocols over Ethernet and Internet Small Computer System Interface (iSCSI) for transport of SCSI commands over TCP/IP-Ethernet networks. There are also new Ethernet technologies that reduce the amount of CPU overhead in transferring data from server to client by using Remote Direct Memory Access (RDMA), which is leveraged by iSER (iSCSI Extensions for RDMA) to avoid unnecessary data copying.

That leads to several questions about FCoE, iSCSI and iSER:

  • If we can run various network storage protocols over Ethernet, what differentiates them?
  • What are the advantages and disadvantages of FCoE, iSCSI and iSER?
  • How are they structured?
  • What software and hardware do they require?
  • How are they implemented, configured and managed?
  • Do they perform differently?
  • What do you need to do to take advantage of them in the data center?
  • What are the best use cases for each?

Register today to join our SNIA experts as they answer all these questions and more on the next Great Storage Debate: FCoE vs. iSCSI vs. iSER. We look forward to seeing you on June 21st.

 

Benchmarking Workload Storage Performance – An Expert Q&A

Nearly 1,000 people have watched our most recent SNIA ESF webcast, Storage Performance Benchmarking: Workloads. We hope you didn’t miss this 5th and final installment of the now famous Storage Performance Benchmarking webcast series where our experts, Mark Rogov and Chris Coniff, explained how to measure and optimize storage performance of workloads. If you haven’t seen it, it’s available on-demand. The live audience had many great questions. Here are answers to them all.

Q. Is it good to assume that sequential IO would benefit from large IO size and, conversely, random IO from small IO size?

A. I don’t think “benefit” is a practical way to look at this. Workloads come in all sizes and mixes, and it is the job of the storage array to handle what is thrown at it. Storage admins (and placement algorithms) need to configure the system to produce the best performance given the current load. Historically, random workloads are harder to optimize than sequential. Mixing block sizes, even with sequential workloads, is also tough to deal with. It all comes to figuring out where the bottlenecks are, and how to overcome them.

Q. But pre-fetch, cache, etc. will show benefit to sequential IO on all Flash array versus random?

A. Technically, we need to look at the effects of cache and pre-fetch on Reads and Writes separately. For Reads, pre-fetching data into cache does show a lot of benefit, especially for sequential IO. For Writes, pre-fetching is not effective, but cache is: it is, generally, faster to save data to cache than straight to disk (assuming, of course, there is free space in cache to write to).

Q. Don’t you see [concatenation of small IOs into larger IO] when apps are inside a VM versus physical nodes? I ask because [block size] changes with versions of hypervisors.

A. This is a great question, and a big misconception floating out there. Hypervisors have three primary methods for accessing storage: direct block, NAS, or via internal filesystem.

The direct block, aka raw device mapping, is simple—all IOs are simply sent to the storage array as they are. No concatenation, folding, compression, etc.

Internal filesystem, VMFS, has a concept of a block size. These blocks are used for internal management of the filesystem (see our File webcast with explanation on how those work). A common misconception is that when a block size is smaller than the write IO size, the filesystem issues a write equal to its block size. In reality, the writes are simply passed to the underlying driver, plus some additional meta-data IO. The amount of data being written doesn’t change just because it is written into a larger block container. In cases when the write IO is larger than a block size of the FS, yes, the storage will see a multiple of IOs, depending on what the ratio is. In VMFS though, block sizes are usually quite large: 1MiB, 4MiB, 8MiB. Very few (if any!) workloads have IOs that big. VMFS also has a concept of “sub blocks”, which are smaller than 1MiB, but are also quite large: 64KiB, with the same logic applied.

NAS communication is the most complex. For the purposes of this question, consider the Ethernet Maximum Transfer Unit size (MTU). It regulates the size of the frame, which by default is 1500 bytes. Therefore, all data must be either smaller than or cut into 1500 byte chunks to be sent across the wire. For example, 4KiB IO will be split into 3 frames: 1500, 1500, and 1096. Sometimes, MTU is set to a “jumbo setting”, or 9000. Then each 4KiB IO will fit into one frame. With NFSv4, the protocol allows combining several NFS calls into one frame. Theoretically, that means that two NFS write calls for 4KiB could fit into a single 9KB jumbo Ethernet frame. In reality, one needs to examine closely which specific NFS calls are truly being used—see our File webcast on details of how block commands get translated to and from filesystem calls and extrapolate that into NFS calls (FS and NFS are not the same!).

Bottom line, regardless of the datastore access method, in most cases, your workload IO will be passed to the storage array as is without coalescing.

Q. There are several other factors that have to be considered: 1) More than random/sequential it’s I/O adjacency that matters most. Think how differently a hybrid storage system would handle random I/O to 5% of the volume vs. even sequential to the whole volume. 2) Does the data dedupe/compress using the array’s algorithm?

A. I agree somewhat with this comment. Adjacency of the IO is a good way to think about things. Intelligent placement algorithms do have a concept of a “Working Area,” which could elect promoting whole regions to the faster storage tier to speed up all the “adjacent” random requests. Data reduction (compression, deduplication, single-instancing, zero-padding, etc.) introduces an overhead in some arrays, and therefore is mudding the picture somewhat, yet the underlying principles remain the same. Keep in mind, this is a vendor neutral presentation, very commonly the differences in handling data reduction are heavily marketed between different solutions.

Q. I think Mark is at a different company now 🙂 2 4 8 is NOT sequential, it is strided, or Geometric. Holistically, If proc A reads 2,4,6 and Proc B read 1,3,5,7 then the strides are within a proc, but holistically, this would be sequential. Though few operating systems do this level of pre-fetch logic.

A. Good catch, and I agree. In a later revision of the slides, we changed the 2, 4, 6 sequence into “predictable” pattern, not sequential.

Q. If you have an all Flash array what kind of performance hit do u take between sequential / random reads? I would think that it isn’t as impactful as a mechanical drive.

A. The answer can be true under one condition and not another (LUN is in a write-pending condition, cache flushing is taking place, etc.).  

Generally speaking, reads always perform better than writes on flash drives.   As for random vs sequential IO, they are measured with two different metrics (iops vs throughput). In order to answer that part of the question, their measurements must be normalized to a common KPI, such that they could be compared.   And to do that, we’d have to know what IO block size the question assumes, as it must be known to solve for ‘IO size x IOPS = Throughput in MB/s’

This is from the spec sheet for one vendor’s SSD:

  • Sequential Read (up to)450 MB/s
  • Sequential Write (up to)380 MB/s
  • Random Read (100% Span)67500 IOPS
  • Random Write (100% Span)17500 IOPS
  • Latency – Read40 µs
  • Latency – Write42 µs

So, the ‘It Depends” answer is based on the manufacture & model of drive at specific code level, and (if it’s in an array) how the specific array vendor implemented it in their design.   If it’s an Integrated Cache Disk Array (ICDA), then pre-fetching (read ahead) and caching algorithms behave differently at various code levels for each vendor.  There are specific used defined configuration parameters that can negate the answer to the question below as well.   For instance, High & Low water marks, Dynamic Cache Partitioning, Workload QoS, etc.

In the case of an ICDA, what’s more important than read vs write or random vs sequential is whether or not the IO was a cache hit.  A cache hit for a single random read IO in an ICDA who’s LUN is on a 7.2K SATA drive will have better performance than a random read miss on that same array if the LUN were on a flash drive.

So, as we’ve seen throughout this series, there is more to the overall performance benchmark than any one variable.

Q. What observations do you have on rebuild time for Flash disk? On what magnitude is it faster than spinning disks considering a high-end hybrid or AFA storage system?

A. This question is dangerous, as it crosses into vendor specifics. Rebuild time depends on more than just the type of drive, the RAID type and configuration, drive utilization, array business—and many more factors come into play here. Generally, an SSD drive will rebuild faster than a similarly sized spinning drive.

Q. What about the fact that you get cache effects in the OS stack and also in the Flash (DRAM landing areas) that actually improves write latency versus reads? Isn’t it worth mentioning Writes can actually appear higher performant on Flash? Or am I missing something? (Probably the latter) 🙂

A. You are 100% correct! 🙂 But do consider the size of DRAM… how much data can it take? Does the entire “working area” fit there? If it does, viola! If it does not – welcome to the rest of the world!

Q. Can you send links to all the webcasts in this series?

A. Of course, please see below and happy viewing!

  1. Storage Performance Benchmarking: Introduction and Fundamentals
  2. Storage Performance Benchmarking: Part 2  – Solution under Test
  3. Storage Performance Benchmarking: Block Components  
  4. Storage Performance Benchmarking: File Components  
  5. Storage Performance Benchmarking: Workloads