File vs. Block vs. Object Storage – Are Worlds Colliding?

When it comes to storage, a byte is a byte is a byte, isn’t it?

One of the enduring truths about simplicity is that scale makes everything hard, and with that comes complexity. And when we’re not processing the data, how do we store it and access it?

The only way to manage large quantities of data is to make it addressable in larger pieces, above the byte level. For that, we’ve designed sets of data management protocols that help us do several things: address large lumps of data by some kind of name or handle, organize it for storage on external storage devices with different characteristics, and provide protocols that allow us to programmatically write, find, and read it.

On April 17th, the SNIA Ethernet Storage Forum will host another of its “Great Debates” webcasts. This time, it’s “File vs. Block vs. Object Storage.” In this live webcast, our experts, Mark Carlson, Alex McDonald and Saqib Jang will compare three types of data organization: file, block and object storage, and the access methods that support them. Each has its own set of use cases, advantages and disadvantages. Each provides data management ranging from simple to sophisticated, and each makes different demands on storage devices and programming technologies.

Perhaps you’re comfortable with block and file, but are interested in investigating the more recent class of object storage and access. Perhaps you’re happy with your understanding of objects, but would really like to understand files a bit better. Or perhaps you want to understand how file, block and object are implemented on the underlying storage systems – and how one can be made to look like the other, depending on how the storage is accessed. Join us as we discuss and debate:

  • Storage devices
    • How different types of storage drive different management & access solutions
    • Which use cases tend to favor block, file or object
  • Block
    • Where everything is in fixed-size chunks
    • SCSI and SCSI-based protocols, and how FC and iSCSI fit in
  • Files
    • When everything is a stream of bytes
    • NFS and SMB
  • Objects
    • When everything is a BLOB
    • HTTP, key value and RESTful interfaces
  • Altogether…
    • When files, blocks and objects collide, it will rock your world!

I will be moderating this “friendly debate” where there won’t be winners or losers, just more information on these three popular data storage technologies. We hope you will register today to come join the debate on April 17th.

And if you missed our first hugely popular “Great Debate” – Fibre Channel vs. iSCSI, it’s now available on-demand.

An FAQ to Make Your Storage System Hum

In our most recent “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series – Part Sepia – Getting from Here to There, we discussed terms and concepts that have a profound impact on storage design and performance. If you missed the live event, I encourage you to check it our on-demand. We had many great questions on encapsulation, tunneling, IOPS, latency, jitter and quality of service (QoS). As promised, our experts have gotten together to answer them all.

Q. Is there a way to measure jitter?

A. Jitter can be measured directly as a statistical function of the latency, typically as the Variance or Standard Deviation of the latency. For example a storage device might show an average latency of 5ms with a standard deviation of 1.5ms. This means roughly 95% of the transactions have a latency between 2ms and 8ms (average latency plus/minus two standard deviations), however many storage customers measure jitter indirectly by showing the 99.9%, 99.99%, or 99.999% latency. For example if my storage system has 99.99% latency of 8ms, it means 99.99% of transactions have latency <=8ms and 1/10,000 of transactions have latency >8ms. Percentile latency is an indirect measure of jitter but often easier to calculate or understand than the actual jitter.

Q. Can jitter be easily characterized for storage, media, and networks.   How and what tools are available for doing this?

A. Jitter is usually easy to measure on a network using standard network monitoring and reporting tools. It may or may not be easy to measure on storage systems or storage media, depending on the tools available (either built-in to the storage OS or using an external management or monitoring tool).   If you can record the latency of each transaction or packet, then it’s easy to calculate and show the jitter using standard statistical measures such as Variance or Standard Deviation of the latency. What most customers do is just measure the 99.9%, 99.99%, or 99.999% latency. This is an indirect measure of jitter but is often much easier to report and understand than the actual jitter.

Q.  Generally IOPS numbers are published for a particular block size like 8k write/read size, but in reality, IO request per second could be of mixed sizes, what is your perspective on this?

A. Most IOPS benchmarks test only one I/O size at a time. Most individual real workloads (for example databases) also use only one I/O size.  It is true that a storage controller or HDD/SSD might need to support multiple workloads simultaneously, each with a different I/O size.  While it is possible to run benchmarks with a mix of different I/O sizes, it’s rarely done because then there are too many workload combinations to test and publish. Some storage devices do not perform well if they must handle both small random and large sequential workloads simultaneously, so a smart storage controller might assign different workload types to different disk groups.

Q. One often misconfigured parameter is queue depth. Can you talk about how this relates to IOPS, latency and jitter?

A. Queue depth indicates how many tasks or I/Os can be lined up for a particular resource, such as a storage controller, network interface, or CPU. Having a higher queue depth ensures the resource stays highly utilized because it always has a new task to do as soon as it finishes its current task(s). This can result in higher IOPS because the CPU is less likely to have idle time waiting for new tasks to be put into its queue. But it could also increase latency because longer queues mean each task spends more time waiting in a queue.  It’s easy to misconfigure queue depth because you it needs to be deep enough to keep the resource (CPU/controller/interface) busy but not so deep that each transaction spends a long time in the queue.

Q. Can you please repeat all your examples of tunneling? GRE, MPLS, what others? How can it be IPv4 via IPv6?

A. VXLAN, LISP, GRE, MPLS, IPSEC.   Any time you encapsulate and send one protocol over another and decapsulate at the other end to send the original frame that process is tunneling. In the case we showed of IPv6 over IPv4, you are taking an original IPv6 frame with its IPv6 header of source address to destination address all IPv6 and sending it over and IPv4 enabled network we are encapsulating the IPv6 frame with an IPv4 header and “tunneling” IPv6 over the IPv4 network.

Q. I think it’d be possible to configure QoS to a point that exceeds the system capacity. Are there any safeguards on avoiding this scenario?

A. Some types of QoS allow over-provisioning and others do not. For example a QoS that imposes only maximum limits (and no minimum guarantees) on workloads might not prevent many workloads from exceeding system capacity. If the QoS allows over-provisioning, then you should use system monitoring and alerts to warn you when system capacity has been exceeded, or when any workloads are not getting their minimum guaranteed performance.

Q. Is there any research being done on using storage analytics along with artificial intelligence (AI) to assist with QoS?  

A. There are a number of storage analytics products, both third party and storage vendor specific that help with QoS. Whether any of these tools may be described as using AI is debatable, since we’re in the early days of using AI to do much in the storage arena. There are many QoS research projects, and no doubt they will eventually make their way into commercially available products if they prove useful.

Q. Are there any methods (measurements) to calculate IOPS/MBps in tier capable storage? Would it be wrong metric if we estimate based on medium level, example tier 2 (between 1 and 3)?

A. This question needs refinement, since tiering is sometimes a cache model rather than a data movement model. And knowing the answer may not actually help! Vendors do have tools (normally internal, since they are quite complex) that can help with the planning of tiered storage.

By now, we hope you’re not “too proud” to ask some of these storage networking questions. We’ve produced four other webcasts in this “Everything You Wanted To Know About Storage,” series to date. They are all available on-demand. And you can register here for our next one on July 6th where we’ll bring in experts to discuss:

  • Storage APIs and POSIX
  • Block, File, and Object storage
  • Byte Addressable and Logical Block Addressing
  • Log Structures and Journaling Systems

The Ethernet Storage Forum team and I hope to see you there!

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.

 

 

Clustered File Systems: No Limits

Today’s storage world would appear to have been divided into three major and mutually exclusive categories: block, file and object storage. The marketing that shapes much of the user demand would appear to suggest that these are three quite distinct animals, and many systems are sold as exclusively either SAN for block, NAS for file or object. And object is often conflated with cloud, a consumption model that can in reality be block, file or object.

A fixed taxonomy that divides the storage world this way is very limiting, and can be confusing; for instance, when we talk about cloud. How should providers and users buy and consume their storage? Are there other classifications that might help in providing storage solutions to meet specific or more general application needs? What about customers who need file access performance beyond what one storage box can provide? Which options support those who want scale-out solution like object storage with file protocol semantics?

To clear up the confusion, the SNIA Ethernet Storage Forum is hosting a live Webcast, “Clustered File Systems: No Limits.” In this Webcast we will explore clustered storage solutions that not only provide multiple end users access to shared storage over a network, but allow the storage itself to be distributed and managed over multiple discrete storage systems. You’ll hear:

  • General principles and specific clustered and distributed systems and the facilities they provide built on the underlying storage
  • Better known file systems like NFS, IBM Spectrum Scale (GPFS) and Lustre, along with a few of the less well known
  • How object based systems like S3 have blurred the lines between them and traditional file based solutions

This Webcast should appeal to those interested in exploring some of the different ways of accessing & managing storage, and how that might affect how storage systems are provisioned and consumed. POSIX and other acronyms may be mentioned, but no rocket science beyond a general understanding of the principles of storage will be assumed. Contains no nuts and is suitable for vegans!

As always, our experts will be on hand to answer your questions on the spot. Register now for this October 25th event.

Update: If you missed the live event, it’s now available  on-demand. You can also  download the webcast slides.