
Cory's Blog


January 30
Vintage Tiered Storage

I recently participated in a discussion where somebody essentially presumed somebody else was "stupid" because they (as a UNIX system administrator) weren't aware of a long-deprecated feature in a version of Windows that came out fifteen years ago.

There was a pretty lengthy discussion about the fact that stupid and ill-informed are different things, and that (especially for a professional UNIX admin) not knowing about a feature that was deprecated ten years ago hardly seems relevant. I won't harp on it too much here, but suffice it to say that attitudes like these grate on my sensibilities because they tend to be a symptom of a larger pattern of negative language that is often both inaccurate and in extremely poor spirit.

The technical component was interesting, though. This person has an old server (using SCSI disks) with probably about 700 gigabytes of usable space, which is a common configuration for a server from the late Windows 2000 and early Windows 2003 era. The person has an external SCSI card and a tape library with some number of LTO-3 drives. Some of the tapes installed are dedicated to backup, and the remainder are dedicated to a cold storage tier. There are 22 terabytes of addressable cold tier space in this person's system.

For those unfamiliar with the idea, Windows Remote Storage Manager (2003) extends capacity and makes better use of resources by placing infrequently used files on tapes, which have traditionally been cheaper to scale than hard disks. Today's version of this is basically to use a few solid state disks for the most active part of your dataset and then scale down through different types of disks: small 10k disks for the next tier, 4TB/7200RPM disks after that, and 8TB SMR 5400RPM disks for the last tier. Today, tapes basically don't exist in active tiered storage setups, because they're just too slow and because expectations are different.
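To make the idea concrete, here is a rough sketch of the kind of last-access-age policy a tiering system applies. The tier names and thresholds below are made up purely for illustration; a real product also migrates the file and leaves a stub behind so it still appears to be in place.

import os
import time

# Hypothetical tiers and last-access-age thresholds; older data lands on colder storage.
TIERS = [
    ("ssd", 30),        # accessed within the last 30 days
    ("fast-hdd", 180),  # within the last 6 months
    ("slow-hdd", 730),  # within the last 2 years
    ("cold", None),     # everything older belongs on the cold tier (tape, back in the day)
]

def pick_tier(path):
    # Age of the file in days, based on its last access time.
    age_days = (time.time() - os.stat(path).st_atime) / 86400
    for tier, max_age in TIERS:
        if max_age is None or age_days <= max_age:
            return tier

This only shows the placement decision; the part that made Remote Storage Manager usable was that recalling a migrated file from tape was transparent to the user, just slow.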

If you control the environment and you tell the accounting department that files they haven't touched in a year are going to take longer to load, then you know what to tell them when they call trying to open documents from two years ago. You can't exactly put the oldest bits of somebody's Dropbox account or last year's Facebook pictures on tapes. Even in internal institutional environments, the expectation now is that years-old data will be easily at hand. I don't think that expectation is unreasonable, given how much less expensive server hardware and bulk storage is today than it was even just a few years ago.

There were the normal lamentations about why Microsoft removed it from the system, but to me it seems pretty simple. Remote Storage Manager is a vestigial component of Windows that just didn't fit with the strategy or the needs of real customers, even in 2008. By then, SATA, relatively big disks, and the ease of expanding SAS controllers meant that buying shelves of disks with monumental capacities was cheaper than dedicating another tape library (or a portion of your primary one) to a cold tier.

Although DFS had already existed for a while, I'm guessing the big difference is that Microsoft wanted to push customers toward the Distributed File System, which allows for much better high-availability configurations and makes more sense today than it would have in the '90s or between 2000 and 2003 anyway, because servers and disks use a lot less energy per terabyte than they did all those years ago.

As I mentioned, the person we were talking with has about 22TB worth of tapes and 700GB of hot storage. I don't know what their habits are generally like, but I know I need a lot more active space than that. I have a lot of data that should probably be sent to a robust-but-inexpensive cold storage location, but I'm not going to do that by adding a big tape library to my server.

It got me thinking… this person was very proud of their 22TB. I think they think they're in some kind of big leagues. The availability of inexpensive slow external and internal disks (things like 8TB SMR disks from Seagate) prompted a vehement reaction: you should never use USB storage, and important files and any large amount of data should always be on a server!

It's an interesting and ultimately unreasonable expectation. There are many datasets that need to be stored locally. Lightroom libraries, as one example, cannot be stored on a server, although you can use synchronization tools to keep that data on multiple desktops.

Back to servers, though… storing large amounts of data is less expensive than it has ever been. 8TB disks, SMR and PMR, are inexpensive enough that you could put five of them in a Drobo 5C and have 24 or 32TB of fault-tolerant storage on a server or desktop. You can put six 3.5-inch disks in most entry-level servers today, so you're talking about 32 or 40TB of fault-tolerant storage. (Side note: I'm considering a USB 3.0/3.1 card for TECT and a Drobo 5N or 5C as a backup system, perhaps as a sort of "disk to disk to other disk" setup or as the tier you see before moving things to tapes or RDX cartridges manually.)
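Those usable figures are just the raw total minus one or two drives' worth of redundancy. A quick sketch of the arithmetic, with a made-up helper function purely for illustration:

def usable_tb(drive_count, drive_size_tb, parity_drives):
    # Usable capacity after setting aside redundancy/parity drives.
    return (drive_count - parity_drives) * drive_size_tb

print(usable_tb(5, 8, 1))  # five 8TB drives, single-drive redundancy: 32 TB
print(usable_tb(5, 8, 2))  # five 8TB drives, dual-drive redundancy: 24 TB
print(usable_tb(6, 8, 1))  # six 8TB drives, single-drive redundancy: 40 TB
print(usable_tb(6, 8, 2))  # six 8TB drives, dual-drive redundancy: 32 TB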

I'm sure that in its day there was a lot of success and cost savings to be had by using a tape storage tier. Today, I just don't think it's reasonable. You can buy extremely dense disk enclosures from normal PC vendors these days, and for any enclosure attached to a controller that supports 4k sectors, you can easily swap out old disks for new ones to increase density and capacity on existing systems. You can often switch out disks one at a time and then grow into the new capacity using whatever software or RAID controller you're using.

Given that it's possible to equip a single disk enclosure with over 600 terabytes of disks, I'm not sure it's necessary for tape-based storage tiering products to exist. With disks using less energy than ever, density going up, disks costing less than tape below a few hundred terabytes, and the ability to spin disks down, it doesn't make an awful lot of sense to bother with tapes. Especially since at "web scale" you need several copies of a piece of data anyway, tiered storage with tapes is going to have to work that much harder to perform at the expected level.

I think it's neat that somebody has done it and documented it, but there are few enough situations where it's relevant that it probably didn't make sense to keep in the software. Using deduplication to reduce the overall load on a server's disks and using true archiving solutions to offload old data probably makes more sense than using tape storage in this context. I think tape archiving and tape backups are good ideas, but a tape-backed file server doesn't make as much sense.
