
The T in IT: Mr. T endorses Hitachi Gear

August 22nd, 2007 | Posted in Comedy, Personal, Storage

Before I give the links to the videos, I want to give the typical disclaimer:

Disclaimer: The opinions expressed here are my personal opinions, views, discussions, etc. Content published here is not read or approved in advance by HDS, my wife or anyone else for that matter and does not - in any way - reflect the views, opinions, positions, etc. of my employer. This is my personal, largely Python-related blog. Not my employer's.

That being said: A few months ago, I discovered (much by accident) that HDS (Hitachi Data Systems) has started a viral marketing campaign involving Mr. T - yes, the man from the A-Team (whose face graced my lunchbox as a child). Note that massive "lulz" were attained when watching these.

Without passing judgement or in any way stating a direct opinion, here are the videos, in order of creation:

For additional amusement, I will direct you to the Archivas (before we were bought by HDS) viral/spoof/etc. video that made it to YouTube, here.

Google’s Drive Study

February 19th, 2007 | Posted in Storage, Technology


I saw this post on Slashdot the other day - it's a paper called "Failure Trends in a Large Disk Drive Population". It's a good read for anyone in the storage business - hell, it's a good read for anyone interested in computers. In section 5, under conclusions, they state:

In this study we report on the failure characteristics of consumer-grade disk drives. To our knowledge, the study is unprecedented in that it uses a much larger population size than has been previously reported and presents a comprehensive analysis of the correlation between failures and several parameters that are believed to affect disk lifetime. Such analysis is made possible by a new highly parallel health data collection and analysis infrastructure, and by the sheer size of our computing deployment.
One of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels. Such correlations have been repeatedly highlighted by previous studies, but we are unable to confirm them by observing our population. Although our data do not allow us to conclude that there is no such correlation, it provides strong evidence to suggest that other effects may be more prominent in affecting disk drive reliability in the context of a professionally managed data center deployment.

These two points are interesting. In some of the labs I've worked in, an astonishing number of drives die regularly. The manufacturer/distributor excuse has always been "heat issues" or "use cases". Admittedly, the temperature range Google capped at was 50 Celsius (122 Fahrenheit). In a rack densely stacked with servers (1-2U machines, rack filled), with those machines running at 75% CPU load and above, non-stop disk I/O (read, write, delete/format) and constant machine power cycles, the temperature inside the rack could spike far past the 122 mark, at which point the failure trend Google charts starts to spike again.

Of course, in the labs I've been in, we were using these as test bed machines - total/high reliability was not something direly important for the simple fact that these machines were disposable.

Even with that in mind: You should always assume your disk drives are going to fail sooner than you expect, whatever the MTBF says, and especially in a large enough pool of disks not configured in a "smart" configuration (e.g. RAID, arrays, etc.). I'm not talking about consumer-use patterns (although I just had a drive go south on my laptop) - I'm talking about datacenter/IT/etc. use cases.

The Google paper is a good reference case, but you should remember that all use patterns are different. An application/test or system that really puts the disks to use can cause drive failures much earlier than you (or any paper) might assume. A good chunk of the "storage industry" realized this long ago - this is why companies (cough) work on software applications and intelligent hardware "wrappers" (arrays, raids, etc) to work around the basic assumption that in a large enough pool of drives, you're going to have near constant drive failure. People might disagree with the prices or methodology, but the fact remains that the basic assumption is true.
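To put rough numbers on that "near constant drive failure" assumption, here is a minimal back-of-the-envelope sketch. It assumes a constant failure rate per drive and that failed drives get replaced, which is a simplification, and the pool size and MTBF used below are made-up illustrative figures, not numbers from the Google paper or from any vendor. The function names are mine.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def expected_failures_per_year(drive_count, mtbf_hours):
    """Expected drive failures per year across the whole pool.

    Assumes every drive fails at a constant rate of 1 / mtbf_hours
    and that failed drives are replaced, so the pool size stays fixed.
    """
    return drive_count * HOURS_PER_YEAR / mtbf_hours


def days_between_failures(drive_count, mtbf_hours):
    """Average gap, in days, between two failures somewhere in the pool."""
    return mtbf_hours / drive_count / 24.0


if __name__ == "__main__":
    # Hypothetical pool: 10,000 drives, each with a 1,000,000 hour MTBF.
    drives, mtbf = 10000, 1000000
    print(expected_failures_per_year(drives, mtbf))
    print(days_between_failures(drives, mtbf))
```

Even with a very generous per-drive MTBF, the gap between failures shrinks in direct proportion to the size of the pool, which is the whole point: at scale you plan for a steady stream of dead drives.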

Of course, that reasoning can be held for any piece of hardware in the typical data center. Apply too much heat/load to a pool of machines and your failure rate is going to be high unless the machines were designed with high reliability in mind (which normally indicates RAID/Fiber/etc. storage).

In any case, the paper is a good read. I've gone and started rambling. If you're looking for some tools to test drives/filesystems in general, I'd take a look at the standard Bonnie/Bonnie++ and other tools, but also take a look at Rugg (built in Python) and also remember that it's important to stress a drive below the filesystem layer. Typically, this means raw-writing to the device - if your job is to test drive speed/reliability or test the reliability of drive drivers for your operating system, that's a step you can't forget.
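As a rough idea of what "raw-writing to the device" can look like, here is a minimal Python sketch that writes numbered blocks to a path with plain os-level calls and then reads them back to verify. This is my own illustration, not how Bonnie++ or Rugg work. The function names and block size are arbitrary, it assumes a Unix-like system, and pointing it at a real device node (instead of a scratch file) will destroy whatever is on that device.

```python
import os

BLOCK_SIZE = 4096  # arbitrary; must be a multiple of 8 for make_block()
O_BINARY = getattr(os, "O_BINARY", 0)  # only exists (and matters) on Windows


def make_block(index, block_size=BLOCK_SIZE):
    """Build a block filled with its own index, so misplaced data shows up."""
    header = index.to_bytes(8, "big")
    return header * (block_size // len(header))


def write_all(fd, data):
    """os.write() may write fewer bytes than asked; loop until done."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def read_exact(fd, size):
    """os.read() may return short reads; loop until size bytes or EOF."""
    chunks = []
    while size > 0:
        chunk = os.read(fd, size)
        if not chunk:
            break
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def write_pattern(path, block_count, block_size=BLOCK_SIZE):
    """Write block_count numbered blocks to path and flush them out."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | O_BINARY, 0o600)
    try:
        for index in range(block_count):
            os.lseek(fd, index * block_size, os.SEEK_SET)
            write_all(fd, make_block(index, block_size))
        os.fsync(fd)  # ask the OS to push the data to the underlying storage
    finally:
        os.close(fd)


def verify_pattern(path, block_count, block_size=BLOCK_SIZE):
    """Return the indexes of blocks whose contents do not match."""
    bad_blocks = []
    fd = os.open(path, os.O_RDONLY | O_BINARY)
    try:
        for index in range(block_count):
            os.lseek(fd, index * block_size, os.SEEK_SET)
            if read_exact(fd, block_size) != make_block(index, block_size):
                bad_blocks.append(index)
    finally:
        os.close(fd)
    return bad_blocks
```

One caveat: the read-back here can be served from the operating system's cache rather than the disk itself. A real test would need to bypass or drop that cache (on Linux, opening with O_DIRECT is one option, with its own alignment requirements), which I have left out to keep the sketch short.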

Update: StorageMojo has a more detailed breakdown.

Hitachi buys Archivas.

February 6th, 2007 | Posted in Storage, Technology


That's right. My awesome employer, a storage startup called Archivas, is being purchased by Hitachi Data Systems, a wholly owned subsidiary of Hitachi Limited.

This is, of course, awesome news for us as a company - but on a personal level this purchase shows a belief and commitment on the part of Hitachi when it comes to the product and technology I've put blood and sweat and tears into for the past 3 years.

Some of the news links:
From the HDS site
Search Storage
eWeek Coverage
MarketWatch

When I go to PyCon this month - I get to go as an HDS employee. This is truly awesome.

HDS, Archivas team on fixed-content archiving

February 27th, 2006 | Posted in Storage, Technology

HDS, Archivas team on fixed-content archiving
Hitachi Data Systems and Archivas Forge Global Partnership to Create Solutions for a New “Active Archive” Market Space
And that's all I am going to say about that.

The essence of a long-term digital archive

January 16th, 2006 | Posted in Storage, Technology

You should read this: The essence of a long-term digital archive

An excellent article describing digital archives in the modern IT field. For most people, this article might seem irrelevant (in the scope of things I have talked about previously - it is). However, given that this is the field I work in, it's a subject near and dear to me.

Of course, in the interest of full disclosure, I work in this field, and I also happen to know the article's author.

A note on "what python is used for" - and a Storage note.

October 4th, 2005 | Posted in Programming, Python, Storage, Technology

First, I saw this: Question: "What exactly is python used for...

The question itself is inane - but the answer is what caught my eye.

To Quote:
Too busy to answer properly, but I'll just mention that Fermilab uses a home-grown python package called Enstore to manage their data store of 3 Petabytes of physics data, growing at 1PB/year. The transfers of ~25TB/day to and from that system is what keeps me busy.

And then he provides this link: Presentation about the Enstore System

The answer was from a Mailing List.

The full response is at that link. Googling Enstore yields a few links:

Access to Mass Storage at FNAL

And: Dcache.Org

This is some code I have got to see.

And to add: Information on the Enstore Project