Miscellanea - Python Sprints, Nasuni, etc.

by jesse

I've obviously been quiet here on my personal blog - as everyone who reads regularly knows, I'm neck-deep in a pretty exciting startup called Nasuni as well as doing other projects, like the PSF Sponsored sprints thing. That, combined with Twitter, means my time for additional long-form content is minimal. So here's a small roundup of interesting things:


Yup, still running Python and Django! We're actually pretty proud to be a sponsor for DjangoCon 2010 coming up in September - I'll be attending, so I hope to see all the familiar Django faces I know, and meet some new ones.

I've been blogging semi-regularly for the Nasuni blog itself - my posts are focused on product-things more than anything else. Here's a small list of posts which I've done:

  • The Road to Release - Feature Previews - this is actually my latest one, and the first in a series where I'll be showing off some of the new features we're adding in the latest release.
  • Looking at OpenStack, a Rackspace and NASA initiative - For those of you who don't know, Rackspace and NASA announced OpenStack - the awesome part? It's all Python - I had the Swift component (which powers Rackspace's cloudfiles system) of OpenStack running pretty quickly. I'd recommend snagging the code from Launchpad and taking a look. Swift (the storage component) uses eventlet - and Nova (the compute part) uses Tornado and Twisted.
  • Storage Switzerland Test Drives the Filer - This is a response to an article written about the product - I actually used it to preview some of the work going into the next release of the Filer.
  • Thanks to Django - This piece goes into some detail about our use of Django; it's one of our ways of saying thanks. I still need to rework it so we can send it over for the Django Success Stories page.
  • Thanks to the Supporting Cast - This is an earlier thank you post - but to the other people who have helped out a ton, including Greg Newman, Lincoln Loop, and Revsys.
  • The Donut Solution - This was a fun one, mainly to show that yes - we're listening hard to customer feedback, and we're improving/iterating quickly. Also, I get to show off UI improvements.
  • Finally - The Nasuni Blog team - this is the rosetta stone for the authors of the blog, describing who we are. I didn't write this piece, but it's good reading to figure out who is who.

If you're interested in Nasuni - or cloud storage in general - I'd encourage you to sign up for the RSS feed. We're trying to keep the information useful outside of "just us" (despite my urge and predilection to churning out completely product-related posts) - and if you ever have feedback, drop us a line.

PSF Sponsored Sprints

The project continues on - we've funded two sprints so far, and have several more coming down the pike. We're always in need of volunteers to help us do things like the manuals and site maintenance/content authoring. Here are some highlights:

  • The call for applications is open - and now I suspect we won't be closing it. Originally, I thought we'd have to do things in waves of apply-approve. As time has progressed, I no longer think this is the case.
  • Montreal Python Packaging sprint wrap up - the wrap up report for our first sprint!
  • Europython core sprint report - another wrap up report for the core sprint we provided funds to.
  • Just added the locations page - we now have people/companies offering up space for sprinters! Check it out!
  • Finally - Sprints at PyOhio - PyOhio is going on this weekend; if you're in the area you should really go check it out! Catherine has gone above and beyond with the entire "become a contributor" effort going on.

Please! If you're thinking about holding a sprint - send us an application! Heck, even if we're not sponsoring it, we'll help promote you via the blog and the sprint calendar we have up. A little fact? The sprints we've funded so far, and those on deck for funding, are all outside of the US, which is both awesome and surprising!

PSF Board

Some of you probably know that I'm currently on the board of directors for the PSF - things progress well here, but I mainly wanted to call out the excellent blog Doug Hellmann has been authoring for PSF news. You should really be watching that because yes - we do do things, and hopefully over the next year, we'll be doing more awesome things.

I've actually got a bigger post in the works on what I think the ultimate mission of the PSF is/should be, as well as "how do you get money from us". Must find the time!

Say Hello - Nasuni Launches Today!

by jesse

The company I've worked for since July of last year - Nasuni Corporation (a startup in Massachusetts) - has gone live! This is the culmination of a lot of hard, but exceedingly fun and exciting work over the past months. The Nasuni team is an excellent one - and one I am very, very proud to be a part of. Our product is called the Nasuni Filer - a simple-to-use, versioned, encrypted and cloud-storage backed virtual NAS (network attached storage) server (click here for more information).

Without going into all of the features, our goal in making this was to make cloud storage simple, accessible and secure - and I know we've accomplished all three. All you do is download it, boot it and start using it - once you do so you have access to truly unlimited storage. It's an unlimited filesystem for the cloud. Here's the elevator pitch:

Nasuni has developed a virtual file server, called the Nasuni Filer, that delivers unlimited file storage and complete file protection for businesses. Working in partnership with leading cloud storage vendors, the Nasuni Filer leverages the vast capacity of the cloud to store and protect company files offsite, while retaining the local functionality and performance of a traditional NAS.

This technology allows businesses to use the cloud provider of their choice as a replacement for traditional primary storage. Snapshots, file versioning, and offsite storage are integrated into the file server itself - ensuring business files are safe and secure at all times. No need to manage complex backup and DR schemes - if the file server is running, files are protected.

We've launched the Beta of the product today - anyone can sign up, download and use it. Anyone can give us feedback and suggestions - I encourage all of you who might need something like this to download and give it a try. If you want - go check out the videos we've put together showcasing the Filer (and better yet - check out the awesome animated cartoon we have on the front page).

Most of you know that my blog is mainly Python oriented. Suffice it to say, Nasuni - and the Nasuni Filer - make use of Python for a wide range of tasks. We use Python, Django and as much of the Python ecosystem as we can to drive everything from the website to the GUI on the appliance itself - Python is part of the DNA of the company, and it has served us well. Without Open Source and Python, I don't think it would have been possible to build what we have built in as little time as we have.

We have a strong dedication to not just Python, but open source in general (and a fair number of us will be at PyCon this month). As time progresses, now that we're exiting stealth mode we plan on possibly open sourcing stuff we feel would benefit the community. Some of us already push patches back where and when we can, but as I said - as time progresses this involvement will only increase.

So not only am I proud to announce the product, be part of this team and see what we've made, I'm also happy to thank the many people in the Python and OSS community who have helped us reach this point.

So go - check it out, let us know what you think.

The T in IT: Mr. T endorses Hitachi Gear

by jesse

Before I give the links to the videos, I want to give the typical disclaimer:

Disclaimer: The opinions expressed here are my personal opinions, views, discussions, etc. Content published here is not read or approved in advance by HDS, my wife or anyone else for that matter and does not - in any way - reflect the views, opinions or positions of my employer. This is my personal, largely Python-related blog. Not my employer's.

That being said: A few months ago, I discovered (much by accident) that HDS (Hitachi Data Systems) has started a viral marketing campaign involving Mr. T - yes, the man from the A-Team (whose face graced my lunchbox as a child). Note that massive "lulz" were attained when watching these.

Without passing judgement or in any way stating a direct opinion, here are the videos, in order of creation:

For additional amusement, I will direct you to the Archivas (before we were bought by HDS) viral/spoof/etc video that made it to youtube, here.

Google’s Drive Study

by jesse

I saw this post on Slashdot the other day - it's a paper called "Failure Trends in a Large Disk Drive Population". It's a good read for anyone in the storage business - hell, it's a good read for anyone interested in computers. In section 5, under conclusions, they state:

In this study we report on the failure characteristics of consumer-grade disk drives. To our knowledge, the study is unprecedented in that it uses a much larger population size than has been previously reported and presents a comprehensive analysis of the correlation between failures and several parameters that are believed to affect disk lifetime. Such analysis is made possible by a new highly parallel health data collection and analysis infrastructure, and by the sheer size of our computing deployment. One of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels. Such correlations have been repeatedly highlighted by previous studies, but we are unable to confirm them by observing our population. Although our data do not allow us to conclude that there is no such correlation, it provides strong evidence to suggest that other effects may be more prominent in affecting disk drive reliability in the context of a professionally managed data center deployment.

These two points are interesting. In some of the labs I've worked in, an astonishing number of drives die regularly. The manufacturer/distributor excuse has always been "heat issues" or "use cases". Admittedly, the temperature range Google capped at was 50 degrees Celsius (122 degrees Fahrenheit). In a rack with densely stacked servers (1-2U machines, rack filled), with those machines running at 75% CPU load and above, non-stop disk I/O (read, write, delete/format) and constant machine power cycles, the temperature inside the rack could spike far past the 122-degree mark, which is the point at which the failure trend Google charts starts to spike again.

Of course, in the labs I've been in, we were using these as test bed machines - total/high reliability was not something direly important for the simple fact that these machines were disposable.

Even with that in mind: You should always assume your disk drives are going to fail sooner than you expect. That is especially true of the MTBF of a large enough pool of disks not set up in a "smart" configuration (i.e. RAID, arrays, etc.). I'm not talking about consumer-use patterns (although I just had a drive go south on my laptop) - I'm talking about datacenter/IT/etc. use cases.

The Google paper is a good reference case, but you should remember that all use patterns are different. An application/test or system that really puts the disks to use can cause drive failures much earlier than you (or any paper) might assume. A good chunk of the "storage industry" realized this long ago - this is why companies (cough) work on software applications and intelligent hardware "wrappers" (arrays, raids, etc) to work around the basic assumption that in a large enough pool of drives, you're going to have near constant drive failure. People might disagree with the prices or methodology, but the fact remains that the basic assumption is true.
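To put rough numbers on that assumption, here is a minimal back-of-the-envelope sketch in Python. It is not from the Google paper: it assumes every drive fails independently at a constant rate (an exponential model, which is a simplification), and the MTBF figure and pool size in the example are made-up illustrative values, not vendor or Google data. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def pool_hours_between_failures(mtbf_hours, num_drives):
    """Average hours between failures somewhere in the pool.

    Assumes independent drives with a constant failure rate, so the
    pool's failure rate is just num_drives times one drive's rate.
    """
    return mtbf_hours / num_drives


def expected_failures_per_year(mtbf_hours, num_drives):
    """Expected number of drive failures across the pool in one year."""
    return num_drives * HOURS_PER_YEAR / mtbf_hours


def prob_failure_within(hours, mtbf_hours, num_drives):
    """Probability that at least one drive in the pool fails within `hours`."""
    return 1.0 - math.exp(-num_drives * hours / mtbf_hours)


if __name__ == "__main__":
    # Illustrative assumptions only: a 500,000 hour per-drive MTBF
    # and a pool of 10,000 drives.
    mtbf, drives = 500000, 10000
    print(pool_hours_between_failures(mtbf, drives))   # hours between failures
    print(expected_failures_per_year(mtbf, drives))    # failures per year
    print(prob_failure_within(24, mtbf, drives))       # chance of a failure today
```

With those made-up inputs, a per-drive MTBF that sounds like decades works out to a dead drive somewhere in the pool every couple of days. Heavier use patterns would only shorten that.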

Of course, that reasoning can be held for any piece of hardware in the typical data center. Apply too much heat/load to a pool of machines and your failure rate is going to be high unless the machines were designed with high reliability in mind (which normally indicates RAID/Fiber/etc. storage).

In any case, the paper is a good read. I've gone and started rambling. If you're looking for some tools to test drives/filesystems in general, I'd take a look at the standard Bonnie/Bonnie++ and other tools, but also take a look at Rugg (built in Python). Also remember that it's important to stress a drive below the filesystem layer. Typically, this means raw-writing to the device - if your job is to test drive speed/reliability or test the reliability of drive drivers for your operating system, that's a step you can't forget.
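As a rough idea of what a raw write-and-verify pass can look like, here is a minimal Python 3 sketch. It is not how Bonnie++ or Rugg work. The function names are hypothetical, it assumes a POSIX-style system, and the demo at the bottom runs against a scratch file as a stand-in. Pointing it at a real device node overwrites whatever is on that device, so only do that with a disposable disk.

```python
import os
import tempfile

BLOCK_SIZE = 4096


def make_block(index, size=BLOCK_SIZE):
    """Build a block stamped with its own index so misplaced writes show up."""
    stamp = index.to_bytes(8, "big")
    return (stamp * (size // len(stamp) + 1))[:size]


def write_and_verify(path, num_blocks, block_size=BLOCK_SIZE):
    """Write patterned blocks to `path`, read them back, return bad block indices."""
    fd = os.open(path, os.O_RDWR)
    try:
        for i in range(num_blocks):
            os.lseek(fd, i * block_size, os.SEEK_SET)
            data = make_block(i, block_size)
            while data:  # os.write may write fewer bytes than asked
                written = os.write(fd, data)
                data = data[written:]
        os.fsync(fd)  # push the writes out of the OS buffers

        bad = []
        for i in range(num_blocks):
            os.lseek(fd, i * block_size, os.SEEK_SET)
            chunks, remaining = [], block_size
            while remaining:  # os.read may return short reads
                chunk = os.read(fd, remaining)
                if not chunk:
                    break
                chunks.append(chunk)
                remaining -= len(chunk)
            if b"".join(chunks) != make_block(i, block_size):
                bad.append(i)
        return bad
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Scratch file standing in for a device node; a real run would pass
    # the path of a disposable raw device instead.
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        target = scratch.name
    try:
        print(write_and_verify(target, num_blocks=16))
    finally:
        os.remove(target)
```

One caveat with this sketch: the read-back can be served from the operating system's cache rather than the platters. A serious test would need to bypass that cache (on Linux, for example, by opening with O_DIRECT and aligned buffers), which is left out here to keep things short.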

Update: StorageMojo has a more detailed breakdown.