<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Python: Does it scale?</title>
	<atom:link href="http://jessenoller.com/2007/08/30/python-does-it-scale/feed/" rel="self" type="application/rss+xml" />
	<link>http://jessenoller.com/2007/08/30/python-does-it-scale/</link>
	<description>python, programming and other things</description>
	<pubDate>Thu, 24 Jul 2008 05:35:27 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: jesse</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-4934</link>
		<dc:creator>jesse</dc:creator>
		<pubDate>Sun, 09 Sep 2007 13:59:47 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-4934</guid>
		<description>Interesting. I had not heard of this - but it hasn't been updated since 2006 :(</description>
		<content:encoded><![CDATA[<p>Interesting. I had not heard of this - but it hasn&#8217;t been updated since 2006 :(</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: J Esteves</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-4846</link>
		<dc:creator>J Esteves</dc:creator>
		<pubDate>Sat, 08 Sep 2007 21:56:44 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-4846</guid>
		<description>&lt;blockquote&gt;
"A large scale, highly distributed storage system with one goal: it can never, ever lose data. So the better question is: Is Python, as a language, appropriate for distributed/fault tolerant mission-critical-zero-data-loss systems?"
&lt;/blockquote&gt;

Aching for Shane Hathaway to unveil &lt;a href="http://hathawaymix.org/Weblog/2006-05-17" rel="nofollow"&gt;Bit Mountain&lt;/a&gt;:

&lt;a href="http://hathawaymix.org/Writings/BitMountainPaper.doc" rel="nofollow"&gt;"The Bit Mountain Research Project"&lt;/a&gt;</description>
		<content:encoded><![CDATA[<blockquote><p>
&#8220;A large scale, highly distributed storage system with one goal: it can never, ever lose data. So the better question is: Is Python, as a language, appropriate for distributed/fault tolerant mission-critical-zero-data-loss systems?&#8221;
</p></blockquote>
<p>Aching for Shane Hathaway to unveil <a href="http://hathawaymix.org/Weblog/2006-05-17" rel="nofollow">Bit Mountain</a>:</p>
<p><a href="http://hathawaymix.org/Writings/BitMountainPaper.doc" rel="nofollow">&#8220;The Bit Mountain Research Project&#8221;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paddy3118</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3302</link>
		<dc:creator>Paddy3118</dc:creator>
		<pubDate>Thu, 30 Aug 2007 21:51:07 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3302</guid>
		<description>&lt;a href="http://www.selenic.com/mercurial/wiki/index.cgi" rel="nofollow"&gt;Mercurial&lt;/a&gt; is a distributed version control system that is going great guns. Sun has chosen it for &lt;a href="http://opensolaris.org/os/community/tools/scm/;jsessionid=0645B42AD2686B8608E044DBAAE374C8" rel="nofollow"&gt;OpenSolaris&lt;/a&gt; and other code bases.

- Paddy.</description>
		<content:encoded><![CDATA[<p><a href="http://www.selenic.com/mercurial/wiki/index.cgi" rel="nofollow">Mercurial</a> is a distributed version control system that is going great guns. Sun has chosen it for <a href="http://opensolaris.org/os/community/tools/scm/;jsessionid=0645B42AD2686B8608E044DBAAE374C8" rel="nofollow">OpenSolaris</a> and other code bases.</p>
<p>- Paddy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ray</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3289</link>
		<dc:creator>Ray</dc:creator>
		<pubDate>Thu, 30 Aug 2007 19:51:35 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3289</guid>
		<description>In terms of scaling up a project for developers (vs. scaling it up for speed/capacity), you can also take a look at TinyERP.com. I'm amazed at the ratio of functionality to code size in that project.

&lt;strong&gt;Not&lt;/strong&gt; having to write code is an advantage when scaling up teams.</description>
		<content:encoded><![CDATA[<p>In terms of scaling up a project for developers (vs. scaling it up for speed/capacity), you can also take a look at TinyERP.com. I&#8217;m amazed at the ratio of functionality to code size in that project.</p>
<p><strong>Not</strong> having to write code is an advantage when scaling up teams.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jesse</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3265</link>
		<dc:creator>jesse</dc:creator>
		<pubDate>Thu, 30 Aug 2007 17:36:49 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3265</guid>
		<description>I want to see the end result of Chandler. I've dealt with some of its small components, but I want to see the much larger finished product first.</description>
		<content:encoded><![CDATA[<p>I want to see the end result of Chandler. I&#8217;ve dealt with some of its small components, but I want to see the much larger finished product first.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jesse</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3264</link>
		<dc:creator>jesse</dc:creator>
		<pubDate>Thu, 30 Aug 2007 17:33:59 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3264</guid>
		<description>Wow, first off - thanks for taking the time to post that, here are some thoughts:

&lt;ul&gt;
&lt;li&gt;I was going to mention Allmydata-tahoe and its other relatives, but I haven't had a chance to dig into tahoe yet; it's probably the closest kin to the system I've dealt with.
&lt;li&gt; The quote "Python lets you easily take your first effort, break it down, and re-purpose existing code for a new set of constraints." is an excellent point.
&lt;li&gt;As for rewriting, I did not mean to imply "you would always have to rewrite" (in fact, I would always like the shortest path from prototype-&gt;production). I do know about the LAN vs. WAN latency issues in distributed systems, though. Depending on the system you're dealing with, it can admittedly be difficult to even reach the disk/network bottleneck; frequently you spend too much time elsewhere in the system layer.
&lt;li&gt;I have had multiple experiences with "You will re-write most of the code eventually, probably several times." and you're dead right.
&lt;li&gt;Is there a 6 degrees of Twisted game out there? :)
&lt;/ul&gt;

Overall, your points are very well made. Also note that I am not talking about a planned or current project: I already work (day to day) on a distributed filesystem/archiving system, and have for some time. The question I was (rather rhetorically) trying to answer is whether Python could "scale" up to these requirements.

Also, one of these days I am going to actually sit down and &lt;b&gt;write&lt;/b&gt; something in Twisted, dammit.</description>
		<content:encoded><![CDATA[<p>Wow, first off - thanks for taking the time to post that, here are some thoughts:</p>
<ul>
<li>I was going to mention Allmydata-tahoe and its other relatives, but I haven&#8217;t had a chance to dig into tahoe yet; it&#8217;s probably the closest kin to the system I&#8217;ve dealt with.
</li>
<li> The quote &#8220;Python lets you easily take your first effort, break it down, and re-purpose existing code for a new set of constraints.&#8221; is an excellent point.
</li>
<li>As for rewriting, I did not mean to imply &#8220;you would always have to rewrite&#8221; (in fact, I would always like the shortest path from prototype->production). I do know about the LAN vs. WAN latency issues in distributed systems, though. Depending on the system you&#8217;re dealing with, it can admittedly be difficult to even reach the disk/network bottleneck; frequently you spend too much time elsewhere in the system layer.
</li>
<li>I have had multiple experiences with &#8220;You will re-write most of the code eventually, probably several times.&#8221; and you&#8217;re dead right.
</li>
<li>Is there a 6 degrees of Twisted game out there? :)
</li>
</ul>
<p>Overall, your points are very well made. Also note that I am not talking about a planned or current project: I already work (day to day) on a distributed filesystem/archiving system, and have for some time. The question I was (rather rhetorically) trying to answer is whether Python could &#8220;scale&#8221; up to these requirements.</p>
<p>Also, one of these days I am going to actually sit down and <b>write</b> something in Twisted, dammit.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carl</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3263</link>
		<dc:creator>Carl</dc:creator>
		<pubDate>Thu, 30 Aug 2007 17:20:28 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3263</guid>
		<description>It's not quite live yet, but Chandler is written in Python and must qualify as a large, complex application with both desktop and back-end components.</description>
		<content:encoded><![CDATA[<p>It&#8217;s not quite live yet, but Chandler is written in Python and must qualify as a large, complex application with both desktop and back-end components.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: evgen</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3256</link>
		<dc:creator>evgen</dc:creator>
		<pubDate>Thu, 30 Aug 2007 16:52:51 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3256</guid>
		<description>You might consider the case of MojoNation, which begat BitTorrent, HiveCache, Mnet, and Allmydata-tahoe. It created an architecture for a large-scale, fault-tolerant, persistent distributed storage system similar to the Google File System (albeit a couple of years before GFS existed). The actual implementation did not meet all of its goals, but because it was written in Python, it was easier for the follow-on projects to pick up the pieces and rework them in more application-specific ways to meet various facets of the original goal (e.g. file sharing for BT, enterprise backups for HiveCache, etc.). Python lets you easily take your first effort, break it down, and re-purpose existing code for a new set of constraints.

Python has several advantages that you touch on briefly but that bear repeating. It makes prototyping easy, which is a big win. You may think that prototyping burns cash/time before you rewrite it in a "proper" language, but nothing about a project like this necessitates writing it in C/C++ or Java: once you are dealing with distributed storage across a WAN boundary, you will discover that managing network latency is the big bottleneck (unlike a LAN filesystem, where disk latency and component optimization can become an issue). Therefore you will be writing your prototypes in your "shipping" language, but will have more flexibility while you are building your product. The advantage of this cannot be overstated for a complex problem like a large-scale distributed system. You will re-write most of the code eventually, probably several times.

If you really think hard about your problem domain, the optimization you will need to do will be more about process and algorithm optimization than about whether or not a particular loop is running as fast as possible. For this sort of problem you will discover that no particular language is going to offer you more than a 5-10% speedup in the actual execution time of any particular component, so what you need to do is optimize programmer time. This is a _huge_ task.

One other point I really hate to make here is that you are going to want to look at Twisted instead of Stackless.  I really love Stackless and prefer it over Twisted whenever given the choice, but there is a sizeable amount of existing code in this particular area that is already written in Twisted and you will save yourself some time by choosing that framework over Stackless.</description>
		<content:encoded><![CDATA[<p>You might consider the case of MojoNation, which begat BitTorrent, HiveCache, Mnet, and Allmydata-tahoe. It created an architecture for a large-scale, fault-tolerant, persistent distributed storage system similar to the Google File System (albeit a couple of years before GFS existed). The actual implementation did not meet all of its goals, but because it was written in Python, it was easier for the follow-on projects to pick up the pieces and rework them in more application-specific ways to meet various facets of the original goal (e.g. file sharing for BT, enterprise backups for HiveCache, etc.). Python lets you easily take your first effort, break it down, and re-purpose existing code for a new set of constraints.</p>
<p>Python has several advantages that you touch on briefly but that bear repeating. It makes prototyping easy, which is a big win. You may think that prototyping burns cash/time before you rewrite it in a &#8220;proper&#8221; language, but nothing about a project like this necessitates writing it in C/C++ or Java: once you are dealing with distributed storage across a WAN boundary, you will discover that managing network latency is the big bottleneck (unlike a LAN filesystem, where disk latency and component optimization can become an issue). Therefore you will be writing your prototypes in your &#8220;shipping&#8221; language, but will have more flexibility while you are building your product. The advantage of this cannot be overstated for a complex problem like a large-scale distributed system. You will re-write most of the code eventually, probably several times.</p>
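<p>The claim that WAN latency, rather than raw execution speed, is the bottleneck can be illustrated with a minimal sketch. It uses the standard-library asyncio module as a swapped-in stand-in for Twisted (asyncio arrived years after this thread), and fetch_chunk is a hypothetical coroutine that only sleeps to simulate a network round trip:</p>

```python
import asyncio
import time


async def fetch_chunk(chunk_id: int, latency: float = 0.1) -> str:
    # Hypothetical remote storage call: the sleep simulates a WAN
    # round trip; no real network I/O happens here.
    await asyncio.sleep(latency)
    return f"chunk-{chunk_id}"


async def fetch_sequential(ids):
    # Each request waits for the previous one: total time is about n * latency.
    return [await fetch_chunk(i) for i in ids]


async def fetch_concurrent(ids):
    # All requests are in flight at once: total time is about one latency.
    return await asyncio.gather(*(fetch_chunk(i) for i in ids))


def timed(coro_fn, ids):
    """Run coro_fn(ids) on a fresh event loop; return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(ids))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = range(5)
    _, seq = timed(fetch_sequential, ids)
    _, con = timed(fetch_concurrent, ids)
    print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

<p>With five simulated 0.1 s round trips, the sequential version should take roughly five times as long as the concurrent one, no matter how fast the Python code between requests runs.</p>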
<p>If you really think hard about your problem domain, the optimization you will need to do will be more about process and algorithm optimization than about whether or not a particular loop is running as fast as possible. For this sort of problem you will discover that no particular language is going to offer you more than a 5-10% speedup in the actual execution time of any particular component, so what you need to do is optimize programmer time. This is a _huge_ task.</p>
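<p>A small, generic illustration of optimizing the algorithm rather than the loop (not taken from MojoNation or any project named here): both functions below deduplicate a list of hypothetical chunk IDs while preserving order, but the set-based version replaces a linear scan per item with an average constant-time hash lookup, a gain that tuning the list version's loop would not recover.</p>

```python
import time


def dedupe_list(chunk_ids):
    # O(n * k): each "not in" test scans every ID kept so far.
    seen = []
    for cid in chunk_ids:
        if cid not in seen:
            seen.append(cid)
    return seen


def dedupe_set(chunk_ids):
    # O(n) on average: set membership is a hash lookup.
    seen = set()
    out = []
    for cid in chunk_ids:
        if cid not in seen:
            seen.add(cid)
            out.append(cid)  # preserve first-seen order
    return out


if __name__ == "__main__":
    # Hypothetical chunk IDs with many repeats.
    ids = [i % 2000 for i in range(10000)]
    for fn in (dedupe_list, dedupe_set):
        start = time.perf_counter()
        result = fn(ids)
        print(fn.__name__, len(result), f"{time.perf_counter() - start:.3f}s")
```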
<p>One other point I really hate to make here is that you are going to want to look at Twisted instead of Stackless.  I really love Stackless and prefer it over Twisted whenever given the choice, but there is a sizeable amount of existing code in this particular area that is already written in Twisted and you will save yourself some time by choosing that framework over Stackless.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jesse</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3250</link>
		<dc:creator>jesse</dc:creator>
		<pubDate>Thu, 30 Aug 2007 16:12:48 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3250</guid>
		<description>I completely agree: that's why Python is so attractive in many cases, the speed at which you can go from "zero to hero" in it (especially given how little cash startups have to burn). The concept of "build it fast in Python; optimize when you have to" is key in this kind of discussion.

Maybe deep down inside that's the question/point that counts the most: what will let me get this done the fastest to "prove" out an idea/concept.

But when you're aiming at "something big" from position 0 (let's say, oh, a distributed filesystem), the time you spend prototyping can often chew into your cash and limited time, and then you run the risk of finding yourself having to re-implement in the "proper" domain language later (let's say, a 60% or more "optimization" rewrite in Java or C++).

The second half of that is: does Python scale in teams (i.e., can duck typing scale)?</description>
		<content:encoded><![CDATA[<p>I completely agree: that&#8217;s why Python is so attractive in many cases, the speed at which you can go from &#8220;zero to hero&#8221; in it (especially given how little cash startups have to burn). The concept of &#8220;build it fast in Python; optimize when you have to&#8221; is key in this kind of discussion.</p>
<p>Maybe deep down inside that&#8217;s the question/point that counts the most: what will let me get this done the fastest to &#8220;prove&#8221; out an idea/concept.</p>
<p>But when you&#8217;re aiming at &#8220;something big&#8221; from position 0 (let&#8217;s say, oh, a distributed filesystem), the time you spend prototyping can often chew into your cash and limited time, and then you run the risk of finding yourself having to re-implement in the &#8220;proper&#8221; domain language later (let&#8217;s say, a 60% or more &#8220;optimization&#8221; rewrite in Java or C++).</p>
<p>The second half of that is: does Python scale in teams (i.e., can duck typing scale)?</p>
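<p>One later answer to the team-scaling question, assuming Python 3.8 or newer (so not an option when this was written), is structural typing with typing.Protocol: code stays duck-typed at runtime, but a static checker such as mypy can verify that every object passed around provides the expected methods. BlobStore, MemoryStore and replicate are hypothetical names for illustration:</p>

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class BlobStore(Protocol):
    """Structural interface: any class with these methods qualifies,
    without inheriting from BlobStore (duck typing a checker can verify)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class MemoryStore:
    # Hypothetical in-memory backend; note it does NOT subclass BlobStore.
    def __init__(self) -> None:
        self._blobs: dict = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def replicate(src: BlobStore, dst: BlobStore, key: str) -> None:
    # A static checker flags callers passing objects that lack put/get
    # before the code ever runs.
    dst.put(key, src.get(key))


if __name__ == "__main__":
    a, b = MemoryStore(), MemoryStore()
    a.put("k", b"payload")
    replicate(a, b, "k")
    # runtime_checkable isinstance() only checks that the methods exist,
    # not their signatures.
    print(isinstance(a, BlobStore), b.get("k"))
```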
]]></content:encoded>
	</item>
	<item>
		<title>By: JohnMc</title>
		<link>http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3249</link>
		<dc:creator>JohnMc</dc:creator>
		<pubDate>Thu, 30 Aug 2007 16:04:37 +0000</pubDate>
		<guid isPermaLink="false">http://jessenoller.com/2007/08/30/python-does-it-scale/#comment-3249</guid>
		<description>Jesse, I guess I have a different view on 'the big thing'. I worked on one of two major projects at a previous employer, and I have to tell you that in my experience conceptualization was a bigger problem than scalability. You only have a finite amount of time to get the idea from paper napkin to prototype. My tools then were C++ and Scheme; Python had not even broken onto the scene.

Now the landscape is so different. I would have leaped at Python back then if I had had access to it. One of Python's strengths I have used time and again is its ability to glue pieces together. If profiling a Python app shows delays, then most likely I can get someone on staff to rewrite the slow part in native C or assembly so it runs more efficiently. At the scale of something like YouTube, I don't think languages per se are the issue.</description>
		<content:encoded><![CDATA[<p>Jesse, I guess I have a different view on &#8216;the big thing&#8217;. I worked on one of two major projects at a previous employer, and I have to tell you that in my experience conceptualization was a bigger problem than scalability. You only have a finite amount of time to get the idea from paper napkin to prototype. My tools then were C++ and Scheme; Python had not even broken onto the scene.</p>
<p>Now the landscape is so different. I would have leaped at Python back then if I had had access to it. One of Python&#8217;s strengths I have used time and again is its ability to glue pieces together. If profiling a Python app shows delays, then most likely I can get someone on staff to rewrite the slow part in native C or assembly so it runs more efficiently. At the scale of something like YouTube, I don&#8217;t think languages per se are the issue.</p>
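<p>The profile-first workflow can be sketched with the standard-library cProfile and pstats modules. handle_request and checksum are hypothetical stand-ins; whichever function dominates the cumulative-time report would be the candidate for rewriting in C:</p>

```python
import cProfile
import io
import pstats


def checksum(data: bytes) -> int:
    # Hypothetical hot spot: a pure-Python loop over every byte.
    total = 0
    for b in data:
        total = (total * 31 + b) % 1000003
    return total


def handle_request(n: int) -> int:
    # Hypothetical request handler that spends most of its time in checksum.
    return checksum(bytes(range(256)) * n)


def profile_top(func, *args, limit: int = 5) -> str:
    """Run func(*args) under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(handle_request, 200))
```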
]]></content:encoded>
	</item>
</channel>
</rss>
