Stirred up dem bees: Should BSDDB be removed from Python?

This week, we’ve seen a push dev-wise to get RC1 completed and ready to go – I’ve spent some time giving multiprocessing some love (still not done) and a lot of other people have been working around the clock to close out the large number of release blockers.

As of last night though, the trigger was pulled on removing bsddb (the Berkeley DB Python module) from the standard library in the 3.0 timeline (2.6 adds deprecation warnings).

Now, before anyone thinks this is an arbitrary decision, here’s the argument (in a nutshell):

  • bsddb has always been painful to maintain
  • Jesus Cea is the only person who has stepped up to maintain it
  • bsddb is “heavy weight” – out of everything in the standard library, it has the most dependencies and the most nuances for cross-platform maintenance.
  • Until Jesus Cea stepped up later in the 2.6/3.0 process it was “one of those packages” that no one wanted to maintain.
  • For most of 2.6 and 3.0 it’s been a buildbot fail train.

See PEP 3108:

Maintenance Burden

Over the years, certain modules have become a heavy burden upon python-dev to maintain. In situations like this, it is better for the module to be given to the community to maintain to free python-dev to focus more on language support and other modules in the standard library that do not take up an undue amount of time and effort.

bsddb3

  • Externally maintained at http://www.jcea.es/programacion/pybsddb.htm .
  • Consistent testing instability.
  • Berkeley DB follows a different release schedule than Python, leading to the bindings not necessarily being in sync with what is available.

This thread is where the hammer fell.

Now, note that Jesus Cea has done an amazing amount of work updating/upgrading the bsddb support for 2.6 and 3.0 (see his recent announcement here). I feel for him in a lot of respects: He busted his butt to fix, maintain and resolve all open issues with bsddb and the buildbots for the release, but the decision had been made back in July to remove/deprecate the bsddb package (see above).

Now, there is a lot more discussion occurring around the removal:

Edit: I finally got a free moment to do an update. In an email this afternoon on Python 3000, the BDFL (GvR) made the final decision on bsddb – it’s out as of py3k:

I am still in favor of removing bsddb from Python 3.0. It depends on a
3rd party library of enormous complexity whose stability cannot always
be taken for granted. Arguments about code ownership, release cycles,
bugbot stability and more all point towards keeping it separate. I
consider it no different in nature than 3rd party UI packages (e.g.
wxPython or PyQt) or relational database bindings (e.g. the MySQL or
PostgreSQL bindings): very useful to a certain class of users, but
outside the scope of the core distribution.

Python 3.0 is a perfect opportunity to say goodbye to bsddb as a
standard library component. For apps that depend on it, it is just a
download away — deprecating in 3.0 and removal in 3.1 would actually
send the *wrong* message, since it is very much alive! I am grateful
for Jesus to have taken over maintenance, and hope that the package
blossoms in its newfound freedom.

  • Benjamin Peterson
    Skip has already supplied a proof-of-concept sqlite backend: http://bugs.python.org/issue3783
  • Skip is a machine simply posing as a man
  • Benjamin Peterson
    I thought that was the TimBot, actually.
  • No, see, the timbot doesn't pretend to be a man. Timbot is simply timbot
  • Brett
    Just so people know, Barry said the only way he would change his mind was if Guido stepped in and said bsddb should go. Guido has subsequently stated that he supports removing bsddb in 3.0 (this does not affect 2.x).
  • It's a pity that the final decision was to remove the module, though I can see the rationale.

    I've found bsddb to be enormously useful (including situations where sqlite's performance was very low in comparison). Thanks a lot to Jesus Cea for maintaining pybsddb and I hope he keeps up doing this.
  • I was in meetings - but I finally updated the post with the pronouncement.
  • crb
    sqlite is great for a relational DB, but bsddb gives the developer control over creating app specific stores. That's huge, especially given the experimentation happening these days on alternative ways of storing and indexing. Consider this one vote for keeping it in.
  • klaus
    Nuke it. sqlite (which is a proper db) is already part of the battery pack. If anyone needs bdb support they can get the standalone package and install it themselves. Maybe someone can set it up on PyPI for easy retrieval post-3.0. But I see no need to keep it. In any case, you can keep running the 2.5/2.6 series until the cows come home, which I suspect is going to be the case for quite a while after 3.0 is released.

    Seriously, how difficult would it be to write a simple shelve replacement that uses sqlite behind the scenes? My guess is not much.
  • John M. Camara
    Klaus,

    Sqlite may be a proper db in your opinion but that's only due to you not understanding why different types of databases exist. At times a relational database is the wrong tool for the job and in these cases bsddb can be 10s of thousands of times faster. Chances are you just work on applications that work well with relational databases so you don't see the point of using other types.

    To get an idea of the differences between bsddb and a relational database you could read the following white paper from Oracle. Just take it with a grain of salt, as they like to paint a prettier picture when talking about relational databases, which should be no surprise as that's their big money maker. Anyway, it's always best to choose the right tool for the job.

    http://www.oracle.com/database/docs/Berkeley-DB...
  • John M. Camara
    Another good reference is the first few pages of the Introduction to Berkeley DB Reference Guide.

    http://pybsddb.sourceforge.net/ref/intro/data.html
  • So, does this mean that shelve will be even less functional in Py3k?

    It's not like shelve seems to get a lot of love, either, but it's a package I use every day.
  • Raymond H. has already brought this up in the thread. Yes - it affects shelve, and that's one of the possible reasons for leaving it in at least until py3.1
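
On klaus's question above about a shelve replacement that uses sqlite behind the scenes, here is a rough sketch of what the idea could look like. To be clear, this is not Skip's patch from issue 3783 and not anything that ships with Python: the class name SqliteShelf, the table name, and the schema are all made up for illustration. It assumes string keys and picklable values, and it says nothing about the performance concerns raised in the comments.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SqliteShelf(MutableMapping):
    """Hypothetical dict-like store: string keys, pickled values, one SQLite table."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; pass a filename to persist.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # Values are pickled, the same general approach shelve takes.
        blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        self._conn.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, blob),
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        # Fetch all keys up front so callers can modify the shelf while iterating.
        rows = self._conn.execute(
            "SELECT key FROM shelf ORDER BY key"
        ).fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()
```

Because it subclasses MutableMapping, things like `in`, `get()`, `items()` and `update()` come for free, so it can be used much like an open shelf. Committing on every write is the simplest thing that works; a real replacement would presumably batch writes and deal with shelve's writeback option.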