Stirred up dem bees: Should BSDDB be removed from Python?

September 4th, 2008 § 14 comments

This week, we’ve seen a push dev-wise to get RC1 com­pleted and ready to go — I’ve spent some time giv­ing mul­ti­pro­cess­ing some love (still not done) and a lot of other peo­ple have been work­ing around the clock to close out the large num­ber of release blockers.

As of last night, though, the trigger was pulled on removing bsddb (the Berkeley DB Python module) from the standard library in the 3.0 timeline (2.6 adds deprecation warnings).

Now, before any­one thinks this is an arbi­trary deci­sion, here’s the argu­ment (in a nutshell):

  • bsddb has always been painful to maintain
  • Jesus Cea is the only per­son who has stepped up to main­tain it
  • bsddb is “heavy weight”: of everything in the standard library, it has the most dependencies and the most cross-platform maintenance nuances.
  • Until Jesus Cea stepped up later in the 2.6/3.0 process it was “one of those pack­ages” that no one wanted to maintain.
  • For most of 2.6 and 3.0 it’s been a build­bot fail train.

See PEP 3108:

Main­te­nance Burden

Over the years, cer­tain mod­ules have become a heavy bur­den upon python-dev to main­tain. In sit­u­a­tions like this, it is bet­ter for the mod­ule to be given to the com­mu­nity to main­tain to free python-dev to focus more on lan­guage sup­port and other mod­ules in the stan­dard library that do not take up a undue amount of time and effort.

bsddb3

  • Exter­nally main­tained at http://www.jcea.es/programacion/pybsddb.htm .
  • Con­sis­tent test­ing instability.
  • Berke­ley DB fol­lows a dif­fer­ent release sched­ule than Python, lead­ing to the bind­ings not nec­es­sar­ily being in sync with what is available.

This thread is where the ham­mer fell.

Now, note that Jesus Cea has done an amaz­ing amount of work updating/upgrading the bsddb sup­port for 2.6 and 3.0 (see his recent announce­ment here). I feel for him in a lot of respects: He busted his butt to fix, main­tain and resolve all open issues with bsddb and the build­bots for the release, but the deci­sion had been made back in July to remove/deprecate the bsddb pack­age (see above).

Now, there is a lot more dis­cus­sion occur­ring around the removal:

Edit: I finally got a free moment to do an update. In an email this afternoon on Python 3000, the BDFL (GvR) made the final decision on bsddb. It’s out as of py3k:

I am still in favor of remov­ing bsddb from Python 3.0. It depends on a
3rd party library of enor­mous com­plex­ity whose sta­bil­ity can­not always
be taken for granted. Argu­ments about code own­er­ship, release cycles,
bug­bot sta­bil­ity and more all point towards keep­ing it sep­a­rate. I
con­sider it no dif­fer­ent in nature than 3rd party UI pack­ages (e.g.
wxPython or PyQt) or rela­tional data­base bind­ings (e.g. the MySQL or
Post­greSQL bind­ings): very use­ful to a cer­tain class of users, but
out­side the scope of the core distribution.

Python 3.0 is a per­fect oppor­tu­nity to say good­bye to bsddb as a
stan­dard library com­po­nent. For apps that depend on it, it is just a
down­load away — dep­re­cat­ing in 3.0 and removal in 3.1 would actu­ally
send the *wrong* mes­sage, since it is very much alive! I am grate­ful
for Jesus to have taken over main­te­nance, and hope that the pack­age
blos­soms in its new­found freedom.

  • http://writeonly.wordpress.com/ Gregg Lind

    So, does this mean that shelve will be even less func­tional in Py3k?

    It’s not like shelve seems to get a lot of love, either, but it’s a pack­age I use every day.

  • jnoller

    Ray­mond H. has already brought this up in the thread. Yes — it affects shelve, and that’s one of the pos­si­ble rea­sons for leav­ing it in at least until py3.1

  • klaus

    Nuke it. sqlite (which is a proper db) is already part of the battery pack. If anyone needs bdb support they can get the standalone package and install it themselves. Maybe someone can set it up on PyPI for easy retrieval post-3.0. But I see no need to keep it. In any case, you can keep running the 2.5/2.6 series until the cows come home, which I suspect is going to be the case for quite a while after 3.0 is released.

    Seri­ously, how dif­fi­cult would it be to write a sim­ple shelve replace­ment that uses sqlite behind the scenes? My guess is not much.

  • crb

    sqlite is great for a rela­tional DB, but bsddb gives the devel­oper con­trol over cre­at­ing app spe­cific stores. That’s huge, espe­cially given the exper­i­men­ta­tion hap­pen­ing these days on alter­na­tive ways of stor­ing and index­ing. Con­sider this one vote for keep­ing it in.

  • Brett

    Just so peo­ple know, Barry said the only way he would change his mind was if Guido stepped in and said bsddb should go. Guido has sub­se­quently stated that he sup­ports remov­ing bsddb in 3.0 (this does not affect 2.x).

  • Ben­jamin Peterson

    Skip has already supplied a proof-of-concept sqlite backend: http://bugs.python.org/issue3783

  • jnoller

    I was in meet­ings — but I finally updated the post with the pronouncement.

  • jnoller

    Skip is a machine sim­ply pos­ing as a man

  • John M. Camara

    Klaus,

    Sqlite may be a proper db in your opinion but that’s only due to you not understanding why different types of databases exist. At times a relational database is the wrong tool for the job and in these cases bsddb can be tens of thousands of times faster. Chances are you just work on applications that work well with relational databases so you don’t see the point of using other types.

    To get an idea of the differences between bsddb and a relational database you could read the following white paper from Oracle. Just take it with a grain of salt as they like to paint a prettier picture when talking about relational databases, which should be no surprise as that’s their big money maker. Anyway, it’s always best to choose the right tool for the job.

    http://www.oracle.com/database/docs/Berkeley-DB…

  • Ben­jamin Peterson

    I thought that was the Tim­Bot, actually.

  • John M. Camara

    Another good ref­er­ence is the first few pages of the Intro­duc­tion to Berke­ley DB Ref­er­ence Guide.

    http://pybsddb.sourceforge.net/ref/intro/data.html

  • jnoller

    No, see, the tim­bot doesn’t pre­tend to be a man. Tim­bot is sim­ply timbot

  • http://captsolo.net/info/ Uldis Bojars

    It’s a pity that the final decision was to remove the module, though I can see the rationale.

    I’ve found bsddb to be enor­mously use­ful (includ­ing sit­u­a­tions where sqlite’s per­for­mance was very low in com­par­i­son). Thanks a lot to Jesus Cea for main­tain­ing pyb­s­ddb and I hope he keeps up doing this.
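
On klaus’s question above about a simple shelve replacement that uses sqlite behind the scenes: here is a minimal sketch of the idea, written against a current Python 3 standard library. It is not Skip’s patch from issue3783 and not the real shelve API. The class name SqliteShelf, the table name shelf, and the choice of string keys with pickled values are all assumptions made for illustration.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SqliteShelf(MutableMapping):
    """Hypothetical shelve-like mapping: string keys, pickled values, one table."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; pass a filename to persist.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # Values are pickled, as shelve does; keys are assumed to be str.
        blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        self._conn.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, blob),
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        # Fetch all keys up front so callers can modify the shelf while looping.
        rows = self._conn.execute("SELECT key FROM shelf ORDER BY key").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()
```

Usage looks like a dict: create SqliteShelf("data.db"), assign with s["key"] = value, read it back, and call s.close() when done. This only shows that the shelve use case does not strictly need bsddb. It says nothing about the performance trade-offs raised elsewhere in the comments.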

You are currently reading Stirred up dem bees: Should BSDDB be removed from Python? at jessenoller.com.
