MySQL-Python: Python


Thursday, March 19, 2009

MySQL-python-1.2.3 beta 2 released

I released the second beta of MySQLdb-1.2.3 over the weekend. So far I've gotten a fair number of downloads but not a lot of feedback. I did find out, though, what small tweaking is required to build on Windows. It's also in the Python Package Index, so you can also install it using easy_install MySQL-python. Once I make the final release of 1.2.3, I'll put up more eggs for fringe operating systems (Mac OS X, Windows).

Friday, March 14, 2008

I am not dead

It's been nearly a year since the last post, so you might naturally wonder if I am dead or stopping development of MySQLdb. Actually I've been sick for the last year and a half or so, and hadn't really been motivated enough to do anything. Here's what's been going on:

John Eikenberry has taken over development of ZMySQLDA, since I pretty much don't do anything with Zope these days. He's made a couple of releases on the way to a 3.0 release. As far as I know, ZMySQLDA is still only useful with Zope 2 as Zope 3 has a different architecture and comes with MySQL support directly.

Monty Taylor from MySQL AB has volunteered to help out on MySQLdb. I believe the way this is going to work out is he's going to be doing maintenance on the 1.2 branch, and get some minor bug fixes out there. In addition, he has a good start on a native (i.e. written in Python) MySQL driver.

I have a lot of work done towards the first 1.3 version, which will be the development branch (SVN trunk) for 2.0. To a large extent, this is a refactoring project, which means there are a lot of internal changes that don't affect most users.

When I first started working on MySQLdb (way back in the last millennium), there was only one option for talking to the MySQL server: Use the C API via libmysqlclient. Then the re-entrant/thread-safe libmysqlclient_r was added. Then the embedded server libmysqld. And now there is the prospect of a native Python version. This makes building more complicated because you currently have to build against one library at a time. It's also more complicated for users: Suppose you want to have both a regular client version and an embedded version on the same system?

1.3/2.0 is going to fix this by building all the possible options as separate drivers, and then you can specify which one to use at connect-time, or you can use the default, which will probably go in the order libmysqlclient_r, libmysqlclient, and native. (The embedded version requires special initialization so it should never be a default.)
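
A minimal sketch of how that connect-time selection might look. The driver names and the select_driver function are hypothetical (nothing here is settled API); the only things taken from the plan above are the default order and the rule that the embedded driver is never picked implicitly.

```python
# Hypothetical driver names, in the default order described above.
DEFAULT_ORDER = ("libmysqlclient_r", "libmysqlclient", "native")


def select_driver(available, requested=None):
    """Pick a driver name from `available`, the names that were built."""
    available = set(available)
    if requested is not None:
        # An explicit choice at connect time always wins, including "embedded".
        if requested not in available:
            raise LookupError("driver %r was not built" % requested)
        return requested
    # Otherwise fall back through the default order. "embedded" is never
    # considered here because it requires special initialization.
    for name in DEFAULT_ORDER:
        if name in available:
            return name
    raise LookupError("no default driver available")
```

With this shape, a system that has both the regular client and the embedded server built gets the regular client unless the caller asks for "embedded" by name.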

1.2 and earlier uses a type conversion dictionary to map MySQL column types to functions which convert a string result to a Python type. Additionally, this same dictionary is used to convert Python types into SQL literals. I think in earlier (pre-1.0) versions, these were separate dictionaries, and they were later combined because they have a disjoint set of keys. I'm not sure now if this was a good idea or not.
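
To make the "one dictionary, two directions" idea concrete, here is a simplified sketch. It is not the actual MySQLdb table: the column-type codes are written out as local constants, the helper names are made up for illustration, and the string quoting is deliberately naive.

```python
# Illustrative column-type codes (stand-ins for the constants a driver exposes).
FIELD_TYPE_LONG = 3
FIELD_TYPE_DOUBLE = 5


def quote_string(value):
    # Simplified escaping for illustration only; not safe for real SQL.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


conversions = {
    # MySQL column type -> function turning the result string into a Python value
    FIELD_TYPE_LONG: int,
    FIELD_TYPE_DOUBLE: float,
    # Python type -> function turning a Python value into an SQL literal
    int: str,
    float: repr,
    str: quote_string,
}


def from_mysql(field_type, raw):
    """Convert a raw result string; unknown column types stay as strings."""
    return conversions.get(field_type, lambda s: s)(raw)


def to_literal(value):
    """Render a Python value as an SQL literal via the same dictionary."""
    return conversions[type(value)](value)
```

The two kinds of keys never collide (integers on one side, Python types on the other), which is what makes the combined dictionary possible.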

One of the other complications of this approach is TEXT columns. To the MySQL C API, TEXT columns have the same column type as BLOB columns. The difference is the presence of a flag. This took some kludgy stuff to get to work.
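
A small sketch of that TEXT/BLOB distinction, assuming the flag in question is the binary flag. The two constants are copied from the MySQL C API headers so the example is self-contained; blob_converter is a hypothetical helper, not a function in MySQLdb.

```python
# Values as defined in the MySQL C API headers (copied here so the sketch is
# self-contained; treat them as assumptions if your client library differs).
FIELD_TYPE_BLOB = 252
BINARY_FLAG = 128


def blob_converter(field_type, flags, charset="utf-8"):
    """Return a converter for a BLOB-typed column, or None for other types."""
    if field_type != FIELD_TYPE_BLOB:
        return None
    if flags & BINARY_FLAG:
        # True BLOB: hand the bytes back untouched.
        return lambda raw: raw
    # TEXT: same column type but no binary flag, so decode to text.
    return lambda raw: raw.decode(charset)
```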

Then unicode came along, not just in Python but in MySQL. (The original target versions were MySQL-3.23 and Python-1.5.) This complicated the type conversion because now it was dependent on the connection's character set, which could be changed, so the converter dictionary had to be tinkered with on the fly. Additionally, there were reference count problems (and maybe still are to an extent) with this approach, due in part to the dictionary being able to be overridden by the user.

I haven't decided entirely how this is going to be fixed, but I will have some method for users to override the type conversion at runtime. I will probably have some hooks that will allow you to use a specific conversion for a column based on column type, column name, table name, database name, or any combination thereof. For example, you could have a rule that said that any column name ending with "_ip" with a column type of UNSIGNED INTEGER could be returned as a user-defined IP object, but stored in the database as a four-byte integer.
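
One possible shape for such a rule, purely as a sketch: the rule list, the dictionary keys describing a column, and find_decoder are all hypothetical, and the standard library's ipaddress.IPv4Address stands in for the user-defined IP object.

```python
import ipaddress

# Each rule is (predicate, decode, encode). The column description is a plain
# dict here; its keys are hypothetical, not an existing MySQLdb structure.
RULES = [
    (
        lambda col: col["name"].endswith("_ip")
        and col["type"] == "INTEGER"
        and col["unsigned"],
        lambda raw: ipaddress.IPv4Address(int(raw)),  # database -> Python
        lambda ip: str(int(ip)),                      # Python -> SQL literal
    ),
]


def find_decoder(col, default=lambda raw: raw):
    """Return the decoder of the first matching rule, else `default`."""
    for matches, decode, _encode in RULES:
        if matches(col):
            return decode
    return default
```

A predicate receives the whole column description, so a rule can match on table or database name as easily as on column name or type.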

The type conversion from MySQL to Python currently takes place in the low level driver (_mysql). Since there are going to be multiple drivers, this is going to move up into the Python layer. I don't believe this will adversely affect performance. Looking up the right converter is only a dictionary lookup anyway, and only has to be done once per column per query. Once you have a list/tuple of converters for the result set, these can be applied quickly with a list/generator comprehension.
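
Roughly, the Python-layer conversion could look like the following sketch. The converter table and function names are illustrative assumptions; the point is only that the lookup happens once per column and the per-row work is a comprehension.

```python
# Hypothetical converter table keyed by column-type code.
CONVERTERS = {3: int, 5: float}


def build_row_converters(column_types):
    """One dictionary lookup per column, done once per query."""
    return [CONVERTERS.get(t, lambda s: s) for t in column_types]


def convert_rows(raw_rows, column_types):
    """Lazily convert rows of raw strings; None stands for SQL NULL."""
    convs = build_row_converters(column_types)
    return (
        tuple(None if v is None else c(v) for c, v in zip(convs, row))
        for row in raw_rows
    )
```

Because convert_rows returns a generator, rows are converted only as they are consumed, which fits an unbuffered result set.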

MySQLdb-1.2 and earlier have several cursor classes which are built with mixin classes. The mixins control things like whether the rows are returned as tuples or dictionaries, or whether a buffered or unbuffered result set is used (i.e. mysql_store_result() vs. mysql_use_result()). This is pretty messy and is going away.

The format of the row will probably be controlled by a hook of some sort. I'm inclined to use unbuffered cursors, i.e. SSCursor or mysql_use_result(), by default. The tricky part is the entire result set must be fetched before another query can be issued, so if there are multiple cursors, there needs to be a mechanism so that only one can use the connection at a time. Rather than locking the connection, there will need to be a way for one cursor to tell the others that they need to buffer the rest of the result set.
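
As a toy illustration of that "tell the other cursor to buffer" idea (not the planned implementation): the Connection and Cursor classes below are stand-ins, and a plain Python iterator plays the part of a result set streaming from the server.

```python
class Connection:
    """Stand-in for a connection that can stream only one result at a time."""

    def __init__(self):
        self.active = None  # cursor currently streaming from the server

    def execute(self, cursor, rows):
        # Before issuing a new query, ask the streaming cursor (if any) to
        # buffer whatever it has not fetched yet.
        if self.active is not None and self.active is not cursor:
            self.active.buffer_remaining()
        self.active = cursor
        return iter(rows)  # pretend these rows stream from the server


class Cursor:
    def __init__(self, connection):
        self.connection = connection
        self._stream = iter(())
        self._buffered = []

    def execute(self, rows):
        self._buffered = []
        self._stream = self.connection.execute(self, rows)

    def buffer_remaining(self):
        # Drain the unbuffered result so the connection is free again.
        self._buffered.extend(self._stream)
        self._stream = iter(())

    def fetchone(self):
        if self._buffered:
            return self._buffered.pop(0)
        return next(self._stream, None)
```

A cursor that is never interrupted stays fully unbuffered; it only pays the memory cost of buffering when another cursor needs the connection.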

Some of this is already done for the trunk, but needs to be committed. In particular, there is no type conversion at all, and the driver selection is not done yet, but I'll see if I have time to work on it more this week at PyCon 2008.

Monday, April 30, 2007

MySQL Conference 2007

The 2007 MySQL Conference is over, and I finally made it back home. I have some notes on some of the sessions, which really aren't that great, so if you want to see what you missed, you should read Planet MySQL. But I will give some of the highlights.

There's a lot of new development around storage engines.

MySQL-5.1 has a pluggable storage engine architecture which allows you to load and unload storage engines while the server is running. Brian Aker explained that this is for cases where you have a stable server setup and only want to upgrade the storage engine. All the storage engines in 5.1 are pluggable, and there are already some third-party proprietary storage engines available.

One of the relatively new third-party storage engines is SolidDB. Solid has been around for quite a while. In fact, I was using Solid for a project in the late 1990s before I started using MySQL.

IBM announced a partnership with MySQL AB to create a DB2 storage engine, but so far this is only on their i5/OS mainframe platform.

InnoDB is still very much alive and well, despite being purchased a year and a half ago by Oracle. In fact, InnoDB OY has already renewed their OEM agreement with MySQL AB until at least mid-2009, so there is no danger of current InnoDB users being cut off.

Falcon is a new storage engine which will be part of MySQL-6.0 (2008); alpha versions are available now, and a beta is expected later this year.

NitroEDB is a storage engine designed to handle heavy insert loads, and for fast aggregate queries. Aggregate values are stored in the index, so many aggregate functions can be evaluated just by looking at a couple of index values.

ScaleDB uses a Patricia trie index which is highly compressed compared to a B-tree. The implementation has three relatively small in-memory index layers on top of the trie which minimize disk access.

MySQL-5.1 adds log tables: The general and slow query logs can be set (on the fly) to record into tables instead of flat log files. The only supported engines for log tables are MyISAM and CSV.

Coolest presentation: Maybe The Declarative Power of Views. Basically, this is using SQL like Prolog, and creating an expert system with a couple of views.

Monty Taylor of MySQL AB has wrapped the NDB cluster API with Swig and come up with some APIs for various languages, including Python, and then created a patch for SQLAlchemy so that it could bypass using any SQL for object storage.

Jess Balint of MySQL AB has written a pure Python MySQL driver that looks pretty functional. I'll have more on this in my next post...

I held a BoF session for Python users. Considering how late I scheduled it, and that it was up against the MySQL Quiz Show, I had a pretty good turnout. Several MySQL-python users actually had vendor booths: Google uses MySQL (mostly 4.0) and Python for some of their (undisclosed) back-end processes. YouTube (now owned by Google) uses MySQL and Python for some of their user profile stuff. SnapLogic has a new data integration project written in Python which is using MySQLdb, and presumably other database backends. I seem to remember the NitroSecurity people were using it as well.

All-in-all, it was a great conference, and O'Reilly kept us all well-fed. Next conference starts April 15, 2008. I'll have to try to be a little better organized for that one.

Monday, April 9, 2007

Projects which use MySQLdb

I'm putting together a page of projects which use MySQLdb. If your project is not on this list, leave a comment, with a URL and brief description, and I'll check it out.

Frameworks/Libraries

Applications

Wednesday, March 28, 2007

Some previous MySQL Conference posts

Just for reference, here are some previous posts I did for the 2005 MySQL User Conference:

Not that these are particularly awesome or anything, but there are a few travel notes which may possibly be useful if you decide to go this year, since it's in the same location.

If you have the time, and you are from outside The Valley, catch CalTrain and head out to San Francisco.

Tuesday, March 27, 2007

MySQL Conference & Expo 2007

Thanks to a gift from MySQL AB, I'll be attending the MySQL Conference & Expo 2007. I presented at the 2005 conference, and found the conference itself to be pretty educational. Plus the food is generally pretty good at the O'Reilly conferences. Of course the real reason people go to conferences is for the swag. Now this year's PyCon had some pretty good swag; I got at least six free T-shirts and two Rubik's cubes. So top that.