6

UPDATE
After passing execute() a list of rows as per Nathan's suggestion, below, the code executes further but still gets stuck on the execute function. The error message reads:

    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

So it still isn't working. Does anybody know why there is a type error now?
END UPDATE

I have a large mailing list in .xls format. I am using Python with xlrd to retrieve the name and email from the xls file into two lists. Now I want to put each name and email into a MySQL database. I'm using MySQLdb for this part. Obviously I don't want to do an insert statement for every list item.
Here's what I have so far.

from xlrd import open_workbook, cellname
import MySQLdb

dbname = 'h4h'
host = 'localhost'
pwd = 'P@ssw0rd'
user = 'root'

book = open_workbook('h4hlist.xls')
sheet = book.sheet_by_index(0)
mailing_list = {}
name_list = []
email_list = []

for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value  
    email =  sheet.cell(row, 4).value
    if name and email:
        mailing_list[name] = email

for n, e in sorted(mailing_list.iteritems()):
    name_list.append(n)
    email_list.append(e)

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""",
              (name_list, email_list))

The problem occurs when the cursor executes. This is the error: _mysql_exceptions.OperationalError: (1241, 'Operand should contain 1 column(s)'). I tried putting my query into a variable initially, but then it just barfed up a message about passing a tuple to execute().

What am I doing wrong? Is this even possible?

The list is huge and I definitely can't afford to put the insert into a loop. I looked at using LOAD DATA INFILE, but I really don't understand how to format the file or the query and my eyes bleed when I have to read MySQL docs. I know I could probably use some online xls to mysql converter, but this is a learning exercise for me as well. Is there a better way?

2
  • Don't use the % operator to build queries! Follow Nathan's code more closely. Commented Oct 22, 2012 at 23:01
  • 1
    @r3m0t query = query % db.literal(args) was the error message not my code. Anyway Jon fixed my problem. Commented Oct 22, 2012 at 23:39

4 Answers

20

You need to give executemany() a list of rows. You don't need to break the name and email out into separate lists; just create one list of rows, where each row holds both values.

rows = []

for row in range(sheet.nrows):
    """name is in the 0th col. email is the 4th col."""
    name = sheet.cell(row, 0).value  
    email =  sheet.cell(row, 4).value
    rows.append((name, email))

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.executemany("""INSERT INTO mailing_list (name,email) VALUES (%s,%s)""", rows)

Update: as @JonClements mentions, it should be executemany() not execute().
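
To make the expected shape of rows concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in, because MySQLdb needs a running server; both follow the Python DB-API, but sqlite3 writes placeholders as "?" where MySQLdb uses "%s". The sample data in sheet_rows is hypothetical and replaces the xlrd cells.

```python
import sqlite3

# Hypothetical sample data standing in for the (name, email) cells read via xlrd.
sheet_rows = [
    ("Jim", "jim@example.com"),
    ("", "nobody@example.com"),   # missing name: skipped below
    ("Lucy", "lucy@example.com"),
]

# One tuple per row, in the same order as the column list in the INSERT.
rows = [(name, email) for name, email in sheet_rows if name and email]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mailing_list (name TEXT, email TEXT)")
# sqlite3 uses "?" placeholders; MySQLdb uses "%s" as in the answer above.
conn.executemany("INSERT INTO mailing_list (name, email) VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT name, email FROM mailing_list ORDER BY name"
).fetchall()
print(stored)
```

Each element of rows supplies the values for exactly one row, so the number of items per tuple has to match the number of placeholders in the statement.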


6 Comments

Thanks that fixed the original problem, if the answers always come this fast I may start becoming lazy. Now I'm getting a type error, I'm sure it's nothing that SO can't solve.
@JonClements with executemany() the script completes without errors but my db is still empty.
@shakabra Then just make sure your transaction is committed - db.commit()
@JonClements you're awesome. If you write up an answer I'll accept it.
What format does executemany() take? Is that a list of lists? Like: [[1,2],[3,4],[5,6]]?
8

To fix TypeError: not all arguments converted during string formatting - you need to use the cursor.executemany(...) method, as this accepts an iterable of tuples (more than one row), while cursor.execute(...) expects the parameter to be a single row value.

After the command is executed, you need to ensure that the transaction is committed to make the changes active in the database by using db.commit().
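
The effect of a missing commit can be sketched as follows. This is an illustration under assumptions, not MySQLdb itself: it uses the standard-library sqlite3 module with a scratch database file (the file name demo.db is hypothetical), and a second connection plays the role of "looking at the database from outside the script".

```python
import os
import sqlite3
import tempfile

rows = [("Jim", "jim@example.com"), ("Lucy", "lucy@example.com")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")  # hypothetical scratch database
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE mailing_list (name TEXT, email TEXT)")
    writer.commit()

    reader = sqlite3.connect(path)

    def count_rows():
        # fetchall() exhausts the cursor so the reader holds no lock afterwards.
        return reader.execute("SELECT COUNT(*) FROM mailing_list").fetchall()[0][0]

    writer.executemany("INSERT INTO mailing_list (name, email) VALUES (?, ?)", rows)
    before_commit = count_rows()  # the insert is still inside an open transaction

    writer.commit()
    after_commit = count_rows()

    reader.close()
    writer.close()

print(before_commit, after_commit)
```

Until commit() is called, the second connection does not see the inserted rows, which matches the "script completes but the table is empty" symptom described in the comments.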


3

If you are interested in high performance, this answer may be better.

Compared to the executemany() method, a single execute() with a multi-row INSERT like the one below is much faster:

INSERT INTO mailing_list (name,email) VALUES ('Jim','[email protected]'),('Lucy','[email protected]')

You can easily modify the answer from @Nathan Villaescusa and get the new code.

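# Note: the values become part of the SQL text here, so this is open to SQL injection (see the comments below).
cursor.execute("""INSERT INTO mailing_list (name,email) VALUES {}""".format(",".join(str(i) for i in rows)))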

Here is my own test result:

executemany(): 10000 runs take 220 seconds.

execute(): 10000 runs take 12 seconds.

That is a speed difference of roughly 18 times (220 / 12).
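
To see what this approach actually sends to the server, the following sketch only builds the query string and prints it, without touching a database. It assumes the placeholder is written as {} so that str.format() substitutes it; the sample rows are hypothetical.

```python
rows = [
    ("Jim", "jim@example.com"),
    ("O'Brien", "obrien@example.com"),
    ("Lucy", None),  # a missing email
]

# Same construction as in the answer: str() of each tuple, joined by commas.
values_sql = ",".join(str(row) for row in rows)
query = "INSERT INTO mailing_list (name,email) VALUES {}".format(values_sql)
print(query)
```

The quoting in the result comes from Python's repr() of each value, not from the database driver's escaping: the apostrophe makes Python switch to double quotes, and the missing value is rendered as None rather than NULL. That is one reason, besides injection, to pass values as parameters instead.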

3 Comments

Doing this is a terrible idea, because you are now wide open to SQL injection attacks.
This still hints at a Python overhead in the executemany() function; perhaps executemany() even issues many execute() statements, which would be very bad for the speed. See my answer for a hopefully "injection attack proof" solution and please comment.
@havlock Would it not be already secure when you changed it to cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""", ",".join(str(i) for i in rows))? Then the values are inserted only during runtime. Not sure though. If this does not work, consider my answer where as many "%s" are used as there are values.
1

Taking up the idea of @PengjuZhao, it should work to simply add one single placeholder for all values to be passed. The difference from @PengjuZhao's answer is that the values are passed as a second parameter to the execute() function, which should be safe against injection attacks because the values are only evaluated at runtime (in contrast to ".format()").

cursor.execute("""INSERT INTO mailing_list (name,email) VALUES (%s)""", ",".join(str(i) for i in rows))

Only if this does not work properly, try the approach below.

####

@PengjuZhao's answer suggests that executemany() either has a strong Python overhead or uses multiple execute() statements where this is not needed; otherwise executemany() would not be so much slower than a single execute() statement.

Here is a function that combines @NathanVillaescusa's and @PengjuZhao's answers in a single execute() approach.

The solution builds a dynamic number of placeholders to be added to the sql statement. It is a manually built execute() statement with multiple placeholders of "%s", which likely outperforms the executemany() statement.

For example, at 2 columns, inserting 100 rows:

  • execute(): 200 times "%s" (= dependent on the number of rows)
  • executemany(): just 2 times "%s" (= independent of the number of rows).

There is a chance that this solution has the high speed of @PengjuZhao's answer without risking injection attacks.

  1. Prepare parameters of the function:

Store your values in one-dimensional NumPy arrays arr_name and arr_email, which are then converted into a flat list of values, row by row. Alternatively, use the approach of @NathanVillaescusa.

from itertools import chain

# Interleave the two arrays row by row: [name0, email0, name1, email1, ...]
# tolist() also converts NumPy scalars to plain Python values.
listAllValues = list(chain.from_iterable(
    zip(arr_name.tolist(), arr_email.tolist())
))

column_names = 'name, email'

table_name = 'mailing_list'

  2. Get SQL query with placeholders:

Computing numRows = int((len(listAllValues))/numColumns) inside the function avoids having to pass the number of rows. If you insert 6 values in listAllValues at 2 columns, this makes 6/2 = 3 rows.

def getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues):
    numColumns = len(column_names.split(","))
    numRows = int((len(listAllValues))/numColumns)
    
    placeholdersPerRow = "("+', '.join(['%s'] * numColumns)+")"
    placeholders = ', '.join([placeholdersPerRow] * numRows)
    
    sqlInsertMultipleRowsInSqlTable = "insert into `{table}` ({columns}) values {values};".format(table=table_name, columns=column_names, values=placeholders)

    return sqlInsertMultipleRowsInSqlTable

strSqlQuery = getSqlInsertMultipleRowsInSqlTable(table_name, column_names, listAllValues)
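
As an illustration of what such a function returns, here is a compact variant that only builds the string and needs no database. The name build_multirow_insert and the sample values are hypothetical; the logic mirrors the function above.

```python
def build_multirow_insert(table_name, column_names, num_rows):
    """Return an INSERT statement with one "(%s, ...)" group per row."""
    num_columns = len(column_names.split(","))
    row_group = "(" + ", ".join(["%s"] * num_columns) + ")"
    groups = ", ".join([row_group] * num_rows)
    return "insert into `{}` ({}) values {};".format(table_name, column_names, groups)


# Flattened values, row by row: 6 values at 2 columns make 3 rows.
values = ["Jim", "jim@example.com", "Lucy", "lucy@example.com", "Ann", "ann@example.com"]
query = build_multirow_insert("mailing_list", "name, email", len(values) // 2)
print(query)
```

The statement contains exactly as many "%s" markers as there are values in the flat list, and no values at all.
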

  3. Execute strSqlQuery

Final step:

db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.execute(strSqlQuery, listAllValues)
db.commit()  # without a commit the inserted rows are not saved

This solution is hopefully without the risk of injection attacks as in @PengjuZhao's answer since it fills the sql statement only with placeholders instead of values. The values are only passed separately in listAllValues at this point here, where strSqlQuery has only placeholders instead of values:

cursor.execute(strSqlQuery, listAllValues)

The execute() statement gets the SQL statement with "%s" placeholders and the list of values as two separate parameters, as is done in @NathanVillaescusa's answer. I am still not sure whether this avoids injection attacks. It is my understanding that injection attacks can only occur if the values are put directly into the SQL statement. Please comment if I am wrong.
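
One way to check this understanding is to run the same pattern against a throwaway database with a deliberately hostile value. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQLdb (so the placeholder is "?" instead of "%s"), and the sample rows are hypothetical.

```python
import sqlite3

rows = [
    ("Jim", "jim@example.com"),
    ("Robert'); DROP TABLE mailing_list;--", "bobby@example.com"),
]
flat_values = [value for row in rows for value in row]

# Only placeholders go into the SQL text; "?" is sqlite3's marker ("%s" in MySQLdb).
groups = ", ".join(["(?, ?)"] * len(rows))
query = "INSERT INTO mailing_list (name, email) VALUES " + groups

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mailing_list (name TEXT, email TEXT)")
conn.execute(query, flat_values)
conn.commit()

names = [r[0] for r in conn.execute("SELECT name FROM mailing_list ORDER BY rowid")]
print(names)
```

The hostile string ends up stored as ordinary data and the table still exists. Note that this only covers the values: the table and column names are still formatted into the SQL text, so they have to come from trusted code rather than from user input.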
