Bulk Insert a Pandas DataFrame Using SQLAlchemy – Python
Let’s start with SQLAlchemy, a Python library that enables communication between Python and databases such as MySQL and PostgreSQL. The library is used as an Object Relational Mapper (ORM) tool: it translates Python classes to tables in relational databases and automatically converts function calls to SQL statements, which gives application developers the full power and flexibility of SQL from within their programming language. SQLAlchemy also lets developers write flexible, database-agnostic code, so the same program can communicate with different types of database engines. As an ORM tool, SQLAlchemy differs from most others in that it is rooted in a complementarity-oriented approach: the entire process is exposed as a series of composable, transparent tools rather than being hidden behind a wall of automation.
Using the library, developers remain in control of how the database is structured and how SQL is constructed, while redundant tasks are automated.
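To make the class-to-table idea concrete, here is a minimal, self-contained sketch of the ORM in action; the User class, the users table, and the in-memory SQLite URL are illustrative choices for this example only, not part of this article’s setup (SQLAlchemy 1.4+):
Python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    # a Python class mapped to the "users" table
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite:///:memory:")  # any supported engine works
Base.metadata.create_all(engine)  # emits CREATE TABLE on our behalf

with Session(engine) as session:
    session.add(User(name="geeks"))  # converted to an INSERT statement
    session.commit()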
Bulk Insert A Pandas DataFrame Using SQLAlchemy in Python
In this article, we will look at how to bulk insert a Pandas DataFrame using SQLAlchemy, and also at an optimized approach, since doing so directly with the default Pandas method is very slow.
Creating a connection and database using SQLAlchemy:
First we will create an engine for the connection with the database; after that we can use the to_sql() command to push the data along with its constraints. We have used VS Code for inserting the data frame using SQLAlchemy, but you can use any other editor as well after installing the library.
Creating a connection with the database using a SQLAlchemy engine
Database created from Python, shown in MySQL Workbench
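The two screenshots are not reproduced here, so the following is a minimal sketch of the step they showed, assuming a local MySQL server reachable through the pymysql driver; the user name, password, and database name gfg are placeholders, not values from the article:
Python
import pymysql
from sqlalchemy import create_engine, text

# connect to the MySQL server itself (no database selected yet)
engine = create_engine(
    "mysql+pymysql://user:password@localhost:3306/?charset=utf8mb4"
)

with engine.connect() as conn:
    # the new schema then shows up in MySQL Workbench
    conn.execute(text("CREATE DATABASE IF NOT EXISTS gfg"))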
Creating a DataFrame and inserting it into the database using the to_sql() function:
Inserting the data frame into the database (MySQL)
Table inserted into the database
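In place of the screenshots, this is what the step looks like in code; the frame contents and the table name table_gfg are taken from the full listing further below, and engine is the connection engine created above:
Python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ["geeks", "for", "geeks"]
})

# push the frame into MySQL, replacing the table if it already exists
df.to_sql("table_gfg", engine, if_exists="replace", index=False)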
Note: to see the results above, run a SELECT query on the table in MySQL, for example SELECT * FROM table_gfg;
Bulk Data Insert of a Pandas DataFrame Using SQLAlchemy:
We can perform this task by using the method="multi" option, which performs a batch insert by placing multiple records at a time into a single INSERT statement. However, the time effectiveness of this method is very poor, sometimes even compared to the default method; a short sketch follows.
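As a sketch, the only change needed from the earlier call is the method argument, optionally combined with chunksize to control the batch size (1000 here is an illustrative value):
Python
# one multi-row INSERT statement per batch of 1000 rows
df.to_sql("table_gfg", engine, if_exists="append",
          index=False, method="multi", chunksize=1000)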
To overcome this problem, we have another option: passing a callable with the signature (pd_table, conn, keys, data_iter), which can be much faster than the "multi" method. That can be easily noticed in the image below:
Optimized approach for inserting bulk data into the database using SQLAlchemy.
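The article does not show the callable itself, so the following is an assumed, generic sketch of what such a callable can look like for MySQL: it collects each batch and hands it to the DBAPI cursor’s executemany(). The function name insert_with_executemany is hypothetical:
Python
def insert_with_executemany(pd_table, conn, keys, data_iter):
    # pd_table: pandas table wrapper; conn: SQLAlchemy connection;
    # keys: list of column names; data_iter: iterator of row tuples
    rows = list(data_iter)
    columns = ", ".join(keys)
    placeholders = ", ".join(["%s"] * len(keys))
    sql = f"INSERT INTO {pd_table.name} ({columns}) VALUES ({placeholders})"
    cursor = conn.connection.cursor()  # raw DBAPI (pymysql) cursor
    cursor.executemany(sql, rows)

df.to_sql("table_gfg", engine, if_exists="append",
          index=False, method=insert_with_executemany)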
Code implementation:
Python
from sqlalchemy import event
import pandas as pd
import sqlalchemy
import pymysql
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://" + "connection name" + ":" +
    "password" + "@" + "localhost" + ":" + "3306" + "/" + "database name"
    + "?" + "charset=utf8mb4"
)
# Choose your configuration in the above code:
# replace connection name (the MySQL user), password and
# database name with your own.
conn = engine.connect()

# data to insert into the data frame
dict = {
    'id': [1, 2, 3],
    'name': ["geeks", "for", "geeks"]
}

# storing data in the df variable, hence creating the dataframe
df = pd.DataFrame(dict)
print(df)

# inserting data into the database
df.to_sql("table_gfg", conn, if_exists="replace")


@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params,
                                  context, executemany):
    if executemany:
        # fast_executemany is a pyodbc (SQL Server) optimization;
        # pymysql cursors simply ignore this attribute
        cursor.fast_executemany = True


df.to_sql("table_gfg", conn, index=False, if_exists="append")
Output:
Data from the Python data frame is inserted into the database