Bulk Insert a Pandas DataFrame Using SQLAlchemy – Python
Let’s start with SQLAlchemy, a Python library that enables communication between Python and databases such as MySQL and PostgreSQL. The library is used as an Object Relational Mapper (ORM) tool: it translates Python classes to tables in relational databases and automatically converts function calls to SQL statements, which gives application developers the full power and flexibility of SQL from within their programming language. SQLAlchemy also lets developers write flexible, database-agnostic code, so the same program can communicate with different types of database engines. As an ORM tool, SQLAlchemy differs from most others in that it is rooted in a complementarity-oriented approach: the entire process is exposed as a series of composable, transparent tools rather than being hidden behind a wall of automation.
Using the library, developers remain in control of how the database is structured and how SQL is constructed, while redundant tasks are automated.
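To make the class-to-table idea concrete, here is a minimal, self-contained sketch of the ORM in action; the User class, the users table, and the in-memory SQLite URL are illustrative choices for this example only, not part of this article’s setup (SQLAlchemy 1.4+):
Python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    # a Python class mapped to the "users" table
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite:///:memory:")  # any supported engine works
Base.metadata.create_all(engine)  # emits CREATE TABLE on our behalf

with Session(engine) as session:
    session.add(User(name="geeks"))  # converted to an INSERT statement
    session.commit()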
Bulk Insert A Pandas DataFrame Using SQLAlchemy in Python
In this article, we will look at how to bulk insert a Pandas DataFrame using SQLAlchemy, and also at an optimized approach, since doing so directly with the default Pandas method is very slow.
Creating a connection and database using SQLAlchemy:
First we will create an engine for the connection with the database; after that we can use the to_sql() command to push the data along with its constraints. We have used VS Code for inserting the data frame using SQLAlchemy, but you can use any other editor as well after installing the library.
Creating a connection with the database using a SQLAlchemy engine
Database created from Python, shown in MySQL Workbench
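The two screenshots are not reproduced here, so the following is a minimal sketch of the step they showed, assuming a local MySQL server reachable through the pymysql driver; the user name, password, and database name gfg are placeholders, not values from the article:
Python
import pymysql
from sqlalchemy import create_engine, text

# connect to the MySQL server itself (no database selected yet)
engine = create_engine(
    "mysql+pymysql://user:password@localhost:3306/?charset=utf8mb4"
)

with engine.connect() as conn:
    # the new schema then shows up in MySQL Workbench
    conn.execute(text("CREATE DATABASE IF NOT EXISTS gfg"))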
Creating a DataFrame and inserting it into the database using the to_sql() function:
Inserting the data frame into the database (MySQL)
Table inserted into the database
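In place of the screenshots, this is what the step looks like in code; the frame contents and the table name table_gfg are taken from the full listing further below, and engine is the connection engine created above:
Python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ["geeks", "for", "geeks"]
})

# push the frame into MySQL, replacing the table if it already exists
df.to_sql("table_gfg", engine, if_exists="replace", index=False)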
Note: to see the results above, run a SELECT query on the table in MySQL, for example SELECT * FROM table_gfg;
Bulk Data Insert of a Pandas DataFrame Using SQLAlchemy:
We can perform this task by using the method="multi" option, which performs a batch insert by placing multiple records at a time into a single INSERT statement. However, the time effectiveness of this method is very poor, sometimes even compared to the default method; a short sketch follows.
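As a sketch, the only change needed from the earlier call is the method argument, optionally combined with chunksize to control the batch size (1000 here is an illustrative value):
Python
# one multi-row INSERT statement per batch of 1000 rows
df.to_sql("table_gfg", engine, if_exists="append",
          index=False, method="multi", chunksize=1000)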
To overcome this problem, we have another option: passing a callable with the signature (pd_table, conn, keys, data_iter), which can be much faster than the "multi" method. That can be easily noticed in the image below:
Optimized approach for inserting bulk data into the database using SQLAlchemy.
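The article does not show the callable itself, so the following is an assumed, generic sketch of what such a callable can look like for MySQL: it collects each batch and hands it to the DBAPI cursor’s executemany(). The function name insert_with_executemany is hypothetical:
Python
def insert_with_executemany(pd_table, conn, keys, data_iter):
    # pd_table: pandas table wrapper; conn: SQLAlchemy connection;
    # keys: list of column names; data_iter: iterator of row tuples
    rows = list(data_iter)
    columns = ", ".join(keys)
    placeholders = ", ".join(["%s"] * len(keys))
    sql = f"INSERT INTO {pd_table.name} ({columns}) VALUES ({placeholders})"
    cursor = conn.connection.cursor()  # raw DBAPI (pymysql) cursor
    cursor.executemany(sql, rows)

df.to_sql("table_gfg", engine, if_exists="append",
          index=False, method=insert_with_executemany)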
Code implementation:
Python
from sqlalchemy import event
import pandas as pd
import sqlalchemy
import pymysql
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://" + "connection name" + ":" +
    "password" + "@" + "localhost" + ":" + "3306" + "/" + "database name"
    + "?" + "charset=utf8mb4"
)
# Choose your configuration in the above code:
# replace connection name (the MySQL user), password and
# database name with your own.
conn = engine.connect()

# data to insert into the data frame
dict = {
    'id': [1, 2, 3],
    'name': ["geeks", "for", "geeks"]
}

# storing data in the df variable, hence creating the dataframe
df = pd.DataFrame(dict)
print(df)

# inserting data into the database
df.to_sql("table_gfg", conn, if_exists="replace")


@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params,
                                  context, executemany):
    if executemany:
        # fast_executemany is a pyodbc (SQL Server) optimization;
        # pymysql cursors simply ignore this attribute
        cursor.fast_executemany = True


df.to_sql("table_gfg", conn, index=False, if_exists="append")
Output:
Data from the Python data frame is inserted into the database