Export large datasets from MySQL using Python
In the post about exporting large datasets we saw several ways of doing it, and now it is time to see how to do it with Python.
Connecting to MySQL from Python
Before continuing, you can read here how to connect to MySQL from Python, where I also use a class I wrote for connecting, because that class does what I like. But for this post, I will connect directly to MySQL using the PyMySQL library.
You only need to import the library, set the correct information for the connection and … you’re done!
import pymysql

conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='pass', db='')
print('Connected!')
conn.close()
To measure elapsed time, I created my own class:
import time

class Cronometro():
    def __init__(self, tarea=''):
        self.__inicio = time.time()
        self.tarea = tarea

    def iniciar(self):
        # Execution start time.
        self.__inicio = time.time()

    def finalizar(self):
        # Return the elapsed time.
        tmp = time.time()
        tmp2 = self.__inicio
        self.__inicio = tmp
        x = self.tarea + ' --> ' if len(self.tarea) > 0 else ''
        return x + str(round(tmp - tmp2, 2)) + " segundos"
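A quick usage example (a trimmed copy of the class is repeated here so the snippet runs on its own; the task name and the sleep are just stand-ins for real work):

```python
import time

class Cronometro():
    def __init__(self, tarea=''):
        self.__inicio = time.time()
        self.tarea = tarea

    def finalizar(self):
        # Return the elapsed time since construction (or the last call).
        tmp = time.time()
        tmp2 = self.__inicio
        self.__inicio = tmp
        x = self.tarea + ' --> ' if len(self.tarea) > 0 else ''
        return x + str(round(tmp - tmp2, 2)) + " segundos"

# Time an arbitrary piece of work.
crono = Cronometro('export')
time.sleep(0.05)            # stand-in for the real work
mensaje = crono.finalizar()
print(mensaje)
```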
What to do when retrieving thousands and thousands of rows
MySQL clients, such as PyMySQL, work very well for connecting to databases: after executing a query, the whole result is loaded into memory before control is given back to the program (once the database work has finished). But when the result set is too large, performance drops. Here you can read about the problem, and the solution: using a “streaming result set”, by means of cursors such as SSCursor.
In PyMySQL, you can access these cursors like this:
cur = conn.cursor(pymysql.cursors.SSCursor)
Next, it is very important to fetch rows one by one (fetchone) instead of all at once (fetchall).
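The same row-by-row pattern can be tried without a MySQL server by using the standard-library sqlite3 module (a sketch for illustration only; with PyMySQL you would create the cursor with conn.cursor(pymysql.cursors.SSCursor) as shown above):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE t (id INTEGER, nombre TEXT)')
cur.executemany('INSERT INTO t VALUES (?, ?)', [(i, 'fila%d' % i) for i in range(5)])

# Row-by-row fetch: only one row is held by the program at a time.
cur.execute('SELECT id, nombre FROM t ORDER BY id')
rows = 0
while True:
    x = cur.fetchone()
    if x is None:
        break
    rows += 1
print(rows)
conn.close()
```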
Procedure for exporting data to CSV
If you need to export recordsets to CSV, it is almost compulsory (I think) to use the csv package. The best tutorial you can follow for using it is this one: http://www.pythonforbeginners.com/systems-programming/using-the-csv-module-in-python/. For my needs, I only have to open a file for writing, link it to the csv package, use writerow for writing, and once finished, close the file.
import csv

ofile = open(file_name, 'w', newline='')
csv_writer = csv.writer(ofile, delimiter=delimiter, quotechar=quotechar, quoting=quoting)
csv_writer.writerow(datos)
ofile.close()
The CSV writer has several options, so be sure to set the ones you need.
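For instance, with quoting=csv.QUOTE_NONNUMERIC only the non-numeric fields get quoted (here an in-memory buffer stands in for the output file):

```python
import csv
import io

buf = io.StringIO()
w = csv.writer(buf, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
# Strings are quoted; numbers are written bare.
w.writerow(['texto', 3.14, 10])
linea = buf.getvalue().strip()
print(linea)  # "texto",3.14,10
```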
Before going on, it could be very useful to read this question on StackOverflow about how to store a recordset (in that case, using SQLite) as a CSV file; the first answer gives the right clue.
Now we have the routine ready, and the code is this:
def write_sql_to_file(file_name, sql, with_header=True, delimiter=',',
                      quotechar='"', quoting=csv.QUOTE_NONNUMERIC, con_sscursor=False):
    cur = conn.cursor(pymysql.cursors.SSCursor) if con_sscursor else conn.cursor()
    cur.execute(sql)
    # Each entry of cur.description is a 7-item tuple; the column name is the first item.
    cabecera = [campo[0] for campo in cur.description]
    ofile = open(file_name, 'w', newline='')
    csv_writer = csv.writer(ofile, delimiter=delimiter, quotechar=quotechar, quoting=quoting)
    if with_header:
        csv_writer.writerow(cabecera)
    if con_sscursor:
        # Streaming cursor: fetch and write one row at a time.
        while True:
            x = cur.fetchone()
            if x:
                csv_writer.writerow(x)
            else:
                break
    else:
        # Regular cursor: the whole result set is already in memory.
        for x in cur.fetchall():
            csv_writer.writerow(x)
    cur.close()
    ofile.close()
I’ve prepared a query that exports a large amount of data, so let’s compare the times with SSCursor and without SSCursor. I’ve inserted a timer at two points to measure the performance: one for the query time, and the other for the creation of the CSV file.
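As a sketch of how the two timers were wired in (using sqlite3 and an in-memory buffer so the snippet is self-contained; the real measurements ran against MySQL with the routine above):

```python
import csv
import io
import sqlite3
import time

class Cronometro():
    def __init__(self, tarea=''):
        self.__inicio = time.time()
        self.tarea = tarea

    def finalizar(self):
        tmp = time.time()
        tmp2 = self.__inicio
        self.__inicio = tmp
        x = self.tarea + ' --> ' if len(self.tarea) > 0 else ''
        return x + str(round(tmp - tmp2, 2)) + " segundos"

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE datos (id INTEGER, valor REAL)')
cur.executemany('INSERT INTO datos VALUES (?, ?)', [(i, i * 1.5) for i in range(1000)])

# Timer 1: query time.
crono = Cronometro('consulta')
cur.execute('SELECT id, valor FROM datos')
print(crono.finalizar())

# Timer 2: CSV file creation.
crono = Cronometro('csv')
buf = io.StringIO()
w = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
for fila in cur.fetchall():
    w.writerow(fila)
print(crono.finalizar())

lineas = len(buf.getvalue().splitlines())
conn.close()
```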
The results, for the same query but with an increasing amount of data to export, are:
[Table: timing results for Test 1, Test 2 and Test 3]
As a conclusion, obtaining the result set with SSCursor is faster than without it, but the CSV file is written faster with fetchall than fetching rows one by one. I need to keep investigating this issue.