Description
I was trying to write 50988 entries to a database and noticed that it was really slow (tens of seconds). So I created a simple benchmark in Julia and Python.
Julia
```julia
begin
    import Pkg
    Pkg.activate(temp=true)
    Pkg.add(["DBInterface", "SQLite"])
end

using DBInterface, SQLite

const DB_FILE = "DB_JL.sqlite"
const DATA_SIZE = 50988

my_data = rand(DATA_SIZE)

conn = SQLite.DB(DB_FILE)
DBInterface.execute(conn, "CREATE TABLE test (number REAL)")

@info "Test with loop"
stmt = DBInterface.prepare(conn, "INSERT INTO test VALUES(?)")
@time for x ∈ my_data
    DBInterface.execute(stmt, (x,))
end

@info "Test with executemany"
stmt = DBInterface.prepare(conn, "INSERT INTO test VALUES(:col1)")
@time DBInterface.executemany(stmt, (col1=my_data,))
```

Output

```
...
[a10d1c49] + DBInterface v2.4.1
[0aa819cd] + SQLite v1.1.4
...
[ Info: Test with loop
 86.731634 seconds (892.30 k allocations: 58.522 MiB, 0.01% gc time, 0.22% compilation time)
[ Info: Test with executemany
 75.921188 seconds (478.19 k allocations: 43.262 MiB, 0.01% gc time, 0.11% compilation time)
```
Python
```python
import sqlite3
import random, time

DB_FILE = "DB_PY.sqlite"
DATA_SIZE = 50988

my_data = [random.random() for _ in range(DATA_SIZE)]

conn = sqlite3.connect(DB_FILE)
conn.execute("CREATE TABLE test (number REAL)")
conn.commit()

print("Test with loop")
begin = time.time()
for x in my_data:
    conn.execute("INSERT INTO test VALUES(?)", (x, ))
# Measure the time it takes to write everything
conn.commit()
elapsed = time.time() - begin
print(f"Time: {elapsed:.6} seconds")

print("Test with executemany")
begin = time.time()
conn.executemany("INSERT INTO test VALUES(?)", [(x, ) for x in my_data])
conn.commit()
elapsed = time.time() - begin
print(f"Time: {elapsed:.6} seconds")

conn.close()
```

Output

```
Test with loop
Time: 0.190992 seconds
Test with executemany
Time: 0.0642009 seconds
```
I checked: both database files are 1.6 MB, so the data is actually being written in both cases.
- Julia's best timing is 75 (!) seconds, using `DBInterface.executemany`.
- Python's worst timing is 0.19 (!) seconds, using a loop.

...so Julia is about 400 times slower here.

- Julia's `executemany` timing: 75 seconds
- Python's `executemany` timing: 0.064 seconds

Apparently, Python's `executemany` is about 1171 times (!!) faster than Julia's.
What's going on here? What am I doing wrong? It looks like each call to `DBInterface.execute` immediately writes to disk, while neither `conn.execute` nor `conn.executemany` writes anything right away (at first I forgot to `conn.close()`, and the DB file was around 20 KB, so almost empty). Apparently, `conn.commit()` then writes everything in one go, which is much faster than writing one row at a time. Does SQLite.jl not do such caching? Or is there a way of enabling it somehow?
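To isolate the suspected cause, here is a minimal sketch (in Python, since its behavior is the known-fast baseline) contrasting the two commit strategies: committing after every `INSERT` forces one transaction, and typically one disk sync, per row, which is what an autocommitting driver effectively does, while a single commit at the end batches all rows into one transaction. The helper name `insert_rows` and the file names are my own illustration, not from the original benchmark.

```python
import os
import sqlite3
import tempfile
import time

DATA = [(float(i),) for i in range(5_000)]

def insert_rows(db_path, commit_each_row):
    """Insert DATA into a fresh database, committing per row or once at the end."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE test (number REAL)")
    conn.commit()
    start = time.time()
    for row in DATA:
        conn.execute("INSERT INTO test VALUES(?)", row)
        if commit_each_row:
            conn.commit()  # one transaction (and disk sync) per row
    conn.commit()  # no-op if every row was already committed
    elapsed = time.time() - start
    (count,) = conn.execute("SELECT COUNT(*) FROM test").fetchone()
    conn.close()
    return count, elapsed

with tempfile.TemporaryDirectory() as tmp:
    n_slow, t_slow = insert_rows(os.path.join(tmp, "per_row.sqlite"), commit_each_row=True)
    n_fast, t_fast = insert_rows(os.path.join(tmp, "batched.sqlite"), commit_each_row=False)

print(f"per-row commits: {n_slow} rows in {t_slow:.3f} s")
print(f"single commit:   {n_fast} rows in {t_fast:.3f} s")
```

Both variants end up with identical data on disk; only the transaction boundaries differ, which would explain why both database files above are the same size despite the huge timing gap.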
- Python 3.9.5
- Julia 1.6.1