SQLite.jl appears to be 400 times slower than Python's sqlite3 #246

@ForceBru

Description

I was trying to write 50988 entries to a database and noticed that it was really slow (tens of seconds). So I wrote a simple benchmark in Julia and Python.

Julia

begin
	import Pkg
	Pkg.activate(temp=true)
	Pkg.add(["DBInterface", "SQLite"])
end

using DBInterface, SQLite

const DB_FILE = "DB_JL.sqlite"
const DATA_SIZE = 50988
my_data = rand(DATA_SIZE)

conn = SQLite.DB(DB_FILE)

DBInterface.execute(conn, "CREATE TABLE test (number REAL)")

@info "Test with loop"
stmt = DBInterface.prepare(conn, "INSERT INTO test VALUES(?)")
@time for x in my_data
	DBInterface.execute(stmt, (x, ))
end

@info "Test with executemany"
stmt = DBInterface.prepare(conn, "INSERT INTO test VALUES(:col1)")
@time DBInterface.executemany(stmt, (col1=my_data, ))

Output

...
  [a10d1c49] + DBInterface v2.4.1
  [0aa819cd] + SQLite v1.1.4
...
[ Info: Test with loop
 86.731634 seconds (892.30 k allocations: 58.522 MiB, 0.01% gc time, 0.22% compilation time)
[ Info: Test with executemany
 75.921188 seconds (478.19 k allocations: 43.262 MiB, 0.01% gc time, 0.11% compilation time)

Python

import sqlite3
import random, time

DB_FILE = "DB_PY.sqlite"
DATA_SIZE = 50988
my_data = [random.random() for _ in range(DATA_SIZE)]

conn = sqlite3.connect(DB_FILE)
conn.execute("CREATE TABLE test (number REAL)")
conn.commit()

print("Test with loop")
begin = time.time()
for x in my_data:
	conn.execute("INSERT INTO test VALUES(?)", (x, ))
# Measure the time it takes to write everything
conn.commit()
elapsed = time.time() - begin
print(f"Time: {elapsed:.6} seconds")

print("Test with executemany")
begin = time.time()
conn.executemany("INSERT INTO test VALUES(?)", [(x, ) for x in my_data])
conn.commit()
elapsed = time.time() - begin
print(f"Time: {elapsed:.6} seconds")

conn.close()

Output

Test with loop
Time: 0.190992 seconds
Test with executemany
Time: 0.0642009 seconds

I checked: both database files are 1.6 MB, so the data is actually being written in both cases.

  • Julia's best timing is 75 (!) seconds using DBInterface.executemany
  • Python's worst timing is 0.19 (!) seconds using a loop

...so Julia is about 400 times slower here.

  • Julia's executemany timing: 75 seconds
  • Python's executemany timing: 0.064 seconds

Apparently, Python's executemany is about 1171 times (!!) faster than Julia's.
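The ratios above follow directly from the timings; a quick sanity check (numbers copied from the two benchmark outputs):

```python
# Timings taken from the benchmark outputs above (seconds).
julia_loop, julia_many = 86.731634, 75.921188
py_loop, py_many = 0.190992, 0.0642009

# Best Julia timing vs worst Python timing: roughly 400x.
print(julia_many / py_loop)

# executemany vs executemany: roughly 1180x.
print(julia_many / py_many)
```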


What's going on here? What am I doing wrong? It looks like each call to DBInterface.execute immediately writes to disk, while neither conn.execute nor conn.executemany writes anything right away (at first I forgot to call conn.close(), and the DB file was only around 20 KB, so almost empty). Apparently conn.commit() writes everything in one go, which is much faster than committing one row at a time. Does SQLite.jl not do such batching? Or is there a way to enable it somehow?

  • Python 3.9.5
  • Julia 1.6.1
