Pandas.DataFrame.to_sql allows us to create tables into our sqlite database, tables can be newly created, appended, or overwritten. If we want to append data though, the function won’t handle duplicate items when adding new rows by default.

In order to update the tables without overwriting previous data and only adding new rows, we can use a middle-step where we load the data into a temporary table and then make sure we only insert new rows into the final table.

Assuming our final table has the following schema:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE TABLE IF NOT EXISTS stocks(
    id INTEGER NOT NULL PRIMARY KEY
    symbol TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL,
    ipo_year INTEGER,
    country TEXT,
    sector TEXT,
    industry TEXT,
    is_public BOOLEAN,
    description TEXT,
    extraction_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)

And we have a csv file, for eg, here we have a file holding public traded company stock info:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
Symbol,Name,Last Sale,Net Change,% Change,Market Cap,Country,IPO Year,Volume,Sector,Industry
ZUMZ,Zumiez Inc. Common Stock,$42.92,-0.57,-1.311%,1106521421.00,United States,2005,188533,,
ZUO,Zuora Inc. Class A Common Stock,$16.11,-0.09,-0.556%,1970253000.00,,2018,570413,Technology,EDP Services
ZVO,Zovio Inc. Common Stock,$2.46,-0.15,-5.747%,81988233.00,United States,,130740,Miscellaneous,Other Consumer Services
ZWRK,Z-Work Acquisition Corp. Class A Common Stock,$9.66,-0.02,-0.207%,277725000.00,United States,2021,4673,,
ZWRKU,Z-Work Acquisition Corp. Units,$9.99,0.00,0.00%,0.00,United States,2021,3351,,
ZWRKW,Z-Work Acquisition Corp. Warrant,$0.9598,-0.0004,-0.042%,0.00,United States,2021,31435,Finance,Business Services
ZY,Zymergen Inc. Common Stock,$35.95,0.96,2.744%,3606675410.00,United States,2021,175179,Health Care,Biotechnology: Commercial Physical & Biological Resarch
ZYME,Zymeworks Inc. Common Shares,$35.73,0.13,0.365%,1649655315.00,Canada,2017,530694,,
ZYNE,Zynerba Pharmaceuticals Inc. Common Stock,$4.57,-0.13,-2.766%,188519524.00,United States,2015,358037,Health Care,Major Pharmaceuticals
ZYXI,Zynex Inc. Common Stock,$16.81,1.62,10.665%,585772287.00,United States,,779992,Health Care,Biotechnology: Electromedical & Electrotherapeutic Apparatus

We can then have a script that’ll process data coming in from the csv file and save it into our final table whilst appending only new rows in subsequent runs:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import sqlite3
import pandas as pd
from datetime import datetime

con = sqlite3.connect('db.sqlite3')
extraction_datetime = datetime.now()
stocks_file = 'stocks.csv'


df: pd.DataFrame = pd.read_csv(stocks_file, usecols=['Symbol', 'Name', 'Country', 'IPO Year', 'Sector', 'Industry'])

# lowercase all the columns
df.columns = [x.lower() for x in df.columns]

df.rename(columns={'ipo year': 'ipo_year'}, inplace=True)
df['extraction_date'] = extraction_datetime
df['is_public'] = True

# create a temp table to hold the new data that'll be diffed against the final table
df.to_sql('stocks_temp', con, if_exists='replace', index=False, dtype={'ipo_year': 'INTEGER'})
con.execute('''
    INSERT INTO stocks(symbol, name, country, ipo_year, sector, industry, is_public, extraction_date)
    SELECT 
        stocks_temp.symbol, 
        stocks_temp.name, 
        stocks_temp.country, 
        stocks_temp.ipo_year, 
        stocks_temp.sector, 
        stocks_temp.industry,
        stocks_temp.is_public,
        stocks_temp.extraction_date
    FROM 
        stocks_temp
    WHERE NOT EXISTS (
        SELECT 1 FROM stocks WHERE stocks.symbol = stocks_temp.symbol
    )
''')
con.execute('DROP TABLE IF EXISTS stocks_temp')
con.commit()