Pandas.DataFrame.to_sql allows us to create tables into our
sqlite database, tables can be newly created, appended, or overwritten. If we want to append data though, the function won’t handle duplicate items when adding new rows by default.
In order to update the tables without overwriting previous data and only adding new rows, we can use a middle-step where we load the data into a temporary table and then make sure we only insert new rows into the final table.
Assuming our final table has the following schema:
And we have a
csv file, for eg, here we have a file holding public traded company stock info:
We can then have a script that’ll process data coming in from the
csv file and save it into our final table whilst appending only new rows in subsequent runs: