sql - How to bulk insert only new rows in PostreSQL -
i have list of products (3 million items) without ids - titles. don't know titles exist in db. new products (about 2.9 million items) must added db. after must know id each products (new , existing).
is there fastest way in postgresql? can change db needed (add default values, add columns etc.).
import data
copy
temporary staging table , insert new titles target table.
create temp table tmp(title text); copy tmp 'path/to/file.csv'; analyze tmp; insert tbl select distinct tmp.title tmp left join tbl using (title) tbl.title null;
ids should generated automatically serial
column tbl_id
in tbl
.
the left join
/ is null
construct disqualifies existing titles. not exists
possibility.
distinct
prevents duplicates in incoming data in temporary table tmp
.
analyze
useful make sure query planner picks sensible plan, , temporary tables not analyzed autovacuum.
since have 3 million items, might pay raise setting temp_buffer
(for session only):
set temp_buffers = 1000mb;
or can afford , enough hold temp table in ram, faster. note: must done first in session - before temp objects created.
retrieve ids
to see ids imported data:
select tbl.tbl_id, tbl.title tbl join tmp using (title)
in same session! temporary table dropped automatically @ end of session.
Comments
Post a Comment