sql - How to bulk insert only new rows in PostreSQL -
i have list of products (3 million items) without ids - titles. don't know titles exist in db. new products (about 2.9 million items) must added db. after must know id each products (new , existing).
is there fastest way in postgresql? can change db needed (add default values, add columns etc.).
import data
copy temporary staging table , insert new titles target table.
create temp table tmp(title text); copy tmp 'path/to/file.csv'; analyze tmp; insert tbl select distinct tmp.title tmp left join tbl using (title) tbl.title null; ids should generated automatically serial column tbl_id in tbl.
the left join / is null construct disqualifies existing titles. not exists possibility.
distinct prevents duplicates in incoming data in temporary table tmp.
analyze useful make sure query planner picks sensible plan, , temporary tables not analyzed autovacuum.
since have 3 million items, might pay raise setting temp_buffer (for session only):
set temp_buffers = 1000mb; or can afford , enough hold temp table in ram, faster. note: must done first in session - before temp objects created.
retrieve ids
to see ids imported data:
select tbl.tbl_id, tbl.title tbl join tmp using (title) in same session! temporary table dropped automatically @ end of session.
Comments
Post a Comment