sql - How to bulk insert only new rows in PostreSQL -


i have list of products (3 million items) without ids - titles. don't know titles exist in db. new products (about 2.9 million items) must added db. after must know id each products (new , existing).

is there fastest way in postgresql? can change db needed (add default values, add columns etc.).

import data

copy temporary staging table , insert new titles target table.

create temp table tmp(title text);  copy tmp 'path/to/file.csv'; analyze tmp;  insert tbl select distinct tmp.title   tmp  left   join tbl using (title)  tbl.title null; 

ids should generated automatically serial column tbl_id in tbl.

the left join / is null construct disqualifies existing titles. not exists possibility.

distinct prevents duplicates in incoming data in temporary table tmp.

analyze useful make sure query planner picks sensible plan, , temporary tables not analyzed autovacuum.

since have 3 million items, might pay raise setting temp_buffer (for session only):

set temp_buffers = 1000mb; 

or can afford , enough hold temp table in ram, faster. note: must done first in session - before temp objects created.

retrieve ids

to see ids imported data:

select tbl.tbl_id, tbl.title   tbl join   tmp using (title) 

in same session! temporary table dropped automatically @ end of session.


Comments

Popular posts from this blog

monitor web browser programmatically in Android? -

Shrink a YouTube video to responsive width -

wpf - PdfWriter.GetInstance throws System.NullReferenceException -