parsing - Reading badly formed CSV in R - mismatched quotes


I have hundreds of large CSV files (sizes vary from 10k lines to 100k lines in each), and some of them have badly formed descriptions with quotes within quotes, so they might look like this:

    id,description,x
    3434,"abc"def",988
    2344,"fred",3484
    2345,"fr""ed",3485
    2346,"joe,fred",3486

I need to be able to cleanly parse all of these lines in R as CSV. dput()'ing the data and reading it in ...

    txt <- c("id,description,x",
        "3434,\"abc\"def\",988",
        "2344,\"fred\",3484",
        "2345,\"fr\"\"ed\",3485",
        "2346,\"joe,fred\",3486")

    read.csv(text=txt[1:4], colClasses='character')
    Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
      incomplete final line found by readTableHeader on 'text'

If I change the quoting and do not include the last line with the embedded comma, it works well:

    read.csv(text=txt[1:4], colClasses='character', quote='')

However, if I change the quoting and include the last line with the embedded comma...

    read.csv(text=txt[1:5], colClasses='character', quote='')
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
      line 1 did not have 4 elements

Edit x2: I should have said that, unfortunately, some of the descriptions have commas in them. The code above has been edited accordingly.

Change the quote setting:

    read.csv(text=txt, colClasses='character', quote = "")

        id description    x
    1 3434   "abc"def"  988
    2 2344      "fred" 3484
    3 2345    "fr""ed" 3485
    4 2346       "joe" 3486
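One way to see why quote = "" on its own stops working once a description contains a comma is to count the fields on each line. The sketch below uses base R's count.fields() on the sample data from the question; the names n_fields and bad_lines are only illustrative, and it assumes the header line has the correct number of fields.

```r
# Sample data from the question, including the line with an embedded comma.
txt <- c("id,description,x",
         "3434,\"abc\"def\",988",
         "2344,\"fred\",3484",
         "2345,\"fr\"\"ed\",3485",
         "2346,\"joe,fred\",3486")

# Count comma-separated fields per line with quote handling switched off,
# which is how read.csv(..., quote = "") splits the lines.
con <- textConnection(txt)
n_fields <- count.fields(con, sep = ",", quote = "")
close(con)

# Lines whose field count differs from the header's hold a stray comma.
bad_lines <- which(n_fields != n_fields[1])
print(n_fields)
print(bad_lines)
```

Any line reported in bad_lines has more separators than the header, so it needs the comma handling described next rather than a plain read.csv() call.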

Edit to deal with errant commas:

    txt <- c("id,description,x",
             "3434,\"abc\"def\",988",
             "2344,\"fred\",3484",
             "2345,\"fr\"\"ed\",3485",
             "2346,\"joe,fred\",3486")

    txt2 <- readLines(textConnection(txt))

    # split every line on commas
    txt2 <- strsplit(txt2, ",")

    # keep the first and last pieces; glue everything in between back together
    txt2 <- lapply(txt2, function(x) c(x[1], paste(x[2:(length(x)-1)], collapse=","), x[length(x)]))
    m <- do.call("rbind", txt2)
    df <- as.data.frame(m, stringsAsFactors = FALSE)
    # the first row holds the header
    names(df) <- df[1,]
    df <- df[-1,]

    #     id description    x
    # 2 3434   "abc"def"  988
    # 3 2344      "fred" 3484
    # 4 2345    "fr""ed" 3485
    # 5 2346  "joe,fred" 3486

I have no idea whether this is sufficiently efficient for your use case.
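If the strsplit()/lapply() route turns out to be slow on the larger files, a regular-expression variant of the same idea may be worth benchmarking. The sketch below is not part of the approach above. It rests on the same assumptions: every line has at least three fields, and only the middle field can contain commas. The function name parse_messy_csv is hypothetical. It also strips the enclosing quotes and collapses doubled quotes, which may or may not be what you want.

```r
# Hypothetical helper: split each line at its first and last comma only.
parse_messy_csv <- function(lines) {
  # The greedy middle group captures everything between the first and last comma.
  parts <- regmatches(lines, regexec("^([^,]*),(.*),([^,]*)$", lines))
  # Drop the full match and keep the three captured fields per line.
  m <- do.call(rbind, lapply(parts, function(p) p[-1]))
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- m[1, ]  # the first line is the header
  # Strip one pair of enclosing quotes, then turn "" into a single ".
  desc <- sub('^"(.*)"$', "\\1", df[[2]])
  df[[2]] <- gsub('""', '"', desc, fixed = TRUE)
  rownames(df) <- NULL
  df
}

# Sample data from the question; for a real file, the lines could come
# from readLines() on that file instead.
txt <- c("id,description,x",
         "3434,\"abc\"def\",988",
         "2344,\"fred\",3484",
         "2345,\"fr\"\"ed\",3485",
         "2346,\"joe,fred\",3486")

clean <- parse_messy_csv(txt)
print(clean)
```

All columns come back as character, matching colClasses='character' in the question, so convert id and x afterwards if numeric values are needed.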

