R - Reading huge CSV files with illegal EOL marker


I need to read several huge (>400 MB) CSV log files into R. The file looks like this:

n visit_date req_url type_level

126424 2013/1/25 23:42:34 http://weibo.cn/attgroup/privateatt?cat=user&f=atts 1

33559 2013/1/25 15:15:54 http://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag 1

I use the following command to read in the content of the CSV files. It works fine for most of the data. However, there are illegal characters in some of the req_url fields, for example http://some.url/query=_1a_, where 1a is a hex code, quite similar to the LF marker. It seems that the scan function treats these characters as EOL markers and stops when it meets them. Is there a way to make R ignore these characters, or to keep them from being treated as EOL markers? Thanks.

# `what` takes example values, not type names; visit_date is read as text
dat <- scan(file = 'sample.sv', what = list(integer(), character(), character(), integer()),
            sep = '\t', strip.white = TRUE, quote = "", multi.line = FALSE, skip = 1)

You can use fread from the data.table package. It is similar to read.table, but faster and more convenient.

text <- '126424 2013/1/25 23:42:34 http://weibo.cn/attgroup/privateatt?cat=user&f=atts 1
33559 2013/1/25 15:15:54 http://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag 1
33556 2013/1/25 15:15:59 http://some.url/query=_1a_ 1'
library(data.table)
fread(text)
#        V1        V2       V3                                                   V4 V5
# 1: 126424 2013/1/25 23:42:34  http://weibo.cn/attgroup/privateatt?cat=user&f=atts  1
# 2:  33559 2013/1/25 15:15:54 http://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag  1
# 3:  33556 2013/1/25 15:15:59                           http://some.url/query=_1a_  1
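If the offending byte really is 0x1A (the ASCII SUB character, which Windows text-mode I/O has traditionally treated as end-of-file), another option is to strip it before parsing. The following is only a sketch under that assumption: it reads the file as raw bytes with readBin, replaces each 0x1A with an arbitrary placeholder, and then parses the cleaned text. The demo file, its contents, and the choice of "?" as the replacement are made up for illustration and are not from the original post.

```r
# Build a small demo file with a literal 0x1A byte inside a URL field.
# (Hypothetical data; in practice `path` would point at the real log file.)
path <- tempfile(fileext = ".tsv")
lines <- c(
  "n\tvisit_date\treq_url\ttype_level",
  "126424\t2013/1/25 23:42:34\thttp://weibo.cn/attgroup/privateatt?cat=user&f=atts\t1",
  "33556\t2013/1/25 15:15:59\thttp://some.url/query=_\x1a_\t1",
  "33559\t2013/1/25 15:15:54\thttp://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag\t1"
)
writeBin(charToRaw(paste0(paste(lines, collapse = "\n"), "\n")), path)

# Read the whole file as raw bytes, so no text-mode EOF handling is involved.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# Count the 0x1A bytes, then replace each with "?" (0x3F).
# The replacement character is an arbitrary choice.
n_sub <- sum(bytes == as.raw(0x1a))
bytes[bytes == as.raw(0x1a)] <- as.raw(0x3f)

# Parse the cleaned text as tab-separated data with a header row.
dat <- read.table(text = rawToChar(bytes), sep = "\t", header = TRUE,
                  quote = "", comment.char = "", stringsAsFactors = FALSE)
```

For files of several hundred megabytes, parsing through `text =` may be slow. Writing the cleaned bytes back to disk with writeBin and reading that file with fread or read.table is likely a better fit, though this has not been measured here.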
