R - reading huge CSV files with illegal EOL marker
I need to read several huge (>400 MB) CSV log files into R. The file looks like:
n visit_date req_url type_level
126424 2013/1/25 23:42:34 http://weibo.cn/attgroup/privateatt?cat=user&f=atts 1
33559 2013/1/25 15:15:54 http://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag 1
I use the following command to read in the content of the CSV files. It works fine for most of the data. However, there are illegal characters in some of the req_url fields, for example:
http://some.url/query=_1a_
Here 1a is a hex code that is quite similar to the LF marker. It seems that the scan function treats these characters as EOL markers and stops when it meets them. Is there a way to let R ignore these characters, or to keep them from being treated as EOL markers? Thanks.
dat <- scan(file='sample.sv', what=list("integer", "numeric", "character", "integer"), sep='\t', strip.white=TRUE, quote="", multi.line=FALSE, skip=1)
You can use fread from the data.table package. It is similar to read.table, but faster and more convenient.
text <- '126424 2013/1/25 23:42:34 http://weibo.cn/attgroup/privateatt?cat=user&f=atts 1
33559 2013/1/25 15:15:54 http://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag 1
33556 2013/1/25 15:15:59 http://some.url/query=_1a_ 1'
library(data.table)
fread(text)
       V1        V2       V3                                                   V4 V5
1: 126424 2013/1/25 23:42:34  http://weibo.cn/attgroup/privateatt?cat=user&f=atts  1
2:  33559 2013/1/25 15:15:54 http://i.ifeng.com/mil/mili?vt=5&dh=touch&mid=akuiag  1
3:  33556 2013/1/25 15:15:59                           http://some.url/query=_1a_  1
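If you would rather stay in base R, another option is to strip the offending bytes before parsing. This is only a rough sketch under a few assumptions that are not confirmed by the question: that the stray character is the single byte 0x1A (ASCII SUB, Ctrl-Z, which text-mode readers on Windows commonly treat as end-of-file), that the file is tab-separated with a header row, and that it fits in memory. The helper name read_tsv_without_sub and the demo file are made up for illustration.

```r
# Hypothetical helper (not part of any package): read a tab-separated file
# after removing every 0x1A byte from it.
read_tsv_without_sub <- function(path, ...) {
  # Read the file as raw bytes, so no byte is interpreted as EOL or EOF.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # Drop all 0x1A bytes (assumed to be the illegal character).
  cleaned <- bytes[bytes != as.raw(0x1a)]
  # Parse the cleaned text; quoting and comment characters are disabled
  # because URLs may contain quote or '#' characters.
  read.table(text = rawToChar(cleaned), sep = "\t", header = TRUE,
             quote = "", comment.char = "", stringsAsFactors = FALSE, ...)
}

# Small made-up demo file with a 0x1A byte inside the last URL.
demo_path <- tempfile(fileext = ".tsv")
demo_lines <- c(
  "n\tvisit_date\treq_url\ttype_level",
  "126424\t2013/1/25 23:42:34\thttp://weibo.cn/attgroup/privateatt?cat=user&f=atts\t1",
  "33556\t2013/1/25 15:15:59\thttp://some.url/query=\x1a\t1"
)
# Write in binary form so the control byte reaches the file unchanged.
writeBin(charToRaw(paste0(paste(demo_lines, collapse = "\n"), "\n")), demo_path)

demo <- read_tsv_without_sub(demo_path)
print(demo)
```

For files of several hundred MB, read.table on an in-memory string is likely to be slow. The cleaned text could instead be written back to a new file and read with fread, or passed to fread through its text argument.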