Is there a way to search terms in order with RegexpQuery in Lucene?
I want to search indexed documents for terms appearing in a given order, using RegexpQuery.
For example, I have two documents:
text: Oracle unveils better than expected quarterly results.
text: Research In Motion shares gained 13 per cent on the Toronto Stock Exchange on Friday, a day after the smartphone maker posted better than expected quarterly results.
So far, this is what I have tried, with no luck:
Query regexq = new RegexpQuery(new Term("text", "^.+better.+quarterly.+results"));
Is there a way of implementing this?
Thanks.
I believe PhraseQuery fits what you are looking for better. You can use PhraseQuery.setSlop(int) to allow other terms to appear between the terms of the query, like:
// Declared as PhraseQuery (not Query) so that add() and setSlop() are accessible.
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("text", "better"));
pq.add(new Term("text", "quarterly"));
pq.add(new Term("text", "results"));
pq.setSlop(10); // or whatever slop value is appropriate for you
This sort of query is also supported by the standard QueryParser, as a proximity search, like:
text:"better quarterly results"~10
I think PhraseQuery is the better implementation here, but...
Regarding RegexpQuery:
I believe it is intended to compare individual terms against the regex, and since the field you are searching is (I am assuming) tokenized, no single term matches the whole regex. You would need to index the entire field as a single term to make this work, using StringField, KeywordAnalyzer, or similar.
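To illustrate the point about tokenization, here is a minimal plain-Java sketch. It does not use Lucene at all: whitespace splitting stands in for an analyzer, java.util.regex stands in for Lucene's own regex syntax (which differs in places), and the class and method names are hypothetical. It only shows that no individual token can satisfy a regex that spans several words, while the untokenized value can.

```java
import java.util.regex.Pattern;

// Illustration only: whitespace splitting is an assumed stand-in for a Lucene analyzer.
class TokenizedRegexDemo {
    static final Pattern REGEX = Pattern.compile(".+better.+quarterly.+results.*");

    // True if any single whitespace-separated token matches the whole regex.
    static boolean anyTokenMatches(String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (REGEX.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    // True if the entire field value, treated as one term, matches the regex.
    static boolean wholeValueMatches(String text) {
        return REGEX.matcher(text.toLowerCase()).matches();
    }

    public static void main(String[] args) {
        String doc = "Oracle unveils better than expected quarterly results.";
        System.out.println(anyTokenMatches(doc));   // prints "false"
        System.out.println(wholeValueMatches(doc)); // prints "true"
    }
}
```

The second call corresponds roughly to what indexing the field as a single term would give the regex to work with.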
I believe it works like Matcher.matches(), rather than Matcher.find(); that is to say, it must match the entire input term, rather than a portion of it. So, if you had specified "text" as a StringField, you would need to add .* to the end to consume the rest of the input.
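The difference between the two matching modes can be seen with java.util.regex directly. This is only an analogy for how RegexpQuery appears to behave, not Lucene code, and the helper names are made up for the example.

```java
import java.util.regex.Pattern;

// Analogy only: java.util.regex, not Lucene's RegExp implementation.
class MatchesVsFindDemo {
    // matches(): the regex must consume the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): the regex may match any portion of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String term = "oracle unveils better than expected quarterly results.";
        String withoutTail = ".+better.+quarterly.+results";
        String withTail = ".+better.+quarterly.+results.*";

        System.out.println(partialMatch(withoutTail, term)); // prints "true"
        System.out.println(fullMatch(withoutTail, term));    // prints "false": the trailing "." is left over
        System.out.println(fullMatch(withTail, term));       // prints "true": ".*" consumes the rest
    }
}
```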
On a similar note, I'm not sure whether it supports the use of the character "^" for the start of input, which would be redundant in this case. I don't see it specified in Lucene's RegExp syntax, but I have seen references to its use, so I'm not sure whether it is accepted or not.
To summarize, the RegexpQuery should work like:
Query regexq = new RegexpQuery(new Term("text", ".+better.+quarterly.+results.*"));
if you used a StringField, or KeywordAnalyzer, to index the entire field as a single term.
With a leading wildcard in the regexp, though, expect poor performance (see the warning at the top of the RegexpQuery documentation).