regex - Is there a way to search terms in order with RegexpQuery in Lucene?
I want to search indexed documents for terms in a given order using RegexpQuery.
For example, I have 2 documents:
text: Oracle unveils better than expected quarterly results.
text: Research In Motion shares gained 13 per cent on the Toronto Stock Exchange Friday, a day after the smartphone maker posted better than expected quarterly results.
So far I have tried this, but had no luck:
Query regexq = new RegexpQuery(new Term("text", "^.+better.+quarterly.+results"));
Is there a way of implementing this?
Thanks
I believe a PhraseQuery fits what you are looking for better. You can use PhraseQuery.setSlop(int) to allow other terms to appear between the terms of the query, like:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("text", "better"));
pq.add(new Term("text", "quarterly"));
pq.add(new Term("text", "results"));
pq.setSlop(10); // or whatever slop value is appropriate for you
This sort of query is also supported by the standard QueryParser, as described in its syntax documentation, like:
text:"better quarterly results"~10
I think PhraseQuery is the better implementation here, but...
Regarding RegexpQuery:
I believe it is intended to compare individual terms against the regex, and since the text you are searching is (I am assuming) tokenized, no single term matches the whole regex. You would need to index the entire field as a single term to make this work, using StringField, KeywordAnalyzer, or similar.
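To illustrate the tokenization point, here is a rough plain-Java sketch that does not use Lucene at all. The tokenize helper is a hypothetical stand-in for an analyzer (it only lowercases and splits on non-alphanumerics), and java.util.regex syntax is not identical to Lucene's RegExp syntax, so treat this purely as a model of what "matching one term at a time" implies:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class TermLevelMatching {
    // Hypothetical stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    // True if at least one term, taken on its own, matches the regex in full.
    static boolean anyTermMatches(String regex, List<String> terms) {
        Pattern p = Pattern.compile(regex);
        return terms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        String text = "oracle unveils better expected quarterly results.";
        String regex = ".+better.+quarterly.+results.*";

        // Tokenized field: each term is tested separately, so nothing matches.
        System.out.println(anyTermMatches(regex, tokenize(text)));

        // Whole field kept as a single term: the regex can span all the words.
        System.out.println(anyTermMatches(regex, Collections.singletonList(text)));
    }
}
```

The first call tests "oracle", "unveils", "better" and so on individually, which is why a regex spanning several words has nothing to match against in a tokenized field.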
I believe it works like Matcher.matches() rather than Matcher.find(); that is to say, it must match the entire input term rather than a portion of it. So, if you had specified "text" as a StringField, you would need to add .* to the end to consume the rest of the input.
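The matches() versus find() distinction can be seen with java.util.regex directly. This is only an analogy for the behaviour I believe RegexpQuery has, not Lucene code; the class and method names are made up for the example:

```java
import java.util.regex.Pattern;

final class MatchModes {
    // Whole-input semantics: the regex must consume the entire string.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring semantics: the regex may match any portion of the string.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String term = "oracle unveils better expected quarterly results.";
        String original = ".+better.+quarterly.+results";
        String withTail = original + ".*";

        System.out.println(findsAnywhere(original, term)); // true: a portion matches
        System.out.println(matchesWhole(original, term));  // false: trailing "." is left over
        System.out.println(matchesWhole(withTail, term));  // true: ".*" consumes the rest
    }
}
```

Here the original pattern fails under whole-input matching only because of the trailing period, which is the kind of leftover input the extra .* is meant to absorb.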
On a similar note, I'm not sure whether it supports the character "^" for the start of input, since that would be redundant in this case. I don't see it specified in Lucene's RegExp documentation, but I have seen references to its use, so I'm not sure whether it is accepted or not.
To summarize, a RegexpQuery should work like:
Query regexq = new RegexpQuery(new Term("text", ".+better.+quarterly.+results.*"));
if you used a StringField or KeywordAnalyzer to index the entire field as a single term.
With a leading wildcard in the regexp, though, expect poor performance (see the warning at the top of the RegexpQuery documentation).