R - fuzzy matching with multiple words


I'm trying to do fuzzy matching in R, and I have multiple fields of data to match against.

For example:

    try_to_match <- c('seoul korea', 'bisbane', 'korea', 'australia brisbane')
    locations <- data.frame(name = c('seoul', 'brisbane'),
                            country = c('south korea', 'australia'))

I want to match the user-entered locations in try_to_match against the locations data frame.

Now, there are similar questions about fuzzy matching in R on SO, and they cover agrep. However, I can't find any that cover fuzzy matching when there are multiple words to match over.

For example, if I match against locations$name, "bisbane" matches "brisbane", as expected. However, there are no matches for the various searches that have a country in them, because locations$name has no country in it.

    sapply(try_to_match, agrep, locations$name, value = TRUE)
    # $`seoul korea`
    # character(0)
    #
    # $bisbane
    # [1] "brisbane"
    #
    # $korea
    # character(0)
    #
    # $`australia brisbane`
    # character(0)

So, I guess I should incorporate matching on the country too:

    sapply(try_to_match, agrep, paste(locations$name, locations$country), value = TRUE)
    # $`seoul korea`
    # character(0)
    #
    # $bisbane
    # [1] "brisbane australia"
    #
    # $korea
    # [1] "seoul south korea"
    #
    # $`australia brisbane`
    # character(0)

However, I still cannot match "seoul korea" to "seoul south korea" because of the missing word. Likewise, while "brisbane australia" is matched appropriately, "australia brisbane" is not (because the order of the words is reversed). (It is a bit iffy that "korea" matches "seoul south korea", but I'm happy to live with that for now.)
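To make the requirement concrete, here is a minimal sketch of one naive token-wise baseline in base R. The helper names token_score and best_match, the per-word max.distance of 0.2 and the score threshold of 0.5 are all assumptions for illustration, not part of any package. Both strings are split into words, a query word counts as found if agrepl matches it against any word of the candidate location, and each candidate is scored by the fraction of query words found. Because words are compared independently, word order does not matter, and extra words in the candidate (such as "south") do not lower the score.

```r
# Data from the question; stringsAsFactors = FALSE keeps plain character columns
try_to_match <- c('seoul korea', 'bisbane', 'korea', 'australia brisbane')
locations <- data.frame(name = c('seoul', 'brisbane'),
                        country = c('south korea', 'australia'),
                        stringsAsFactors = FALSE)

# Hypothetical helper: fraction of query words that approximately occur
# (per agrepl, which allows a fractional max.distance) in at least one
# word of the target string. Word order is ignored.
token_score <- function(query, target, max.distance = 0.2) {
  q_words <- strsplit(trimws(tolower(query)), "\\s+")[[1]]
  t_words <- strsplit(trimws(tolower(target)), "\\s+")[[1]]
  found <- vapply(q_words,
                  function(w) any(agrepl(w, t_words, max.distance = max.distance)),
                  logical(1))
  mean(found)
}

# Hypothetical helper: return the best-scoring target, or NA if no target
# reaches the (assumed) threshold.
best_match <- function(query, targets, threshold = 0.5) {
  scores <- vapply(targets, token_score, numeric(1), query = query)
  if (max(scores) >= threshold) targets[which.max(scores)] else NA_character_
}

# Candidate strings combine name and country, as in the second attempt above
targets <- paste(locations$name, locations$country)
result <- vapply(try_to_match, best_match, character(1), targets = targets)
print(result)
```

On this toy data, "seoul korea" and "korea" both pick "seoul south korea", while "bisbane" and "australia brisbane" both pick "brisbane australia". Note that agrep-family functions do approximate substring matching, so very short query words can match almost anything; the distance and threshold values would need tuning on real data.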

So, my question is: how can I do fuzzy matching when the search and match terms may have multiple words, each individually misspelled, and the order of the words can be different?

Is there a package that does this sort of searching for me?

(Yes, I could use the excellent GeoNames web service to do a lot of the matching, but I want to avoid making too many requests to their server. I'm more interested in the ability to do this sort of search in R than in the ability to geocode.)

