Elasticsearch - Using a custom_score to sort by a nested child's timestamp
I'm pretty new to Elasticsearch and have been banging my head trying to get this sorting to work. The general idea is to search email message threads that have nested messages and nested participants. The goal is to display search results at the thread level, sorted by a value belonging to the participant who is doing the search: either the last_received_at or the last_sent_at column, depending on which mailbox they are in.
My understanding is that you can't sort by a single child's value among many nested children. To work around that, I saw a couple of suggestions for using a custom_score query with a script and then sorting on the score. My plan is to dynamically change the sort column and run a nested custom_score query that returns the date of one of the participants as the score. I've been noticing issues with the score: its format is strange (e.g. it always has four zeros at the end), and it may not be returning the date I was expecting.
Below are simplified versions of the index and the query in question. If anyone has any suggestions, I'd be very grateful. (FYI: I am using Elasticsearch version 0.20.6.)
index:
mappings: {
  message_thread: {
    properties: {
      id: { type: long }
      subject: {
        dynamic: true
        properties: {
          id: { type: long }
          name: { type: string }
        }
      }
      participants: {
        dynamic: true
        properties: {
          id: { type: long }
          name: { type: string }
          last_sent_at: {
            format: dateOptionalTime
            type: date
          }
          last_received_at: {
            format: dateOptionalTime
            type: date
          }
        }
      }
      messages: {
        dynamic: true
        properties: {
          sender: {
            dynamic: true
            properties: {
              id: { type: long }
            }
          }
          id: { type: long }
          body: { type: string }
          created_at: {
            format: dateOptionalTime
            type: date
          }
          recipient: {
            dynamic: true
            properties: {
              id: { type: long }
            }
          }
        }
      }
      version: { type: long }
    }
  }
}
query:
{
  "query": {
    "bool": {
      "must": [
        { "term": { "participants.id": 3785 } },
        {
          "custom_score": {
            "query": {
              "filtered": {
                "query": { "match_all": {} },
                "filter": { "term": { "participants.id": 3785 } }
              }
            },
            "params": { "sort_column": "participants.last_received_at" },
            "script": "doc[sort_column].value"
          }
        }
      ]
    }
  },
  "filter": {
    "bool": {
      "must": [
        { "term": { "messages.recipient.id": 3785 } }
      ]
    }
  },
  "sort": [ "_score" ]
}
solution:
Thanks to @imotov, here is the final result. The participants were not properly nested in the index (while the messages didn't need to be). In addition, include_in_root was used for the participants to simplify the query (participants are small records, so size is not a real issue, although @imotov also provided an example without it). I then restructured the JSON request to use a dis_max query.
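Because the sort column depends on the mailbox, the request body can be assembled programmatically rather than written by hand. The following is a minimal Python sketch that mirrors the include_in_root search from the solution. The function name `build_thread_query` and the mailbox names `"inbox"` and `"sent"` are assumptions made for illustration; the post only says that the column depends on the mailbox.

```python
import json

# Hypothetical mailbox names (not from the original post), mapped to the
# participant date field that should drive the ordering.
SORT_COLUMN_BY_MAILBOX = {
    "inbox": "participants.last_received_at",
    "sent": "participants.last_sent_at",
}


def build_thread_query(participant_id, mailbox, text):
    """Build a search body shaped like the include_in_root request in the post."""
    sort_column = SORT_COLUMN_BY_MAILBOX[mailbox]
    return {
        "query": {
            "filtered": {
                "query": {
                    "nested": {
                        "path": "participants",
                        "score_mode": "max",
                        "query": {
                            "custom_score": {
                                "query": {
                                    "filtered": {
                                        "query": {"match_all": {}},
                                        "filter": {
                                            "term": {"participants.id": participant_id}
                                        },
                                    }
                                },
                                # The script reads whichever field is passed here.
                                "params": {"sort_column": sort_column},
                                "script": "doc[sort_column].value",
                            }
                        },
                    }
                },
                "filter": {
                    "query": {
                        "multi_match": {
                            "query": text,
                            "fields": [
                                "subject.name",
                                "participants.name",
                                "messages.body",
                            ],
                            "operator": "and",
                            "use_dis_max": True,
                        }
                    }
                },
            }
        },
        "sort": ["_score"],
        "fields": [],
    }


body = build_thread_query(3785, "inbox", "test")
print(json.dumps(body, indent=2))
```

The output of `json.dumps(body)` is a string that could be sent as the request body in place of the hand-written JSON; only the `sort_column` parameter changes between mailboxes.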
# Recreate the test index with participants mapped as nested objects
curl -XDELETE "localhost:9200/test-idx"
curl -XPUT "localhost:9200/test-idx" -d '{
  "mappings": {
    "message_thread": {
      "properties": {
        "id": { "type": "long" },
        "messages": {
          "properties": {
            "body": { "type": "string", "analyzer": "standard" },
            "created_at": { "type": "date", "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''" },
            "id": { "type": "long" },
            "recipient": { "dynamic": "true", "properties": { "id": { "type": "long" } } },
            "sender": { "dynamic": "true", "properties": { "id": { "type": "long" } } }
          }
        },
        "messages_count": { "type": "long" },
        "participants": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "id": { "type": "long" },
            "last_received_at": { "type": "date", "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''" },
            "last_sent_at": { "type": "date", "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''" },
            "name": { "type": "string", "analyzer": "standard" }
          }
        },
        "subject": {
          "properties": {
            "id": { "type": "long" },
            "name": { "type": "string" }
          }
        }
      }
    }
  }
}'

# Index two sample threads
curl -XPUT "localhost:9200/test-idx/message_thread/1" -d '{
  "id" : 1,
  "subject" : {"name": "test thread"},
  "participants" : [
    {"id" : 87793, "name" : "john smith", "last_received_at" : null, "last_sent_at" : "2010-10-27T17:26:58Z"},
    {"id" : 3785, "name" : "david jones", "last_received_at" : "2010-10-27T17:26:58Z", "last_sent_at" : null}
  ],
  "messages" : [{
    "id" : 1,
    "body" : "this test.",
    "sender" : { "id" : 87793 },
    "recipient" : { "id" : 3785},
    "created_at" : "2010-10-27T17:26:58Z"
  }]
}'
curl -XPUT "localhost:9200/test-idx/message_thread/2" -d '{
  "id" : 2,
  "subject" : {"name": "elastic"},
  "participants" : [
    {"id" : 57834, "name" : "paul johnson", "last_received_at" : "2010-11-25T17:26:58Z", "last_sent_at" : "2010-10-25T17:26:58Z"},
    {"id" : 3785, "name" : "david jones", "last_received_at" : "2010-10-25T17:26:58Z", "last_sent_at" : "2010-11-25T17:26:58Z"}
  ],
  "messages" : [{
    "id" : 2,
    "body" : "more testing of elasticsearch.",
    "sender" : { "id" : 57834 },
    "recipient" : { "id" : 3785},
    "created_at" : "2010-10-25T17:26:58Z"
  },{
    "id" : 3,
    "body" : "reply message.",
    "sender" : { "id" : 3785 },
    "recipient" : { "id" : 57834},
    "created_at" : "2010-11-25T17:26:58Z"
  }]
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo

# Using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "participants",
          "score_mode": "max",
          "query": {
            "custom_score": {
              "query": {
                "filtered": {
                  "query": { "match_all": {} },
                  "filter": { "term": { "participants.id": 3785 } }
                }
              },
              "params": { "sort_column": "participants.last_received_at" },
              "script": "doc[sort_column].value"
            }
          }
        }
      },
      "filter": {
        "query": {
          "multi_match": {
            "query": "test",
            "fields": ["subject.name", "participants.name", "messages.body"],
            "operator": "and",
            "use_dis_max": true
          }
        }
      }
    }
  },
  "sort": ["_score"],
  "fields": []
}
'

# Not using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "participants",
          "score_mode": "max",
          "query": {
            "custom_score": {
              "query": {
                "filtered": {
                  "query": { "match_all": {} },
                  "filter": { "term": { "participants.id": 3785 } }
                }
              },
              "params": { "sort_column": "participants.last_received_at" },
              "script": "doc[sort_column].value"
            }
          }
        }
      },
      "filter": {
        "query": {
          "bool": {
            "should": [
              { "match": { "subject.name": "test" } },
              { "nested": { "path": "participants", "query": { "match": { "name": "test" } } } },
              { "match": { "messages.body": "test" } }
            ]
          }
        }
      }
    }
  },
  "sort": ["_score"],
  "fields": []
}
'
There are a couple of issues here. You are asking about nested objects, but participants are not defined in your mapping as nested objects. The second possible issue is that the score has type float, so it might not have enough precision to represent a timestamp as is. If you can figure out how to fit this value into a float, you can take a look at this example: Elastic search - tagging strength (nested/child document boosting). However, if you are developing a new system, it might be prudent to upgrade to 0.90.0.Beta1, which supports sorting on nested fields.
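To make the precision concern concrete, here is a small Python sketch, assuming the date field value is epoch milliseconds and the score is an IEEE 754 single-precision float. A float has a 24-bit significand, so near 1.29e12 adjacent representable values are 2**17 ms (about 131 seconds) apart. The rescaling to whole minutes since a baseline date is only one possible way to fit the value into a float; the baseline `BASE` and the helper names are hypothetical and do not come from the answer.

```python
import struct
from datetime import datetime, timezone


def to_float32(x):
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# A last_received_at value from the sample data, as epoch milliseconds.
ts = datetime(2010, 10, 27, 17, 26, 58, tzinfo=timezone.utc)
millis = int(ts.timestamp() * 1000)

as_score = to_float32(millis)
print("epoch millis:   ", millis)
print("as float32:     ", int(as_score))
print("error in ms:    ", abs(as_score - millis))
# Two timestamps one second apart collapse to the same float32 value.
print("1 s later equal:", to_float32(millis + 1000) == as_score)

# One possible workaround (an assumption, not the answer's method):
# count whole minutes from a recent baseline so the number stays below
# 2**24, the limit up to which float32 represents every integer exactly.
BASE = datetime(2010, 1, 1, tzinfo=timezone.utc)  # hypothetical baseline


def minutes_since_base(moment):
    return int((moment - BASE).total_seconds() // 60)


minutes = minutes_since_base(ts)
print("minutes:        ", minutes)
print("exact as float: ", to_float32(minutes) == minutes)
```

With minute resolution, 2**24 minutes covers roughly 31 years from the baseline. The same subtraction and division would have to be done inside the scoring script, and ordering finer than one minute would be lost.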