
Plain-Language Elasticsearch 14: Deep Dive into Search Techniques, Drawbacks of Cross-Fields Search Using multi_match with the most_fields Strategy

Date: 2021-07-12 15:15:47


Contents

Overview
  Official docs
Example

Overview

Continuing to learn Elasticsearch (ES) with instructor 中华石杉: this is part 14.

Course URL: /view/55

Official docs

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/query-dsl-multi-match-query.html

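A cross-fields search looks for a single identifier whose parts are spread across multiple fields.

For example, a person is identified by a name, and a building is identified by its address.

A name can be split across several fields, such as first_name and last_name; an address can be split across country, province, and city.

Searching for one such identifier across multiple fields, for example a person's name or an address, is a cross-fields search.

At first glance, most_fields seems the better fit for this. best_fields favors the result in which a single field matches best, whereas a cross-fields search is, by definition, not a single-field problem.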
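The difference between the two strategies comes down to how per-field scores are combined. Below is a simplified Python model, assuming each field has already produced its own score for the query (in Elasticsearch each field's score comes from BM25, which this sketch does not compute). best_fields behaves like a dis_max query (best field plus tie_breaker times the rest), and most_fields is modeled as a bool query with should clauses, which adds the matching fields' scores together. The function names and the per-field scores are hypothetical.

```python
# Simplified model (assumption): per-field scores are given; we only model
# how best_fields and most_fields combine them into one document score.

def best_fields_score(field_scores, tie_breaker=0.0):
    """dis_max style: the best field's score plus tie_breaker * the other fields' scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


def most_fields_score(field_scores):
    """bool/should style: the per-field scores are added together."""
    return sum(field_scores)


# Hypothetical per-field scores: doc_a matches one field strongly,
# doc_b matches two fields moderately.
doc_a = [1.5, 0.0]
doc_b = [1.0, 1.0]

print(best_fields_score(doc_a), best_fields_score(doc_b))  # 1.5 1.0
print(most_fields_score(doc_a), most_fields_score(doc_b))  # 1.5 2.0
```

With best_fields, doc_a (one strong field) wins; with most_fields, doc_b (more fields matching) wins, which is why most_fields looks like the natural choice when an identifier is split across fields.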

Example

Prepare the data

POST /forum/article/_bulk
{"update": {"_id": "1"}}
{"doc": {"author_first_name": "Peter", "author_last_name": "Smith"}}
{"update": {"_id": "2"}}
{"doc": {"author_first_name": "Smith", "author_last_name": "Williams"}}
{"update": {"_id": "3"}}
{"doc": {"author_first_name": "Jack", "author_last_name": "Ma"}}
{"update": {"_id": "4"}}
{"doc": {"author_first_name": "Robbin", "author_last_name": "Li"}}
{"update": {"_id": "5"}}
{"doc": {"author_first_name": "Tonny", "author_last_name": "Peter Smith"}}
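The _bulk endpoint expects newline-delimited JSON (NDJSON): each action line is followed by its document line, and the body must end with a newline. Below is a minimal Python sketch that builds the same body using only the standard library. The AUTHORS mapping and the build_bulk_body name are illustrative. Sending the request (with Content-Type: application/x-ndjson) is left to whichever HTTP client you use.

```python
import json

# The five (first name, last name) pairs from the bulk request above, keyed by _id.
AUTHORS = {
    "1": ("Peter", "Smith"),
    "2": ("Smith", "Williams"),
    "3": ("Jack", "Ma"),
    "4": ("Robbin", "Li"),
    "5": ("Tonny", "Peter Smith"),
}


def build_bulk_body(authors):
    """Return an NDJSON _bulk body: one action line plus one partial-doc line per
    update, each on its own line, with the trailing newline the bulk API requires."""
    lines = []
    for doc_id, (first, last) in authors.items():
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": {"author_first_name": first,
                                         "author_last_name": last}}))
    return "\n".join(lines) + "\n"


body = build_bulk_body(AUTHORS)
print(body, end="")
```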

Run the query

GET /forum/article/_search
{
  "query": {
    "multi_match": {
      "query": "Peter Smith",
      "type": "cross_fields",
      "fields": ["author_first_name", "author_last_name"]
    }
  }
}

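For comparison, here is the same search with most_fields. The two types are not equivalent: most_fields is field-centric and combines the scores of a separate match query per field, while cross_fields is term-centric and treats the listed fields as one combined field.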

GET /forum/article/_search
{
  "query": {
    "multi_match": {
      "query": "Peter Smith",
      "type": "most_fields",
      "fields": ["author_first_name", "author_last_name"]
    }
  }
}

Response

{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": 3,
    "max_score": 2.3258216,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 2.3258216,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "-01-01",
          "tag": ["java", "hadoop"],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 1.7770995,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "-05-01",
          "tag": ["elasticsearch"],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.5389965,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "-01-02",
          "tag": ["java"],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams"
        }
      }
    ]
  }
}

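In version 5.x you may see the following: for the query Peter Smith, a document whose author_first_name matches Smith gets a very high score. Why?

Because its IDF is high. For a term's IDF to be high, the matched term (Smith) must occur in few documents, and in the author_first_name field Smith occurs only once.

For the actual Peter Smith (doc 1), Smith is in author_last_name, but Smith occurs in author_last_name twice (docs 1 and 5), so doc 1 gets a lower IDF for that match.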
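The article reasons in TF/IDF terms. Elasticsearch 5.0 and later default to BM25, whose idf component behaves the same way: the fewer documents contain a term in a field, the higher its idf. Below is a small Python sketch of the per-field idf for smith. It assumes these five docs are the only ones in the shard and that names are lower-cased and split on whitespace (an approximation of the standard analyzer). It applies Lucene's BM25 idf formula and ignores term frequency and length normalization.

```python
import math

# Assumption: these five docs are the whole shard, and every doc has both fields.
# Each entry is (author_first_name, author_last_name).
NAMES = [
    ("Peter", "Smith"),
    ("Smith", "Williams"),
    ("Jack", "Ma"),
    ("Robbin", "Li"),
    ("Tonny", "Peter Smith"),
]


def doc_freq(field_index, term):
    """Number of docs whose field (0 = first name, 1 = last name) contains term."""
    return sum(1 for doc in NAMES if term in doc[field_index].lower().split())


def bm25_idf(n, total):
    """Lucene BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (total - n + 0.5) / (n + 0.5))


N = len(NAMES)
for idx, field in enumerate(["author_first_name", "author_last_name"]):
    n = doc_freq(idx, "smith")
    print(f"{field}: df={n}, idf={bm25_idf(n, N):.3f}")
# author_first_name: df=1, idf=1.386
# author_last_name: df=2, idf=0.875
```

Under these assumptions, a Smith match in author_first_name is worth more than a Smith match in author_last_name, which is the effect described above.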

Drawbacks of cross-fields search with most_fields

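Problem 1: most_fields favors documents in which as many fields as possible match, not documents in which a single field matches the query completely.

Problem 2: most_fields cannot use minimum_should_match to trim the long tail (results that match very few terms), because the parameter is applied to each field separately rather than to the query as a whole.

Problem 3: TF/IDF. Take Peter Smith and Smith Williams. When you search for Peter Smith, Smith rarely appears in first_name, so that term has a low frequency across the documents' first_name values and earns a high score. As a result, Smith Williams may rank ahead of Peter Smith.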
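According to the multi_match reference, best_fields and most_fields are field-centric. They build one match query per field, so operator and minimum_should_match apply to each field individually. cross_fields is term-centric and only requires each term to appear in at least one of the fields. Below is a toy Python model of that difference with operator set to and. Matching here is plain token presence with no scoring, and the function names are hypothetical.

```python
# Toy model (assumption): lower-case + whitespace tokenization, no scoring.
DOCS = {
    "1": {"author_first_name": "Peter", "author_last_name": "Smith"},
    "2": {"author_first_name": "Smith", "author_last_name": "Williams"},
    "3": {"author_first_name": "Jack", "author_last_name": "Ma"},
    "4": {"author_first_name": "Robbin", "author_last_name": "Li"},
    "5": {"author_first_name": "Tonny", "author_last_name": "Peter Smith"},
}
FIELDS = ["author_first_name", "author_last_name"]


def tokens(text):
    return set(text.lower().split())


def field_centric_and(doc, query):
    """most_fields with operator=and: some single field must contain every term."""
    terms = tokens(query)
    return any(terms <= tokens(doc[f]) for f in FIELDS)


def term_centric_and(doc, query):
    """cross_fields with operator=and: every term must appear in at least one field."""
    terms = tokens(query)
    return all(any(t in tokens(doc[f]) for f in FIELDS) for t in terms)


query = "Peter Smith"
print(sorted(i for i, d in DOCS.items() if field_centric_and(d, query)))  # ['5']
print(sorted(i for i, d in DOCS.items() if term_centric_and(d, query)))   # ['1', '5']
```

Doc 1, whose name is split across first and last name, only matches in the term-centric model. Requiring all terms per field is exactly what cannot be done across fields with most_fields.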

