100字范文 > 白话Elasticsearch20-深度探秘搜索技术之使用rescoring机制优化近似匹配搜索的性能

白话Elasticsearch20-深度探秘搜索技术之使用rescoring机制优化近似匹配搜索的性能

时间：2021-02-18 01:08:24

文章目录

概述官网match和phrase match(proximity match)区别优化proximity match的性能

概述

继续跟中华石杉老师学习ES，第19篇

课程地址： /view/55

官网

白话Elasticsearch17-match_phrase query 短语匹配搜索

白话Elasticsearch18-基于slop参数实现近似匹配以及原理剖析

白话Elasticsearch19-混合使用match和近似匹配实现召回率（recall）与精准度（precision）的平衡

上面3篇博客我们学习了短语匹配和近似匹配，当近视匹配出现性能问题时，该如何优化呢？

官网说明： https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html

match和phrase match(proximity match)区别

简单来说

match ：只要简单的匹配到了一个term，就可以理解将term对应的doc作为结果返回，扫描倒排索引，扫描到了就ok

phrase match : 首先扫描到所有term的doc list; 找到包含所有term的doc list; 然后对每个doc都计算每个term的position，是否符合指定的范围; slop，需要进行复杂的运算，来判断能否通过slop移动，匹配一个doc

一般来讲，

match query的性能比phrase match和proximity match（有slop）要高很多。因为后两者都要计算position的距离。match query比phrase match的性能要高10倍，比proximity match的性能要高20倍。

但是别太担心，因为es的性能一般都在毫秒级别，match query一般就在几毫秒，或者几十毫秒，而phrase match和proximity match的性能在几十毫秒到几百毫秒之间，所以也是可以接受的。

优化proximity match的性能

优化proximity match的性能，一般就是减少要进行proximity match搜索的document数量。

主要思路就是，用match query先过滤出需要的数据，然后再用proximity match来根据term距离提高doc的分数，同时proximity match只针对每个shard的分数排名前n个doc起作用，来重新调整它们的分数，这个过程称之为rescoring，重计分。因为一般用户会分页查询，只会看到前几页的数据，所以不需要对所有结果进行proximity match操作。

那就是： match + proximity match同时实现召回率和精准度

白话Elasticsearch19-混合使用match和近似匹配实现召回率（recall）与精准度（precision）的平衡

默认情况下，match也许匹配了1000个doc，proximity match全都需要对每个doc进行一遍运算，判断能否slop移动匹配上，然后去贡献自己的分数。

但是很多情况下，match出来也许1000个doc，其实用户大部分情况下是分页查询的，所以可能最多只会看前几页，比如一页是10条，最多也许就看5页，就是50条

proximity match只要对前50个doc进行slop移动去匹配，去贡献自己的分数即可，不需要对全部1000个doc都去进行计算和贡献分数

rescore：重打分

match：1000个doc，其实这时候每个doc都有一个分数了; proximity match，前50个doc，进行rescore，重打分，即可; 让前50个doc，term举例越近的，排在越前面

DSL如下：

GET /forum/article/_search{"query": {"match": {"content": "java spark"}},"rescore": {"window_size": 50,"query": {"rescore_query": {"match_phrase": {"content": {"query": "java spark","slop": 10}}}}}}

返回结果

{"took": 3,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": 2,"max_score": 3.2468495,"hits": [{"_index": "forum","_type": "article","_id": "5","_score": 3.2468495,"_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "-05-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny"}},{"_index": "forum","_type": "article","_id": "2","_score": 0.7721133,"_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog","content": "i think java is the best programming language","sub_title": "learned a lot of course","author_first_name": "Smith","author_last_name": "Williams","new_author_last_name": "Williams","new_author_first_name": "Smith"}}]}}

本内容不代表本网观点和政治立场，如有侵犯你的权益请联系我们处理。

网友评论

网友评论仅供其表达个人看法，并不表明网站立场。