Let's Learn Elasticsearch Together (Part 10)

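Preface

I'm currently writing a tutorial series on ES. It will run to quite a few installments, so if you like it, please follow ❤️

Picking up from the previous article: there was still a little left to cover on ES aggregations, and this installment finishes the topic.

This article is fairly hands-on. To keep the demos simple, the examples reuse the index from the previous installment. Enough preamble, let's get started.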

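Sorting aggregations

How do we sort aggregation results by a field of our choosing?

Default sorting

As covered earlier, a terms aggregation sorts its buckets by doc_count in descending order by default. In the order clause, that document count is referred to as _count. Let's look at an example: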

```json
GET req_log/_search
{
  "aggs": {
    "req_count": {
      "terms": {
        "field": "path"
      }
    }
  }
}
```

Response:

```json
..... (omitted)
"aggregations" : {
  "req_count" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      { "key" : "/api/post/3", "doc_count" : 3 },
      { "key" : "/api/post/6", "doc_count" : 3 },
      { "key" : "/api/post/1", "doc_count" : 2 },
      { "key" : "/api/post/2", "doc_count" : 2 },
      { "key" : "/api/post/4", "doc_count" : 2 },
      { "key" : "/api/post/10", "doc_count" : 1 },
      { "key" : "/api/post/12", "doc_count" : 1 },
      { "key" : "/api/post/20", "doc_count" : 1 },
      { "key" : "/api/post/7", "doc_count" : 1 },
      { "key" : "/api/post/8", "doc_count" : 1 }
    ]
  }
}
```

As the response shows, buckets are sorted by _count in descending order by default. What if we want ascending order?

```json
GET req_log/_search
{
  "aggs": {
    "req_count": {
      "terms": {
        "field": "path",
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
```

Here is the result:

```json
.... (omitted)
"aggregations" : {
  "req_count" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      { "key" : "/api/post/10", "doc_count" : 1 },
      { "key" : "/api/post/12", "doc_count" : 1 },
      { "key" : "/api/post/20", "doc_count" : 1 },
      { "key" : "/api/post/7", "doc_count" : 1 },
      { "key" : "/api/post/8", "doc_count" : 1 },
      { "key" : "/api/post/1", "doc_count" : 2 },
      { "key" : "/api/post/2", "doc_count" : 2 },
      { "key" : "/api/post/4", "doc_count" : 2 },
      { "key" : "/api/post/3", "doc_count" : 3 },
      { "key" : "/api/post/6", "doc_count" : 3 }
    ]
  }
}
```

The buckets are now sorted by _count in ascending order. We can also sort by _key. Here is an example:

```json
GET req_log/_search
{
  "aggs": {
    "req_count": {
      "terms": {
        "field": "times",
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}
```

Result:

```json
....
"aggregations" : {
  "req_count" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 6,
    "buckets" : [
      { "key" : 20, "doc_count" : 1 },
      { "key" : 30, "doc_count" : 1 },
      { "key" : 80, "doc_count" : 2 },
      { "key" : 89, "doc_count" : 1 },
      { "key" : 120, "doc_count" : 1 },
      { "key" : 150, "doc_count" : 1 },
      { "key" : 210, "doc_count" : 1 },
      { "key" : 270, "doc_count" : 1 },
      { "key" : 380, "doc_count" : 1 },
      { "key" : 400, "doc_count" : 1 }
    ]
  }
}
```

This time the terms aggregation runs on the times field, and the buckets are sorted by _key (the term value itself) in ascending order. Since terms returns only 10 buckets by default, the 6 documents that fall outside them are reported in sum_other_doc_count.
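To make the ordering rules concrete, here is a minimal Python sketch that mimics how a terms aggregation arranges its buckets. It is an in-memory illustration only, not ES client code: the function name terms_buckets and the sample paths are assumptions made for this example, and ties are broken by ascending key, which matches what the responses above show.

```python
from collections import Counter


def terms_buckets(values, order_by="_count", direction="desc", size=10):
    """Mimic the bucket ordering of a terms aggregation (illustration only)."""
    counts = Counter(values)
    buckets = [{"key": k, "doc_count": c} for k, c in counts.items()]
    # Sort by key first; the stable sorts below then break ties by ascending key.
    buckets.sort(key=lambda b: b["key"])
    if order_by == "_count":
        buckets.sort(key=lambda b: b["doc_count"], reverse=(direction == "desc"))
    elif order_by == "_key":
        buckets.sort(key=lambda b: b["key"], reverse=(direction == "desc"))
    else:
        raise ValueError(f"unsupported order: {order_by}")
    return buckets[:size]


# Hypothetical sample data, not the real contents of req_log.
paths = [
    "/api/post/3", "/api/post/1", "/api/post/3", "/api/post/6",
    "/api/post/1", "/api/post/3", "/api/post/6",
]

by_count_desc = terms_buckets(paths)  # the default ordering
by_count_asc = terms_buckets(paths, order_by="_count", direction="asc")
by_key_asc = terms_buckets(paths, order_by="_key", direction="asc")

for name, result in [("_count desc", by_count_desc),
                     ("_count asc", by_count_asc),
                     ("_key asc", by_key_asc)]:
    print(name, [(b["key"], b["doc_count"]) for b in result])
```

The size parameter defaults to 10 here to mirror the number of buckets that terms returns by default.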

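Custom sorting at the same level

So how do we sort by something custom? We will again work through it level by level. First, let's see how to sort by a metric that sits directly under the aggregation being sorted. Here is an example:

Suppose we have this requirement: across all request logs, find the APIs with the highest total request time. How do we do it?

It is actually simple. We define the metric we want to sort by as a sub-aggregation, then reference that sub-aggregation's name in the order clause of the parent terms aggregation. Here is how it looks in practice: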

```json
GET req_log/_search
{
  "aggs": {
    "req_total": {
      "terms": {
        "field": "path",
        "order": {
          "total_times": "desc"
        }
      },
      "aggs": {
        "total_times": {
          "sum": {
            "field": "times"
          }
        }
      }
    }
  }
}
```

Result:

```json
....
"aggregations" : {
  "req_total" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      { "key" : "/api/post/8", "doc_count" : 1, "total_times" : { "value" : 9000.0 } },
      { "key" : "/api/post/6", "doc_count" : 3, "total_times" : { "value" : 3160.0 } },
      { "key" : "/api/post/7", "doc_count" : 1, "total_times" : { "value" : 870.0 } },
      { "key" : "/api/post/12", "doc_count" : 1, "total_times" : { "value" : 630.0 } },
      { "key" : "/api/post/4", "doc_count" : 2, "total_times" : { "value" : 610.0 } },
      { "key" : "/api/post/2", "doc_count" : 2, "total_times" : { "value" : 410.0 } },
      { "key" : "/api/post/10", "doc_count" : 1, "total_times" : { "value" : 270.0 } },
      { "key" : "/api/post/1", "doc_count" : 2, "total_times" : { "value" : 230.0 } },
      { "key" : "/api/post/3", "doc_count" : 3, "total_times" : { "value" : 189.0 } },
      { "key" : "/api/post/20", "doc_count" : 1, "total_times" : { "value" : 120.0 } }
    ]
  }
}
```
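If it helps to see what the query computes, the following Python sketch groups documents by path, sums times per group, and sorts the groups by that sum. It is a rough in-memory equivalent under stated assumptions: the helper terms_ordered_by_sum and the four sample documents are hypothetical and only loosely resemble the req_log data.

```python
from collections import defaultdict


def terms_ordered_by_sum(docs, group_field, sum_field, direction="desc"):
    """Group docs by group_field, sum sum_field per group, order by that sum."""
    totals = defaultdict(float)
    doc_counts = defaultdict(int)
    for doc in docs:
        totals[doc[group_field]] += doc[sum_field]
        doc_counts[doc[group_field]] += 1
    buckets = [
        {"key": key, "doc_count": doc_counts[key], "total_times": {"value": totals[key]}}
        for key in totals
    ]
    # Plays the role of "order": { "total_times": "desc" } in the query.
    buckets.sort(key=lambda b: b["total_times"]["value"], reverse=(direction == "desc"))
    return buckets


# Hypothetical sample documents (an assumption, not the actual index contents).
docs = [
    {"path": "/api/post/1", "times": 80},
    {"path": "/api/post/8", "times": 9000},
    {"path": "/api/post/1", "times": 150},
    {"path": "/api/post/20", "times": 120},
]

buckets = terms_ordered_by_sum(docs, "path", "times")
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["total_times"]["value"])
```

The key point is that the sort key (total_times) is computed per bucket first and only then used to order the buckets, which is why it has to exist as a sub-aggregation.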

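Custom sorting at deeper levels

Now let's raise the difficulty. Suppose we have this requirement:

For each day, consider only GET requests, sort by request time in descending order, and find the APIs with the highest request time that day.

The requirement is short, but it packs in several key points:

  • Results must be broken down by day
  • Only GET requests count
  • Results are sorted by request time in descending order

How do we do this? Let's take a look. First, we add some data to make the example easier to follow: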

```json
POST req_log/_bulk
{ "index": {}}
{ "times" : 180, "method" : "GET", "path" : "/api/post/1", "created" : "2023-02-09" }
{ "index": {}}
{ "times" : 120, "method" : "GET", "path" : "/api/post/3", "created" : "2023-02-09" }
{ "index": {}}
{ "times" : 140, "method" : "GET", "path" : "/api/post/2", "created" : "2023-02-09" }
{ "index": {}}
{ "times" : 130, "method" : "GET", "path" : "/api/post/20", "created" : "2023-02-09" }
{ "index": {}}
{ "times" : 60, "method" : "GET", "path" : "/api/post/9", "created" : "2023-02-09" }
```

Now let's produce the statistics the requirement asks for:

```json
GET req_log/_search
{
  "aggs": {
    "date": {
      "date_histogram": {
        "field": "created",
        "calendar_interval": "1d",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "req_path": {
          "terms": {
            "field": "path",
            "order": {
              "req_method>total_times": "desc"
            }
          },
          "aggs": {
            "req_method": {
              "filter": {
                "terms": {
                  "method": [ "GET" ]
                }
              },
              "aggs": {
                "total_times": {
                  "sum": {
                    "field": "times"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Response:

```json
.... (omitted)
"aggregations" : {
  "date" : {
    "buckets" : [
      { "key_as_string" : "2023-02-01", "key" : 1675209600000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/6", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 1300.0 } } }
        ] } },
      { "key_as_string" : "2023-02-02", "key" : 1675296000000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/8", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 9000.0 } } }
        ] } },
      { "key_as_string" : "2023-02-03", "key" : 1675382400000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/6", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 960.0 } } }
        ] } },
      { "key_as_string" : "2023-02-04", "key" : 1675468800000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/3", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 80.0 } } }
        ] } },
      { "key_as_string" : "2023-02-05", "key" : 1675555200000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/1", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 150.0 } } }
        ] } },
      { "key_as_string" : "2023-02-06", "key" : 1675641600000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/20", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 120.0 } } }
        ] } },
      { "key_as_string" : "2023-02-07", "key" : 1675728000000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/2", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 30.0 } } }
        ] } },
      { "key_as_string" : "2023-02-08", "key" : 1675814400000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/3", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 20.0 } } }
        ] } },
      { "key_as_string" : "2023-02-09", "key" : 1675900800000, "doc_count" : 6,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/1", "doc_count" : 2, "req_method" : { "doc_count" : 2, "total_times" : { "value" : 260.0 } } },
          { "key" : "/api/post/2", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 140.0 } } },
          { "key" : "/api/post/20", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 130.0 } } },
          { "key" : "/api/post/3", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 120.0 } } },
          { "key" : "/api/post/9", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 60.0 } } }
        ] } },
      { "key_as_string" : "2023-02-10", "key" : 1675987200000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/4", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 400.0 } } }
        ] } },
      { "key_as_string" : "2023-02-11", "key" : 1676073600000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/3", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 89.0 } } }
        ] } },
      { "key_as_string" : "2023-02-12", "key" : 1676160000000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/2", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 380.0 } } }
        ] } },
      { "key_as_string" : "2023-02-13", "key" : 1676246400000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/10", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 270.0 } } }
        ] } },
      { "key_as_string" : "2023-02-14", "key" : 1676332800000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/12", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 630.0 } } }
        ] } },
      { "key_as_string" : "2023-02-15", "key" : 1676419200000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/4", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 210.0 } } }
        ] } },
      { "key_as_string" : "2023-02-16", "key" : 1676505600000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/6", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 900.0 } } }
        ] } },
      { "key_as_string" : "2023-02-17", "key" : 1676592000000, "doc_count" : 1,
        "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [
          { "key" : "/api/post/7", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 870.0 } } }
        ] } }
    ]
  }
}
```

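The response is grouped by day. Every day except the 9th has only one document, which is why we inserted extra data for the 9th earlier. Focus on that day: its buckets are sorted by total GET request time in descending order.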
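For readers who prefer to see the logic spelled out, here is a rough Python equivalent of the three nested steps: bucket by day, bucket by path, then sum times over GET requests only. It is a sketch under stated assumptions rather than anything ES runs internally. The function daily_get_totals is a name made up for this example, and the POST document in the sample data is hypothetical, added only to show the effect of the filter.

```python
from collections import defaultdict


def daily_get_totals(docs):
    """Per day and per path, total the time of GET requests, highest total first."""
    per_day = defaultdict(dict)
    for doc in docs:
        totals = per_day[doc["created"]]      # date_histogram bucket
        totals.setdefault(doc["path"], 0.0)   # terms bucket exists for every path
        if doc["method"] == "GET":            # filter sub-aggregation
            totals[doc["path"]] += doc["times"]
    result = {}
    for day in sorted(per_day):
        # Equivalent of ordering the path buckets by the nested sum, descending.
        result[day] = sorted(per_day[day].items(), key=lambda item: item[1], reverse=True)
    return result


docs = [
    # The five documents from the bulk request above.
    {"times": 180, "method": "GET", "path": "/api/post/1", "created": "2023-02-09"},
    {"times": 120, "method": "GET", "path": "/api/post/3", "created": "2023-02-09"},
    {"times": 140, "method": "GET", "path": "/api/post/2", "created": "2023-02-09"},
    {"times": 130, "method": "GET", "path": "/api/post/20", "created": "2023-02-09"},
    {"times": 60, "method": "GET", "path": "/api/post/9", "created": "2023-02-09"},
    # Hypothetical POST request: it lands in the path bucket but not in the sum.
    {"times": 500, "method": "POST", "path": "/api/post/9", "created": "2023-02-09"},
    # The single GET request reported for 2023-02-10 in the response above.
    {"times": 400, "method": "GET", "path": "/api/post/4", "created": "2023-02-10"},
]

totals_by_day = daily_get_totals(docs)
for day, ranked in totals_by_day.items():
    print(day, ranked)
```

Note that the sample leaves out the older /api/post/1 document, so its total here is 180 rather than the 260 shown in the real response.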

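The query looks complicated, but it is simple once you break it down: it is just the nested aggregation from the previous installment with a sort added. If that is still unfamiliar, I recommend reviewing the previous installment. A few keywords are worth explaining:

  • date_histogram: a date histogram, which lets us bucket documents by a unit of time. calendar_interval is that unit and supports minute, hour, day, week, month, quarter, and year

```json
"date_histogram": {
  "field": "created",
  "calendar_interval": "1d",
  "format": "yyyy-MM-dd"
},
```

  • req_method>total_times: you can think of the > here as similar to the > in CSS selectors. It chains aggregation names into a path, so the terms aggregation can sort by a metric (total_times) nested inside another sub-aggregation (req_method). Every aggregation along the path must be a single-bucket aggregation, such as the filter used here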
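One way to picture the > path is as a walk down the nested objects of a bucket. The sketch below resolves req_method>total_times against three buckets copied from the 2023-02-09 entry of the response above and sorts them by the resolved value. The helper resolve_order_path is hypothetical and written for this example; it is not an ES API.

```python
def resolve_order_path(bucket, path):
    """Follow an order path such as 'req_method>total_times' inside one bucket."""
    node = bucket
    for name in path.split(">"):
        node = node[name]  # step into the named sub-aggregation
    return node["value"]   # the final step is a metric with a single value


# Three buckets from 2023-02-09, deliberately shuffled before sorting.
buckets = [
    {"key": "/api/post/9", "doc_count": 1,
     "req_method": {"doc_count": 1, "total_times": {"value": 60.0}}},
    {"key": "/api/post/1", "doc_count": 2,
     "req_method": {"doc_count": 2, "total_times": {"value": 260.0}}},
    {"key": "/api/post/3", "doc_count": 1,
     "req_method": {"doc_count": 1, "total_times": {"value": 120.0}}},
]

buckets.sort(key=lambda b: resolve_order_path(b, "req_method>total_times"), reverse=True)
ordered_keys = [b["key"] for b in buckets]
print(ordered_keys)
```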

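Combining with size

Suppose the requirement changes again: there is too much data, and I only want to see what matters. Building on the previous query, return only the two APIs with the highest request time for each day. How do we do that? This is a fairly common requirement.

We can do it by specifying size, which was covered in earlier installments. Let's take a look: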

```json
GET req_log/_search
{
  "aggs": {
    "date": {
      "date_histogram": {
        "field": "created",
        "calendar_interval": "1d",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "req_path": {
          "terms": {
            "field": "path",
            "order": {
              "req_method>total_times": "desc"
            },
            "size": 2
          },
          "aggs": {
            "req_method": {
              "filter": {
                "terms": {
                  "method": [ "GET" ]
                }
              },
              "aggs": {
                "total_times": {
                  "sum": {
                    "field": "times"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Response:

```json
... (omitted)
{ "key_as_string" : "2023-02-09", "key" : 1675900800000, "doc_count" : 6,
  "req_path" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 3, "buckets" : [
    { "key" : "/api/post/1", "doc_count" : 2, "req_method" : { "doc_count" : 2, "total_times" : { "value" : 260.0 } } },
    { "key" : "/api/post/2", "doc_count" : 1, "req_method" : { "doc_count" : 1, "total_times" : { "value" : 140.0 } } }
  ] } }
... (omitted)
```

As you can see, only two buckets are returned for the 9th. Simple, isn't it?
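The response also shows sum_other_doc_count rising from 0 to 3. A small Python sketch makes the relationship visible: size keeps the first buckets after sorting, and the documents in the dropped buckets are added up. The helper top_buckets is an assumption for illustration; the bucket values are the ones reported for 2023-02-09 in the earlier, untruncated response.

```python
def top_buckets(buckets, size):
    """Keep the first `size` buckets and count the documents in the rest."""
    kept = buckets[:size]
    sum_other_doc_count = sum(b["doc_count"] for b in buckets[size:])
    return kept, sum_other_doc_count


# Buckets for 2023-02-09, already sorted by total request time, descending.
ranked = [
    {"key": "/api/post/1", "doc_count": 2, "total_times": 260.0},
    {"key": "/api/post/2", "doc_count": 1, "total_times": 140.0},
    {"key": "/api/post/20", "doc_count": 1, "total_times": 130.0},
    {"key": "/api/post/3", "doc_count": 1, "total_times": 120.0},
    {"key": "/api/post/9", "doc_count": 1, "total_times": 60.0},
]

kept, other = top_buckets(ranked, size=2)
print([b["key"] for b in kept], other)
```

The three documents in the buckets that were cut off account for the sum_other_doc_count of 3 in the response.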

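Deduplication

How do we deduplicate in ES? Let's take a look.

cardinality & distinct counts

ES aggregations use cardinality to count distinct values. The count is approximate and may not be exact, but in exchange it keeps memory usage very low and responses fast.

Here is an example: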

```json
GET req_log/_search
{
  "aggs": {
    "path_num": {
      "cardinality": {
        "field": "path",
        "precision_threshold": 100
      }
    }
  }
}
```

Response:

```json
"aggregations" : {
  "path_num" : {
    "value" : 11
  }
}
```

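The result tells us there are 11 distinct APIs in total.

  • precision_threshold trades memory for accuracy: counts below the threshold are expected to be close to accurate. It accepts values from 0 to 40,000 (larger values are treated as 40,000), and the default is 3,000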
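As a sanity check on what the number means, the exact distinct count can be reproduced in a few lines of Python. This is only a sketch of the quantity that cardinality estimates: ES uses an approximate algorithm rather than an in-memory set, and the helper exact_cardinality is a name made up for this example. The paths are the ones that appear in the responses and the bulk request earlier in this article.

```python
def exact_cardinality(docs, field):
    """Exact distinct count of `field`; ES cardinality approximates this number."""
    return len({doc[field] for doc in docs if field in doc})


# Paths seen earlier in this article, with a couple of repeats added.
paths = [
    "/api/post/3", "/api/post/6", "/api/post/1", "/api/post/2", "/api/post/4",
    "/api/post/10", "/api/post/12", "/api/post/20", "/api/post/7", "/api/post/8",
    "/api/post/9", "/api/post/3", "/api/post/1",
]
docs = [{"path": p} for p in paths]

path_num = exact_cardinality(docs, "path")
print(path_num)
```

With so few distinct values, the approximate result from ES and the exact count agree.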

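percentiles & percentile statistics

Sometimes we need percentile statistics. How do we get them in ES? We can use the percentiles aggregation (named latency_percentiles in the request below). Here is an example: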

```json
GET req_log/_search
{
  "aggs": {
    "latency_percentiles": {
      "percentiles": {
        "field": "times",
        "percents": [ 30, 40, 50, 60, 70, 80, 99 ]
      }
    }
  }
}
```

Result:

```json
"aggregations" : {
  "latency_percentiles" : {
    "values" : {
      "30.0" : 120.0,
      "40.0" : 133.0,
      "50.0" : 165.0,
      "60.0" : 251.99999999999994,
      "70.0" : 398.0,
      "80.0" : 873.0,
      "99.0" : 9000.0
    }
  }
}
```

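This result can be confusing at first glance. The keys under values are the percentages we listed in percents, and each one maps to the request time at that percentile. Roughly, it reads like this:

  • 30% of all requests took 120 or less

The other entries read the same way. So when the boss asks, for example, roughly what share of users across all orders are willing to pay, or what the payment amounts look like among paying users, you now know how to produce those numbers.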
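To see where such numbers come from, here is a small Python sketch that computes a percentile exactly by linear interpolation between neighboring values. It is one common definition, used here as an assumption: ES computes percentiles approximately, so its results on real data can differ from this. The function name percentile and the sample times are hypothetical.

```python
def percentile(values, percent):
    """Percentile by linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (percent / 100) * (len(ordered) - 1)  # fractional position in the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical request times, not the contents of req_log.
times = [400, 20, 120, 30, 80]

p25 = percentile(times, 25)
p50 = percentile(times, 50)
p100 = percentile(times, 100)
print(p25, p50, p100)
```

Read the output the same way as the ES response: half of the sample requests took p50 or less.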

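percentile_ranks & percentile statistics (in reverse)

Why "in reverse"? Suppose we have this requirement: I want to know roughly what percentage of requests took no more than 80, 120, and 600. This is also a very common requirement, much like the boss asking what share of users bought the gold, silver, and platinum VIP tiers.

Back to the requirement: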

```json
GET req_log/_search
{
  "aggs": {
    "load": {
      "percentile_ranks": {
        "field": "times",
        "values": [ 80, 120, 600 ]
      }
    }
  }
}
```

Response:

```json
...
"aggregations" : {
  "load" : {
    "values" : {
      "80.0" : 18.181818181818183,
      "120.0" : 31.818181818181817,
      "600.0" : 74.37137330754351
    }
  }
}
....
```

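Judging from the result, about 18% of requests took 80 or less, and the other entries read the same way. Compared with percentiles, the percentage is now the output rather than the input, which is why I call it the reverse.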
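The reverse direction is even simpler to sketch in Python: count how many values are at or below a threshold and express that as a percentage. As before, this is an exact computation on hypothetical data, whereas ES returns an approximation, so the numbers will not match the response above. The name percentile_rank is an assumption for this example.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to `threshold`."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


# Hypothetical request times, not the contents of req_log.
times = [400, 20, 120, 30, 80]

ranks = {t: percentile_rank(times, t) for t in (80, 120, 600)}
print(ranks)
```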

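Closing words

That's it for this installment. Make sure you learn to generalize from the examples: come up with some common scenarios of your own and work through them using what we have covered so far. There is no need to memorize the queries. Understanding them is enough.

In the next installment we officially move on to integrating ES with the SpringBoot framework.

I try to share everything I know. If this article helped you, a like and a follow would be much appreciated.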
