Elasticsearch學習系列四(聚合搜尋)

語言: CN / TW / HK

聚合分析

聚合分析是資料庫中重要的功能特性,完成對一個查詢的集中資料的聚合計算。如:最大值、最小值、求和、平均值等等。對一個數據集求和,算最大最小值等等,在ES中稱為指標聚合,而對資料做類似關係型資料庫那樣的分組(group by),在ES中稱為分桶。

語法:

aggregations" : {
  "<aggregation_name>" : { <!--聚合的名字 -->
    "<aggregation_type>" : { <!--聚合的型別 -->
       <aggregation_body> <!--聚合體:對哪些欄位進行聚合 -->
    }
    [,"meta" : { [<meta_data_body>] } ]? <!--元 -->
    [,"aggregations" : { [<sub_aggregation>]+ } ]? <!--在聚合裡面在定義子聚合 -->

 }
 [,"<aggregation_name_2>" : { ... } ]*<!--聚合的名字 -->
}

aggregations可以簡寫為aggs。

指標聚合

示例1:查詢所有商品裡最貴的價格

size就填0就行。

POST /item/_search
{
  "size":0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}

示例2:文件計數

POST /item/_count
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 5000
      }
    }
  }
}

示例3:統計某欄位有值的文件數

POST /item/_search?size=0
{
  "aggs": {
    "price_count": {
      "value_count": {
        "field": "price"
      }
    }
  }
}

示例4:用cardinality值去重計數

如果有price重複的,就只會統計去重後的數量

POST /item/_search?size=0
{
  "aggs":{
    "price_count":{
      "cardinality": {
        "field": "price"
      }
    }
  }
}

示例5:stats統計count、max、min、avg、sum5個值

POST /item/_search?size=0
{
  "aggs":{
    "price_stats":{
      "stats": {
        "field": "price"
      }
    }
  }
}

結果如下:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_stats" : {
      "count" : 5,
      "min" : 2333.0,
      "max" : 6888.0,
      "avg" : 4059.2,
      "sum" : 20296.0
    }
  }
}

示例6:extended stats,stats的增強版,增加了平方和、方差、標準差、平均值加/減兩個標準差的區間。

POST /item/_search?size=0
{
  "aggs":{
    "price_stats":{
      "extended_stats": {
        "field": "price"
      }
    }
  }
}

查詢結果:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_stats" : {
      "count" : 5,
      "min" : 2333.0,
      "max" : 6888.0,
      "avg" : 4059.2,
      "sum" : 20296.0,
      "sum_of_squares" : 9.9816722E7,
      "variance" : 3486239.7599999993,
      "std_deviation" : 1867.1474928349928,
      "std_deviation_bounds" : {
        "upper" : 7793.494985669986,
        "lower" : 324.9050143300142
      }
    }
  }
}

示例7:Percentiles 佔比百分位對應的值統計

POST /item/_search?size=0
{
  "aggs":{
    "price_percents":{
      "percentiles": {
        "field": "price"
        
      }
    }
  }
}

#指定分位值
POST /item/_search?size=0
{
  "aggs":{
    "price_percents":{
      "percentiles": {
        "field": "price",
        "percents": [
          1,
          5,
          25,
          50,
          75,
          95,
          99
        ]
      }
    }
  }
}

查詢結果:

......
  "aggregations" : {
    "price_percents" : {
      "values" : {
        "1.0" : 2333.0000000000005,
        "5.0" : 2333.0,
        "25.0" : 2599.25,
        "50.0" : 2688.0,
        "75.0" : 5996.25,
        "95.0" : 6888.0,
        "99.0" : 6888.0
      }
    }
  }
}

Percentiles rank 統計值小於等於指定值的文件佔比

price小於3000和5000的佔比

POST /item/_search?size=0
{
  "aggs":{
    "price_percents":{
      "percentile_ranks": {
        "field": "price"
        , "values": [3000,5000]
      }
    }
  }
}

桶聚合

他執行的是對文件分組的操作,把滿足相關特性的文件分到一個桶裡,即桶分。輸出結果往往是一個個包含多個文件的桶。

示例1:分組求平均值

POST /item/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 50,
            "to": 100
          },
          {
            "from": 2000,
            "to": 3000
          },
          {
            "from": 3000,
            "to": 5000
          }
        ]
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

查詢結果:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_price" : {
      "buckets" : [
        {
          "key" : "50.0-100.0",
          "from" : 50.0,
          "to" : 100.0,
          "doc_count" : 0,
          "average_price" : {
            "value" : null
          }
        },
        {
          "key" : "2000.0-3000.0",
          "from" : 2000.0,
          "to" : 3000.0,
          "doc_count" : 3,
          "average_price" : {
            "value" : 2569.6666666666665
          }
        },
        {
          "key" : "3000.0-7000.0",
          "from" : 3000.0,
          "to" : 7000.0,
          "doc_count" : 2,
          "average_price" : {
            "value" : 6293.5
          }
        }
      ]
    }
  }
}

示例2:分組的文件個數統計

POST /item/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 50,
            "to": 100
          },
          {
            "from": 2000,
            "to": 3000
          },
          {
            "from": 3000,
            "to": 7000
          }
        ]
      },
      "aggs": {
        "average_price": {
          "value_count": {
            "field": "price"
          }
        }
      }
    }
  }
}

示例3:使用having語法

POST /item/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 50,
            "to": 100
          },
          {
            "from": 2000,
            "to": 3000
          },
          {
            "from": 3000,
            "to": 7000
          }
        ]
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        },
        "having":{
          "bucket_selector": {
            "buckets_path": {
              "avg_price":"average_price"
            },
            "script": {
              "source": "params.avg_price >=2600"
            }
          }
        }
      }
  
    }
  }
}