An Analysis of the Main Families of ES Aggregations | Annabelle's Blog

An Analysis of the Main Aggregation Families

The aggregations framework provides aggregated data based on a search query. It is built from simple building blocks called aggregations, which can be composed to build complex summaries of the data.

An aggregation can be seen as a unit of work that builds analytic information over a set of documents. The execution context defines what this document set is (e.g. a top-level aggregation executes within the context of the query and filters of the search request).
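To make the idea of an execution context concrete, here is a minimal sketch in plain Python. It does not talk to Elasticsearch; the documents, the field names (state, price) and the helper run_search are all made up for illustration. The query narrows the document set first, and the top-level aggregation only ever sees what the query matched. The dictionary at the end is a rough request-body counterpart, assuming state is mapped as a keyword field.

```python
# Hypothetical in-memory "index"; field names are invented for this sketch.
docs = [
    {"city": "Albany", "state": "New York", "price": 120},
    {"city": "Cincinnati", "state": "Ohio", "price": 80},
    {"city": "Columbus", "state": "Ohio", "price": 200},
]


def run_search(docs, query, aggregate):
    """Apply the query first, then run the top-level aggregation on the hits only."""
    context = [d for d in docs if query(d)]  # the execution context
    return aggregate(context)


# The aggregation sums prices, but only over documents matching the query.
ohio_total = run_search(
    docs,
    query=lambda d: d["state"] == "Ohio",
    aggregate=lambda ctx: sum(d["price"] for d in ctx),
)

# Rough Elasticsearch request-body counterpart (assumed mapping: "state" is a keyword).
search_body = {
    "size": 0,  # return only aggregation results, no hits
    "query": {"term": {"state": "Ohio"}},
    "aggs": {"total_price": {"sum": {"field": "price"}}},
}

print(ohio_total)
```

Changing only the query changes the result of the aggregation, even though the aggregation itself is untouched.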

There are many different types of aggregations, each with its own purpose and output. To understand them better, it helps to break them into four main families:

Bucketing

The bucket analysis type: documents are grouped into buckets.

A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, every bucket's criterion is evaluated on every document in the context, and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the end of the aggregation process, we end up with a list of buckets, each with a set of documents that "belong" to it.

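Explanation

A bucket is simply a collection of documents that meet a particular criterion:

An employee falls into either the male bucket or the female bucket. The city of Albany falls into the New York state bucket. The date 2014-10-28 falls into the October bucket. As the aggregation executes, the values inside each document are evaluated to decide whether they match a bucket's criterion. If they match, the document is placed in that bucket and the aggregation continues.

Buckets can also be nested inside other buckets, which lets you express hierarchical or conditional partitioning. For example, Cincinnati can be placed in the Ohio state bucket, and the whole Ohio bucket can in turn be placed in the USA bucket.

ES has many kinds of buckets, which let you partition documents in many ways (by hour, by most popular terms, by age range, by geographic location, and more). Fundamentally, though, they all work on the same principle: partitioning documents according to a criterion.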
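The bucketing principle described above can be sketched in a few lines of plain Python. This is only an illustration, not how Elasticsearch implements it: the employee records, the field names and the helper bucket_by are assumptions made for the example. The last dictionary shows roughly what a nested terms aggregation request could look like, assuming state and city are keyword fields.

```python
from collections import defaultdict

# Hypothetical documents; names and fields are invented for this sketch.
employees = [
    {"name": "Ann", "gender": "female", "state": "Ohio", "city": "Cincinnati"},
    {"name": "Bob", "gender": "male", "state": "Ohio", "city": "Columbus"},
    {"name": "Cid", "gender": "male", "state": "New York", "city": "Albany"},
    {"name": "Dee", "gender": "female", "state": "Ohio", "city": "Cincinnati"},
]


def bucket_by(docs, key_fn):
    """Place every document into the bucket whose key it matches."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[key_fn(doc)].append(doc)
    return dict(buckets)


# One level of buckets: each employee lands in exactly one gender bucket.
by_gender = bucket_by(employees, lambda d: d["gender"])

# Nested buckets: cities inside states.
by_state_then_city = {
    state: bucket_by(members, lambda d: d["city"])
    for state, members in bucket_by(employees, lambda d: d["state"]).items()
}

# Rough request-body counterpart using nested terms aggregations
# (assumed mapping: "state" and "city" are keyword fields).
nested_terms_body = {
    "size": 0,
    "aggs": {
        "by_state": {
            "terms": {"field": "state"},
            "aggs": {"by_city": {"terms": {"field": "city"}}},
        }
    },
}

print({key: len(members) for key, members in by_gender.items()})
```

Each bucket here is just a key plus the list of documents that matched it, which is all the later metric step needs.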

Metric

The metric analysis type, e.g. computing the maximum, minimum, average, and so on.

Aggregations that keep track of and compute metrics over a set of documents.

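Explanation

Buckets let us partition documents in meaningful ways, but in the end we still need to compute some kind of metric over the documents in each bucket. Bucketing is a means to an end: it provides a way to group documents so that you can calculate the metrics you need.

Most metrics are simple mathematical operations (for example min, mean, max, and sum) computed from the values in the documents. In practice, metrics let you calculate things such as the average salary, the maximum sale price, or the 95th percentile of query latency.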
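As a small sketch of how metrics sit on top of buckets, the following plain-Python example groups some invented employee records by gender and then computes the average and maximum salary for each bucket. The data, the field names and the variable names are assumptions for illustration only. The final dictionary is a rough request-body counterpart that nests avg and max metric aggregations under a terms bucket, assuming gender is a keyword field.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical documents; values are invented for this sketch.
employees = [
    {"gender": "male", "salary": 3000},
    {"gender": "male", "salary": 5000},
    {"gender": "female", "salary": 4500},
    {"gender": "female", "salary": 6500},
]

# Step 1: bucketing, which only decides which documents belong together.
buckets = defaultdict(list)
for doc in employees:
    buckets[doc["gender"]].append(doc["salary"])

# Step 2: metrics, simple arithmetic over the values in each bucket.
salary_stats = {
    key: {"avg": mean(values), "max": max(values)}
    for key, values in buckets.items()
}

# Rough request-body counterpart: metric aggregations nested in a terms bucket
# (assumed mapping: "gender" is a keyword field, "salary" is numeric).
metrics_body = {
    "size": 0,
    "aggs": {
        "by_gender": {
            "terms": {"field": "gender"},
            "aggs": {
                "avg_salary": {"avg": {"field": "salary"}},
                "max_salary": {"max": {"field": "salary"}},
            },
        }
    },
}

print(salary_stats)
```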

Matrix

A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.

Pipeline

The pipeline analysis type, which runs a further analysis on the results of an upstream aggregation. For example, if a metric aggregation has computed the average order amount for each day, a pipeline aggregation can then compute the maximum of those daily averages.

Aggregations that aggregate the output of other aggregations and their associated metrics.
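A pipeline aggregation consumes the output of another aggregation rather than the documents themselves. The sketch below mimics this in plain Python with invented order data: orders are first bucketed by day, an average is computed per day, and a final step takes the maximum over those averages. All names and values are assumptions for illustration. The request body at the end is a rough counterpart using a date_histogram with a nested avg and a sibling max_bucket aggregation; it assumes a recent Elasticsearch version where the histogram takes calendar_interval (older releases used interval).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical orders; dates and amounts are invented for this sketch.
orders = [
    {"order_date": "2014-10-27", "amount": 100},
    {"order_date": "2014-10-27", "amount": 300},
    {"order_date": "2014-10-28", "amount": 50},
    {"order_date": "2014-10-28", "amount": 150},
    {"order_date": "2014-10-29", "amount": 500},
]

# Bucket step: one bucket per day.
per_day = defaultdict(list)
for order in orders:
    per_day[order["order_date"]].append(order["amount"])

# Metric step: average order amount inside each daily bucket.
daily_avg = {day: mean(amounts) for day, amounts in per_day.items()}

# Pipeline step: works on the metric results, not on the original documents.
best_day = max(daily_avg, key=daily_avg.get)
max_daily_avg = daily_avg[best_day]

# Rough request-body counterpart (assumed field names "order_date" and "amount").
pipeline_body = {
    "size": 0,
    "aggs": {
        "orders_per_day": {
            "date_histogram": {"field": "order_date", "calendar_interval": "day"},
            "aggs": {"avg_amount": {"avg": {"field": "amount"}}},
        },
        # Sibling pipeline aggregation: reads the avg_amount of every daily bucket.
        "max_daily_avg": {
            "max_bucket": {"buckets_path": "orders_per_day>avg_amount"}
        },
    },
}

print(best_day, max_daily_avg)
```

The buckets_path value points from the pipeline aggregation to the metric it reads, which is why max_daily_avg sits next to orders_per_day rather than inside it.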