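In my earlier article "Elasticsearch: Some interesting data types", I introduced a number of interesting data types. In this article I go a step further and cover the advanced data types. Some of them overlap with types covered before; even so, I hope to describe them from a different angle, and I hope you can apply them fluently in your own applications.

Geopoint (geo_point) data type

Most of us have probably used a smart device to find the nearest restaurant, or asked a GPS for directions to grandma's house over Christmas. Elasticsearch provides a dedicated data type, geo_point, for capturing the location of a place.

Location data is expressed with the geo_point data type, which represents a latitude and longitude pair. We can use it to pin down the location of restaurants, schools, golf courses, and so on.

The listing below shows the schema definition for an index named restaurants. It stores restaurants with a name and an address. Note that the address field is declared as a geo_point data type: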

```

# A restaurants index with address declared as geo_point
PUT restaurants
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "geo_point"
      }
    }
  }
}

```

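Now that we have an index, let's index a sample restaurant (the fictional Sticky Fingers in London), with its location supplied as a longitude and latitude (listing below):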

```

# Indexing a restaurant - the location is provided as lon and lat
PUT restaurants/_doc/1
{
  "name": "Sticky Fingers",
  "address": {
    "lon": "0.1278",
    "lat": "51.5074"
  }
}

```

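In the snippet above, the restaurant's address is given as a longitude (lon) and latitude (lat) pair. There are other ways to supply this input, which we cover shortly. The location can also be displayed with Elastic Maps.

Now we can search for restaurants within a surrounding area. To search data involving geographic addresses, we can use the geo_bounding_box query. It takes top_left and bottom_right points as input and builds a boxed area around our point of interest.

We give the query its bounds as lon (longitude) and lat (latitude) pairs (the address points to London). The geo_bounding_box query describes the search area as a rectangle whose top_left and bottom_right corners are specified by latitude and longitude, as shown in the listing below: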

```

# Fetch the restaurants around a geographical location
GET restaurants/_search?filter_path=**.hits
{
  "query": {
    "geo_bounding_box": {
      "address": {
        "top_left": {
          "lon": "0",
          "lat": "52"
        },
        "bottom_right": {
          "lon": "1",
          "lat": "50"
        }
      }
    }
  }
}

```

This query returns our restaurant, because the bounding box contains its location:

```

{
  "hits": {
    "hits": [
      {
        "_index": "restaurants",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Sticky Fingers",
          "address": {
            "lon": "0.1278",
            "lat": "51.5074"
          }
        }
      }
    ]
  }
}

```
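To make the containment test behind this result concrete, here is a minimal Python sketch. It is only an illustration of the rectangle check, not how Elasticsearch implements geo_bounding_box; the GeoPoint class and in_bounding_box function are hypothetical names, and boxes that cross the date line are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    """A latitude/longitude pair (hypothetical helper, not an Elasticsearch class)."""
    lat: float
    lon: float


def in_bounding_box(point, top_left, bottom_right):
    """Return True if point lies inside the rectangle spanned by the two corners.

    Assumption: the box does not cross the date line.
    """
    lat_ok = bottom_right.lat <= point.lat <= top_left.lat
    lon_ok = top_left.lon <= point.lon <= bottom_right.lon
    return lat_ok and lon_ok


# Values taken from the listings above.
restaurant = GeoPoint(lat=51.5074, lon=0.1278)
top_left = GeoPoint(lat=52.0, lon=0.0)
bottom_right = GeoPoint(lat=50.0, lon=1.0)

print(in_bounding_box(restaurant, top_left, bottom_right))  # True
```

A point with a latitude above 52 or a longitude outside the 0 to 1 range would fall outside the box and the check would return False.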

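As mentioned earlier, location information can be supplied in several formats, not only as an object with lat and lon keys: for example, as a string in "lat,lon" order or as an array in [lon, lat] order.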
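The main trap with these formats is the coordinate order, which differs between them. The Python sketch below normalizes four input styles to a (lat, lon) tuple so the orders can be compared side by side. parse_geo_point is a hypothetical helper written for this article, not part of any Elasticsearch client, and it leaves out the geohash format.

```python
import re


def parse_geo_point(value):
    """Normalize a few geo_point input styles to a (lat, lon) tuple of floats."""
    if isinstance(value, dict):
        # Object style: {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        # Array style: [lon, lat] (longitude comes first)
        lon, lat = value
        return float(lat), float(lon)
    if isinstance(value, str):
        wkt = re.fullmatch(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", value, re.IGNORECASE)
        if wkt:
            # Well-known text style: "POINT (lon lat)" (longitude comes first)
            return float(wkt.group(2)), float(wkt.group(1))
        # String style: "lat,lon" (latitude comes first)
        lat, lon = value.split(",")
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point value: {value!r}")


samples = [
    {"lat": "51.5074", "lon": "0.1278"},
    "51.5074,0.1278",
    [0.1278, 51.5074],
    "POINT (0.1278 51.5074)",
]
for sample in samples:
    print(parse_geo_point(sample))  # each line prints (51.5074, 0.1278)
```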

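For more on searching geo_point data, see:

The "Maps" chapter of "Elastic: A developer's getting-started guide"
Getting started with Elasticsearch (2)

Object data type

We often come across data in hierarchical form, for example an email with top-level fields such as subject, to, and from, plus an inner object that holds the attachments, as in the snippet below: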

```

"to": "[email protected]",
"subject": "Testing Object Type",
"attachments": {
  "filename": "file1.txt",
  "filetype": "confidential"
}

```

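JSON lets us build hierarchical objects like this: an object contained within another object. To represent such hierarchies, Elasticsearch has a dedicated data type, the object type. In the example above, attachments contains other properties, so it is itself an object and therefore of the object type. The two properties inside the attachments object, filename and filetype, can both be modeled as text fields. With this information, we can create a mapping definition as shown in the listing below: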

```

# Defining the attachments field as an object type. Although we can set the type to object explicitly,
# Elasticsearch is clever enough to deduce the object type when it sees hierarchical data, so the declaration can be omitted.
PUT emails
{
  "mappings": {
    "properties": {
      "to": {
        "type": "text"
      },
      "subject": {
        "type": "text"
      },
      "attachments": {
        "type": "object",
        "properties": {
          "filename": {
            "type": "text"
          },
          "filetype": {
            "type": "text"
          }
        }
      }
    }
  }
}

```

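The attachments field is an object because it encapsulates two other fields. Although we declared the type as object explicitly, Elasticsearch does not require this: whenever it encounters a field holding hierarchical data, it sets the field's data type to object. We could even leave out the "type": "object" line altogether.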
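The inference described here can be pictured with a small Python sketch. It is a deliberately simplified model of dynamic mapping, not the real algorithm: the function names are hypothetical, strings are mapped to plain text (real dynamic mapping also adds a keyword subfield and tries date detection), and arrays are ignored.

```python
def infer_leaf_type(value):
    """Guess a field type for a scalar value (simplified assumption-based rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def infer_mapping(document):
    """Build a mapping-like dict; nested dicts become objects with their own properties."""
    properties = {}
    for name, value in document.items():
        if isinstance(value, dict):
            # No explicit "type" is emitted for objects; the nested "properties" imply it.
            properties[name] = infer_mapping(value)
        else:
            properties[name] = {"type": infer_leaf_type(value)}
    return {"properties": properties}


email = {
    "subject": "Testing Object Type",
    "attachments": {"filename": "file1.txt", "filetype": "confidential"},
}
print(infer_mapping(email))
```

The attachments entry in the result carries only a properties block and no type key, which is the shape an object field takes in a mapping when its type is left implicit.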

Once the schema has been created successfully, we can retrieve it with the GET emails/_mapping command (listing below):

GET emails/_mapping

```

# The mapping schema for emails
# The attachments type is not listed (inferred as object by Elasticsearch!)
{
  "emails": {
    "mappings": {
      "properties": {
        "attachments": {
          "properties": {
            "filename": { "type": "text" },
            "filetype": { "type": "text" }
          }
        },
        "subject": { "type": "text" },
        "to": { "type": "text" }
      }
    }
  }
}

```

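All the other fields show their associated data type, but attachments does not: the object type of an inner object is inferred by Elasticsearch by default. Let's index an email document; the listing below shows the request: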

```

# Indexing an email document
PUT emails/_doc/1
{
  "to": "[email protected]",
  "subject": "Testing Object Type",
  "attachments": {
    "filename": "file1.txt",
    "filetype": "confidential"
  }
}

```

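Now that the emails index holds a document, we can issue a match query on a field of the inner object (search queries are covered in upcoming articles) to fetch the relevant document and prove the point, as shown in the listing below: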

```

# Searching for an email based on the attachment name
GET emails/_search?filter_path=**.hits
{
  "query": {
    "match": {
      "attachments.filename": "file1.txt"
    }
  }
}

```

The response to the request above is:

```

{
  "hits": {
    "hits": [
      {
        "_index": "emails",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "to": "[email protected]",
          "subject": "Testing Object Type",
          "attachments": {
            "filename": "file1.txt",
            "filetype": "confidential"
          }
        }
      }
    ]
  }
}

```

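This returns the document from Elasticsearch, because the filename matches the document stored in the index.

Object types are simple, but they have one limitation: inner objects are flattened rather than stored as separate documents. The downside is that the relationships between the objects indexed from an array are lost. The good news is that another data type, nested, solves this problem.

Unfortunately, space does not allow me to cover the limitations of objects in detail here; for further reading, see "Elasticsearch: object and nested data types".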
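As a rough illustration of the flattening limitation, the Python sketch below models an array of objects the way the object type treats it (one list of values per field path) and compares that with matching each object on its own. The function names and the in-memory model are assumptions made for this example; they only mimic the observable behavior.

```python
def flatten_object_array(field, objects):
    """Collapse an array of objects into one list of values per field path."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(flat, criteria):
    """Object-style matching: every criterion may be satisfied by a different object."""
    return all(value in flat.get(path, []) for path, value in criteria.items())


def nested_match(field, objects, criteria):
    """Nested-style matching: a single object must satisfy all criteria."""
    prefix = field + "."
    return any(
        all(obj.get(path[len(prefix):]) == value for path, value in criteria.items())
        for obj in objects
    )


attachments = [
    {"filename": "file1.txt", "filetype": "confidential"},
    {"filename": "file2.txt", "filetype": "private"},
]
criteria = {"attachments.filename": "file1.txt", "attachments.filetype": "private"}

flat = flatten_object_array("attachments", attachments)
print(flat)
print(object_match(flat, criteria))                        # True: a false positive
print(nested_match("attachments", attachments, criteria))  # False
```

After flattening, nothing records that file1.txt belonged with confidential, so the combination of file1.txt and private appears to match even though no single attachment has it.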

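Nested data type

The nested data type is a specialized form of the object type that preserves the relationships between the objects in an array within a document.

Taking our email and attachments example again, this time let's define the attachments field as a nested data type rather than letting Elasticsearch derive it as an object type. This means creating a schema that declares the attachments field as nested, as shown in the listing below: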

```

# Creating the attachments field as a nested data type
PUT emails_nested
{
  "mappings": {
    "properties": {
      "attachments": {
        "type": "nested",
        "properties": {
          "filename": {
            "type": "keyword"
          },
          "filetype": {
            "type": "text"
          }
        }
      }
    }
  }
}

```

With the schema definition in place, all we need to do is index a document. The listing below does exactly that:

```

# Indexing a document with attachments
PUT emails_nested/_doc/1
{
  "attachments": [
    {
      "filename": "file1.txt",
      "filetype": "confidential"
    },
    {
      "filename": "file2.txt",
      "filetype": "private"
    }
  ]
}

```

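Once the document has been indexed, the last piece of the puzzle is the search. The listing below shows a search query for emails that have an attachment with file1.txt as the file name and private as its classification. No single attachment has this combination, so the result must be empty, unlike the object type, which matches values across different objects.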

```

# This query shouldn't return results, because no attachment has both the file name "file1.txt" and the type "private" (look at the document above)
GET emails_nested/_search
{
  "query": {
    "nested": {
      "path": "attachments",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "attachments.filename": "file1.txt"
              }
            },
            {
              "match": {
                "attachments.filetype": "private"
              }
            }
          ]
        }
      }
    }
  }
}

```

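The query in the listing above searches for a file named file1.txt with a private classification, a combination that does not exist (see the document we indexed earlier). The query returns no documents, which is exactly what we expect: file1.txt is classified as confidential, not private, so it does not match. This works because, when a nested type represents an array of inner objects, each object is stored and indexed as a separate hidden document.

The nested data type is very good at honoring associations and relationships, so whenever we need an array of objects in which each object must be treated as an individual object, the nested data type is our friend.

For further reading, see "Elasticsearch: object and nested data types".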

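No array type

While we are on the subject of arrays, interestingly there is no array data type in Elasticsearch. However, any field can hold multiple values and so represent an array. For example, a document with a name field can go from a single value to an array, from "name": "John Doe" to "name": ["John Smith", "John Doe"], simply by supplying a list of values for the field. There is one important point to keep in mind: all values in an array should be of the same type. For example, you should not declare the name field like this: "name": ["John Smith", 13, "Neverland"]. The field mixes several types, and arrays of mixed types are not supported.

Flattened data type

So far we have looked at indexing the individual fields parsed from a JSON document. Each field is treated as a separate, independent field when it is analyzed and stored. Sometimes, however, we do not need every subfield to be indexed as its own field and run through the analysis process. Think of a stream of chat messages on a chat system, the running commentary on a live football match, or a doctor noting down a patient's ailments. Rather than declaring each field explicitly (or having it derived dynamically), we can load this kind of data as one big blob. Elasticsearch provides a special data type for this, called flattened.

The flattened data type stores information as one or more subfields, with the value of each subfield indexed as a keyword. In other words, no value is treated as a text field, so none goes through text analysis. For more on analysis, see the article "Elasticsearch: analyzer".

Consider a doctor taking running notes about a patient during a consultation. The mapping consists of two fields, patient_name and doctor_notes, with doctor_notes declared as the flattened type. The listing below provides the mapping: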

```

# Creating a mapping with the flattened data type
PUT consultations
{
  "mappings": {
    "properties": {
      "patient_name": {
        "type": "text"
      },
      "doctor_notes": {
        "type": "flattened"
      }
    }
  }
}

```

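Any field declared as flattened (along with its subfields) is not analyzed; that is, all values are indexed as keywords. Let's create a patient consultation document (listed below) and index it: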

```

# The consultation document with the doctor's notes
PUT consultations/_doc/1
{
  "patient_name": "John Doe",
  "doctor_notes": {
    "temperature": 103,
    "symptoms": [
      "chills",
      "fever",
      "headache"
    ],
    "history": "none",
    "medication": [
      "Antibiotics",
      "Paracetamol"
    ]
  }
}

```

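As you can see, doctor_notes holds a good deal of information, but remember that we did not create any of these inner fields in the mapping definition. Because doctor_notes is a flattened type, all of its values are indexed as keywords.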
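One way to picture "everything becomes a keyword" is the Python sketch below, which walks the doctor_notes object and collects every leaf value as an unanalyzed string. This is a mental model only: flattened_keywords is a hypothetical helper, and the case-sensitive lookup at the end reflects the assumption that keyword values are matched exactly as stored.

```python
def flattened_keywords(value):
    """Collect every leaf value of a JSON-like structure as a string keyword."""
    if isinstance(value, dict):
        return {kw for child in value.values() for kw in flattened_keywords(child)}
    if isinstance(value, list):
        return {kw for item in value for kw in flattened_keywords(item)}
    # Leaf value: numbers are kept as their string form, nothing is tokenized or lowercased.
    return {str(value)}


doctor_notes = {
    "temperature": 103,
    "symptoms": ["chills", "fever", "headache"],
    "history": "none",
    "medication": ["Antibiotics", "Paracetamol"],
}

keywords = flattened_keywords(doctor_notes)
print(sorted(keywords))
print("Paracetamol" in keywords)  # True
print("paracetamol" in keywords)  # False: no analysis, so the case must match
```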

Finally, we search the index using any keyword from the doctor's notes, as shown below:

```

# Searching for patients prescribed paracetamol
GET consultations/_search?filter_path=**.hits
{
  "query": {
    "match": {
      "doctor_notes": "Paracetamol"
    }
  }
}

```

The request above returns:

```

{
  "hits": {
    "hits": [
      {
        "_index": "consultations",
        "_id": "1",
        "_score": 0.44303042,
        "_source": {
          "patient_name": "John Doe",
          "doctor_notes": {
            "temperature": 103,
            "symptoms": [
              "chills",
              "fever",
              "headache"
            ],
            "history": "none",
            "medication": [
              "Antibiotics",
              "Paracetamol"
            ]
          }
        }
      }
    ]
  }
}

```

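Searching for Paracetamol returns John Doe's consultation document. You can experiment by changing the match query to any other value, for example "doctor_notes": "chills", or even write a more complex query like the one below: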

```

# An advanced query to fetch patients based on multiple search criteria
# Search for non-diabetic patients with a headache who were prescribed antibiotics
GET consultations/_search?filter_path=**.hits
{
  "query": {
    "bool": {
      "must": [
        { "match": { "doctor_notes": "headache" } },
        { "match": { "doctor_notes": "Antibiotics" } }
      ],
      "must_not": [
        { "term": { "doctor_notes": { "value": "diabetics" } } }
      ]
    }
  }
}

```

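In this query we check for headache and Antibiotics, while the patient must not be diabetic. The query returns John Doe because he is not diabetic but has a headache and is on antibiotics (get well soon, Doe!).

The flattened data type comes in handy especially when we expect many ad hoc fields and defining a mapping for all of them in advance is not feasible. Keep in mind that the subfields of a flattened field are always of the keyword type.

For more on the flattened data type, see the article "Elasticsearch: Flattened data type mapping".

Join data type

If you come from the relational database world, you will know about relationships between data, the joins that support parent-child relationships. In Elasticsearch, however, every indexed document is independent and has no relationship with any other document in the index. Elasticsearch denormalizes data to gain speed and performance during indexing and search operations. For the cases where we do need a parent-child relationship, Elasticsearch provides the join data type.

Consider a doctor-patient (one-to-many) relationship: a doctor can have many patients, and each patient is assigned to one doctor.

Let's create a doctors index with a schema that contains the relationship definition. To handle a parent-child relationship with the join data type, we need to:

create a field of type join, and
add the extra information through the relations object (the doctor-patient relationship in this case)

The following command prepares the doctors index with the schema definition: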

```

# Creating an index with the join data type - make sure the join field declares the "relations" object
PUT doctors
{
  "mappings": {
    "properties": {
      "relationship": {
        "type": "join",
        "relations": {
          "doctor": "patient"
        }
      }
    }
  }
}

```

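Once the schema is ready and the index created, we index two kinds of documents: one representing the doctor (parent) and the other the patient (child). Here is the doctor's document, whose relationship is set to doctor: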

```

# Indexing a doctor - make sure the relationship field is set to the doctor type
PUT doctors/_doc/1
{
  "name": "Dr Mary Montgomery",
  "relationship": {
    "name": "doctor"
  }
}

```

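The notable point in the snippet above is that the relationship object declares the document's type as doctor. The name property must be the parent value (doctor) declared under relations in the mapping schema. With our resident doctor, Dr Mary Montgomery, in place, the next step is to associate two patients with her. The following requests (listed below) do that: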

```

# Creating two patients for our doctor
PUT doctors/_doc/2?routing=mary
{
  "name": "John Doe",
  "relationship": {
    "name": "patient",
    "parent": 1
  }
}

PUT doctors/_doc/3?routing=mary
{
  "name": "Mrs Doe",
  "relationship": {
    "name": "patient",
    "parent": 1
  }
}

```

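The name in the relationship object must be set to patient (remember the parent-child part of the relations property in the schema?), and parent must be assigned the document identifier of the associated doctor (ID 1 in our example).

There is one more thing to understand when working with parent-child relationships. A parent and its children are indexed into the same shard to avoid the overhead of multi-shard searches. Because the documents must live together, the routing parameter in the URL is mandatory for child documents. Routing is a function that determines the shard on which a document resides. Note that the doctor document above was indexed without a routing value, so it is routed by its own ID; on an index with more than one primary shard, a common approach is to use the parent's ID as the routing value so that parent and children are guaranteed to share a shard.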
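The idea behind routing can be sketched in a few lines of Python: hash the routing value and take the remainder modulo the number of primary shards. This is a simplification under stated assumptions: zlib.crc32 stands in for the hash function Elasticsearch uses internally, shard_for is a hypothetical helper, and the five-shard index is made up for the example.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Map a routing value to a shard number.

    Assumption: zlib.crc32 is only a stand-in for Elasticsearch's internal hash.
    """
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_primary_shards


num_shards = 5  # hypothetical index with five primary shards

# Both patients were indexed with ?routing=mary, so they always share a shard.
for doc_id in ("2", "3"):
    print(doc_id, "->", shard_for("mary", num_shards))

# A document indexed without ?routing is routed by its own ID instead.
print("1", "->", shard_for("1", num_shards))
```

Identical routing values always map to the same shard, which is what keeps the two patients together. Whether the doctor's document lands on that shard as well depends on the hash of its own ID, which is why using the parent's ID as the routing value for the children is the safer choice on multi-shard indices.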

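Finally, it is time to search for the patients belonging to the doctor with ID 1. The query in the listing below searches for all patients associated with Dr Montgomery: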

```

# Searching for all patients of Dr Montgomery
GET doctors/_search?filter_path=**.hits
{
  "query": {
    "parent_id": {
      "type": "patient",
      "id": 1
    }
  }
}

```

The response is:

```

{
  "hits": {
    "hits": [
      {
        "_index": "doctors",
        "_id": "2",
        "_score": 0.10536051,
        "_routing": "mary",
        "_source": {
          "name": "John Doe",
          "relationship": {
            "name": "patient",
            "parent": 1
          }
        }
      },
      {
        "_index": "doctors",
        "_id": "3",
        "_score": 0.10536051,
        "_routing": "mary",
        "_source": {
          "name": "Mrs Doe",
          "relationship": {
            "name": "patient",
            "parent": 1
          }
        }
      }
    ]
  }
}

```

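When we want to fetch the patients belonging to a doctor, we use a search query called parent_id, which expects the child type (patient) and the parent's ID (Dr Montgomery's document ID is 1). The query returns Dr Montgomery's patients, Mr and Mrs Doe.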
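The selection logic of this query can be imitated over an in-memory copy of the three documents. The Python sketch below is only meant to show which two conditions a document has to meet; children_of is a hypothetical helper, and the dictionary merely mirrors the documents indexed above.

```python
docs = {
    "1": {"name": "Dr Mary Montgomery", "relationship": {"name": "doctor"}},
    "2": {"name": "John Doe", "relationship": {"name": "patient", "parent": 1}},
    "3": {"name": "Mrs Doe", "relationship": {"name": "patient", "parent": 1}},
}


def children_of(documents, child_type, parent_id):
    """Return the IDs of documents of child_type whose parent is parent_id."""
    return [
        doc_id
        for doc_id, doc in documents.items()
        if doc["relationship"].get("name") == child_type
        # Compare as strings so that a parent given as 1 or "1" is treated alike.
        and str(doc["relationship"].get("parent")) == str(parent_id)
    ]


print(children_of(docs, "patient", 1))  # ['2', '3']
```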

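Working with the join data type is not straightforward, because we are asking a non-relational data store to handle relationships, which is asking a bit much. Use the join data type only when you really have to.

Implementing parent-child relationships in Elasticsearch has performance implications. If document relationships are central to your design, Elasticsearch may not be the right tool, so use this feature with caution.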


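Search-as-you-type data type

Most search engines suggest words and phrases as we type in the search bar. This feature goes by several names: search-as-you-type, typeahead, autocomplete, or suggestions. Elasticsearch provides a convenient data type, search_as_you_type, to support it. Behind the scenes, Elasticsearch works hard to make sure that fields marked as search_as_you_type are indexed to produce n-grams, which we will see in action in this section.

An n-gram is a sequence of characters of a given size taken from a word. For example, for the word "action", the 3-grams (n-grams of size 3) are ["act", "cti", "tio", "ion"] and the bi-grams (size 2) are ["ac", "ct", "ti", "io", "on"], and so on.

An edge n-gram, on the other hand, is an n-gram that is anchored to the beginning of the word. For the word "action", the edge n-grams are ["a", "ac", "act", "acti", "actio", "action"].

Shingles, in turn, are word-level n-grams. For example, the sentence "Elasticsearch in Action" produces ["Elasticsearch", "Elasticsearch in", "Elasticsearch in Action", "in", "in Action", "Action"].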
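The three definitions above are easy to reproduce in plain Python. The functions below are illustrative helpers written for this article (their names and parameters are assumptions); they are not the token filters Elasticsearch ships, but they generate the same example lists.

```python
def char_ngrams(word, n):
    """All contiguous character sequences of length n in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edge_ngrams(word, min_gram=1, max_gram=None):
    """Prefixes of word, from min_gram characters up to max_gram (default: the whole word)."""
    longest = len(word) if max_gram is None else min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


def shingles(text, max_size=3):
    """Word-level n-grams of 1 to max_size words, grouped by starting word."""
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for i in range(len(words))
        for n in range(1, max_size + 1)
        if i + n <= len(words)
    ]


print(char_ngrams("action", 3))  # ['act', 'cti', 'tio', 'ion']
print(char_ngrams("action", 2))  # ['ac', 'ct', 'ti', 'io', 'on']
print(edge_ngrams("action"))     # ['a', 'ac', 'act', 'acti', 'actio', 'action']
print(shingles("Elasticsearch in Action"))
```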

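Further reading: Elasticsearch: Ngrams, edge ngrams, and shingles

Say we are asked to support typeahead queries on a books index: as the user types a book title into the search bar, we should be able to suggest books based on the letters typed so far.

First, we need to create a schema in which the field in question is of the search_as_you_type data type. The listing below provides this mapping schema: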

```

# Mapping schema for technical books with the title defined as a search_as_you_type data type
PUT tech_books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}

```

We now index a few books:

```

# Indexing a few documents
PUT tech_books/_doc/1
{
  "title": "I love Elasticsearch technology"
}

PUT tech_books/_doc/2
{
  "title": "Elasticsearch is the most popular search engine in the world"
}

PUT tech_books/_doc/3
{
  "title": "Elastic is the company behind Elasticsearch"
}

```

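Because the title field is of the search_as_you_type data type, Elasticsearch creates a set of n-gram subfields alongside the root field (title): title._2gram and title._3gram, which hold shingles of two and three words, and title._index_prefix, which holds edge n-grams.

Because these fields are created for us, searching on the field can be expected to return typeahead suggestions, since the n-grams help produce them efficiently.

Let's create the search query shown in the listing below.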

```

# Searching in a search_as_you_type field and its subfields
GET tech_books/_search?filter_path=**.hits
{
  "query": {
    "multi_match": {
      "query": "elas",
      "type": "bool_prefix",
      "fields": [
        "title",
        "title._2gram",
        "title._3gram"
      ]
    }
  }
}

```

This query should return all three documents. We use a multi_match query because we are searching for one value across multiple fields: title, title._2gram, and title._3gram, while title._index_prefix serves the prefix matching behind the scenes. The response to the request above is:

```

{
  "hits": {
    "hits": [
      {
        "_index": "tech_books",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "I love Elasticsearch technology"
        }
      },
      {
        "_index": "tech_books",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "Elasticsearch is the most popular search engine in the world"
        }
      },
      {
        "_index": "tech_books",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "Elastic is the company behind Elasticsearch"
        }
      }
    ]
  }
}

```
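Why does the four-letter input elas match all three titles? A bool_prefix query treats the last term of the input as a prefix and any earlier terms as ordinary terms. The Python sketch below imitates that behavior on the root field only, under simplifying assumptions: analysis is reduced to lowercasing and splitting on whitespace, terms are combined with OR, and the _2gram and _3gram subfields are ignored. The function names are hypothetical.

```python
def analyze(text):
    """Very rough stand-in for text analysis: lowercase and split on whitespace."""
    return text.lower().split()


def bool_prefix_match(query, field_text):
    """True if any leading query term matches a token exactly, or the last term is a token prefix."""
    query_terms = analyze(query)
    if not query_terms:
        return False
    tokens = analyze(field_text)
    *leading_terms, last_term = query_terms
    term_hit = any(term in tokens for term in leading_terms)
    prefix_hit = any(token.startswith(last_term) for token in tokens)
    return term_hit or prefix_hit


titles = {
    "1": "I love Elasticsearch technology",
    "2": "Elasticsearch is the most popular search engine in the world",
    "3": "Elastic is the company behind Elasticsearch",
}

matches = [doc_id for doc_id, title in titles.items() if bool_prefix_match("elas", title)]
print(matches)  # ['1', '2', '3']
```

Every title contains a token that starts with elas (elasticsearch or elastic), so all three documents qualify, regardless of where in the title that token appears.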

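For more on search_as_you_type, see the article "Elasticsearch: Implementing search-as-you-type with search_analyzer and edge ngram".

Summary

In this article we looked at advanced data types such as object, nested, flattened, and join, along with others such as geo_point and search_as_you_type. For more details on the remaining data types, with in-depth discussion and code examples, see "Elastic: A developer's getting-started guide".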
