Elasticsearch
Mapping
Aggregate
Stores pre-aggregated numeric values for metric aggregations. An aggregate_metric_double field is an object containing one or more of the following metric sub-fields: min, max, sum, and value_count.
When you run certain metric aggregations on an aggregate_metric_double field, the aggregation uses the related sub-field’s values. For example, a min aggregation on an aggregate_metric_double field returns the minimum value of all min sub-fields.
An aggregate_metric_double field stores a single numeric doc value for each metric sub-field. Array values are not supported. min, max, and sum values are double numbers. value_count is a positive long number.
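The relationship between a metric aggregation and its sub-field can be sketched in plain Python. This is only an illustration of the semantics described above, not how Elasticsearch computes them internally; the `latency` field name, the sample values, and the `aggregate` helper are all hypothetical.

```python
# Hypothetical documents, each holding one pre-aggregated metric object.
docs = [
    {"latency": {"min": 2.0, "max": 40.0, "sum": 120.0, "value_count": 10}},
    {"latency": {"min": 0.5, "max": 25.0, "sum": 60.0, "value_count": 5}},
]


def aggregate(docs, field, metric):
    """Combine one metric sub-field across all documents that carry it."""
    values = [doc[field][metric] for doc in docs if metric in doc.get(field, {})]
    if metric == "min":
        return min(values)   # minimum of all min sub-fields
    if metric == "max":
        return max(values)   # maximum of all max sub-fields
    if metric in ("sum", "value_count"):
        return sum(values)   # totals are added together
    raise ValueError(f"unsupported metric: {metric}")


print(aggregate(docs, "latency", "min"))          # 0.5
print(aggregate(docs, "latency", "max"))          # 40.0
print(aggregate(docs, "latency", "sum"))          # 180.0
print(aggregate(docs, "latency", "value_count"))  # 15
```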
Alias
Arrays
In Elasticsearch, there is no dedicated array data type. Any field can contain zero or more values by default; however, all values in the array must be of the same data type. For instance:
- an array of strings: [ "one", "two" ]
- an array of integers: [ 1, 2 ]
- an array of arrays: [ 1, [ 2, 3 ]], which is the equivalent of [ 1, 2, 3 ]
- an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
Arrays of objects
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to do this, use the [nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) data type instead of the [object](https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html) data type.
This is explained in more detail in [Nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html).
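A rough way to picture why objects in an array cannot be queried independently: with the object data type, the values of each sub-field are pooled across all objects. The sketch below models that pooling in Python; the `people` field name and both helper functions are hypothetical, and the real index structures are more involved.

```python
def flatten_object_array(field, objects):
    """Pool each sub-field's values across all objects into one multi-value field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_all(flat_doc, conditions):
    """True if every (field, value) pair occurs somewhere in the flattened document."""
    return all(value in flat_doc.get(field, []) for field, value in conditions.items())


people = [{"name": "Mary", "age": 12}, {"name": "John", "age": 10}]
flat = flatten_object_array("people", people)
print(flat)  # {'people.name': ['Mary', 'John'], 'people.age': [12, 10]}

# No single object has name "Mary" and age 10, yet the pooled values match.
print(matches_all(flat, {"people.name": "Mary", "people.age": 10}))  # True
```

The association between "Mary" and 12 is lost once the values are pooled, which is the problem the nested data type addresses.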
When adding a field dynamically, the first value in the array determines the field type. All subsequent values must be of the same data type, or it must at least be possible to [coerce](https://www.elastic.co/guide/en/elasticsearch/reference/current/coerce.html) subsequent values to the same data type.
Arrays with a mixture of data types are not supported: [ 10, "some string" ]
An array may contain null values, which are either replaced by the configured [null_value](https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html) or skipped entirely. An empty array [] is treated as a missing field, that is, a field with no values.
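The rules for nested arrays, null values, and empty arrays can be summarised in a small Python model. It is a sketch of the behaviour described above, not Elasticsearch code; `index_values` is a hypothetical helper and `"NULL"` is just an example null_value.

```python
def index_values(value, null_value=None):
    """Return the flat list of values that would be indexed for one field."""
    if not isinstance(value, list):
        value = [value]
    out = []
    for item in value:
        if isinstance(item, list):
            # Inner arrays are flattened into the outer one.
            out.extend(index_values(item, null_value))
        elif item is None:
            # Nulls are replaced by null_value if configured, otherwise skipped.
            if null_value is not None:
                out.append(null_value)
        else:
            out.append(item)
    return out


print(index_values([1, [2, 3]]))          # [1, 2, 3]
print(index_values([1, None, 2]))         # [1, 2]
print(index_values(["a", None], "NULL"))  # ['a', 'NULL']
print(index_values([]))                   # [] (no values: a missing field)
```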
Nothing needs to be pre-configured in order to use arrays in documents; they are supported out of the box:
PUT my-index-000001/_doc/1
{
  "message": "some arrays in this document...",
  "tags": [ "elasticsearch", "wow" ],
  "lists": [
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

PUT my-index-000001/_doc/2
{
  "message": "no arrays in this document...",
  "tags": "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch"
    }
  }
}
- The tags field is dynamically added as a string field.
- The lists field is dynamically added as an object field.
- The second document contains no arrays, but can be indexed into the same fields.
- The query looks for elasticsearch in the tags field, and matches both documents.
Multi-value fields and the inverted index
The fact that all field types support multi-value fields out of the box is a consequence of the origins of Lucene. Lucene was designed to be a full text search engine. In order to be able to search for individual words within a big block of text, Lucene tokenizes the text into individual terms, and adds each term to the inverted index separately.
This means that even a simple text field must be able to support multiple values by default. When other data types were added, such as numbers and dates, they used the same data structure as strings, and so got multi-values for free.
Binary
Boolean
Boolean fields accept JSON true and false values, but can also accept strings which are interpreted as either true or false:
| False values | false, "false", "" (empty string) |
|---|---|
| True values | true, "true" |
For example:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "is_published": {
        "type": "boolean"
      }
    }
  }
}

POST my-index-000001/_doc/1?refresh
{
  "is_published": "true"
}

GET my-index-000001/_search
{
  "query": {
    "term": {
      "is_published": true
    }
  }
}
- Indexing a document with "true", which is interpreted as true.
- Searching for documents with a JSON true.
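The accepted true and false values listed earlier can be expressed as a small lookup. The following Python sketch only mirrors that table; `parse_boolean` is a hypothetical helper, and rejecting every other value with an error is an assumption about how unlisted input is treated.

```python
def parse_boolean(value):
    """Interpret a JSON value using the accepted boolean values listed above."""
    # Check the type first: in Python 0 == False and 1 == True, but numbers
    # are not among the listed values.
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        if value in ("false", ""):
            return False
        if value == "true":
            return True
    raise ValueError(f"not an accepted boolean value: {value!r}")


print(parse_boolean("true"))   # True
print(parse_boolean(""))       # False
print(parse_boolean(False))    # False
```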
Aggregations such as the terms aggregation use 1 and 0 for the key, and the strings "true" and "false" for the key_as_string. When used in scripts, boolean fields return true and false:
POST my-index-000001/_doc/1?refresh
{
  "is_published": true
}

POST my-index-000001/_doc/2?refresh
{
  "is_published": false
}

GET my-index-000001/_search
{
  "aggs": {
    "publish_state": {
      "terms": {
        "field": "is_published"
      }
    }
  },
  "sort": [ "is_published" ],
  "fields": [
    {"field": "weight"}
  ],
  "runtime_mappings": {
    "weight": {
      "type": "long",
      "script": "emit(doc['is_published'].value ? 10 : 0)"
    }
  }
}
Parameters for boolean fields
The following parameters are accepted by boolean fields:
| boost | Mapping field-level query time boosting. Accepts a floating point number, defaults to 1.0. |
|---|---|
| doc_values | Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false. |
| index | Should the field be searchable? Accepts true (default) and false. |
| null_value | Accepts any of the true or false values listed above. The value is substituted for any explicit null values. Defaults to null, which means the field is treated as missing. Note that this cannot be set if the script parameter is used. |
| on_script_error | Defines what to do if the script defined by the script parameter throws an error at indexing time. Accepts fail (default), which will cause the entire document to be rejected, and continue, which will register the field in the document’s _ignored metadata field and continue indexing. This parameter can only be set if the script field is also set. |
| script | If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their runtime equivalent. |
| store | Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default). |
| meta | Metadata about the field. |
Date
JSON doesn’t have a date data type, so dates in Elasticsearch can either be:
- strings containing formatted dates, e.g. "2015-01-01" or "2015/01/01 12:10:30".
- a number representing milliseconds-since-the-epoch.
- a number representing seconds-since-the-epoch (configuration).
Values for milliseconds-since-the-epoch must be non-negative. Use a formatted date to represent dates before 1970.
Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.
Queries on dates are internally converted to range queries on this long representation, and the result of aggregations and stored fields is converted back to a string depending on the date format that is associated with the field.
Dates will always be rendered as strings, even if they were initially supplied as a long in the JSON document.
Date formats can be customised, but if no format is specified, the following default is used:
"strict_date_optional_time||epoch_millis"
This means that it will accept dates with optional timestamps, which conform to the formats supported by strict_date_optional_time or milliseconds-since-the-epoch.
For instance:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{ "date": "2015-01-01" }

PUT my-index-000001/_doc/2
{ "date": "2015-01-01T12:10:30Z" }

PUT my-index-000001/_doc/3
{ "date": 1420070400001 }

GET my-index-000001/_search
{
  "sort": { "date": "asc"}
}
- The date field uses the default format.
- This document uses a plain date.
- This document includes a time.
- This document uses milliseconds-since-the-epoch.
- Note that the sort values that are returned are all in milliseconds-since-the-epoch.
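To see how the three example documents end up as comparable long values, here is a rough Python model of the default format. It assumes that Python's ISO 8601 parser is a good enough stand-in for strict_date_optional_time (the real format has its own rules) and that dates without a time zone are read as UTC; `to_epoch_millis` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Convert an ISO 8601 date string or an epoch-millis number to epoch millis."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return int(value)  # already milliseconds-since-the-epoch
    text = value.replace("Z", "+00:00")  # older Python versions reject a trailing "Z"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no zone means UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)


values = ["2015-01-01", "2015-01-01T12:10:30Z", 1420070400001]
print(sorted(to_epoch_millis(v) for v in values))
# [1420070400000, 1420070400001, 1420114230000]
```

Sorted ascending, the plain date comes first, the epoch-millis document second, and the document with a time last.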
Dates will accept numbers with a decimal point, like {"date": 1618249875.123456}, but in some cases (#70085) precision is lost on those dates, so you should avoid them.
Multiple date formats
Multiple formats can be specified by separating them with ||. Each format is tried in turn until a matching format is found. The first format is used to convert the milliseconds-since-the-epoch value back into a string.
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
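The "try each format in turn, render with the first" rule can be sketched as follows. The strptime patterns are hand-written Python equivalents of the two date patterns in the mapping above, epoch_millis is handled as a fallback, and UTC is assumed throughout; `parse` and `render` are hypothetical helpers, not Elasticsearch APIs.

```python
from datetime import datetime, timezone

# Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd", in mapping order.
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse(value):
    """Try each format in order and return milliseconds-since-the-epoch."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
            return int(parsed.timestamp()) * 1000
        except ValueError:
            continue  # this format did not match, try the next one
    if text.isdigit():
        return int(text)  # last format in the list: epoch_millis
    raise ValueError(f"no configured format matches {value!r}")


def render(millis):
    """Convert epoch millis back to a string using the first format only."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime(PATTERNS[0])


print(parse("2015-01-01"))           # 1420070400000
print(render(parse("2015-01-01")))   # 2015-01-01 00:00:00
```

A value indexed as "2015-01-01" is therefore rendered back with a time component, because the first format in the list includes one.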
Parameters for date fields
The following parameters are accepted by date fields:
| boost | Mapping field-level query time boosting. Accepts a floating point number, defaults to 1.0. |
|---|---|
| doc_values | Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false. |
| format | The date format(s) that can be parsed. Defaults to strict_date_optional_time\|\|epoch_millis. |
| locale | The locale to use when parsing dates since months do not have the same names and/or abbreviations in all languages. The default is the ROOT locale. |
| ignore_malformed | If true, malformed values are ignored. If false (default), malformed values throw an exception and reject the whole document. Note that this cannot be set if the script parameter is used. |
| index | Should the field be searchable? Accepts true (default) and false. |
| null_value | Accepts a date value in one of the configured formats, which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing. Note that this cannot be set if the script parameter is used. |
| on_script_error | Defines what to do if the script defined by the script parameter throws an error at indexing time. Accepts fail (default), which will cause the entire document to be rejected, and continue, which will register the field in the document’s _ignored metadata field and continue indexing. This parameter can only be set if the script field is also set. |
| script | If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their runtime equivalent, and should emit long-valued timestamps. |
| store | Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default). |
| meta | Metadata about the field. |
Epoch seconds
If you need to send dates as seconds-since-the-epoch then make sure the format lists epoch_second:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_second"
      }
    }
  }
}

PUT my-index-000001/_doc/example?refresh
{ "date": 1618321898 }

POST my-index-000001/_search
{
  "fields": [ {"field": "date"}],
  "_source": false
}
The response contains a date like:
{
  "hits": {
    "hits": [
      {
        "_id": "example",
        "_index": "my-index-000001",
        "_type": "_doc",
        "_score": 1.0,
        "fields": {
          "date": ["2021-04-13T13:51:38.000Z"]
        }
      }
    ]
  }
}
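The conversion from the indexed number to the returned string is plain epoch arithmetic, which can be checked in Python. `render_epoch_second` is a hypothetical helper that reproduces the output shape of the response above; it is not part of any Elasticsearch client.

```python
from datetime import datetime, timezone


def render_epoch_second(seconds):
    """Format seconds-since-the-epoch as a UTC timestamp with millisecond precision."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(render_epoch_second(1618321898))  # 2021-04-13T13:51:38.000Z
```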
Date nanoseconds
This data type is an addition to the date data type. However, there is an important distinction between the two. The existing date data type stores dates in millisecond resolution. The date_nanos data type stores dates in nanosecond resolution, which limits its range of dates from roughly 1970 to 2262, as dates are still stored as a long representing nanoseconds since the epoch.
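The upper bound of roughly 2262 follows directly from the size of a signed 64-bit long. A quick check in Python, truncating to microseconds because that is the finest resolution `datetime` supports:

```python
from datetime import datetime, timedelta, timezone

MAX_LONG = 2**63 - 1  # largest value a signed 64-bit long can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Interpret MAX_LONG as nanoseconds since the epoch, truncated to microseconds.
latest = EPOCH + timedelta(microseconds=MAX_LONG // 1000)
print(latest.isoformat())  # 2262-04-11T23:47:16.854775+00:00
```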
Dense vector
Flattened
Geo-point
Geo-shape
Histogram
IP
Join
Keyword
Nested
Numeric
Object
JSON documents are hierarchical in nature: the document may contain inner objects which, in turn, may contain inner objects themselves:
PUT my-index-000001/_doc/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": {
      "first": "John",
      "last": "Smith"
    }
  }
}
Internally, this document is indexed as a simple, flat list of key-value pairs, something like this:
{
  "region": "US",
  "manager.age": 30,
  "manager.name.first": "John",
  "manager.name.last": "Smith"
}
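The flattening step can be modelled with a short recursive function. This is a sketch of the idea only, since the actual indexing happens inside Elasticsearch and Lucene; `flatten` is a hypothetical helper.

```python
def flatten(obj, prefix=""):
    """Turn nested objects into a flat dict keyed by dotted field paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # descend into the inner object
        else:
            flat[path] = value
    return flat


doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
print(flatten(doc))
# {'region': 'US', 'manager.age': 30, 'manager.name.first': 'John', 'manager.name.last': 'Smith'}
```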
An explicit mapping for the above document could look like this:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "region": {
        "type": "keyword"
      },
      "manager": {
        "properties": {
          "age": { "type": "integer" },
          "name": {
            "properties": {
              "first": { "type": "text" },
              "last": { "type": "text" }
            }
          }
        }
      }
    }
  }
}
- Properties in the top-level mappings definition.
- The manager field is an inner object field.
- The manager.name field is an inner object field within the manager field.
You are not required to set the field type to object explicitly, as this is the default value.
Parameters for object fields
The following parameters are accepted by object fields:
| dynamic | Whether or not new properties should be added dynamically to an existing object. Accepts true (default), false and strict. |
|---|---|
| enabled | Whether the JSON value given for the object field should be parsed and indexed (true, default) or completely ignored (false). |
| properties | The fields within the object, which can be of any data type, including object. New properties may be added to an existing object. |
If you need to index arrays of objects instead of single objects, read Nested first.