Elasticsearch

Mapping

Aggregate

Stores pre-aggregated numeric values for metric aggregations. An aggregate_metric_double field is an object containing one or more of the following metric sub-fields: min, max, sum, and value_count.

When you run certain metric aggregations on an aggregate_metric_double field, the aggregation uses the related sub-field’s values. For example, a min aggregation on an aggregate_metric_double field returns the minimum value of all min sub-fields.

An aggregate_metric_double field stores a single numeric doc value for each metric sub-field. Array values are not supported. min, max, and sum values are double numbers. value_count is a positive long number.
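As a rough illustration of how metric aggregations read the sub-fields, the sketch below combines two pre-aggregated documents in plain Python. The `aggregate` helper and the sample values are hypothetical, and the sketch assumes that max, sum, and value_count combine analogously to the min case described above; it does not call Elasticsearch.

```python
def aggregate(docs):
    """Combine pre-aggregated sub-fields across documents (illustrative only)."""
    return {
        # A min aggregation takes the minimum of all min sub-fields.
        "min": min(d["min"] for d in docs),
        # Assumed analogous behaviour for the remaining sub-fields.
        "max": max(d["max"] for d in docs),
        "sum": sum(d["sum"] for d in docs),
        "value_count": sum(d["value_count"] for d in docs),
    }

# Two hypothetical documents, each holding one aggregate_metric_double value.
docs = [
    {"min": -302.5, "max": 702.3, "sum": 200.0, "value_count": 25},
    {"min": -93.0, "max": 1702.3, "sum": 300.0, "value_count": 25},
]
result = aggregate(docs)
print(result)
```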

Alias

Arrays

In Elasticsearch, there is no dedicated array data type. Any field can contain zero or more values by default; however, all values in the array must be of the same data type. For instance, [ "one", "two" ] is an array of strings and [ 1, 2 ] is an array of integers.

Arrays of objects

Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to do this, use the nested data type instead of the object data type.

This is explained in more detail in Nested.
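The limitation can be pictured with a small Python model. It assumes that an array of objects is flattened into one multi-value list per sub-field, which loses the link between values that came from the same object. The helper names are hypothetical and values are treated as exact terms (no text analysis).

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Collect each sub-field's values into one list, as a flattened object array would."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

def matches(flat, name, description):
    """Check both conditions against the flattened lists, independently of each other."""
    return name in flat["lists.name"] and description in flat["lists.description"]

lists = [
    {"name": "prog_list", "description": "programming list"},
    {"name": "cool_list", "description": "cool stuff list"},
]
flat = flatten_object_array("lists", lists)

# No single object pairs "prog_list" with "cool stuff list", yet the
# flattened representation matches, because the association is gone.
cross_match = matches(flat, "prog_list", "cool stuff list")
print(flat, cross_match)
```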

When adding a field dynamically, the first value in the array determines the field type. All subsequent values must be of the same data type, or it must at least be possible to coerce subsequent values to the same data type.

Arrays with a mixture of data types are not supported: [ 10, "some string" ]

An array may contain null values, which are either replaced by the configured null_value or skipped entirely. An empty array [] is treated as a missing field, that is, a field with no values.
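The null handling described above can be sketched in a few lines of Python. The `index_values` function is a hypothetical stand-in for what happens at index time, not an Elasticsearch API.

```python
def index_values(values, null_value=None):
    """Return the values that would end up indexed for one field (illustrative)."""
    if not isinstance(values, list):
        values = [values]  # a single value behaves like a one-element array
    indexed = []
    for value in values:
        if value is None:
            if null_value is not None:
                indexed.append(null_value)  # replaced by the configured null_value
            # with no null_value configured, the null is skipped entirely
        else:
            indexed.append(value)
    return indexed

print(index_values(["a", None, "b"]))
print(index_values(["a", None, "b"], null_value="NULL"))
print(index_values([]))  # an empty array yields no values, like a missing field
```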

Nothing needs to be pre-configured in order to use arrays in documents; they are supported out of the box:

PUT my-index-000001/_doc/1
{
  "message": "some arrays in this document...",
  "tags":  [ "elasticsearch", "wow" ], 
  "lists": [ 
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

PUT my-index-000001/_doc/2 
{
  "message": "no arrays in this document...",
  "tags":  "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}

GET my-index-000001/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch" 
    }
  }
}
The tags field is dynamically added as a string field.
The lists field is dynamically added as an object field.
The second document contains no arrays, but can be indexed into the same fields.
The query looks for elasticsearch in the tags field, and matches both documents.

Multi-value fields and the inverted index

The fact that all field types support multi-value fields out of the box is a consequence of the origins of Lucene. Lucene was designed to be a full text search engine. In order to be able to search for individual words within a big block of text, Lucene tokenizes the text into individual terms, and adds each term to the inverted index separately.

This means that even a simple text field must be able to support multiple values by default. When other data types were added, such as numbers and dates, they used the same data structure as strings, and so got multi-values for free.
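To make the idea concrete, here is a toy inverted index in Python. It is a simplified model, not Lucene's actual data structure: the whitespace tokenizer and the `build_inverted_index` helper are assumptions for illustration. A tokenized text value and an array of values both end up as several terms pointing at the same document.

```python
from collections import defaultdict

def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def build_inverted_index(docs, field):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # single value and array take the same path
        for value in values:
            for term in tokenize(value):
                index[term].add(doc_id)
    return index

docs = {
    1: {"tags": ["elasticsearch", "wow"]},
    2: {"tags": "elasticsearch"},
}
tags_index = build_inverted_index(docs, "tags")
print(dict(tags_index))
```

Looking up "elasticsearch" in `tags_index` returns both document ids, which mirrors the search example earlier in this section.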

Binary

Boolean

Boolean fields accept JSON true and false values, but can also accept strings which are interpreted as either true or false:

False values false, "false", "" (empty string)
True values true, "true"

For example:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "is_published": {
        "type": "boolean"
      }
    }
  }
}

POST my-index-000001/_doc/1?refresh
{
  "is_published": "true" 
}

GET my-index-000001/_search
{
  "query": {
    "term": {
      "is_published": true 
    }
  }
}
Indexing a document with "true", which is interpreted as true.
Searching for documents with a JSON true.
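The accepted values can be summarised as a small Python function. `parse_boolean` is a hypothetical helper that mirrors the table above; rejecting every other input with an error is an assumption of this sketch.

```python
def parse_boolean(value):
    """Interpret a JSON value the way the table of true/false values describes."""
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False
    # Assumption: anything outside the listed values is treated as malformed.
    raise ValueError(f"cannot parse {value!r} as a boolean")

print(parse_boolean("true"))
print(parse_boolean(""))
```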

Aggregations like the terms aggregation use 1 and 0 for the key, and the strings "true" and "false" for the key_as_string. Boolean fields, when used in scripts, return true and false:

POST my-index-000001/_doc/1?refresh
{
  "is_published": true
}

POST my-index-000001/_doc/2?refresh
{
  "is_published": false
}

GET my-index-000001/_search
{
  "aggs": {
    "publish_state": {
      "terms": {
        "field": "is_published"
      }
    }
  },
  "sort": [ "is_published" ],
  "fields": [
    {"field": "weight"}
  ],
  "runtime_mappings": {
    "weight": {
      "type": "long",
      "script": "emit(doc['is_published'].value ? 10 : 0)"
    }
  }
}

Parameters for boolean fields

The following parameters are accepted by boolean fields:

boost Mapping field-level query time boosting. Accepts a floating point number, defaults to 1.0.
doc_values Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.
index Should the field be searchable? Accepts true (default) and false.
null_value Accepts any of the true or false values listed above. The value is substituted for any explicit null values. Defaults to null, which means the field is treated as missing. Note that this cannot be set if the script parameter is used.
on_script_error Defines what to do if the script defined by the script parameter throws an error at indexing time. Accepts fail (default), which will cause the entire document to be rejected, and continue, which will register the field in the document’s _ignored metadata field and continue indexing. This parameter can only be set if the script field is also set.
script If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their runtime equivalent.
store Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
meta Metadata about the field.

Date

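JSON doesn’t have a date data type, so dates in Elasticsearch can either be strings containing formatted dates (for example "2015-01-01" or "2015/01/01 12:10:30"), a number representing milliseconds-since-the-epoch, or a number representing seconds-since-the-epoch (which requires an explicit format, see Epoch seconds).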

Values for milliseconds-since-the-epoch must be non-negative. Use a formatted date to represent dates before 1970.

Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.

Queries on dates are internally converted to range queries on this long representation, and the result of aggregations and stored fields is converted back to a string depending on the date format that is associated with the field.

Dates will always be rendered as strings, even if they were initially supplied as a long in the JSON document.

Date formats can be customised, but if no format is specified then the default is used:

"strict_date_optional_time||epoch_millis"

This means that it will accept dates with optional timestamps, which conform to the formats supported by strict_date_optional_time or milliseconds-since-the-epoch.

For instance:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{ "date": "2015-01-01" } 

PUT my-index-000001/_doc/2
{ "date": "2015-01-01T12:10:30Z" } 

PUT my-index-000001/_doc/3
{ "date": 1420070400001 } 

GET my-index-000001/_search
{
  "sort": { "date": "asc"} 
}
The date field uses the default format.
This document uses a plain date.
This document includes a time.
This document uses milliseconds-since-the-epoch.
Note that the sort values that are returned are all in milliseconds-since-the-epoch.
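The conversion to milliseconds-since-the-epoch can be reproduced in Python for the three documents above. This is a sketch of the arithmetic only: `to_epoch_millis` is a hypothetical helper that handles just the inputs shown here and assumes UTC when no time zone is given.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(value):
    """Convert an ISO-style date string or an epoch-millis integer to epoch millis."""
    if isinstance(value, int):
        return value  # already milliseconds-since-the-epoch
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: default to UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)

sort_values = sorted(
    to_epoch_millis(v)
    for v in ["2015-01-01", "2015-01-01T12:10:30Z", 1420070400001]
)
print(sort_values)  # [1420070400000, 1420070400001, 1420114230000]
```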

Dates will accept numbers with a decimal point like {"date": 1618249875.123456}, but there are some cases (#70085) where precision is lost on those dates, so they should be avoided.

Multiple date formats

Multiple formats can be specified by separating them with ||. Each format will be tried in turn until a matching format is found. The first format will be used to convert the milliseconds-since-the-epoch value back into a string.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
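The try-each-format-in-turn behaviour can be sketched in Python. The `strptime` patterns below are hand-translated approximations of the mapping's format string, and `parse_date` and `render_date` are hypothetical helpers that assume UTC; they are not how Elasticsearch implements parsing.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Rough equivalents of "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis".
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "epoch_millis"]

def parse_date(value, formats=FORMATS):
    """Try each format in order and return the first successful parse."""
    for fmt in formats:
        try:
            if fmt == "epoch_millis":
                return EPOCH + timedelta(milliseconds=int(value))
            return datetime.strptime(str(value), fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no format matched {value!r}")

def render_date(moment, formats=FORMATS):
    """Render with the first format in the list, whatever format the input used."""
    return moment.strftime(formats[0])

print(render_date(parse_date("2015-01-01")))     # 2015-01-01 00:00:00
print(render_date(parse_date(1420070400000)))    # 2015-01-01 00:00:00
```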

Parameters for date fields

The following parameters are accepted by date fields:

boost Mapping field-level query time boosting. Accepts a floating point number, defaults to 1.0.
doc_values Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.
format The date format(s) that can be parsed. Defaults to strict_date_optional_time||epoch_millis.
locale The locale to use when parsing dates since months do not have the same names and/or abbreviations in all languages. The default is the ROOT locale.
ignore_malformed If true, malformed values are ignored. If false (default), malformed values throw an exception and reject the whole document. Note that this cannot be set if the script parameter is used.
index Should the field be searchable? Accepts true (default) and false.
null_value Accepts a date value in one of the configured formats, which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing. Note that this cannot be set if the script parameter is used.
on_script_error Defines what to do if the script defined by the script parameter throws an error at indexing time. Accepts fail (default), which will cause the entire document to be rejected, and continue, which will register the field in the document’s _ignored metadata field and continue indexing. This parameter can only be set if the script field is also set.
script If this parameter is set, then the field will index values generated by this script, rather than reading the values directly from the source. If a value is set for this field on the input document, then the document will be rejected with an error. Scripts are in the same format as their runtime equivalent, and should emit long-valued timestamps.
store Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
meta Metadata about the field.

Epoch seconds

If you need to send dates as seconds-since-the-epoch then make sure the format lists epoch_second:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "strict_date_optional_time||epoch_second"
      }
    }
  }
}

PUT my-index-000001/_doc/example?refresh
{ "date": 1618321898 }

POST my-index-000001/_search
{
  "fields": [ {"field": "date"}],
  "_source": false
}

This replies with a date like:

{
  "hits": {
    "hits": [
      {
        "_id": "example",
        "_index": "my-index-000001",
        "_type": "_doc",
        "_score": 1.0,
        "fields": {
          "date": ["2021-04-13T13:51:38.000Z"]
        }
      }
    ]
  }
}
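The value in the response can be checked by hand. The Python sketch below converts the indexed seconds-since-the-epoch value into the same ISO-style string; `epoch_second_to_iso` is a hypothetical helper written for this check.

```python
from datetime import datetime, timezone

def epoch_second_to_iso(seconds):
    """Format seconds-since-the-epoch as a UTC timestamp with millisecond precision."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"

print(epoch_second_to_iso(1618321898))  # 2021-04-13T13:51:38.000Z
```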

Date nanoseconds

This data type is an addition to the date data type. However, there is an important distinction between the two. The existing date data type stores dates in millisecond resolution. The date_nanos data type stores dates in nanosecond resolution, which limits its range of dates from roughly 1970 to 2262, as dates are still stored as a long representing nanoseconds since the epoch.
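The upper bound of roughly 2262 follows from the size of a signed 64-bit long. A quick Python check, assuming the largest long value is interpreted as nanoseconds since the epoch (Python's datetime only has microsecond resolution, so the last three digits are dropped):

```python
from datetime import datetime, timedelta, timezone

LONG_MAX = 2**63 - 1  # largest signed 64-bit value
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Interpret LONG_MAX as nanoseconds and truncate to microseconds.
max_nanos_date = EPOCH + timedelta(microseconds=LONG_MAX // 1000)
print(max_nanos_date.isoformat())  # 2262-04-11T23:47:16.854775+00:00
```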

Dense vector

Flattened

Geo-point

Geo-shape

Histogram

IP

Join

Keyword

Nested

Numeric

Object

JSON documents are hierarchical in nature: the document may contain inner objects which, in turn, may contain inner objects themselves:

PUT my-index-000001/_doc/1
{ 
  "region": "US",
  "manager": { 
    "age":     30,
    "name": { 
      "first": "John",
      "last":  "Smith"
    }
  }
}

Internally, this document is indexed as a simple, flat list of key-value pairs, something like this:

{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}
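The flattening shown above is easy to reproduce. The recursive `flatten` function below is a hypothetical Python sketch of the idea, turning nested objects into dotted key paths; it ignores arrays and is not Elasticsearch's actual indexing code.

```python
def flatten(obj, prefix=""):
    """Turn nested dicts into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # descend into the inner object
        else:
            flat[path] = value
    return flat

doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
flat_doc = flatten(doc)
print(flat_doc)
```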

An explicit mapping for the above document could look like this:

PUT my-index-000001
{
  "mappings": {
    "properties": { 
      "region": {
        "type": "keyword"
      },
      "manager": { 
        "properties": {
          "age":  { "type": "integer" },
          "name": { 
            "properties": {
              "first": { "type": "text" },
              "last":  { "type": "text" }
            }
          }
        }
      }
    }
  }
}

You are not required to set the field type to object explicitly, as this is the default value.

Parameters for object fields

The following parameters are accepted by object fields:

dynamic Whether or not new properties should be added dynamically to an existing object. Accepts true (default), false and strict.
enabled Whether the JSON value given for the object field should be parsed and indexed (true, default) or completely ignored (false).
properties The fields within the object, which can be of any data type, including object. New properties may be added to an existing object.
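The three dynamic settings can be modelled roughly as follows. This sketch assumes the usual semantics (true adds unknown fields to the mapping, false leaves them unindexed, strict rejects the document); `index_object` and its string-valued `dynamic` argument are hypothetical and only illustrate the decision logic.

```python
def index_object(known_fields, doc, dynamic="true"):
    """Return (updated field set, indexed values) for one object (illustrative)."""
    known = set(known_fields)
    indexed = {}
    for field, value in doc.items():
        if field not in known:
            if dynamic == "strict":
                raise ValueError(f"unknown field [{field}] rejected")
            if dynamic == "false":
                continue  # the new field is not indexed and the mapping is unchanged
            known.add(field)  # dynamic == "true": the field is added to the mapping
        indexed[field] = value
    return known, indexed

known, indexed = index_object({"age"}, {"age": 30, "title": "CTO"}, dynamic="true")
print(sorted(known), indexed)
```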

If you need to index arrays of objects instead of single objects, read Nested first.

Percolator

Point

Range

Rank feature

Rank features

Search-as-you-type

Shape

Sparse vector

Text

Token count

Unsigned long

Version