How to Improve Elasticsearch Performance

Fine-tune Elasticsearch with these pro tips

Elasticsearch is one of the most important tools for enabling search within applications at scale; however, it can be quite challenging to optimize its performance. SLAs often require that your APIs return data to customers with extremely low latency, which becomes difficult to guarantee as your datasets and customer base grow. Low latency also matters for the many applications that rely on search as part of their implementation, such as recommendation systems, fraud detection, cyber security, social networks, and navigation systems. Beyond increasing customer satisfaction, improving latency can also significantly reduce costs. Ensuring that your Elasticsearch deployment is fine-tuned is therefore essential for delivering the best customer experience at the lowest possible cost.

Typical Reasons Elasticsearch Performance Suffers

A few common underlying causes frequently lead Elasticsearch performance to suffer:

  1. Increasingly large datasets

  2. Performing complex aggregations and filters on top of data

  3. Non-textual searches, for example, vector search

  4. Insufficient hardware for scaling searches and challenges in scaling the Elasticsearch cluster

  5. Sharding and data replication

  6. Poorly constructed Elasticsearch queries

  7. Aggregations on high-cardinality fields

We break these down into three categories and suggest how to improve Elasticsearch performance for each problem type.

Optimizing Elasticsearch Queries

Optimizing the Elasticsearch query is the most direct way to improve the speed of your searches. Many times, poor performance is simply due to poorly written queries that neglect Elasticsearch best practices.

One common Elasticsearch slowdown involves nested search queries, i.e. queries that search over nested object structures. Nested queries should be avoided when possible, as they are much slower than their non-nested counterparts. They can often be eliminated by collapsing your Elasticsearch data structures so that the fields you are searching over occur at the top level. While this flattening can reduce readability, the trade-off is usually worth it given the significant performance gains you're likely to see.
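As an illustrative sketch (the index and field names here are hypothetical), the mappings below contrast a nested layout with a flattened one in which the same attributes are promoted to top-level fields:

PUT products_nested
{
  "mappings": {
    "properties": {
      "variants": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "size": { "type": "keyword" }
        }
      }
    }
  }
}

PUT products_flat
{
  "mappings": {
    "properties": {
      "variant_color": { "type": "keyword" },
      "variant_size": { "type": "keyword" }
    }
  }
}

Searches against products_flat can use ordinary term queries, whereas products_nested requires the slower nested query. Keep in mind that flattening changes the semantics slightly: matching color and size values are no longer guaranteed to belong to the same variant object.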

Another common source of Elasticsearch bottlenecks is join queries. Performing joins across your data structures is an expensive operation that will greatly increase query response times. One approach to solving this issue is denormalization, which means duplicating data across documents so as to avoid joins at query time. This incurs a cost in terms of the disk space required to store your data but will speed up these queries significantly.
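For illustration (the index and fields here are hypothetical), rather than linking orders to users with a parent-child join field, each order document can simply carry a copy of the user attributes it needs:

PUT orders/_doc/1
{
  "order_id": 1,
  "amount": 99.95,
  "user_id": 42,
  "user_name": "jdoe",
  "user_tier": "premium"
}

The user_name and user_tier values are duplicated from the user record, so queries that filter orders by user attributes never need a join; the trade-off is that those fields must be updated on every order document whenever the user record changes.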

Finally, high cardinality fields, i.e. those with a large number of unique values, can be slow to aggregate on. Often the slowness of aggregations on high cardinality fields is due to the cost of (re)computing Global Ordinals, a data structure that enumerates the unique values of a field. One way to reduce the impact of this recomputation is to enable eager building of Global Ordinals, in which case the ordinals are recomputed at each segment refresh rather than at query time. However, since ordinals are then computed on every refresh, this can potentially slow down data ingestion. Therefore, the refresh interval should also be increased so as to avoid overly frequent recomputation.

Here is an example of an aggregation on a high-cardinality field:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "my_aggregation": {
      "terms": {
        "field": "my_high_cardinality_field",
        "size": 10
      }
    }
  }
}

This query performs a terms aggregation on the high cardinality field my_high_cardinality_field. The inner "size" parameter limits the number of buckets returned to 10; you can adjust this parameter to get more or fewer results.

Note that this query specifies a "size" of 0 for the main search request, which means that Elasticsearch will not return any search hits. Instead, it will only return the aggregation results.

You can enable eager building of Global Ordinals in Elasticsearch by setting the "eager_global_ordinals" parameter to true in your index mapping.

Following on from the previous example, here's a snippet that shows how to enable eager building of Global Ordinals for a field named "my_high_cardinality_field":

PUT my_index
{
  "mappings": {
    "properties": {
      "my_high_cardinality_field": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}

In this example, we've specified the eager_global_ordinals parameter with a value of true for the my_high_cardinality_field field. This tells Elasticsearch to build global ordinals eagerly for this field, which can improve aggregation performance for high cardinality fields.

Note that eager building of Global Ordinals shifts work from query time to refresh time, increasing refresh cost and heap usage, so you should only enable it for fields that you know will benefit from it. Also, eager loading only helps aggregations that actually use Global Ordinals, such as terms aggregations on keyword fields, so you should test your queries to ensure that they are performing well.

Tuning for Indexing Speed

Optimizing indexing speed is another way to improve overall Elasticsearch performance. As with high cardinality aggregations, turning up the Elasticsearch refresh interval to the highest tolerable value can improve indexing speed. Disabling swapping of the underlying Java process and increasing the amount of memory available to the filesystem cache (ideally at least half of the total system RAM) are other simple performance optimizations.

Two of these optimizations can be applied in the elasticsearch.yml configuration file:

bootstrap.memory_lock: true

indices.memory.index_buffer_size: 50%

The refresh interval is an index-level setting; in recent Elasticsearch versions it must be set per index rather than in elasticsearch.yml.
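For example, using the index settings API (my_index is a placeholder index name):

PUT my_index/_settings
{
  "index.refresh_interval": "30s"
}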

Here's what each of these settings does:

  • bootstrap.memory_lock: When set to true, this setting locks the Elasticsearch process's memory, preventing it from being swapped out by the operating system. This is important for ensuring consistent performance and avoiding unexpected latency spikes caused by swapping.

Note that setting bootstrap.memory_lock to true may require additional configuration changes depending on your operating system and security policies; a sketch for Linux follows this list. You can refer to the Elasticsearch documentation for more information on how to configure memory locking for your specific platform.

  • indices.memory.index_buffer_size: This setting controls the amount of memory allocated to the indexing buffer, which holds newly indexed documents before they are flushed to on-disk segments. By default it is set to 10% of the JVM heap, but you can raise it for indexing-heavy workloads; the 50% used above is an aggressive value that you should validate against your own use case. Additionally, make sure you leave enough RAM for both the Elasticsearch process and the operating system's file cache.

  • index.refresh_interval: This setting controls how frequently Elasticsearch refreshes an index, making recently indexed documents visible to search. By default this is set to 1 second, which means new documents become searchable within about a second. In some cases, you may want to increase the refresh interval to reduce indexing overhead and improve indexing throughput.

In this example, we set the refresh interval to 30 seconds by setting the value to 30s. This means that Elasticsearch will refresh its search index every 30 seconds, reducing the frequency of index updates and potentially improving indexing performance.

Note that increasing the refresh interval can affect the real-time search capabilities of Elasticsearch, as search results may not reflect recent updates to the index until the next refresh interval. As such, you should carefully consider the trade-offs between indexing performance and real-time search requirements when adjusting the refresh interval.
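As an example of the additional memory-locking configuration mentioned above, on Linux systems that manage limits through /etc/security/limits.conf you would typically allow the elasticsearch user to lock unlimited memory (a sketch; systemd-based installs instead set LimitMEMLOCK=infinity in a service override):

elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited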

One more complex optimization involves multithreading the process of sending requests. Because a single thread is unlikely to make maximal use of the indexing capacity of an Elasticsearch cluster, splitting requests across multiple threads can lead to immediate performance improvements. Testing should be performed to determine the optimal number of workers for each cluster and search workload.

The official Python client provides an asynchronous interface, AsyncElasticsearch, that makes it easy to issue many requests concurrently. Here's an example of fanning several search queries out in parallel; the same pattern applies to bulk indexing requests:

import asyncio

from elasticsearch import AsyncElasticsearch

async def search(es, index, query):
    # Each coroutine issues one independent search request.
    return await es.search(index=index, body=query)

async def main():
    # Recent client versions require a full URL, including the scheme.
    es = AsyncElasticsearch(["http://localhost:9200"])
    queries = [
        {"query": {"match": {"field1": "value1"}}},
        {"query": {"match": {"field2": "value2"}}},
        {"query": {"match": {"field3": "value3"}}}
    ]
    # Fan the requests out concurrently instead of awaiting them one by one.
    tasks = [search(es, "my_index", query) for query in queries]
    results = await asyncio.gather(*tasks)
    for result in results:
        print(result)
    await es.close()

if __name__ == '__main__':
    asyncio.run(main())

Finally, hardware improvements can benefit indexing performance. Using faster storage, such as SSDs rather than HDDs, and configuring it in RAID arrays can greatly increase the speed of indexing workloads that are I/O bound. It's also best to avoid remote storage when possible, as this introduces the extra overhead of communicating over a network. You should also aim to have at least 64GB of RAM per machine in your Elasticsearch cluster and allocate approximately 28GB of that to the JVM heap via the Xms and Xmx settings. The heap should be no more than 50% of total memory, because Elasticsearch makes heavy use of memory beyond the JVM heap, and it should stay below the threshold for compressed ordinary object pointers, which puts the ideal range at roughly 26GB-30GB on most systems.

Here's an example of how you could set these flags in the Elasticsearch start-up script:

ES_JAVA_OPTS="-Xms28g -Xmx28g" ./bin/elasticsearch

You should also perform testing to determine the optimal number of primary shards for each index, aiming for shard sizes of roughly 10-50GB.
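Since the number of primary shards is fixed when an index is created, it's worth setting it deliberately. A minimal sketch (the counts here are placeholders to be replaced by values from your own benchmarks):

PUT my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}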

Increasing Query Throughput

When optimizing for query throughput, first ensure that you are batching your queries sensibly: it's better to submit requests in several small batches than in a few large batches that each need to compute a huge number of hits. Additionally, increasing the number of replicas serving queries is a simple way to boost throughput, assuming you have the hardware resources available. Adding more CPU and RAM to your existing replicas is another easy way to increase throughput.
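Assuming the hardware is in place, adding replicas is a one-line settings change (my_index is a placeholder):

PUT my_index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}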

Another optimization for throughput is to use search templates, which allow parameterized queries to be executed while sending less data over the network. If you expect a query to return large documents, you can also use _source filtering to reduce the size of the response by returning only the specific fields the user needs. You should also use term queries rather than match queries when you are looking for exact matches on keyword fields; match queries impose an overhead that term queries do not, because they perform additional analysis on the query text. Finally, avoid wildcard queries whenever possible, and reduce the usage of regex and parent-child queries, all of which can increase latency and decrease query throughput.
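To illustrate the first two of these techniques together (the template id and field names are illustrative), a stored search template sends only the parameters over the wire, and _source filtering trims the response to the fields the caller needs:

PUT _scripts/my_search_template
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": ["title", "price"],
      "query": {
        "term": {
          "status": "{{status}}"
        }
      }
    }
  }
}

GET my_index/_search/template
{
  "id": "my_search_template",
  "params": {
    "status": "active"
  }
}

Note that this example also follows the term-query advice above, assuming status is mapped as a keyword field.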

Conclusion

Improving Elasticsearch performance can be tricky, but there are a number of easy, concrete steps you can take to improve search speed once you understand your application's expected query workload. Immediate improvements can usually be gained by simply increasing the amount and power of the hardware allocated to your Elasticsearch cluster. Simple changes such as using SSDs, increasing the filesystem cache, and increasing the number of replicas will often yield large performance gains without forcing you to compromise on search accuracy. Once hardware gains have been maximized, you can further tune your searches to avoid expensive operations such as wildcard queries. You can also optimize the use of Global Ordinals when your searches operate on high-cardinality fields, tune your cluster's refresh interval, and denormalize your data structures to avoid the performance impact of join queries. Through these tweaks, you can often achieve order-of-magnitude reductions in search latency, keeping your customers happy and your SLAs satisfied.