Out of the four basic computing resources (storage, memory, compute, network), storage tends to be positioned as the foremost one for any architect optimizing an Elasticsearch cluster to focus on. Let’s take a closer look at a couple of interesting aspects of Elasticsearch storage optimization, and let’s do some hands-on tests along the way to get actionable insights.
Storage consists of two basic components:
- Data
- Disk
Disk Storage Optimization:
When to choose SSD?
- Latency is the primary factor
- Data ingestion
- Serving the freshest data
- Mixed read/write flows
When to choose HDD?
- Latency is not the primary factor
- Historical Data
Initial Life Cycle Phase:
- Intensive Writes
- Heterogeneous queries
- High demand on CPUs (especially if transforming the data on the ingest side)
- High RAM-to-disk ratio (e.g. up to 1:30 to keep most of the queries served from memory; a node with 64 GB of RAM would then hold roughly 2 TB of data)
- Need for high-performance storage allowing for significant sustained indexing IOPS (input/output operations per second) and significant random-read IOPS (which is where SSDs shine, especially NVMe)
Later Life Cycle Phase:
- Occasional read-only queries
- Higher volumes of historical data
- Allowing for more compression (to store the data more densely)
- Much lower RAM-to-disk ratio (e.g. 1:1000 or more; the same 64 GB node could then address 64 TB of data or more)
RAID
RAID is another topic frequently discussed on the Elastic discussion forums, as it is usually required in enterprise data centers. Generally, RAID is optional given the default shard replication (if correctly set up, e.g. replicas not sharing the same underlying physical resources), and the decision is driven by whether you also want to handle redundancy at the hardware level. RAID 0 can improve performance but should be kept to pairs of drives only (to keep your infra ops sane). Other RAID configurations with reasonable write performance (e.g. RAID 1/10) are acceptable but can be costly in terms of the redundant space used, while RAID 5 (though very common in data centers) tends to be slow(ish) because of its write penalty.
Multiple Data Paths
Besides RAID, you also have the option to use multiple data paths to link your data volumes (via path.data in elasticsearch.yml, see the sketch below). This results in the distribution of shards across the paths, with each shard always residing on one path only (which is ensured by Elasticsearch). This way you can achieve a form of data striping across your drives and parallel utilization of multiple drives (when sharding is correctly set up). Elasticsearch will handle the placement of replica shards on different nodes from the primary shard.
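As a minimal sketch of that setting (the mount points below are purely hypothetical), the paths are simply listed in elasticsearch.yml:

```yaml
# elasticsearch.yml — illustrative only; the mount points are hypothetical
# Shards are distributed across the listed paths, but each individual shard
# lives entirely on a single path (Elasticsearch never splits a shard across them).
path.data:
  - /mnt/es_data_1
  - /mnt/es_data_2
```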
Other Storage Considerations
- There are many other performance-impacting technical factors (even for the same class of hardware) such as the current network state, JVM and garbage collection settings, SSD trimming, file system type, etc. So as mentioned above, structured tests are better than guessing.
- What remains clear is that the “closer” your storage is to your “compute” resources, the better performance you’ll achieve. Direct-attached storage (DAS) is the recommended option, with SAN as a second choice; avoid NAS. Generally, you don’t want to be exposed to additional factors like networking, communication overhead, etc. (the recommended minimum throughput is ~3 Gb/s, i.e. roughly 250 MB/s).
- Always remember that the “local optimum” of a storage layer doesn’t mean a “global optimum” of the cluster performance as a whole. You need to make sure the other aforementioned computing resources are aligned to reach desired performance levels.
Hands-on: Storage and Node Tiering
- Node Tagging: Use the “node.attr.KEY: VALUE” configuration in elasticsearch.yml to mark your nodes according to their performance/usage/allocation classes (e.g. ssd or hdd nodes, hot/warm/cold nodes, latest/archive, rack_one/rack_two nodes…); see the configuration sketches after this list.
- Shard Filtering: Use index-level shard allocation filtering to force or prevent the allocation of shards on nodes with the previously defined tags, via the index setting “index.routing.allocation.require.KEY”: “VALUE” (see the example below).
- Hot-Warm-Cold: You can automate the whole shard reallocation process via the index lifecycle management (ILM) tool, where specific phases (warm and cold) allow you to define the Allocate action to require allocating shards based on the desired node attributes (a sketch of such a policy follows below).
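For the node-tagging step, a minimal sketch might look like the following; the attribute name data_class and its value are hypothetical, and any key/value pair that reflects your tiers will do:

```yaml
# elasticsearch.yml on an SSD-backed "hot" node
# (the attribute name and value are examples, not built-in keywords)
node.attr.data_class: hot
```

On your HDD-backed nodes you would set the same attribute to a different value, e.g. warm or cold.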
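For the shard-filtering step, assuming the hypothetical data_class attribute from above, the index setting could be applied like this (Kibana Dev Tools syntax; the index name is made up):

```json
PUT my-logs-index/_settings
{
  "index.routing.allocation.require.data_class": "hot"
}
```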
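And for the hot-warm-cold automation, a sketch of an ILM policy using the Allocate action might look as follows. The policy name, timings, and rollover thresholds are hypothetical, and the rollover action assumes the index is written to via a data stream or a write alias:

```json
PUT _ilm/policy/hot-warm-cold-example
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data_class": "warm" } }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "allocate": { "require": { "data_class": "cold" } }
        }
      }
    }
  }
}
```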