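Introduction

I'm Kato from the search team.

We use Elasticsearch at our company, and while watching our monitoring dashboards day to day we occasionally see the load spike, which makes us wish we could tune the cluster.

Tuning starts with measurement, so in this post I try out load testing Elasticsearch.

What we'll do

We'll run a load test with Rally, a benchmarking tool for Elasticsearch.

We run on Elastic Cloud, so to reproduce production as closely as possible we'll do the following:

Create test data from an existing deployment on Elastic Cloud
Run a load test with that data against a new deployment on Elastic Cloud
Modify the scenario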

What is Rally?

Rally is a Python-based tool provided by Elastic for benchmarking Elasticsearch.

esrally.readthedocs.io

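What Rally can do

As a benchmarking tool it of course runs load tests and records the results, but it can also:

Set up and tear down an Elasticsearch cluster for benchmarking
Compare performance results
Create test data from an existing cluster

A distinctive feature is that you don't need to prepare Elasticsearch or test data yourself.

Following the Quickstart, you can try it right away with this command: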

> esrally race --distribution-version=6.5.3 --track=geonames

This downloads Elasticsearch 6.5.3 and runs "geonames", one of the benchmark scenarios (called tracks) that Rally provides together with their document corpora.

You can list the available tracks with:

> esrally list tracks


Available tracks:

Name              Description                                                              Documents    Compressed Size    Uncompressed Size    Default Challenge        All Challenges
----------------  -----------------------------------------------------------------------  -----------  -----------------  -------------------  -----------------------  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
geonames          POIs from Geonames                                                       11,396,503   252.9 MB           3.3 GB               append-no-conflicts      append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts,significant-text
percolator        Percolator benchmark based on AOL queries                                2,000,000    121.1 kB           104.9 MB             append-no-conflicts      append-no-conflicts
http_logs         HTTP server log data                                                     247,249,096  1.2 GB             31.1 GB              append-no-conflicts      append-no-conflicts,runtime-fields,append-no-conflicts-index-only,append-sorted-no-conflicts,append-index-only-with-ingest-pipeline,update,append-no-conflicts-index-reindex-only
geoshape          Shapes from PlanetOSM                                                    84,220,567   17.0 GB            58.7 GB              append-no-conflicts      append-no-conflicts,append-no-conflicts-big
elastic/security  Track for simulating Elastic Security workloads                          77,513,777   N/A                N/A                  security-querying        security-querying,security-indexing,index-alert-source-events,security-indexing-querying
elastic/logs      Track for simulating logging workloads                                   16,469,078   N/A                N/A                  logging-indexing         logging-snapshot-restore,logging-indexing-querying,logging-disk-usage,many-shards-quantitative,logging-snapshot,many-shards-base,logging-indexing,cross-clusters-search,many-shards-snapshots,logging-querying,many-shards-full
elastic/endpoint  Endpoint track                                                           0            0 bytes            0 bytes              default                  default
tsdb              metricbeat information for elastic-app k8s cluster                       116,633,698  N/A                123.0 GB             append-no-conflicts      append-no-conflicts
metricbeat        Metricbeat data                                                          1,079,600    87.7 MB            1.2 GB               append-no-conflicts      append-no-conflicts
geopoint          Point coordinates from PlanetOSM                                         60,844,404   482.1 MB           2.3 GB               append-no-conflicts      append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
nyc_taxis         Taxi rides in New York in 2015                                           165,346,692  4.5 GB             74.3 GB              append-no-conflicts      append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts-index-only,update,append-ml,aggs
geopointshape     Point coordinates from PlanetOSM indexed as geoshapes                    60,844,404   470.8 MB           2.6 GB               append-no-conflicts      append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
so                Indexing benchmark using up to questions and answers from StackOverflow  36,062,278   8.9 GB             33.1 GB              append-no-conflicts      append-no-conflicts
so_vector         Benchmark for vector search with StackOverflow data                      2,000,000    12.3 GB            32.2 GB              index-and-search         index-and-search
dense_vector      Benchmark for dense vector indexing and search                           10,000,000   7.2 GB             19.5 GB              index-and-search         index-and-search
eql               EQL benchmarks based on endgame index of SIEM demo cluster               60,782,211   4.5 GB             109.2 GB             default                  index-sorting,default
nested            StackOverflow Q&A stored as nested docs                                  11,203,029   663.3 MB           3.4 GB               nested-search-challenge  nested-search-challenge,index-only
noaa              Global daily weather measurements from NOAA                              33,659,481   949.4 MB           9.0 GB               append-no-conflicts      append-no-conflicts,append-no-conflicts-index-only,aggs
pmc               Full text benchmark with academic papers from PMC                        574,199      5.5 GB             21.7 GB              append-no-conflicts      indexing-querying,append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts
sql               SQL query performance based on NOAA Weather data                         33,659,481   949.4 MB           9.0 GB               sql                      sql

Trying it out

Environment

macOS: Monterey
python: 3.9.13

Rally supports Python 3.8 and later.

To create an Elasticsearch cluster locally you need a JDK that matches the Elasticsearch version. Since we use Elastic Cloud this time, I did not set up a JDK.

Preparation

Installation

> pip3 install esrally

Creating an Elastic Cloud deployment

Following the article below, we quickly spin one up.

tech.visasq.com

Creating a custom track

The create-track subcommand generates track data from an existing cluster.

Because the target is an Elastic Cloud deployment, we set target-hosts and client-options as described in the official documentation.

> esrally create-track --track=test-track --target-hosts="abcdef123456.europe-west1.gcp.cloud.es.io:9243"  --client-options="use_ssl:true,verify_certs:true,basic_auth_user:'user',basic_auth_password:'password'" --indices="index_name" --output-path=./tracks

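Option	Description	Notes for Elastic Cloud
track	Name of the track to create
target-hosts	Host of the existing cluster	Specify the Elastic Cloud URL with port 9243
client-options	Options for the Elasticsearch client that Rally uses internally	Put your credentials in basic_auth_user and basic_auth_password
indices	Names of the indices to export
output-path	Path where the track definition and data are written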
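The client-options value is easy to mistype because it mixes commas, colons, and quotes. As a minimal sketch, the command above could be assembled from its parts in Python. The helper names `build_client_options` and `build_create_track_argv` are made up for this illustration and are not part of Rally; the flags themselves are the ones used in the command above. Nothing is executed here, the script only prints the command line.

```python
import shlex


def build_client_options(user, password):
    """Return a --client-options value for a TLS endpoint with basic auth.

    Hypothetical helper: it only reproduces the key:value,key:value format
    used in the command above. It does not escape commas or quotes, so it
    assumes credentials that contain neither.
    """
    options = {
        "use_ssl": "true",
        "verify_certs": "true",
        "basic_auth_user": f"'{user}'",
        "basic_auth_password": f"'{password}'",
    }
    return ",".join(f"{key}:{value}" for key, value in options.items())


def build_create_track_argv(track, host, indices, output_path, user, password):
    """Assemble the argument list for `esrally create-track` without running it."""
    return [
        "esrally",
        "create-track",
        f"--track={track}",
        f"--target-hosts={host}",
        f"--client-options={build_client_options(user, password)}",
        f"--indices={','.join(indices)}",
        f"--output-path={output_path}",
    ]


if __name__ == "__main__":
    argv = build_create_track_argv(
        track="test-track",
        host="abcdef123456.europe-west1.gcp.cloud.es.io:9243",
        indices=["index_name"],
        output_path="./tracks",
        user="user",
        password="password",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the credentials as function arguments also makes it easier to read them from environment variables instead of leaving them in shell history.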

Check the generated files.

> tree tracks/test-track

tracks/test-track
├── track.json
├── index-name-documents-1k.json
├── index-name-documents-1k.json.bz2
├── index-name-documents.json
├── index-name-documents.json.bz2
├── index-name-documents.json.offset
└── index-name.json

track.json is the track definition.

The mapping definition and the retrieved documents are included as well.
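Before running a race it can be worth checking what was actually exported. The sketch below reads a compressed documents file, assuming it holds one JSON document per line (an assumption about the generated format that you should confirm against your own files). The function names `peek_corpus` and `count_documents` are made up for this example, and the path in the usage section is taken from the listing above.

```python
import bz2
import json
import sys


def peek_corpus(path, limit=3):
    """Return the first `limit` documents of a bz2-compressed corpus.

    Assumes one JSON document per line; blank lines are skipped.
    """
    docs = []
    with bz2.open(path, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if len(docs) >= limit:
                break
            if line.strip():
                docs.append(json.loads(line))
    return docs


def count_documents(path):
    """Count the non-empty lines (documents) in a bz2-compressed corpus."""
    with bz2.open(path, mode="rt", encoding="utf-8") as lines:
        return sum(1 for line in lines if line.strip())


if __name__ == "__main__":
    # Default path follows the listing above; pass another path as argv[1].
    default = "tracks/test-track/index-name-documents-1k.json.bz2"
    corpus = sys.argv[1] if len(sys.argv) > 1 else default
    print("documents:", count_documents(corpus))
    for doc in peek_corpus(corpus):
        print(sorted(doc.keys()))
```

Printing only the field names keeps the output small and avoids dumping production data to the terminal.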

Running the load test

The race subcommand runs the load test.

> esrally race --track-path=./tracks/test-track --elasticsearch-plugins="analysis-icu,analysis-kuromoji" --pipeline=benchmark-only --target-hosts="abcdef123456.europe-west1.gcp.cloud.es.io:9243"  --client-options="use_ssl:true,verify_certs:true,basic_auth_user:'user',basic_auth_password:'password'"

Option	Description
track-path	Path to the track when using your own track.
elasticsearch-plugins	Plugins to install into Elasticsearch.
pipeline	The pipeline Rally runs. For an existing cluster, specify benchmark-only, which does not start Elasticsearch.
target-hosts	Host of the existing cluster.
client-options	Options for the Elasticsearch client that Rally uses internally.

The race runs and the results are displayed.

[INFO] Race id is [a572ceb0-7665-44ae-8d78-103a5973f2ee]
[INFO] Racing on track [test-track] and car ['hoge'] with version [7.17.5].

[WARNING] merges_total_time is 135788 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 109703 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 602334 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running bulk                                                                   [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                         Metric |       Task |          Value |   Unit |
|---------------------------------------------------------------:|-----------:|---------------:|-------:|
....................................................
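The summary is printed as a pipe-delimited table with the columns Metric, Task, Value, and Unit, so it is straightforward to turn into structured data for comparison between runs. The following is a minimal sketch of such a parser; `parse_report` is a name invented here, and the rows in `SAMPLE` are made-up placeholders, not real measurements. Rally's command line reference also describes options for writing the report to a file, which you may want to check for your version.

```python
def parse_report(text):
    """Parse a pipe-delimited report table into a list of dicts.

    The first table row is treated as the header; the dashed separator
    row is skipped. Cell values are kept as strings.
    """
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # banner, log lines, blank lines
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        # Separator rows consist only of dashes and colons.
        if all(cell and set(cell) <= {"-", ":"} for cell in cells):
            continue
        if header is None:
            header = cells
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


# Made-up placeholder rows that only mimic the layout shown above.
SAMPLE = """\
|        Metric |   Task |   Value |   Unit |
|--------------:|-------:|--------:|-------:|
| Example total |        |     1.5 |    min |
|  Example rate |   bulk |   100.0 | docs/s |
"""

if __name__ == "__main__":
    for row in parse_report(SAMPLE):
        print(row["Metric"], row["Task"] or "-", row["Value"], row["Unit"])
```

Values are left as strings on purpose, since a report can contain non-numeric cells; convert them with `float()` only for the metrics you compare.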

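Modifying the track definition

create-track generates the following four operations:

Delete the index
Create the index
Check cluster health
Add data with bulk

A real load test should also exercise operations that match your scenario, so this time we try the following:

Increase the number of bulk executions
Search with match_all
Run indexing and search at the same time

Reading track.json

track.json describes the indices and data to use, the operation definitions, and so on.

The file is long, so this is only an excerpt; the operations are defined in the schedule array below.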

{
    "schedule": [
        {
        "operation": "delete-index"
        },
        {
        "operation": {
            "operation-type": "create-index",
            "settings": {{index_settings | default({}) | tojson}}
        }
        },
        .....
    ]
}

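Increasing the number of bulk executions

The bulk operation is already defined, so we only change its settings.

Find the object with "operation-type": "bulk" in schedule. To change how many times the operation runs, change the value of iterations.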

{
    "operation": {
    "operation-type": "bulk",
    "bulk-size": 1000,
    "ingest-percentage": {{ingest_percentage | default(100)}}
    },
    "clients": {{bulk_indexing_clients | default(8)}},
    "iterations": 5000  {# change this value #}
}

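Searching with match_all

To add a search operation, add an object with "operation-type": "search" to the schedule array.

body holds the search query, clients sets the number of concurrent clients, and iterations sets the number of executions.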

Track Reference - Rally 2.6.0 documentation

{
    "operation": {
    "name": "search-all",
    "operation-type": "search",
    "body": {
        "query": {
        "match_all": {}
        }
    }
    },
    "clients": 10,
    "iterations": 10
}
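If you plan to add several search tasks, generating the entries avoids copy-and-paste mistakes in nested JSON. The sketch below builds the same structure as the entry above; `search_task` is a helper name made up for this example and is not part of Rally. The output is a JSON fragment to paste into the schedule array.

```python
import json


def search_task(name, query, clients=1, iterations=1):
    """Build a schedule entry for a search operation as a plain dict.

    The keys mirror the hand-written entry above: the query goes under
    operation.body.query, and clients/iterations sit next to operation.
    """
    return {
        "operation": {
            "name": name,
            "operation-type": "search",
            "body": {"query": query},
        },
        "clients": clients,
        "iterations": iterations,
    }


if __name__ == "__main__":
    task = search_task("search-all", {"match_all": {}}, clients=10, iterations=10)
    # Paste the printed fragment into the schedule array of track.json.
    print(json.dumps(task, indent=4))
```

Because track.json itself contains template expressions such as {{index_settings | default({}) | tojson}}, it cannot be loaded with `json.load` as-is, which is why this sketch prints a fragment instead of editing the file.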

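Running indexing and search at the same time

So far we have run tasks one at a time: indexing only, or searching only. In real operation, however, updates and searches can happen at the same time.

To run multiple tasks concurrently, use parallel.

Track Reference - Rally 2.6.0 documentation

Define the operations you want to run concurrently in the tasks array under parallel.

The example below runs bulk updates and match_all searches concurrently, each with 5 clients and 1000 iterations.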

{
    "parallel": {
        "tasks": [
        {
            "name": "update",
            "operation": {
            "operation-type": "bulk",
            "bulk-size": 1
            },
            "clients": 5,
            "iterations": 1000
        },
        {
            "name": "search",
            "operation": {
            "name": "search-all",
            "operation-type": "search",
            "body": {
                "query": {
                "match_all": {}
                }
            }
            },
            "clients": 5,
            "iterations": 1000
        }
        ]
    }
}
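When sizing a parallel block it helps to know how many requests it will issue in total. The sketch below wraps tasks in a parallel element and estimates the request count under the assumption that iterations is counted per client, which is how I read the Track Reference; please verify this against the documentation for your Rally version. `parallel` and `estimated_requests` are names invented for this illustration.

```python
import json


def parallel(tasks):
    """Wrap a list of task dicts in a `parallel` schedule element."""
    return {"parallel": {"tasks": list(tasks)}}


def estimated_requests(element):
    """Estimate requests per task as clients * iterations.

    Assumption: `iterations` applies to each client separately. If your
    Rally version treats it differently, this estimate does not hold.
    """
    return {
        task["name"]: task["clients"] * task["iterations"]
        for task in element["parallel"]["tasks"]
    }


update = {
    "name": "update",
    "operation": {"operation-type": "bulk", "bulk-size": 1},
    "clients": 5,
    "iterations": 1000,
}
search = {
    "name": "search",
    "operation": {
        "name": "search-all",
        "operation-type": "search",
        "body": {"query": {"match_all": {}}},
    },
    "clients": 5,
    "iterations": 1000,
}

if __name__ == "__main__":
    element = parallel([update, search])
    print(json.dumps(element, indent=4))
    print(estimated_requests(element))
```

Under that assumption, the example in this section would issue 5 × 1000 = 5000 bulk requests and 5000 searches.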

Running it

Now that the operations have been added, let's run the race again.

> esrally race --track-path=./tracks/test-track --elasticsearch-plugins="analysis-icu,analysis-kuromoji" --pipeline=benchmark-only --target-hosts="abcdef123456.europe-west1.gcp.cloud.es.io:9243"  --client-options="use_ssl:true,verify_certs:true,basic_auth_user:'user',basic_auth_password:'password'"

The output shows that the added search-all ran, and that search and update ran in parallel.

[INFO] Race id is [a572ceb0-7665-44ae-8d78-103a5973f2ee]
[INFO] Racing on track [test-track] and car ['hoge'] with version [7.17.5].

[WARNING] merges_total_time is 135788 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 109703 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 602334 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running cluster-health                                                         [100% done]
Running bulk                                                                   [100% done]
Running search-all                                                             [100% done]
Running search,update                                                           [100% done]

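Conclusion

As you would expect from an official Elastic tool, Rally is very convenient for load testing Elasticsearch.

It is a little hard to approach at first, but once you get used to it, it seems flexible enough to do most of what you want.

Now we can get serious about tuning Elasticsearch. And the person who does that tuning might be you, the reader!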

open.talentio.com

VisasQ Dev Blog


Load testing Elasticsearch with the benchmarking tool Rally