[BEAM-7502] Create ParDo Python Load Test Jenkins job#9042
[BEAM-7502] Create ParDo Python Load Test Jenkins job#9042lgajowy merged 3 commits intoapache:masterfrom
Conversation
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
1 similar comment
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
@lgajowy Please take a look. |
|
Run Seed Job |
| jobProperties: [ | ||
| job_name : 'load-tests-python-dataflow-batch-pardo-1-' + now, | ||
| project : 'apache-beam-testing', | ||
| temp_location : 'gs://temp-storage-for-perf-tests/smoketests', |
There was a problem hiding this comment.
Is this the correct temp location?
There was a problem hiding this comment.
It's correct, but 'gs://temp-storage-for-perf-tests/loadtests' seems to be used more often in load tests.
There was a problem hiding this comment.
Could you explain why is this a problem? Did you experience any issues when using /loadtests?
There was a problem hiding this comment.
I think you misunderstood. It's not a problem. /smoketests was there accidentally, and I've already replaced it to /loadtests, which suits better.
There was a problem hiding this comment.
Right, we're seeing stale code in the thread. Thanks! :)
| '"value_size": 90}\'', | ||
| iterations : 10, | ||
| number_of_counter_operations: 0, | ||
| number_of_counters : 1, |
There was a problem hiding this comment.
The number_of_counter_operations is zero, so there is actually no metrics overhead.
But I guess this 1 counter can be misleading, so I'll change it.
| iterations : 10, | ||
| number_of_counter_operations: 0, | ||
| number_of_counters : 1, | ||
| num_workers : 10, |
There was a problem hiding this comment.
Why are we using 10 workers here?
There was a problem hiding this comment.
I've already changed it to 5.
| ], | ||
| ]} | ||
|
|
||
| def loadTestConfigurationManyCounters = { datasetName -> [ |
There was a problem hiding this comment.
I don't think splitting the job to 2 jobs is the solution here. It is very weird that it takes so much time (above 2 hours) to run, whereas java needs 28 minutes to run all the tests (as we discussed offline). Could you investigate this a little bit more? Maybe the pipeline shape is not as we think it is?
There was a problem hiding this comment.
I have double-checked the code and it seems fine. The problem is poor performance of metrics operations in Python — test case with 10 iterations and 100 counter operations needed almost 3 hours to complete.
The solution is to lower the number of iterations to 1, which makes the test way faster. Job split will be also unnecessary.
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
74cdcdd to
d58fbee
Compare
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
To keep it compliant with other files in the same directory.
|
@lgajowy It's done, commits are ready |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
Based on the following proposal: https://s.apache.org/load-test-basic-operations
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.