docs: Add instructions on running TPC-H on macOS #1647
# Comet Benchmarking on macOS
This guide is for setting up TPC-H benchmarks locally on macOS using the 100 GB dataset.
There are also `CometTPCHQueryBenchmark` and `CometTPCDSQueryBenchmark`.
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=16g \
--conf spark.eventLog.enabled=true \
/path/to/datafusion-benchmarks/tpcbench.py \
It seems the script actually lives at `datafusion-benchmarks/runners/datafusion-comet/tpcbench.py`.
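Because the command enables `spark.eventLog.enabled`, Spark will try to write event logs to `spark.eventLog.dir` (default `file:///tmp/spark-events`) and fails at startup if that directory does not exist; the history server reads the same default location. A minimal sketch, assuming the default directory is kept. `SPARK_EVENTS_DIR` and `SPARK_CONFS` are hypothetical names, and the `...` in the commented invocation stands for the benchmark script's own arguments, which are not shown here.

```shell
# Assumption: event logs go to Spark's default location unless SPARK_EVENTS_DIR is set.
SPARK_EVENTS_DIR="${SPARK_EVENTS_DIR:-/tmp/spark-events}"

# Spark fails to start with event logging enabled if this directory is missing, so create it first.
mkdir -p "$SPARK_EVENTS_DIR"

# Hypothetical array collecting the --conf flags from the guide, plus an explicit event-log dir,
# so the same settings can be reused across runs.
SPARK_CONFS=(
  --conf spark.memory.offHeap.enabled=true
  --conf spark.memory.offHeap.size=16g
  --conf spark.eventLog.enabled=true
  --conf "spark.eventLog.dir=file://${SPARK_EVENTS_DIR}"
)

# Example invocation (commented out: requires Spark, Comet, and the generated TPC-H data).
# "$SPARK_HOME"/bin/spark-submit --master "$SPARK_MASTER" "${SPARK_CONFS[@]}" \
#   /path/to/datafusion-benchmarks/runners/datafusion-comet/tpcbench.py ...
```

Keeping the flags in an array avoids re-typing them and keeps the event-log directory used by `spark-submit` in sync with the one that was created.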
Install Spark
```shell
wget https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
```
Should we use Spark 3.5.5?
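One way to make the version question easy to revisit is to parameterize the download on the Spark version. A minimal sketch: `spark_dist_url` and `install_spark` are hypothetical helpers, the default install prefix `$HOME/spark` is an assumption, and `curl` is swapped in for `wget` because macOS ships `curl` but not `wget`.

```shell
# Hypothetical helper: Apache archive URL for a given Spark release (Hadoop 3 build).
spark_dist_url() {
  local version="$1"
  echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}-bin-hadoop3.tgz"
}

# Hypothetical helper: download and unpack a release, then print the resulting SPARK_HOME.
# Assumption: installs under $HOME/spark unless a prefix is given as the second argument.
install_spark() {
  local version="$1"
  local prefix="${2:-$HOME/spark}"
  local tarball="spark-${version}-bin-hadoop3.tgz"
  mkdir -p "$prefix" &&
    curl -fL -o "$prefix/$tarball" "$(spark_dist_url "$version")" &&
    tar -xzf "$prefix/$tarball" -C "$prefix" &&
    echo "$prefix/spark-${version}-bin-hadoop3"
}
```

Usage: `export SPARK_HOME="$(install_spark 3.5.4)"`; switching versions is then a one-argument change.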
Set `SPARK_MASTER` env var (host name will need to be edited):
```shell
export SPARK_MASTER=spark://Rustys-MacBook-Pro.local:7077
```
Spark master does not bind to localhost by default. We could specify `--host localhost` when starting the master process, but I have not tried this on macOS yet. I will test this and update this PR.
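Rather than hand-editing the host name, the master URL can be derived from the machine itself. When `SPARK_MASTER_HOST` is unset, `start-master.sh` appears to bind to the output of `hostname -f`, so the sketch below uses the same command (falling back to plain `hostname`). It assumes the default port 7077 unless `SPARK_MASTER_PORT` is set; the `--host localhost` variant is left commented out because it has not been verified on macOS.

```shell
# Assumption: the master listens on Spark's default port unless SPARK_MASTER_PORT is set.
SPARK_MASTER_PORT="${SPARK_MASTER_PORT:-7077}"

# Option 1: keep the default binding and derive the URL from this machine's hostname
# instead of hard-coding a name such as Rustys-MacBook-Pro.local.
export SPARK_MASTER="spark://$(hostname -f 2>/dev/null || hostname):${SPARK_MASTER_PORT}"

# Option 2 (not yet verified on macOS): bind the master to localhost explicitly.
# "$SPARK_HOME"/sbin/start-master.sh --host localhost
# export SPARK_MASTER="spark://localhost:${SPARK_MASTER_PORT}"

echo "Using SPARK_MASTER=$SPARK_MASTER"
```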
FWIW, this is what I currently use to start Spark for benchmarking TPC-H on macOS:
export SPARK_HOME=/opt/spark-3.5.5-bin-hadoop3
export SPARK_MASTER="local[*]"
export SPARK_MASTER_HOST="127.0.0.1"
export SPARK_LOCAL_IP="127.0.0.1"
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh $SPARK_MASTER_HOST:7077
$SPARK_HOME/sbin/start-history-server.sh
I have been using standalone mode rather than local mode, but local mode may make more sense on macOS.
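The two modes differ in what `SPARK_MASTER` points at: `local[*]` runs the driver and executors in a single JVM with one worker thread per logical core, so no master or worker daemons are involved, while standalone mode submits to a `spark://HOST:PORT` master started with `start-master.sh`. A minimal sketch for switching between them, assuming the loopback host from the setup above; `spark_master_for_mode` and `BENCH_MODE` are hypothetical names.

```shell
# Hypothetical helper: print the --master value for a deployment mode.
#   local      -> local[*]: one JVM, one thread per logical core, no daemons needed.
#   standalone -> spark://HOST:PORT of a master started with start-master.sh,
#                 defaulting to 127.0.0.1:7077 (assumption).
spark_master_for_mode() {
  case "$1" in
    local)
      echo "local[*]"
      ;;
    standalone)
      echo "spark://${SPARK_MASTER_HOST:-127.0.0.1}:${SPARK_MASTER_PORT:-7077}"
      ;;
    *)
      echo "unknown mode: $1 (expected 'local' or 'standalone')" >&2
      return 1
      ;;
  esac
}

# Hypothetical BENCH_MODE variable selects the mode; defaults to local.
export SPARK_MASTER="$(spark_master_for_mode "${BENCH_MODE:-local}")"
```

With `BENCH_MODE=local`, the standalone master and worker started by `start-master.sh` and `start-worker.sh` are not used by the submitted job, so they can be skipped when benchmarking in local mode.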
Are we okay to merge this PR?
Yes, I'll go ahead and merge, and we can follow up with changes.
Which issue does this PR close?
Closes #1648 (sort of: this PR explains why the benchmark results are unstable on macOS)
Rationale for this change
What changes are included in this PR?
How are these changes tested?