[SPARK-46320][CORE] Support `spark.master.rest.host`
### What changes were proposed in this pull request?

This PR aims to support a new configuration, `spark.master.rest.host`, to allow the REST Server to choose a host to listen on.

Note that this PR is a re-try of apache#42841, which was closed in favor of `SPARK_MASTER_HOST`.

If we use `SPARK_MASTER_HOST`,
- `Worker` should always use `spark.worker.preferConfiguredMasterAddress=true` to ignore the address from Master. It's an overhead.
- Moreover, there exists a corner case, like the following, where `Worker` receives `ReconnectWorker`. It's very confusing to the users.
- We had better support `spark.master.rest.host` instead of using a workaround environment variable, `SPARK_MASTER_HOST`.

https://github.com/apache/spark/blob/80dc64a573e1c7678f92f8690f09a52329f7d30b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L335

```
23/12/08 04:56:57 INFO Worker: Worker cleanup enabled; old application directories will be deleted in: /data/spark
23/12/08 04:59:34 INFO Worker: Master with url spark://0.0.0.0:7077 requested this worker to reconnect.
```

### Why are the changes needed?

This allows additional controllability on the REST Server.

In addition, in a K8s environment, K8s port-forwarding only works with 127.0.0.1.

```
$ kubectl port-forward svc/master 6066
Forwarding from 127.0.0.1:6066 -> 6066
Forwarding from [::1]:6066 -> 6066
```

It's difficult to use `port-forward` because the AS-IS Spark REST Server only listens on the podIP and not on 127.0.0.1. We can use `spark.master.rest.host=0.0.0.0` with this PR.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It was a little difficult to write a unit test because it requires two IPs to test in CI.

For manual testing, start the REST API.

```
$ SPARK_NO_DAEMONIZE=1 SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true -Dspark.master.rest.host=0.0.0.0" sbin/start-master.sh
```

And, connect with both IPs (your public IP and 127.0.0.1) on the machine.

```
$ curl -v http://127.0.0.1:6066/v1/submissions/status/0
$ curl -v http://a.b.c.d:6066/v1/submissions/status/0
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44249 from dongjoon-hyun/SPARK-46320.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
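
The manual `curl` check above could also be scripted. The following is only a minimal sketch, not part of this patch: the helper names `status_url` and `is_listening` are hypothetical, the port `6066` and the `/v1/submissions/status/0` path are copied from the commands above, and `a.b.c.d` remains a placeholder for your public IP. It assumes that any HTTP response, even an error status, means the REST Server is listening on that address.

```python
import urllib.error
import urllib.request


def status_url(host: str, port: int = 6066, submission_id: str = "0") -> str:
    """Build the REST status URL used in the manual test (hypothetical helper)."""
    # Bracket bare IPv6 literals such as ::1 so the URL stays valid.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"http://{host}:{port}/v1/submissions/status/{submission_id}"


def is_listening(host: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers on the REST port of `host`."""
    try:
        with urllib.request.urlopen(status_url(host), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still proves that something is listening.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False


if __name__ == "__main__":
    # Replace a.b.c.d with the machine's public IP before running.
    for host in ("127.0.0.1", "a.b.c.d"):
        print(host, is_listening(host))
```

With `spark.master.rest.host=0.0.0.0`, both addresses would be expected to report a listener; without it, only the address the Master binds to should.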