Spark in minikube
Here we will explain how to run the example from https://spark.apache.org/docs/3.0.1/running-on-kubernetes.html in minikube.
We assume that you have installed:
- Spark locally (see How to install pyspark locally)
- minikube (see Kubernetes hello world)
- Java 11 (see How to install multiple java versions)
Java
Spark now supports Java 11, so we are going to use it. If you followed the tutorial How to install multiple java versions, you can select it with:
jenv global 11
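Before building the images it is worth confirming that the active JDK really is Java 11. A minimal sketch, parsing a sample `java -version` banner (the `BANNER` string below is an example; in practice capture your own output with `java -version 2>&1 | head -n 1`):

```shell
# Sample banner; replace with: BANNER=$(java -version 2>&1 | head -n 1)
BANNER='openjdk version "11.0.9" 2020-10-20'
# Extract the major version number from the banner.
MAJOR=$(echo "$BANNER" | sed -E 's/.*version "([0-9]+)\..*/\1/')
echo "major=$MAJOR"
```

If the major version is not 11, re-check your `jenv` setup before proceeding.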
Create dockers (spark and spark-py)
First go to the downloaded Spark directory and use Spark's script to build the Docker images for Kubernetes:
cd ${SPARK_HOME}
./bin/docker-image-tool.sh -r barteks -t v${SPARK_VERSION}-java11 \
-p kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
-b java_image_tag=11-slim build
Here barteks is my Docker Hub account (https://hub.docker.com/).
Then push the images to Docker Hub:
docker push barteks/spark:v${SPARK_VERSION}-java11
docker push barteks/spark-py:v${SPARK_VERSION}-java11
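The commands above assume that `SPARK_VERSION` is exported in your shell. A small sketch of how the tag and image names are composed (the version value here matches the 3.0.1 release used in this post; adjust it to your download):

```shell
# Assumption: SPARK_VERSION matches the Spark release you downloaded.
SPARK_VERSION=3.0.1
TAG="v${SPARK_VERSION}-java11"
# These are the two image names produced by docker-image-tool.sh above.
echo "barteks/spark:${TAG}"
echo "barteks/spark-py:${TAG}"
```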
Run minikube
minikube --memory 4096 --cpus 2 start
It's important to reserve enough memory and CPUs.
Create service account
kubectl create serviceaccount spark-job
kubectl create clusterrolebinding spark-role \
--clusterrole=edit --serviceaccount=default:spark-job \
--namespace=default
These commands correspond to the following manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-job
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark-job
  namespace: default
Submit example job
From the output of
kubectl cluster-info
get the Kubernetes master URL. It should look like https://127.0.0.1:32776; substitute it into the command below:
./bin/spark-submit \
--master k8s://https://127.0.0.1:32776 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=barteks/spark-py:v${SPARK_VERSION}-java11 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-job \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
(check Spark's examples directory for the exact jar path).
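The `--master` value is just `k8s://` prepended to the API server URL. A sketch of extracting it from the `kubectl cluster-info` output (the `INFO` string below is a sample line; in practice capture it with `INFO=$(kubectl cluster-info | head -n 1)`):

```shell
# Sample cluster-info line; replace with real output from your cluster.
INFO="Kubernetes master is running at https://127.0.0.1:32776"
# Pull out the https://host:port part and prefix it with k8s://
MASTER="k8s://$(echo "$INFO" | grep -oE 'https://[0-9.]+:[0-9]+')"
echo "$MASTER"
```

The resulting value is what goes after `--master` in the `spark-submit` command.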
Check results
kubectl get pods
Then check the logs of the corresponding driver pod:
kubectl logs spark-pi-1751c5764c6efff5-driver
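The SparkPi example prints its result as a single "Pi is roughly …" line in the driver log. A sketch of filtering for it (the sample line below stands in for the real log; in practice pipe `kubectl logs <your-driver-pod>` into the grep):

```shell
# Sample log line; in practice: kubectl logs <driver-pod> | grep 'Pi is roughly'
SAMPLE_LOG="Pi is roughly 3.141592653589793"
echo "$SAMPLE_LOG" | grep -o 'Pi is roughly [0-9.]*'
```

Your pod name will differ from the one above; copy it from the `kubectl get pods` output.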
Updated: 2020-12-28