Here we will explain how to run Spark's SparkPi example in minikube.

We assume that you have installed:

- Docker
- minikube and kubectl
- a downloaded Spark distribution

Spark now supports Java 11, so we are going to use it. If you followed the tutorial How to install multiple java versions, you can select it with:

jenv global 11

Create Docker images (spark and spark-py)

First, go to the downloaded Spark directory and use Spark's docker-image-tool.sh script to build the Docker images for Kubernetes:

./bin/docker-image-tool.sh -r barteks -t v${SPARK_VERSION}-java11 \
    -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
    -b java_image_tag=11-slim build

Here barteks is my Docker Hub account.
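The build command above assumes SPARK_VERSION is exported in your shell. A minimal sketch of what that produces (3.0.1 below is only an illustration; use the version you actually downloaded):

```shell
# Illustrative value: match this to the Spark release you downloaded
export SPARK_VERSION=3.0.1

# The script tags the images as <repo>/spark:<tag> and <repo>/spark-py:<tag>
echo "barteks/spark:v${SPARK_VERSION}-java11"
echo "barteks/spark-py:v${SPARK_VERSION}-java11"
```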

Then push the images to Docker Hub:

docker push barteks/spark:v${SPARK_VERSION}-java11
docker push barteks/spark-py:v${SPARK_VERSION}-java11

Run minikube

minikube start --memory 4096 --cpus 2

It’s important to reserve enough memory and CPUs.
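You can first check what the host offers; a quick sketch, assuming a Linux host where nproc and /proc/meminfo are available:

```shell
# CPUs available on the host (minikube can use at most this many)
nproc

# Total memory in kB, from /proc/meminfo (Linux only)
grep MemTotal /proc/meminfo
```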

Create service account

kubectl create serviceaccount spark-job
kubectl create clusterrolebinding spark-role \
    --clusterrole=edit --serviceaccount=default:spark-job

This corresponds to:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-job
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: spark-job
    namespace: default

Submit example job


kubectl cluster-info

returns the Kubernetes master URL; put it after k8s:// in the --master option of the command below:
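As a sketch, one way to build the k8s:// master value from that output in the shell (the sample line below is illustrative; substitute your real cluster-info output):

```shell
# Illustrative cluster-info line; your address will differ
sample="Kubernetes master is running at https://192.168.64.2:8443"

# Take the last whitespace-separated field and prefix it with k8s://
master_url="k8s://${sample##* }"
echo "$master_url"
```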

./bin/spark-submit \
  --master k8s://https://<kubernetes-master>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=barteks/spark-py:v${SPARK_VERSION}-java11 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-job \
  local:///opt/spark/examples/jars/spark-examples_2.12-${SPARK_VERSION}.jar

(check Spark's examples directory for the exact jar name).

Check results

kubectl get pods

Then check the logs of the corresponding driver pod:

kubectl logs spark-pi-1751c5764c6efff5-driver
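On success, SparkPi prints its estimate on a line starting with "Pi is roughly", so you can grep the driver log for it. A sketch, where an illustrative log line stands in for the real kubectl output:

```shell
# In practice: kubectl logs <driver-pod> | grep "Pi is roughly"
# Illustrative log line standing in for the real driver output:
sample_log="Pi is roughly 3.141592653589793"
echo "$sample_log" | grep -o "Pi is roughly [0-9.]*"
```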

Updated: 2020-12-28