How to install pyspark locally
Download and configure spark
First, create a directory for storing Spark. We will use the directory ~/programs.
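For example, the directory can be created with (adjust the path if you prefer a different location):
mkdir -p ~/programs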
Then, in your ~/.zshrc, add the following environment variables:
export SPARK_VERSION=3.0.1
export SPARK_PACKAGE=spark-${SPARK_VERSION}-bin-hadoop3.2
export SPARK_HOME=$HOME/programs/${SPARK_PACKAGE}
export PATH=${SPARK_HOME}/bin:$PATH
Then call
source ~/.zshrc
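As a quick sanity check (optional), you can confirm the variables are set by printing one of them:
echo $SPARK_HOME
It should print a path under ~/programs.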
Then run
curl -O https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PACKAGE}.tgz \
&& tar -xvzf ${SPARK_PACKAGE}.tgz \
&& mv ${SPARK_PACKAGE} ${SPARK_HOME} \
&& rm ${SPARK_PACKAGE}.tgz
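Assuming the download and extraction succeeded, the Spark binaries are now on your PATH, and you can verify the installation with:
spark-submit --version
which should report version 3.0.1.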
Install pyspark
Then create a Python virtual environment and install pyspark into it with
pip install pyspark==${SPARK_VERSION}
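If you do not yet have a virtual environment, one way to create and activate it before running the pip command above is shown here (the path ~/venvs/pyspark is just an example):
python3 -m venv ~/venvs/pyspark
source ~/venvs/pyspark/bin/activate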
Test it
Create a script x2.py with the following contents:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

if __name__ == "__main__":
    # Create (or reuse) a local Spark session
    spark = SparkSession\
        .builder\
        .getOrCreate()

    # Build a tiny single-column DataFrame with values 0..3
    lst = [(0, ), (1, ), (2, ), (3, )]
    dataset = spark.createDataFrame(lst, ["x"])

    # Compute the sum of squares of column "x"
    x = dataset.select(
        F.sum(F.pow(F.col("x"), F.lit(2))).alias("sumSquares")
    ).collect()

    print("*************************")
    print(" Sum of squares is ", x[0]["sumSquares"])
    print("*************************")

    spark.stop()
Run it with
python x2.py
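If everything is configured correctly, somewhere in the Spark log output you should see something like:
*************************
 Sum of squares is  14.0
*************************
(0^2 + 1^2 + 2^2 + 3^2 = 14; F.pow returns a double, hence 14.0.)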
Updated: 2020-12-09