PySpark Project Creation
Awantik Das edited this page Feb 13, 2019
- Install Java 8 and point your environment at the Spark distribution:

```bash
sudo add-apt-repository ppa:webupd8team/java
sudo apt update; sudo apt install oracle-java8-installer
export SPARK_HOME=/home/awantik/packages/spark-2.4.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
```
- Create a project directory
- Copy the `launch_spark_submit` script into it (required if a Jupyter notebook is also running against the same Spark installation):

```bash
#!/bin/bash
# Clear PYSPARK_DRIVER_PYTHON so spark-submit launches a plain Python
# driver instead of Jupyter, then pass all arguments through verbatim.
unset PYSPARK_DRIVER_PYTHON
spark-submit "$@"
export PYSPARK_DRIVER_PYTHON=jupyter
```
- Now create the entry-point program `entry.py` with a `main` function
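A minimal `entry.py` might look like the sketch below. The function names and the toy job are illustrative assumptions, not the wiki's actual code; the Spark import is deferred into `main` so the pure logic stays testable without a cluster:

```python
# entry.py -- illustrative driver sketch; names and the toy job are assumptions.

def add_one(x):
    """Pure transformation, kept separate so it is testable without Spark."""
    return x + 1

def main():
    # Import deferred so the module also loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PySparkProject").getOrCreate()
    # A toy job standing in for the real work.
    result = spark.sparkContext.parallelize(range(5)).map(add_one).collect()
    print(result)
    spark.stop()

if __name__ == "__main__":
    main()
```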
- Create another directory, `additionalCode`, and `cd` into it
- Inside `additionalCode`, create `setup.py`:
```python
from setuptools import setup

setup(
    name='PySparkUtilities',
    version='0.1dev',
    packages=['utilities'],
    license='''
Creative Commons
Attribution-Noncommercial-Share Alike license''',
    long_description='''
An example of how to package code for PySpark'''
)
```
- `mkdir utilities`, and add an empty `utilities/__init__.py` so setuptools recognizes it as a package
- Copy your modules inside it
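The wiki does not show the modules themselves, so here is a hypothetical example of one, just to make the packaged egg non-trivial (the module and function names are assumptions):

```python
# utilities/textutils.py -- hypothetical example module shipped in the egg.

def normalize(text):
    """Strip surrounding whitespace and lower-case a string."""
    return text.strip().lower()
```

Once the egg is shipped to the cluster, code in `entry.py` (and in tasks running on executors) can import it as an ordinary package, e.g. `from utilities.textutils import normalize`.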
- In `additionalCode`, run `python setup.py bdist_egg`
- This creates a `dist` directory containing the egg file
- To run the job:

```bash
./launch_spark_submit.sh --master local[4] \
  --py-files additionalCode/dist/PySparkUtilities-0.1.dev0-py2.7.egg entry.py
```
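As an alternative to `--py-files`, the driver can ship the egg itself with `SparkContext.addPyFile`. The sketch below assumes the egg built from the `setup.py` above (setuptools normalizes the version `0.1dev` to `0.1.dev0` in the filename); the import is deferred so the sketch loads even where pyspark is absent:

```python
# Alternative to --py-files: distribute the egg from inside the driver.

def main():
    # Deferred import so this sketch also loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[4]")
             .appName("entry")
             .getOrCreate())
    # Ship the packaged code to every executor at runtime; after this call,
    # modules inside the utilities package are importable in tasks.
    spark.sparkContext.addPyFile(
        "additionalCode/dist/PySparkUtilities-0.1.dev0-py2.7.egg")
    spark.stop()

if __name__ == "__main__":
    main()
```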