4 files changed: +1026 -14 lines changed
- FROM ubuntu:18.04
+ FROM ubuntu:18.04 as pyspark

  ENV SPARK_VERSION=3.1.2
  ENV HADOOP_VERSION=2.7
  ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/

- WORKDIR /app
-
  COPY requirements.txt .
+
+ # Install OpenJDK-8, Hadoop, & PySpark 3
  RUN apt-get update \
-     && apt-get install -y python3 python3-pip wget software-properties-common openjdk-8-jdk \
+     && apt-get install -y python3.8 python3-pip wget software-properties-common openjdk-8-jdk \
      && export JAVA_HOME \
      && pip3 install --upgrade pip \
-     && pip3 install --no-cache -r requirements.txt \
+     && pip3 install --no-cache-dir -r requirements.txt \
      && wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
      && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
-     && apt-get remove -y curl bzip2 \
-     && apt-get autoremove -y \
      && apt-get clean
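
After this change the single build stage installs Python 3.8, OpenJDK 8, the pip requirements from requirements.txt, and Spark 3.1.2 built for Hadoop 2.7. A quick way to confirm that a built image is usable could be a throwaway PySpark session run inside it, for example (a minimal sketch, not part of this change set):

```python
# Hypothetical smoke test (not part of this diff): run inside the built image
# to confirm that PySpark and the Java runtime are wired up correctly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smoke-test").getOrCreate()
print(spark.version)  # expected to match the image's SPARK_VERSION (3.1.2)
spark.stop()
```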
@@ -21,13 +21,29 @@ The docker image may assume local AWS configurations and secrets on run for spec
  docker run --rm=true -v ~/.aws:/root/.aws <etc...>
  ```

- This image can be extended to run any PySpark .py script using python3 or spark-submit.
+ This image can be extended to run any PySpark .py script using `python3`.

+ ## Example
+ Set up your local Docker container that will run your scripts;
+ in our case this is `scripts/main.py`, with the data set up in `data/*`:
  ```docker
  FROM dirkscgm/pyspark3:latest

  WORKDIR /app
+
  COPY scripts/* scripts/
+ COPY data/* data/
+
+ ENTRYPOINT ["python3"]
+ CMD ["scripts/main.py"]
+ ```
+
+ Build the local image:
+ ```shell
+ docker build -t pyspark3 .
+ ```

- ENTRYSCRIPT ["python3", "scripts/main.py"]
- ```
+ Run the container with the CMD set to the main entrypoint of the Spark application:
+ ```shell
+ docker run --rm=true pyspark3
+ ```
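
The example above assumes a `scripts/main.py` entrypoint and data under `data/*`, neither of which appears in this diff. A minimal, hypothetical script of that shape could look like the following:

```python
# Hypothetical scripts/main.py (illustrative only; the real script is not part
# of this diff). It reads whatever CSV files were copied into data/ and prints
# a row count, just to exercise the ENTRYPOINT/CMD wiring end to end.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark3-example").getOrCreate()

# Assumes headered CSV input under data/; adjust the reader to the actual format.
df = spark.read.option("header", "true").csv("data/")
print(f"rows read: {df.count()}")

spark.stop()
```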