
Commit 2c40e43

Ensured entrypoint for python
1 parent 09804f0 commit 2c40e43

File tree: 4 files changed, +1026 −14 lines


Dockerfile

Lines changed: 5 additions & 7 deletions
@@ -1,21 +1,19 @@
-FROM ubuntu:18.04
+FROM ubuntu:18.04 as pyspark
 
 ENV SPARK_VERSION=3.1.2
 ENV HADOOP_VERSION=2.7
 ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
 
-WORKDIR /app
-
 COPY requirements.txt .
+
+# Install OpenJDK-8, Hadoop, & PySpark 3
 RUN apt-get update \
-    && apt-get install -y python3 python3-pip wget software-properties-common openjdk-8-jdk \
+    && apt-get install -y python3.8 python3-pip wget software-properties-common openjdk-8-jdk \
     && export JAVA_HOME \
     && pip3 install --upgrade pip \
-    && pip3 install --no-cache -r requirements.txt \
+    && pip3 install --no-cache-dir -r requirements.txt \
     && wget --no-verbose http://apache.mirror.iphh.net/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
     && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
     && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark \
     && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
-    && apt-get remove -y curl bzip2 \
-    && apt-get autoremove -y \
     && apt-get clean
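The `as pyspark` stage name added above makes the stage addressable, for example via `--target` when extending or rebuilding the image. A minimal sketch, assuming the build runs from the repository root; the tag `pyspark3-base` is only an illustrative name:

```shell
# Build only the stage named "pyspark" from the Dockerfile above;
# the tag "pyspark3-base" is a hypothetical example.
docker build --target pyspark -t pyspark3-base .
```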

README.md

Lines changed: 19 additions & 3 deletions
@@ -21,13 +21,29 @@ The docker image may assume local AWS configurations and secrets on run for spec
 docker run --rm=true -v ~/.aws:/root/.aws <etc...>
 ```
 
-This image can be extended to run any PySpark .py script using python3 or spark-submit.
+This image can be extended to run any PySpark .py script using `python3`.
 
+## Example
+Set up a local Docker image that will run your scripts; in our case this is
+`scripts/main.py`, with the data placed under `data/*`:
 ```docker
 FROM dirkscgm/pyspark3:latest
 
 WORKDIR /app
+
 COPY scripts/* scripts/
+COPY data/* data/
+
+ENTRYPOINT ["python3"]
+CMD ["scripts/main.py"]
+```
+
+Build the local image:
+```shell
+docker build -t pyspark3 .
+```
 
-ENTRYSCRIPT ["python3", "scripts/main.py"]
-```
+Run the container with the CMD set to the main entrypoint of the Spark application:
+```shell
+docker run --rm=true pyspark3
+```
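Splitting `ENTRYPOINT ["python3"]` from `CMD ["scripts/main.py"]` means the default script is only a default: arguments passed after the image name replace the CMD while the Python entrypoint stays fixed. A minimal sketch, assuming the image built above; `scripts/other_job.py` is a hypothetical second script copied into the image:

```shell
# Arguments after the image name override CMD, so a different script runs
# under the same "python3" ENTRYPOINT. "scripts/other_job.py" is hypothetical.
docker run --rm=true pyspark3 scripts/other_job.py
```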
