Skip to content

Helm Chart and Dockerfile for deploying Spark History Server in Kubernetes

License

Notifications You must be signed in to change notification settings

austinorth/spark-history-server

Repository files navigation

Artifact Hub Docker Image

Spark History Server

A Helm chart and Docker container for deploying Spark History Server (Spark Web UI) to monitor the metrics and performance of your Spark jobs.

Overview

This project provides two main components:

  1. Docker Image: A ready-to-use Docker image for Spark History Server configured to read Spark Event Logs from S3
  2. Helm Chart: A Kubernetes Helm chart for easy deployment of Spark History Server on Kubernetes clusters

The Spark History Server provides a web interface that visualizes information about completed Spark applications, including event timelines, stages, tasks, and executor metrics, helping you diagnose and troubleshoot Spark job performance.

Quick Start

Using the Helm Chart

# Add the Helm repository
helm repo add austinorth https://austinorth.github.io/spark-history-server
helm repo update

# Install the chart with your S3 bucket information
helm install spark-history-server austinorth/spark-history-server \
  --namespace spark-history-server \
  --create-namespace \
  --set sparkHistoryOpts="-Dspark.history.fs.logDirectory=s3a://YOUR-BUCKET/spark-event-logs/" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="YOUR-IRSA-ROLE-ARN"

For detailed chart configuration options, see the Helm Chart README.

Using the Docker Image

# Run the Docker container with S3 access
docker run -d -p 18080:18080 \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://YOUR-BUCKET/spark-event-logs/ \
  -Dspark.hadoop.fs.s3a.access.key=YOUR_AWS_ACCESS_KEY \
  -Dspark.hadoop.fs.s3a.secret.key=YOUR_AWS_SECRET_KEY" \
  austinorth/spark-history-server

Docker Image Details

The provided Docker image:

  • Based on Amazon Corretto 8 with Maven 3.6
  • Includes Spark 3.3.2 (without Hadoop)
  • Configured with required AWS/S3 dependencies
  • Default port: 18080

Building the Docker Image Locally

git clone https://github.com/austinorth/spark-history-server.git
cd spark-history-server
docker build -t your-username/spark-history-server:latest .

Helper Script for Local Testing

The repository includes a convenient script for running Spark History Server locally:

# Start the server pointing to your S3 bucket
sh launch_spark_history_server_locally.sh start -sb your-bucket-name -sp path/to/events

Run sh launch_spark_history_server_locally.sh help for all available options.

Helm Chart Details

The Helm chart provides:

  • Easy configuration for S3 event log sources
  • AWS IAM Role for Service Account (IRSA) support
  • Customizable resource requests and limits
  • Ingress support for exposing the service
  • Readiness and liveness probes

For complete configuration options, see the Helm Chart README.

AWS IAM Setup

If deploying in AWS EKS with S3 access:

  1. Create an IAM policy for S3 read access to your event logs bucket
  2. Create an IAM role for service account (IRSA):
eksctl create iamserviceaccount \
  --cluster=your-eks-cluster \
  --name=spark-history-server \
  --namespace=spark-history-server \
  --attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve

Community

Give us a star ⭐️ - If you are using Spark History Server, we would love a star ❤️

License

This project is licensed under the Apache License 2.0.

About

Helm Chart and Dockerfile for deploying Spark History Server in Kubernetes

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published

Contributors 7