
Commit 2375924

Merge pull request #375 from makaveli10/update_trt_docs
Update tensorrt_llm docker setup
2 parents: 4ba576f + ae16924


3 files changed (+9, -33 lines)


.github/workflows/ci.yml

Lines changed: 0 additions & 29 deletions
@@ -99,35 +99,6 @@ jobs:
           push: true
           tags: ghcr.io/collabora/whisperlive-cpu:latest

-  build-and-push-docker-tensorrt:
-    needs: [run-tests, check-code-format]
-    timeout-minutes: 60
-    runs-on: ubuntu-22.04
-    if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))
-    steps:
-      - uses: actions/checkout@v2
-
-      - name: Log in to GitHub Container Registry
-        uses: docker/login-action@v1
-        with:
-          registry: ghcr.io
-          username: ${{ github.repository_owner }}
-          password: ${{ secrets.GHCR_TOKEN }}
-
-      - name: Docker Prune
-        run: docker system prune -af
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v1
-
-      - name: Build and push Docker GPU image
-        uses: docker/build-push-action@v2
-        with:
-          context: .
-          file: docker/Dockerfile.tensorrt
-          push: true
-          tags: ghcr.io/collabora/whisperlive-tensorrt:latest
-
   build-and-push-docker-gpu:
     needs: [run-tests, check-code-format, build-and-push-docker-cpu]
     timeout-minutes: 20
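
With the build-and-push-docker-tensorrt job removed, CI no longer publishes ghcr.io/collabora/whisperlive-tensorrt. Below is a minimal sketch of the equivalent manual build, assuming you still want a registry copy and have push access to a suitable namespace (the ghcr.io/collabora tag is only illustrative; substitute your own):

```bash
# Build the TensorRT image from the Dockerfile the removed CI job pointed at
docker build . -f docker/Dockerfile.tensorrt -t ghcr.io/collabora/whisperlive-tensorrt:latest

# Optionally push it (after docker login ghcr.io); requires write access to the chosen namespace
docker push ghcr.io/collabora/whisperlive-tensorrt:latest
```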

README.md

Lines changed: 7 additions & 3 deletions
@@ -142,12 +142,18 @@ client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/b
 ## Browser Extensions
 - Run the server with your desired backend as shown [here](https://github.com/collabora/WhisperLive?tab=readme-ov-file#running-the-server).
 - Transcribe audio directly from your browser using our Chrome or Firefox extensions. Refer to [Audio-Transcription-Chrome](https://github.com/collabora/whisper-live/tree/main/Audio-Transcription-Chrome#readme) and https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md
+
+## Whisper Live Server in Docker
+- GPU
+  - Faster-Whisper
+  ```bash
   docker run -it --gpus all -p 9090:9090 ghcr.io/collabora/whisperlive-gpu:latest
   ```

   - TensorRT. Refer to [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) for setup and more tensorrt backend configurations.
   ```bash
-  docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it ghcr.io/collabora/whisperlive-tensorrt
+  docker build . -f docker/Dockerfile.tensorrt -t whisperlive-tensorrt
+  docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisperlive-tensorrt

   # Build small.en engine
   bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en # float16

@@ -173,8 +179,6 @@ client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/b
   docker run -it -p 9090:9090 ghcr.io/collabora/whisperlive-cpu:latest
   ```

-**Note**: By default we use "small" model size. To build docker image for a different model size, change the size in server.py and then build the docker image.
-
 ## Future Work
 - [ ] Add translation to other languages on top of transcription.
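
Taken together, the updated README section amounts to: build the image, start an interactive container, then build an engine inside it. Here is a sketch of that sequence using only what the diff above shows; the commented medium.en line is an untested assumption about other model sizes:

```bash
# Build the WhisperLive TensorRT image locally
docker build . -f docker/Dockerfile.tensorrt -t whisperlive-tensorrt

# Start an interactive shell in the container, exposing the server port
docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisperlive-tensorrt

# Inside the container: build the small.en engine in float16, as in the README
bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en
# bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples medium.en  # assumed to accept other sizes
```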

TensorRT_whisper.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@ We have only tested the TensorRT backend in docker so, we recommend docker for a

 - Run WhisperLive TensorRT in docker
 ```bash
-docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it ghcr.io/collabora/whisperlive-tensorrt:latest
+docker build . -f docker/Dockerfile.tensorrt -t whisperlive-tensorrt
+docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisperlive-tensorrt
 ```

 ## Whisper TensorRT Engine
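
Both updated docs assume the host can hand a GPU to Docker via --runtime=nvidia --gpus all, i.e. that the NVIDIA Container Toolkit is installed. A quick sanity check before running the container (the CUDA base image tag is only an example):

```bash
# Verify the GPU is visible from inside a container before starting WhisperLive
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```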
