
Add entrypoint container images #819

Merged: 1 commit merged into containers:main from the entrypoint branch on Feb 17, 2025

Conversation

@rhatdan rhatdan commented Feb 14, 2025

Install podman-remote and ramalama so we can use ramalama
from within a container.

$ podman run --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user --userns=keep-id -ti --privileged quay.io/ramalama/ramalama ramalama run granite

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>


sourcery-ai bot commented Feb 14, 2025

Reviewer's Guide by Sourcery

This pull request introduces functionality to add entrypoints to container images for whisper-server and llama-server. It adds the add_entrypoint and add_entrypoints functions to handle the creation of temporary Containerfiles and image building. The changes integrate these functions into the existing build process for both single and multi-architecture images.

Sequence diagram for adding entrypoints to container images

sequenceDiagram
  participant CB as container_build.sh
  participant A as add_entrypoints()
  participant AE as add_entrypoint()
  participant P as podman/conman

  CB->>A: build()
  A->>AE: add_entrypoint "whisper-server"
  AE->>P: Build image with entrypoint
  P-->>AE: Image built
  A->>AE: add_entrypoint "llama-server"
  AE->>P: Build image with entrypoint
  P-->>AE: Image built
  AE-->>A: Entrypoints added
  A-->>CB: Complete

File-Level Changes

Introduces a function to add entrypoints to container images (container_build.sh):
  • Introduces the add_entrypoint function to create a temporary Containerfile, define an entrypoint, build the image, and then remove the temporary file.
  • Introduces the add_entrypoints function to add the whisper-server and llama-server entrypoints.

Integrates the new entrypoint functionality into the image-building process (container_build.sh):
  • Calls add_entrypoints after building a single-architecture container image.
  • Calls add_entrypoints after building a multi-architecture container image.



rhatdan commented Feb 14, 2025

@benoitf PTAL


@sourcery-ai sourcery-ai bot left a comment


Hey @rhatdan - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider using a single add_entrypoint call with a loop inside the function to reduce code duplication.
  • It might be better to pass the entrypoint script name as an argument to add_entrypoints.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Comment on lines 40 to 48
add_entrypoint() {
containerfile=$(mktemp)
cat > ${containerfile} <<EOF
FROM $2
ENTRYPOINT [ "/usr/bin/$3.sh" ]
EOF
eval $1 --no-cache -t "$2-$3" -f ${containerfile} .
rm ${containerfile}
}

suggestion (bug_risk): Consider ensuring cleanup of temporary files on errors.

If the eval or the build command fails, the temporary file created by mktemp may remain. Using a trap to remove the file on error would improve the robustness of this function.
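One way the suggestion could look (a sketch only, not the PR's code; the subshell body and EXIT trap are assumptions about how the cleanup might be implemented):

add_entrypoint() (
    # subshell body: the EXIT trap below fires when this subshell exits,
    # so the temporary Containerfile is removed even if the build fails
    containerfile=$(mktemp)
    trap 'rm -f "${containerfile}"' EXIT
    cat > "${containerfile}" <<EOF
FROM $2
ENTRYPOINT [ "/usr/bin/$3.sh" ]
EOF
    eval $1 --no-cache -t "$2-$3" -f "${containerfile}" .
)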

FROM $2
ENTRYPOINT [ "/usr/bin/$3.sh" ]
EOF
eval $1 --no-cache -t "$2-$3" -f ${containerfile} .
Contributor


On my end it's failing: the --no-cache parameter is not recognized by the podman CLI.

Contributor


It seems $1, $2, and $3 are not the expected parameters:
  • $1 = podman
  • $2 = build
  • $3 = whisper-server
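For illustration only (the variable name and call sites here are assumptions, not the actual container_build.sh): with the function written this way, the whole build command has to arrive in $1, the base image in $2, and the entrypoint name in $3, so a call would need to keep the command in a single argument, e.g.:

# hypothetical call shape; build_cmd and the image name are assumptions
build_cmd="podman build"   # or "docker build"
add_entrypoint "${build_cmd}" "quay.io/ramalama/ramalama" "whisper-server"
add_entrypoint "${build_cmd}" "quay.io/ramalama/ramalama" "llama-server"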


benoitf commented Feb 14, 2025

After changing the order of the arguments in the script, I was able to build an image and start it with:

podman run --rm -it --env=HOME=/tmp --cap-drop=all --security-opt=no-new-privileges --init -v $HOME/.local/share/containers/podman-desktop/extensions-storage/redhat.ai-lab/models:/tmp/models -p 9000:9000 --device /dev/dri -e PORT=9000 -e MODEL_PATH=/tmp/models/hf.bartowski.granite-3.1-8b-instruct-GGUF/granite-3.1-8b-instruct-Q4_K_M.gguf ramalama-llama-server:latest

and was able to use curl to query the model and it worked 👍

 curl --location 'http://localhost:9000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "What is the capital of France?",
      "role": "user"
    }
  ]
}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"The capital of France is Paris.\n\....

@rhatdan rhatdan force-pushed the entrypoint branch 2 times, most recently from 314ce8c to 245f6db on February 14, 2025 at 19:26
Install podman-remote and ramalama so we can use ramalama
from within a container.

$ podman run --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user --userns=keep-id -ti --privileged quay.io/ramalama/ramalama ramalama run granite

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

rhatdan commented Feb 14, 2025

@benoitf This ramalama image now contains the upstream version of RamaLama along with podman-remote, so we should be able to run RamaLama in this container.

With a command like:

$ podman run --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user --userns=keep-id -ti --privileged quay.io/ramalama/ramalama ramalama run granite


@benoitf benoitf left a comment


I tried the dedicated images.

I think there is an issue in the script (not related to this PR)

I provided -e MODEL_CHAT_FORMAT="llama-2" and got a core dump about a missing --jinja:

podman run --rm -it --env=HOME=/tmp --cap-drop=all --security-opt=no-new-privileges --init --pull=newer -v $HOME/.local/share/containers/podman-desktop/extensions-storage/redhat.ai-lab/models:/tmp/models -p 9000:9000 --device /dev/dri -e PORT=9000 -e HOST=0.0.0.0 -e MODEL_PATH=/tmp/models/hf.bartowski.granite-3.1-8b-instruct-GGUF/granite-3.1-8b-instruct-Q4_K_M.gguf -e MODEL_CHAT_FORMAT="llama-2" quay.io/ramalama/ramalama-llama-server:latest
terminate called after throwing an instance of 'std::runtime_error'
  what():  error: the supplied chat template is not supported: llama-2
note: llama.cpp was started without --jinja, we only support commonly used templates

/usr/bin/llama-server.sh: line 14:     3 Aborted                 (core dumped) llama-server --model ${MODEL_PATH} --host ${HOST:=0.0.0.0} --port ${PORT:=8001} --gpu_layers ${GPU_LAYERS:=0} ${CHAT_FORMAT}

It seems that the llama.cpp server accepts "llama2" but crashes with "llama-2",

while llama-cpp-python uses "llama-2":
https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#chat-completion

I'm wondering if we could, for example, replace "llama-2" with "llama2" in the llama-server script for better backward compatibility.
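A minimal sketch of that mapping (not the actual llama-server.sh; only the MODEL_CHAT_FORMAT variable name is taken from the command above):

# normalize the llama-cpp-python style name to what llama-server accepts
case "${MODEL_CHAT_FORMAT}" in
    llama-2) MODEL_CHAT_FORMAT="llama2" ;;
esac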

# link podman-remote to podman for use by RamaLama
ln -sf /usr/bin/podman-remote /usr/bin/podman
git clone https://github.com/containers/ramalama
cd ramalama
Collaborator


How about a pip install? At least it wouldn't be installing from main branch, a more unstable target...
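A sketch of what that could look like in the snippet above, assuming RamaLama is published on PyPI (the version pin is only an example):

# link podman-remote to podman for use by RamaLama
ln -sf /usr/bin/podman-remote /usr/bin/podman
# install a released version instead of cloning the main branch
pip install "ramalama==0.6.0"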

Contributor


Could it just copy the files from the current checkout? If I change some scripts, I would expect the image to embed the files from my current working directory (my changes, a tag, etc.) rather than the latest stable release or the main branch.

Also, if I check out the repo two months later, it would use the same files (the ones stored in the current working copy).

Member Author


We have a chicken and egg situation:

Currently we are building the images at the same time as the release. Juggling the two at the same time is going to be difficult.

When I build the 0.7.0 image, I usually do this before cutting the release. Installing the older version, say 0.6.0, into the 0.7.0 image seems wrong. I don't think that is the proper solution.

Building a user's home checked-out version is also very risky, because that version may never be released to GitHub at all. We could change the scripts to allow installing the current version if this is really necessary, but I don't think it is, and users could hack their scripts themselves.

Contributor


I've thought about it over the weekend, and my conclusion is that there should be a dedicated image with only the CLI (no inference implementation such as llama.cpp).

A ramalama-cli image would be the image for the Python code of RamaLama that then runs the containers (so with podman-remote packaged inside this image).

And I would not expect to have podman-remote or any Python code/dependency in any of the ramalama images that provide llama.cpp.

The RamaLama CLI image should then also be smaller (and the others as well).

ramalama-cli (Python code + podman-remote) --> starts containers such as the llama.cpp, whisper.cpp, or vLLM images (which only contain the runner).

> Building a user's home checked-out version is also very risky, because that version may never be released to GitHub at all.

If I do 'make images', I expect it to build a specific tag such as :next or :dev from my local copy.
Then I can test the images locally.

The CI job would publish only for tags or from main-branch code, so I don't see any issue.
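A rough sketch of what such a ramalama-cli image could look like (the base image, package names, and install method are all assumptions, not part of this PR):

FROM quay.io/fedora/fedora:41
# CLI-only image: the RamaLama Python code plus podman-remote,
# without llama.cpp, whisper.cpp, or vLLM
RUN dnf install -y python3-pip podman-remote && \
    ln -sf /usr/bin/podman-remote /usr/bin/podman && \
    dnf clean all
COPY . /src/ramalama
RUN pip3 install /src/ramalama
ENTRYPOINT [ "ramalama" ]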


@ericcurtin ericcurtin Feb 16, 2025


> I've thought about it over the weekend, and my conclusion is that there should be a dedicated image with only the CLI (no inference implementation such as llama.cpp).
>
> A ramalama-cli image would be the image for the Python code of RamaLama that then runs the containers (so with podman-remote packaged inside this image).
>
> And I would not expect to have podman-remote or any Python code/dependency in any of the ramalama images that provide llama.cpp.
>
> The RamaLama CLI image should then also be smaller (and the others as well).
>
> ramalama-cli (Python code + podman-remote) --> starts containers such as the llama.cpp, whisper.cpp, or vLLM images (which only contain the runner).

This almost works, except on Windows: we would then have to package and maintain Python on Windows, which is not good.

> Building a user's home checked-out version is also very risky, because that version may never be released to GitHub at all.

> If I do 'make images', I expect it to build a specific tag such as :next or :dev from my local copy. Then I can test the images locally.
>
> The CI job would publish only for tags or from main-branch code, so I don't see any issue.

Collaborator


> We have a chicken and egg situation:

I think we solve the chicken-and-egg situation if we do something like:

COPY ../../* /ramalama/

(maybe even create an .rpm/tarball during the build script, copy it, and install it).

A git clone of the main branch could be in any sort of random state.
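A sketch of the copy-from-the-local-checkout idea (a Containerfile fragment; the paths and install command are assumptions). Note that COPY cannot reach outside the build context, so the ../../* form would require running the build with the repository root as the context:

# copy the build context (the local checkout) into the image and
# install from it, instead of git cloning the main branch
COPY . /ramalama/
RUN cd /ramalama && pip3 install . && \
    ln -sf /usr/bin/podman-remote /usr/bin/podman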

> Currently we are building the images at the same time as the release. Juggling the two at the same time is going to be difficult.
>
> When I build the 0.7.0 image, I usually do this before cutting the release. Installing the older version, say 0.6.0, into the 0.7.0 image seems wrong. I don't think that is the proper solution.
>
> Building a user's home checked-out version is also very risky, because that version may never be released to GitHub at all. We could change the scripts to allow installing the current version if this is really necessary, but I don't think it is, and users could hack their scripts themselves.


rhatdan commented Feb 17, 2025

I don't think it is worth managing another image just to add RamaLama to the base image. There is a maintenance cost, and it is a pain for every user who wants to use RamaLama within a container. Let's get this merged, and then we can debate ways to get the current RamaLama into the container at runtime.

Adding a -v .:/run/ramalama mount might be enough. But I would like to get this in for this week's release.
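For example, something along these lines (a hypothetical invocation assembled from the command in the PR description plus the mount suggested above):

$ podman run -v "$(pwd)":/run/ramalama --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user --userns=keep-id -ti --privileged quay.io/ramalama/ramalama ramalama run granite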

@rhatdan rhatdan merged commit c6a91af into containers:main Feb 17, 2025
17 checks passed