Add entrypoint container images #819
Conversation
Reviewer's Guide by Sourcery

This pull request introduces functionality to add entrypoints to container images for whisper-server and llama-server.

Sequence diagram for adding entrypoints to container images:

```mermaid
sequenceDiagram
    participant CB as container_build.sh
    participant A as add_entrypoints()
    participant AE as add_entrypoint()
    participant P as podman/conman
    CB->>A: build()
    A->>AE: add_entrypoint "whisper-server"
    AE->>P: Build image with entrypoint
    P-->>AE: Image built
    A->>AE: add_entrypoint "llama-server"
    AE->>P: Build image with entrypoint
    P-->>AE: Image built
    AE-->>A: Entrypoints added
    A-->>CB: Complete
```
@benoitf PTAL
Hey @rhatdan - I've reviewed your changes - here's some feedback:

Overall Comments:
- Consider using a single `add_entrypoint` call with a loop inside the function to reduce code duplication.
- It might be better to pass the entrypoint script name as an argument to `add_entrypoints`.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
container_build.sh
Outdated
```shell
add_entrypoint() {
    containerfile=$(mktemp)
    cat > ${containerfile} <<EOF
FROM $2
ENTRYPOINT [ "/usr/bin/$3.sh" ]
EOF
    eval $1 --no-cache -t "$2-$3" -f ${containerfile} .
    rm ${containerfile}
```
suggestion (bug_risk): Consider ensuring cleanup of temporary files on errors.
If the eval or the build command fails, the temporary file created by mktemp may remain. Using a trap to remove the file on error would improve the robustness of this function.
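One possible way to follow this suggestion, sketched under the assumption that container_build.sh runs under bash (which supports RETURN traps inside functions); the argument handling is kept as in the diff above:

```shell
add_entrypoint() {
    local containerfile
    containerfile=$(mktemp)
    # Clean up the temporary Containerfile when the function returns, whether
    # or not the build below succeeds; the trap then unregisters itself.
    trap 'rm -f "${containerfile}"; trap - RETURN' RETURN
    cat > "${containerfile}" <<EOF
FROM $2
ENTRYPOINT [ "/usr/bin/$3.sh" ]
EOF
    eval "$1" --no-cache -t "$2-$3" -f "${containerfile}" .
}
```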
container_build.sh
Outdated
```shell
FROM $2
ENTRYPOINT [ "/usr/bin/$3.sh" ]
EOF
eval $1 --no-cache -t "$2-$3" -f ${containerfile} .
```
On my end it's failing on `--no-cache`: parameter not found in the podman CLI.
It seems `$1`, `$2`, `$3` are not the expected parameters:
- `$1` = podman
- `$2` = build
- `$3` = whisper-server

So after changing the order of the arguments in the script, I was able to build an image and start it, and was able to use curl to query the model and it worked 👍
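A minimal sketch of the mismatch described above; the invocation shown and the image name are illustrative assumptions, not the actual call sites in container_build.sh:

```shell
# With the builder command split into two words, $1="podman", $2="build",
# $3="whisper-server", so the generated Containerfile becomes "FROM build"
# and the build fails:
#   add_entrypoint podman build whisper-server

# Passing the build command as a single quoted argument keeps $2 as the base
# image and $3 as the entrypoint name, which is what the function expects.
add_entrypoint "podman build" "quay.io/ramalama/ramalama" "whisper-server"
```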
314ce8c to 245f6db
Install podman-remote and ramalama so we can use ramalama from within a container.

```shell
$ podman run --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user --userns=keep-id -ti --privileged quay.io/ramalama/ramalama ramalama run granite
```

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@benoitf This ramalama image now contains the upstream version of RamaLama along with podman-remote, so we should be able to run RamaLama in this container, with a command like:

```shell
$ podman run --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user --userns=keep-id -ti --privileged quay.io/ramalama/ramalama ramalama run granite
```
Tried the dedicated images.

I think there is an issue in the script (not related to this PR). I provided `-e MODEL_CHAT_FORMAT="llama-2"` and got a core dump about a missing `--jinja`:

```shell
podman run --rm -it --env=HOME=/tmp --cap-drop=all --security-opt=no-new-privileges --init --pull=newer -v $HOME/.local/share/containers/podman-desktop/extensions-storage/redhat.ai-lab/models:/tmp/models -p 9000:9000 --device /dev/dri -e PORT=9000 -e HOST=0.0.0.0 -e MODEL_PATH=/tmp/models/hf.bartowski.granite-3.1-8b-instruct-GGUF/granite-3.1-8b-instruct-Q4_K_M.gguf -e MODEL_CHAT_FORMAT="llama-2" quay.io/ramalama/ramalama-llama-server:latest
terminate called after throwing an instance of 'std::runtime_error'
  what():  error: the supplied chat template is not supported: llama-2
note: llama.cpp was started without --jinja, we only support commonly used templates
/usr/bin/llama-server.sh: line 14:     3 Aborted    (core dumped) llama-server --model ${MODEL_PATH} --host ${HOST:=0.0.0.0} --port ${PORT:=8001} --gpu_layers ${GPU_LAYERS:=0} ${CHAT_FORMAT}
```

It seems that the llama.cpp server accepts "llama2" and crashes with "llama-2", but llama-cpp-python uses "llama-2":
https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#chat-completion

I'm wondering if we could, for example, replace "llama-2" with "llama2" in the llama-server script for better backward compatibility.
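A minimal sketch of the substitution suggested above, assuming llama-server.sh derives a `CHAT_FORMAT` flag from the `MODEL_CHAT_FORMAT` environment variable (the variable names and the final command are taken from the log output; the real script may differ):

```shell
# Map the llama-cpp-python template name to the one llama.cpp's server accepts.
if [ "${MODEL_CHAT_FORMAT}" = "llama-2" ]; then
    MODEL_CHAT_FORMAT="llama2"
fi

# Only pass a chat template flag when one was requested.
CHAT_FORMAT=""
if [ -n "${MODEL_CHAT_FORMAT}" ]; then
    CHAT_FORMAT="--chat-template ${MODEL_CHAT_FORMAT}"
fi

llama-server --model "${MODEL_PATH}" --host "${HOST:=0.0.0.0}" --port "${PORT:=8001}" --gpu_layers "${GPU_LAYERS:=0}" ${CHAT_FORMAT}
```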
```shell
# link podman-remote to podman for use by RamaLama
ln -sf /usr/bin/podman-remote /usr/bin/podman
git clone https://github.com/containers/ramalama
cd ramalama
```
How about a pip install? At least it wouldn't be installing from the main branch, a more unstable target...
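A minimal sketch of that alternative, assuming RamaLama is published on PyPI under the name `ramalama` and that a released version is acceptable inside the image:

```shell
# link podman-remote to podman for use by RamaLama (as in the diff above)
ln -sf /usr/bin/podman-remote /usr/bin/podman
# install a released version of RamaLama from PyPI instead of cloning main
pip install ramalama
```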
Could it just copy the files from the current checkout? If I change some scripts, I would assume it embeds the files from my current working directory (which can be my changes, a tag, etc.) rather than the latest stable or main branch.

Also, if I check out the repo two months later, it would use the same files (the ones stored in the current working copy).
We have a chicken and egg situation:

Currently we are building the images at the same time as the release. Juggling the two at the same time is going to be difficult.

When I build the 0.7.0 image, I usually do this before cutting the release. Installing the older version, say 0.6.0, into the 0.7.0 image seems wrong. I don't think that is the proper solution.

Building a user's home checked-out version is also very risky, because that version may never be released to GitHub, period. We could change the scripts to allow it to install the current version if this is really necessary, but I don't think it is, and users could hack their scripts themselves.
I've thought about it over the weekend and my conclusion would be that there should be a dedicated image with only the CLI (without any inference implementation like llama.cpp).

The ramalama-cli image would be the image for the Python code of RamaLama that then runs the containers (so with podman-remote packaged inside this image). And I would not expect to have podman-remote or any Python code/dependency in any of the ramalama images that provide llama.cpp. The RamaLama CLI image should then also be smaller (and the others as well).

ramalama-cli (Python code + podman-remote) --> starts containers like the llama.cpp, whisper.cpp or vLLM images (which only contain the runner)

> Building a user's home checked-out version is also very risky, because that version may never be released to GitHub, period.

If I do `make images` I expect that it builds a specific tag like `:next` or `:dev` or another one from my local copy. Then I can test the images locally. The CI job would publish only for tags or from main branch code, so I don't see any issues.
> I've thought about it over the weekend and my conclusion would be that there should be a dedicated image with only the CLI (without any inference implementation like llama.cpp).
>
> The ramalama-cli image would be the image for the Python code of RamaLama that then runs the containers (so with podman-remote packaged inside this image). And I would not expect to have podman-remote or any Python code/dependency in any of the ramalama images that provide llama.cpp. The RamaLama CLI image should then also be smaller (and the others as well).
>
> ramalama-cli (Python code + podman-remote) --> starts containers like the llama.cpp, whisper.cpp or vLLM images (which only contain the runner)

This almost works, except on Windows: then we have to package and maintain Python on Windows, which is not good.

> Building a user's home checked-out version is also very risky, because that version may never be released to GitHub, period.
>
> If I do `make images` I expect that it builds a specific tag like `:next` or `:dev` or another one from my local copy. Then I can test the images locally. The CI job would publish only for tags or from main branch code, so I don't see any issues.
> We have a chicken and egg situation:

I think we solve the chicken and egg situation if we do something like `COPY ../../* /ramalama/` (maybe even create an .rpm/tarball during the build script, copy it, and install it). A git clone of the main branch could be in any sort of random state.
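A minimal sketch of how the local checkout could be copied into the image at build time, assuming the build runs from the repository root so the sources are inside the build context; the base image name and the `pip install .` step are illustrative assumptions:

```shell
# Build a CLI layer from the current working copy instead of cloning main.
containerfile=$(mktemp)
cat > ${containerfile} <<EOF
FROM quay.io/ramalama/ramalama
# Copy the checked-out sources from the build context into the image.
COPY . /ramalama/
RUN cd /ramalama && pip install .
EOF
podman build -t quay.io/ramalama/ramalama:dev -f ${containerfile} .
rm ${containerfile}
```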
> Currently we are building the images at the same time as the release. Juggling the two at the same time is going to be difficult.
>
> When I build the 0.7.0 image, I usually do this before cutting the release. Installing the older version, say 0.6.0, into the 0.7.0 image seems wrong. I don't think that is the proper solution.
>
> Building a user's home checked-out version is also very risky, because that version may never be released to GitHub, period. We could change the scripts to allow it to install the current version if this is really necessary, but I don't think it is, and users could hack their scripts themselves.

I don't think it is worth managing another image for adding RamaLama to the base image. There is a maintenance issue, and it is a pain for every user who wants to use RamaLama within a container. Let's get this merged and then we can debate ways to get the current RamaLama into the container at runtime. Adding a `-v .:/run/ramalama` might be enough. But I would like to get this in for this week's release.
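A hypothetical illustration of the `-v .:/run/ramalama` idea, modeled on the podman run command shown earlier; whether the image actually picks up sources from /run/ramalama at runtime is an assumption:

```shell
# Run the published image, but also mount the current checkout so local
# RamaLama changes are visible inside the container at runtime.
podman run --env XDG_RUNTIME_DIR -v $HOME:$HOME -v /run/user:/run/user \
    -v .:/run/ramalama --userns=keep-id -ti --privileged \
    quay.io/ramalama/ramalama ramalama run granite
```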