
Commit c00eba8

Merge remote-tracking branch 'origin/main' into recook

2 parents 5df7a07 + b564d05
18 files changed: +785 additions, -221 deletions

.github/dependabot.yml (6 additions, 2 deletions)

@@ -8,8 +8,12 @@ updates:
   - package-ecosystem: "pip" # See documentation for possible values
     directory: "/" # Location of package manifests
     schedule:
-      interval: "weekly"
+      interval: "daily"
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "weekly"
+      interval: "daily"
+  - package-ecosystem: "docker"
+    directory: "/"
+    schedule:
+      interval: "daily"

.github/workflows/build-and-release.yaml (2 additions, 2 deletions)

@@ -29,7 +29,7 @@ jobs:
         python -m pip install -e .[all]

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.17.0
+        uses: pypa/cibuildwheel@v2.18.0
         env:
           # disable repair
           CIBW_REPAIR_WHEEL_COMMAND: ""
@@ -56,7 +56,7 @@ jobs:
          platforms: linux/arm64

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.17.0
+        uses: pypa/cibuildwheel@v2.18.0
         env:
           CIBW_SKIP: "*musllinux* pp*"
           CIBW_REPAIR_WHEEL_COMMAND: ""

CHANGELOG.md (53 additions, 1 deletion)

@@ -7,9 +7,61 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.2.75]
+
+- feat: Update llama.cpp to ggerganov/llama.cpp@13ad16af1231ab2d245d35df3295bcfa23de1305
+- fix: segfault for models without eos / bos tokens by @abetlen in d99a6ba607a4885fb00e63e967964aa41bdbbbcb
+- feat: add MinTokensLogitProcessor and min_tokens argument to server by @twaka in #1333
+- misc: Remove unnecessary metadata lookups by @CISC in #1448
+
+## [0.2.74]
+
+- feat: Update llama.cpp to ggerganov/llama.cpp@b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2
+- fix: Enable CUDA backend for llava by @abetlen in 7f59856fa6f3e23f07e12fc15aeb9359dc6c3bb4
+- docs: Fix typo in README.md by @yupbank in #1444
+
+## [0.2.73]
+
+- feat: Update llama.cpp to ggerganov/llama.cpp@25c6e82e7a1ad25a42b0894e87d9b5c557409516
+- fix: Clear kv cache at beginning of image chat formats to avoid bug when image is evaluated first by @abetlen in ac55d0a175115d1e719672ce1cb1bec776c738b1
+
+## [0.2.72]
+
+- fix(security): Remote Code Execution by Server-Side Template Injection in Model Metadata by @retr0reg in b454f40a9a1787b2b5659cd2cb00819d983185df
+- fix(security): Update remaining jinja chat templates to use immutable sandbox by @CISC in #1441
+
+## [0.2.71]
+
+- feat: Update llama.cpp to ggerganov/llama.cpp@911b3900dded9a1cfe0f0e41b82c7a29baf3a217
+- fix: Make leading bos_token optional for image chat formats, fix nanollava system message by @abetlen in 77122638b4153e31d9f277b3d905c2900b536632
+- fix: free last image embed in llava chat handler by @abetlen in 3757328b703b2cd32dcbd5853271e3a8c8599fe7
+
+## [0.2.70]
+
+- feat: Update llama.cpp to ggerganov/llama.cpp@c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1
+- feat: fill-in-middle support by @CISC in #1386
+- fix: adding missing args in create_completion for functionary chat handler by @skalade in #1430
+- docs: update README.md @eltociear in #1432
+- fix: chat_format log where auto-detected format prints None by @balvisio in #1434
+- feat(server): Add support for setting root_path by @abetlen in 0318702cdc860999ee70f277425edbbfe0e60419
+- feat(ci): Add docker checks and check deps more frequently by @Smartappli in #1426
+- fix: detokenization case where first token does not start with a leading space by @noamgat in #1375
+- feat: Implement streaming for Functionary v2 + Bug fixes by @jeffrey-fong in #1419
+- fix: Use memmove to copy str_value kv_override by @abetlen in 9f7a85571ae80d3b6ddbd3e1bae407b9f1e3448a
+- feat(server): Remove temperature bounds checks for server by @abetlen in 0a454bebe67d12a446981eb16028c168ca5faa81
+- fix(server): Propagate flash_attn to model load by @dthuerck in #1424
+
+## [0.2.69]
+
+- feat: Update llama.cpp to ggerganov/llama.cpp@6ecf3189e00a1e8e737a78b6d10e1d7006e050a2
+- feat: Add llama-3-vision-alpha chat format by @abetlen in 31b1d95a6c19f5b615a3286069f181a415f872e8
+- fix: Change default verbose value of verbose in image chat format handlers to True to match Llama by @abetlen in 4f01c452b6c738dc56eacac3758119b12c57ea94
+- fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks by @abetlen in f116175a5a7c84569c88cad231855c1e6e59ff6e
+- fix: UTF-8 handling with grammars by @jsoma in #1415
+
 ## [0.2.68]

-- feat: Update llama.cpp to ggerganov/llama.cpp@
+- feat: Update llama.cpp to ggerganov/llama.cpp@77e15bec6217a39be59b9cc83d6b9afb6b0d8167
 - feat: Add option to enable flash_attn to Lllama params and ModelSettings by @abetlen in 22d77eefd2edaf0148f53374d0cac74d0e25d06e
 - fix(ci): Fix build-and-release.yaml by @Smartappli in #1413

Makefile (1 addition, 1 deletion)

@@ -16,7 +16,7 @@ build.debug:
 	CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Debug" python3 -m pip install --verbose --config-settings=cmake.verbose=true --config-settings=logging.level=INFO --config-settings=install.strip=false --editable .

 build.cuda:
-	CMAKE_ARGS="-DLLAMA_CUBLAS=on" python3 -m pip install --verbose -e .
+	CMAKE_ARGS="-DLLAMA_CUDA=on" python3 -m pip install --verbose -e .

 build.opencl:
 	CMAKE_ARGS="-DLLAMA_CLBLAST=on" python3 -m pip install --verbose -e .

README.md (3 additions, 3 deletions)

@@ -516,7 +516,7 @@ chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
 llm = Llama(
   model_path="./path/to/llava/llama-model.gguf",
   chat_handler=chat_handler,
-  n_ctx=2048, # n_ctx should be increased to accomodate the image embedding
+  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
 )
 llm.create_chat_completion(
     messages = [
@@ -547,10 +547,10 @@ llm = Llama.from_pretrained(
     repo_id="vikhyatk/moondream2",
     filename="*text-model*",
     chat_handler=chat_handler,
-    n_ctx=2048, # n_ctx should be increased to accomodate the image embedding
+    n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
 )

-respoonse = llm.create_chat_completion(
+response = llm.create_chat_completion(
     messages = [
         {
             "role": "user",

llama_cpp/__init__.py (1 addition, 1 deletion)

@@ -1,4 +1,4 @@
 from .llama_cpp import *
 from .llama import *

-__version__ = "0.2.68a1"
+__version__ = "0.2.75"

llama_cpp/_internals.py (7 additions, 5 deletions)

@@ -15,6 +15,7 @@

 from .llama_types import *
 from .llama_grammar import LlamaGrammar
+from ._utils import suppress_stdout_stderr

 import llama_cpp.llama_cpp as llama_cpp

@@ -47,9 +48,10 @@ def __init__(
         if not os.path.exists(path_model):
             raise ValueError(f"Model path does not exist: {path_model}")

-        self.model = llama_cpp.llama_load_model_from_file(
-            self.path_model.encode("utf-8"), self.params
-        )
+        with suppress_stdout_stderr(disable=verbose):
+            self.model = llama_cpp.llama_load_model_from_file(
+                self.path_model.encode("utf-8"), self.params
+            )

         if self.model is None:
             raise ValueError(f"Failed to load model from file: {path_model}")
@@ -201,7 +203,7 @@ def detokenize(self, tokens: List[int], special: bool = False) -> bytes:
         # NOTE: Llama1 models automatically added a space at the start of the prompt
         # this line removes a leading space if the first token is a beginning of sentence token
         return (
-            output[1:] if len(tokens) > 0 and tokens[0] == self.token_bos() else output
+            output[1:] if len(tokens) > 0 and tokens[0] == self.token_bos() and output[0:1] == b' ' else output
         )

@@ -810,4 +812,4 @@ def sample(
     def accept(self, ctx_main: _LlamaContext, id: int, apply_grammar: bool):
         if apply_grammar and self.grammar is not None:
             ctx_main.grammar_accept_token(self.grammar, id)
-        self.prev.append(id)
+        self.prev.append(id)
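
The detokenize change tightens the condition for stripping the first byte: it now also requires that the decoded output actually begins with a space. A standalone sketch of that guard (not the library code; the token ids and bos id below are made up for illustration):

    def strip_leading_space(output: bytes, tokens: list[int], bos_id: int) -> bytes:
        # Drop the first byte only when the prompt started with BOS *and* the
        # detokenized bytes really begin with a space; previously the byte was
        # dropped on BOS alone, which corrupted first tokens that emit no space.
        if len(tokens) > 0 and tokens[0] == bos_id and output[0:1] == b" ":
            return output[1:]
        return output

    # Made-up ids: bos_id=1
    assert strip_leading_space(b" Hello", [1, 15043], 1) == b"Hello"
    assert strip_leading_space(b"Hello", [1, 15043], 1) == b"Hello"  # no leading space -> unchanged
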

llama_cpp/_utils.py (11 additions, 14 deletions)

@@ -1,13 +1,15 @@
 import os
 import sys

-import sys
 from typing import Any, Dict

 # Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor
 outnull_file = open(os.devnull, "w")
 errnull_file = open(os.devnull, "w")

+STDOUT_FILENO = 1
+STDERR_FILENO = 2
+
 class suppress_stdout_stderr(object):
     # NOTE: these must be "saved" here to avoid exceptions when using
     # this context manager inside of a __del__ method
@@ -22,12 +24,8 @@ def __enter__(self):
         if self.disable:
             return self

-        # Check if sys.stdout and sys.stderr have fileno method
-        if not hasattr(self.sys.stdout, 'fileno') or not hasattr(self.sys.stderr, 'fileno'):
-            return self # Return the instance without making changes
-
-        self.old_stdout_fileno_undup = self.sys.stdout.fileno()
-        self.old_stderr_fileno_undup = self.sys.stderr.fileno()
+        self.old_stdout_fileno_undup = STDOUT_FILENO
+        self.old_stderr_fileno_undup = STDERR_FILENO

         self.old_stdout_fileno = self.os.dup(self.old_stdout_fileno_undup)
         self.old_stderr_fileno = self.os.dup(self.old_stderr_fileno_undup)
@@ -47,15 +45,14 @@ def __exit__(self, *_):
             return

         # Check if sys.stdout and sys.stderr have fileno method
-        if hasattr(self.sys.stdout, 'fileno') and hasattr(self.sys.stderr, 'fileno'):
-            self.sys.stdout = self.old_stdout
-            self.sys.stderr = self.old_stderr
+        self.sys.stdout = self.old_stdout
+        self.sys.stderr = self.old_stderr

-            self.os.dup2(self.old_stdout_fileno, self.old_stdout_fileno_undup)
-            self.os.dup2(self.old_stderr_fileno, self.old_stderr_fileno_undup)
+        self.os.dup2(self.old_stdout_fileno, self.old_stdout_fileno_undup)
+        self.os.dup2(self.old_stderr_fileno, self.old_stderr_fileno_undup)

-            self.os.close(self.old_stdout_fileno)
-            self.os.close(self.old_stderr_fileno)
+        self.os.close(self.old_stdout_fileno)
+        self.os.close(self.old_stderr_fileno)


 class MetaSingleton(type):
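
This patch stops calling sys.stdout.fileno() / sys.stderr.fileno() and uses the hardcoded descriptors 1 and 2 instead, so the redirect keeps working when sys.stdout has been replaced by an object without a real file descriptor (as in Colab notebooks). A minimal standalone sketch of the dup/dup2 pattern the context manager relies on; run_silenced is an illustrative helper, not the library's API:

    import os

    STDOUT_FILENO = 1  # hardcoded, as in the patch, so a replaced sys.stdout cannot break it

    def run_silenced(fn):
        """Run fn() with OS-level stdout pointed at /dev/null, then restore it."""
        devnull = open(os.devnull, "w")
        saved_fd = os.dup(STDOUT_FILENO)              # keep a copy of the real stdout fd
        try:
            os.dup2(devnull.fileno(), STDOUT_FILENO)  # fd 1 now writes to /dev/null
            return fn()
        finally:
            os.dup2(saved_fd, STDOUT_FILENO)          # restore the original stdout
            os.close(saved_fd)
            devnull.close()

    # Example: even direct writes to fd 1 (e.g. from native code) are silenced.
    run_silenced(lambda: os.write(STDOUT_FILENO, b"this never reaches the terminal\n"))
    print("back to normal stdout")

Because the redirection happens at the file-descriptor level, it also silences output produced by the underlying llama.cpp C/C++ code, which ordinary sys.stdout swapping cannot catch.
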
