# Append the '--gpu_split auto' flag for multi-GPU inference
python examples/chat.py -m <path_to_model> -mode llama -gs auto
```
The `-mode` argument chooses the prompt format to use. `raw` will produce a simple chatlog-style chat that works with base
models and various other finetunes. Run with `-modes` for a list of all available prompt formats. You can also provide
a custom system prompt with `-sp`.
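For instance, chatting with a base model using the raw format and a custom system prompt could look like this (the model path is a placeholder, and the system prompt text is just an example):

```
python examples/chat.py -m <path_to_model> -mode raw -sp "You are a helpful assistant."
```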
### Method 2: Install from release (with prebuilt extension)
Releases are available [here](https://github.com/turboderp/exllamav2/releases), with prebuilt wheels that contain the
extension binaries. Make sure to grab the right version, matching your platform, Python version (`cp`) and CUDA version.
Crucially, you must also match the prebuilt wheel with your PyTorch version, since the Torch C++ extension ABI breaks
with every new version of PyTorch.

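As a quick way to read off the tags you need, the interpreter can report its own wheel tag and platform. This is a minimal sketch using only the standard library; the commented-out part additionally assumes `torch` is importable:

```python
import sys
import platform

# The 'cp' tag in a wheel filename encodes the Python version,
# e.g. cp311 for Python 3.11
cp_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(cp_tag, platform.system(), platform.machine())

# With PyTorch installed, the matching Torch and CUDA versions are:
#   import torch
#   print(torch.__version__, torch.version.cuda)
```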
Either download an appropriate wheel or install directly from the appropriate URL:
### Method 3: Install from PyPI
A PyPI package is available as well. This is the same as the JIT version (see above). It can be installed with:
```
pip install exllamav2
```