feat: fundamental codebase refactor #156

Merged
merged 87 commits on Feb 12, 2025
Commits (87)
c785fd5
nn.
b4rtaz Feb 4, 2025
e13e4f3
ci.
b4rtaz Feb 4, 2025
c08751a
wvla fix.
b4rtaz Feb 4, 2025
9880c42
fix include.
b4rtaz Feb 4, 2025
37d62c0
fix include.
b4rtaz Feb 4, 2025
834b017
fix include.
b4rtaz Feb 4, 2025
5f33bc7
unique_ptr fix.
b4rtaz Feb 4, 2025
e9ca45a
include fix.
b4rtaz Feb 4, 2025
c3698ac
fix include.
b4rtaz Feb 4, 2025
4b5ba02
fix rms test.
b4rtaz Feb 4, 2025
c92554f
fix.
b4rtaz Feb 4, 2025
5590577
fix.
b4rtaz Feb 4, 2025
eb26560
build fix.
b4rtaz Feb 4, 2025
448e2a2
fix build.
b4rtaz Feb 4, 2025
a3f74f5
fix test.
b4rtaz Feb 4, 2025
540c199
convertF16toF32 for arm.
b4rtaz Feb 4, 2025
482a8b6
rope fix.
b4rtaz Feb 5, 2025
cd5b732
continuous memory flag.
b4rtaz Feb 5, 2025
8230191
optimization.
b4rtaz Feb 5, 2025
789a7eb
merge.
b4rtaz Feb 6, 2025
9ced6b8
move timer.
b4rtaz Feb 6, 2025
7a6f380
sgemm.
b4rtaz Feb 6, 2025
810093a
include.
b4rtaz Feb 6, 2025
ec0b27d
fix test.
b4rtaz Feb 6, 2025
acaece0
fixes.
b4rtaz Feb 7, 2025
f97fef6
silu arm.
b4rtaz Feb 7, 2025
52aa2d9
debug.
b4rtaz Feb 7, 2025
19613bb
rename.
b4rtaz Feb 7, 2025
44a99b4
fix.
b4rtaz Feb 7, 2025
9831f62
rope cache.
b4rtaz Feb 7, 2025
fc2f72a
matmul_Q80_Q40_F32.
b4rtaz Feb 7, 2025
c0e1d94
quantizeF32toQ80.
b4rtaz Feb 7, 2025
fb38158
fix debug.
b4rtaz Feb 7, 2025
6ecfb36
mul_F32.
b4rtaz Feb 7, 2025
738ec27
softmax_F32.
b4rtaz Feb 8, 2025
8ccbb07
refactor.
b4rtaz Feb 8, 2025
ff80ee1
stats.
b4rtaz Feb 8, 2025
0d9e7ed
fix.
b4rtaz Feb 9, 2025
abf9d9f
refactor.
b4rtaz Feb 9, 2025
0ca0ab8
chat.
b4rtaz Feb 9, 2025
969848a
api server (#158)
myan-o Feb 9, 2025
67e48c5
build fix.
b4rtaz Feb 9, 2025
ca7a7c2
fix build.
b4rtaz Feb 9, 2025
9228d98
inv rms.
b4rtaz Feb 9, 2025
85a1700
fix.
b4rtaz Feb 9, 2025
b9ac96c
fix test.
b4rtaz Feb 9, 2025
807dd58
rename.
b4rtaz Feb 9, 2025
d9883ed
maxPos fix.
b4rtaz Feb 9, 2025
050d7ad
fix max pos.
b4rtaz Feb 9, 2025
bf91008
fix pointer.
b4rtaz Feb 9, 2025
9cfdb0f
special vocab.
b4rtaz Feb 9, 2025
0cec4c9
fix api.
b4rtaz Feb 9, 2025
2b162c1
fix api restored position.
b4rtaz Feb 10, 2025
d2c9efe
rmsNorm_F32 avx2.
b4rtaz Feb 10, 2025
80af625
fixes.
b4rtaz Feb 10, 2025
0f56e84
cleaning.
b4rtaz Feb 10, 2025
8908ac3
matmul_Q80_Q40_F32 avx2.
b4rtaz Feb 11, 2025
0236e4a
fix.
b4rtaz Feb 11, 2025
c969589
fix.
b4rtaz Feb 11, 2025
0576a7f
fix.
b4rtaz Feb 11, 2025
577ef14
fix.
b4rtaz Feb 11, 2025
bbbf6f2
fix.
b4rtaz Feb 11, 2025
b9fb50e
mul_F32 avx2.
b4rtaz Feb 11, 2025
3eb6b78
Merge commit 'bbbf6f2ae724b9a0992592ab61bc10a12e626b61' into feat/nn
b4rtaz Feb 11, 2025
291122e
fix.
b4rtaz Feb 11, 2025
71a31bd
add_Q80_F32 avx2.
b4rtaz Feb 11, 2025
45a3ed1
fix test.
b4rtaz Feb 11, 2025
5d75f23
softmax_F32 avx2.
b4rtaz Feb 11, 2025
5ee4cdc
silu_F32 avx2.
b4rtaz Feb 11, 2025
e220039
matmul_Q80_Q40_F32 avx512.
b4rtaz Feb 11, 2025
d8511a4
fix avx512.
b4rtaz Feb 11, 2025
21b4619
fix softmax_F32 avx2.
b4rtaz Feb 11, 2025
6eb62a0
Merge branch 'main' into feat/nn
b4rtaz Feb 11, 2025
e17eaa2
max threads error.
b4rtaz Feb 11, 2025
f60a435
fix avx512.
b4rtaz Feb 11, 2025
29eb0ed
fix.
b4rtaz Feb 11, 2025
39a104f
network non-blocking mode.
b4rtaz Feb 12, 2025
37dc301
Merge branch 'main' into feat/nn
b4rtaz Feb 12, 2025
6b67aad
Merge branch 'main' into feat/nn
b4rtaz Feb 12, 2025
9724186
fix rope scaling.
b4rtaz Feb 12, 2025
683e143
fix memory leak.
b4rtaz Feb 12, 2025
46cac38
fix memory leak.
b4rtaz Feb 12, 2025
b90dd84
fix memory leak.
b4rtaz Feb 12, 2025
1f6b616
fix.
b4rtaz Feb 12, 2025
cd0203a
simplifying logic.
b4rtaz Feb 12, 2025
2663230
include fix.
b4rtaz Feb 12, 2025
3d39a26
fixes.
b4rtaz Feb 12, 2025
46 changes: 14 additions & 32 deletions .github/workflows/main.yml
@@ -3,9 +3,11 @@ on:
pull_request:
branches:
- main
- feat/nn
push:
branches:
- main
- feat/nn
jobs:
build-linux:
name: Linux
@@ -27,22 +29,12 @@ jobs:
id: build
run: |
make dllama
make dllama-api
make funcs-test
make quants-test
make tokenizer-test
make commands-test
make llama2-tasks-test
- name: funcs-test
run: ./funcs-test
- name: quants-test
run: ./quants-test
- name: tokenizer-test
run: ./tokenizer-test
- name: commands-test
run: ./commands-test
- name: llama2-tasks-test
run: ./llama2-tasks-test
make nn-cpu-test
make nn-cpu-ops-test
- name: nn-cpu-test
run: ./nn-cpu-test
- name: nn-cpu-ops-test
run: ./nn-cpu-ops-test

build-windows:
name: Windows
@@ -57,19 +49,9 @@
id: build
run: |
make dllama
make dllama-api
make funcs-test
make quants-test
make tokenizer-test
make commands-test
make llama2-tasks-test
- name: funcs-test
run: ./funcs-test
- name: quants-test
run: ./quants-test
- name: tokenizer-test
run: ./tokenizer-test
- name: commands-test
run: ./commands-test
- name: llama2-tasks-test
run: ./llama2-tasks-test
make nn-cpu-test
make nn-cpu-ops-test
- name: nn-cpu-test
run: ./nn-cpu-test
- name: nn-cpu-ops-test
run: ./nn-cpu-ops-test
3 changes: 2 additions & 1 deletion .gitignore
@@ -5,10 +5,11 @@
*.dSYM
*.data
*.temp
*.tmp
__pycache__

*-test
/socket-benchmark
/models
main
run*.sh
server
104 changes: 54 additions & 50 deletions Makefile
@@ -1,59 +1,63 @@
CXX = g++
CXXFLAGS = -std=c++11 -Werror -O3 -march=native -mtune=native -Wformat -Werror=format-security
CXXFLAGS = -std=c++11 -Werror -Wformat -Werror=format-security

ifndef TERMUX_VERSION
CXXFLAGS += -march=native -mtune=native
endif

ifdef DEBUG
CXXFLAGS += -g
else
CXXFLAGS += -O3
endif

ifdef WVLA
CXXFLAGS += -Wvla-extension
endif

# Conditional settings for Windows
ifeq ($(OS),Windows_NT)
LIBS = -lws2_32 # or -lpthreadGC2 if needed
DELETECMD = del /f
LIBS = -lws2_32
DELETE_CMD = del /f
else
LIBS = -lpthread
DELETECMD = rm -fv
DELETE_CMD = rm -fv
endif

.PHONY: all apps clean tests
.PHONY: clean dllama

apps: dllama dllama-api socket-benchmark
tests: funcs-test quants-test tokenizer-test commands-test llama2-tasks-test
all: apps tests
clean:
$(DELETECMD) *.o dllama dllama-* socket-benchmark mmap-buffer-* *-test *.exe
utils: src/utils.cpp
$(CXX) $(CXXFLAGS) -c src/utils.cpp -o utils.o
quants: src/quants.cpp
$(CXX) $(CXXFLAGS) -c src/quants.cpp -o quants.o
funcs: src/funcs.cpp
$(CXX) $(CXXFLAGS) -c src/funcs.cpp -o funcs.o
commands: src/commands.cpp
$(CXX) $(CXXFLAGS) -c src/commands.cpp -o commands.o
socket: src/socket.cpp
$(CXX) $(CXXFLAGS) -c src/socket.cpp -o socket.o
transformer: src/utils.cpp
$(CXX) $(CXXFLAGS) -c src/transformer.cpp -o transformer.o
tasks: src/tasks.cpp
$(CXX) $(CXXFLAGS) -c src/tasks.cpp -o tasks.o
llama2-tasks: src/llama2-tasks.cpp
$(CXX) $(CXXFLAGS) -c src/llama2-tasks.cpp -o llama2-tasks.o
mixtral-tasks: src/mixtral-tasks.cpp
$(CXX) $(CXXFLAGS) -c src/mixtral-tasks.cpp -o mixtral-tasks.o
tokenizer: src/tokenizer.cpp
$(CXX) $(CXXFLAGS) -c src/tokenizer.cpp -o tokenizer.o
app: src/app.cpp
$(CXX) $(CXXFLAGS) -c src/app.cpp -o app.o

dllama: src/apps/dllama/dllama.cpp utils quants funcs commands socket transformer tasks llama2-tasks mixtral-tasks tokenizer app
$(CXX) $(CXXFLAGS) src/apps/dllama/dllama.cpp -o dllama utils.o quants.o funcs.o commands.o socket.o transformer.o tasks.o llama2-tasks.o mixtral-tasks.o tokenizer.o app.o $(LIBS)
dllama-api: src/apps/dllama-api/dllama-api.cpp utils quants funcs commands socket transformer tasks llama2-tasks mixtral-tasks tokenizer app
$(CXX) $(CXXFLAGS) src/apps/dllama-api/dllama-api.cpp -o dllama-api utils.o quants.o funcs.o commands.o socket.o transformer.o tasks.o llama2-tasks.o mixtral-tasks.o tokenizer.o app.o $(LIBS)
socket-benchmark: src/apps/socket-benchmark/socket-benchmark.cpp socket
$(CXX) $(CXXFLAGS) src/apps/socket-benchmark/socket-benchmark.cpp -o socket-benchmark socket.o $(LIBS)

funcs-test: src/funcs-test.cpp funcs utils quants
$(CXX) $(CXXFLAGS) src/funcs-test.cpp -o funcs-test funcs.o utils.o quants.o $(LIBS)
quants-test: src/quants.cpp utils quants
$(CXX) $(CXXFLAGS) src/quants-test.cpp -o quants-test utils.o quants.o $(LIBS)
tokenizer-test: src/tokenizer-test.cpp tokenizer funcs commands utils quants
$(CXX) $(CXXFLAGS) src/tokenizer-test.cpp -o tokenizer-test tokenizer.o funcs.o commands.o utils.o quants.o $(LIBS)
commands-test: src/commands-test.cpp funcs commands utils quants transformer socket
$(CXX) $(CXXFLAGS) src/commands-test.cpp -o commands-test funcs.o commands.o utils.o quants.o transformer.o socket.o $(LIBS)
llama2-tasks-test: src/llama2-tasks-test.cpp utils quants funcs commands socket transformer tasks llama2-tasks tokenizer
$(CXX) $(CXXFLAGS) src/llama2-tasks-test.cpp -o llama2-tasks-test utils.o quants.o funcs.o commands.o socket.o transformer.o tasks.o llama2-tasks.o tokenizer.o $(LIBS)
$(DELETE_CMD) *.o dllama dllama-* socket-benchmark mmap-buffer-* *-test *.exe

# nn
nn-quants.o: src/nn/nn-quants.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
nn-core.o: src/nn/nn-core.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
nn-executor.o: src/nn/nn-executor.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
nn-network.o: src/nn/nn-network.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
llamafile-sgemm.o: src/nn/llamafile/sgemm.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
nn-cpu-ops.o: src/nn/nn-cpu-ops.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
nn-cpu.o: src/nn/nn-cpu.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
nn-cpu-test: src/nn/nn-cpu-test.cpp nn-quants.o nn-core.o nn-executor.o llamafile-sgemm.o nn-cpu-ops.o nn-cpu.o
$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBS)
nn-cpu-ops-test: src/nn/nn-cpu-ops-test.cpp nn-quants.o nn-core.o nn-executor.o llamafile-sgemm.o nn-cpu.o
$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBS)

# llm
tokenizer.o: src/tokenizer.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
llm.o: src/llm.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
app.o: src/app.cpp
$(CXX) $(CXXFLAGS) -c $^ -o $@
tokenizer-test: src/tokenizer-test.cpp tokenizer.o
$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBS)
dllama: src/dllama.cpp nn-quants.o nn-core.o nn-executor.o nn-network.o llamafile-sgemm.o nn-cpu-ops.o nn-cpu.o tokenizer.o llm.o app.o
$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBS)
dllama-api: src/dllama-api.cpp nn-quants.o nn-core.o nn-executor.o nn-network.o llamafile-sgemm.o nn-cpu-ops.o nn-cpu.o tokenizer.o llm.o app.o
$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBS)
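The new Makefile builds each src/nn/ module into its own object file (nn-quants, nn-core, nn-executor, nn-network, llamafile-sgemm, nn-cpu-ops, nn-cpu) and links the apps and tests against them; nn-cpu-ops-test would typically check the vectorized kernels against plain scalar references. As an illustration of what such a reference might compute — the actual softmax_F32 in src/nn/nn-cpu-ops.cpp is not shown in this diff, so the signature below is assumed — a minimal scalar softmax could look like this:

#include <cmath>
#include <cstddef>
#include <cstdio>

// Illustrative scalar reference; the real softmax_F32 (and its AVX2/AVX-512
// variants) may use a different signature and internal layout.
static void softmaxF32Reference(float *x, const size_t n) {
    if (n == 0) return;
    float maxVal = x[0];                       // subtract the max for numerical stability
    for (size_t i = 1; i < n; i++)
        if (x[i] > maxVal) maxVal = x[i];
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        x[i] = std::exp(x[i] - maxVal);
        sum += x[i];
    }
    const float invSum = 1.0f / sum;           // normalize so the outputs sum to 1
    for (size_t i = 0; i < n; i++)
        x[i] *= invSum;
}

int main() {
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    softmaxF32Reference(v, 4);
    printf("%f %f %f %f\n", v[0], v[1], v[2], v[3]);
    return 0;
}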
8 changes: 4 additions & 4 deletions src/apps/dllama-api/types.hpp → src/api-types.hpp
100644 → 100755
@@ -1,9 +1,9 @@
#ifndef DLLAMA_API_TYPES_HPP
#define DLLAMA_API_TYPES_HPP
#ifndef API_TYPES_HPP
#define API_TYPES_HPP

#include <string>

#include "../../common/json.hpp"
#include "json.hpp"

using json = nlohmann::json;

@@ -145,4 +145,4 @@ std::vector<ChatMessage> parseChatMessages(json &json){
return messages;
}

#endif
#endif