Branches · Till-Ole Herbst / Llama.Cpp · GitLab

cuda-70b-2
f7bb5e91 · CUDA: GQA implementation · Jul 22, 2023

ggml-backends
d273bfd2 · allocator: cleanup, more comments · Jul 22, 2023

ggml-backends-metal
d45c1631 · metal : rewrite to fit new backend interface correctly (WIP) · Jul 20, 2023

refactor-mpi
04923631 · mpi : fix after master merge · Jul 09, 2023

llama_server_completions
26cc1bd7 · llama : uniform variable names + struct init · Jul 05, 2023

llama_server_timings
ff6e39f1 · use javascript generators as much cleaner API · Jul 05, 2023

test-mac-os-ci
f46db27e · ci : disable FMA on Mac OS · Jul 05, 2023

try-fix-metal
5cc672a9 · metal : try to utilize more of the shared memory using smaller views · Jun 26, 2023

avoid-gnu-source
78fafcaf · ggml : do not use _GNU_SOURCE gratuitously · Jun 25, 2023

fix_clblast
20054a38 · Fix directory name · May 27, 2023

chunks
a1cdd29c · ggml : rms_norm in chunks · May 20, 2023

steering
95dc4d72 · Merge 'origin/master' into steering · May 19, 2023

f16c
40ec4882 · ggml : use F16C conversion when available · May 17, 2023

dequantize-matmul-3-gg
a3e6d622 · cuda : alternative q4_q8 kernel · May 12, 2023

remove-vzip
e116eb63 · ggml : speed-up Q5_0 + Q5_1 at 4 threads · May 11, 2023

jed/spm-clblast
4baa8563 · Fix build · May 06, 2023

ci_cublas
31ff9e2e · ci : add cublas to windows release · May 03, 2023

q4_3-range-fix
102cd980 · ggml : Q4_3c using 2x "Full range" approach · Apr 23, 2023

q4_0-q4_2-range-fix
71e6ae37 · ggml : continue from #729 (wip) · Apr 22, 2023

gg/rmse_quantization
a0242a83 · Minor, plus rebase on master · Apr 22, 2023
