ceb/bert · 7286b83d · BERT WIP · Feb 06, 2024
gg/convert-fix-byte-tokens · adcf16fd · py : fix empty bytes arg · Feb 05, 2024
ik/ggml-quants-cpp · 91c453fb · One cannot possibly be defining static_assert in a C++ compilation · Feb 05, 2024
gg/flash-attn-interleave-cc · 49a483e0 · wip · Feb 04, 2024
gg/flash-attn-32x8 · a647257b · cuda : express strides with helper constants · Feb 04, 2024
gg/flash-attn-cuda · b957b8f5 · cuda : add flash_attn kernel (wip) · Feb 01, 2024
flash-attn-cuda · ac26f270 · cuda : increase C to 128 for better performance · Feb 01, 2024
gg/flash-attn-mask-f16 · 1ad42b1f · ggml : ggml_soft_max uses F16 mask · Jan 31, 2024
ik/fix_iq3xxs_metal · 719a0871 · iq3_xxs: forgotten update of the grid points · Jan 30, 2024
gg/flash-attn-simd · 2bf91c53 · metal : clean up · Jan 25, 2024
gg/flash-attn-wip3 · 6ccbd177 · wip · Jan 24, 2024
gg/flash-attn-wip4 · da23b56f · wip : no ic 8 step · Jan 24, 2024
gg/flash-attn-wip2 · 06c2d0d1 · wip · Jan 23, 2024
gg/flash-attn-online · a9681feb · ggml : online attention (CPU) · Jan 20, 2024
ceb/fix-msvc-build · 32a392fe · try a differerent fix · Jan 19, 2024
ceb/restore-convert · 4a3bc152 · py : linting with mypy and isort · Jan 19, 2024
ceb/nomic-vulkan-fix-add · 14532151 · kompute : fix ggml_add kernel · Jan 19, 2024
ik/faster_hellaswag · ccc78a20 · hellaswag: speed up even more by parallelizing log-prob evaluation · Jan 18, 2024
gg/imatrix-gpu-4931 · 2917e6b5 · Merge branch 'master' into gg/imatrix-gpu-4931 · Jan 17, 2024
gg/fix-spm-added-tokens-dict-4958 · 23742deb · py : fix padded dummy tokens (I hope) · Jan 17, 2024
Page 1 of 13