ik/fix_iq3xxs_metal · 719a0871 · iq3_xxs: forgotten update of the grid points · Jan 30, 2024
gg/flash-attn-mask-f16 · 1ad42b1f · ggml : ggml_soft_max uses F16 mask · Jan 31, 2024
flash-attn-cuda · ac26f270 · cuda : increase C to 128 for better performance · Feb 01, 2024
gg/flash-attn-cuda · b957b8f5 · cuda : add flash_attn kernel (wip) · Feb 01, 2024
gg/flash-attn-32x8 · a647257b · cuda : express strides with helper constants · Feb 04, 2024
gg/flash-attn-interleave-cc · 49a483e0 · wip · Feb 04, 2024
ik/ggml-quants-cpp · 91c453fb · One cannot possibly be defining static_assert in a C++ compilation · Feb 05, 2024
gg/convert-fix-byte-tokens · adcf16fd · py : fix empty bytes arg · Feb 05, 2024
ceb/bert · 7286b83d · BERT WIP · Feb 06, 2024
ik/fix_warnings · 4246b71a · Fix compiler warnings (shadow variable) · Feb 13, 2024
ik/iq1_s · 5c977221 · iq1_s: slightly faster dot product · Feb 13, 2024
ceb/nomic-bert · ccd757a1 · convert : fix mistakes from refactoring · Feb 13, 2024
gg/hf · e856bfed · hf : add support for --repo and --file · Feb 15, 2024
gg/fix-android · 974e3cad · ggml : try another fix · Feb 17, 2024
gg/rename-n_ctx · 47c662b0 · fix some spaces added by IDE in math op · Feb 18, 2024
gg/metal-batched · 412735ec · Merge branch 'master' into gg/metal-batched · Feb 19, 2024
gg/flash-attn-sync · f249c997 · llama : adapt to F16 KQ_pos · Feb 19, 2024
sl/fix-quant-kv-shift · 5271c756 · llama : fix K-shift with quantized K (wip) · Feb 22, 2024
gg/py-minor-fixes · 56c04715 · py : minor fixes · Feb 22, 2024
gg/float-pos · 608f4498 · swift : fix build · Feb 23, 2024