Branches · Till-Ole Herbst / Llama.Cpp · GitLab

jg/flash-attn-4

82ae7f33 · fused attention kernel for batch size 1 · Mar 20, 2024
ik/fix_k_cache_backend_tests

68e4fed4 · Now fix test-quantize-fns · Mar 21, 2024
ik/try_fix_rocm_k_cache

a710d58d · Try fix quantized k-cache on ROCm · Mar 21, 2024
gg/metal-dequant-align

072c56fc · metal : fix the fix · Mar 22, 2024
gg/enable-cb-default

31f2d03f · server : enable continuous batching by default · Mar 22, 2024
patch-1

12aa74ba · minor : spacing · Mar 22, 2024
gg/hf-args

8c3d5b5a · common : remove defaults · Mar 22, 2024
ik/quantize_not_repeating

0e826d12 · quantize: be able to specify the token embedding tensor type · Mar 22, 2024
gg/flash-attn-rebase

3a468e6f · llama : fix type of KQ_mask and KQ_pos · Mar 22, 2024
ceb/fix-win-unicode-fpaths

d05c13b3 · llama : fix BPE LF token on MSVC · Mar 23, 2024
sl/cuda-f16-fix3

210e4691 · cuda : fix LLAMA_CUDA_F16 build · Mar 25, 2024
ik/test_quantize_fns

6f20e267 · Include IQ2_XXS and IQ2_XS in test-quantize-fns · Mar 25, 2024
ik/quantize_with_kv_overrides

9c5fd6be · minor : spacing · Mar 26, 2024
ceb/wpm-portable-tolower

87a6088f · rename unicodedata.{cpp,h} to unicode-data.{cpp,h} · Mar 26, 2024
gg/flash-attn-wip

6be02b59 · cuda : fix build · Mar 27, 2024
compilade/fix-command-r

64b7d858 · llama : fix command-r inference · Mar 28, 2024
gg/flash-attn-a

4c190ba6 · cuda : reduce registers · Mar 28, 2024
ceb/bert-tokenizer-fixes

a37696d4 · speculative : more robust tokenizer comparison · Apr 04, 2024
gg/authors

072e0a4d · scripts : add LICENSE and gen-authors.sh to sync · Apr 09, 2024
gg/imatrix-remove-assert

8b495540 · imatrix : remove invalid assert · Apr 12, 2024