jg/flash-attn-4 · 82ae7f33 · fused attention kernel for batch size 1 · Mar 20, 2024
ik/fix_k_cache_backend_tests · 68e4fed4 · Now fix test-quantize-fns · Mar 21, 2024
ik/try_fix_rocm_k_cache · a710d58d · Try fix quantized k-cache on ROCm · Mar 21, 2024
gg/metal-dequant-align · 072c56fc · metal : fix the fix · Mar 22, 2024
gg/enable-cb-default · 31f2d03f · server : enable continuous batching by default · Mar 22, 2024
patch-1 · 12aa74ba · minor : spacing · Mar 22, 2024
gg/hf-args · 8c3d5b5a · common : remove defaults · Mar 22, 2024
ik/quantize_not_repeating · 0e826d12 · quantize: be able to specify the token embedding tensor type · Mar 22, 2024
gg/flash-attn-rebase · 3a468e6f · llama : fix type of KQ_mask and KQ_pos · Mar 22, 2024
ceb/fix-win-unicode-fpaths · d05c13b3 · llama : fix BPE LF token on MSVC · Mar 23, 2024
sl/cuda-f16-fix3 · 210e4691 · cuda : fix LLAMA_CUDA_F16 build · Mar 25, 2024
ik/test_quantize_fns · 6f20e267 · Include IQ2_XXS and IQ2_XS in test-quantize-fns · Mar 25, 2024
ik/quantize_with_kv_overrides · 9c5fd6be · minor : spacing · Mar 26, 2024
ceb/wpm-portable-tolower · 87a6088f · rename unicodedata.{cpp,h} to unicode-data.{cpp,h} · Mar 26, 2024
gg/flash-attn-wip · 6be02b59 · cuda : fix build · Mar 27, 2024
compilade/fix-command-r · 64b7d858 · llama : fix command-r inference · Mar 28, 2024
gg/flash-attn-a · 4c190ba6 · cuda : reduce registers · Mar 28, 2024
ceb/bert-tokenizer-fixes · a37696d4 · speculative : more robust tokenizer comparison · Apr 04, 2024
gg/authors · 072e0a4d · scripts : add LICENSE and gen-authors.sh to sync · Apr 09, 2024
gg/imatrix-remove-assert · 8b495540 · imatrix : remove invalid assert · Apr 12, 2024