GitLab
gg/flash-attn-rebase · 3a468e6f · llama : fix type of KQ_mask and KQ_pos · Mar 22, 2024
ik/quantize_not_repeating · 0e826d12 · quantize: be able to specify the token embedding tensor type · Mar 22, 2024
gg/hf-args · 8c3d5b5a · common : remove defaults · Mar 22, 2024
patch-1 · 12aa74ba · minor : spacing · Mar 22, 2024
gg/enable-cb-default · 31f2d03f · server : enable continuous batching by default · Mar 22, 2024
gg/metal-dequant-align · 072c56fc · metal : fix the fix · Mar 22, 2024
ik/try_fix_rocm_k_cache · a710d58d · Try fix quantized k-cache on ROCm · Mar 21, 2024
ik/fix_k_cache_backend_tests · 68e4fed4 · Now fix test-quantize-fns · Mar 21, 2024
jg/flash-attn-4 · 82ae7f33 · fused attention kernel for batch size 1 · Mar 20, 2024
compilade/fix-server-tests-penalty · 9a424a38 · server : fix tests expecting old repeat penalty · Mar 19, 2024
jg/flash-attn · 7fca4586 · pragma unroll, use_mask template parameter · Mar 19, 2024
gg/repeng · 0a9bc301 · control-vectors : minor code style updates · Mar 14, 2024
gg/metal-embed · abf0afd0 · ci : fix iOS builds to use embedded library · Mar 14, 2024
ik/try_fix_iq1s_sycl · 9f805264 · Attempt 2 · Mar 12, 2024
ik/even_better_iq1s · 5440a127 · iq1_s: fix dequantize on the CPU · Mar 11, 2024
gg/try-fix-sycl-iq1_s · 76be02ae · sycl : fix grid type · Mar 11, 2024
sycl_q3s_q1s · 989e15b3 · Merge branch 'master' into sycl_q3s_q1s · Mar 11, 2024
gritlm-pr · b54afce9 · mostly style fixes; fix KQ_mask comment · Mar 09, 2024
gg/bert-f16 · 0ba20ed9 · llama : compute BERT graph with F16 K, V · Mar 07, 2024
revert-5901-fix_set_gpu · b5b02703 · Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" · Mar 07, 2024