ik/fix_iq3xxs_metal · 719a0871 · iq3_xxs: forgotten update of the grid points · Jan 30, 2024
gg/flash-attn-mask-f16 · 1ad42b1f · ggml : ggml_soft_max uses F16 mask · Jan 31, 2024
flash-attn-cuda · ac26f270 · cuda : increase C to 128 for better performance · Feb 01, 2024
gg/flash-attn-cuda · b957b8f5 · cuda : add flash_attn kernel (wip) · Feb 01, 2024
gg/flash-attn-32x8 · a647257b · cuda : express strides with helper constants · Feb 04, 2024
gg/flash-attn-interleave-cc · 49a483e0 · wip · Feb 04, 2024
ik/ggml-quants-cpp · 91c453fb · One cannot possibly be defining static_assert in a C++ compilation · Feb 05, 2024
gg/convert-fix-byte-tokens · adcf16fd · py : fix empty bytes arg · Feb 05, 2024
ceb/bert · 7286b83d · BERT WIP · Feb 06, 2024
ik/fix_warnings · 4246b71a · Fix compiler warnings (shadow variable) · Feb 13, 2024
ik/iq1_s · 5c977221 · iq1_s: slightly faster dot product · Feb 13, 2024
ceb/nomic-bert · ccd757a1 · convert : fix mistakes from refactoring · Feb 13, 2024
gg/hf · e856bfed · hf : add support for --repo and --file · Feb 15, 2024
gg/fix-android · 974e3cad · ggml : try another fix · Feb 17, 2024
gg/rename-n_ctx · 47c662b0 · fix some spaces added by IDE in math op · Feb 18, 2024
gg/metal-batched · 412735ec · Merge branch 'master' into gg/metal-batched · Feb 19, 2024
gg/flash-attn-sync · f249c997 · llama : adapt to F16 KQ_pos · Feb 19, 2024
sl/fix-quant-kv-shift · 5271c756 · llama : fix K-shift with quantized K (wip) · Feb 22, 2024
gg/py-minor-fixes · 56c04715 · py : minor fixes · Feb 22, 2024
gg/float-pos · 608f4498 · swift : fix build · Feb 23, 2024