ik/better_q2_k_s · 9fd1e83f · Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 · Jan 17, 2024
gg/iq2-refactor-and-tests · 49bafe09 · tests : avoid creating RNGs for each tensor · Jan 17, 2024
ik/imatrix_legacy_quants · bb9abb5c · imatrix: guard Q4_0/Q5_0 against ffn_down craziness · Jan 16, 2024
gg/add-phixtral · 9998ecd1 · llama : add phixtral support (wip) · Jan 13, 2024
gg/update-phi2-convert · 1fb563eb · py : try to fix flake stuff · Jan 13, 2024
ik/iq2_2.31bpw · 9bfcb16f · Add llama enum for IQ2_XS · Jan 11, 2024
gg/server-infill-empty-prompt-4027 · 24096933 · server : try to fix infill when prompt is empty · Jan 09, 2024
gg/fix-vld1q_s8_x4-4872 · 7216af5c · ggml : fix 32-bit ARM compat (cont) · Jan 09, 2024
passkey · d57cb9c2 · passkey : add readme · Jan 08, 2024
gg/remove-gqa-check-4657 · 7cfde781 · llama : remove redundant GQA check · Jan 06, 2024
gg/metal-opt-mul-mat-id · 9f51f3e6 · metal : opt mul_mm_id · Jan 02, 2024
cuda-cublas-opts · 4cc78d38 · ggml : force F32 precision for ggml_mul_mat · Jan 02, 2024
gg/avoid-mutex · b5af7ad8 · llama : refactor quantization to avoid <mutex> header · Jan 02, 2024
gg/hf-auto-dl · 120a1a55 · llama : auto download HF models if URL provided · Jan 02, 2024
gg/gpu-prec-tests · f64e4f04 · ggml : testing GPU FP precision via quantized CPY · Dec 30, 2023
gg/test-arm · f32f30bc · test · Dec 26, 2023
gg/ggml_scale · ab1b7516 · Merge branch 'master' into gg/ggml_scale · Dec 21, 2023
ceb/fix-draft-model-default · 7c87353e · common : remove incorrect --model-draft default · Dec 21, 2023
gg/cublas-f32 · a40f6110 · ggml : force F32 precision for ggml_mul_mat · Dec 19, 2023
gg/plamo-test · 3c734f49 · plamo : testing · Dec 18, 2023