Tags
Tags mark specific points in a repository's history as important, such as release builds.
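A minimal sketch of how tags like the ones listed below are created and inspected with plain git; the repository path and commit message here are illustrative, not taken from the project:

```shell
set -e
# Work in a throwaway repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "initial commit"

# A lightweight tag simply marks this point in history.
git tag b1783
# An annotated tag additionally stores a message, tagger, and date.
git tag -a b1784 -m "release build"

# List tags matching a pattern, as the page below does for b17xx builds.
git tag --list 'b17*'
```

Lightweight tags are enough for build markers; annotated tags are preferred for releases because they carry their own metadata.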
b1821 · 43f76bf1 · main : print total token count and tokens consumed so far (#4874) · Jan 11, 2024
b1820 · 2f043328 · server : fix typo in model name (#4876) · Jan 11, 2024
b1819 · 2a7c94db · metal : put encoder debug group behind a define (#4873) · Jan 11, 2024
b1818 · 64802ec0 · sync : ggml · Jan 11, 2024
b1810 · 5c1980d8 · server : fix build + rename enums (#4870) · Jan 11, 2024
b1808 · 57d016ba · llama : add additional suffixes for model params (#4834) · Jan 10, 2024
b1807 · 329ff615 · llama : recognize 1B phi models (#4847) · Jan 10, 2024
b1806 · d34633d8 · clip : support more quantization types (#4846) · Jan 10, 2024
b1803 · 36e5a08b · llava-cli : don't crash if --image flag is invalid (#4835) · Jan 09, 2024
b1796 · 18c2e175 · ggml : fix vld1q_s8_x4 32-bit compat (#4828) · Jan 09, 2024
b1795 · 8f900abf · CUDA: faster softmax via shared memory + fp16 math (#4742) · Jan 09, 2024
b1794 · 1fc2f265 · common : fix the short form of `--grp-attn-w`, not `-gat` (#4825) · Jan 08, 2024
b1792 · dd5ae064 · SOTA 2-bit quants (#4773) · Jan 08, 2024
b1791 · 668b31fc · swift : exclude ggml-metal.metal from the package (#4822) · Jan 08, 2024
b1789 · 52531fdf · main : add self-extend support (#4815) · Jan 08, 2024
b1788 · b0034d93 · examples : add passkey test (#3856) · Jan 08, 2024
b1786 · 226460cc · llama-bench : add no-kv-offload parameter (#4812) · Jan 07, 2024
b1785 · d5a410e8 · CUDA: fixed redundant value dequantization (#4809) · Jan 07, 2024
b1784 · 9dede37d · llama : remove unused vars (#4796) · Jan 07, 2024
b1783 · 3c36213d · llama : remove redundant GQA check (#4796) · Jan 07, 2024