Cloud GPU selection

Tested hardware & providers

We ran our tests on the following hardware:

The laptop setup uses an Intel(R) Core(TM) i7-12700H CPU.

Results are reported for the LLMs listed in the tables below (model tags as published on the Ollama hub), in the following quantization formats: q3_K_M, q4_K_M, q5_K_M, and q4_1 (codeqwen only).
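The source does not say how throughput was measured. One plausible way to get per-run ingestion (prompt processing) and generation speeds from Ollama is to read the timing fields returned by a non-streaming `/api/generate` call. There, `prompt_eval_count` and `eval_count` are token counts, and `prompt_eval_duration` and `eval_duration` are durations in nanoseconds. The sketch below assumes a local server on the default port 11434. The helper names `query_ollama` and `throughput` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def query_ollama(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Hypothetical helper: send one non-streaming request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def throughput(resp: dict) -> tuple[float, float]:
    """Hypothetical helper: return (ingestion tok/s, generation tok/s).

    Ollama reports durations in nanoseconds, so count * 1e9 / duration gives tokens per second.
    """
    ingestion = resp["prompt_eval_count"] * 1e9 / resp["prompt_eval_duration"]
    generation = resp["eval_count"] * 1e9 / resp["eval_duration"]
    return ingestion, generation
```

Against a running server with the model already pulled, `throughput(query_ollama("deepseek-coder:6.7b-instruct-q4_K_M", "Reverse a string in Python."))` would return one (ingestion, generation) pair for a single run. Keeping the network call separate from the arithmetic lets the conversion be checked without a server.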

Model	Ingestion mean (std)	Generation mean (std)
deepseek-coder:6.7b-instruct-q5_K_M	35.43 tok/s (±3.46)	23.68 tok/s (±0.74)
deepseek-coder:6.7b-instruct-q4_K_M	72.27 tok/s (±10.69)	36.82 tok/s (±1.25)
deepseek-coder:6.7b-instruct-q3_K_M	90.1 tok/s (±32.43)	50.34 tok/s (±1.28)
pxlksr/opencodeinterpreter-ds:6.7b-Q4_K_M	78.94 tok/s (±10.2)	37.95 tok/s (±1.65)
dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M	126.75 tok/s (±31.5)	50.05 tok/s (±0.84)
dolphin-mistral:7b-v2.6-dpo-laser-q3_K_M	89.47 tok/s (±29.91)	47.09 tok/s (±0.67)
codeqwen:7b-chat-v1.5-q4_1	171.72 tok/s (±53.37)	54.74 tok/s (±0.82)
dolphin-llama3:8b-v2.9-q4_K_M	131.89 tok/s (±33.37)	50.81 tok/s (±0.66)
phi3:3.8b-mini-instruct-4k-q4_K_M	271.40 tok/s (±52.48)	88.43 tok/s (±13.22)

Model	Ingestion mean (std)	Generation mean (std)
deepseek-coder:6.7b-instruct-q4_K_M	266.98 tok/s (±95.63)	75.53 tok/s (±1.56)
deepseek-coder:6.7b-instruct-q3_K_M	141.43 tok/s (±50.4)	73.69 tok/s (±1.61)
pxlksr/opencodeinterpreter-ds:6.7b-Q4_K_M	285.81 tok/s (±73.55)	75.14 tok/s (±3.13)
dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M	234.2 tok/s (±79.38)	71.54 tok/s (±1.0)
dolphin-mistral:7b-v2.6-dpo-laser-q3_K_M	114.54 tok/s (±38.24)	69.29 tok/s (±0.98)

Model	Ingestion mean (std)	Generation mean (std)
deepseek-coder:6.7b-instruct-q4_K_M	208.65 tok/s (±74.02)	78.68 tok/s (±1.64)
deepseek-coder:6.7b-instruct-q3_K_M	111.84 tok/s (±39.9)	71.66 tok/s (±1.75)
pxlksr/opencodeinterpreter-ds:6.7b-Q4_K_M	226.66 tok/s (±65.65)	77.26 tok/s (±2.72)
dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M	202.43 tok/s (±69.55)	73.9 tok/s (±0.87)
dolphin-mistral:7b-v2.6-dpo-laser-q3_K_M	112.82 tok/s (±38.46)	66.98 tok/s (±0.79)

Model	Ingestion mean (std)	Generation mean (std)
deepseek-coder:6.7b-instruct-q4_K_M	186.61 tok/s (±66.03)	79.62 tok/s (±1.52)
deepseek-coder:6.7b-instruct-q3_K_M	99.83 tok/s (±35.41)	84.47 tok/s (±1.69)
pxlksr/opencodeinterpreter-ds:6.7b-Q4_K_M	212.08 tok/s (±86.58)	79.02 tok/s (±3.35)
dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M	187.2 tok/s (±62.24)	75.91 tok/s (±1.0)
dolphin-mistral:7b-v2.6-dpo-laser-q3_K_M	102.36 tok/s (±34.29)	81.23 tok/s (±1.02)

Model	Ingestion mean (std)	Generation mean (std)
deepseek-coder:6.7b-instruct-q4_K_M	213.46 tok/s (±76.24)	49.97 tok/s (±1.01)
deepseek-coder:6.7b-instruct-q3_K_M	118.87 tok/s (±43.35)	54.72 tok/s (±1.31)
pxlksr/opencodeinterpreter-ds:6.7b-Q4_K_M	225.62 tok/s (±60.21)	49.39 tok/s (±1.9)
dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M	211.52 tok/s (±72.76)	47.27 tok/s (±0.58)
dolphin-mistral:7b-v2.6-dpo-laser-q3_K_M	120.13 tok/s (±41.09)	51.9 tok/s (±0.71)
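Each cell above reports a mean and a standard deviation (±), presumably computed over repeated runs of the same model. The source specifies neither the number of runs nor the estimator. The sketch below assumes the sample standard deviation (n − 1 denominator) and rounds to two decimals, which matches the look of the table cells. `format_stat` is a hypothetical helper name.

```python
import statistics


def format_stat(samples: list[float], unit: str = "tok/s") -> str:
    """Hypothetical helper: summarise per-run throughput as 'mean unit (±std)'.

    Assumption: sample standard deviation (statistics.stdev, n - 1 denominator);
    use statistics.pstdev instead if the population estimator is wanted.
    Requires at least two samples.
    """
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)
    # round() to 2 decimals, then let the float print naturally (e.g. 20.0, 0.71)
    return f"{round(mean, 2)} {unit} (±{round(std, 2)})"
```

For example, `format_stat([10.0, 20.0, 30.0])` yields `20.0 tok/s (±10.0)`. With only a handful of runs per model, the choice between `stdev` and `pstdev` noticeably changes the ± figure, so it is worth stating alongside the results.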

If you’re looking for the latest benchmark results, head over here.