llama-bench
https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench
usage
usage: llama-bench [options]

options:
  -h, --help
  --numa <distribute|isolate|numactl>        numa mode (default: disabled)
  -r, --repetitions <n>                      number of times to repeat each test (default: 5)
  --prio <0|1|2|3>                           process/thread priority (default: 0)
  --delay <0...N> (seconds)                  delay between each test (default: 0)
  -o, --output <csv|json|jsonl|md|sql>       output format printed to stdout (default: md)
  -oe, --output-err <csv|json|jsonl|md|sql>  output format printed to stderr (default: none)
  --list-devices                             list available devices and exit
  -v, --verbose                              verbose output
  --progress                                 print test progress indicators
  -rpc, --rpc <rpc_servers>                  register RPC devices (comma separated)

test parameters:
  -m, --model <filename>                     (default: models/7B/ggml-model-q4_0.gguf)
  -p, --n-prompt <n>                         (default: 512)
  -n, --n-gen <n>                            (default: 128)
  -pg <pp,tg>                                (default: )
  -d, --n-depth <n>                          (default: 0)
  -b, --batch-size <n>                       (default: 2048)
  -ub, --ubatch-size <n>                     (default: 512)
  -ctk, --cache-type-k <t>                   (default: f16)
  -ctv, --cache-type-v <t>                   (default: f16)
  -t, --threads <n>                          (default: system dependent)
  -C, --cpu-mask <hex,hex>                   (default: 0x0)
  --cpu-strict <0|1>                         (default: 0)
  --poll <0...100>                           (default: 50)
  -ngl, --n-gpu-layers <n>                   (default: 99)
  -ncmoe, --n-cpu-moe <n>                    (default: 0)
  -sm, --split-mode <none|layer|row>         (default: layer)
  -mg, --main-gpu <i>                        (default: 0)
  -nkvo, --no-kv-offload <0|1>               (default: 0)
  -fa, --flash-attn <0|1>                    (default: 0)
  -dev, --device <dev0/dev1/...>             (default: auto)
  -mmp, --mmap <0|1>                         (default: 1)
  -embd, --embeddings <0|1>                  (default: 0)
  -ts, --tensor-split <ts0/ts1/..>           (default: 0)
  -ot, --override-tensors <tensor name pattern>=<buffer type>;...
                                             (default: disabled)
  -nopo, --no-op-offload <0|1>               (default: 0)

Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times. Ranges can be given as 'first-last' or 'first-last+step' or 'first-last*mult'.
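Example invocations (a sketch, not from the page above: the model path and the specific values are illustrative; the flags themselves are taken from the usage listing, and the multi-value and range syntax is assumed to apply to numeric parameters such as -t and -ngl):

  # Benchmark one model at 4, 8 and 16 threads, with the GPU fully disabled (0 layers)
  # and fully offloaded (99 layers); results are printed as a markdown table (-o md).
  llama-bench -m models/7B/ggml-model-q4_0.gguf -p 512 -n 128 -t 4,8,16 -ngl 0,99 -o md

  # The same thread sweep written with the range syntax: 4-16+4 expands to 4, 8, 12, 16.
  llama-bench -m models/7B/ggml-model-q4_0.gguf -t 4-16+4

Each combination of parameter values is run as a separate test, repeated the number of times given by -r (default 5), so a sweep like the first example produces one result row per (threads, n-gpu-layers) pair.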

