pulsar2 llm_build
Documentation
Parameters
--input_path INPUT_PATH            path of model or npy path
--output_path OUTPUT_PATH          path of dumped ax_model
--prefill_len PREFILL_LEN          token length of prefill
--parallel PARALLEL                build parallel
--model_config MODEL_CONFIG        config file
--kv_cache_len KV_CACHE_LEN        length of kv_cache
--post_topk POST_TOPK              post model output indices and prob
--post_weight_type {bf16,s8}       post weight type
-t {fp16,bf16,fp32}, --hidden_state_type {fp16,bf16,fp32}
                                   hidden_state dtype
-w {fp16,bf16,fp32,s8,s4}, --weight_type {fp16,bf16,fp32,s8,s4}
                                   weight dtype
-c CHECK_LEVEL, --check_level CHECK_LEVEL
                                   check level 0: run, 1: layer_check, 2: cal 1+1
--chip {AX620E,AX650}              chip
--prompt PROMPT                    prompt for check_level==2
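Combining -c/--check_level 2 with --prompt runs a verification pass against a test prompt. A minimal sketch of such a build, assuming the paths below are placeholders and all unspecified options keep their defaults:

# Verification build: --check_level 2 evaluates the converted model on the prompt.
# path/to/hf_model/ and path/to/ax_model/ are placeholders, not real paths.
pulsar2 llm_build \
    --input_path path/to/hf_model/ \
    --output_path path/to/ax_model/ \
    --check_level 2 \
    --prompt "Hello"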
Qwen2-0.5B-Instruct
pulsar2 llm_build \
    --input_path Qwen/Qwen2-0.5B-Instruct/ \
    --output_path Qwen/Qwen2-0.5B-w8a16/ \
    --kv_cache_len 1023 \
    --hidden_state_type bf16 \
    --prefill_len 128 \
    --chip AX650
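The build above assumes the checkpoint is already present in Qwen/Qwen2-0.5B-Instruct/. One way to fetch it, assuming the huggingface_hub CLI is installed (pip install huggingface_hub):

# Download the Hugging Face checkpoint into the directory --input_path expects.
huggingface-cli download Qwen/Qwen2-0.5B-Instruct --local-dir Qwen/Qwen2-0.5B-Instruct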
An alternative build of the same model that pins the topology via --model_config, quantizes weights to s8, and runs 8 build workers in parallel:

pulsar2 llm_build \
    --input_path Qwen/Qwen2-0.5B-Instruct/ \
    --output_path Qwen/Qwen2-0.5B-w8a16/ \
    --kv_cache_len 1023 \
    --model_config config/qwen2-0.5B.json \
    --hidden_state_type bf16 \
    --weight_type s8 \
    --parallel 8
The referenced config file, config/qwen2-0.5B.json (https://github.com/AXERA-TECH/ax-llm-build/blob/main/config/qwen2-0.5B.json):

{
    "model_name": "Qwen/Qwen2-0.5B-Instruct",
    "model_type": "qwen",
    "num_hidden_layers": 24,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
    "hidden_size": 896,
    "intermediate_size": 4864,
    "vocab_size": 151936,
    "rope_theta_base": 1000000.0,
    "max_position_embedings": 32768,
    "rope_partial_factor": 1.0,
    "norm_eps": 1e-6,
    "norm_type": "rms_norm",
    "hidden_act": "silu"
}
InternVL2
- https://qiita.com/nnn112358/items/c9cc3a8cc23bc34e7c7d
pulsar2 llm_build \
    --input_path OpenGVLab/InternVL2-1B/ \
    --output_path OpenGVLab/InternVL2-1B-AX620E \
    --kv_cache_len 1023 \
    --hidden_state_type bf16 \
    --prefill_len 128 \
    --chip AX620E
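As with the Qwen2 example, --input_path is assumed to hold a local copy of the checkpoint, which can be fetched the same way (huggingface_hub CLI assumed):

# Download the InternVL2-1B checkpoint to the directory the build reads from.
huggingface-cli download OpenGVLab/InternVL2-1B --local-dir OpenGVLab/InternVL2-1B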
Workflow for taking a TinyLlama model from Hugging Face and running it on the Module LLM
pulsar2 llm_build