pulsar2 llm_build
Documentation
Parameters
--input_path INPUT_PATH            path of model or npy path
--output_path OUTPUT_PATH          path of dumped ax_model
--prefill_len PREFILL_LEN          token length of prefill
--parallel PARALLEL                build parallel
--model_config MODEL_CONFIG        config file
--kv_cache_len KV_CACHE_LEN        length of kv_cache
--post_topk POST_TOPK              post model output indices and prob
--post_weight_type {bf16,s8}       post weight type
-t {fp16,bf16,fp32}, --hidden_state_type {fp16,bf16,fp32}
                                   hidden_state dtype
-w {fp16,bf16,fp32,s8,s4}, --weight_type {fp16,bf16,fp32,s8,s4}
                                   weight dtype
-c CHECK_LEVEL, --check_level CHECK_LEVEL
                                   check level 0: run, 1: layer_check, 2: cal 1+1
--chip {AX620E,AX650}              chip
--prompt PROMPT                    prompt for check_level==2
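Combining -c/--check_level 2 with --prompt runs a verification pass against a test prompt. A minimal sketch of such a build, assuming the paths below are placeholders and all unspecified options keep their defaults:

# Verification build: --check_level 2 evaluates the converted model on the prompt.
# path/to/hf_model/ and path/to/ax_model/ are placeholders, not real paths.
pulsar2 llm_build \
    --input_path path/to/hf_model/ \
    --output_path path/to/ax_model/ \
    --check_level 2 \
    --prompt "Hello"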
Qwen2-0.5B-Instruct
pulsar2 llm_build \
    --input_path Qwen/Qwen2-0.5B-Instruct/ \
    --output_path Qwen/Qwen2-0.5B-w8a16/ \
    --kv_cache_len 1023 \
    --hidden_state_type bf16 \
    --prefill_len 128 \
    --chip AX650
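The build above assumes the checkpoint is already present in Qwen/Qwen2-0.5B-Instruct/. One way to fetch it, assuming the huggingface_hub CLI is installed (pip install huggingface_hub):

# Download the Hugging Face checkpoint into the directory --input_path expects.
huggingface-cli download Qwen/Qwen2-0.5B-Instruct --local-dir Qwen/Qwen2-0.5B-Instruct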
An alternative build of the same model that pins the topology via --model_config, quantizes weights to s8, and runs 8 build workers in parallel:

pulsar2 llm_build \
    --input_path Qwen/Qwen2-0.5B-Instruct/ \
    --output_path Qwen/Qwen2-0.5B-w8a16/ \
    --kv_cache_len 1023 \
    --model_config config/qwen2-0.5B.json \
    --hidden_state_type bf16 \
    --weight_type s8 \
    --parallel 8
The referenced config file, config/qwen2-0.5B.json (https://github.com/AXERA-TECH/ax-llm-build/blob/main/config/qwen2-0.5B.json):

{
    "model_name": "Qwen/Qwen2-0.5B-Instruct",
    "model_type": "qwen",
    "num_hidden_layers": 24,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
    "hidden_size": 896,
    "intermediate_size": 4864,
    "vocab_size": 151936,
    "rope_theta_base": 1000000.0,
    "max_position_embedings": 32768,
    "rope_partial_factor": 1.0,
    "norm_eps": 1e-6,
    "norm_type": "rms_norm",
    "hidden_act": "silu"
}
InternVL2
- https://qiita.com/nnn112358/items/c9cc3a8cc23bc34e7c7d
pulsar2 llm_build \
    --input_path OpenGVLab/InternVL2-1B/ \
    --output_path OpenGVLab/InternVL2-1B-AX620E \
    --kv_cache_len 1023 \
    --hidden_state_type bf16 \
    --prefill_len 128 \
    --chip AX620E
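As with the Qwen2 example, --input_path is assumed to hold a local copy of the checkpoint, which can be fetched the same way (huggingface_hub CLI assumed):

# Download the InternVL2-1B checkpoint to the directory the build reads from.
huggingface-cli download OpenGVLab/InternVL2-1B --local-dir OpenGVLab/InternVL2-1B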
Workflow for taking a TinyLlama model from Hugging Face and running it on the Module LLM
pulsar2 llm_build