Last updated: 2025-01-17 (Fri) 02:25:11

pulsar2 llm_build

Documentation

Parameters

  • --input_path INPUT_PATH: path of model or npy path
  • --output_path OUTPUT_PATH: path of dumped ax_model
  • --prefill_len PREFILL_LEN: token length of prefill
  • --parallel PARALLEL: number of parallel build jobs
  • --model_config MODEL_CONFIG: config file
  • --kv_cache_len KV_CACHE_LEN: length of kv_cache
  • --post_topk POST_TOPK: post model output indices and prob
  • --post_weight_type {bf16,s8}: post weight type
  • -t {fp16,bf16,fp32}, --hidden_state_type {fp16,bf16,fp32}: hidden_state dtype
  • -w {fp16,bf16,fp32,s8,s4}, --weight_type {fp16,bf16,fp32,s8,s4}: weight dtype
  • -c CHECK_LEVEL, --check_level CHECK_LEVEL: check level (0: run, 1: layer_check, 2: cal 1+1)
  • --chip {AX620E,AX650}: target chip
  • --prompt PROMPT: prompt for check_level==2
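To see how these options fit together, here is an illustrative argparse mirror of the parameter table above. This is only a sketch of the documented flag surface, not pulsar2's actual implementation; types such as `int` for the length parameters are assumptions.

```python
# Sketch of the llm_build parameter surface above (not pulsar2 itself).
import argparse

parser = argparse.ArgumentParser(prog="pulsar2 llm_build")
parser.add_argument("--input_path", help="path of model or npy path")
parser.add_argument("--output_path", help="path of dumped ax_model")
parser.add_argument("--prefill_len", type=int, help="token length of prefill")
parser.add_argument("--parallel", type=int, help="number of parallel build jobs")
parser.add_argument("--model_config", help="config file")
parser.add_argument("--kv_cache_len", type=int, help="length of kv_cache")
parser.add_argument("--post_topk", type=int, help="post model output indices and prob")
parser.add_argument("--post_weight_type", choices=["bf16", "s8"], help="post weight type")
parser.add_argument("-t", "--hidden_state_type", choices=["fp16", "bf16", "fp32"],
                    help="hidden_state dtype")
parser.add_argument("-w", "--weight_type", choices=["fp16", "bf16", "fp32", "s8", "s4"],
                    help="weight dtype")
parser.add_argument("-c", "--check_level", type=int,
                    help="0: run, 1: layer_check, 2: cal 1+1")
parser.add_argument("--chip", choices=["AX620E", "AX650"], help="target chip")
parser.add_argument("--prompt", help="prompt for check_level==2")

# Parse a subset of the Qwen2 example below to show the flags in use.
args = parser.parse_args([
    "--input_path", "Qwen/Qwen2-0.5B-Instruct/",
    "--kv_cache_len", "1023",
    "--chip", "AX650",
])
print(args.kv_cache_len, args.chip)  # 1023 AX650
```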

Qwen2-0.5B-Instruct

    pulsar2 llm_build \
        --input_path Qwen/Qwen2-0.5B-Instruct/ \
        --output_path Qwen/Qwen2-0.5B-w8a16/ \
        --kv_cache_len 1023 \
        --hidden_state_type bf16 \
        --prefill_len 128 \
        --chip AX650

    pulsar2 llm_build \
        --input_path Qwen/Qwen2-0.5B-Instruct/ \
        --output_path Qwen/Qwen2-0.5B-w8a16/ \
        --kv_cache_len 1023 \
        --model_config config/qwen2-0.5B.json \
        --hidden_state_type bf16 \
        --weight_type s8 \
        --parallel 8
  • https://github.com/AXERA-TECH/ax-llm-build/blob/main/config/qwen2-0.5B.json
    {
        "model_name": "Qwen/Qwen2-0.5B-Instruct",
        "model_type": "qwen",
        "num_hidden_layers": 24,
        "num_attention_heads": 14,
        "num_key_value_heads": 2,
        "hidden_size": 896,
        "intermediate_size": 4864,
        "vocab_size": 151936,
      
        "rope_theta_base": 1000000.0,
        "max_position_embedings": 32768,
        "rope_partial_factor": 1.0,
      
        "norm_eps": 1e-6,
        "norm_type": "rms_norm",
        "hidden_act": "silu"
      }
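Most fields in this model_config mirror the model's Hugging Face config.json. The sketch below derives such a file from Hugging Face-style keys; the key mapping (e.g. `rope_theta` to `rope_theta_base`, `rms_norm_eps` to `norm_eps`) is inferred from this one qwen2-0.5B example only, so verify the field names against ax-llm-build before relying on it.

```python
# Sketch: build a model_config like the one above from Hugging Face-style
# config.json keys. The mapping is inferred from the qwen2-0.5B example.
import json

def to_model_config(hf: dict, model_name: str, model_type: str) -> dict:
    return {
        "model_name": model_name,
        "model_type": model_type,
        "num_hidden_layers": hf["num_hidden_layers"],
        "num_attention_heads": hf["num_attention_heads"],
        "num_key_value_heads": hf["num_key_value_heads"],
        "hidden_size": hf["hidden_size"],
        "intermediate_size": hf["intermediate_size"],
        "vocab_size": hf["vocab_size"],
        "rope_theta_base": hf["rope_theta"],
        # Spelled "embedings" to match the example config above.
        "max_position_embedings": hf["max_position_embeddings"],
        "rope_partial_factor": 1.0,
        "norm_eps": hf["rms_norm_eps"],
        "norm_type": "rms_norm",
        "hidden_act": hf["hidden_act"],
    }

# Values matching the Qwen2-0.5B-Instruct example above.
hf_config = {
    "num_hidden_layers": 24,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
    "hidden_size": 896,
    "intermediate_size": 4864,
    "vocab_size": 151936,
    "rope_theta": 1000000.0,
    "max_position_embeddings": 32768,
    "rms_norm_eps": 1e-06,
    "hidden_act": "silu",
}

cfg = to_model_config(hf_config, "Qwen/Qwen2-0.5B-Instruct", "qwen")
print(json.dumps(cfg, indent=2))
```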

InternVL2

Workflow for taking a TinyLlama model from Hugging Face and running it on the Module LLM:

    pulsar2 llm_build \
        --input_path TinyLlama/TinyLlama-1.1B-Chat-v1.0/ \
        --output_path TinyLlama/TinyLlama-1.1B-Chat-v1.0-output/ \
        --kv_cache_len 1023 \
        --hidden_state_type bf16 \
        --prefill_len 128 \
        --chip AX620E
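The step above assumes the checkpoint has already been downloaded. A hypothetical end-to-end driver might fetch the repo with `huggingface_hub.snapshot_download` and then invoke the command shown; the helper names and local paths here are illustrative, not part of pulsar2.

```python
# Hypothetical driver for the TinyLlama flow above: download the checkpoint,
# then invoke pulsar2 llm_build. Assumes huggingface_hub and the pulsar2
# toolchain are installed; neither is verified here.
import subprocess

def build_cmd(input_path: str, output_path: str) -> list[str]:
    """Assemble the AX620E invocation shown above."""
    return [
        "pulsar2", "llm_build",
        "--input_path", input_path,
        "--output_path", output_path,
        "--kv_cache_len", "1023",
        "--hidden_state_type", "bf16",
        "--prefill_len", "128",
        "--chip", "AX620E",
    ]

def run_tinyllama_build() -> None:
    # snapshot_download fetches the whole model repo into local_dir.
    from huggingface_hub import snapshot_download
    model_dir = snapshot_download(
        repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        local_dir="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    )
    subprocess.run(
        build_cmd(model_dir, "TinyLlama/TinyLlama-1.1B-Chat-v1.0-output/"),
        check=True,
    )

# run_tinyllama_build()  # requires network access and the pulsar2 toolchain
```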

Related

References