Last modified: 2025-02-02 (Sun) 21:37:15

whisper.axera/3.3-temp

Overview

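  • Pulsar2 3.3-temp reportedly added support for Whisper base, so I tried it, and it worked.
  • Expecting small to work as well, I converted it too. It needed about 128GB of memory, but the conversion just barely completed (on a VM with 128GB of RAM).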

Summary

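  • Model  | n_audio_state | .onnx size (encoder / decoder-main / decoder-loop) | .axmodel size (encoder / decoder-main / decoder-loop) | Memory during conversion (encoder / decoder-main / decoder-loop) | CMM at runtime
    tiny   | 384  | 37.6MB / 118.3MB / 112.8MB  | 27.8MB / 91.8MB / 89.1MB            | 21.9GB / 6.5GB / 8.4GB                   | -
    base   | 512  | 95MB / 205.5MB / 194.6MB    | 56.0MB / 135.7MB / 130.4MB          | 45.1GB / 8GB / 13.3GB                    | 461MB
    small  | 768  | 409.4MB / 589.2MB / 556.4MB | 212.2MB / 280.7MB / 264.8MB         | 128.2GB / 14.3GB / 19.7GB                | 1178MB
    medium | 1024 | 1.4GB / 1.7GB / 1.6GB       | (not attempted) / 622.9MB / 580.8MB | (probably 404GB or more) / 26GB / 43.7GB | -
    large  | 1280 | 3.0GB / 3.2GB / 3.2GB       | (not attempted) / 1.1GB / 1.0GB     | (probably 850GB or more) / 49GB / 75.0GB | -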
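The medium and large encoder memory figures in the table are guesses, since those conversions were not attempted. One rough way to get numbers of that order is a least-squares line through the three measured encoder points, using the encoder parameter counts printed by export_onnx.py (see the export section). This is a minimal sketch that assumes peak memory grows linearly with the encoder parameter count; it is not necessarily how the 404GB/850GB figures above were derived. With these inputs it gives about 414GB for medium and about 844GB for large.

```python
# Rough extrapolation of the peak RAM needed to convert the encoder with pulsar2 build.
# ASSUMPTION: peak memory grows linearly with the encoder parameter count.
# Measured values are copied from the summary table and the export_onnx.py output.

MEASURED = {
    # model: (encoder parameters, peak RAM in GB while converting the encoder)
    "tiny": (7_632_384, 21.9),
    "base": (19_822_592, 45.1),
    "small": (87_002_112, 128.2),
}

NOT_ATTEMPTED = {
    "medium": 305_680_384,
    "large": 635_048_960,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_gb():
    """Return {model: estimated peak GB} for the encoders that were not converted."""
    # Work in millions of parameters to keep the intermediate numbers readable.
    xs = [params / 1e6 for params, _ in MEASURED.values()]
    ys = [gb for _, gb in MEASURED.values()]
    slope, intercept = fit_line(xs, ys)
    return {name: slope * params / 1e6 + intercept for name, params in NOT_ATTEMPTED.items()}


if __name__ == "__main__":
    for name, gb in estimate_gb().items():
        print(f"{name}-encoder: ~{gb:.0f} GB (linear extrapolation)")
```

Three points is very little data, and the real memory curve may well be superlinear, so treat the output as a lower-confidence ballpark only.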

Mirror of the built models (by @610t)

Pulsar2 3.3-temp (2024/12/27)

  • This release is only intended to temporarily support the following functions, which will be integrated in the next official release:
  • ax_pulsar2_temp_f0b32d03.tar.gz

Preparing for conversion

Run on the Ubuntu host

  • python export_onnx.py
  • python generate_data.py
    $ python generate_data.py --model base
    ./datasets/aishell_S0764/BAC009S0764W0123.wav
    0: 
    但因为聚集了过多公共资源
    但因为聚集了过多公共思缘
    CER: 16.666666666666664%
    ./datasets/aishell_S0764/BAC009S0764W0122.wav
    1: 
    一二线城市虽然也处于调整中
    12线尝试虽然也处于调整中
    CER: 30.76923076923077%
    ./datasets/aishell_S0764/BAC009S0764W0121.wav
    2: 
    甚至出现交易几乎停滞的情况
    甚至出现交易几乎停制的情况
    CER: 7.6923076923076925%
    ./datasets/aishell_S0764/BAC009S0764W0125.wav
    3: 
    标杆房企必然调整市场战略
    标杆防起必然调整市场大略
    CER: 25.0%
    ./datasets/aishell_S0764/BAC009S0764W0124.wav
    4: 
    为了规避三四线城市明显过剩的市场风险
    为了规避3次线成是明显郭盛的市场风险
    CER: 33.33333333333333%
    
    total CER: 23.52941176470588%
    Save calibrations_base/encoder/mel/../mel.tar.gz
    Save calibrations_base/decoder_main/tokens/../tokens.tar.gz
    Save calibrations_base/decoder_main/n_layer_cross_k/../n_layer_cross_k.tar.gz
    Save calibrations_base/decoder_main/n_layer_cross_v/../n_layer_cross_v.tar.gz
    Save calibrations_base/decoder_loop/tokens/../tokens.tar.gz
    Save calibrations_base/decoder_loop/n_layer_self_k_cache/../n_layer_self_k_cache.tar.gz
    Save calibrations_base/decoder_loop/n_layer_self_v_cache/../n_layer_self_v_cache.tar.gz
    Save calibrations_base/decoder_loop/n_layer_cross_k/../n_layer_cross_k.tar.gz
    Save calibrations_base/decoder_loop/n_layer_cross_v/../n_layer_cross_v.tar.gz
    Save calibrations_base/decoder_loop/positional_embedding/../positional_embedding.tar.gz
    Save calibrations_base/decoder_loop/mask/../mask.tar.gz
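The CER figures in the log are consistent with the usual definition: character-level edit distance divided by the reference length, with the total computed as summed edits over summed reference characters (2 substitutions in 12 characters gives 16.67%; 16 edits over 68 characters gives 23.53%). A minimal sketch of that computation, not generate_data.py's actual code:

```python
# Character error rate (CER): character-level Levenshtein distance divided by
# the number of reference characters. Sketch only; not generate_data.py's code.


def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 when equal)
            ))
        prev = cur
    return prev[-1]


def corpus_cer(pairs) -> float:
    """Total edits over all utterances divided by total reference length."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars


# (reference, Whisper base hypothesis) pairs copied from the generate_data.py log above.
PAIRS = [
    ("但因为聚集了过多公共资源", "但因为聚集了过多公共思缘"),
    ("一二线城市虽然也处于调整中", "12线尝试虽然也处于调整中"),
    ("甚至出现交易几乎停滞的情况", "甚至出现交易几乎停制的情况"),
    ("标杆房企必然调整市场战略", "标杆防起必然调整市场大略"),
    ("为了规避三四线城市明显过剩的市场风险", "为了规避3次线成是明显郭盛的市场风险"),
]

if __name__ == "__main__":
    for ref, hyp in PAIRS:
        print(f"CER: {100 * corpus_cer([(ref, hyp)])}%")
    print(f"total CER: {100 * corpus_cer(PAIRS)}%")
```

Because the total is weighted by reference length, it is not the plain average of the five per-utterance CERs.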

Starting the container

  • cd pulsar2
    sudo docker load -i ax_pulsar2_temp_f0b32d03.tar.gz
    sudo docker run -it --net host --rm -v $PWD:/data pulsar2:temp-f0b32d03

Editing config_whisper_*.json

  • In calibration_dataset, change calibrations_small to calibrations_base
  • Set npu_mode to NPU2
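The two config edits above can be scripted. A minimal sketch, assuming only that the keys are literally named calibration_dataset and npu_mode and that the files currently reference calibrations_small; it makes no assumption about where those keys sit in the JSON tree. patch_config and patch_file are hypothetical helpers, and patch_file overwrites the file in place (re-serialized with 4-space indentation).

```python
import json
import sys
from pathlib import Path


def patch_config(node, model: str):
    """Recursively rewrite a loaded Pulsar2 config in place and return it.

    ASSUMPTION: only keys named "calibration_dataset" and "npu_mode" need changing.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "calibration_dataset" and isinstance(value, str):
                node[key] = value.replace("calibrations_small", f"calibrations_{model}")
            elif key == "npu_mode":
                node[key] = "NPU2"
            else:
                patch_config(value, model)
    elif isinstance(node, list):
        for item in node:
            patch_config(item, model)
    return node


def patch_file(path: Path, model: str) -> None:
    """Load, patch and overwrite one config_whisper_*.json file."""
    config = json.loads(path.read_text(encoding="utf-8"))
    patch_config(config, model)
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    model = sys.argv[1] if len(sys.argv) > 1 else "base"
    for cfg in sorted(Path(".").glob("config_whisper_*.json")):
        patch_file(cfg, model)
        print(f"patched {cfg} for {model}")
```

Keep a copy of the original configs if you plan to convert several sizes, since the replacement only matches the string calibrations_small.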

Conversion

  • pulsar2 build --input base-encoder.onnx --config config_whisper_encoder_u16.json --output_dir base_encoder --output_name base-encoder.axmodel --target_hardware AX620E --compiler.check 0
    pulsar2 build --input base-decoder-main.onnx --config config_whisper_decoder_main_u16.json --output_dir base_decoder_main --output_name base-decoder-main.axmodel --target_hardware AX620E --compiler.check 0
    pulsar2 build --input base-decoder-loop.onnx --config config_whisper_decoder_loop_u16.json --output_dir base_decoder_loop --output_name base-decoder-loop.axmodel --target_hardware AX620E --compiler.check 0
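The three commands differ only in the model name and the part being converted, so the same invocations can be generated for other sizes. A minimal sketch: build_commands is a hypothetical helper that reproduces only the flags used above, and it assumes the config_whisper_*_u16.json files have already been edited for the target model. By default it just prints the commands; pass --run to execute them (inside the Pulsar2 container).

```python
import shlex
import subprocess
import sys

# ONNX part -> config file, exactly as in the commands above.
PARTS = {
    "encoder": "config_whisper_encoder_u16.json",
    "decoder-main": "config_whisper_decoder_main_u16.json",
    "decoder-loop": "config_whisper_decoder_loop_u16.json",
}


def build_commands(model: str) -> list:
    """Return one pulsar2 build argument list per model part (hypothetical helper)."""
    commands = []
    for part, config in PARTS.items():
        commands.append([
            "pulsar2", "build",
            "--input", f"{model}-{part}.onnx",
            "--config", config,
            "--output_dir", f"{model}_{part.replace('-', '_')}",
            "--output_name", f"{model}-{part}.axmodel",
            "--target_hardware", "AX620E",
            "--compiler.check", "0",
        ])
    return commands


if __name__ == "__main__":
    args = sys.argv[1:]
    run = "--run" in args
    models = [a for a in args if a != "--run"] or ["base"]
    for model in models:
        for cmd in build_commands(model):
            print(shlex.join(cmd), flush=True)
            if run:
                # One build at a time: the encoder alone can need tens of GB of RAM.
                subprocess.run(cmd, check=True)
```

Running the builds sequentially rather than in parallel matters here, because the encoder conversion dominates peak memory.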

Running

base

  • Install NumPy and the other dependencies following the steps at https://github.com/ml-inory/whisper.axera#python-api-运行
  • https://gist.github.com/nnn112358/0601b3a1dee256789f28efb486fe5519
    ├── models
    │   ├── base-decoder-loop.axmodel
    │   ├── base-decoder-main.axmodel
    │   ├── base-encoder.axmodel
    │   ├── base-positional_embedding.bin
    │   ├── base-tokens.txt
    │   ├── small-positional_embedding.bin
    │   ├── small-tokens.txt
    │   ├── tiny-positional_embedding.bin
    │   └── tiny-tokens.txt
    └── python
        ├── info-girl1-goseichouarigatou1.mp3
        ├── info-girl1-goseichouarigatou1.wav
        └── whisper.py
  • /root/.bashrc
    export PYTHONPATH=$PYTHONPATH:/opt/site-packages/local/lib/python3.10/dist-packages  
    export PATH=$PATH:/opt/site-packages/local/bin
  • cd /root/whisper.axera/python
    $ . /root/.bashrc
    root@m5stack-LLM:/root/whisper.axera/python# python3 whisper.py --wav info-girl1-goseichouarigatou1.wav --model_type base --language ja
    wav: info-girl1-goseichouarigatou1.wav
    model_type: base
    model_path: ../models
    language: ja
    Run encoder take 941.7006969451904ms
    Run decoder_main take 278.8667678833008ms
    First token: 9991
    Run decoder_loop take 311.3243579864502ms
    Iter 0   Token: 11336
    Run decoder_loop take 368.3631420135498ms
    Iter 1   Token: 15353
    Run decoder_loop take 243.71886253356934ms
    Iter 2   Token: 1231
    Run decoder_loop take 235.215425491333ms
    Iter 3   Token: 38538
    Run decoder_loop take 243.26157569885254ms
    Iter 4   Token: 0
    Run decoder_loop take 235.04376411437988ms
    Iter 5   Token: 50257
    Result: ご成長、ありがとうございました!
  • Memory usage (CMM)
    sh-5.1# cat /proc/ax_proc/mem_cmm_info
    --------------------SDK VERSION-------------------
    [Axera version]: ax_cmm V2.0.0_P7_20240513101106 May 13 2024 10:22:59 JK
    +---PARTITION: Phys(0x80000000, 0x13FFFFFFF), Size=3145728KB(3072MB),    NAME="anonymous"
     nBlock(Max=42, Cur=42, New=29, Free=0)  nbytes(Max=483405824B(472076KB,461MB), Cur=483405824B(472076KB,461MB), New=482947072B(471628KB,460MB), Free=0B(0KB,0MB))  Block(Max=134078464B(130936KB,127MB), Min=4096B(4KB,0MB), Avg=4639608B(4530KB,4MB))
       |-Block: phys(0x80000000, 0x80000FFF), cache =non-cacheable, length=4KB(0MB),    name="dma"
       |-Block: phys(0x80001000, 0x80006FFF), cache =non-cacheable, length=24KB(0MB),    name="VPP_CMD0"
       |-Block: phys(0x80007000, 0x80007FFF), cache =non-cacheable, length=4KB(0MB),    name="VPP_CMD3"
       |-Block: phys(0x80008000, 0x80008FFF), cache =non-cacheable, length=4KB(0MB),    name="GDC_CMD3"
       |-Block: phys(0x80009000, 0x8000BFFF), cache =non-cacheable, length=12KB(0MB),    name="GDC_CMD0"
       |-Block: phys(0x8000C000, 0x8002CFFF), cache =non-cacheable, length=132KB(0MB),    name="TDP_CMD0"
       |-Block: phys(0x8002D000, 0x8002DFFF), cache =non-cacheable, length=4KB(0MB),    name="TDP_CMD3"
       |-Block: phys(0x8002E000, 0x8003DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
       |-Block: phys(0x8003E000, 0x8004DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
       |-Block: phys(0x8004E000, 0x8004EFFF), cache =non-cacheable, length=4KB(0MB),    name="venc_ko"
       |-Block: phys(0x8004F000, 0x8005EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
       |-Block: phys(0x8005F000, 0x8006EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
       |-Block: phys(0x8006F000, 0x8006FFFF), cache =non-cacheable, length=4KB(0MB),    name="jenc_ko"
       |-Block: phys(0x80070000, 0x81821FFF), cache =non-cacheable, length=24264KB(23MB),    name="engine"
       |-Block: phys(0x81822000, 0x835DDFFF), cache =non-cacheable, length=30448KB(29MB),    name="npu_m_../models/base-encoder.ax"
       |-Block: phys(0x835DE000, 0x84457FFF), cache =non-cacheable, length=14824KB(14MB),    name="npu_swap_0"
       |-Block: phys(0x84458000, 0x84458FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
       |-Block: phys(0x84459000, 0x84543FFF), cache =non-cacheable, length=940KB(0MB),    name="toolkit"
       |-Block: phys(0x84544000, 0x856D8FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
       |-Block: phys(0x856D9000, 0x8686DFFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
       |-Block: phys(0x8686E000, 0x8E84BFFF), cache =non-cacheable, length=130936KB(127MB),    name="engine"
       |-Block: phys(0x8E84C000, 0x8E9D1FFF), cache =non-cacheable, length=1560KB(1MB),    name="npu_m_../models/base-decoder-ma"
       |-Block: phys(0x8E9D2000, 0x8E9D2FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
       |-Block: phys(0x8E9D3000, 0x8E9D3FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0x8E9D4000, 0x8FB68FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
       |-Block: phys(0x8FB69000, 0x90CFDFFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
       |-Block: phys(0x90CFE000, 0x90DC8FFF), cache =non-cacheable, length=812KB(0MB),    name="toolkit"
       |-Block: phys(0x90DC9000, 0x91309FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
       |-Block: phys(0x9130A000, 0x9184AFFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
       |-Block: phys(0x9184B000, 0x992F3FFF), cache =non-cacheable, length=125604KB(122MB),    name="engine"
       |-Block: phys(0x992F4000, 0x9949DFFF), cache =non-cacheable, length=1704KB(1MB),    name="npu_m_../models/base-decoder-lo"
       |-Block: phys(0x9949E000, 0x9949EFFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
       |-Block: phys(0x9949F000, 0x9949FFFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0x994A0000, 0x999E0FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
       |-Block: phys(0x999E1000, 0x99F21FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
       |-Block: phys(0x99F22000, 0x9B0B6FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
       |-Block: phys(0x9B0B7000, 0x9C24BFFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
       |-Block: phys(0x9C24C000, 0x9C24CFFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0x9C24D000, 0x9C24DFFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0x9C24E000, 0x9C280FFF), cache =non-cacheable, length=204KB(0MB),    name="toolkit"
       |-Block: phys(0x9C281000, 0x9C7C1FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
       |-Block: phys(0x9C7C2000, 0x9CD02FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
    
    ---CMM_USE_INFO:
     total size=3145728KB(3072MB),used=472076KB(461MB + 12KB),remain=2673652KB(2610MB + 1012KB),partition_number=1,block_number=42
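The base log above reports roughly 0.94 s for the encoder, 0.28 s for decoder_main and six decoder_loop steps; added up, this short clip took about 2.86 s of model time. A minimal sketch that totals the per-stage latencies from whisper.py's output; it assumes the "Run <stage> take <ms>ms" line format shown above and ignores all other lines.

```python
import re

# Matches lines such as "Run decoder_loop take 311.3243579864502ms" printed by whisper.py.
TIMING_RE = re.compile(r"Run (\w+) take ([0-9.]+)ms")


def summarize(log: str) -> dict:
    """Sum latency in ms per stage, plus an overall "total" entry."""
    totals = {}
    for stage, ms in TIMING_RE.findall(log):
        totals[stage] = totals.get(stage, 0.0) + float(ms)
    totals["total"] = sum(totals.values())
    return totals


# "Run ..." lines copied from the base run above.
BASE_LOG = """\
Run encoder take 941.7006969451904ms
Run decoder_main take 278.8667678833008ms
Run decoder_loop take 311.3243579864502ms
Run decoder_loop take 368.3631420135498ms
Run decoder_loop take 243.71886253356934ms
Run decoder_loop take 235.215425491333ms
Run decoder_loop take 243.26157569885254ms
Run decoder_loop take 235.04376411437988ms
"""

if __name__ == "__main__":
    for stage, ms in summarize(BASE_LOG).items():
        print(f"{stage:>12}: {ms:8.1f} ms")
```

The same function can be pointed at the small log to compare stages; decoder_loop cost scales with the number of generated tokens, so longer utterances shift the balance away from the encoder.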

small

  • Converted the same way. Converting the encoder needs about 128GB of memory, so a 128GB machine may only just be enough.
    root@m5stack-LLM:/root/whisper.axera/python# python3 whisper.py --wav info-girl1-goseichouarigatou1.wav --model_type small --language ja
    wav: info-girl1-goseichouarigatou1.wav
    model_type: small
    model_path: ../models
    language: ja
    Run encoder take 2951.171875ms
    Run decoder_main take 1029.0307998657227ms
    First token: 9991
    Run decoder_loop take 1070.4200267791748ms
    Iter 0   Token: 21784
    Run decoder_loop take 952.1105289459229ms
    Iter 1   Token: 8171
    Run decoder_loop take 904.6242237091064ms
    Iter 2   Token: 112
    Run decoder_loop take 805.9420585632324ms
    Iter 3   Token: 38538
    Run decoder_loop take 861.4728450775146ms
    Iter 4   Token: 50257
    Result: ご清聴ありがとうございました
  • Memory usage (CMM)
    sh-5.1# cat /proc/ax_proc/mem_cmm_info
    --------------------SDK VERSION-------------------
    [Axera version]: ax_cmm V2.0.0_P7_20240513101106 May 13 2024 10:22:59 JK
    +---PARTITION: Phys(0x80000000, 0x13FFFFFFF), Size=3145728KB(3072MB),    NAME="anonymous"
     nBlock(Max=42, Cur=42, New=29, Free=0)  nbytes(Max=1235664896B(1206704KB,1178MB), Cur=1235664896B(1206704KB,1178MB), New=1235206144B(1206256KB,1177MB), Free=0B(0KB,0MB))  Block(Max=275918848B(269452KB,263MB), Min=4096B(4KB,0MB), Avg=13843958B(13519KB,13MB))
       |-Block: phys(0x80000000, 0x80000FFF), cache =non-cacheable, length=4KB(0MB),    name="dma"
       |-Block: phys(0x80001000, 0x80006FFF), cache =non-cacheable, length=24KB(0MB),    name="VPP_CMD0"
       |-Block: phys(0x80007000, 0x80007FFF), cache =non-cacheable, length=4KB(0MB),    name="VPP_CMD3"
       |-Block: phys(0x80008000, 0x80008FFF), cache =non-cacheable, length=4KB(0MB),    name="GDC_CMD3"
       |-Block: phys(0x80009000, 0x8000BFFF), cache =non-cacheable, length=12KB(0MB),    name="GDC_CMD0"
       |-Block: phys(0x8000C000, 0x8002CFFF), cache =non-cacheable, length=132KB(0MB),    name="TDP_CMD0"
       |-Block: phys(0x8002D000, 0x8002DFFF), cache =non-cacheable, length=4KB(0MB),    name="TDP_CMD3"
       |-Block: phys(0x8002E000, 0x8003DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
       |-Block: phys(0x8003E000, 0x8004DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
       |-Block: phys(0x8004E000, 0x8004EFFF), cache =non-cacheable, length=4KB(0MB),    name="venc_ko"
       |-Block: phys(0x8004F000, 0x8005EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
       |-Block: phys(0x8005F000, 0x8006EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
       |-Block: phys(0x8006F000, 0x8006FFFF), cache =non-cacheable, length=4KB(0MB),    name="jenc_ko"
       |-Block: phys(0x80070000, 0x87936FFF), cache =non-cacheable, length=123676KB(120MB),    name="engine"
       |-Block: phys(0x87937000, 0x8CACBFFF), cache =non-cacheable, length=83540KB(81MB),    name="npu_m_../models/small-encoder.a"
       |-Block: phys(0x8CACC000, 0x8F54EFFF), cache =non-cacheable, length=43532KB(42MB),    name="npu_swap_0"
       |-Block: phys(0x8F54F000, 0x8F54FFFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
       |-Block: phys(0x8F550000, 0x8F63AFFF), cache =non-cacheable, length=940KB(0MB),    name="toolkit"
       |-Block: phys(0x8F63B000, 0x92AF7FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
       |-Block: phys(0x92AF8000, 0x95FB4FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
       |-Block: phys(0x95FB5000, 0xA66D7FFF), cache =non-cacheable, length=269452KB(263MB),    name="engine"
       |-Block: phys(0xA66D8000, 0xA6B6FFFF), cache =non-cacheable, length=4704KB(4MB),    name="npu_m_../models/small-decoder-m"
       |-Block: phys(0xA6B70000, 0xA6B70FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
       |-Block: phys(0xA6B71000, 0xA6B71FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0xA6B72000, 0xAA02EFFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
       |-Block: phys(0xAA02F000, 0xAD4EBFFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
       |-Block: phys(0xAD4EC000, 0xAD5B6FFF), cache =non-cacheable, length=812KB(0MB),    name="toolkit"
       |-Block: phys(0xAD5B7000, 0xAE577FFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
       |-Block: phys(0xAE578000, 0xAF538FFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
       |-Block: phys(0xAF539000, 0xBECBDFFF), cache =non-cacheable, length=253460KB(247MB),    name="engine"
       |-Block: phys(0xBECBE000, 0xBF1B6FFF), cache =non-cacheable, length=5092KB(4MB),    name="npu_m_../models/small-decoder-l"
       |-Block: phys(0xBF1B7000, 0xBF1B7FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
       |-Block: phys(0xBF1B8000, 0xBF1B8FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0xBF1B9000, 0xC0179FFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
       |-Block: phys(0xC017A000, 0xC113AFFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
       |-Block: phys(0xC113B000, 0xC45F7FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
       |-Block: phys(0xC45F8000, 0xC7AB4FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
       |-Block: phys(0xC7AB5000, 0xC7AB5FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0xC7AB6000, 0xC7AB6FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
       |-Block: phys(0xC7AB7000, 0xC7AE9FFF), cache =non-cacheable, length=204KB(0MB),    name="toolkit"
       |-Block: phys(0xC7AEA000, 0xC8AAAFFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
       |-Block: phys(0xC8AAB000, 0xC9A6BFFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
    
    ---CMM_USE_INFO:
     total size=3145728KB(3072MB),used=1206704KB(1178MB + 432KB),remain=1939024KB(1893MB + 592KB),partition_number=1,block_number=42
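The "CMM at runtime" column in the summary table corresponds to the used= value on the CMM_USE_INFO line divided by 1024 (472076KB is 461MB for base, 1206704KB is 1178MB for small). A minimal sketch that extracts that value and also totals block lengths per block name from the same output; the regular expressions assume exactly the line format shown above.

```python
import re
from collections import defaultdict
from pathlib import Path

USED_RE = re.compile(r"used=(\d+)KB")
BLOCK_RE = re.compile(r'length=(\d+)KB\([^)]*\),\s*name="([^"]*)"')


def cmm_used_mb(text: str) -> int:
    """Return the used= figure from the CMM_USE_INFO line, in whole MB."""
    match = USED_RE.search(text)
    if match is None:
        raise ValueError("no 'used=<n>KB' field in mem_cmm_info output")
    return int(match.group(1)) // 1024


def kb_by_name(text: str) -> dict:
    """Sum block lengths (KB) per block name, e.g. 'engine', 'toolkit', 'npu_m_...'."""
    totals = defaultdict(int)
    for length_kb, name in BLOCK_RE.findall(text):
        totals[name] += int(length_kb)
    return dict(totals)


if __name__ == "__main__":
    proc = Path("/proc/ax_proc/mem_cmm_info")  # path used in the logs above
    if proc.exists():
        text = proc.read_text()
        print(f"CMM used: {cmm_used_mb(text)} MB")
        for name, kb in sorted(kb_by_name(text).items(), key=lambda kv: -kv[1]):
            print(f"{kb:>10} KB  {name}")
```

Grouping by name makes it easy to see that the large "engine" and "toolkit" blocks, not the npu_m_ model blocks themselves, account for most of the CMM use in both runs.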

git diff

  • diff --git a/python/whisper.py b/python/whisper.py
    index 45e4b13..d8e01b5 100644
    --- a/python/whisper.py
    +++ b/python/whisper.py
    @@ -30,6 +30,7 @@ NEG_INF = float("-inf")
     SOT_SEQUENCE = np.array([WHISPER_SOT,WHISPER_SOT + 1 + tuple(WHISPER_LANGUAGES).index("zh"),WHISPER_TRANSCRIBE,WHISPER_NO_TIMESTAMPS], dtype=np.int32)
     WHISPER_N_TEXT_STATE_MAP = {
         "tiny": 384,
    +    "base": 512,
         "small": 768
     }
    
    @@ -40,7 +41,7 @@ def get_args():
             description="Run Whisper on input audio file"
         )
         parser.add_argument("--wav", "-w", type=str, required=True, help="Input audio file")
    -    parser.add_argument("--model_type", "-t", type=str, choices=["tiny", "small"], required=True, help="model type, only support tiny or small currently")
    +    parser.add_argument("--model_type", "-t", type=str, choices=["tiny", "base", "small"], required=True, help="model type, only support tiny or small currently")
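The patch has to touch both the state-size map and the --model_type choices, and the help text still says "only support tiny or small currently". One way to keep them in sync is to derive the choices from the map. A trimmed-down, hypothetical sketch (whisper.py has further arguments such as --language that are omitted here), not the upstream code:

```python
import argparse

# Same mapping as in the patched whisper.py above.
WHISPER_N_TEXT_STATE_MAP = {
    "tiny": 384,
    "base": 512,
    "small": 768,
}


def get_args(argv=None):
    """Parse --wav and --model_type; choices and help come from the map above."""
    parser = argparse.ArgumentParser(description="Run Whisper on input audio file")
    parser.add_argument("--wav", "-w", type=str, required=True, help="Input audio file")
    parser.add_argument(
        "--model_type", "-t", type=str,
        choices=list(WHISPER_N_TEXT_STATE_MAP),  # derived from the map, so never out of sync
        required=True,
        help="model type: " + ", ".join(WHISPER_N_TEXT_STATE_MAP),
    )
    return parser.parse_args(argv)
```

With this shape, supporting another size only requires adding its n_text_state to the map (plus the converted models in ../models).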

export

  • tiny
    ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=384, n_audio_head=6, n_audio_layer=4, n_vocab=51865, n_text_ctx=448, n_text_state=384, n_text_head=6, n_text_layer=4)
    number of model parameters: tiny 37184640
    number of encoder parameters: tiny 7632384
    number of decoder parameters: tiny 29552256
  • base
    ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=512, n_audio_head=8, n_audio_layer=6, n_vocab=51865, n_text_ctx=448, n_text_state=512, n_text_head=8, n_text_layer=6)
    number of model parameters: base 71825920
    number of encoder parameters: base 19822592
    number of decoder parameters: base 52003328
  • small
    ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)
    number of model parameters: small 240582912
    number of encoder parameters: small 87002112
    number of decoder parameters: small 153580800
  • medium
    ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=1024, n_audio_head=16, n_audio_layer=24, n_vocab=51865, n_text_ctx=448, n_text_state=1024, n_text_head=16, n_text_layer=24)
    number of model parameters: medium 762321920
    number of encoder parameters: medium 305680384
    number of decoder parameters: medium 456641536
  • large
    ModelDimensions(n_mels=128, n_audio_ctx=1500, n_audio_state=1280, n_audio_head=20, n_audio_layer=32, n_vocab=51866, n_text_ctx=448, n_text_state=1280, n_text_head=20, n_text_layer=32)
    number of model parameters: large 1541570560
    number of encoder parameters: large 635048960
    number of decoder parameters: large 906521600
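  • In export_onnx.py, the onnx.load / onnx.save step (which rewrites the model with all weights in a single external-data file) also has to be added for decoder-main. Otherwise the many small intermediate weight files are overwritten while decoder-loop is being exported, and generate_data.py can no longer load them. The addition below applies only when the model name contains "large":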
    diff --git a/model_convert/export_onnx.py b/model_convert/export_onnx.py
    index eec7e0d..ec61bfc 100644
    --- a/model_convert/export_onnx.py
    +++ b/model_convert/export_onnx.py
    @@ -592,6 +592,17 @@ def main():
             # },
         )
     
    +    if "large" in args.model:
    +        decoder_external_filename = decoder_filename.split(".onnx")[0]
    +        decoder_model = onnx.load(decoder_filename)
    +        onnx.save(
    +            decoder_model,
    +            decoder_filename,
    +            save_as_external_data=True,
    +            all_tensors_to_one_file=True,
    +            location=decoder_external_filename + ".weights",
    +        )
    +
         logits, n_layer_self_k_cache, n_layer_self_v_cache = decoder(
             tokens,
             n_layer_self_k_cache,
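The parameter counts printed by export_onnx.py can be reproduced from the ModelDimensions alone, which is a quick way to sanity-check a model size before attempting a conversion. A minimal sketch, assuming the openai-whisper module layout: two Conv1d layers (kernel size 3) at the encoder input, attention blocks whose key projection has no bias, a 4x MLP, LayerNorms with weight and bias, a fixed (non-trainable) sinusoidal positional embedding in the encoder, and a learned positional embedding in the decoder, whose token embedding is reused for the output logits. Under those assumptions it reproduces all ten encoder/decoder counts listed above.

```python
from typing import NamedTuple


class Dims(NamedTuple):
    """Subset of whisper's ModelDimensions needed for counting parameters."""
    n_mels: int
    n_audio_state: int
    n_audio_layer: int
    n_vocab: int
    n_text_ctx: int
    n_text_state: int
    n_text_layer: int


def attention(n: int) -> int:
    # query/key/value/out projections (n x n each); key has no bias, the others do
    return 4 * n * n + 3 * n


def mlp(n: int) -> int:
    # Linear(n, 4n) + Linear(4n, n), both with bias
    return 8 * n * n + 5 * n


def layer_norm(n: int) -> int:
    return 2 * n  # weight + bias


def encoder_params(d: Dims) -> int:
    n = d.n_audio_state
    convs = (3 * d.n_mels * n + n) + (3 * n * n + n)  # two Conv1d, kernel_size=3
    block = attention(n) + mlp(n) + 2 * layer_norm(n)
    # The sinusoidal positional embedding is a buffer, so it is not counted.
    return convs + d.n_audio_layer * block + layer_norm(n)  # final ln_post


def decoder_params(d: Dims) -> int:
    n = d.n_text_state
    # Token embedding (also reused as the output projection) + learned positions.
    embeddings = d.n_vocab * n + d.n_text_ctx * n
    block = 2 * attention(n) + mlp(n) + 3 * layer_norm(n)  # self-attn, cross-attn, MLP
    return embeddings + d.n_text_layer * block + layer_norm(n)


MODELS = {  # values from the ModelDimensions lines above
    "tiny": Dims(80, 384, 4, 51865, 448, 384, 4),
    "base": Dims(80, 512, 6, 51865, 448, 512, 6),
    "small": Dims(80, 768, 12, 51865, 448, 768, 12),
    "medium": Dims(80, 1024, 24, 51865, 448, 1024, 24),
    "large": Dims(128, 1280, 32, 51866, 448, 1280, 32),
}

if __name__ == "__main__":
    for name, dims in MODELS.items():
        enc, dec = encoder_params(dims), decoder_params(dims)
        print(f"{name}: encoder {enc} decoder {dec} total {enc + dec}")
```

For the small sizes the decoder count is dominated by the n_vocab × n_text_state token embedding (19,916,160 of tiny's 29,552,256 decoder parameters).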
    

Folders generated by generate_data (sizes)

  • Folder              | Size
    calibrations_tiny   | 3.3GB
    calibrations_base   | 7.6GB
    calibrations_small  | 21.1GB
    calibrations_medium | 60.2GB
    calibrations_large  | 100.2GB
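A minimal sketch for measuring these folder sizes before copying or archiving them, summing apparent file sizes with os.walk. How the sizes above were measured is not stated, so units (GB vs GiB) and filesystem block rounding may make this sketch's numbers differ slightly.

```python
import os
from pathlib import Path


def dir_size_bytes(root) -> int:
    """Sum apparent file sizes under root; symlinks are neither followed nor counted."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    for folder in sorted(Path(".").glob("calibrations_*")):
        if folder.is_dir():
            print(f"{folder.name}: {dir_size_bytes(folder) / 1e9:.1f} GB")
```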

Processing time

tiny

  • Target            | Conversion time | Memory required
    tiny-encoder      | 42 min (7940HX) | ~21.9GB
    tiny-decoder-main | ~5 min (7940HX) | ~6.5GB
    tiny-decoder-loop | ~5 min (7940HX) | 8.4GB

base

  • Target            | Conversion time    | Memory required
    base-encoder      | 1 h 36 min (7940HX) | ~45.1GB
    base-decoder-main | ~6 min (7940HX)    | ~8GB
    base-decoder-loop | ~10 min (7940HX)   | ~13.3GB

small

  • Target             | Conversion time     | Memory required
    small-encoder      | 11 h (EPYC 7742 ES) | 128.2GB
    small-decoder-main | ~18 min (7940HX)    | 14.3GB
    small-decoder-loop | ~30 min (7940HX)    | 19.7GB

medium

  • Target              | Conversion time        | Memory required
    medium-encoder      | (not attempted)        | (probably 404GB or more)
    medium-decoder-main | ~50 min (7940HX)       | ~26GB
    medium-decoder-loop | ~1 h 25 min (7940HX)   | 43.7GB

large

  • Target             | Conversion time      | Memory required
    large-encoder      | (not attempted)      | (probably 850GB or more)
    large-decoder-main | ~1 h 40 min (7940HX) | 49.0GB
    large-decoder-loop | ~2 h 40 min (7940HX) | 75.0GB

Monitoring memory usage during conversion

  • while true; do echo "$(date '+%Y-%m-%d %H:%M:%S') Memory Usage: $(free -m | awk '/^Mem:/ {print $3}') MB"; sleep 10; done
    while true; do echo "$(date '+%Y-%m-%d %H:%M:%S') Memory Usage: $(free -m | awk '/^Mem:/ {print $3}') MB, Cache Usage: $(free -m | awk '/^Mem:/ {print $6}') MB"; sleep 10; done
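The conversion tables above record peak memory, but these loops only print a sample every 10 seconds, so the peak has to be read off the log afterwards. A minimal Python sketch that keeps a running maximum; it assumes used memory can be approximated as MemTotal minus MemAvailable from /proc/meminfo, which may not match the used column of free exactly, depending on the procps version.

```python
import time
from datetime import datetime


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_mb(info: dict) -> int:
    # Approximation: MemTotal - MemAvailable. `free` computes its "used" column
    # differently across procps versions, so the numbers may not match exactly.
    return (info["MemTotal"] - info["MemAvailable"]) // 1024


def watch(interval_s: float = 10.0) -> None:
    """Print current and peak memory use every interval_s seconds until Ctrl+C."""
    peak = 0
    while True:
        with open("/proc/meminfo", encoding="ascii") as f:
            used = used_mb(parse_meminfo(f.read()))
        peak = max(peak, used)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} Memory Usage: {used} MB, Peak: {peak} MB", flush=True)
        time.sleep(interval_s)
```

Call watch() from a second terminal while pulsar2 build is running and stop it with Ctrl+C; the last printed Peak value is the figure to put in the tables.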