Last updated: 2025-02-02 (Sun) 21:37:15

3.3-temp

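Overview

The 3.3-temp release of Pulsar2 reportedly added support for Whisper base, so I tried it and it worked.
I figured small might work too. Converting it needed about 128GB of memory, but it just barely went through (on a VM with 128GB of RAM).
- Ubuntu 22.04 on VirtualBox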

Summary

Model	n_audio_state	.onnx			.axmodel			Memory used during conversion			Runtime
Model	n_audio_state	encoder	decoder-main	decoder-loop	encoder	decoder-main	decoder-loop	encoder	decoder-main	decoder-loop	CMM
tiny	384	37.6MB	118.3MB	112.8MB	27.8MB	91.8MB	89.1MB	21.9GB	6.5GB	8.4GB
base	512	95MB	205.5MB	194.6MB	56.0MB	135.7MB	130.4MB	45.1GB	8GB	13.3GB	461MB
small	768	409.4MB	589.2MB	556.4MB	212.2MB	280.7MB	264.8MB	128.2GB	14.3GB	19.7GB	1178MB
medium	1024	1.4GB	1.7GB	1.6GB	(not attempted)	622.9MB	580.8MB	(probably needs 404GB or more)	26GB	43.7GB
large	1280	3.0GB	3.2GB	3.2GB	(not attempted)	1.1GB	1.0GB	(probably needs 850GB or more)	49GB	75.0GB
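To put the encoder columns in proportion, the following minimal sketch computes two ratios from the table values for the encoders that were actually converted: how much the .axmodel shrinks relative to the .onnx, and how many times larger the peak conversion memory is than the .onnx file. The helper name `ratios` is made up for this illustration, and treating 1GB as 1024MB is an assumption about how the sizes were reported.

```python
# Encoder figures copied from the table: (.onnx MB, .axmodel MB, peak conversion memory GB).
ENCODER = {
    "tiny": (37.6, 27.8, 21.9),
    "base": (95.0, 56.0, 45.1),
    "small": (409.4, 212.2, 128.2),
}


def ratios(onnx_mb: float, axmodel_mb: float, peak_gb: float) -> tuple[float, float]:
    """Return (axmodel/onnx size ratio, peak memory as a multiple of the onnx size)."""
    shrink = axmodel_mb / onnx_mb
    blowup = peak_gb * 1024 / onnx_mb  # assumption: 1 GB = 1024 MB
    return shrink, blowup


for name, figures in ENCODER.items():
    shrink, blowup = ratios(*figures)
    print(f"{name}: axmodel/onnx = {shrink:.2f}, peak memory = {blowup:.0f}x the onnx size")
```

The multiple falls as the model grows, so a single fixed factor taken from tiny would overestimate the memory needed for the larger encoders.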

Mirror of the built models (by @610t)

https://sacraya.610t.org/ModuleLLM/models/whisper/

Pulsar2 3.3-temp (2024/12/27)

https://drive.google.com/drive/folders/1teeYnXrjq5eDyYn-Ak4iwaO13wuVFuiV
Support Whisper Base and MobileCLIP with AX630C

This release is only intended to temporarily support the following functions, which will be integrated in the next official release:

ax_pulsar2_temp_f0b32d03.tar.gz

AX650
- Qwen2.5 Int8 GPTQ

AX620E
- Whisper Base with NPU2 mode
- MobileCLIP with NPU2&NPU1 mode
https://x.com/qqc1989/status/1871892669112480247

Preparing for conversion

Install Miniconda to get conda.
https://github.com/ml-inory/whisper.axera/tree/main/model_convert

Run on the host Ubuntu

python export_onnx.py

python generate_data.py

$ python generate_data.py --model base
./datasets/aishell_S0764/BAC009S0764W0123.wav
0: 
但因为聚集了过多公共资源
但因为聚集了过多公共思缘
CER: 16.666666666666664%
./datasets/aishell_S0764/BAC009S0764W0122.wav
1: 
一二线城市虽然也处于调整中
12线尝试虽然也处于调整中
CER: 30.76923076923077%
./datasets/aishell_S0764/BAC009S0764W0121.wav
2: 
甚至出现交易几乎停滞的情况
甚至出现交易几乎停制的情况
CER: 7.6923076923076925%
./datasets/aishell_S0764/BAC009S0764W0125.wav
3: 
标杆房企必然调整市场战略
标杆防起必然调整市场大略
CER: 25.0%
./datasets/aishell_S0764/BAC009S0764W0124.wav
4: 
为了规避三四线城市明显过剩的市场风险
为了规避3次线成是明显郭盛的市场风险
CER: 33.33333333333333%

total CER: 23.52941176470588%
Save calibrations_base/encoder/mel/../mel.tar.gz
Save calibrations_base/decoder_main/tokens/../tokens.tar.gz
Save calibrations_base/decoder_main/n_layer_cross_k/../n_layer_cross_k.tar.gz
Save calibrations_base/decoder_main/n_layer_cross_v/../n_layer_cross_v.tar.gz
Save calibrations_base/decoder_loop/tokens/../tokens.tar.gz
Save calibrations_base/decoder_loop/n_layer_self_k_cache/../n_layer_self_k_cache.tar.gz
Save calibrations_base/decoder_loop/n_layer_self_v_cache/../n_layer_self_v_cache.tar.gz
Save calibrations_base/decoder_loop/n_layer_cross_k/../n_layer_cross_k.tar.gz
Save calibrations_base/decoder_loop/n_layer_cross_v/../n_layer_cross_v.tar.gz
Save calibrations_base/decoder_loop/positional_embedding/../positional_embedding.tar.gz
Save calibrations_base/decoder_loop/mask/../mask.tar.gz
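The CER figures in the log are consistent with a character-level edit distance divided by the reference length, with the total pooled over all utterances rather than averaged per utterance. The sketch below is an independent re-implementation for checking those numbers, not the code used by generate_data.py. It assumes the first line of each pair in the log is the reference and the second is the model output.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per character."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# (reference, hypothesis) pairs copied from the log above.
pairs = [
    ("但因为聚集了过多公共资源", "但因为聚集了过多公共思缘"),
    ("一二线城市虽然也处于调整中", "12线尝试虽然也处于调整中"),
    ("甚至出现交易几乎停滞的情况", "甚至出现交易几乎停制的情况"),
    ("标杆房企必然调整市场战略", "标杆防起必然调整市场大略"),
    ("为了规避三四线城市明显过剩的市场风险", "为了规避3次线成是明显郭盛的市场风险"),
]

errors = sum(edit_distance(r, h) for r, h in pairs)
chars = sum(len(r) for r, _ in pairs)
total_cer = 100.0 * errors / chars

for r, h in pairs:
    print(f"CER: {cer(r, h):.2f}%")
print(f"total CER: {total_cer:.2f}%")
```

Pooling gives 16 errors over 68 reference characters, which matches the logged total of about 23.53%.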

Starting the container

cd pulsar2

sudo docker load -i ax_pulsar2_temp_f0b32d03.tar.gz
sudo docker run -it --net host --rm -v $PWD:/data pulsar2:temp-f0b32d03

Editing config_whisper_*.json

In calibration_dataset, change calibrations_small to calibrations_base.
Set npu_mode to NPU2.
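The two edits can also be scripted. The following is a minimal sketch that walks a parsed JSON config and patches every calibration_dataset string and every npu_mode key it finds. It deliberately does not rely on where those keys sit in the file. The function name `patch_config` and the sample config layout are hypothetical and only serve the illustration; check them against the real config_whisper_*.json files.

```python
import json


def patch_config(node, old="calibrations_small", new="calibrations_base", npu_mode="NPU2"):
    """Recursively patch a parsed JSON config in place and return it."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "calibration_dataset" and isinstance(value, str):
                node[key] = value.replace(old, new)  # point at the other model's data
            elif key == "npu_mode":
                node[key] = npu_mode
            else:
                patch_config(value, old, new, npu_mode)
    elif isinstance(node, list):
        for item in node:
            patch_config(item, old, new, npu_mode)
    return node


# Hypothetical config fragment, not copied from the repository.
sample = {
    "npu_mode": "NPU1",
    "quant": {
        "input_configs": [
            {"tensor_name": "mel", "calibration_dataset": "./calibrations_small/encoder/mel.tar.gz"},
        ]
    },
}

patched = patch_config(sample)
print(json.dumps(patched, indent=2))
```

For a real file, load it with `json.load`, pass the result through `patch_config`, and write it back with `json.dump`.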

Conversion

pulsar2 build --input base-encoder.onnx --config config_whisper_encoder_u16.json --output_dir base_encoder --output_name base-encoder.axmodel --target_hardware AX620E --compiler.check 0
pulsar2 build --input base-decoder-main.onnx --config config_whisper_decoder_main_u16.json --output_dir base_decoder_main --output_name base-decoder-main.axmodel --target_hardware AX620E --compiler.check 0
pulsar2 build --input base-decoder-loop.onnx --config config_whisper_decoder_loop_u16.json --output_dir base_decoder_loop --output_name base-decoder-loop.axmodel --target_hardware AX620E --compiler.check 0
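The three commands differ only in the model name and the part being converted, so they can be generated instead of typed. This is a small sketch that reproduces the command lines above using only the flags shown there. The helper name `build_command` is made up, and applying the same file-naming pattern to models other than base is an assumption.

```python
import shlex

PARTS = ("encoder", "decoder_main", "decoder_loop")


def build_command(model: str, part: str) -> list[str]:
    """Return the pulsar2 build argv for one part (e.g. 'decoder_main') of a model."""
    dashed = part.replace("_", "-")  # file names use dashes, config and dir names use underscores
    return [
        "pulsar2", "build",
        "--input", f"{model}-{dashed}.onnx",
        "--config", f"config_whisper_{part}_u16.json",
        "--output_dir", f"{model}_{part}",
        "--output_name", f"{model}-{dashed}.axmodel",
        "--target_hardware", "AX620E",
        "--compiler.check", "0",
    ]


for part in PARTS:
    print(shlex.join(build_command("base", part)))
```

Inside the container, each list could be handed to `subprocess.run(cmd, check=True)` to run the conversions one after another.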

Running

base

Install NumPy and the other dependencies by following the steps at https://github.com/ml-inory/whisper.axera#python-api-运行

https://gist.github.com/nnn112358/0601b3a1dee256789f28efb486fe5519

├── models
│   ├── base-decoder-loop.axmodel
│   ├── base-decoder-main.axmodel
│   ├── base-encoder.axmodel
│   ├── base-positional_embedding.bin
│   ├── base-tokens.txt
│   ├── small-positional_embedding.bin
│   ├── small-tokens.txt
│   ├── tiny-positional_embedding.bin
│   └── tiny-tokens.txt
└── python
    ├── info-girl1-goseichouarigatou1.mp3
    ├── info-girl1-goseichouarigatou1.wav
    └── whisper.py

/root/.bashrc

export PYTHONPATH=$PYTHONPATH:/opt/site-packages/local/lib/python3.10/dist-packages  
export PATH=$PATH:/opt/site-packages/local/bin

cd /root/whisper.axera/python

$ . /root/.bashrc
root@m5stack-LLM:/root/whisper.axera/python# python3 whisper.py --wav info-girl1-goseichouarigatou1.wav --model_type base --language ja
wav: info-girl1-goseichouarigatou1.wav
model_type: base
model_path: ../models
language: ja
Run encoder take 941.7006969451904ms
Run decoder_main take 278.8667678833008ms
First token: 9991
Run decoder_loop take 311.3243579864502ms
Iter 0   Token: 11336
Run decoder_loop take 368.3631420135498ms
Iter 1   Token: 15353
Run decoder_loop take 243.71886253356934ms
Iter 2   Token: 1231
Run decoder_loop take 235.215425491333ms
Iter 3   Token: 38538
Run decoder_loop take 243.26157569885254ms
Iter 4   Token: 0
Run decoder_loop take 235.04376411437988ms
Iter 5   Token: 50257
Result: ご成長、ありがとうございました!
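To compare runs, the per-stage timings can be pulled out of the log and summed. The sketch below parses the "Run ... take ...ms" lines from the base run above. The function name `stage_times` is made up for this illustration, and the regular expression assumes the log format stays exactly as shown.

```python
import re
from collections import defaultdict

# Timing lines copied from the base run above (other log lines are ignored by the parser).
LOG = """\
Run encoder take 941.7006969451904ms
Run decoder_main take 278.8667678833008ms
First token: 9991
Run decoder_loop take 311.3243579864502ms
Run decoder_loop take 368.3631420135498ms
Run decoder_loop take 243.71886253356934ms
Run decoder_loop take 235.215425491333ms
Run decoder_loop take 243.26157569885254ms
Run decoder_loop take 235.04376411437988ms
"""

PATTERN = re.compile(r"^Run (\w+) take ([\d.]+)ms$", re.MULTILINE)


def stage_times(log: str) -> dict[str, list[float]]:
    """Group the reported durations (in ms) by stage name."""
    times = defaultdict(list)
    for stage, ms in PATTERN.findall(log):
        times[stage].append(float(ms))
    return dict(times)


times = stage_times(LOG)
total_ms = sum(sum(values) for values in times.values())
for stage, values in times.items():
    print(f"{stage}: {len(values)} call(s), {sum(values):.1f} ms")
print(f"total: {total_ms:.1f} ms")
```

For this clip the six decoder_loop calls together take longer than the single encoder pass, so the per-token decoder cost matters more as the transcript gets longer.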

Memory usage

sh-5.1# cat /proc/ax_proc/mem_cmm_info
--------------------SDK VERSION-------------------
[Axera version]: ax_cmm V2.0.0_P7_20240513101106 May 13 2024 10:22:59 JK
+---PARTITION: Phys(0x80000000, 0x13FFFFFFF), Size=3145728KB(3072MB),    NAME="anonymous"
 nBlock(Max=42, Cur=42, New=29, Free=0)  nbytes(Max=483405824B(472076KB,461MB), Cur=483405824B(472076KB,461MB), New=482947072B(471628KB,460MB), Free=0B(0KB,0MB))  Block(Max=134078464B(130936KB,127MB), Min=4096B(4KB,0MB), Avg=4639608B(4530KB,4MB))
   |-Block: phys(0x80000000, 0x80000FFF), cache =non-cacheable, length=4KB(0MB),    name="dma"
   |-Block: phys(0x80001000, 0x80006FFF), cache =non-cacheable, length=24KB(0MB),    name="VPP_CMD0"
   |-Block: phys(0x80007000, 0x80007FFF), cache =non-cacheable, length=4KB(0MB),    name="VPP_CMD3"
   |-Block: phys(0x80008000, 0x80008FFF), cache =non-cacheable, length=4KB(0MB),    name="GDC_CMD3"
   |-Block: phys(0x80009000, 0x8000BFFF), cache =non-cacheable, length=12KB(0MB),    name="GDC_CMD0"
   |-Block: phys(0x8000C000, 0x8002CFFF), cache =non-cacheable, length=132KB(0MB),    name="TDP_CMD0"
   |-Block: phys(0x8002D000, 0x8002DFFF), cache =non-cacheable, length=4KB(0MB),    name="TDP_CMD3"
   |-Block: phys(0x8002E000, 0x8003DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
   |-Block: phys(0x8003E000, 0x8004DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
   |-Block: phys(0x8004E000, 0x8004EFFF), cache =non-cacheable, length=4KB(0MB),    name="venc_ko"
   |-Block: phys(0x8004F000, 0x8005EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
   |-Block: phys(0x8005F000, 0x8006EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
   |-Block: phys(0x8006F000, 0x8006FFFF), cache =non-cacheable, length=4KB(0MB),    name="jenc_ko"
   |-Block: phys(0x80070000, 0x81821FFF), cache =non-cacheable, length=24264KB(23MB),    name="engine"
   |-Block: phys(0x81822000, 0x835DDFFF), cache =non-cacheable, length=30448KB(29MB),    name="npu_m_../models/base-encoder.ax"
   |-Block: phys(0x835DE000, 0x84457FFF), cache =non-cacheable, length=14824KB(14MB),    name="npu_swap_0"
   |-Block: phys(0x84458000, 0x84458FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
   |-Block: phys(0x84459000, 0x84543FFF), cache =non-cacheable, length=940KB(0MB),    name="toolkit"
   |-Block: phys(0x84544000, 0x856D8FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
   |-Block: phys(0x856D9000, 0x8686DFFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
   |-Block: phys(0x8686E000, 0x8E84BFFF), cache =non-cacheable, length=130936KB(127MB),    name="engine"
   |-Block: phys(0x8E84C000, 0x8E9D1FFF), cache =non-cacheable, length=1560KB(1MB),    name="npu_m_../models/base-decoder-ma"
   |-Block: phys(0x8E9D2000, 0x8E9D2FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
   |-Block: phys(0x8E9D3000, 0x8E9D3FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0x8E9D4000, 0x8FB68FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
   |-Block: phys(0x8FB69000, 0x90CFDFFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
   |-Block: phys(0x90CFE000, 0x90DC8FFF), cache =non-cacheable, length=812KB(0MB),    name="toolkit"
   |-Block: phys(0x90DC9000, 0x91309FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
   |-Block: phys(0x9130A000, 0x9184AFFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
   |-Block: phys(0x9184B000, 0x992F3FFF), cache =non-cacheable, length=125604KB(122MB),    name="engine"
   |-Block: phys(0x992F4000, 0x9949DFFF), cache =non-cacheable, length=1704KB(1MB),    name="npu_m_../models/base-decoder-lo"
   |-Block: phys(0x9949E000, 0x9949EFFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
   |-Block: phys(0x9949F000, 0x9949FFFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0x994A0000, 0x999E0FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
   |-Block: phys(0x999E1000, 0x99F21FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
   |-Block: phys(0x99F22000, 0x9B0B6FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
   |-Block: phys(0x9B0B7000, 0x9C24BFFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
   |-Block: phys(0x9C24C000, 0x9C24CFFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0x9C24D000, 0x9C24DFFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0x9C24E000, 0x9C280FFF), cache =non-cacheable, length=204KB(0MB),    name="toolkit"
   |-Block: phys(0x9C281000, 0x9C7C1FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"
   |-Block: phys(0x9C7C2000, 0x9CD02FFF), cache =non-cacheable, length=5380KB(5MB),    name="toolkit"

---CMM_USE_INFO:
 total size=3145728KB(3072MB),used=472076KB(461MB + 12KB),remain=2673652KB(2610MB + 1012KB),partition_number=1,block_number=42
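The dump is easier to read when the blocks are grouped by name. This is a rough sketch of a parser for the "|-Block" lines. The function name `cmm_usage_kb` is made up, the regular expression is fitted to the output above only, and the sample contains just a few lines copied from it.

```python
import re
from collections import Counter

# A few block lines copied from the dump above.
SAMPLE = '''\
   |-Block: phys(0x80070000, 0x81821FFF), cache =non-cacheable, length=24264KB(23MB),    name="engine"
   |-Block: phys(0x81822000, 0x835DDFFF), cache =non-cacheable, length=30448KB(29MB),    name="npu_m_../models/base-encoder.ax"
   |-Block: phys(0x835DE000, 0x84457FFF), cache =non-cacheable, length=14824KB(14MB),    name="npu_swap_0"
   |-Block: phys(0x84459000, 0x84543FFF), cache =non-cacheable, length=940KB(0MB),    name="toolkit"
   |-Block: phys(0x84544000, 0x856D8FFF), cache =non-cacheable, length=18004KB(17MB),    name="toolkit"
'''

BLOCK = re.compile(r'\|-Block:.*?length=(\d+)KB\(\d+MB\),\s*name="([^"]*)"')


def cmm_usage_kb(dump: str) -> Counter:
    """Sum the block lengths (in KB) per block name."""
    usage = Counter()
    for kb, name in BLOCK.findall(dump):
        usage[name] += int(kb)
    return usage


usage = cmm_usage_kb(SAMPLE)
for name, kb in usage.most_common():
    print(f"{name}: {kb} KB")
```

On the device, the full text can be read from /proc/ax_proc/mem_cmm_info and passed to `cmm_usage_kb` in place of `SAMPLE`.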

small

Convert in the same way (converting the encoder needs about 128GB of memory, so a 128GB machine may only barely manage it).

root@m5stack-LLM:/root/whisper.axera/python# python3 whisper.py --wav info-girl1-goseichouarigatou1.wav --model_type small --language ja
wav: info-girl1-goseichouarigatou1.wav
model_type: small
model_path: ../models
language: ja
Run encoder take 2951.171875ms
Run decoder_main take 1029.0307998657227ms
First token: 9991
Run decoder_loop take 1070.4200267791748ms
Iter 0   Token: 21784
Run decoder_loop take 952.1105289459229ms
Iter 1   Token: 8171
Run decoder_loop take 904.6242237091064ms
Iter 2   Token: 112
Run decoder_loop take 805.9420585632324ms
Iter 3   Token: 38538
Run decoder_loop take 861.4728450775146ms
Iter 4   Token: 50257
Result: ご清聴ありがとうございました

Memory usage

sh-5.1# cat /proc/ax_proc/mem_cmm_info
--------------------SDK VERSION-------------------
[Axera version]: ax_cmm V2.0.0_P7_20240513101106 May 13 2024 10:22:59 JK
+---PARTITION: Phys(0x80000000, 0x13FFFFFFF), Size=3145728KB(3072MB),    NAME="anonymous"
 nBlock(Max=42, Cur=42, New=29, Free=0)  nbytes(Max=1235664896B(1206704KB,1178MB), Cur=1235664896B(1206704KB,1178MB), New=1235206144B(1206256KB,1177MB), Free=0B(0KB,0MB))  Block(Max=275918848B(269452KB,263MB), Min=4096B(4KB,0MB), Avg=13843958B(13519KB,13MB))
   |-Block: phys(0x80000000, 0x80000FFF), cache =non-cacheable, length=4KB(0MB),    name="dma"
   |-Block: phys(0x80001000, 0x80006FFF), cache =non-cacheable, length=24KB(0MB),    name="VPP_CMD0"
   |-Block: phys(0x80007000, 0x80007FFF), cache =non-cacheable, length=4KB(0MB),    name="VPP_CMD3"
   |-Block: phys(0x80008000, 0x80008FFF), cache =non-cacheable, length=4KB(0MB),    name="GDC_CMD3"
   |-Block: phys(0x80009000, 0x8000BFFF), cache =non-cacheable, length=12KB(0MB),    name="GDC_CMD0"
   |-Block: phys(0x8000C000, 0x8002CFFF), cache =non-cacheable, length=132KB(0MB),    name="TDP_CMD0"
   |-Block: phys(0x8002D000, 0x8002DFFF), cache =non-cacheable, length=4KB(0MB),    name="TDP_CMD3"
   |-Block: phys(0x8002E000, 0x8003DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
   |-Block: phys(0x8003E000, 0x8004DFFF), cache =non-cacheable, length=64KB(0MB),    name="venc_ko"
   |-Block: phys(0x8004E000, 0x8004EFFF), cache =non-cacheable, length=4KB(0MB),    name="venc_ko"
   |-Block: phys(0x8004F000, 0x8005EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
   |-Block: phys(0x8005F000, 0x8006EFFF), cache =non-cacheable, length=64KB(0MB),    name="jenc_ko"
   |-Block: phys(0x8006F000, 0x8006FFFF), cache =non-cacheable, length=4KB(0MB),    name="jenc_ko"
   |-Block: phys(0x80070000, 0x87936FFF), cache =non-cacheable, length=123676KB(120MB),    name="engine"
   |-Block: phys(0x87937000, 0x8CACBFFF), cache =non-cacheable, length=83540KB(81MB),    name="npu_m_../models/small-encoder.a"
   |-Block: phys(0x8CACC000, 0x8F54EFFF), cache =non-cacheable, length=43532KB(42MB),    name="npu_swap_0"
   |-Block: phys(0x8F54F000, 0x8F54FFFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
   |-Block: phys(0x8F550000, 0x8F63AFFF), cache =non-cacheable, length=940KB(0MB),    name="toolkit"
   |-Block: phys(0x8F63B000, 0x92AF7FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
   |-Block: phys(0x92AF8000, 0x95FB4FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
   |-Block: phys(0x95FB5000, 0xA66D7FFF), cache =non-cacheable, length=269452KB(263MB),    name="engine"
   |-Block: phys(0xA66D8000, 0xA6B6FFFF), cache =non-cacheable, length=4704KB(4MB),    name="npu_m_../models/small-decoder-m"
   |-Block: phys(0xA6B70000, 0xA6B70FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
   |-Block: phys(0xA6B71000, 0xA6B71FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0xA6B72000, 0xAA02EFFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
   |-Block: phys(0xAA02F000, 0xAD4EBFFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
   |-Block: phys(0xAD4EC000, 0xAD5B6FFF), cache =non-cacheable, length=812KB(0MB),    name="toolkit"
   |-Block: phys(0xAD5B7000, 0xAE577FFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
   |-Block: phys(0xAE578000, 0xAF538FFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
   |-Block: phys(0xAF539000, 0xBECBDFFF), cache =non-cacheable, length=253460KB(247MB),    name="engine"
   |-Block: phys(0xBECBE000, 0xBF1B6FFF), cache =non-cacheable, length=5092KB(4MB),    name="npu_m_../models/small-decoder-l"
   |-Block: phys(0xBF1B7000, 0xBF1B7FFF), cache =non-cacheable, length=4KB(0MB),    name="engine"
   |-Block: phys(0xBF1B8000, 0xBF1B8FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0xBF1B9000, 0xC0179FFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
   |-Block: phys(0xC017A000, 0xC113AFFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
   |-Block: phys(0xC113B000, 0xC45F7FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
   |-Block: phys(0xC45F8000, 0xC7AB4FFF), cache =non-cacheable, length=54004KB(52MB),    name="toolkit"
   |-Block: phys(0xC7AB5000, 0xC7AB5FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0xC7AB6000, 0xC7AB6FFF), cache =non-cacheable, length=4KB(0MB),    name="toolkit"
   |-Block: phys(0xC7AB7000, 0xC7AE9FFF), cache =non-cacheable, length=204KB(0MB),    name="toolkit"
   |-Block: phys(0xC7AEA000, 0xC8AAAFFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"
   |-Block: phys(0xC8AAB000, 0xC9A6BFFF), cache =non-cacheable, length=16132KB(15MB),    name="toolkit"

---CMM_USE_INFO:
 total size=3145728KB(3072MB),used=1206704KB(1178MB + 432KB),remain=1939024KB(1893MB + 592KB),partition_number=1,block_number=42

git diff

diff --git a/python/whisper.py b/python/whisper.py
index 45e4b13..d8e01b5 100644
--- a/python/whisper.py
+++ b/python/whisper.py
@@ -30,6 +30,7 @@ NEG_INF = float("-inf")
 SOT_SEQUENCE = np.array([WHISPER_SOT,WHISPER_SOT + 1 + tuple(WHISPER_LANGUAGES).index("zh"),WHISPER_TRANSCRIBE,WHISPER_NO_TIMESTAMPS], dtype=np.int32)
 WHISPER_N_TEXT_STATE_MAP = {
     "tiny": 384,
+    "base": 512,
     "small": 768
 }

@@ -40,7 +41,7 @@ def get_args():
         description="Run Whisper on input audio file"
     )
     parser.add_argument("--wav", "-w", type=str, required=True, help="Input audio file")
-    parser.add_argument("--model_type", "-t", type=str, choices=["tiny", "small"], required=True, help="model type, only support tiny or small currently")
+    parser.add_argument("--model_type", "-t", type=str, choices=["tiny", "base", "small"], required=True, help="model type, only support tiny or small currently")


export

tiny

ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=384, n_audio_head=6, n_audio_layer=4, n_vocab=51865, n_text_ctx=448, n_text_state=384, n_text_head=6, n_text_layer=4)
number of model parameters: tiny 37184640
number of encoder parameters: tiny 7632384
number of decoder parameters: tiny 29552256

base

ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=512, n_audio_head=8, n_audio_layer=6, n_vocab=51865, n_text_ctx=448, n_text_state=512, n_text_head=8, n_text_layer=6)
number of model parameters: base 71825920
number of encoder parameters: base 19822592
number of decoder parameters: base 52003328

small

ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)
number of model parameters: small 240582912
number of encoder parameters: small 87002112
number of decoder parameters: small 153580800

medium

ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=1024, n_audio_head=16, n_audio_layer=24, n_vocab=51865, n_text_ctx=448, n_text_state=1024, n_text_head=16, n_text_layer=24)
number of model parameters: medium 762321920
number of encoder parameters: medium 305680384
number of decoder parameters: medium 456641536

large

ModelDimensions(n_mels=128, n_audio_ctx=1500, n_audio_state=1280, n_audio_head=20, n_audio_layer=32, n_vocab=51866, n_text_ctx=448, n_text_state=1280, n_text_head=20, n_text_layer=32)
number of model parameters: large 1541570560
number of encoder parameters: large 635048960
number of decoder parameters: large 906521600

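In export_onnx.py, the onnx.load / onnx.save step also has to be added for decoder-main. Without it, the small intermediate files are overwritten partway through the decoder-loop export and can no longer be loaded at generate time, so I added it.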

diff --git a/model_convert/export_onnx.py b/model_convert/export_onnx.py
index eec7e0d..ec61bfc 100644
--- a/model_convert/export_onnx.py
+++ b/model_convert/export_onnx.py
@@ -592,6 +592,17 @@ def main():
         # },
     )
 
+    if "large" in args.model:
+        decoder_external_filename = decoder_filename.split(".onnx")[0]
+        decoder_model = onnx.load(decoder_filename)
+        onnx.save(
+            decoder_model,
+            decoder_filename,
+            save_as_external_data=True,
+            all_tensors_to_one_file=True,
+            location=decoder_external_filename + ".weights",
+        )
+
     logits, n_layer_self_k_cache, n_layer_self_v_cache = decoder(
         tokens,
         n_layer_self_k_cache,

Folders generated by generate_data

calibrations_tiny 3.3GB
calibrations_base 7.6GB
calibrations_small 21.1GB
calibrations_medium 60.2GB
calibrations_large 100.2GB

Processing time

tiny

Target	Conversion time	Memory required
tiny-encoder	42 min (7940HX)	about 21.9GB
tiny-decoder-main	about 5 min (7940HX)	about 6.5GB
tiny-decoder-loop	about 5 min (7940HX)	8.4GB


base

Target	Conversion time	Memory required
base-encoder	1 h 36 min (7940HX)	about 45.1GB
base-decoder-main	about 6 min (7940HX)	about 8GB
base-decoder-loop	about 10 min (7940HX)	about 13.3GB


small

Target	Conversion time	Memory required
small-encoder	11 h (EPYC 7742 ES)	128.2GB
small-decoder-main	about 18 min (7940HX)	14.3GB
small-decoder-loop	about 30 min (7940HX)	19.7GB


medium

Target	Conversion time	Memory required
medium-encoder	(not attempted)	(probably needs 404GB or more)
medium-decoder-main	about 50 min (7940HX)	about 26GB
medium-decoder-loop	about 1 h 25 min (7940HX)	43.7GB


large

Target	Conversion time	Memory required
large-encoder	(not attempted)	(probably needs 850GB or more)
large-decoder-main	about 1 h 40 min (7940HX)	49.0GB
large-decoder-loop	about 2 h 40 min (7940HX)	75.0GB

For monitoring memory usage during conversion

while true; do echo "$(date '+%Y-%m-%d %H:%M:%S') Memory Usage: $(free -m | awk '/^Mem:/ {print $3}') MB"; sleep 10; done

while true; do echo "$(date '+%Y-%m-%d %H:%M:%S') Memory Usage: $(free -m | awk '/^Mem:/ {print $3}') MB, Cache Usage: $(free -m | awk '/^Mem:/ {print $6}') MB"; sleep 10; done
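Since the tables above report peak usage, a variant that keeps track of the maximum may be handy. The sketch below reads /proc/meminfo directly and treats MemTotal minus MemAvailable as "used", which is an assumption and is not guaranteed to equal the used column printed by free. The function names and the sample values are made up for illustration.

```python
import re
import time
from typing import Optional

# Hypothetical /proc/meminfo excerpt, used only to exercise the parser.
SAMPLE = """\
MemTotal:       131072000 kB
MemFree:          1024000 kB
MemAvailable:     2048000 kB
"""


def used_mb(meminfo: str) -> int:
    """Return MemTotal - MemAvailable in MB from /proc/meminfo text."""
    fields = {
        key: int(value)
        for key, value in re.findall(r"^(\w+):\s+(\d+) kB$", meminfo, re.MULTILINE)
    }
    return (fields["MemTotal"] - fields["MemAvailable"]) // 1024


def watch_peak(interval_s: float = 10.0, samples: Optional[int] = None) -> int:
    """Poll /proc/meminfo and return the peak used MB (runs until interrupted if samples is None)."""
    peak = 0
    count = 0
    while samples is None or count < samples:
        with open("/proc/meminfo") as f:
            current = used_mb(f.read())
        peak = max(peak, current)
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} used: {current} MB, peak: {peak} MB")
        count += 1
        time.sleep(interval_s)
    return peak


print(used_mb(SAMPLE), "MB used in the sample")
```

Calling `watch_peak()` in a second terminal while pulsar2 build runs prints the running peak every 10 seconds.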
