Last updated: 2025-02-14 (Fri) 11:12:25
Hugging Face/Models
Configuration files
Hugging Face/Models/config.json
- Model architecture and configuration settings
- model_type
- hidden_size
- num_hidden_layers
- vocab_size
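The keys listed above can be read with nothing more than the standard `json` module. The following is a minimal sketch, not the way `transformers` itself loads configurations; the sample values and the helper names `read_model_summary` and `demo_config_summary` are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample values; a real config.json contains many more keys.
SAMPLE_CONFIG = {
    "model_type": "bert",
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "vocab_size": 30522,
}


def read_model_summary(model_dir):
    """Return the four keys listed above from <model_dir>/config.json."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    keys = ("model_type", "hidden_size", "num_hidden_layers", "vocab_size")
    # .get() is used because not every architecture defines every key.
    return {key: config.get(key) for key in keys}


def demo_config_summary():
    """Write the sample config to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as model_dir:
        path = Path(model_dir) / "config.json"
        path.write_text(json.dumps(SAMPLE_CONFIG), encoding="utf-8")
        return read_model_summary(model_dir)


if __name__ == "__main__":
    print(demo_config_summary())
```

In practice `model_dir` would point at a locally downloaded model directory rather than a temporary one.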
Hugging Face/Models/tokenizer.json
- Overall tokenizer settings and the token-to-ID mapping
- transformers.PreTrainedTokenizerFast
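To illustrate the token-to-ID mapping, here is a rough sketch that reads the vocabulary out of a tokenizer.json-style document. It assumes a WordPiece or BPE model, where `model.vocab` is a JSON object mapping tokens to IDs; other model types (Unigram, for example) store the vocabulary differently. The sample content and the helpers `load_vocab` and `invert_vocab` are hypothetical.

```python
import json

# Hypothetical, heavily trimmed tokenizer.json content.
SAMPLE_TOKENIZER_JSON = """
{
  "version": "1.0",
  "added_tokens": [{"id": 0, "content": "[UNK]", "special": true}],
  "model": {
    "type": "WordPiece",
    "vocab": {"[UNK]": 0, "hello": 1, "world": 2}
  }
}
"""


def load_vocab(tokenizer_json_text):
    """Return the token -> ID mapping stored under model.vocab."""
    data = json.loads(tokenizer_json_text)
    vocab = data["model"]["vocab"]
    if not isinstance(vocab, dict):
        # Assumption: only dict-shaped vocabularies are handled in this sketch.
        raise ValueError("model.vocab is not a token -> ID mapping")
    return vocab


def invert_vocab(vocab):
    """Build the reverse ID -> token mapping."""
    return {token_id: token for token, token_id in vocab.items()}


if __name__ == "__main__":
    vocab = load_vocab(SAMPLE_TOKENIZER_JSON)
    print(vocab["hello"], invert_vocab(vocab)[2])
```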
Hugging Face/Models/tokenizer_config.json
- File that stores the tokenizer's settings
- transformers.PreTrainedTokenizer
- PreTrainedTokenizer.from_pretrained
- AutoTokenizer.from_pretrained
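As a small illustration of what tokenizer_config.json holds, the sketch below pulls out two commonly seen fields. The sample content is hypothetical and trimmed, real files vary by model, and `summarize_tokenizer_config` is an illustrative helper rather than part of `transformers`.

```python
import json

# Hypothetical, trimmed tokenizer_config.json content; real files vary by model.
SAMPLE_TOKENIZER_CONFIG = """
{
  "tokenizer_class": "BertTokenizer",
  "model_max_length": 512,
  "do_lower_case": true
}
"""


def summarize_tokenizer_config(text):
    """Return the tokenizer class name and maximum length, if present."""
    config = json.loads(text)
    return {
        "tokenizer_class": config.get("tokenizer_class"),
        "model_max_length": config.get("model_max_length"),
    }


if __name__ == "__main__":
    print(summarize_tokenizer_config(SAMPLE_TOKENIZER_CONFIG))
```

With `transformers` installed, this file is normally consumed indirectly, by passing a model directory or hub ID to `AutoTokenizer.from_pretrained`, rather than parsed by hand.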
Notes
https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L144
https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L1962
# Slow tokenizers used to be saved in three separated files
SPECIAL_TOKENS_MAP_FILE = "special_tokens_map.json"
ADDED_TOKENS_FILE = "added_tokens.json"
TOKENIZER_CONFIG_FILE = "tokenizer_config.json"
CHAT_TEMPLATE_FILE = "chat_template.jinja"
# Fast tokenizers (provided by HuggingFace tokenizer's library) can be saved in a single file
FULL_TOKENIZER_FILE = "tokenizer.json"
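The file names in the constants quoted above can be used to check which tokenizer files a local model directory actually contains. This is only a sketch under that assumption: `describe_tokenizer_files` and `demo_describe` are hypothetical helpers, and the temporary directory stands in for a real downloaded model.

```python
import tempfile
from pathlib import Path

# File names copied from the constants quoted above.
SLOW_TOKENIZER_FILES = (
    "special_tokens_map.json",
    "added_tokens.json",
    "tokenizer_config.json",
)
FULL_TOKENIZER_FILE = "tokenizer.json"


def describe_tokenizer_files(model_dir):
    """Report which of the known tokenizer files exist in model_dir."""
    model_dir = Path(model_dir)
    return {
        "has_fast_file": (model_dir / FULL_TOKENIZER_FILE).is_file(),
        "slow_files_present": [
            name for name in SLOW_TOKENIZER_FILES if (model_dir / name).is_file()
        ],
    }


def demo_describe():
    """Create a stand-in model directory holding two empty JSON files."""
    with tempfile.TemporaryDirectory() as model_dir:
        (Path(model_dir) / "tokenizer.json").write_text("{}", encoding="utf-8")
        (Path(model_dir) / "tokenizer_config.json").write_text("{}", encoding="utf-8")
        return describe_tokenizer_files(model_dir)


if __name__ == "__main__":
    print(demo_describe())
```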