Last updated: 2025-01-24 (Fri) 15:14:52
transformers.AutoTokenizer
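Instantiates one of the library's tokenizer classes from a pretrained model's vocabulary. AutoTokenizer itself cannot be constructed directly; use the AutoTokenizer.from_pretrained() class method.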
https://huggingface.co/docs/transformers/ja/model_doc/auto#transformers.AutoTokenizer
https://huggingface.co/docs/transformers/v4.46.3/ja/model_doc/auto#transformers.AutoTokenizer
AutoTokenizer.apply_chat_template
AutoTokenizer.from_pretrained
Parameters
pretrained_model_name_or_path
- A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co.
- A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.
- A path or URL to a single saved vocabulary file, if and only if the tokenizer only requires a single vocabulary file (like BERT or XLNet), e.g., ./my_model_directory/vocab.txt. (Not applicable to all derived classes.)
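The following is a minimal offline sketch of the second option, loading from a local directory written by save_pretrained(). Assumptions: a transformers 4.4x release, and a toy vocabulary and temporary directory that are purely hypothetical. The sketch relies on save_pretrained() recording tokenizer_class in tokenizer_config.json. AutoTokenizer reads that value when the directory has no config.json. Whether you get BertTokenizer or BertTokenizerFast depends on use_fast (default True).

```python
import os
import tempfile

from transformers import AutoTokenizer, BertTokenizer

# Hypothetical toy vocabulary; real models ship thousands of entries.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as model_dir:
    vocab_path = os.path.join(model_dir, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(VOCAB) + "\n")

    # save_pretrained() also writes tokenizer_config.json (incl. tokenizer_class).
    BertTokenizer(vocab_file=vocab_path).save_pretrained(model_dir)

    # Option 2: pass a directory path instead of a hub model id.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    # [CLS] hello world [SEP] as ids from the toy vocabulary.
    ids = tokenizer("hello world")["input_ids"]

print(type(tokenizer).__name__, ids)
```

Option 1 is the same call with a hub model id instead of a path. It needs network access (or a populated local cache), which is why this sketch builds its own directory.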
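Pattern-matching targets
The tokenizer class is selected from the model_type of the model's config (passed in, or loaded from pretrained_model_name_or_path if possible). When model_type is missing, it falls back to pattern matching on pretrained_model_name_or_path. Each entry gives the model_type key, the tokenizer class(es), and the model: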
- albert: AlbertTokenizer or AlbertTokenizerFast (ALBERT model)
- align: BertTokenizer or BertTokenizerFast (ALIGN model)
- bark: BertTokenizer or BertTokenizerFast (Bark model)
- bart: BartTokenizer or BartTokenizerFast (BART model)
- barthez: BarthezTokenizer or BarthezTokenizerFast (BARThez model)
- bartpho: BartphoTokenizer (BARTpho model)
- bert: BertTokenizer or BertTokenizerFast (BERT model)
- bert-generation: BertGenerationTokenizer (Bert Generation model)
- bert-japanese: BertJapaneseTokenizer (BertJapanese model)
- bertweet: BertweetTokenizer (BERTweet model)
- big_bird: BigBirdTokenizer or BigBirdTokenizerFast (BigBird model)
- bigbird_pegasus: PegasusTokenizer or PegasusTokenizerFast (BigBird-Pegasus model)
- biogpt: BioGptTokenizer (BioGpt model)
- blenderbot: BlenderbotTokenizer or BlenderbotTokenizerFast (Blenderbot model)
- blenderbot-small: BlenderbotSmallTokenizer (BlenderbotSmall model)
- blip: BertTokenizer or BertTokenizerFast (BLIP model)
- blip-2: GPT2Tokenizer or GPT2TokenizerFast (BLIP-2 model)
- bloom: BloomTokenizerFast (BLOOM model)
- bridgetower: RobertaTokenizer or RobertaTokenizerFast (BridgeTower model)
- bros: BertTokenizer or BertTokenizerFast (BROS model)
- byt5: ByT5Tokenizer (ByT5 model)
- camembert: CamembertTokenizer or CamembertTokenizerFast (CamemBERT model)
- canine: CanineTokenizer (CANINE model)
- chameleon: LlamaTokenizer or LlamaTokenizerFast (Chameleon model)
- chinese_clip: BertTokenizer or BertTokenizerFast (Chinese-CLIP model)
- clap: RobertaTokenizer or RobertaTokenizerFast (CLAP model)
- clip: CLIPTokenizer or CLIPTokenizerFast (CLIP model)
- clipseg: CLIPTokenizer or CLIPTokenizerFast (CLIPSeg model)
- clvp: ClvpTokenizer (CLVP model)
- code_llama: CodeLlamaTokenizer or CodeLlamaTokenizerFast (CodeLlama model)
- codegen: CodeGenTokenizer or CodeGenTokenizerFast (CodeGen model)
- cohere: CohereTokenizerFast (Cohere model)
- convbert: ConvBertTokenizer or ConvBertTokenizerFast (ConvBERT model)
- cpm: CpmTokenizer or CpmTokenizerFast (CPM model)
- cpmant: CpmAntTokenizer (CPM-Ant model)
- ctrl: CTRLTokenizer (CTRL model)
- data2vec-audio: Wav2Vec2CTCTokenizer (Data2VecAudio model)
- data2vec-text: RobertaTokenizer or RobertaTokenizerFast (Data2VecText model)
- dbrx: GPT2Tokenizer or GPT2TokenizerFast (DBRX model)
- deberta: DebertaTokenizer or DebertaTokenizerFast (DeBERTa model)
- deberta-v2: DebertaV2Tokenizer or DebertaV2TokenizerFast (DeBERTa-v2 model)
- distilbert: DistilBertTokenizer or DistilBertTokenizerFast (DistilBERT model)
- dpr: DPRQuestionEncoderTokenizer or DPRQuestionEncoderTokenizerFast (DPR model)
- electra: ElectraTokenizer or ElectraTokenizerFast (ELECTRA model)
- ernie: BertTokenizer or BertTokenizerFast (ERNIE model)
- ernie_m: ErnieMTokenizer (ErnieM model)
- esm: EsmTokenizer (ESM model)
- falcon: PreTrainedTokenizerFast (Falcon model)
- falcon_mamba: GPTNeoXTokenizerFast (FalconMamba model)
- fastspeech2_conformer: (FastSpeech2Conformer model)
- flaubert: FlaubertTokenizer (FlauBERT model)
- fnet: FNetTokenizer or FNetTokenizerFast (FNet model)
- fsmt: FSMTTokenizer (FairSeq Machine-Translation model)
- funnel: FunnelTokenizer or FunnelTokenizerFast (Funnel Transformer model)
- gemma: GemmaTokenizer or GemmaTokenizerFast (Gemma model)
- gemma2: GemmaTokenizer or GemmaTokenizerFast (Gemma2 model)
- git: BertTokenizer or BertTokenizerFast (GIT model)
- glm: PreTrainedTokenizerFast (GLM model)
- gpt-sw3: GPTSw3Tokenizer (GPT-Sw3 model)
- gpt2: GPT2Tokenizer or GPT2TokenizerFast (OpenAI GPT-2 model)
- gpt_bigcode: GPT2Tokenizer or GPT2TokenizerFast (GPTBigCode model)
- gpt_neo: GPT2Tokenizer or GPT2TokenizerFast (GPT Neo model)
- gpt_neox: GPTNeoXTokenizerFast (GPT NeoX model)
- gpt_neox_japanese: GPTNeoXJapaneseTokenizer (GPT NeoX Japanese model)
- gptj: GPT2Tokenizer or GPT2TokenizerFast (GPT-J model)
- gptsan-japanese: GPTSanJapaneseTokenizer (GPTSAN-japanese model)
- grounding-dino: BertTokenizer or BertTokenizerFast (Grounding DINO model)
- groupvit: CLIPTokenizer or CLIPTokenizerFast (GroupViT model)
- herbert: HerbertTokenizer or HerbertTokenizerFast (HerBERT model)
- hubert: Wav2Vec2CTCTokenizer (Hubert model)
- ibert: RobertaTokenizer or RobertaTokenizerFast (I-BERT model)
- idefics: LlamaTokenizerFast (IDEFICS model)
- idefics2: LlamaTokenizer or LlamaTokenizerFast (Idefics2 model)
- idefics3: LlamaTokenizer or LlamaTokenizerFast (Idefics3 model)
- instructblip: GPT2Tokenizer or GPT2TokenizerFast (InstructBLIP model)
- instructblipvideo: GPT2Tokenizer or GPT2TokenizerFast (InstructBlipVideo model)
- jamba: LlamaTokenizer or LlamaTokenizerFast (Jamba model)
- jetmoe: LlamaTokenizer or LlamaTokenizerFast (JetMoe model)
- jukebox: JukeboxTokenizer (Jukebox model)
- kosmos-2: XLMRobertaTokenizer or XLMRobertaTokenizerFast (KOSMOS-2 model)
- layoutlm: LayoutLMTokenizer or LayoutLMTokenizerFast (LayoutLM model)
- layoutlmv2: LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast (LayoutLMv2 model)
- layoutlmv3: LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LayoutLMv3 model)
- layoutxlm: LayoutXLMTokenizer or LayoutXLMTokenizerFast (LayoutXLM model)
- led: LEDTokenizer or LEDTokenizerFast (LED model)
- lilt: LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LiLT model)
- llama: LlamaTokenizer or LlamaTokenizerFast (LLaMA model)
- llava: LlamaTokenizer or LlamaTokenizerFast (LLaVa model)
- llava_next: LlamaTokenizer or LlamaTokenizerFast (LLaVA-NeXT model)
- llava_next_video: LlamaTokenizer or LlamaTokenizerFast (LLaVa-NeXT-Video model)
- llava_onevision: LlamaTokenizer or LlamaTokenizerFast (LLaVA-Onevision model)
- longformer: LongformerTokenizer or LongformerTokenizerFast (Longformer model)
- longt5: T5Tokenizer or T5TokenizerFast (LongT5 model)
- luke: LukeTokenizer (LUKE model)
- lxmert: LxmertTokenizer or LxmertTokenizerFast (LXMERT model)
- m2m_100: M2M100Tokenizer (M2M100 model)
- mamba: GPTNeoXTokenizerFast (Mamba model)
- mamba2: GPTNeoXTokenizerFast (mamba2 model)
- marian: MarianTokenizer (Marian model)
- mbart: MBartTokenizer or MBartTokenizerFast (mBART model)
- mbart50: MBart50Tokenizer or MBart50TokenizerFast (mBART-50 model)
- mega: RobertaTokenizer or RobertaTokenizerFast (MEGA model)
- megatron-bert: BertTokenizer or BertTokenizerFast (Megatron-BERT model)
- mgp-str: MgpstrTokenizer (MGP-STR model)
- mistral: LlamaTokenizer or LlamaTokenizerFast (Mistral model)
- mixtral: LlamaTokenizer or LlamaTokenizerFast (Mixtral model)
- mllama: LlamaTokenizer or LlamaTokenizerFast (Mllama model)
- mluke: MLukeTokenizer (mLUKE model)
- mobilebert: MobileBertTokenizer or MobileBertTokenizerFast (MobileBERT model)
- moshi: PreTrainedTokenizerFast (Moshi model)
- mpnet: MPNetTokenizer or MPNetTokenizerFast (MPNet model)
- mpt: GPTNeoXTokenizerFast (MPT model)
- mra: RobertaTokenizer or RobertaTokenizerFast (MRA model)
- mt5: MT5Tokenizer or MT5TokenizerFast (MT5 model)
- musicgen: T5Tokenizer or T5TokenizerFast (MusicGen model)
- musicgen_melody: T5Tokenizer or T5TokenizerFast (MusicGen Melody model)
- mvp: MvpTokenizer or MvpTokenizerFast (MVP model)
- myt5: MyT5Tokenizer (myt5 model)
- nezha: BertTokenizer or BertTokenizerFast (Nezha model)
- nllb: NllbTokenizer or NllbTokenizerFast (NLLB model)
- nllb-moe: NllbTokenizer or NllbTokenizerFast (NLLB-MOE model)
- nystromformer: AlbertTokenizer or AlbertTokenizerFast (Nyströmformer model)
- olmo: GPTNeoXTokenizerFast (OLMo model)
- olmoe: GPTNeoXTokenizerFast (OLMoE model)
- omdet-turbo: CLIPTokenizer or CLIPTokenizerFast (OmDet-Turbo model)
- oneformer: CLIPTokenizer or CLIPTokenizerFast (OneFormer model)
- openai-gpt: OpenAIGPTTokenizer or OpenAIGPTTokenizerFast (OpenAI GPT model)
- opt: GPT2Tokenizer or GPT2TokenizerFast (OPT model)
- owlv2: CLIPTokenizer or CLIPTokenizerFast (OWLv2 model)
- owlvit: CLIPTokenizer or CLIPTokenizerFast (OWL-ViT model)
- paligemma: LlamaTokenizer or LlamaTokenizerFast (PaliGemma model)
- pegasus: PegasusTokenizer or PegasusTokenizerFast (Pegasus model)
- pegasus_x: PegasusTokenizer or PegasusTokenizerFast (PEGASUS-X model)
- perceiver: PerceiverTokenizer (Perceiver model)
- persimmon: LlamaTokenizer or LlamaTokenizerFast (Persimmon model)
- phi: CodeGenTokenizer or CodeGenTokenizerFast (Phi model)
- phi3: LlamaTokenizer or LlamaTokenizerFast (Phi3 model)
- phimoe: LlamaTokenizer or LlamaTokenizerFast (Phimoe model)
- phobert: PhobertTokenizer (PhoBERT model)
- pix2struct: T5Tokenizer or T5TokenizerFast (Pix2Struct model)
- pixtral: PreTrainedTokenizerFast (Pixtral model)
- plbart: PLBartTokenizer (PLBart model)
- prophetnet: ProphetNetTokenizer (ProphetNet model)
- qdqbert: BertTokenizer or BertTokenizerFast (QDQBert model)
- qwen2: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2 model)
- qwen2_audio: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2Audio model)
- qwen2_moe: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2MoE model)
- qwen2_vl: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2VL model)
- rag: RagTokenizer (RAG model)
- realm: RealmTokenizer or RealmTokenizerFast (REALM model)
- recurrent_gemma: GemmaTokenizer or GemmaTokenizerFast (RecurrentGemma model)
- reformer: ReformerTokenizer or ReformerTokenizerFast (Reformer model)
- rembert: RemBertTokenizer or RemBertTokenizerFast (RemBERT model)
- retribert: RetriBertTokenizer or RetriBertTokenizerFast (RetriBERT model)
- roberta: RobertaTokenizer or RobertaTokenizerFast (RoBERTa model)
- roberta-prelayernorm: RobertaTokenizer or RobertaTokenizerFast (RoBERTa-PreLayerNorm model)
- roc_bert: RoCBertTokenizer (RoCBert model)
- roformer: RoFormerTokenizer or RoFormerTokenizerFast (RoFormer model)
- rwkv: GPTNeoXTokenizerFast (RWKV model)
- seamless_m4t: SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4T model)
- seamless_m4t_v2: SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4Tv2 model)
- siglip: SiglipTokenizer (SigLIP model)
- speech_to_text: Speech2TextTokenizer (Speech2Text model)
- speech_to_text_2: Speech2Text2Tokenizer (Speech2Text2 model)
- speecht5: SpeechT5Tokenizer (SpeechT5 model)
- splinter: SplinterTokenizer or SplinterTokenizerFast (Splinter model)
- squeezebert: SqueezeBertTokenizer or SqueezeBertTokenizerFast (SqueezeBERT model)
- stablelm: GPTNeoXTokenizerFast (StableLm model)
- starcoder2: GPT2Tokenizer or GPT2TokenizerFast (Starcoder2 model)
- switch_transformers: T5Tokenizer or T5TokenizerFast (SwitchTransformers model)
- t5: T5Tokenizer or T5TokenizerFast (T5 model)
- tapas: TapasTokenizer (TAPAS model)
- tapex: TapexTokenizer (TAPEX model)
- transfo-xl: TransfoXLTokenizer (Transformer-XL model)
- tvp: BertTokenizer or BertTokenizerFast (TVP model)
- udop: UdopTokenizer or UdopTokenizerFast (UDOP model)
- umt5: T5Tokenizer or T5TokenizerFast (UMT5 model)
- video_llava: LlamaTokenizer or LlamaTokenizerFast (VideoLlava model)
- vilt: BertTokenizer or BertTokenizerFast (ViLT model)
- vipllava: LlamaTokenizer or LlamaTokenizerFast (VipLlava model)
- visual_bert: BertTokenizer or BertTokenizerFast (VisualBERT model)
- vits: VitsTokenizer (VITS model)
- wav2vec2: Wav2Vec2CTCTokenizer (Wav2Vec2 model)
- wav2vec2-bert: Wav2Vec2CTCTokenizer (Wav2Vec2-BERT model)
- wav2vec2-conformer: Wav2Vec2CTCTokenizer (Wav2Vec2-Conformer model)
- wav2vec2_phoneme: Wav2Vec2PhonemeCTCTokenizer (Wav2Vec2Phoneme model)
- whisper: WhisperTokenizer or WhisperTokenizerFast (Whisper model)
- xclip: CLIPTokenizer or CLIPTokenizerFast (X-CLIP model)
- xglm: XGLMTokenizer or XGLMTokenizerFast (XGLM model)
- xlm: XLMTokenizer (XLM model)
- xlm-prophetnet: XLMProphetNetTokenizer (XLM-ProphetNet model)
- xlm-roberta: XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa model)
- xlm-roberta-xl: XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa-XL model)
- xlnet: XLNetTokenizer or XLNetTokenizerFast (XLNet model)
- xmod: XLMRobertaTokenizer or XLMRobertaTokenizerFast (X-MOD model)
- yoso: AlbertTokenizer or AlbertTokenizerFast (YOSO model)
- zamba: LlamaTokenizer or LlamaTokenizerFast (Zamba model)
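Per the Hugging Face docs, the config's model_type is used first, and pattern matching on the name is only a fallback. The hypothetical, pure-Python sketch below shows how such a table could drive that lookup. It is not the library's code. TOKENIZER_MAPPING and pick_tokenizer_class are invented names, and the five entries are an excerpt of the list above. The longest-match tie-break for the name fallback is an assumption and not necessarily the rule transformers applies.

```python
from typing import Dict, List, Optional, Tuple

# Excerpt of the model_type -> (slow class, fast class) list above.
# None marks a variant the list does not provide.
TOKENIZER_MAPPING: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "bert": ("BertTokenizer", "BertTokenizerFast"),
    "bloom": (None, "BloomTokenizerFast"),
    "byt5": ("ByT5Tokenizer", None),
    "roberta": ("RobertaTokenizer", "RobertaTokenizerFast"),
    "xlm-roberta": ("XLMRobertaTokenizer", "XLMRobertaTokenizerFast"),
}


def pick_tokenizer_class(
    model_type: Optional[str], name_or_path: str, use_fast: bool = True
) -> str:
    """Return a tokenizer class name (hypothetical helper, not transformers API)."""
    if model_type is not None:
        # Primary rule: the config's model_type selects the entry.
        if model_type not in TOKENIZER_MAPPING:
            raise ValueError(f"unknown model_type {model_type!r}")
        key = model_type
    else:
        # Fallback: match the keys against the name or path. Prefer the
        # longest match, since "bert" is a substring of "roberta" and
        # "roberta" of "xlm-roberta" (assumed tie-break rule).
        matches: List[str] = [k for k in TOKENIZER_MAPPING if k in name_or_path]
        if not matches:
            raise ValueError(f"no tokenizer pattern matches {name_or_path!r}")
        key = max(matches, key=len)

    slow, fast = TOKENIZER_MAPPING[key]
    # Prefer the fast class when requested and available; otherwise use
    # whichever variant exists.
    chosen = fast if (use_fast and fast is not None) else (slow or fast)
    if chosen is None:
        raise ValueError(f"no tokenizer class listed for {key!r}")
    return chosen


print(pick_tokenizer_class(None, "xlm-roberta-base"))
```

For example, pick_tokenizer_class(None, "xlm-roberta-base") returns "XLMRobertaTokenizerFast". Both "bert" and "roberta" also occur in that name, which is why the fallback needs some precedence rule.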