Last updated: 2025-01-24 (Fri) 15:14:52

transformers.AutoTokenizer

Instantiates one of the library's tokenizer classes from a pretrained model vocabulary.

https://huggingface.co/docs/transformers/ja/model_doc/auto#transformers.AutoTokenizer

https://huggingface.co/docs/transformers/v4.46.3/ja/model_doc/auto#transformers.AutoTokenizer

AutoTokenizer.apply_chat_template

AutoTokenizer.from_pretrained

Parameters

pretrained_model_name_or_path

  • A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co.
  • A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.
  • A path or URL to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (like BERT or XLNet), e.g.: ./my_model_directory/vocab.txt. (Not applicable to all derived classes)
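
The directory form of pretrained_model_name_or_path can be tried offline. The following is a minimal sketch, assuming a transformers 4.x installation with the tokenizers backend available; the toy vocabulary, the temporary directory, and the one-field config.json are hypothetical stand-ins for a real saved model directory.

```python
import json
import tempfile
from pathlib import Path

from transformers import AutoTokenizer

# Hypothetical toy vocabulary, one token per line (BERT-style vocab.txt).
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "vocab.txt").write_text("\n".join(vocab) + "\n", encoding="utf-8")
    # config.json carries the model_type used here to select the tokenizer class.
    (model_dir / "config.json").write_text(
        json.dumps({"model_type": "bert"}), encoding="utf-8"
    )

    # Directory form: a path to a directory containing the vocabulary files.
    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))

print(type(tokenizer).__name__)
print(tokenizer.tokenize("hello world"))

# Model id form (needs network access to huggingface.co):
# tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```

A real directory written by save_pretrained() would normally also contain tokenizer_config.json and special_tokens_map.json; they are left out here only to keep the sketch short.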

Pattern-matching targets

  • albert: AlbertTokenizer or AlbertTokenizerFast (ALBERT model)
    align: BertTokenizer or BertTokenizerFast (ALIGN model)
    bark: BertTokenizer or BertTokenizerFast (Bark model)
    bart: BartTokenizer or BartTokenizerFast (BART model)
    barthez: BarthezTokenizer or BarthezTokenizerFast (BARThez model)
    bartpho: BartphoTokenizer (BARTpho model)
    bert: BertTokenizer or BertTokenizerFast (BERT model)
    bert-generation: BertGenerationTokenizer (Bert Generation model)
    bert-japanese: BertJapaneseTokenizer (BertJapanese model)
    bertweet: BertweetTokenizer (BERTweet model)
    big_bird: BigBirdTokenizer or BigBirdTokenizerFast (BigBird model)
    bigbird_pegasus: PegasusTokenizer or PegasusTokenizerFast (BigBird-Pegasus model)
    biogpt: BioGptTokenizer (BioGpt model)
    blenderbot: BlenderbotTokenizer or BlenderbotTokenizerFast (Blenderbot model)
    blenderbot-small: BlenderbotSmallTokenizer (BlenderbotSmall model)
    blip: BertTokenizer or BertTokenizerFast (BLIP model)
    blip-2: GPT2Tokenizer or GPT2TokenizerFast (BLIP-2 model)
    bloom: BloomTokenizerFast (BLOOM model)
    bridgetower: RobertaTokenizer or RobertaTokenizerFast (BridgeTower model)
    bros: BertTokenizer or BertTokenizerFast (BROS model)
    byt5: ByT5Tokenizer (ByT5 model)
    camembert: CamembertTokenizer or CamembertTokenizerFast (CamemBERT model)
    canine: CanineTokenizer (CANINE model)
    chameleon: LlamaTokenizer or LlamaTokenizerFast (Chameleon model)
    chinese_clip: BertTokenizer or BertTokenizerFast (Chinese-CLIP model)
    clap: RobertaTokenizer or RobertaTokenizerFast (CLAP model)
    clip: CLIPTokenizer or CLIPTokenizerFast (CLIP model)
    clipseg: CLIPTokenizer or CLIPTokenizerFast (CLIPSeg model)
    clvp: ClvpTokenizer (CLVP model)
    code_llama: CodeLlamaTokenizer or CodeLlamaTokenizerFast (CodeLlama model)
    codegen: CodeGenTokenizer or CodeGenTokenizerFast (CodeGen model)
    cohere: CohereTokenizerFast (Cohere model)
    convbert: ConvBertTokenizer or ConvBertTokenizerFast (ConvBERT model)
    cpm: CpmTokenizer or CpmTokenizerFast (CPM model)
    cpmant: CpmAntTokenizer (CPM-Ant model)
    ctrl: CTRLTokenizer (CTRL model)
    data2vec-audio: Wav2Vec2CTCTokenizer (Data2VecAudio model)
    data2vec-text: RobertaTokenizer or RobertaTokenizerFast (Data2VecText model)
    dbrx: GPT2Tokenizer or GPT2TokenizerFast (DBRX model)
    deberta: DebertaTokenizer or DebertaTokenizerFast (DeBERTa model)
    deberta-v2: DebertaV2Tokenizer or DebertaV2TokenizerFast (DeBERTa-v2 model)
    distilbert: DistilBertTokenizer or DistilBertTokenizerFast (DistilBERT model)
    dpr: DPRQuestionEncoderTokenizer or DPRQuestionEncoderTokenizerFast (DPR model)
    electra: ElectraTokenizer or ElectraTokenizerFast (ELECTRA model)
    ernie: BertTokenizer or BertTokenizerFast (ERNIE model)
    ernie_m: ErnieMTokenizer (ErnieM model)
    esm: EsmTokenizer (ESM model)
    falcon: PreTrainedTokenizerFast (Falcon model)
    falcon_mamba: GPTNeoXTokenizerFast (FalconMamba model)
    fastspeech2_conformer: (FastSpeech2Conformer model)
    flaubert: FlaubertTokenizer (FlauBERT model)
    fnet: FNetTokenizer or FNetTokenizerFast (FNet model)
    fsmt: FSMTTokenizer (FairSeq Machine-Translation model)
    funnel: FunnelTokenizer or FunnelTokenizerFast (Funnel Transformer model)
    gemma: GemmaTokenizer or GemmaTokenizerFast (Gemma model)
    gemma2: GemmaTokenizer or GemmaTokenizerFast (Gemma2 model)
    git: BertTokenizer or BertTokenizerFast (GIT model)
    glm: PreTrainedTokenizerFast (GLM model)
    gpt-sw3: GPTSw3Tokenizer (GPT-Sw3 model)
    gpt2: GPT2Tokenizer or GPT2TokenizerFast (OpenAI GPT-2 model)
    gpt_bigcode: GPT2Tokenizer or GPT2TokenizerFast (GPTBigCode model)
    gpt_neo: GPT2Tokenizer or GPT2TokenizerFast (GPT Neo model)
    gpt_neox: GPTNeoXTokenizerFast (GPT NeoX model)
    gpt_neox_japanese: GPTNeoXJapaneseTokenizer (GPT NeoX Japanese model)
    gptj: GPT2Tokenizer or GPT2TokenizerFast (GPT-J model)
    gptsan-japanese: GPTSanJapaneseTokenizer (GPTSAN-japanese model)
    grounding-dino: BertTokenizer or BertTokenizerFast (Grounding DINO model)
    groupvit: CLIPTokenizer or CLIPTokenizerFast (GroupViT model)
    herbert: HerbertTokenizer or HerbertTokenizerFast (HerBERT model)
    hubert: Wav2Vec2CTCTokenizer (Hubert model)
    ibert: RobertaTokenizer or RobertaTokenizerFast (I-BERT model)
    idefics: LlamaTokenizerFast (IDEFICS model)
    idefics2: LlamaTokenizer or LlamaTokenizerFast (Idefics2 model)
    idefics3: LlamaTokenizer or LlamaTokenizerFast (Idefics3 model)
    instructblip: GPT2Tokenizer or GPT2TokenizerFast (InstructBLIP model)
    instructblipvideo: GPT2Tokenizer or GPT2TokenizerFast (InstructBlipVideo model)
    jamba: LlamaTokenizer or LlamaTokenizerFast (Jamba model)
    jetmoe: LlamaTokenizer or LlamaTokenizerFast (JetMoe model)
    jukebox: JukeboxTokenizer (Jukebox model)
    kosmos-2: XLMRobertaTokenizer or XLMRobertaTokenizerFast (KOSMOS-2 model)
    layoutlm: LayoutLMTokenizer or LayoutLMTokenizerFast (LayoutLM model)
    layoutlmv2: LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast (LayoutLMv2 model)
    layoutlmv3: LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LayoutLMv3 model)
    layoutxlm: LayoutXLMTokenizer or LayoutXLMTokenizerFast (LayoutXLM model)
    led: LEDTokenizer or LEDTokenizerFast (LED model)
    lilt: LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LiLT model)
    llama: LlamaTokenizer or LlamaTokenizerFast (LLaMA model)
    llava: LlamaTokenizer or LlamaTokenizerFast (LLaVa model)
    llava_next: LlamaTokenizer or LlamaTokenizerFast (LLaVA-NeXT model)
    llava_next_video: LlamaTokenizer or LlamaTokenizerFast (LLaVa-NeXT-Video model)
    llava_onevision: LlamaTokenizer or LlamaTokenizerFast (LLaVA-Onevision model)
    longformer: LongformerTokenizer or LongformerTokenizerFast (Longformer model)
    longt5: T5Tokenizer or T5TokenizerFast (LongT5 model)
    luke: LukeTokenizer (LUKE model)
    lxmert: LxmertTokenizer or LxmertTokenizerFast (LXMERT model)
    m2m_100: M2M100Tokenizer (M2M100 model)
    mamba: GPTNeoXTokenizerFast (Mamba model)
    mamba2: GPTNeoXTokenizerFast (mamba2 model)
    marian: MarianTokenizer (Marian model)
    mbart: MBartTokenizer or MBartTokenizerFast (mBART model)
    mbart50: MBart50Tokenizer or MBart50TokenizerFast (mBART-50 model)
    mega: RobertaTokenizer or RobertaTokenizerFast (MEGA model)
    megatron-bert: BertTokenizer or BertTokenizerFast (Megatron-BERT model)
    mgp-str: MgpstrTokenizer (MGP-STR model)
    mistral: LlamaTokenizer or LlamaTokenizerFast (Mistral model)
    mixtral: LlamaTokenizer or LlamaTokenizerFast (Mixtral model)
    mllama: LlamaTokenizer or LlamaTokenizerFast (Mllama model)
    mluke: MLukeTokenizer (mLUKE model)
    mobilebert: MobileBertTokenizer or MobileBertTokenizerFast (MobileBERT model)
    moshi: PreTrainedTokenizerFast (Moshi model)
    mpnet: MPNetTokenizer or MPNetTokenizerFast (MPNet model)
    mpt: GPTNeoXTokenizerFast (MPT model)
    mra: RobertaTokenizer or RobertaTokenizerFast (MRA model)
    mt5: MT5Tokenizer or MT5TokenizerFast (MT5 model)
    musicgen: T5Tokenizer or T5TokenizerFast (MusicGen model)
    musicgen_melody: T5Tokenizer or T5TokenizerFast (MusicGen Melody model)
    mvp: MvpTokenizer or MvpTokenizerFast (MVP model)
    myt5: MyT5Tokenizer (myt5 model)
    nezha: BertTokenizer or BertTokenizerFast (Nezha model)
    nllb: NllbTokenizer or NllbTokenizerFast (NLLB model)
    nllb-moe: NllbTokenizer or NllbTokenizerFast (NLLB-MOE model)
    nystromformer: AlbertTokenizer or AlbertTokenizerFast (Nyströmformer model)
    olmo: GPTNeoXTokenizerFast (OLMo model)
    olmoe: GPTNeoXTokenizerFast (OLMoE model)
    omdet-turbo: CLIPTokenizer or CLIPTokenizerFast (OmDet-Turbo model)
    oneformer: CLIPTokenizer or CLIPTokenizerFast (OneFormer model)
    openai-gpt: OpenAIGPTTokenizer or OpenAIGPTTokenizerFast (OpenAI GPT model)
    opt: GPT2Tokenizer or GPT2TokenizerFast (OPT model)
    owlv2: CLIPTokenizer or CLIPTokenizerFast (OWLv2 model)
    owlvit: CLIPTokenizer or CLIPTokenizerFast (OWL-ViT model)
    paligemma: LlamaTokenizer or LlamaTokenizerFast (PaliGemma model)
    pegasus: PegasusTokenizer or PegasusTokenizerFast (Pegasus model)
    pegasus_x: PegasusTokenizer or PegasusTokenizerFast (PEGASUS-X model)
    perceiver: PerceiverTokenizer (Perceiver model)
    persimmon: LlamaTokenizer or LlamaTokenizerFast (Persimmon model)
    phi: CodeGenTokenizer or CodeGenTokenizerFast (Phi model)
    phi3: LlamaTokenizer or LlamaTokenizerFast (Phi3 model)
    phimoe: LlamaTokenizer or LlamaTokenizerFast (Phimoe model)
    phobert: PhobertTokenizer (PhoBERT model)
    pix2struct: T5Tokenizer or T5TokenizerFast (Pix2Struct model)
    pixtral: PreTrainedTokenizerFast (Pixtral model)
    plbart: PLBartTokenizer (PLBart model)
    prophetnet: ProphetNetTokenizer (ProphetNet model)
    qdqbert: BertTokenizer or BertTokenizerFast (QDQBert model)
    qwen2: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2 model)
    qwen2_audio: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2Audio model)
    qwen2_moe: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2MoE model)
    qwen2_vl: Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2VL model)
    rag: RagTokenizer (RAG model)
    realm: RealmTokenizer or RealmTokenizerFast (REALM model)
    recurrent_gemma: GemmaTokenizer or GemmaTokenizerFast (RecurrentGemma model)
    reformer: ReformerTokenizer or ReformerTokenizerFast (Reformer model)
    rembert: RemBertTokenizer or RemBertTokenizerFast (RemBERT model)
    retribert: RetriBertTokenizer or RetriBertTokenizerFast (RetriBERT model)
    roberta: RobertaTokenizer or RobertaTokenizerFast (RoBERTa model)
    roberta-prelayernorm: RobertaTokenizer or RobertaTokenizerFast (RoBERTa-PreLayerNorm model)
    roc_bert: RoCBertTokenizer (RoCBert model)
    roformer: RoFormerTokenizer or RoFormerTokenizerFast (RoFormer model)
    rwkv: GPTNeoXTokenizerFast (RWKV model)
    seamless_m4t: SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4T model)
    seamless_m4t_v2: SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4Tv2 model)
    siglip: SiglipTokenizer (SigLIP model)
    speech_to_text: Speech2TextTokenizer (Speech2Text model)
    speech_to_text_2: Speech2Text2Tokenizer (Speech2Text2 model)
    speecht5: SpeechT5Tokenizer (SpeechT5 model)
    splinter: SplinterTokenizer or SplinterTokenizerFast (Splinter model)
    squeezebert: SqueezeBertTokenizer or SqueezeBertTokenizerFast (SqueezeBERT model)
    stablelm: GPTNeoXTokenizerFast (StableLm model)
    starcoder2: GPT2Tokenizer or GPT2TokenizerFast (Starcoder2 model)
    switch_transformers: T5Tokenizer or T5TokenizerFast (SwitchTransformers model)
    t5: T5Tokenizer or T5TokenizerFast (T5 model)
    tapas: TapasTokenizer (TAPAS model)
    tapex: TapexTokenizer (TAPEX model)
    transfo-xl: TransfoXLTokenizer (Transformer-XL model)
    tvp: BertTokenizer or BertTokenizerFast (TVP model)
    udop: UdopTokenizer or UdopTokenizerFast (UDOP model)
    umt5: T5Tokenizer or T5TokenizerFast (UMT5 model)
    video_llava: LlamaTokenizer or LlamaTokenizerFast (VideoLlava model)
    vilt: BertTokenizer or BertTokenizerFast (ViLT model)
    vipllava: LlamaTokenizer or LlamaTokenizerFast (VipLlava model)
    visual_bert: BertTokenizer or BertTokenizerFast (VisualBERT model)
    vits: VitsTokenizer (VITS model)
    wav2vec2: Wav2Vec2CTCTokenizer (Wav2Vec2 model)
    wav2vec2-bert: Wav2Vec2CTCTokenizer (Wav2Vec2-BERT model)
    wav2vec2-conformer: Wav2Vec2CTCTokenizer (Wav2Vec2-Conformer model)
    wav2vec2_phoneme: Wav2Vec2PhonemeCTCTokenizer (Wav2Vec2Phoneme model)
    whisper: WhisperTokenizer or WhisperTokenizerFast (Whisper model)
    xclip: CLIPTokenizer or CLIPTokenizerFast (X-CLIP model)
    xglm: XGLMTokenizer or XGLMTokenizerFast (XGLM model)
    xlm: XLMTokenizer (XLM model)
    xlm-prophetnet: XLMProphetNetTokenizer (XLM-ProphetNet model)
    xlm-roberta: XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa model)
    xlm-roberta-xl: XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa-XL model)
    xlnet: XLNetTokenizer or XLNetTokenizerFast (XLNet model)
    xmod: XLMRobertaTokenizer or XLMRobertaTokenizerFast (X-MOD model)
    yoso: AlbertTokenizer or AlbertTokenizerFast (YOSO model)
    zamba: LlamaTokenizer or LlamaTokenizerFast (Zamba model)
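
When the loaded config has no model_type, the library falls back to matching the keys in the list above against pretrained_model_name_or_path. The following is a simplified illustration of that idea, not the library's actual code: TOKENIZER_NAMES is a hypothetical four-entry subset of the list, guess_model_type is a hypothetical helper, and the longest-key-first ordering is an assumption made so that "roberta" wins over "bert", which it contains as a substring.

```python
from typing import Optional

# Hypothetical subset of the list above: model type -> (slow class, fast class).
TOKENIZER_NAMES = {
    "bert": ("BertTokenizer", "BertTokenizerFast"),
    "roberta": ("RobertaTokenizer", "RobertaTokenizerFast"),
    "xlm-roberta": ("XLMRobertaTokenizer", "XLMRobertaTokenizerFast"),
    "gpt2": ("GPT2Tokenizer", "GPT2TokenizerFast"),
}


def guess_model_type(name_or_path: str) -> Optional[str]:
    """Return the first key found in the name, trying longer keys first."""
    for key in sorted(TOKENIZER_NAMES, key=len, reverse=True):
        if key in name_or_path:
            return key
    return None  # no key matched; the caller has to report an error


for name in ["xlm-roberta-base", "roberta-large", "distilgpt2", "my-custom-model"]:
    model_type = guess_model_type(name)
    print(name, "->", model_type, TOKENIZER_NAMES.get(model_type))
```

Substring matching of this kind is fragile for names that happen to contain a shorter key, which is one reason to prefer a config that states model_type explicitly.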