Last updated: 2025-01-26 (Sun) 09:00:34

Moshi

A speech-text foundation model for real-time dialogue.

https://arxiv.org/abs/2410.00037 https://github.com/kyutai-labs/moshi

Inference stack

Related

  • SpeechTokenizer
  • SemantiCodec
  • SoundStream
  • EnCodec
  • WavLM