Last updated: 2023-03-15 (Wed) 00:37:34

FlexGen

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory.

https://github.com/FMInference/FlexGen#readme

https://github.com/Ying1123/FlexGen#readme

Processing

  • OPT
    • OPT-6.7B
    • OPT-30B
    • OPT-175B
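
To give a rough sense of why limited GPU memory matters for the models listed above, here is a minimal sketch that estimates the size of the weights alone. It assumes the nominal parameter counts implied by the model names and 2 bytes per parameter (FP16); the names `NOMINAL_PARAMS` and `weight_gb` are hypothetical helpers for illustration and are not part of FlexGen.

```python
# Nominal parameter counts read off the model names (assumption: the real
# checkpoints differ slightly from these round figures).
NOMINAL_PARAMS = {
    "OPT-6.7B": 6_700_000_000,
    "OPT-30B": 30_000_000_000,
    "OPT-175B": 175_000_000_000,
}

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored as 16-bit floats


def weight_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Return the approximate weight footprint in decimal gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in NOMINAL_PARAMS.items():
        print(f"{name}: ~{weight_gb(params):.1f} GB of FP16 weights")
```

This counts weights only. The key-value cache and activations need additional memory on top of these figures.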

References