Last updated: 2023-03-15 (Wed) 00:37:34
FlexGen
FlexGen is a high-throughput generation engine for running large language models with limited GPU memory.
https://github.com/FMInference/FlexGen#readme
https://github.com/Ying1123/FlexGen#readme