LLaVA-Mini
LLaVA-Mini: Efficient Image and Video Large Multimodal Models

Description
LLaVA-Mini is an efficient large multimodal model (LMM) for image and video understanding that uses 1 vision token, offering: (1) fast response (40 ms per image) and (2) lower VRAM usage (supports 3-hour video understanding on a 24 GB GPU).
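
To give a rough sense of why a single vision token matters for long video, the sketch below counts the vision tokens a 3-hour video would produce. It is only back-of-the-envelope arithmetic under assumptions that are not in the description above: frames are sampled at 1 frame per second, each frame costs 1 vision token, and the comparison baseline of 576 tokens per frame (a 24x24 patch grid) is an illustrative figure. The helper name `vision_tokens` is hypothetical.

```python
# Back-of-the-envelope count of vision tokens for a long video.
# Assumptions (not stated in the description): 1 frame sampled per second,
# 1 vision token per frame, and an illustrative baseline of 576 vision
# tokens per frame (a 24x24 patch grid).

def vision_tokens(duration_s: int, fps: float, tokens_per_frame: int) -> int:
    """Total vision tokens for a video of the given length."""
    frames = int(duration_s * fps)
    return frames * tokens_per_frame

three_hours_s = 3 * 60 * 60  # 10,800 seconds

mini = vision_tokens(three_hours_s, fps=1.0, tokens_per_frame=1)
baseline = vision_tokens(three_hours_s, fps=1.0, tokens_per_frame=576)

print(f"1 token/frame:    {mini:,} vision tokens")
print(f"576 tokens/frame: {baseline:,} vision tokens")
```

Under these assumptions the single-token setting yields about ten thousand vision tokens for the whole video, while the baseline yields several million. That gap is the kind of difference that decides whether the context fits in GPU memory. Actual memory use depends on the model and sampling rate, which this sketch does not model.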