LLaVA-Mini

    LLaVA-Mini: Efficient Image and Video Large Multimodal Models

    Description

    LLaVA-Mini is an efficient LMM for image and video understanding that uses 1 vision token, offering: (1) ⏩ fast response (40 ms per image) and (2) 🖥️ lower VRAM usage (supports 3-hour video understanding on a 24 GB GPU).
