LLaVA-Mini

    LLaVA-Mini: Efficient Image and Video Large Multimodal Models

    Description

    LLaVA-Mini 👏 is an efficient large multimodal model (LMM) for image and video understanding that represents each image with a single vision token. This offers (1) ⏩ fast responses (40 ms per image) and (2) 🖥️ lower VRAM usage (3-hour video understanding on a 24 GB GPU).
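    To see why one vision token per frame matters for long videos, the rough token count below is a minimal sketch. It assumes frames are sampled at 1 fps (the listing does not state a sampling rate). For comparison, it uses a hypothetical baseline that keeps a full 24×24 patch grid (576 tokens) per frame. The helper `vision_tokens` is illustrative only and is not part of LLaVA-Mini.

```python
def vision_tokens(hours: float, fps: float, tokens_per_frame: int) -> int:
    """Hypothetical helper: total vision tokens for a video sampled at `fps`."""
    frames = int(hours * 3600 * fps)  # number of sampled frames
    return frames * tokens_per_frame

# Assumption: 1 frame per second for a 3-hour video.
one_token_per_frame = vision_tokens(3, 1, 1)   # single token per frame
full_grid_per_frame = vision_tokens(3, 1, 576) # hypothetical 24x24 patch grid per frame
print(one_token_per_frame, full_grid_per_frame)  # 10800 6220800
```

    Under these assumptions, one token per frame keeps a 3-hour video at roughly ten thousand vision tokens rather than millions. This is consistent with the listing's claim of long-video support on a single 24 GB GPU, though the actual memory use depends on details not given here.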