FlashMLA

    Faster LLM Inference on Hopper GPUs

    Featured
    5 Votes
    FlashMLA media 1
    FlashMLA media 2
    FlashMLA media 3

    Description

    FlashMLA, from DeepSeek, is an efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences. Achieves up to 3000 GB/s memory bandwidth and 580 TFLOPS.

    Categories

    Recommended Products