A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc.
FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, 1.8x~3x speedup vs SDPA EA.