Quantized Attention achieves 2-5x and 3-11x speedups over FlashAttention and xformers, respectively, without losing end-to-end accuracy across language, image, and video models.
Updated Aug 5, 2025 - CUDA
A lightweight Bun + Express template that connects to the Testune AI API and streams chat responses in real time using Server-Sent Events (SSE)
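The streaming side of such a template follows the standard SSE wire format: each message is sent as one or more `data:` lines terminated by a blank line. Below is a minimal sketch of an SSE frame encoder; the `sseFrame` helper and the `token` event name are illustrative, not part of the Testune AI API.

```typescript
// Minimal sketch of a Server-Sent Events (SSE) frame encoder, as used when
// streaming chat tokens to the browser. Field names ("event:", "data:")
// follow the SSE wire format; the actual upstream API is not shown here.
function sseFrame(data: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  // Multi-line payloads get one "data:" field per line, per the SSE spec.
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  // A blank line terminates the frame.
  return lines.join("\n") + "\n\n";
}

// Example: res.write(sseFrame("Hello", "token")) inside an Express handler
// whose response has Content-Type: text/event-stream.
```

In a handler, you would set `Content-Type: text/event-stream` and `res.write()` one frame per chunk as tokens arrive from the upstream API.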