Post-training in diffusion models is a very under-appreciated topic.
So we're delighted to try to change that at ECCV'26: we're announcing a dedicated tutorial on it, with the best pack 🔥
We'll cover several tracks. Check out the link below to learn more!
Hila Chefer, Linoy Tsaban
I am bullish and biased, but the best way to use flash attention 3 or 4 is via 🤗 kernels:
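```
from kernels import get_kernel

# Fetch the prebuilt FlashAttention-3 kernel from the Hugging Face Hub
kernel_module = get_kernel("kernels-community/flash-attn3", version=1)
flash_attn_func = kernel_module.flash_attn_func
# Replace the ellipsis with your own query, key, and value tensors
flash_attn_func(...)
```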
“Local AI puts AI builders back in the driver’s seat.”
Hugging Face CEO Clem Delangue on why local-first AI matters, and how AMD and Hugging Face are helping developers build faster with open-source AI.
This #CVPR2026 paper from our research team is trending #1 on Hugging Face 🤗
Meet LocateAnything: a vision-language detection model that rethinks bounding box prediction. For AI agents and robots, “seeing” is only useful if a model can pinpoint where something is fast enough to act.
Trained on 138M high-quality samples, LocateAnything decodes bounding boxes in parallel instead of one coordinate at a time, improving localization accuracy while dramatically increasing throughput for visual grounding and detection.
Project page: https://nvda.ws/3RAJgTB
A hackathon called "Build Small"
max 32B params. the model fits on a laptop. somehow that pitch got us OpenAI, NVIDIA, OpenBMB and Cohere putting up the prizes 👀
$40k+ cash + 2 RTX 5080s + $100k codex credits for the first participants. Registrations OPEN ↓↓↓
Event page: https://lnkd.in/gKWh8j67
Registration app: https://lnkd.in/gK7EUGEq
We're releasing Paris 2.0, which is, to our knowledge, the first video generation model trained in a decentralized way.
We at Bagel Labs believe frontier models should not require homogeneous clusters of premium, supply-constrained GPUs. Paris 1.0 proved that this is possible for image generation. Paris 2.0 expands that recipe into video generation and lays the substrate for global-scale world models.
To test the approach, we put two models head to head trained on the same data and compute budget. One was a monolithic model trained the usual way, on a single premium GPU cluster. The other was Paris 2.0, trained on an extreme mix of GPU types, generations, and vendors distributed around the globe. We aimed only to match the monolithic model on benchmarks. Paris 2.0 beat it.
More specifically, the results against the monolithic model under matched data and compute:
FVD: 561.04 → 279.01 (lower is better, a ~2x improvement)
CLIP text-video alignment and Aesthetic score also improved.
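For anyone who wants to check the "~2x" figure, here is a minimal sketch of the arithmetic. The helper name `fvd_improvement` is ours, purely for illustration; it only assumes that FVD is a lower-is-better metric and uses the two numbers reported above.

```python
def fvd_improvement(baseline: float, candidate: float) -> tuple[float, float]:
    """Return (ratio, relative_reduction) for a lower-is-better metric such as FVD.

    Hypothetical helper for illustration; not part of any released code.
    """
    ratio = baseline / candidate                  # how many times lower the new score is
    reduction = (baseline - candidate) / baseline  # fraction of the baseline score removed
    return ratio, reduction


# Numbers reported above: monolithic baseline vs. Paris 2.0
ratio, reduction = fvd_improvement(561.04, 279.01)
print(f"{ratio:.2f}x lower FVD, a {reduction:.1%} reduction")
# prints "2.01x lower FVD, a 50.3% reduction"
```

So "~2x" here means the FVD was roughly halved.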
To our knowledge, this is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute, not just approach it. Congrats to the team Ali Rouzbayani, Marcos Villagra, Zhiying (Gin) Jiang.
Check out the technical report and model below.
Report: https://lnkd.in/gpqfmZSe
Weights: https://lnkd.in/gDCc-YGF
RF-DETR just landed in Hugging Face transformers 🔥
sota real-time detection & segmentation models by Roboflow 💜
to celebrate, we shipped a real-time webcam streaming demo and fine-tuning tutorials on satellite imagery segmentation and mobile UI detection 🙌🏻
> play with our real-time demo
> fine-tune the models on your use case with our tutorials (takes a toaster's VRAM)
> or just hand them to your agents 😄
tutorials → https://lnkd.in/eCHKec-8
models and the demo → https://lnkd.in/e82MsJwv
docs → https://lnkd.in/edvKRrz8