Artificial Intelligence News

Artificial Intelligence News reposted this

3d

Liquid AI released LFM2.5-8B-A1B today. It's an on-device Mixture-of-Experts model that activates just 1.5B of 8.3B parameters per token.

Here's what's actually interesting for anyone building local agents:

1. It's reasoning-only now
Unlike October's LFM2-8B-A1B, this version produces an explicit chain of thought before answering. The logic: in an MoE, a small active parameter count makes each reasoning token cheap.

2. The hallucination jump is the real story
→ Non-Hallucination Rate: 7.46 → 63.47
→ IFEval: 79.44 → 91.84
→ MATH500: 74.80 → 88.76
→ Tau² Telecom: 13.60 → 88.07
A targeted avg@k RL reward trains the model to abstain on questions beyond its knowledge.

3. It runs on hardware you already own
→ 253 tok/s on an M5 Max, under 6 GB
→ ~30 tok/s on a phone
→ 18.5K tok/s and over 1.6B tokens/day on a single H100

4. Tool calling is the point
The LocalCowork demo runs 67 tools across 13 MCP servers on one laptop. No cloud, no API keys, no data leaving the machine.

Day-one support for llama.cpp, MLX, vLLM, and SGLang. Open weights, with base and post-trained checkpoints.

Full analysis: https://lnkd.in/gXK5FN_Y
Technical details: https://lnkd.in/gp3M7wJj
Model weights: https://lnkd.in/gpgqeFcB

Liquid AI Maxime Labonne
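A quick sanity check on two of the figures above, as a minimal sketch: the share of parameters active per token, and what the quoted H100 rate adds up to over a day. The only inputs are the numbers in the post; the variable names are illustrative.

```python
# Figures quoted in the post above.
TOTAL_PARAMS = 8.3e9
ACTIVE_PARAMS = 1.5e9
H100_TOKENS_PER_SECOND = 18_500  # "18.5K tok/s", a rounded figure
SECONDS_PER_DAY = 24 * 60 * 60

# Fraction of the model's weights used for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Daily token count if the quoted rate were sustained for 24 hours.
tokens_per_day = H100_TOKENS_PER_SECOND * SECONDS_PER_DAY

print(f"active share per token: {active_fraction:.1%}")
print(f"tokens per day on one H100: {tokens_per_day / 1e9:.2f}B")
```

Roughly 18% of the weights are active per token. The rounded 18.5K tok/s rate works out to about 1.6B tokens/day, so the "over 1.6B" figure presumably comes from the unrounded throughput.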

Artificial Intelligence News reposted this

Asif Razzaq

4d

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

Most RL systems require you to rewrite your agent harness to fit the training infrastructure. Polar flips that. It treats the harness as a black box and intercepts at the one boundary every LLM agent shares: the model API call.

Here's what's actually interesting:

𝟭. The proxy design
Polar places a provider-compatible proxy between the harness and the inference server. It accepts Anthropic Messages, OpenAI Chat Completions, OpenAI Responses, and Google generateContent, with no harness code changes needed. The only configuration change is pointing the model base URL at the gateway.

𝟮. Token-faithful trajectory reconstruction
Two strategies: per_request (every model call = one trace) and prefix_merging (reconstructs append-only conversation chains). The ablation is clear:
→ Trainer updates: 1,185 (per_request) vs. 218 (prefix_merging)
→ Wall-clock time: 189.5 min vs. 35.2 min → 5.39× speedup
→ Rollout GPU utilization: 20.4% vs. 87.7%

𝟯. SWE-Bench Verified results (Qwen3.5-4B, GRPO)
→ Codex: 3.8% → 26.4% (+22.6 pts)
→ Claude Code: 29.8% → 34.6% (+4.8 pts)
→ Qwen Code: 34.6% → 35.2% (+0.6 pts)
→ Pi: 34.2% → 40.4% (+6.2 pts)
The Codex gain is the largest because Codex presents an unfamiliar action protocol and patch-submission style to a Qwen model not originally trained on it. Polar attaches the reward to the actual sampled tokens flowing through that execution path.

𝟰. Offline SFT use case
Polar also works as a distributed data generation service. Using Qwen3.5-122B-A10B on 8×H100, NVIDIA generated 504 accepted SFT trajectories from 1,638 SWE-Gym attempts (30.8% acceptance) at ~64 GPU-hours. Released on HuggingFace under Apache-2.0.

Full analysis: https://lnkd.in/diXqSSs7
Paper: https://lnkd.in/deCmewva
Repo: https://lnkd.in/gEDpMJSH

NVIDIA NVIDIA AI Binfeng Xu Hao Zhang Shaokun Zhang Songyang Han Mingjie Liu Jian Hu Shizhe Diao Zhenghui Jin Yunheng(Jackie) Zou Jan Kautz Yi Dong
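To make the per_request vs. prefix_merging difference concrete, here is a minimal sketch of the merging idea. It is not Polar's code: it works on (role, content) message tuples instead of token IDs, assumes requests arrive in order, and the function names are hypothetical.

```python
def is_prefix(shorter, longer):
    """True when `longer` starts with every message in `shorter`."""
    return len(shorter) <= len(longer) and longer[: len(shorter)] == shorter


def merge_prefix_chains(requests):
    """Collapse append-only requests into one trace per conversation chain."""
    chains = []
    for messages in requests:
        for i, chain in enumerate(chains):
            if is_prefix(chain, messages):
                chains[i] = messages  # this request extends an existing chain
                break
        else:
            chains.append(messages)  # no chain matched: start a new one
    return chains


# Three calls from one growing conversation, plus one unrelated call.
requests = [
    [("user", "fix bug")],
    [("user", "fix bug"), ("assistant", "run tests"), ("tool", "2 failed")],
    [("user", "fix bug"), ("assistant", "run tests"), ("tool", "2 failed"),
     ("assistant", "apply patch"), ("tool", "ok")],
    [("user", "summarize repo")],
]

chains = merge_prefix_chains(requests)
print(len(requests), "per_request traces ->", len(chains), "merged chains")
```

Fewer, longer traces mean fewer trainer updates for the same rollouts, which is the direction of the ablation numbers above.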

Artificial Intelligence News reposted this

Asif Razzaq

1w Edited

Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%

Most web agents today predict one browser action at a time: click, type, scroll, repeat. Webwright takes a different approach. It gives the model a terminal and lets it write Playwright code to control the browser.

Here's what's actually interesting:

1. The architecture is unusually small
~1,000 lines of code. Three modules. No multi-agent orchestration. One agent loop. Most web agent frameworks bury the agent logic under layers of abstraction. Webwright doesn't.

2. The benchmark results are strong
→ 86.7% on Online-Mind2Web (300 tasks, 136 live sites), the highest among open-sourced harnesses in the AutoEval category
→ 60.1% on Odysseys (long-horizon tasks), up from 33.5% with base GPT-5.4
→ That's a 26.6-point improvement using the same model, just a different interaction paradigm

3. Browsing history becomes code
Every completed task produces a reusable CLI script. Instead of rediscovering a workflow each time, you build a library. The same scripts run in Claude Code, Codex, and OpenClaw.

4. Small models can compete with tool augmentation
Qwen3.5-9B hits 66.2% on the hard split of Online-Mind2Web when given pre-built tool scripts. That's a practical finding for teams working with lower-cost inference.

5. Cost matters
→ GPT-5.4: $2.37 avg per task
→ Claude Opus 4.7: $6.09 avg per task
Claude uses fewer steps (21.9 vs 26.3 mean) but the pricing difference flips the cost equation.

Full analysis: https://lnkd.in/gPKMj2PJ
Repo: https://lnkd.in/grueF7Nw
Technical details: https://lnkd.in/gscQpU25

Microsoft Microsoft Research Yadong Lu Chao Huang Ahmed Awadallah
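The cost point is easier to see per step. A small sketch using only the figures quoted above; the dictionary layout and names are illustrative.

```python
# Average cost per task and mean steps per task, as quoted in the post.
runs = {
    "GPT-5.4": {"cost_per_task": 2.37, "mean_steps": 26.3},
    "Claude Opus 4.7": {"cost_per_task": 6.09, "mean_steps": 21.9},
}

# Dollars spent per agent step for each model.
cost_per_step = {
    name: run["cost_per_task"] / run["mean_steps"] for name, run in runs.items()
}

for name, dollars in cost_per_step.items():
    print(f"{name}: ${dollars:.3f} per step")

# Odysseys gain over the base model, in percentage points.
odysseys_gain = round(60.1 - 33.5, 1)
print(f"Odysseys gain: {odysseys_gain} points")
```

Fewer steps do not offset a per-step cost that is roughly three times higher.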

Artificial Intelligence News reposted this

Asif Razzaq

1w

Perplexity just open-sourced an internal security tool they've been running in production. It's called 'Bumblebee'.

Here's what's actually interesting:

1. It solves a specific blind spot
SBOMs cover build artifacts. EDR covers running processes. Neither tells you what's installed on a developer's laptop right now. Bumblebee does exactly that, and nothing more.

2. The read-only design is the key decision
npm packages can carry postinstall scripts that execute automatically on install. Most recent supply-chain worms spread that way. A scanner that invokes npm to check exposure has already triggered the attack. Bumblebee reads metadata directly (lockfiles, manifests, extension manifests) and never runs any code.

3. Four surfaces in one scan
→ Language package managers: npm, pnpm, Yarn, Bun, PyPI, Go modules, RubyGems, Composer
→ AI agent configs: MCP JSON host files including claude_desktop_config.json and cline_mcp_settings.json
→ Editor extensions: VS Code, Cursor, Windsurf, VSCodium
→ Browser extensions: Chrome, Edge, Brave, Arc, Comet, Firefox

4. The internal workflow is worth noting
Perplexity Computer drafts a catalog entry when a threat signal lands → human reviews and merges the PR → Bumblebee runs on endpoints → findings go to the security team. Human in the loop before anything hits machines.

5. Technical details
→ Written in Go 1.25+, zero non-stdlib dependencies
→ Single static binary, three scan profiles: baseline, project, deep
→ Outputs NDJSON records with confidence levels (high / medium / low)
→ Apache 2.0, current release v0.1.1

Full analysis: https://lnkd.in/gf6xXrBY
Repo: https://lnkd.in/g4EcxbF2
Technical details: https://lnkd.in/gp6icukB

Perplexity
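The read-only idea can be sketched in a few lines: parse the lockfile as data, compare against a catalog, and emit NDJSON, without ever calling npm. This is an illustration in Python, not Bumblebee's Go implementation. The catalog entry, package names, and record fields are hypothetical; only the lockfile layout (a `packages` map keyed by `node_modules/...` paths) follows npm's lockfile format.

```python
import json

# Hypothetical catalog of flagged (package, version) pairs; illustrative only.
CATALOG = {("example-compromised-pkg", "2.0.1")}

# A tiny npm lockfile (lockfileVersion 3 layout) used as sample input.
SAMPLE_LOCK = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo-app", "version": "1.0.0"},
        "node_modules/example-compromised-pkg": {"version": "2.0.1"},
        "node_modules/safe-pkg": {"version": "4.1.0"},
    },
})


def scan_lockfile(lock_text, source):
    """Return NDJSON lines for catalog hits, reading metadata only."""
    lock = json.loads(lock_text)
    lines = []
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # the empty key is the root project itself
        # Keep the last path segment so nested and scoped packages resolve.
        name = path.split("node_modules/")[-1]
        version = meta.get("version", "")
        if (name, version) in CATALOG:
            record = {"source": source, "package": name,
                      "version": version, "confidence": "high"}
            lines.append(json.dumps(record))
    return lines


findings = scan_lockfile(SAMPLE_LOCK, "package-lock.json")
print("\n".join(findings))
```

Nothing here executes package code, so a malicious postinstall script never gets a chance to run.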

Artificial Intelligence News reposted this

Asif Razzaq

1w

Most LLM inference optimization forces a choice: fast drafting with a weak auxiliary model, or accurate generation with full standard autoregressive (AR) decoding. NVIDIA researchers just built a third option into the weights themselves.

They released Nemotron-Labs-Diffusion, a 3B/8B/14B model family trained on a joint AR-diffusion objective that supports three decoding modes from one checkpoint: standard AR, parallel diffusion decoding, and self-speculation, where the same model drafts and verifies without any auxiliary head.

Here's what's actually interesting:
→ Self-speculation achieves 5.99× tokens per forward pass (TPF) over Qwen3-8B with comparable accuracy on a 10-task benchmark
→ Average acceptance length: 6.82 (with LoRA) vs. 2.75 for Eagle3 and 4.24 for Qwen3-9B-MTP, at the same draft length of 31
→ AR and diffusion objectives peak at the same loss coefficient (α=0.3) and improve together; they don't compete for model capacity
→ Speed-of-light analysis shows a theoretical ceiling of 7.60× TPF at block length 32; current confidence-based sampling realizes only ~3×, leaving headroom for better samplers

Full analysis: https://lnkd.in/gYvQTAKw
Paper: https://lnkd.in/ga5VG36i
Model weights: https://lnkd.in/gSbRPxQf
Technical details: https://lnkd.in/gCH3j_SQ

NVIDIA NVIDIA AI NVIDIA Robotics Yonggan Fu Lexington Whalen Chengyue Wu Maksim Khadkevich Nicolai Oswald Enze Xie Pavlo Molchanov Shizhe Diao Chenhan Yu Ye Yu Weijia Chen Sajad Norouzi Shiyi Lan Ligeng Zhu Jindong Jiang Morteza Mardani Mehran Maghoumi
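For readers new to speculative decoding, here is a minimal sketch of what "acceptance length" measures, using generic greedy verification. It is not the model's actual sampler. The token IDs are made up, and whether the reported metric counts the verifier's extra token is an assumption that varies between papers.

```python
def verify_draft(draft, verifier_tokens):
    """Return the tokens emitted by one draft-and-verify step (greedy matching)."""
    matched = 0
    for drafted, checked in zip(draft, verifier_tokens):
        if drafted != checked:
            break
        matched += 1
    # Keep the matched prefix plus the verifier's own token at the next position.
    return verifier_tokens[: matched + 1]


# Hypothetical token IDs: the draft diverges from the verifier at position 2.
draft = [5, 9, 2, 7]
verifier_tokens = [5, 9, 4, 7, 1]

emitted = verify_draft(draft, verifier_tokens)
print(emitted, len(emitted))
```

In self-speculation the drafting and verifying passes come from the same checkpoint, so a longer accepted run means more tokens per forward pass without a second model.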

Artificial Intelligence News reposted this

Asif Razzaq

2w

Most sparse attention methods make a quiet assumption: that you can ship a custom kernel for selection and deal with the inference consequences later. Nous Research's new paper shows that assumption is optional.

They released Lighthouse Attention, a selection-based hierarchical attention for long-context pretraining that pools Q, K, and V symmetrically across a multi-level pyramid, places selection entirely outside the attention kernel, and runs stock FlashAttention on a small dense sub-sequence. No custom sparse kernel. No auxiliary losses. No learnable scorer. No straight-through estimator.

Here's what's actually interesting:
→ 21× faster forward pass and 17.3× faster forward+backward vs. cuDNN SDPA at 512K context on a single B200
→ 1.40–1.69× end-to-end pretraining wall-clock speedup at 98K context at matched or lower final training loss
→ Brief dense-SDPA resumption after Lighthouse training recovers a full-attention model that beats dense-from-scratch (loss 0.6980 vs. 0.7237 baseline, same ~50.3B token budget)
→ Scales to 1M-token training across 32 Blackwell GPUs under standard ring attention, with no sparse-aware collectives needed

Train with hierarchical selection to move fast, then recover the dense model you actually need at inference.

Analysis: https://lnkd.in/g9-n4xas
Paper: https://lnkd.in/g2t9x3du
Technical details: https://lnkd.in/gEYeMhSV
GitHub Repo: https://lnkd.in/g87ykMXY

Nous Research
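A toy sketch of "select outside the kernel, then run dense attention on the sub-sequence". This is a heavily simplified illustration, not the paper's algorithm: it uses one pooling level instead of a pyramid, pools only the keys for a single query, ignores causality, and uses plain softmax attention in place of FlashAttention. The dot-product block scoring and all names are assumptions.

```python
import math


def mean_pool(rows, block):
    """Average consecutive groups of `block` vectors into one vector each."""
    pooled = []
    for start in range(0, len(rows), block):
        chunk = rows[start:start + block]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(query, keys, values):
    """Ordinary softmax attention for one query over a short sequence."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def select_then_attend(query, keys, values, block=2, top_k=1):
    """Pick the best-scoring key blocks, then attend densely over their tokens."""
    pooled_keys = mean_pool(keys, block)
    block_scores = [dot(query, pk) for pk in pooled_keys]
    chosen = sorted(range(len(pooled_keys)),
                    key=lambda i: block_scores[i], reverse=True)[:top_k]
    idx = sorted(i for b in chosen
                 for i in range(b * block, min((b + 1) * block, len(keys))))
    out = dense_attention(query, [keys[i] for i in idx], [values[i] for i in idx])
    return idx, out


keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
selected, output = select_then_attend([0.0, 1.0], keys, values)
print(selected, output)
```

The selection step only produces indices, so the attention call itself stays an unmodified dense kernel over a shorter sequence.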

Artificial Intelligence News reposted this

Asif Razzaq

2w

Most open-source world models either need 8 GPUs to run or drop to 480p to survive. That's not an efficiency problem; it's an architecture problem. NVIDIA just addressed it directly.

They introduced SANA-WM, a 2.6B-parameter open-source world model natively trained for one-minute generation, synthesizing 720p video with precise 6-DoF camera control from a single image and a camera trajectory, running inference on a single GPU with no multi-GPU dependency anywhere in the pipeline.

Here's what's actually interesting:
→ Hybrid Gated DeltaNet + softmax backbone keeps recurrent state at constant D×D size regardless of video length, solving the quadratic memory explosion that makes 961-frame sequences infeasible with standard softmax attention
→ Dual-branch camera control: UCPE at latent-frame rate for global trajectory + Plücker mixing at raw-frame rate for intra-stride motion; CamMC 0.2047, best among all compared methods
→ Second-stage refiner (17B LTX-2 + rank-384 LoRA, 3 Euler steps) cuts long-horizon visual drift ΔIQ from 3.09 to 0.31 on Hard trajectories
→ 22.0 videos/hour on 8 H100s, 36× higher throughput vs LingBot-World at 14B+14B parameters
→ Distilled variant: 34s per 60s 720p clip on a single RTX 5090 with NVFP4 quantization

Full analysis: https://lnkd.in/gCrEE9fw
Paper: https://lnkd.in/g3uMe5YT
Project page: https://lnkd.in/g3qQTdyh
GitHub Page: https://lnkd.in/g8iURsGk

NVIDIA NVIDIA Robotics NVIDIA AI Haozhe Liu Yuyang Zhao Ye Tian Junsong Chen Jincheng YU Song Han Enze Xie
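Why the recurrent state stays D×D: a minimal sketch of a gated delta-rule update, assuming the commonly published form S ← α(S − β(Sk)kᵀ) + βvkᵀ. SANA-WM's exact parameterization is not given in the post, so treat the formula, the fixed α and β, and the toy dimensions as assumptions.

```python
def gated_delta_step(state, key, value, alpha, beta):
    """One update: S <- alpha * (S - beta * (S k) k^T) + beta * v k^T."""
    d_v, d_k = len(state), len(key)
    # What the state currently returns for this key: S k.
    readout = [sum(state[i][j] * key[j] for j in range(d_k)) for i in range(d_v)]
    return [
        [alpha * (state[i][j] - beta * readout[i] * key[j]) + beta * value[i] * key[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]


D = 4  # toy head dimension
state = [[0.0] * D for _ in range(D)]

# 961 is the frame count quoted in the post, used here only as a loop length.
for frame in range(961):
    key = [1.0 if j == frame % D else 0.0 for j in range(D)]
    value = [float(frame)] * D
    state = gated_delta_step(state, key, value, alpha=0.9, beta=1.0)

print(len(state), len(state[0]))
```

The state has the same shape after frame 961 as after frame 1, whereas a softmax attention cache grows with every frame.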

Artificial Intelligence News reposted this

Asif Razzaq

2w

Supertone just released Supertonic v3, an on-device text-to-speech model that runs entirely via ONNX Runtime, no cloud, no API call.

Here's what's actually interesting:

1. 31 languages, ~99M parameters
→ v2 had 5 languages at 66M params
→ v3 adds 26 more languages including Japanese, Arabic, German, Hindi, Russian, Turkish, Vietnamese, and more
→ Total ONNX asset size: 404 MB
→ Still smaller than 0.7B–2B class open TTS systems

2. v2-compatible ONNX interface
→ Existing integrations upgrade to v3 without changing inference code
→ Same 4 ONNX files: duration_predictor, text_encoder, vector_estimator, vocoder

3. New expression tags
→ <laugh>, <breath>, <sigh> inline in text
→ No separate model, no preprocessing needed

4. Text normalization that actually works
→ Tested against ElevenLabs Flash v2.5, OpenAI TTS-1, Gemini 2.5 Flash TTS, and Microsoft
→ All four failed on financial expressions ($5.2M), phone numbers, dates, and technical units (2.3h, 30kph)
→ Supertonic passed all four categories

5. Runs on CPU, no GPU required
→ Competitive WER/CER range vs. VoxCPM2 across supported languages
→ Demo live on Hugging Face: Supertone/supertonic-3

Install: pip install supertonic

Full analysis: https://lnkd.in/gB3NKcee
GitHub Repo: https://lnkd.in/gwUJJGGa
HF Space: https://lnkd.in/gSCV8JHm

Supertone
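Since the expression tags sit inline in the input text, a small pre-flight check can catch typos before synthesis. This is a standalone sketch that does not call the supertonic package. It assumes the three tags named above are the full supported set, which the post does not confirm, and the helper names are hypothetical.

```python
import re

# Tags named in the post; treating this as the complete set is an assumption.
SUPPORTED_TAGS = {"laugh", "breath", "sigh"}

# Matches simple lowercase tags such as <laugh>.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_expression_tags(text):
    """Return every inline tag name found in the text, in order."""
    return TAG_PATTERN.findall(text)


def unsupported_tags(text):
    """Return tag names that are not in the assumed supported set."""
    return [tag for tag in find_expression_tags(text) if tag not in SUPPORTED_TAGS]


line = "Well <sigh> that went better than expected <laugh> honestly <giggle>"
print(find_expression_tags(line))
print(unsupported_tags(line))
```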

Artificial Intelligence News reposted this

Asif Razzaq

2w

Most LLM pre-training efficiency work either changes the tokenizer, the architecture, or the inference behavior. Nous Research just showed you don't have to touch any of them.

They released Token Superposition Training (TST), a two-phase modification to the standard pre-training loop that averages s contiguous token embeddings into a single latent s-token in Phase 1, trains with a multi-hot cross-entropy loss against the next bag of tokens, then reverts to standard next-token prediction in Phase 2 from the same checkpoint, with the TST code fully removed.

Here's what's actually interesting:
→ Each TST step is kept equal-FLOPs to baseline by increasing data sequence length by s×, not the batch size
→ 3B dense: loss 2.676 in 247 B200-hrs vs 443 B200-hrs for baseline at matched loss (~1.8x faster)
→ 10B-A1B MoE: 4,768 B200-hrs vs 12,311 B200-hrs at matched loss (~2.5x faster)
→ Optimal range: bag size s ∈ [3–8] at 270M, s ∈ [6–10] at 600M, s = 16 at 10B; step ratio r ∈ [0.2, 0.4]
→ Re-initializing the embedding or LM head at the phase boundary breaks it entirely: loss went from 2.676 to 2.938, worse than the 2.808 baseline

Full analysis: https://lnkd.in/gMv6_SJK
Paper: https://lnkd.in/gV_sbPdn
Project page: https://lnkd.in/gE8c8iAp

Nous Research
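The two Phase 1 ingredients can be sketched in plain Python: averaging s contiguous embeddings into one latent, and a multi-hot cross-entropy against the next bag of tokens. The loss below spreads the target uniformly over the bag. That weighting is an assumption, since the post does not spell out the exact form, and the function names are illustrative.

```python
import math


def superpose(embeddings, s):
    """Average each run of s contiguous token embeddings into one latent vector."""
    return [
        [sum(col) / s for col in zip(*embeddings[i:i + s])]
        for i in range(0, len(embeddings) - s + 1, s)
    ]


def multi_hot_cross_entropy(logits, bag):
    """Cross-entropy against a uniform target over the tokens in the next bag."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return -sum(logits[token] - log_z for token in bag) / len(bag)


# Four 2-dimensional token embeddings, merged in bags of s = 2.
embeddings = [[1.0, 1.0], [3.0, 3.0], [0.0, 2.0], [2.0, 0.0]]
latents = superpose(embeddings, s=2)

# Uniform logits over a 4-token vocabulary; the next bag holds tokens 1 and 3.
loss = multi_hot_cross_entropy([0.0, 0.0, 0.0, 0.0], bag=[1, 3])

print(latents, round(loss, 4))
```

Because s tokens collapse into one position, feeding s× more raw tokens per sequence keeps the number of positions, and so the step FLOPs, in line with the baseline.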

Artificial Intelligence News reposted this

Asif Razzaq

2w

Why are we still running 7B–27B autoregressive decoder models for what is fundamentally a text classification problem?

Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size

GLiGuard is a 300M-parameter safety moderation model that runs 16x faster than the current generation of guardrail models.

Here's what's actually interesting:

1. It's an encoder, not a decoder
Most guardrail models (LlamaGuard4, WildGuard, ShieldGemma) generate safety verdicts autoregressively, one token at a time. That's slow by design. GLiGuard reframes the whole thing as a text classification problem. One forward pass. Done.

2. Four moderation tasks. Zero added latency.
It evaluates all four simultaneously in a single pass:
→ Safety classification (safe / unsafe)
→ Jailbreak strategy detection (11 strategies)
→ Harm category detection (14 categories)
→ Refusal detection (compliance / refusal)
More safety dimensions = no extra compute. That's the architectural win.

3. The benchmark numbers are hard to ignore
→ 87.7 avg F1 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4)
→ 82.7 avg F1 on response classification, second only to Qwen3Guard-8B (84.1)
→ 26ms latency vs. 426ms for ShieldGemma-27B at sequence length 64
→ 133 samples/sec throughput vs. 8.2 at batch size 4
→ Outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B, all 23–90x larger

4. It runs on a single GPU
At 0.3B parameters, individual developers and smaller teams can deploy and fine-tune it without heavy infrastructure.

Full analysis: https://lnkd.in/gUGrgbSg
Paper: https://lnkd.in/gqeb8XBA
Model weights on HF: https://lnkd.in/gd3mFBiA
GitHub Repo: https://lnkd.in/grNhQV6y
Technical details: https://lnkd.in/g98nFXgy

Fastino Labs Urchade Zaratiana Mary Newhauser George Hurn-Maloney Ash Lewis
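The "16x" and "23–90x" headline figures can be reproduced from the numbers in the post. A short sketch; the variable names are illustrative, and the 7B lower bound is taken from the opening question.

```python
# Latency and throughput figures quoted in the post.
gliguard_latency_ms = 26
baseline_latency_ms = 426       # ShieldGemma-27B at sequence length 64
gliguard_throughput = 133       # samples/sec at batch size 4
baseline_throughput = 8.2       # comparison figure from the same bullet

latency_speedup = baseline_latency_ms / gliguard_latency_ms
throughput_speedup = gliguard_throughput / baseline_throughput

# Size ratios against a 0.3B model, for the 7B to 27B range in the post.
size_ratios = {size_b: round(size_b / 0.3) for size_b in (7, 8, 12, 27)}

print(f"latency: {latency_speedup:.1f}x, throughput: {throughput_speedup:.1f}x")
print(size_ratios)
```

Both speedups land near 16x, and the size range runs from about 23x (7B) to 90x (27B), matching the headline.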

Artificial Intelligence News

Writing and Editing

Laguna Beach, California 391 followers
