dhirajpatra/Automatic-Speech-Recognition-with-Gemma-SLM: ASR Demo, a lightweight microservice demo for Automatic Speech Recognition using Whisper for transcription and Ollama Gemma for text enhancement. https://lnkd.in/g5aBTqUw
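The demo's two-stage flow (transcribe, then enhance) can be sketched as a pipeline with injected stages. This is a hypothetical sketch, not the repo's code: the stub functions stand in for the real Whisper and Ollama Gemma calls so the flow runs without model downloads.

```python
# Hypothetical sketch of the two-stage pipeline: a transcriber
# (Whisper in the demo) produces raw text, and an enhancer
# (Ollama Gemma in the demo) cleans it up. Both stages are
# injected so the flow is testable without external models.

def asr_pipeline(audio_path, transcribe, enhance):
    """Run transcription, then text enhancement, on one audio file."""
    raw_text = transcribe(audio_path)   # in the demo: Whisper
    polished = enhance(raw_text)        # in the demo: Ollama Gemma
    return {"audio": audio_path, "raw": raw_text, "enhanced": polished}

# Stub stages standing in for the real models:
fake_transcribe = lambda path: "hello worl this is a test"
fake_enhance = lambda text: text.replace("worl", "world").capitalize()

result = asr_pipeline("meeting.wav", fake_transcribe, fake_enhance)
```

Keeping the stages injectable also makes it easy to swap either model without touching the service logic.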
-
Excited to share our new paper on AI-based tropical cyclone prediction! In this study, we leveraged a transformer-based architecture—the foundation of models like ChatGPT—to predict tropical cyclone tracks and intensity. By analyzing 15 key variables, we assessed their relative contributions to forecast accuracy. Check out the full paper in npj Artificial Intelligence: https://lnkd.in/gcZs3568
-
Architecting the Future with Semantic AI 🤖 #SemanticAI is taking #GenerativeAI to the next level – adding true meaning to intelligence. In his #SAGconf session, Christian Weyer explores how to integrate language and embedding models like GPT or LLaMA into real-world software architectures. 💡 Discover practical patterns such as Semantic Routing, Semantic Search, and Lightweight RAG – and see how they enable modular, efficient, and intelligent system designs. 🧠⚙️ Learn more about Christian’s session 👉 https://lnkd.in/e-vyD6rP #SAG2025 #SoftwareArchitecture #MachineLearning #AIArchitecture #iSAQB
Session "Architecting with Semantic AI: A Language Model and an Embedding Model Walk Into a Bar…" by Christian Weyer
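Semantic Routing, one of the patterns the session covers, can be sketched in a few lines: embed the query, embed each route description, and send the query to the closest route. A real system would use an embedding model; the toy bag-of-words embedding and route names below are illustrative assumptions that keep the example self-contained.

```python
# Minimal Semantic Routing sketch. The toy embedding stands in for a
# real embedding model (e.g. one served alongside GPT or LLaMA).
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative routes, each described by representative phrasing:
ROUTES = {
    "billing": embed("invoice payment refund billing charge"),
    "support": embed("error bug crash broken not working help"),
}

def route(query):
    """Send the query to the route whose description it is closest to."""
    return max(ROUTES, key=lambda name: cosine(embed(query), ROUTES[name]))
```

The same embed-and-compare core also powers Semantic Search and the retrieval half of Lightweight RAG, which is why the patterns compose so well.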
-
For engineers who design AI systems: a 7-layer architecture proposed by Bor-Sung Liang
1/ Application Layer
2/ Orchestrator Layer
3/ Agent Layer
4/ Context Layer (reasoning chains)
5/ Neural Network Layer
6/ Link Layer (software infrastructure: scaling, resiliency, availability)
7/ Hardware (Physical Layer)
https://lnkd.in/d_2E2nPy
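The stack above can be captured as an ordered structure that a design review can walk top-down; the descriptions and the helper function here are illustrative, not from the original proposal.

```python
# The seven layers, ordered top (application) to bottom (hardware).
# Layer names follow the post; descriptions are paraphrased.
LAYERS = [
    ("Application", "user-facing products and workflows"),
    ("Orchestrator", "coordinates agents, tools, and workflows"),
    ("Agent", "goal-directed, LLM-driven actors"),
    ("Context", "reasoning chains, memory, retrieval"),
    ("Neural Network", "the models themselves"),
    ("Link", "software infrastructure: scaling, resiliency, availability"),
    ("Hardware", "physical layer: accelerators, networking"),
]

def layer_below(name):
    """Return the layer directly beneath the named one (or None)."""
    names = [n for n, _ in LAYERS]
    i = names.index(name)
    return names[i + 1] if i + 1 < len(names) else None
```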
-
LangFast now supports full multimodality. 💬 🌁 📄 You can now attach images and other files directly to your prompts. Let’s say you’re building a JSON prompt generator for image generation. Your system prompt defines how to process images and return a structured JSON output. Now, you want to test if it’s consistent and reliable. You can attach files right to the user message, or, if you want more control, create file variables and populate them dynamically from your test cases. This makes it easy to test multimodal prompts at scale and keep everything organized and reproducible. Check it out and let me know what you think.
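LangFast's exact API isn't shown in the post, so the following is a generic sketch of the idea it describes: text variables and file variables in a prompt template are filled per test case, so one multimodal prompt can be exercised against many inputs. The function name and the template slots are hypothetical.

```python
# Hypothetical sketch: render one multimodal message per test case by
# substituting text variables and attaching file variables.
def render_prompt(template, files, variables):
    """Substitute text variables, then attach file variables."""
    message = {"text": template.format(**variables), "attachments": []}
    for var_name, path in files.items():
        message["attachments"].append({"variable": var_name, "path": path})
    return message

# Test cases populate the same file variable with different images:
test_cases = [
    {"files": {"image": "cases/product_photo.png"}, "variables": {"style": "studio"}},
    {"files": {"image": "cases/outdoor_shot.png"}, "variables": {"style": "natural"}},
]

messages = [
    render_prompt("Describe this image in a {style} tone.",
                  tc["files"], tc["variables"])
    for tc in test_cases
]
```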
-
I just posted a white paper: "Bridging Context and Action: Implementing the MCP–Selenium Bridge for Agentic AI Systems" This paper presents the MCP–Selenium Bridge, an open architecture connecting AI reasoning and browser automation through the Model Context Protocol (MCP). It demonstrates how large language models can execute verifiable, context-driven web tasks, bridging cognitive intent and automated action in the emerging field of Agentic Computing.
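The bridge idea can be sketched as a dispatcher that maps an MCP-style tool call (name plus JSON arguments) onto a browser action and echoes the action back as evidence. This is a hypothetical sketch, not the paper's implementation: a stub driver stands in for Selenium's WebDriver so the dispatch logic runs without a browser.

```python
# Hypothetical MCP-to-browser dispatch sketch. StubDriver mimics the
# shape of a Selenium WebDriver (its `get` method mirrors Selenium's)
# while recording actions for verification.
class StubDriver:
    def __init__(self):
        self.log = []
    def get(self, url):                 # navigate, as in Selenium
        self.log.append(("navigate", url))
    def click(self, selector):          # simplified click-by-selector
        self.log.append(("click", selector))

def handle_tool_call(driver, call):
    """Map one MCP tool call onto a verifiable browser action."""
    name, args = call["name"], call.get("arguments", {})
    if name == "navigate":
        driver.get(args["url"])
    elif name == "click":
        driver.click(args["selector"])
    else:
        raise ValueError(f"unknown tool: {name}")
    return driver.log[-1]   # echo the action back as evidence

driver = StubDriver()
action = handle_tool_call(
    driver, {"name": "navigate", "arguments": {"url": "https://example.com"}})
```

Returning the executed action, rather than just a success flag, is what makes the task verifiable by the calling model.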
-
It’s counterintuitive, but thanks to the efficiency of vision representations (patch/vision tokens) and their high compression ratio, image inputs can be even more token-efficient than text inputs for LLMs. This suggests a broader architectural direction: integrating pure text inputs into the multimodal encoder stream, so that both visual and textual data are transformed into a unified representation before reaching the decoder-only LLM core. #DeepSeek-OCR
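Back-of-the-envelope arithmetic shows why this can hold: if a rendered page of text is compressed into a fixed grid of vision tokens, the image path can be cheaper than tokenizing the text directly. The numbers below (≈600 words per page, ≈1.33 tokens per word, a 16x16 patch grid) are illustrative assumptions, not figures from DeepSeek-OCR.

```python
# Illustrative token-cost comparison: tokenized text vs. a fixed
# vision-token grid for the same page of content.
def text_tokens_per_page(words_per_page=600, tokens_per_word=1.33):
    return round(words_per_page * tokens_per_word)

def vision_tokens_per_page(grid=16):
    return grid * grid          # one token per image patch

text_cost = text_tokens_per_page()      # 798 tokens under these assumptions
vision_cost = vision_tokens_per_page()  # 256 tokens
compression = text_cost / vision_cost   # > 3x fewer tokens via the image path
```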
-
It may seem counterintuitive and definitely off-narrative, but "I cannot confidently answer this question" is the answer you've been needing more often from LLMs.
Founder & CEO @ Reliably AI, the pre‑generation, training‑free trust layer for frontier APIs including GPT 5.1, Claude Sonnet 4.5 and many more. Build today with confidence.
One of my favourite things to do in demos is ask a client to give me a question that GPT-5 or Claude Sonnet 4.5 hallucinated on, then show them how our reliability layer would never let it happen. More powerful models aren't guarantees: the backend model in this screenshot is GPT-4. Book your demo now with us leo@thinkreliably.io
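Reliably AI's actual method isn't described in the post, so this is only a generic sketch of the underlying idea: gate a model's answer on a confidence score and abstain below a threshold, rather than always answering. The scoring inputs and threshold are stand-ins.

```python
# Generic abstention gate: return the model's answer only when a
# confidence score clears a threshold; otherwise decline to answer.
ABSTAIN = "I cannot confidently answer this question."

def gated_answer(answer, confidence, threshold=0.8):
    """Prefer abstention over a low-confidence (likely hallucinated) answer."""
    return answer if confidence >= threshold else ABSTAIN
```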
-
WeKnora is an LLM-powered framework designed for deep document understanding and semantic retrieval, especially for handling complex, heterogeneous documents. It adopts a modular architecture that combines multimodal preprocessing, semantic vector indexing, intelligent retrieval, and large language model inference. At its core, WeKnora follows the RAG (Retrieval-Augmented Generation) paradigm, enabling high-quality, context-aware answers by combining relevant document chunks with model reasoning. https://lnkd.in/g3b4Kxdn
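The RAG paradigm the post describes reduces to a short loop: chunk documents, index their embeddings, retrieve the closest chunks for a query, and hand them to the model as context. The sketch below follows that loop with a toy bag-of-words embedding and sample chunks; it is illustrative of the paradigm, not WeKnora's implementation, which uses semantic vector indexes and LLM inference.

```python
# Minimal RAG loop: index chunk embeddings, retrieve by cosine
# similarity, assemble a context-augmented prompt for the LLM.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Invoices are emailed on the first business day of each month.",
    "Support tickets are triaged within four hours of submission.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    scored = sorted(index, key=lambda item: cosine(embed(query), item[1]),
                    reverse=True)
    return [c for c, _ in scored[:k]]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"   # passed to the LLM
```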
-
We’re excited to share another major upgrade to our wake-word system: Precise is now ONNX-powered! For a long time, Precise-lite has been a backbone of our voice system stack, but it came with baggage: dependencies that held us back and made it harder to stay current. By migrating the model to ONNX and introducing ovos-ww-plugin-precise-onnx, we’ve removed the reliance on tflite_runtime and outdated package versions. But here’s the best part: in our initial tests on a Raspberry Pi 5, CPU usage dropped substantially. In short, listeners run more efficiently, leaving room for more features and smoother performance. If you care about open, reliable, low-overhead voice systems, give the new ONNX version a spin. You can dive into the plugin itself or browse the converted models via the links in the blog post. This upgrade aligns with our mission: building a voice stack that’s open, sustainable, and capable of evolving without lock-in. Here’s to more responsiveness, more flexibility, and more voice freedom. Read more on the OVOS blog: https://lnkd.in/euWz_KrC #OpenVoiceOS #OpenSourceAI #VoiceTechnology #EdgeAI #WakeWord
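A wake-word listener like the one around the plugin is, at its core, a sliding window over audio frames with a per-window model score. The sketch below shows that loop with a stub scorer standing in for the plugin's ONNX Runtime inference call; the window size and threshold are illustrative.

```python
# Sliding-window wake-word loop. `score_window` stands in for a call
# into the ONNX model; the real plugin runs Precise via ONNX Runtime.
def detect_wake_word(frames, score_window, window=3, threshold=0.5):
    """Slide over audio frames; fire when a window scores above threshold."""
    for start in range(len(frames) - window + 1):
        if score_window(frames[start:start + window]) > threshold:
            return start          # index where the wake word begins
    return None

# Stub scorer: mean frame energy as a fake wake-word probability.
fake_score = lambda win: sum(win) / len(win)

hit = detect_wake_word([0.1, 0.2, 0.95, 0.97, 0.99, 0.1], fake_score)
```

Because the loop only ever holds one window of frames, the per-inference cost of the model dominates, which is why the ONNX migration's CPU savings translate directly into a lighter listener.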
-
Fast, Tiny, and Smart AI: Small Language Models for Your Phone https://buff.ly/cLzKCjK A hybrid architecture boosts speed and memory efficiency