dhirajpatra/Automatic-Speech-Recognition-with-Gemma-SLM: ASR Demo, a lightweight microservice demo for Automatic Speech Recognition using Whisper for transcription and Ollama Gemma for text enhancement. https://lnkd.in/g5aBTqUw
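The demo's two-stage flow (transcribe, then enhance) can be sketched as a pipeline with injected stages. This is a hypothetical sketch, not the repo's code: the stub functions stand in for the real Whisper and Ollama Gemma calls so the flow runs without model downloads.

```python
# Hypothetical sketch of the two-stage pipeline: a transcriber
# (Whisper in the demo) produces raw text, and an enhancer
# (Ollama Gemma in the demo) cleans it up. Both stages are
# injected so the flow is testable without external models.

def asr_pipeline(audio_path, transcribe, enhance):
    """Run transcription, then text enhancement, on one audio file."""
    raw_text = transcribe(audio_path)   # in the demo: Whisper
    polished = enhance(raw_text)        # in the demo: Ollama Gemma
    return {"audio": audio_path, "raw": raw_text, "enhanced": polished}

# Stub stages standing in for the real models:
fake_transcribe = lambda path: "hello worl this is a test"
fake_enhance = lambda text: text.replace("worl", "world").capitalize()

result = asr_pipeline("meeting.wav", fake_transcribe, fake_enhance)
```

Keeping the stages injectable also makes it easy to swap either model without touching the service logic.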
-
Excited to share our new paper on AI-based tropical cyclone prediction! In this study, we leveraged a transformer-based architecture—the foundation of models like ChatGPT—to predict tropical cyclone tracks and intensity. By analyzing 15 key variables, we assessed their relative contributions to forecast accuracy. Check out the full paper in npj Artificial Intelligence: https://lnkd.in/gcZs3568
-
Architecting the Future with Semantic AI 🤖 #SemanticAI is taking #GenerativeAI to the next level – adding true meaning to intelligence. In his #SAGconf session, Christian Weyer explores how to integrate language and embedding models like GPT or LLaMA into real-world software architectures. 💡 Discover practical patterns such as Semantic Routing, Semantic Search, and Lightweight RAG – and see how they enable modular, efficient, and intelligent system designs. 🧠⚙️ Learn more about Christian’s session 👉 https://lnkd.in/e-vyD6rP #SAG2025 #SoftwareArchitecture #MachineLearning #AIArchitecture #iSAQB
Session "Architecting with Semantic AI: A Language Model and an Embedding Model Walk Into a Bar…" by Christian Weyer
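Semantic Routing, one of the patterns the session covers, can be sketched in a few lines: embed the query, embed each route description, and send the query to the closest route. A real system would use an embedding model; the toy bag-of-words embedding and route names below are illustrative assumptions that keep the example self-contained.

```python
# Minimal Semantic Routing sketch. The toy embedding stands in for a
# real embedding model (e.g. one served alongside GPT or LLaMA).
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative routes, each described by representative phrasing:
ROUTES = {
    "billing": embed("invoice payment refund billing charge"),
    "support": embed("error bug crash broken not working help"),
}

def route(query):
    """Send the query to the route whose description it is closest to."""
    return max(ROUTES, key=lambda name: cosine(embed(query), ROUTES[name]))
```

The same embed-and-compare core also powers Semantic Search and the retrieval half of Lightweight RAG, which is why the patterns compose so well.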
-
For engineers who design AI systems: a 7-layer architecture proposed by Bor-Sung Liang
1/ Application Layer
2/ Orchestrator Layer
3/ Agent Layer
4/ Context Layer (reasoning chains)
5/ Neural Network Layer
6/ Link Layer (software infrastructure: scaling, resiliency, availability)
7/ Hardware (Physical Layer)
https://lnkd.in/d_2E2nPy
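The stack above can be captured as an ordered structure that a design review can walk top-down; the descriptions and the helper function here are illustrative, not from the original proposal.

```python
# The seven layers, ordered top (application) to bottom (hardware).
# Layer names follow the post; descriptions are paraphrased.
LAYERS = [
    ("Application", "user-facing products and workflows"),
    ("Orchestrator", "coordinates agents, tools, and workflows"),
    ("Agent", "goal-directed, LLM-driven actors"),
    ("Context", "reasoning chains, memory, retrieval"),
    ("Neural Network", "the models themselves"),
    ("Link", "software infrastructure: scaling, resiliency, availability"),
    ("Hardware", "physical layer: accelerators, networking"),
]

def layer_below(name):
    """Return the layer directly beneath the named one (or None)."""
    names = [n for n, _ in LAYERS]
    i = names.index(name)
    return names[i + 1] if i + 1 < len(names) else None
```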
-
LangFast now supports full multimodality. 💬 🌁 📄 You can now attach images and other files directly to your prompts. Let’s say you’re building a JSON prompt generator for image generation. Your system prompt defines how to process images and return a structured JSON output. Now, you want to test if it’s consistent and reliable. You can attach files right to the user message, or, if you want more control, create file variables and populate them dynamically from your test cases. This makes it easy to test multimodal prompts at scale and keep everything organized and reproducible. Check it out and let me know what you think.
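LangFast's exact API isn't shown in the post, so the following is a generic sketch of the idea it describes: text variables and file variables in a prompt template are filled per test case, so one multimodal prompt can be exercised against many inputs. The function name and the template slots are hypothetical.

```python
# Hypothetical sketch: render one multimodal message per test case by
# substituting text variables and attaching file variables.
def render_prompt(template, files, variables):
    """Substitute text variables, then attach file variables."""
    message = {"text": template.format(**variables), "attachments": []}
    for var_name, path in files.items():
        message["attachments"].append({"variable": var_name, "path": path})
    return message

# Test cases populate the same file variable with different images:
test_cases = [
    {"files": {"image": "cases/product_photo.png"}, "variables": {"style": "studio"}},
    {"files": {"image": "cases/outdoor_shot.png"}, "variables": {"style": "natural"}},
]

messages = [
    render_prompt("Describe this image in a {style} tone.",
                  tc["files"], tc["variables"])
    for tc in test_cases
]
```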
-
I just posted a white paper: "Bridging Context and Action: Implementing the MCP–Selenium Bridge for Agentic AI Systems" This paper presents the MCP–Selenium Bridge, an open architecture connecting AI reasoning and browser automation through the Model Context Protocol (MCP). It demonstrates how large language models can execute verifiable, context-driven web tasks, bridging cognitive intent and automated action in the emerging field of Agentic Computing.
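The bridge idea can be sketched as a dispatcher that maps an MCP-style tool call (name plus JSON arguments) onto a browser action and echoes the action back as evidence. This is a hypothetical sketch, not the paper's implementation: a stub driver stands in for Selenium's WebDriver so the dispatch logic runs without a browser.

```python
# Hypothetical MCP-to-browser dispatch sketch. StubDriver mimics the
# shape of a Selenium WebDriver (its `get` method mirrors Selenium's)
# while recording actions for verification.
class StubDriver:
    def __init__(self):
        self.log = []
    def get(self, url):                 # navigate, as in Selenium
        self.log.append(("navigate", url))
    def click(self, selector):          # simplified click-by-selector
        self.log.append(("click", selector))

def handle_tool_call(driver, call):
    """Map one MCP tool call onto a verifiable browser action."""
    name, args = call["name"], call.get("arguments", {})
    if name == "navigate":
        driver.get(args["url"])
    elif name == "click":
        driver.click(args["selector"])
    else:
        raise ValueError(f"unknown tool: {name}")
    return driver.log[-1]   # echo the action back as evidence

driver = StubDriver()
action = handle_tool_call(
    driver, {"name": "navigate", "arguments": {"url": "https://example.com"}})
```

Returning the executed action, rather than just a success flag, is what makes the task verifiable by the calling model.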
-
It’s counterintuitive, but thanks to the efficiency of vision representations (patch/vision tokens) and their high compression ratio, image inputs can be even more token-efficient than text inputs for LLMs. This suggests a broader architectural direction: integrating pure text inputs into the multimodal encoder stream, so that both visual and textual data are transformed into a unified representation before reaching the decoder-only LLM core. #DeepSeek-OCR
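Back-of-the-envelope arithmetic shows why this can hold: if a rendered page of text is compressed into a fixed grid of vision tokens, the image path can be cheaper than tokenizing the text directly. The numbers below (≈600 words per page, ≈1.33 tokens per word, a 16x16 patch grid) are illustrative assumptions, not figures from DeepSeek-OCR.

```python
# Illustrative token-cost comparison: tokenized text vs. a fixed
# vision-token grid for the same page of content.
def text_tokens_per_page(words_per_page=600, tokens_per_word=1.33):
    return round(words_per_page * tokens_per_word)

def vision_tokens_per_page(grid=16):
    return grid * grid          # one token per image patch

text_cost = text_tokens_per_page()      # 798 tokens under these assumptions
vision_cost = vision_tokens_per_page()  # 256 tokens
compression = text_cost / vision_cost   # > 3x fewer tokens via the image path
```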
-
It may seem counterintuitive and definitely off-narrative, but "I cannot confidently answer this question" is the answer you've been needing more often from LLMs.
Founder & CEO @ Reliably AI, the pre‑generation, training‑free trust layer for frontier APIs including GPT 5.1, Claude Sonnet 4.5 and many more. Build today with confidence.
One of my favourite things to do in demos is ask a client to give me a question that GPT-5 or Claude Sonnet 4.5 hallucinated on, then show them how our reliability layer would never let it happen. More powerful models aren't guarantees: the backend model in this screenshot is GPT-4. Book your demo now with us leo@thinkreliably.io
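Reliably AI's actual method isn't described in the post, so this is only a generic sketch of the underlying idea: gate a model's answer on a confidence score and abstain below a threshold, rather than always answering. The scoring inputs and threshold are stand-ins.

```python
# Generic abstention gate: return the model's answer only when a
# confidence score clears a threshold; otherwise decline to answer.
ABSTAIN = "I cannot confidently answer this question."

def gated_answer(answer, confidence, threshold=0.8):
    """Prefer abstention over a low-confidence (likely hallucinated) answer."""
    return answer if confidence >= threshold else ABSTAIN
```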
-
WeKnora is an LLM-powered framework designed for deep document understanding and semantic retrieval, especially for handling complex, heterogeneous documents. It adopts a modular architecture that combines multimodal preprocessing, semantic vector indexing, intelligent retrieval, and large language model inference. At its core, WeKnora follows the RAG (Retrieval-Augmented Generation) paradigm, enabling high-quality, context-aware answers by combining relevant document chunks with model reasoning. https://lnkd.in/g3b4Kxdn
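The RAG paradigm the post describes reduces to a short loop: chunk documents, index their embeddings, retrieve the closest chunks for a query, and hand them to the model as context. The sketch below follows that loop with a toy bag-of-words embedding and sample chunks; it is illustrative of the paradigm, not WeKnora's implementation, which uses semantic vector indexes and LLM inference.

```python
# Minimal RAG loop: index chunk embeddings, retrieve by cosine
# similarity, assemble a context-augmented prompt for the LLM.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Invoices are emailed on the first business day of each month.",
    "Support tickets are triaged within four hours of submission.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    scored = sorted(index, key=lambda item: cosine(embed(query), item[1]),
                    reverse=True)
    return [c for c, _ in scored[:k]]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"   # passed to the LLM
```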
-
We’re excited to share another major upgrade to our wake-word system: Precise is now ONNX-powered! For a long time, Precise-lite has been a backbone of our voice system stack, but it came with baggage: dependencies that held us back and made it harder to stay current. By migrating the model to ONNX and introducing ovos-ww-plugin-precise-onnx, we’ve removed the reliance on tflite_runtime and outdated package versions. But here’s the best part: in our initial tests on a Raspberry Pi 5, CPU usage dropped substantially. In short, listeners run more efficiently, leaving room for more features and smoother performance. If you care about open, reliable, low-overhead voice systems, give the new ONNX version a spin. You can dive into the plugin itself or browse the converted models via the links in the blog post. This upgrade aligns with our mission: building a voice stack that’s open, sustainable, and capable of evolving without lock-in. Here’s to more responsiveness, more flexibility, and more voice freedom. Read more on the OVOS blog: https://lnkd.in/euWz_KrC #OpenVoiceOS #OpenSourceAI #VoiceTechnology #EdgeAI #WakeWord
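A wake-word listener like the one around the plugin is, at its core, a sliding window over audio frames with a per-window model score. The sketch below shows that loop with a stub scorer standing in for the plugin's ONNX Runtime inference call; the window size and threshold are illustrative.

```python
# Sliding-window wake-word loop. `score_window` stands in for a call
# into the ONNX model; the real plugin runs Precise via ONNX Runtime.
def detect_wake_word(frames, score_window, window=3, threshold=0.5):
    """Slide over audio frames; fire when a window scores above threshold."""
    for start in range(len(frames) - window + 1):
        if score_window(frames[start:start + window]) > threshold:
            return start          # index where the wake word begins
    return None

# Stub scorer: mean frame energy as a fake wake-word probability.
fake_score = lambda win: sum(win) / len(win)

hit = detect_wake_word([0.1, 0.2, 0.95, 0.97, 0.99, 0.1], fake_score)
```

Because the loop only ever holds one window of frames, the per-inference cost of the model dominates, which is why the ONNX migration's CPU savings translate directly into a lighter listener.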
-
Fast, Tiny, and Smart AI: Small Language Models for Your Phone https://buff.ly/cLzKCjK A hybrid architecture boosts speed and memory efficiency