Parloa Labs’ cover photo
Parloa Labs

Parloa Labs

Technology, Information and Internet

Research that advances artificial intelligence for everyone

About us

AI still leaves too many questions unanswered. At Parloa Labs, we open the black box to understand what today’s systems are capable of and where they can go next. We believe the best innovation happens at the intersection of theory and practice. Every day, Parloa's AI agents handle millions of interactions across industries and languages. This gives us a unique vantage point to identify real challenges, test solutions at scale, and contribute meaningful findings back to the research community. We study the frontiers of voice and agentic AI and publish our work openly, share practical learnings, and give back to the broader community’s understanding of what makes conversations work.

Website
https://www.parloa.com/labs/
Industry
Technology, Information and Internet
Company size
201-500 employees

Updates

  • Training AI agents on brand voice has become table stakes. Adapting an agent’s tone to the individual it’s talking to is far less common, though just as important. In her new piece for Agent Architect’s Digest, Dr. Rangina Ahmad, Lead AI Agent Architect at Parloa, breaks down linguistic style matching: the human reflex of adjusting how you speak based on who you’re talking to, and what it takes to bring that capability into enterprise AI agents. Rangina's perspective builds on years of research into Personality-Adaptive Conversational Agents and lands on where the field stands now: modern LLMs can read communication style from just a few turns of dialogue. Acting on that reading is the harder half of the work: deciding which style fits which user, when to shift, and how to keep it reliable within brand-voice constraints. Read the full piece in Parloa Labs: https://okt.to/Ui7VPR

    • No alternative text description for this image
  • At Parloa, we believe good software comes from the perfect combination of tools, processes, and people. Last year, we transitioned to agent-written code. We haven’t looked back. How we did it: We built Claude’s Kitchen, an environment within Claude that ensures every line of code is written the Parloa way, aligned with our standards and guardrails. A tool for our tool. The engineering role evolved with it. The work is no longer about producing code line by line, but about designing the scaffolding that allows agents to do their best work. Read the step-by-step breakdown of how we got here, along with the biggest lessons we learned along the way: https://okt.to/pxMO8L

    • No alternative text description for this image
  • Parloa Labs reposted this

    In the early days of the company, Parloa co-founder Stefan Ostwald spent a day sitting inside an insurance call center, listening to the same requests on repeat: Password resets. Policy questions. Simple changes. It was clear a lot of that work could be automated. Today, Parloa builds voice agents that handle those conversations end to end in production. With AMP, teams define agent behavior in natural language, connect backend systems, and simulate real customer calls before launch. Those same systems evaluate whether the agent actually resolved the request. In one deployment, a global travel company reduced requests for a human agent by 80%. Read more at the link in the comments.

  • Parloa Labs reposted this

    View organization page for Parloa

    73,217 followers

    OpenAI just published a story on how we're building our AI Agent Management Platform (AMP) on top of frontier models like GPT-5.4, and what it takes to make them reliable for real-time, multilingual, enterprise-grade customer conversations. Our AI agents power millions of conversations across retail, travel, insurance, and other industries. We need to make sure that each one performs reliably from the first second to the last, because in customer service, every interaction counts. Our approach: We continuously evaluate and stress-test new model iterations in production-like environments before rolling them out to live customer interactions. That means simulating real customer calls before agents go live, evaluating every interaction with a mix of LLM-as-a-judge scoring and deterministic checks, and only deploying models that hold up under realistic conditions, not just on abstract benchmarks. 🔗 Read the full story - link in the comments. #Parloa #AIInnovation #CustomerExperience

    • No alternative text description for this image
  • Parloa Labs reposted this

    View organization page for Parloa

    73,217 followers

    We just conducted the largest agent-led customer experience study to-date, and what our discovery agents found shocked us: 💡46% of websites bury the service phone number. 💡9% of chatbot experiences achieve the customer’s goal. 💡99% of voice experiences operate like it’s 1990. 💡1% of enterprises are ready for the agentic future. The gap between what customers expect and what companies deliver is enormous, but not impossible to close. Those who move first will be the ones to win customer loyalty. 👉 Curious how your company compares? Benchmark your readiness in our State of Agentic Customer Experience in 2026 report: https://lnkd.in/daNaGGVM #Parloa #StateOfAgenticCX2026 #AIInnovation #CustomerExperience

  • View organization page for Parloa Labs

    650 followers

    A year ago, our engineers wrote code. Today, they orchestrate swarms of agents that produce software. In less than 12 months, the question shifted from "how do I implement this?" to "how do I design and supervise a system of agents that will implement this for me?" For us at Parloa, this is more than a thought experiment. It’s how we ship production systems today. Our latest Parloa Labs article by Masashi Beheim and Nuno M. walks through our journey: from autocomplete to AI pair programmers to fully instrumented agent environments. Each step forced us to rethink what engineering ownership means, and how we build safely at AI speed. 🔗 Read the full story: https://okt.to/yQWv23

    • No alternative text description for this image
  • Most AI agent evaluations still rely on raw win rates or simple averages. These metrics often miss uncertainty, ignore scenario variation, or overstate what the data really tells you. Today, we’re sharing a Bayesian evaluation framework our team built in simulated conversations. It accounts for uncertainty, mixed metric types, and group-level variation – giving product teams clearer guidance for real-world decisions. Read the full article here: ✔️ A new approach to A/B testing AI agents from our team at Parloa ✔️ What we learned from hundreds of customer calls ✔️ How to express results with statistical confidence, not mere point scores This work is part of how we’re evolving evaluation at scale at Parloa, because in production, your reasoning matters as much as your model. #Parloa #ParloaLabs #AIResearch #AIAgents

    • No alternative text description for this image
  • Every customer conversation has a rhythm: a start, a middle, and (generally) an end. But what happens when it doesn’t? In contact centers, the lengthy ones are often the norm. Customers explain, clarify, change their minds, resend details, and ask, “Wait, did I already tell you that?” Meanwhile, forty minutes later, the AI agent is juggling order data, policy limits, and three tool calls while still trying to sound helpful. We asked ourselves a simple question: 𝗪𝗵𝗲𝗻 𝗰𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻𝘀 𝗴𝗲𝘁 𝗹𝗼𝗻𝗴, 𝗱𝗼𝗲𝘀 𝗟𝗟𝗠 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗱𝗿𝗼𝗽, 𝗼𝗿 𝗰𝗮𝗻 𝘁𝗼𝗱𝗮𝘆’𝘀 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 𝗸𝗲𝗲𝗽 𝘂𝗽? We built a stress test that mirrors how conversations really unfold in enterprise settings - messy, indirect, and packed with tool output. We ran hundreds of extended conversations using two models – GPT‑4.1 and GPT‑4.1 mini – tasked with real-world workflows. For enterprises, the takeaway is clear: long conversations aren’t a breaking point – for the right models. GPT‑4.1 handled realistic, tool-heavy dialogues without measurable degradation. Smaller models didn’t fail because of length but because of conversational noise. The next improvements will come from better context management, cleaner tool outputs, smarter summarization policies, and new ways to track when a conversation is drifting. This research helps us refine how Parloa’s platform manages long, complex sessions. The goal: agents that stay clear, focused, and efficient, no matter how long the call lasts. Read the complete study - link in the comments 👇 #Parloa #ParloaLabs #AIResearch #AIAgents

    • No alternative text description for this image
  • Parloa Labs reposted this

    View organization page for Parloa

    73,217 followers

    BREAKING: Just seven months since our Series C, we’re excited to announce our next big step: our $350 million Series D at a $3 billion valuation. 🚀 With this raise, led by General Catalyst, we’re accelerating our global growth, advancing our AI agent management platform (AMP), and launching the Parloa Promise – a new initiative aimed to redefine what’s meant by responsible AI. There is a clear signal. Customer patience for bad service has run out, and they’re leaving brands that don’t deliver on support promises. That’s why our mission is to eliminate these detached experiences and empower global enterprises to build meaningful relationships with their customers, turning every conversation into lasting loyalty and business value.  🔗 Learn more here: https://lnkd.in/de2KS6Ti #ParloaSeriesD #Parloa #CustomerExperience #AIInnovation

  • Long-context performance is widely discussed but often misunderstood: as context windows grow and vendors claim models can “remember everything,” there’s still no clear agreement on what that means in practice. For contact centers, a single dropped context can mean repeating verification steps or losing a sale. Every extra step adds cost, delay, and frustration. Most evaluations stretch models horizontally: testing ever-longer inputs. The real challenge is vertical: maintaining successful goal-oriented conversations over time. Our goal was to close that gap by measuring the performance drop when conversations get longer in production. Instead of looking at abstract benchmarks, we focused on signals that reflect enterprise production reality. Each variation ran on both GPT-4.1 and GPT-4.1 mini to test how model size shaped long-conversation stability. Our guiding questions were: ✔️ At what point, if any, do models start to degrade? ✔️ Is degradation caused by length, noise, or linguistic indirection? ✔️ Can summarization or message trimming prevent it? Read the report to learn what the data showed: https://okt.to/iH6aQ1

    • No alternative text description for this image

Affiliated pages

Similar pages