Claude Opus 4.8 is now available for Max subscribers on Perplexity and Computer.
About us
The most powerful answer engine. Powering curiosity with answers backed by up-to-date sources. This is where knowledge begins.
- Website
- https://www.perplexity.ai
- Industry
- Software Development
- Company size
- 201-500 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2022
Locations
- Primary: 115 Sansome St, Suite 900, San Francisco, California 94104, US
Updates
-
Perplexity Computer is now available inside Microsoft Excel, Word, PowerPoint, and Outlook. Orchestrate across work with Computer directly in the side panel of your app to draft documents, model, build decks, and handle email. Available now: https://lnkd.in/esRW_sak Computer runs on the same secure infrastructure as the rest of Perplexity. The enterprise-grade security layer includes SAML SSO, audit logs, and granular admin controls. Read more: https://lnkd.in/e6PQAp2S
-
We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. The work targets XLM-RoBERTa’s 250K-token Unigram vocabulary, commonly used for ranking and retrieval. The encoder produces the same tokens as the reference implementation, but avoids rebuilding strings and chasing hash maps while deciding how text should be split. At production input lengths, the encoder cuts p50 latency by roughly 5× vs. HuggingFace tokenizers, 2× vs. SentencePiece C++, and 1.5× vs. IREE C. At 514 tokens, it runs in 63 μs with zero heap allocations. GitHub: https://lnkd.in/gCeFN6F6 Read more about improving Unigram tokenizer CPU performance on our blog: https://lnkd.in/g-BEigqF
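For readers unfamiliar with Unigram tokenization, here is a minimal sketch of the textbook approach: a Viterbi search for the highest-scoring split of the input under per-piece log-probabilities. It is not the open-sourced encoder. The function name `unigram_segment`, the `TOY_VOCAB` scores, and the unknown-character penalty are all made up for illustration, and real tokenizers also normalize text and mark whitespace. Note that this naive version slices a new string and does a hash lookup for every candidate piece, which is the kind of work the post says the rebuilt encoder avoids.

```python
import math

# Hypothetical toy vocabulary: piece -> log-probability (values are invented).
TOY_VOCAB = {
    "unigram": -5.0, "un": -3.0, "i": -4.0, "gram": -3.5,
    "u": -6.0, "n": -6.0, "g": -6.0, "r": -6.0, "a": -6.0, "m": -6.0,
}


def unigram_segment(text, log_probs, unk_log_prob=-20.0):
    """Return the highest-scoring split of `text` into vocabulary pieces."""
    n = len(text)
    max_len = max(len(piece) for piece in log_probs)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]        # naive: allocates a new string
            score = log_probs.get(piece)   # naive: one hash lookup per candidate
            if score is None:
                if end - start > 1:
                    continue
                score = unk_log_prob       # single unknown character fallback
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    i = n
    while i > 0:
        pieces.append(text[back[i]:i])
        i = back[i]
    return pieces[::-1]
```

With the toy scores above, `unigram_segment("unigram", TOY_VOCAB)` keeps the whole word as one piece, because its single score beats the sum for `un` + `i` + `gram`.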
-
Today we're open-sourcing Bumblebee, a read-only scanner for macOS and Linux. It checks developer machines for risky packages, extensions, and AI tool configs. Connected to Computer, it can trigger deeper scans whenever a new supply-chain risk emerges. https://lnkd.in/g95-tw_U Bumblebee started as an internal tool. Making Perplexity products more secure for users starts with protecting the developer systems we use to build them. Read the full blog: https://lnkd.in/gubgsXvK
-
We've productionized query-aware compression for faster, cleaner, more accurate search. Better context is better than more context. Our system cuts context tokens up to 70% while improving answer quality. Less noise = more signal. Vital content per snippet is up 63%. Ads, navigation, metadata, and unhelpful content are culled before handoff to the answer model. On SimpleQA, we achieve a 50x compression ratio at frontier-level performance. Context compression isn't new in RAG. Our contribution is making it query-aware, citation-preserving, and fast enough for orchestration. Read the full research blog: https://lnkd.in/gQ5WQUNC
-
Rho cut weekly meeting time by 90% with Perplexity Computer. Computer checks Slack, Notion, Jira, Figma, and Google Docs, then flags missing tasks and changes the team needs to see. 120 work hours saved during a 12-week project. Read the customer story: https://lnkd.in/gcRNvxgH
-
Computer now connects to Snowflake. Run end-to-end work against live warehouse data and get answers with SQL, source tables, filters, and metrics. It’s like a personal data science team, on call with accurate answers from live company data. Build dashboards and automations from your Snowflake data for pipeline analysis, product usage, customer segments, and more. Admins maintain control over access, business definitions, and shared data logic across the organization. Learn more: https://lnkd.in/gJYWuUA4
-
Computer is secure by default. Every task runs in its own hardware-isolated sandbox with VPC-level storage and compute separation. Agents are authenticated with short-lived proxy tokens instead of raw API keys. External content is scanned in parallel by ML classifiers and the BrowseSafe model before agents act on it. File connector data is encrypted in transit and at rest, uploaded files are automatically deleted after 7 days, and more. Read more on the blog: https://lnkd.in/gZq9pbZe
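To illustrate the general idea of short-lived tokens (not Perplexity's actual design, which the post does not describe), here is a minimal sketch using only the Python standard library. An issuer signs a small payload containing an agent ID and an expiry time, and a verifier rejects anything tampered with or expired. The names `issue_token` and `verify_token`, the payload fields, and the token layout are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_token(secret: bytes, agent_id: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Sign a payload that expires `ttl_seconds` after it is issued."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"agent": agent_id, "exp": issued + ttl_seconds}, sort_keys=True
    ).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time compare
        return None
    claims = json.loads(payload)
    if current >= claims["exp"]:
        return None
    return claims
```

The point of the design is that a leaked token stops working once its expiry passes, whereas a leaked raw API key stays valid until someone rotates it.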
-
PayPal runs 74,000 weekly tasks in Perplexity Enterprise. Teams use it for model validation, channel performance, market trend research, competitive intelligence, and product analysis. “Perplexity gives us the rationale behind every output, and that’s what lets us move with confidence,” says Graham Woods, a model governance lead at PayPal. Read the customer story: https://lnkd.in/g7vQBwR3
-
We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform. Prefill and decode stress hardware differently. Prefill is compute-bound, so Blackwell Tensor Cores, memory bandwidth, NVLink, and SHARP reductions help. Decode is latency/memory-bound, where GB200’s rack-scale NVLink domain opens up parallelism Hopper could not. The benchmarks show the gap. NVLS all-reduce latency drops from 586.1μs on H200 to 313.3μs on GB200. In MoE prefill at EP=4, combine falls from 730.1μs to 438.5μs. For decode, GB200 sustains much higher throughput at high token speeds. NVIDIA remains the strongest platform for large-model inference at scale. Prefill/decode disaggregation, Blackwell-native quantization, custom kernels, and rack-scale NVLink turn GB200 into faster answers and lower serving cost. Read the full paper here: https://lnkd.in/gAn5DmdD
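The latency figures quoted above can be turned into speedup factors and percentage reductions with a few lines of arithmetic. The sketch below uses only the four numbers from the post. The dictionary keys and helper names are made up, and it assumes the 730.1μs combine figure is the H200 baseline, which the post implies but does not state.

```python
# Latencies in microseconds, as quoted in the post: (before, after).
MEASUREMENTS = {
    "nvls_all_reduce": (586.1, 313.3),
    "moe_prefill_combine_ep4": (730.1, 438.5),  # assumed H200 -> GB200
}


def speedup(before_us, after_us):
    """How many times faster the second measurement is."""
    return before_us / after_us


def reduction_pct(before_us, after_us):
    """Latency reduction as a percentage of the baseline."""
    return 100.0 * (before_us - after_us) / before_us


def summarize(measurements):
    """Map each benchmark name to (speedup factor, percent reduction)."""
    return {
        name: (speedup(before, after), reduction_pct(before, after))
        for name, (before, after) in measurements.items()
    }


if __name__ == "__main__":
    for name, (factor, pct) in summarize(MEASUREMENTS).items():
        print(f"{name}: {factor:.2f}x faster, {pct:.1f}% lower latency")
```

On these numbers, all-reduce comes out a little under 1.9x faster and combine a little under 1.7x faster.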