
Can OpenSearch Shut Down Those Bad Vector Search Results? 

OpenSearch is more than just vector search, and with version 3.0, it is also expected to see more features for analytics and security.
May 30th, 2025 10:56am

RAG (retrieval-augmented generation) searches can deliver impressive query results. They are also, however, prone to very poor and sometimes embarrassing results, on top of the rampant hallucinations and weak answers that AI agents continue to serve up.

This issue was highlighted during a keynote at the recent OpenSearchCon Europe Linux Foundation conference. There, Eric Pugh, founder of OpenSource Connections, discussed how the open source OpenSearch project, with the upcoming 3.0 release, is expected to go a long way toward shifting vector search results away from the embarrassing and toward the stellar and highly useful.

But OpenSearch is more than just vector search, and with version 3.0, it is also expected to see more features for analytics and security. OpenSearch is also a project in progress, with a lot of improvement yet to come, said Pugh during his talk.

“OpenSearch is uniquely positioned among open source search engines to prevent such catastrophic outcomes,” Pugh said.

Bad Search

An article from 2023 in The Guardian featured a search team in New Zealand working for a supermarket company that offered online shopping. The team had access to customer purchase history and developed an AI-powered RAG solution to recommend and generate recipes based on items found in customers’ cupboards, Pugh said.

Unfortunately, several recipes the system produced were highly problematic, Pugh said. Examples included a “light, odorless cocktail” that was actually a recipe for chlorine gas, poisonous bread sandwiches, and roast potatoes made with mosquito repellent; the latter raised questions about whether anyone had actually tried putting mosquito repellent on food. These failed outputs damaged user trust and drew negative media coverage, in this case from The Guardian, Pugh said.

In version 3.1, OpenSearch will introduce a search relevance workbench — a centralized platform for measuring search quality and evaluating the spectrum between irrelevant and highly accurate results, Pugh said. This includes the ability to collect and analyze user queries, determine user intent, and identify what users are actually looking for by tracking user interactions such as clicks. This data enables deeper understanding of intent beyond typed queries and allows for evaluation of search result effectiveness (e.g., whether “helmet reviews” returned relevant content or not), Pugh said.
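To make the idea concrete, the sketch below shows the kind of query-and-click logging such a workbench depends on, using the opensearch-py client. The index name and document fields are hypothetical illustrations, not the workbench’s actual schema.

```python
# Minimal sketch: log what users search for and what they click, so intent and
# result effectiveness can be analyzed later. Index name and fields are
# hypothetical, not the search relevance workbench's actual schema.
from datetime import datetime, timezone

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])


def log_query(query_id: str, query_text: str) -> None:
    """Record what the user typed, keyed by a query ID."""
    client.index(
        index="search-events",  # hypothetical index name
        body={
            "type": "query",
            "query_id": query_id,
            "query_text": query_text,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )


def log_click(query_id: str, doc_id: str, position: int) -> None:
    """Record which result the user clicked and where it ranked."""
    client.index(
        index="search-events",
        body={
            "type": "click",
            "query_id": query_id,
            "doc_id": doc_id,
            "position": position,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )
```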

To drive ongoing improvement, a hybrid optimizer has been developed, Pugh said. This system evaluates user behavior over time to determine the ideal balance between lexical (keyword-based) and semantic (vector-based) search approaches. The hybrid optimizer adjusts dynamically, whether daily, weekly or annually, without requiring a dedicated project. This functionality is included in version 3.1, Pugh said.
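The weighting Pugh describes corresponds to the knob exposed by OpenSearch’s existing hybrid query and search pipeline support. The sketch below assumes a deployed text-embedding model (the model ID is a placeholder) and uses hypothetical index, field and pipeline names; it normalizes lexical and semantic scores and combines them with explicit weights, which is what such an optimizer would be tuning over time.

```python
# Sketch: blend lexical and semantic scores with explicit weights, the values
# a hybrid optimizer would adjust. Index, field, pipeline and model ID are
# placeholders for illustration.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Search pipeline: normalize both score sets, then combine them with weights
# of 0.4 (lexical) and 0.6 (semantic).
client.transport.perform_request(
    "PUT",
    "/_search/pipeline/hybrid-weights",
    body={
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [0.4, 0.6]},
                    },
                }
            }
        ]
    },
)

# Hybrid query: a keyword match plus a neural (vector) query over the same text.
results = client.transport.perform_request(
    "POST",
    "/products/_search?search_pipeline=hybrid-weights",
    body={
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"title": {"query": "helmet reviews"}}},
                    {
                        "neural": {
                            "title_embedding": {
                                "query_text": "helmet reviews",
                                "model_id": "<deployed-model-id>",  # placeholder
                                "k": 10,
                            }
                        }
                    },
                ]
            }
        }
    },
)
```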

With the introduction of more complex data structures, analyzing query performance has become essential, Pugh said. Tools like Query Insights provide visibility into query processing, allowing teams to optimize new features and understand their computational cost.
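As a rough illustration, the sketch below pulls the slowest recent queries from the Query Insights plugin. It assumes the plugin is installed and that the cluster setting and endpoint shown match the OpenSearch version in use.

```python
# Sketch: inspect the slowest recent queries via the Query Insights plugin.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Turn on collection of top queries by latency (setting name per the Query
# Insights documentation; verify against your OpenSearch version).
client.cluster.put_settings(
    body={"persistent": {"search.insights.top_queries.latency.enabled": True}}
)

# Fetch the current top queries ranked by latency. The response shape can vary
# between versions, so just print each record.
top = client.transport.perform_request("GET", "/_insights/top_queries?type=latency")
for record in top.get("top_queries", []):
    print(record)
```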

Building rich, intelligent interfaces requires simplicity, Pugh said. Tools such as Flow AI Builder simplify the development of advanced search experiences by streamlining the assembly of various components. Looking ahead, measuring the effectiveness of new user experiences is equally important. The upcoming addition of A/B testing and interleaving directly within OpenSearch will enable real-time evaluation of user experience changes — eliminating the need for multi-week experiments, Pugh said.

Combining conversation semantics with evaluation metrics will allow a greater portion of search results to move toward the “remarkably accurate” end of the quality spectrum, Pugh said. OpenSearch 3.x will be a significantly enhanced platform for building rich, immersive search experiences, Pugh said. “Technically, the solution is highly capable,” he said.

However, technical advancements alone are not enough, Pugh said. There remains a human challenge: improving collaboration between data scientists and search engineers. These communities often work in isolation, making integration difficult.

Python Help

To bridge this gap, it is necessary to adopt more Python — a core language of data science, Pugh said. One promising idea is replacing Painless with Python in scripting services. Many data scientists already use Python and are responsible for writing the complex logic that Painless typically supports, Pugh said.
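For context, the sketch below shows the kind of ranking logic that has to be written in Painless today: a script_score query that boosts relevance by a hypothetical popularity field. Under the proposal Pugh describes, a data scientist could express the same logic in Python instead.

```python
# Sketch: a relevance boost written in Painless today. The index and the
# `popularity` field are hypothetical; this scoring logic is the kind of code
# the Python-scripting idea would let data scientists write in Python.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

results = client.search(
    index="products",
    body={
        "query": {
            "script_score": {
                "query": {"match": {"title": "helmet"}},
                "script": {
                    "lang": "painless",
                    # Boost the text-relevance score by document popularity.
                    "source": "_score * Math.log(2 + doc['popularity'].value)",
                },
            }
        }
    },
)
```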

Another forward-looking idea involves Jupyter Notebooks, Pugh said. Notebooks are a primary environment for data science, and integrating them directly into OpenSearch Dashboards would create a welcoming, native space for data scientists to operate within the OpenSearch ecosystem. These concepts are under active discussion in RFCs, Pugh said. Engagement with these efforts and participation in events will help shape a more inclusive and collaborative future for OpenSearch.

OpenSearch aims to remove the divide between search engineering and data science by building tools and experiences that are familiar and productive for both communities, Pugh said. “These efforts are not merely technical upgrades — they represent a cultural shift toward integration, collaboration, and accessibility,” Pugh said.

By embracing Python, adopting notebook-based workflows, integrating user behavior analytics, and providing powerful tools like the search relevance workbench and hybrid optimizer, OpenSearch is positioning itself as not only a high-performance search engine but also as a modern platform for innovation at the intersection of data science and search, Pugh said.

The upcoming 3.x line represents a new chapter for OpenSearch: one focused on user intent, relevance, trust, and intelligent interaction, Pugh said. “Through ongoing community collaboration, thoughtful feature development, and inclusive design, OpenSearch is poised to enable the next generation of search-driven applications,” Pugh said.

The Big 3.0

A number of new features were described for OpenSearch 3.0, aimed at improving not only search results but also analytics, observability and security. Dagney Braun, principal product manager at Amazon Web Services (AWS), outlined the features the OpenSearch contributors are preparing for the 3.0 release during the keynote. According to Braun, the beta release has confirmed performance improvements of 20% on aggregate across high-impact operations. Compared to OpenSearch 1.3, OpenSearch 3.0 is testing more than 9.5 times faster across key query types, Braun said.

With the introduction of GPU-accelerated vector search for vector-powered applications, “users can now deploy GPUs for significant performance gains” on data-intensive workloads like vector search and generative AI, Braun said. Benefits include up to 9.3 times faster index builds, two times higher throughput and a more than threefold reduction in cost, Braun said.
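The GPU acceleration itself is a cluster-side capability, so the sketch below shows only the standard k-NN index and query that would benefit from faster index builds. The index name, vector dimension and engine choice are assumptions, and no GPU-specific configuration is shown.

```python
# Sketch: a standard vector index and k-NN query. GPU acceleration happens on
# the cluster side when it builds these index structures; the API stays the same.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Create a vector index. The 384-dimension Faiss HNSW setup is an assumption.
client.indices.create(
    index="doc-embeddings",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 384,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                }
            }
        },
    },
)

# Query with a placeholder 384-dimensional vector for the 5 nearest neighbors.
results = client.search(
    index="doc-embeddings",
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": [0.1] * 384, "k": 5}}},
    },
)
```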

“One of the biggest evolutions in OpenSearch 3.0 is how it ingests, transports, and manages data,” Braun said. “There are dozens of new features contributing to this shift.”

As Braun described, gRPC is an open source framework for remote procedure calls. It introduces a new approach to data transport in OpenSearch — between clients and servers, and node-to-node. “With support for the Protobuf cross-platform data format, gRPC enables faster and more efficient data transport and processing,” Braun said.

Pull-based ingestion gives OpenSearch more control over data flow and when data is retrieved, Braun said. In practical terms, it decouples data sources (like applications generating index operations) from data consumers (such as the OpenSearch server), Braun said.

For observability, 3.0’s support for Apache Calcite as a new query engine “brings greater flexibility and performance for SQL and PPL queries,” Braun said.
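Calcite is an internal query-engine change, so the PPL surface that users see stays the same. As a rough example, assuming the SQL/PPL plugin is installed and a hypothetical web-logs index exists, a PPL query can be submitted like this:

```python
# Sketch: run a PPL query through the SQL/PPL plugin endpoint. The index and
# field names are hypothetical.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Count HTTP 500s per host, sorted by count descending.
ppl = "source=web-logs | where status = 500 | stats count() as cnt by host | sort - cnt"

response = client.transport.perform_request("POST", "/_plugins/_ppl", body={"query": ppl})
print(response)
```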

Can’t We Just Get Along?

OpenSearch was forked from Elasticsearch 7.10.2 (released in January 2021), along with Kibana 7.10.2 for dashboards, following Elastic’s change to its license. The project was created to support lexical and AI-generated search use cases, and it also includes log analytics capabilities for observability and security analytics, along with features for anomaly detection, alerting and other use cases, according to the project’s documentation.

OpenSearch’s genesis was not without controversy. Following MongoDB’s license change to SSPL in 2018, Elastic opted to make much of its Elasticsearch and Kibana code proprietary starting with version 7.11 (released in February 2021) and launched an enterprise version. In response, Amazon forked Elasticsearch and Kibana at version 7.10.2, prompting criticism from Elastic. Some observers said AWS’ move was necessary to counter Elastic’s more restrictive commercial licensing model.

AWS introduced the OpenSearch project, which was a fork of Elasticsearch and Kibana 7.10.2 (announced in April 2021), and made it available under the liberal Apache 2.0 license.

AWS is legally able to take the code built from years of devotion to open source projects, rebrand the open source tools, and offer paid services to support and manage the code (depending on the licenses). However, some observers say the cloud giant risks being perceived as betraying its customers and contributors.

Elastic had announced the license change in January 2021: Elasticsearch and Kibana would move away from Apache 2.0 starting with version 7.11, offering users a choice between the more restrictive Server Side Public License (SSPL) and the Elastic License. That change was a key factor motivating the OpenSearch fork, and it sparked ongoing debate among those who favor OpenSearch’s continued use of the less-restrictive Apache 2.0 license.

Notwithstanding the licensing controversy, both Elastic and AWS remain major open source contributors in the community. Elasticsearch and OpenSearch have also set up what is likely to be formidable competition between the two projects, which is rarely a bad thing for the user community in this dynamic convergence of search, observability and security functionality.
