US12505115B2 - Search query refinement using generated keyword triggers - Google Patents
- Publication number: US12505115B2 (application US17/889,326)
- Authority: US (United States)
- Prior art keywords: search, triggers, content, search query, fields
- Legal status: Active, expires
- Note: the legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.
Classifications
- G06F21/6227: Protecting access to data via a platform (e.g. using keys or access control rules) to a system of files or objects (e.g. a local or distributed file system or database), where protection concerns the structure of data, e.g. records, types, queries
- G06F16/2471: Distributed queries (under G06F16/2458, Special types of queries)
- G06F16/24565: Triggers; Constraints (under G06F16/24564, Applying rules; Deductive queries)
- G06F16/24578: Query processing with adaptation to user needs using ranking
Definitions
- the present technology relates generally to searching in computing environments and, more particularly, but not by limitation, to systems and methods for automatic search query refinement.
- Conventional search systems typically use all words in a search query as keywords for searching data sources. If a user wants to select additional or modified parameters for the search, as is often the case, the user is generally limited to selecting specific filters or options, if available, provided via a user interface.
- conventional search systems tend to ignore contextual cues within a search query. Rather, these conventional systems are ordinarily context-agnostic and instead place most or all search terms on equal footing as prospective keywords. As a result, these conventional search features often return irrelevant, excessive, and even nonsensical results. These approaches can adversely affect the quality of the search result and can require additional computational resources to accommodate subsequent trial-and-error searches by the user, resulting in wasted time from the user's perspective.
- a method for search query refinement may include identifying a plurality of electronic sources of data content of an entity stored at different network-accessible locations. Fields may be dynamically assigned to the content. A unified search interface can thereupon be provided to authorized users to search the content. A search query may be received from one of the users via the search interface. The search query may be parsed to identify different triggers using rules configurable by the entity for structuring the search to match a likely user intent. The triggers may be correlated with relevant ones of the fields to obtain search results.
- a system for search query refinement may include a memory. At least one processor may be coupled to the memory. The at least one processor may be configured to identify a plurality of electronic sources of data content of an entity stored at different network-accessible locations. The at least one processor may dynamically assign fields to the content. Thereupon, the at least one processor may provide a unified search interface to authorized user devices to search the content.
- a search query may be received by the at least one processor from one of the user devices via the search interface.
- the at least one processor may parse the search query to identify different triggers using rules configurable by the entity for structuring the search to match a likely user intent.
- the at least one processor may correlate the triggers with relevant ones of the fields to obtain search results.
- a non-transitory computer-readable medium includes code that, when executed by at least one processor, causes the processor to identify a plurality of electronic sources of data content of an entity stored at different network-accessible locations, dynamically assign fields to the content, provide a unified search interface to authorized user devices to search the content, receive a search query from one of the user devices via the search interface, parse the search query to identify different triggers using rules configurable by the entity for structuring the search to match a likely user intent, and correlate the triggers with relevant ones of the fields to obtain search results.
- the triggers may include search triggers to be used for searching content, filter triggers to be applied for filtering search results, and structural triggers to be used for ranking the search results.
- the filter triggers may include one or more of the following: a content type, a document type, a document author, a topic, and so forth.
- the structural triggers may include one or more of the following: a creation date, a modification date, a last opening date, a size of a document, and so forth.
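The three trigger kinds above can be modeled as a small data type. The following is a minimal sketch, not the patent's implementation: `TriggerKind`, `Trigger`, `is_valid`, and the field names such as `created_at` and `size_bytes` are hypothetical. It shows how each trigger is correlated with a field suited to its kind.

```python
from dataclasses import dataclass
from enum import Enum


class TriggerKind(Enum):
    SEARCH = "search"          # terms matched against indexed content
    FILTER = "filter"          # terms that narrow the result set
    STRUCTURAL = "structural"  # terms that rank or sort the result set


@dataclass(frozen=True)
class Trigger:
    kind: TriggerKind
    text: str        # the words of the query that produced the trigger
    field: str       # the indexed field the trigger is correlated with
    value: str = ""  # optional value to match, e.g. an author name


# Hypothetical field names for the filter and structural examples listed above.
FILTER_FIELDS = {"content_type", "document_type", "author", "topic"}
STRUCTURAL_FIELDS = {"created_at", "modified_at", "last_opened_at", "size_bytes"}


def is_valid(trigger: Trigger) -> bool:
    """Check that a trigger is correlated with a field suited to its kind."""
    if trigger.kind is TriggerKind.FILTER:
        return trigger.field in FILTER_FIELDS
    if trigger.kind is TriggerKind.STRUCTURAL:
        return trigger.field in STRUCTURAL_FIELDS
    return True  # search triggers may match any text field
```

For example, `Trigger(TriggerKind.FILTER, "created by Jane", "author", "Jane")` is valid. A structural trigger pointed at `author` is not.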
- the system may enable the user to edit one or more of the triggers.
- the system may further highlight words of the search query to indicate one or more of the search triggers, the filter triggers, and the structural triggers to the user.
- the system may also provide, via an application programming interface, an annotation associated with the search query.
- the annotation may indicate terms of the search query to be used as the triggers.
- the system may further enable the user to create one or more of the predetermined rules for determining the triggers.
- the predetermined rules may include one or more of the following: static rules, rules based on a schema of content indexed, rules based on values in data fields of the content, rules based on the content, and so forth.
- the search engine may search the content based on the triggers to retrieve the search results and process the search results based on the triggers.
- the display module may display the processed search results to the user.
- FIG. 1 is a high-level schematic diagram of a computing architecture for practicing aspects of the present technology, according to example embodiments.
- FIG. 2 is a schematic diagram showing automatic search query refinement by a system for automatic search query refinement, according to an example embodiment.
- FIG. 3 illustrates a user interface showing an automatic search query refinement process, according to an example embodiment.
- FIG. 4 is a schematic diagram of a computing system that is used to implement embodiments according to the present technology.
- FIG. 5 is an example flow diagram of an automatic search query refinement process, in accordance with an embodiment.
- FIG. 6 is another example flow diagram of an automatic search query refinement process, in accordance with an embodiment.
- a component may include, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
- both an application running on a computing device and the computing device itself can be a component, system, module, etc.
- One or more components can reside within a system, process and/or thread of execution and a component or system can be localized on one computer and/or distributed between two or more computers.
- these components and systems can execute from various computer-readable media having a plurality of different data structures stored thereon.
- the components can communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets, such as data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal.
- Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
- the present disclosure is directed to various embodiments of systems and methods for automatic search query refinement.
- the system of the present disclosure may consolidate a plurality of electronic sources of data content accessible to or belonging to an entity.
- the system may manipulate the data content stored, or being stored on the one or more electronic sources to add fields, revise or delete existing fields, or update fields that can thereafter be used by an authorized user of the entity to run a centralized search query of all the different electronic sources from a single unified search interface, e.g., at the laptop, computer, workstation, smart phone, or other transportable or mobile device of the user, as is dependent upon the system and the controlling entity.
- the system of the present disclosure dynamically leverages the schema, values and fields associated with content in data sources to analyze search query inputs and automatically refine the underlying structure of the search query to match intent of a user that provided the search query.
- a system administrator may first use an initial user interface provided by the system to connect all of an organization's enterprise sources, which may include the organization's e-mail server(s), any internal/proprietary document storage system used by the organization, Google Drive, Dropbox, Microsoft OneDrive, and the like. After the administrator specifies these different sources, the system described herein extracts all the content from all of these sources and indexes the content in a manner further described below.
- the system may extract this information on an ongoing basis, as the organization and its employees continue to revise documents, store data, send e-mail, etc.
- the system may then provide a single “search box” to each employee or other member of the organization, generally referred to herein as a user.
- the user can then enter a topic of interest in the search box.
- the system can thereupon search all the different enterprise sources for matching topics, and can provide the list to the server.
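The consolidation described above, where content is extracted from many sources into one index and then searched from a single box, can be sketched as an in-memory index. This is a sketch under simplifying assumptions: `Record`, `UnifiedIndex`, and the source names `"email"` and `"drive"` are hypothetical, and plain substring matching stands in for a real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source: str    # which connected source the record came from
    title: str
    body: str
    fields: dict = field(default_factory=dict)  # any remaining metadata


class UnifiedIndex:
    """One in-memory index consolidating content extracted from several sources."""

    def __init__(self) -> None:
        self.records = []

    def ingest(self, source_name: str, items: list) -> None:
        # Extract each item from a source and store it in the shared index.
        for item in items:
            extra = {k: v for k, v in item.items() if k not in ("title", "body")}
            self.records.append(
                Record(source_name, item.get("title", ""), item.get("body", ""), extra)
            )

    def search(self, topic: str) -> list:
        # A single query from the search box runs against every source at once.
        needle = topic.lower()
        return [r for r in self.records
                if needle in r.title.lower() or needle in r.body.lower()]
```

Ingesting `[{"title": "Q3 budget", "body": "Numbers attached."}]` from `"email"` and `[{"title": "Budget model", "body": "", "author": "Jane"}]` from `"drive"` makes both records reachable with one `search("budget")` call.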
- the search queries may be highly structured. If the search query is “design documents created by Jane,” in lieu of treating all terms of the search query as keywords and then searching the content for some combination of those keywords, the system can use predetermined rules to identify so-called triggers. For example, the system may use various rules to categorize the word “document” in the query. Rather than treating it as a simple keyword to be searched for in combination with other keywords, the system can recognize that the word “document” is really intended to mean a class of search results that may include e-mails, memoranda, word processing documents, etc. The rules for establishing which terms are triggers, such as “documents,” may be configured by the operator of the system.
- the operator may elect to include a plurality of different files having specific extensions (e.g., .doc, .pdf, etc.) as corresponding to the trigger word “documents.”
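An operator-configured rule that maps the trigger word “documents” to file extensions can be held as plain data. This lets it change at any time without code changes. The following is a minimal sketch, assuming the hypothetical names `CLASS_RULES`, `normalize`, and `matches_class`, and a naive singular/plural normalization.

```python
import os

# Hypothetical operator-configured rules: trigger word -> extensions it stands for.
# Because the rules are plain data, the operator can edit them at any time.
CLASS_RULES: dict[str, set[str]] = {
    "documents": {".doc", ".docx", ".pdf"},
    "spreadsheets": {".xls", ".xlsx", ".csv"},
}


def normalize(word: str) -> str:
    """Treat "document" and "documents" as the same trigger word (naive plural)."""
    word = word.lower()
    return word if word.endswith("s") else word + "s"


def matches_class(trigger_word: str, filename: str) -> bool:
    """Return True if the file's extension falls within the class the trigger names."""
    extensions = CLASS_RULES.get(normalize(trigger_word))
    if extensions is None:
        return False  # not a configured trigger word
    _, ext = os.path.splitext(filename)
    return ext.lower() in extensions
```

Adding `".txt"` to `CLASS_RULES["documents"]` immediately changes which files count as documents. This mirrors the dynamically specified rules described next.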
- Rules for recognizing triggers may be dynamically specified, meaning that the system operator can change the rules for structuring data in search queries at any time.
- the system automatically parses the terms entered by the user to assess the search structure. Instead of treating all terms as keywords, the system parses the search query and extracts the terms that it determines are structural to the query.
- the system automatically builds a structured search in the background. The system also factors the terms determined to be structural out of the keyword portion of the search.
- the example word “document” can be treated as one such trigger and can be removed from the other terms that are determined to be treated as keywords.
- the word “document” is the filter trigger that can be used to refer to the various relevant file formats as described above.
- the term “created by Jane” can be used as a filter trigger to filter only those documents whose author is Jane.
- when parsing a query that contains the word “recent,” the system may use the predetermined rules to identify “recent” not as a keyword, but as a sorting structural trigger that places a temporal limitation on the search.
- the system may remove the word “recent” from the other keyword text, and then automatically sort the search results so that the newest documents appear first.
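The parsing steps above can be sketched as a rule-driven parser. It pulls out the phrase “created by Jane” as a filter, treats “documents” as a class filter and “recent” as a sort directive, and leaves “design” as the only keyword. This is a sketch with hypothetical rule tables (`CLASS_WORDS`, `SORT_WORDS`, `AUTHOR_PHRASE`) and field names. It is not the patent's parser, which draws its rules from the entity and the indexed content.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for the entity-configured rules.
CLASS_WORDS = {"document": "document", "documents": "document",
               "e-mail": "email", "e-mails": "email"}
SORT_WORDS = {"recent": ("created_at", "desc"), "newest": ("created_at", "desc"),
              "oldest": ("created_at", "asc")}
AUTHOR_PHRASE = re.compile(r"\bcreated by (\w+)", re.IGNORECASE)


@dataclass
class StructuredQuery:
    keywords: list = field(default_factory=list)  # search triggers
    filters: dict = field(default_factory=dict)   # filter triggers: field -> value
    sort: Optional[tuple] = None                  # structural trigger: (field, order)


def parse(query: str) -> StructuredQuery:
    result = StructuredQuery()
    # 1. Extract phrase-level filter triggers first and cut them out of the text.
    match = AUTHOR_PHRASE.search(query)
    if match:
        result.filters["author"] = match.group(1)
        query = query[:match.start()] + query[match.end():]
    # 2. Classify the remaining words; anything not matched by a rule stays a keyword.
    for word in query.lower().split():
        if word in CLASS_WORDS:
            result.filters["content_class"] = CLASS_WORDS[word]
        elif word in SORT_WORDS:
            result.sort = SORT_WORDS[word]
        else:
            result.keywords.append(word)
    return result
```

`parse("recent design documents created by Jane")` yields the keyword list `["design"]`, the filters `author=Jane` and `content_class=document`, and a descending sort on `created_at`.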
- the system described herein can automatically create a structured search, which in turn can greatly simplify searches by users across many different sources at once.
- the system of the present disclosure allows end users to type a natural language search query into a search box and then translates the natural language search query into a structured query based at least in part on predetermined rules dynamically provided by the entity, an agent thereof, or in appropriate contexts, a user (e.g., an employee of the entity).
- the rules can be dynamically provided, as noted, in that they can be changed by a user to change the way the system parses queries.
- the searches themselves can be also performed dynamically in that the search engine can adapt automatically based on the content stored in the system.
- the schema of the content in the system helps the system determine the different fields that can be parsed out of a query during the structuring of the query by the system.
- the system provides two modes of functionality.
- the first mode is when the system is inputting content from one or more sources and making that content “searchable.” This procedure can include updating content periodically over time, or updating the content on the fly as it is revised, or as new content is input into the system.
- the second mode is the end-user performing the actual search.
- the new content that is being extracted from the different electronic sources has a schema.
- a schema is a representation in the form of an outline. For example, an e-mail has informational attributes such as a subject, an author, recipients, CC and BCC recipients, a body, one or more attachments, etc. Together, these attributes represent the schema of the document, and their values are ultimately populated into fields.
- the system may automatically search these documents to determine their schema—namely, what additional fields those documents may include. Whatever such schema is identified can be used by the system to automatically change the way queries are parsed.
- for a search query such as “e-mails with attachments,” the system will have already established the schema of e-mail in the first mode.
- the system may use “e-mails” and “attachments” as triggers and may then search for all actual e-mails that have a non-empty attachment field, flag, etc.
- This simple example of automatically identifying the schema of an e-mail by identifying its fields and later using this information to parse search queries can be extended to more complex documents and data structures.
- the system can automatically create fields for such documents, so that when a search is subsequently run, the system can use the schema (e.g., the outline of fields) to correlate the triggers with the fields.
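The e-mail example above can be sketched as two steps. First, the schema is inferred from the fields actually present in the ingested records. Second, a trigger such as “attachments” is correlated with a field only if that field exists in the schema. This is a minimal sketch. `infer_schema`, `with_field`, and `TRIGGER_TO_FIELD` are hypothetical names, and the schema here records only field names and Python value types.

```python
from collections import defaultdict

# Hypothetical rule: which schema field a trigger word is correlated with.
TRIGGER_TO_FIELD = {"attachment": "attachments", "attachments": "attachments"}


def infer_schema(records: list) -> dict:
    """Map each field name found in the records to the type names of its values."""
    schema = defaultdict(set)
    for record in records:
        for name, value in record.items():
            schema[name].add(type(value).__name__)
    return dict(schema)


def with_field(records: list, trigger: str, schema: dict) -> list:
    """Return records whose field for this trigger is non-empty."""
    field_name = TRIGGER_TO_FIELD.get(trigger)
    if field_name is None or field_name not in schema:
        return []  # the trigger is not correlated with any field in this content
    return [r for r in records if r.get(field_name)]
```

Given two e-mails, one with `["spec.pdf"]` and one with an empty `attachments` list, only the first is returned for the trigger “attachments”.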
- certain cloud-based networks have a sharing function.
- the sharing function is part of the schema of documents sourced from that cloud source.
- the system can automatically identify documents that are being shared.
- the system can thus automatically create the schema of the document to include a “shared” field.
- the system may also automatically create other related fields, such as the name of the individual sharing the document.
- the parsing of the query can be modified, for example, when the search query states “all documents shared by Mike.”
- the search can be structured to identify all documents that have a sharing function that is currently enabled and that list “Mike” as the sharer of the documents.
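The sharing example can be sketched as an ingestion step that derives `shared` and `shared_by` fields, followed by a filter that answers “all documents shared by Mike.” The shape of the cloud source's sharing payload (`{"enabled": ..., "owner": ...}`) and the function names are assumptions for illustration only. Real cloud APIs expose sharing metadata in their own formats.

```python
def add_sharing_fields(doc: dict) -> dict:
    """Derive 'shared' and 'shared_by' fields from a hypothetical sharing payload."""
    sharing = doc.get("sharing") or {}
    enriched = dict(doc)
    enriched["shared"] = bool(sharing.get("enabled"))
    enriched["shared_by"] = sharing.get("owner", "") if enriched["shared"] else ""
    return enriched


def shared_by(docs: list, person: str) -> list:
    """Structured form of a query such as 'all documents shared by Mike'."""
    return [d for d in docs
            if d["shared"] and d["shared_by"].lower() == person.lower()]
```

A document whose sharing is disabled is excluded even if Mike owns it. This matches the requirement that the sharing function be currently enabled.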
- the dynamic nature of this automatic schema-building is yet another benefit of the flexibility of the centralized search system according to some embodiments.
- the principles herein can be employed in the context of a single person, or an organization.
- entity shall broadly refer to any organization, such as a corporation or governmental entity, or other community of users of any kind that may acquire and/or use the systems and methods for search refinement as described herein.
- the entity may include different users.
- the entity may provide one or more criteria or rules used for adding and revising fields, or for manipulating triggers at the search stage.
- information provided by an agent of the entity, such as an employee, an authorized information technology (IT) technician, or an authorized third-party contractor, is treated as provided by the entity.
- a network may in appropriate circumstances refer to a plurality of networks that may be mutually accessible.
- the different electronic sources may reside on different networks, while being accessible to network users (e.g., employees, executives, partners or other members) of the entity.
- a single, unified search dialog box on a graphical user interface, or a combination of linked input entries on the unified user interface may be used to broadly conduct a search across a plurality of disparate electronic sources.
- examples of electronic sources include e-mail servers (e.g., Gmail), various cloud network locations (Google, iCloud, Microsoft's OneDrive, and the like), a database internal to the company that includes proprietary content, and other network locations or URLs, e.g., ones that may include information about the entity's customers.
- the user may enter a topic once, and the utility may conduct a search across all consolidated electronic sources for any search results relevant to the topic input as the search query, as described in more detail herein.
- the system can process the data content in the plurality of specified electronic sources on an ongoing basis.
- the system can start by indexing the data as it is stored, and as it continues to be updated, revised, and populated with new data over time. This indexing may include adding, updating, revising, and deleting fields, and creating new fields that may include not only typical fields such as dates and numbers, but also fields that correspond to the content of the data itself.
- the search query can be analyzed by dissecting it to determine the triggers it contains.
- the keyword “documents” in the search query can be determined to be a filter trigger that is applied to the search results to match the user's intent.
- when this filter trigger is applied, any record that is a document (e.g., a PDF, a .docx file, or a Google Document) matches the filter.
- the keywords “created by Jane” in the search query may be determined to be a filter trigger.
- when the filter trigger is found, any document that was created by a person with the first name Jane is filtered into the search results.
- a filter trigger generally refers to a word or phrase that can be used to exclude otherwise responsive documents from the scope of the search.
- a structural trigger may be used for ranking or sorting the search results.
- “documents” may be a filter trigger in that any items that do not fall within the scope of a “document” as per the established rules provided by or to the system may be filtered out of the search.
- Examples of filter triggers may include one or more of the following: a content type, a document type, a document author, a topic, and so forth.
- the structural triggers may include one or more of the following: a creation date, a modification date, a last opening date, a size of a document, and so forth. For structural triggers, for example, larger documents may come up earlier in the search results, while smaller documents come up later.
- a structural trigger may also act as a filter trigger when only documents of a specific size are relevant and documents of other sizes are excluded.
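The dual role of a structural trigger can be sketched in two functions. It ranks results by a field such as document size or creation date, and the same field can also act as a filter. This is a sketch. `SORT_RULES`, `rank`, `size_filter`, and the field names are hypothetical, and dates are assumed to be ISO 8601 strings, which sort correctly as text.

```python
from typing import Optional

# Hypothetical structural-trigger rules: trigger word -> (field, sort descending?).
SORT_RULES = {
    "largest": ("size_bytes", True),
    "smallest": ("size_bytes", False),
    "recent": ("created_at", True),  # ISO 8601 date strings sort chronologically
}


def rank(results: list, trigger: str) -> list:
    """Order results by the field a structural trigger names."""
    field_name, descending = SORT_RULES[trigger]
    return sorted(results, key=lambda r: r[field_name], reverse=descending)


def size_filter(results: list, min_bytes: int = 0,
                max_bytes: Optional[int] = None) -> list:
    """Reuse the size field as a filter when only certain sizes are relevant."""
    return [r for r in results
            if r["size_bytes"] >= min_bytes
            and (max_bytes is None or r["size_bytes"] <= max_bytes)]
```

Ranking keeps every result and only changes the order. The filter drops results outside the size range, which is the case where the structural trigger behaves like a filter trigger.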
- the keywords “created by” may be associated with a created action.
- Documents stored in the data sources may have fields that are associated with the created action (e.g., a field storing a name of a person that created the document). Therefore, when the system determines that the search query includes the “created by Jane” keywords, the system can search for any content that was created by any user named “Jane” in an organization.
- the structured nature of the search query enables the system to properly correlate the filter and structural triggers with the appropriate fields in a manner that is most relevant to the intended search of the user.
- the keyword “design” in the search query may be designated as a keyword for searching the content.
- the content of the data itself can be used as one of the fields for matching a keyword.
- when the keyword is found, any documents created by any user named “Jane” and having the keyword “design” are included in the search results. Therefore, the ultimate structured search query is to search for documents of any type that have the term “design” in the title or the document content and that were created by a user named “Jane.”
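The final structured query above combines two filter triggers and one search trigger. It can be evaluated as shown below. This is a sketch over in-memory records. `DOCUMENT_TYPES`, the `type`/`author`/`title`/`content` keys, and `run_structured_query` are hypothetical, and the first-name match follows the “any user named Jane” wording.

```python
# Hypothetical set of record types the rules count as "documents".
DOCUMENT_TYPES = {"pdf", "docx", "gdoc"}


def run_structured_query(records: list, keyword: str, author_first_name: str) -> list:
    """Evaluate the structured form of "design documents created by Jane"."""
    keyword = keyword.lower()
    hits = []
    for r in records:
        if r.get("type") not in DOCUMENT_TYPES:
            continue  # filter trigger "documents": keep any document type
        names = r.get("author", "").split()
        if not names or names[0].lower() != author_first_name.lower():
            continue  # filter trigger "created by Jane": match the first name
        # search trigger "design": match in the title or in the content itself
        if keyword in r.get("title", "").lower() or keyword in r.get("content", "").lower():
            hits.append(r)
    return hits
```

An e-mail titled “Design chat” by Jane is excluded by the type filter. A Google document by Jane that mentions “design” only in its body is still returned, because the content itself is one of the matched fields.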
- the predetermined rules specified by the entity and used by the system to identify relevant triggers and contextual cues may include one or more of the following: static rules, rules based on a schema of content indexed, rules based on values in data fields of the content, rules based on the content, and so forth.
- the rules for identifying triggers are dynamic in that they are changeable in real-time.
- the rules may originate from the content stored in the data sources. Specifically, the rules can be dynamically changed based on the type of content added to the data sources, configurable criteria used in the indexing of the content and specified by the entity using the system (or by the system itself), files recently accessed by users, additional data sources added, new users joining an organization, and so forth.
- the rules for filtering the content can be updated in real time to respond to the search query efficiently, based on the content being added to the system. Therefore, as the organization is creating more content around different topics with different structures, all of the content from the topics is fed to a search engine of the system automatically and the search queries are handled properly based on the evolution of the content and topics in the organization. Further refinement of the added content can be specified by the entity or user of the system via different configurable criteria.
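A rule “based on values in data fields” can be sketched as a rule set rebuilt from ingested content. A word after “by” becomes a person trigger only if that name has been seen in a person-valued field. New content therefore changes how the next query is parsed. The class `ValueRules` and the field names in `PERSON_FIELDS` are hypothetical.

```python
from typing import Optional

# Hypothetical person-valued fields whose values feed the rules.
PERSON_FIELDS = ("author", "shared_by")


class ValueRules:
    """Trigger rules rebuilt from the values present in the indexed content."""

    def __init__(self) -> None:
        self.known_people = set()

    def ingest(self, record: dict) -> None:
        # Each newly ingested record can extend the rules immediately.
        for name_field in PERSON_FIELDS:
            parts = (record.get(name_field) or "").split()
            if parts:
                self.known_people.add(parts[0].lower())

    def person_trigger(self, query: str) -> Optional[str]:
        """Return the word after "by" if it names a person seen in the content."""
        words = query.lower().split()
        for prev, word in zip(words, words[1:]):
            if prev == "by" and word in self.known_people:
                return word
        return None
```

Before any document authored by Jane is ingested, “created by Jane” yields no person trigger. After ingestion, the same query does. “sorted by date” never does, because “date” is not a known person.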
- the processing of the search query performed by the system may include disambiguation between dates and names. For example, keywords “created on” may be determined to relate to date-related fields, and keywords “created by” may be determined to relate to person-related fields. In the person-related fields, values of the fields may be identifiers for a person.
- the user can create different contextual words and designate these words as triggers. For example, the user can set a qualifying term: the term “on” after an action can be qualified as indicating a date-related field, and the term “by” after an action can be qualified as indicating a person-related field. Moreover, the user can set a rule such that if an action is followed by “by,” the system searches person-related fields, but not date-related fields, in the content. Furthermore, the user can set specific keywords to determine the context of a specific language.
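The “created on” versus “created by” disambiguation can be sketched as a table of user-configurable qualifiers. Each qualifier decides whether the value after an action is matched against date-related or person-related fields. This is a sketch. `QUALIFIERS`, `ACTIONS`, and `classify` are hypothetical, and dates are assumed to be written as `YYYY-MM-DD`.

```python
import re

# Hypothetical user-configurable qualifiers: the word after an action picks the field class.
QUALIFIERS = {"on": "date", "before": "date", "after": "date", "by": "person"}
ACTIONS = {"created", "modified", "shared"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed ISO-style dates


def classify(query: str) -> list:
    """Return (action, field_class, value) for each action + qualifier + value phrase."""
    words = query.split()
    found = []
    for i in range(len(words) - 2):
        action, qualifier, value = words[i].lower(), words[i + 1].lower(), words[i + 2]
        if action in ACTIONS and qualifier in QUALIFIERS:
            field_class = QUALIFIERS[qualifier]
            if field_class == "date" and not DATE_RE.match(value):
                continue  # a date qualifier must be followed by a date-like value
            found.append((action, field_class, value))
    return found
```

With this table, “created by Jane” is routed only to person-related fields, and “created on 2022-08-15” only to date-related fields.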
- the user may be allowed to create custom block lists.
- the block lists are lists of words that should never be considered trigger words (e.g., “it,” “in,” “pages,” “string,” and so forth).
- the system can produce some false positives.
- for example, some structure of the content that gets indexed can introduce new rules to a search model, which may cause the search model to behave in a way the user does not intend.
- the user can affirmatively configure the search engine to not consider specific terms or not consider specific terms in a particular way.
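A block list can be sketched as a final pass that drops candidate triggers. Suppose indexing some source caused “pages” to be learned as a trigger word. The default block list, which uses the example words given above, suppresses it, and a user can block further words such as “recent.” `DEFAULT_BLOCK_LIST` and `candidate_triggers` are hypothetical names.

```python
# Default block list built from the example words above; users can extend it.
DEFAULT_BLOCK_LIST = frozenset({"it", "in", "pages", "string"})


def candidate_triggers(query: str, trigger_words: set, extra_blocked=()) -> list:
    """Keep words the rules would treat as triggers unless they are blocked."""
    blocked = DEFAULT_BLOCK_LIST | {w.lower() for w in extra_blocked}
    return [w for w in query.lower().split()
            if w in trigger_words and w not in blocked]
```

With learned triggers `{"pages", "documents", "recent"}`, the query “recent pages about documents” keeps only `recent` and `documents`. Passing `["Recent"]` as a user block also removes `recent`.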
- the problem solved by the system is translating a search query represented in human language (e.g., “design documents created by Jane”) into a meaningful structured query including one or more triggers that can be used to attempt to most closely match the search intent where the triggers are correlated with fields created during the storage of data content in different electronic sources.
- the data content across a plurality of specified electronic sources is initially analyzed and dissected when it is first stored. Based on these analyses and rules predetermined by the operator or organization, also referenced as the entity, the system can dynamically create a plurality of relevant fields.
- the search engine of the system provides users with a unified interface to conduct searches for data content across the plurality of electronic sources.
- the system analyzes a user-input search query, automatically dissects the search query, and applies refinements to the search query to match user intent determined based on the analysis of the search query.
- the search rules provided by the search engine are dynamic in part because they are configurable based on the data in the data sources and can also be configured directly by the operator of the system or entity.
- the schema of records and values of fields in the data sources are initially used to build the rules for searching and processing (e.g., filtering, ranking) the search results. Using this procedure, the system can obtain a more relevant set of documents from the varied data content, and obtain it faster, than in the conventional case where the search terms are mere keywords for comparison with a vast body of generic text.
- the system can build a set of rules dynamically from the content of the data sources as that content is ingested into the search engine. Every time a search query is input by a user, the text of the search query is analyzed to determine the intent of the user. The output of this analysis is multipart. Specifically, the system decides whether search results should be filtered (e.g., “created by Jane” can become a filter trigger for fields that denote creation personas using the name “Jane”). The system further determines which terms are actual text queries and not structural (e.g., “design”). The system further determines which terms are structural triggers: terms that denote structural intent (such as “most recent”) become a “sort trigger” that sorts the search results by “creation date.”
- the rules may include determining the trigger “most recent” as a sorting directive. Therefore, the search results may be sorted by showing the most recent content first.
- the system may highlight or annotate the ultimate search query to indicate, to the user, how the search query was analyzed by the system. Specifically, when the user types in words in a search box, the system may highlight words determined by the system as the triggers. Therefore, if the words triggered a refinement in the search query, the words are highlighted, and the user is able to see how the search query was processed in the structured search query.
- the system may provide, via an application programming interface, an annotation to the search query to indicate terms of the search query used as the triggers. This enables a developer to tune a custom user interface to reflect how the system processed the search query.
- the system may further change the user interface to look as though the user selected specific filters that were found in the search query by the system.
- the changes made to the user interface act as a feedback loop provided to the user via the application programming interface and in the user interface to show how the system analyzes the search query.
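The highlighting and annotation feedback described above can be sketched as an annotation list with character offsets for each query word. A developer's custom interface could consume such a list, and a simple renderer can mark the triggers. This is a sketch. `Annotation`, `annotate`, `highlight`, the bracket rendering, and the rule dictionary format are hypothetical and do not represent the system's actual application programming interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset of the word in the original query
    end: int
    text: str
    kind: str    # "search", "filter", or "structural"


def annotate(query: str, rules: dict) -> list:
    """Label every word; words not named by the rules default to search terms."""
    return [Annotation(m.start(), m.end(), m.group(), rules.get(m.group().lower(), "search"))
            for m in re.finditer(r"\S+", query)]


def highlight(query: str, annotations: list) -> str:
    """Render non-search triggers in [brackets] so the user sees how the query was read."""
    out, last = [], 0
    for a in annotations:
        out.append(query[last:a.start])
        out.append(f"[{a.text}]" if a.kind != "search" else a.text)
        last = a.end
    out.append(query[last:])
    return "".join(out)
```

For the rules `{"documents": "filter", "recent": "structural"}`, the query “recent design documents” renders as `[recent] design [documents]`.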
- the system can use dynamic rules based on content, special concepts for classes of fields and terms identifying people, custom context triggers, custom block lists for false positives, and user interface elements to provide user feedback on the analysis of the search query.
- the system also provides developer application programming interfaces to enable automatic refinements to the search query. Therefore, the system not only processes the search query, but also provides feedback to the user to show how the search query was processed by the system. The user can in turn modify the search or a trigger to change how the search query is parsed by the system or to match the intent of the search more closely.
- advantages provided by the system include creating rules for searching and processing the search results that dynamically inherit rule structure from the content in the data sources.
- the system also covers key usability concerns from real-world scenarios, such as the ability to configure required contextual terms, static rules, and disallowing false positive terms.
- FIG. 1 is a high-level schematic diagram of an exemplary computing architecture (hereinafter referred to as architecture 100 ) of a computing environment of the present technology.
- the architecture 100 includes an exemplary system for automatic search query refinement shown as a system 105 .
- the system 105 includes a server or cloud-based computing device configured specifically to perform the analysis described herein. That is, the system 105 , in some embodiments, is a specific purpose computing device that is specifically designed and programmed (e.g., configured or adapted) to perform any of the methods described herein.
- the system 105 can also include a plurality of distributed computing systems that cooperatively provide the features of the system 105 . For example, individual ones of the plurality of distributed computing systems can provide one or more unique functions or services.
- the system 105 can comprise a cloud computing environment or other similar networked computing system.
- the system 105 can be communicatively coupled, via a network 150 , with one or more content sources 110 .
- a content source 110 can include, for example, a computing system, an enterprise network, a plurality of computing systems arranged as a network, virtual machines, application(s), databases, network tap(s), services, a cloud, containers, or other similar computing environment that creates data instances.
- the system 105 includes a processor 115 and memory 120 for storing instructions.
- the processor 115 may be implemented as one or more general purpose processors, reduced instruction set computer (RISC) processors, or dedicated processors.
- the latter category may include, for example, one or more digital signal processors (DSPs), field programmable gate arrays (FPGAs), digital logic gates, or any combination thereof.
- the instructions and steps of the methods described in this disclosure may be implemented or executed by the one or more processors 410 ( FIG. 4 ) or related devices.
- the processor 115 may reside on a single device. In other configurations, the processor 115 may include two or more processing devices or central processing units (CPUs) distributed across a plurality of devices.
- the term “electronic source” can broadly include any number of recognizable data sources including, without limitation, local folders, local or network hard drives, solid-state drives or any physical configuration that is accessible locally or via a network.
- “electronic sources” may include cloud-based network locations controlled by the operator, organization or other entity, e-mail servers, folders, and any number of private or public applications that are accessible to the operator or organization and includable as a source that can be searched for documents and other data content.
- the memory 120 can include a search engine 125 and a display module 135 .
- the term “module” may also refer to an application-specific integrated circuit (“ASIC”), an electronic circuit, a processor (shared, dedicated, or group) that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
- a system operator or other agent of the entity using the system 105 may designate a plurality of different electronic sources for searching data content.
- information may be extracted from the content in or near real time to identify, update, and create new fields.
- the fields can be manipulated by the organization dynamically using rules or preferences identified by the system from an application, an application programming interface, or otherwise by an IT specialist at the organization or entity, for example.
- the content source 110 may include different electronic sources of data content as described above. They may include different folders, subfolders or other organization-controlled applications that reside on one or more local or remote computing devices. While FIG. 1 shows the content source 110 as being available through the network 150 , in some embodiments the content source 110 can be local to the system 105 .
- content is construed to mean any kind of electronic data content, including but not limited to documents, writings, images, illustrations, recordings, audio works, videos, e-mails, communications, memoranda and files, data or electronic information of any kind.
- the content may be input for temporary or permanent storage into one or more potentially disparate content sources, such as an e-mail server and a cloud-based network, for example.
- an e-mail file may include an author field, a recipient field, cc and bcc fields, time and date fields, and the like.
- Documents such as memoranda may have date fields, subject fields, size fields (e.g., number of pages), body content, the number and type of attachments, etc.
- Each kind of data content may include one or more different fields.
- the system 105 and corresponding search engine 125 are dynamically configurable. For example, as new documents are stored in one of the content sources 110 , that content may contain fields that presently do not exist. Accordingly, the system 105 may automatically create new fields to facilitate subsequent searches of the content. For example, when e-mail files are initially stored with an attachment having a specific file format, a field identifying an e-mail having an attachment with the identified format may be created. Depending on the predetermined rules and the identified needs of the organization, any number of different field types can be recognized. This field-based information may be specified by an operator of the system 105 , such as via an application programming interface, or via another technique. Other types of more common fields may also be automatically recognized and created. One such field type may be the content source itself (e.g., Google Drive or another cloud-based location maintained by the relevant organization). Another exemplary field may be the name of an individual.
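The dynamic field creation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the `assign_fields` function, and the `has_<format>_attachment` naming scheme are all hypothetical. Each record contributes common fields (its source and type), its type-specific metadata, and a derived flag per attachment format; any field seen for the first time is added to the schema.

```python
from typing import Any, Dict, Set

def assign_fields(record: Dict[str, Any], schema: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Derive the indexed fields for one content record, registering unseen fields."""
    content_type = record["type"]
    # Common fields: the content source itself and the kind of content.
    fields: Dict[str, Any] = {"source": record["source"], "type": content_type}
    # Type-specific fields (author, recipient, subject, ...) copied as-is.
    fields.update(record.get("metadata", {}))
    # Derived field (hypothetical naming): one flag per attachment format.
    for name in record.get("attachments", []):
        fmt = name.rsplit(".", 1)[-1].lower()
        fields[f"has_{fmt}_attachment"] = True
    # Grow the schema dynamically: a field is created the first time it appears.
    for field_name in fields:
        schema.setdefault(field_name, set()).add(content_type)
    return fields
```

Here the schema is an in-memory dictionary that records which content types carry each field; a real index would persist this mapping.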
- the system 105 can assign new fields to incoming content designated for a content source 110 .
- the system 105 may receive a search query from a user. Upon receiving the search query, the search engine 125 may parse the search query. Upon parsing, the search engine 125 may determine, based on predetermined rules, triggers associated with the search query.
- the triggers may include search triggers to be used for searching content.
- the search triggers may include keywords identified in the search query.
- the triggers may further include filter triggers.
- the filter triggers may be applied to filter search results.
- the filter triggers may include one or more of the following: a content type, a document type, a document author, a topic, and so forth.
- the triggers may further include structural triggers.
- the structural triggers may be used for ranking the search results.
- the structural triggers may include one or more of the following: a creation date, a modification date, a last opening date, a size of a document, and so forth.
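The three trigger categories above (search, filter, structural) can be modelled as a small data type. The sketch below is illustrative only; `TriggerKind`, `Trigger`, and `group_by_kind` are hypothetical names. It groups triggers so that each group can be applied at its own stage: keywords for retrieval, filters for narrowing, structural triggers for ranking.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

class TriggerKind(Enum):
    SEARCH = "search"          # keywords matched against content
    FILTER = "filter"          # narrows results (content type, author, topic, ...)
    STRUCTURAL = "structural"  # orders results (creation date, size, ...)

@dataclass(frozen=True)
class Trigger:
    kind: TriggerKind
    text: str                    # words of the query that produced the trigger
    field: Optional[str] = None  # index field the trigger targets, if any
    value: Optional[str] = None  # value to match (filter) or sort direction (structural)

def group_by_kind(triggers: List[Trigger]) -> Dict[TriggerKind, List[Trigger]]:
    """Bucket triggers so each bucket can be applied at its own stage."""
    groups: Dict[TriggerKind, List[Trigger]] = {kind: [] for kind in TriggerKind}
    for trigger in triggers:
        groups[trigger.kind].append(trigger)
    return groups
```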
- the search engine 125 may search the content to retrieve the search results. Upon finding the search results, the search engine 125 may process the search results based on the triggers. The display module 135 may display the processed search results to the user.
- the search engine 125 may enable the user to edit one or more of the triggers.
- the system 105 may highlight words of the search query to indicate the keywords, the filter triggers, and/or the structural triggers to the user.
- the search engine 125 may enable the user to create one or more of the predetermined rules for determining the triggers, to tailor the search experience to their needs.
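One way to highlight how each word of a query was interpreted is sketched below, assuming the words have already been labelled by trigger kind. The `highlight` function and the `[word](kind)` markup are hypothetical; a real user interface would likely use styling rather than text markup. Editing a trigger then amounts to changing an entry in the label mapping and re-rendering.

```python
from typing import Dict

def highlight(query: str, labels: Dict[str, str]) -> str:
    """Mark each recognised word as [word](kind); unrecognised words are left plain."""
    marked = []
    for word in query.split():
        kind = labels.get(word.lower())
        marked.append(f"[{word}]({kind})" if kind else word)
    return " ".join(marked)
```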
- system 105 can use the search engine 125 to access a plurality of consolidated features including content source(s) 110 .
- the system 105 can be configured to identify context in a natural language search and run it internally using processor 115 as a highly structured query. Rather than treating the search as a plurality of keywords and looking for a combination of the keywords across the content source 110 as is performed conventionally, the system 105 may instead be configured to ascertain context based on the search input.
- the system 105 can use the triggers and predetermined rules described above to determine that the intent of the search in this context is not to treat “pdfs” as a keyword, but rather that the term refers to a document type.
- the system 105 may instead understand that the search context in this example may be much narrower.
- the system 105 may search across the local and remote content sources for all documents in .pdf form.
- the system 105 may be preconfigured to contextualize the search and determine that “documents” is similar to the above example of “pdfs,” only one level of abstraction higher. Accordingly, rather than run a simple keyword search, the system 105 may instead refer to the predetermined rules and conclude that the term “documents” explicitly encompasses a narrower definition.
- the system 105 can then search for all e-mails, memoranda, and other types of sources that can reasonably be referred to as “documents” that relate to the “Gruvhausen patent,” which itself may be partitioned into a keyword “Gruvhausen” and another trigger (“patent”) as further described herein.
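The contextual reading of terms such as “pdfs” and “documents” can be illustrated with a small rule table. This is a sketch under stated assumptions: the `RULES` and `STOP_WORDS` tables, the `doc_type` and `topic` field names, and the set of types behind “documents” are all hypothetical. Terms covered by a rule become field filters instead of keywords; everything else stays a keyword.

```python
from typing import Dict, List, Set, Tuple

STOP_WORDS = {"all", "the", "about", "for", "on"}

# Hypothetical rules: query term -> (index field, values the term stands for).
RULES: Dict[str, Tuple[str, Set[str]]] = {
    "pdfs": ("doc_type", {"pdf"}),
    # A broader term maps to several content types (one level of abstraction up).
    "documents": ("doc_type", {"email", "memo", "pdf", "word"}),
    "patent": ("topic", {"patent"}),
}

def interpret(query: str) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Split a query into leftover keywords and field filters derived from the rules."""
    keywords: List[str] = []
    filters: Dict[str, Set[str]] = {}
    for term in query.lower().split():
        if term in STOP_WORDS:
            continue
        if term in RULES:
            field_name, values = RULES[term]
            filters.setdefault(field_name, set()).update(values)
        else:
            keywords.append(term)
    return keywords, filters
```

For example, `interpret("all documents about the Gruvhausen patent")` yields the keyword `gruvhausen` plus a document-type filter and a topic filter, mirroring the partition of “Gruvhausen patent” into a keyword and a trigger.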
- an operator of the system 105 can also dynamically establish rules that attempt to ferret out false positives, such as information that commonly leads to search errors due to the diversity of language.
- if, for example, the organization is an accounting firm named Smith & Jones, it may be undesirable in certain circumstances to assign a name field to “Smith.”
- the user can specify rules for disregarding the use of a particular false positive. This flexibility for ferreting out false positives can render the search engine 125 highly robust and efficient.
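A custom block list for false positives, as in the Smith & Jones example, can be sketched as follows. The people directory, the block list contents, and `person_filters` are hypothetical. A word becomes a person filter only if it is a known name and the entity has not blocked it.

```python
from typing import List, Set

KNOWN_PEOPLE: Set[str] = {"jane", "smith"}  # hypothetical people directory
BLOCK_LIST: Set[str] = {"smith"}            # entity-configured false positives

def person_filters(query: str, known: Set[str] = KNOWN_PEOPLE,
                   blocked: Set[str] = BLOCK_LIST) -> List[str]:
    """Return the words that should become person (name) filters."""
    names = []
    for token in query.lower().replace("&", " ").split():
        if token in known and token not in blocked:
            names.append(token)
    return names
```

A real rule could be context-sensitive (for example, blocking “Smith” only when followed by “& Jones”); this sketch blocks the word outright.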
- FIG. 2 is a schematic diagram 200 showing automatic search query refinement by the system 105 , according to an example embodiment.
- a user provides a natural language search query 205 to the system 105 .
- An example natural language search query 205 may include “acme inc contract created recently by Jane.”
- the system 105 receives the natural language search query 205 and processes the natural language search query 205 into a structured search query 210 .
- the processing may include determining, in the natural language search query 205 , search triggers 215 , filter triggers 220 , and structural triggers 225 .
- the system 105 receives the natural language search query 305 and processes the natural language search query 305 into a structured search query.
- the processing includes determining, in the natural language search query 305 , search triggers 310 , filter triggers 315 , 320 , and 325 , and structural triggers 330 .
- the system 105 may determine, based on predetermined and configurable rules, that the keywords “acme inc” are search triggers 310 to be searched in the content data sources.
- the system 105 may further determine that the keyword “contract” is the filter trigger 315 .
- the system 105 may filter search results found based on the keyword search to select only documents that are contracts.
- the system 105 may further determine that the keywords “created by” and “Jane” are the person-related filter triggers 320 and 325 , respectively. Based on the filter triggers 320 and 325 , the system 105 may filter the previously selected contracts to select only contracts created by a user named “Jane.” Then, the system 105 may further determine that the keyword “recently” is a date-related filter trigger. Based on the date-related filter trigger, the system 105 may filter the contracts created by a user named “Jane” to select only documents that were created within a predetermined period, for example, within the past week or within any predetermined number of days before the current date. Furthermore, the keyword “recently” may also be designated as a structural trigger 330 . Based on the structural trigger 330 , the system 105 can sort the search results to show the most recent search results first. The filtered and sorted search results 335 may be presented to the user on the user interface 300 .
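The parse of “acme inc contract created recently by Jane” walked through above can be approximated with a few hand-written rules. This is a minimal sketch, not the patented parser: `KNOWN_PEOPLE`, `DOC_TYPES`, the seven-day reading of “recently”, and the field names are all assumptions. Unmatched words become search triggers; “contract”, “by Jane”, and “recently” become filter triggers; “recently” also adds a structural (sort) trigger.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

KNOWN_PEOPLE = {"jane"}   # hypothetical directory of people
DOC_TYPES = {"contract"}  # hypothetical document-type vocabulary
RECENT_DAYS = 7           # assumed meaning of "recently": the past week

@dataclass
class StructuredQuery:
    keywords: List[str] = field(default_factory=list)             # search triggers
    filters: Dict[str, object] = field(default_factory=dict)      # filter triggers
    sort: List[Tuple[str, str]] = field(default_factory=list)     # structural triggers

def parse(query: str) -> StructuredQuery:
    tokens = query.lower().split()
    sq = StructuredQuery()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in DOC_TYPES:
            sq.filters["doc_type"] = tok
        elif tok == "created":
            pass  # consumed: this sketch always targets creation date and creator
        elif tok == "recently":
            sq.filters["created_within_days"] = RECENT_DAYS
            sq.sort.append(("created", "desc"))
        elif tok == "by" and i + 1 < len(tokens) and tokens[i + 1] in KNOWN_PEOPLE:
            sq.filters["created_by"] = tokens[i + 1]
            i += 1  # the name is consumed as part of the person filter
        else:
            sq.keywords.append(tok)
        i += 1
    return sq
```

Note that “by” followed by an unknown name falls through to the keyword list, so the sketch degrades to a plain keyword search rather than guessing.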
- the system 105 may change the user interface 300 based on the filter triggers identified in the natural language search query 305 .
- the user typically needs to use a dropdown menu and select the date range, for example, “past week.”
- the system 105 can automatically change the user interface 300 based on the filter triggers. Specifically, upon determining that the natural language search query 305 includes the keyword “recently,” which is the date-related filter trigger, the system may automatically select the “past week” option in a dropdown menu 340 . Therefore, the user only needs to enter the natural language search query 305 in a search box 350 rather than manually selecting options from the dropdown menus of the user interface 300 .
- the system 105 may automatically select the options in the dropdown menus of the user interface 300 based on the triggers identified by the system 105 in the natural language search query 305 .
- Changing the user interface 300 may serve as a feedback loop provided via the application programming interface and in the user interface 300 to the user to show the user how the system 105 processed the natural language search query 305 .
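The user interface feedback loop can be sketched as a function that turns inferred filters into dropdown selections. The option labels, filter names, and `dropdown_state` are hypothetical. For a date range, it picks the narrowest dropdown option that still covers the inferred range, so a seven-day “recently” selects “past week”.

```python
from typing import Dict

# Hypothetical date options offered by the search UI, keyed by number of days.
DATE_OPTIONS = {1: "past day", 7: "past week", 30: "past month"}

def dropdown_state(filters: Dict[str, object]) -> Dict[str, str]:
    """Return the dropdown selections that mirror the filters inferred from a query."""
    state: Dict[str, str] = {}
    days = filters.get("created_within_days")
    if days is not None:
        # Choose the narrowest option that still covers the inferred range.
        covering = [d for d in DATE_OPTIONS if d >= days]
        state["date"] = DATE_OPTIONS[min(covering)] if covering else "any time"
    if "doc_type" in filters:
        state["type"] = str(filters["doc_type"])
    if "created_by" in filters:
        state["author"] = str(filters["created_by"])
    return state
```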
- FIG. 4 illustrates an exemplary computing system 400 that can be used to implement embodiments described herein.
- the computing system 400 can be implemented in the contexts of the system 105 .
- the exemplary computing system 400 of FIG. 4 may include one or more processors 410 and memory 420 .
- Memory 420 may store, in part, instructions and data for execution by the one or more processors 410 .
- Memory 420 can store the executable code when the exemplary computing system 400 is in operation.
- the exemplary computing system 400 of FIG. 4 may further include a mass storage 430 , portable storage 440 , one or more output devices 450 , one or more input devices 460 , a network interface 470 , and one or more peripheral devices 480 .
- the instructions herein can be executed on the one or more processors and can be stored on memory 420 in the form of one or more application programming interfaces, application software, and other routines and provided, at least in part, on the portable storage device 440 .
- Mass storage 430 , which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by a magnetic disk or an optical disk drive, which in turn may be used by one or more processors 410 . Mass storage 430 can store the system software for implementing embodiments described herein for purposes of loading that software into memory 420 .
- Portable storage 440 may operate in conjunction with a portable non-volatile storage medium, such as a compact disk (CD) or digital video disc (DVD), to input and output data and code to and from the computing system 400 of FIG. 4 .
- the system software for implementing embodiments described herein may be stored on such a portable medium and input to the computing system 400 via the portable storage 440 .
- One or more input devices 460 provide a portion of a user interface.
- the one or more input devices 460 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, a stylus, or cursor direction keys.
- the computing system 400 as shown in FIG. 4 includes one or more output devices 450 .
- Suitable output devices 450 include speakers, printers, network interfaces, and monitors.
- the components contained in the exemplary computing system 400 of FIG. 4 are those typically found in computing systems that may be suitable for use with embodiments described herein and are intended to represent a broad category of such computer components that are well known in the art.
- the exemplary computing system 400 of FIG. 4 can be a personal computer, handheld computing device, telephone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device.
- the computer can also include different bus configurations, networked platforms, multi-processor platforms, and so forth.
- Various operating systems (OS) can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
- a bus carries the data to system RAM, from which a CPU retrieves and executes the instructions.
- the instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
- FIG. 5 is an example flow diagram 500 of an automatic search query refinement process, in accordance with an embodiment.
- the steps identified in FIG. 5 may be performed by the system 105 ( FIG. 1 ), including processor 115 , search engine 125 and memory 120 ; the steps may also be performed by Natural Language Search Query component 205 along with system 105 and structured search query component 210 ( FIG. 2 ); the steps may still further be performed by the one or more processors 410 and the various other devices shown in FIG. 4 .
- the search query refinement commences by the system 105 .
- the search query received from a user at step 504 via the search interface (e.g., shown by display module 135 ( FIG. 1 )) and one or more input devices 460 ( FIG. 4 ) is parsed.
- the system at step 508 may determine one or more search triggers based on predetermined and configurable rules.
- the search triggers may, for example, correspond to content identified in new or custom fields created by the system when indexing the data content in the various electronic sources 110 ( FIG. 1 ).
- one or more filter triggers may be identified as described above.
- the system may use the filter triggers for filtering the search results identified in step 508 , for example, by correlating the filter triggers to the search results and/or to the fields identified in the search results.
- the search results are returned at step 516 .
- one or more structural triggers may be identified.
- the system may rank or sort the heretofore identified data content using one or more of the structural triggers as described in greater detail above. For example, the system may correlate the structural triggers with similar fields that have indexed information in the contents of the data content in the different electronic sources 110 .
- the system can obtain and sift through relevant search content in one or more intermediate steps, if necessary, to obtain the final search results for display, e.g., on display module 135 .
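The FIG. 5 sequence (keyword search, then filtering, then ranking) can be sketched over an in-memory list of documents. This is illustrative only: the document dictionaries, field names, and the `run` function are hypothetical. Keywords are matched as lowercase substrings of the text, each filter trigger is correlated with the field of the same name, and a structural trigger sorts the survivors newest first.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

def run(docs: List[Dict], keywords: List[str], filters: Dict[str, object],
        sort_field: Optional[str] = None, today: Optional[date] = None) -> List[Dict]:
    """Keyword match, then filter, then rank (sketch of the FIG. 5 steps)."""
    today = today or date.today()
    # Search triggers: every keyword must appear (as a substring) in the text.
    hits = [d for d in docs if all(k in d["text"].lower() for k in keywords)]
    # Filter triggers: correlate each trigger with the matching indexed field.
    for name, value in filters.items():
        if name == "created_within_days":
            cutoff = today - timedelta(days=value)
            hits = [d for d in hits if d["created"] >= cutoff]
        else:
            hits = [d for d in hits if d.get(name) == value]
    # Structural trigger: rank by the correlated field, highest value first.
    if sort_field:
        hits.sort(key=lambda d: d[sort_field], reverse=True)
    return hits
```

Applying all filters before sorting keeps the ranking step small; a search engine would typically push the filters into the index query instead of post-filtering.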
- FIG. 6 is another example flow diagram 600 of an automatic search query refinement process, in accordance with an embodiment.
- the steps in FIG. 6 may be performed by the devices identified above with reference to FIG. 5 , for example.
- the system may use information from the entity to identify a plurality of different electronic sources of data content of the entity stored at different network-accessible locations.
- the system may proceed to index the content by dynamically assigning fields to the content based at least in part on criteria specified by the entity, which for purposes of this disclosure includes an agent, contractor, employee, or group of personnel authorized to provide this information on behalf of the entity.
- the assignment and addition of fields is a systematic undertaking by the system as the data content is continuously indexed as new files and other content are revised, updated, or stored over time on any of the different electronic sources.
- the system may provide a unified search interface to a plurality of authorized users to search the content.
- the system may receive a search query from one of the authorized users (e.g., an employee, agent, contractor, or otherwise authorized personnel) via the search interface (e.g., system 105 , natural language search query component 205 , display module 135 , or devices 460 , 470 , etc.).
- the search interface may be presented in natural language.
- the system may parse the search query to identify different triggers using rules configurable by the entity for narrowing the search to match a likely user intent. Having identified the triggers, the system at step 660 can thereupon correlate the triggers with relevant ones of the fields to obtain the search results.
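The FIG. 6 flow, in which content from several electronic sources is indexed by field and triggers are correlated with those fields, can be sketched with a simple field-value index. The source layout, record ids, and the `build_index` and `correlate` functions are hypothetical. Triggers naming a field that no source has are ignored, and the function returns an empty set when no trigger correlates with any field.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (field name, value)

def build_index(sources: Dict[str, Dict[str, Dict[str, str]]]) -> Dict[Key, Set[str]]:
    """Merge records from several named sources into one field-value index."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for source_name, records in sources.items():
        for rec_id, fields in records.items():
            index[("source", source_name)].add(rec_id)
            for name, value in fields.items():
                index[(name, value)].add(rec_id)
    return index

def correlate(index: Dict[Key, Set[str]], triggers: Iterable[Key]) -> Set[str]:
    """Intersect the records matching each trigger whose field exists in the index."""
    known_fields = {name for name, _ in index}
    result = None
    for name, value in triggers:
        if name not in known_fields:
            continue  # trigger names a field no source has: ignore it
        ids = index.get((name, value), set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()
```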
- Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
- a storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
- computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code means in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor.
- any connection is properly termed a computer-readable medium.
- Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of computer-readable media.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Bioethics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (22)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/889,326 US12505115B2 (en) | 2021-08-16 | 2022-08-16 | Search query refinement using generated keyword triggers |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163233622P | 2021-08-16 | 2021-08-16 | |
| US17/889,326 US12505115B2 (en) | 2021-08-16 | 2022-08-16 | Search query refinement using generated keyword triggers |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230046298A1 (en) | 2023-02-16 |
| US12505115B2 (en) | 2025-12-23 |
Family ID: 85178012
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/889,326 (Active, expires 2043-01-29) US12505115B2 (en) | 2021-08-16 | 2022-08-16 | Search query refinement using generated keyword triggers |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US12505115B2 (en) |
| WO (1) | WO2023023099A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119961296B (en) * | 2023-10-30 | 2026-02-03 | 荣耀终端股份有限公司 | Method for retrieving verticals and electronic equipment |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030204501A1 (en) | 2002-04-26 | 2003-10-30 | Moon Sung Ub | Method and system for improving reliability of search engine information |
| US20040199491A1 (en) | 2003-04-04 | 2004-10-07 | Nikhil Bhatt | Domain specific search engine |
| US20050102251A1 (en) | 2000-12-15 | 2005-05-12 | David Gillespie | Method of document searching |
| US20050262050A1 (en) | 2004-05-07 | 2005-11-24 | International Business Machines Corporation | System, method and service for ranking search results using a modular scoring system |
| US20100185611A1 (en) * | 2006-03-01 | 2010-07-22 | Oracle International Corporation | Re-ranking search results from an enterprise system |
| US8688667B1 (en) * | 2011-02-08 | 2014-04-01 | Google Inc. | Providing intent sensitive search results |
| US20160308963A1 (en) * | 2015-04-17 | 2016-10-20 | Zuora, Inc. | System and method for real-time cloud data synchronization using a database binary log |
| US9672277B2 (en) * | 2009-12-04 | 2017-06-06 | Google Inc. | Presenting real-time search results |
| US11397737B2 (en) * | 2019-05-06 | 2022-07-26 | Google Llc | Triggering local extensions based on inferred intent |
2022
- 2022-08-16: US17/889,326 (US12505115B2), status: Active
- 2022-08-16: PCT/US2022/040518 (WO2023023099A1), status: Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023023099A1 (en) | 2023-02-23 |
| US20230046298A1 (en) | 2023-02-16 |
Legal Events
| Code | Title | Description |
|---|---|---|
| FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| AS | Assignment | Owner name: ELASTICSEARCH B.V., NETHERLANDS; ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HOXIE, QUINLAN J.; HOY, MARK; STORY, SEAN; AND OTHERS; SIGNING DATES FROM 20240301 TO 20250708; REEL/FRAME: 072882/0292 |
| AS | Assignment | Owner name: ELASTICSEARCH B.V., NETHERLANDS; ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SCORCIO, MARSHALL; HARSHA, DAVID; REEL/FRAME: 072882/0377; Effective date: 20210901 |
| STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | PATENTED CASE |