Skip to content

Jay4242/llm-websearch

Repository files navigation

LLM Web Search Agent

This project implements a web search agent powered by a local Large Language Model (LLM) and a SearXNG search engine. It automates the process of searching the web for information and summarizing the results using an LLM.

Overview

The llm-websearch.bash script orchestrates the search process. It first formulates a search query using the LLM based on the user's input. Then, it queries a SearXNG instance and iterates through the search results, using the LLM to determine if a page is relevant. If a page is deemed relevant, its content is extracted, and the LLM summarizes the key information. Finally, the script presents a consolidated summary of the findings, along with the source URLs.

Requirements

  • SearXNG Instance: A running SearXNG instance is required for web searching. See the SearXNG Docker Compose for an easy setup.
  • Local LLM: A local LLM endpoint is needed for query formulation, relevance assessment, and summarization. This project was tested with Gemma 2 2B Q8.
  • Python Dependencies: The Python scripts (llm-python-chat.py, llm-python-file.py) require the openai library.
  • Utilities: The llm-websearch.bash script depends on curl, htmlq, html2text, and pdf2txt.

Installation

  1. Install Dependencies:

    pip install openai
    sudo apt-get install curl htmlq html2text poppler-utils # For Ubuntu
  2. Configure SearXNG Address: Modify the llm-websearch.bash script to point to your local SearXNG instance. The default is http://searx.lan.

  3. Configure LLM Endpoint: Modify the Python scripts (llm-python-chat.py, llm-python-file.py) to point to your local LLM endpoint. The default is http://localhost:9090/v1.

  4. Place Scripts in PATH: Ensure that the three scripts (llm-python-chat.py, llm-python-file.py, llm-websearch.bash) are in your system's PATH (e.g., /usr/local/bin).

Usage

Run the llm-websearch.bash script with a search query as an argument:

llm-websearch.bash "Best Robot Vacuum of 2025"

The script will output a list of relevant URLs, descriptions, and summaries, followed by a final summary generated by the LLM.

Search Process Details

  1. Query Formulation: The script uses llm-python-chat.py to ask the LLM to refine the user's search term into a search engine friendly phrase.
  2. SearXNG Query: The script queries the SearXNG instance with the formulated search phrase.
  3. Relevance Assessment: For each search result, the script uses llm-python-chat.py to determine if the result's description suggests it's relevant to the original search term.
  4. Content Extraction and Summarization: If a result is deemed relevant, the script extracts the content using curl and html2text (or pdf2txt if it's a PDF), and then uses llm-python-file.py to summarize the content.
  5. Final Summary: After processing all relevant results, the script uses llm-python-file.py to generate a final summary of all the extracted information.

Troubleshooting

  • Ensure SearXNG is Running: Verify that your SearXNG instance is running and accessible at the configured address.
  • Check LLM Endpoint: Make sure your local LLM endpoint is running and accessible.
  • Inspect Temporary Files: The script uses /dev/shm/llm-websearch.txt as a temporary file. Check this file for intermediate results and error messages.
  • Examine Script Output: Pay close attention to the script's output for any error messages or unexpected behavior.

Screenshot

Best Robot Vacuum of 2025 Output

Disclaimer

THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

My version of an LLM Websearch Agent using a local SearXNG server because SearXNG is great.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published