This project implements a web search agent powered by a local Large Language Model (LLM) and a SearXNG search engine. It automates the process of searching the web for information and summarizing the results using an LLM.
The llm-websearch.bash script orchestrates the search process. It first formulates a search query using the LLM based on the user's input. Then, it queries a SearXNG instance and iterates through the search results, using the LLM to determine if a page is relevant. If a page is deemed relevant, its content is extracted, and the LLM summarizes the key information. Finally, the script presents a consolidated summary of the findings, along with the source URLs.
- SearXNG Instance: A running SearXNG instance is required for web searching. See the SearXNG Docker Compose for an easy setup.
- Local LLM: A local LLM endpoint is needed for query formulation, relevance assessment, and summarization. This project was tested with Gemma 2 2B Q8.
- Python Dependencies: The Python scripts (`llm-python-chat.py`, `llm-python-file.py`) require the `openai` library.
- Utilities: The `llm-websearch.bash` script depends on `curl`, `htmlq`, `html2text`, and `pdf2txt`.
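Before running a search, it can help to confirm the command-line utilities are actually on your PATH. A small illustrative helper (not part of the project; `missing_tools` is a hypothetical name):

```python
import shutil

# Command-line tools llm-websearch.bash relies on.
REQUIRED_TOOLS = ["curl", "htmlq", "html2text", "pdf2txt"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```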
- Install Dependencies:

  ```bash
  pip install openai
  sudo apt-get install curl htmlq html2text poppler-utils  # For Ubuntu
  ```

- Configure SearXNG Address: Modify the `llm-websearch.bash` script to point to your local SearXNG instance. The default is `http://searx.lan`.
- Configure LLM Endpoint: Modify the Python scripts (`llm-python-chat.py`, `llm-python-file.py`) to point to your local LLM endpoint. The default is `http://localhost:9090/v1`.
- Place Scripts in PATH: Ensure that the three scripts (`llm-python-chat.py`, `llm-python-file.py`, `llm-websearch.bash`) are in your system's PATH (e.g., `/usr/local/bin`).
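Since the endpoint is OpenAI-compatible, the request the Python scripts send presumably looks like a standard `/chat/completions` call. A minimal stdlib-only sketch of building such a request (the endpoint URL matches the default above; the model name is a placeholder assumption):

```python
import json
import urllib.request

LLM_ENDPOINT = "http://localhost:9090/v1"  # default from the setup step above

def build_chat_request(prompt, model="local-model"):
    """Build an OpenAI-compatible /chat/completions request (not sent here)."""
    payload = {
        "model": model,  # placeholder; many local servers ignore or map this
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{LLM_ENDPOINT}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running endpoint:
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```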
Run the `llm-websearch.bash` script with a search query as an argument:

```bash
llm-websearch.bash "Best Robot Vacuum of 2025"
```

The script will output a list of relevant URLs, descriptions, and summaries, followed by a final summary generated by the LLM.
- Query Formulation: The script uses `llm-python-chat.py` to ask the LLM to refine the user's search term into a search-engine-friendly phrase.
- SearXNG Query: The script queries the SearXNG instance with the formulated search phrase.
- Relevance Assessment: For each search result, the script uses `llm-python-chat.py` to determine whether the result's description suggests it is relevant to the original search term.
- Content Extraction and Summarization: If a result is deemed relevant, the script extracts its content using `curl` and `html2text` (or `pdf2txt` if it is a PDF), then uses `llm-python-file.py` to summarize the content.
- Final Summary: After processing all relevant results, the script uses `llm-python-file.py` to generate a final summary of all the extracted information.
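The control flow above can be sketched in Python with a stand-in `ask_llm` callable (the real scripts call a local LLM; here it is a parameter so the loop itself is visible; function and key names are illustrative, not taken from the project):

```python
def search_and_summarize(search_term, results, ask_llm):
    """Mirror llm-websearch.bash's flow: filter results for relevance,
    summarize each relevant page, then produce a final combined summary.

    `results` is a list of dicts with "url", "description", and "content"
    keys; `ask_llm` is any callable taking a prompt and returning text.
    """
    summaries = []
    for r in results:
        verdict = ask_llm(
            f"Is this relevant to '{search_term}'? {r['description']} Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            summary = ask_llm(f"Summarize: {r['content']}")
            summaries.append({"url": r["url"], "summary": summary})
    final = ask_llm(
        "Combine these summaries: " + " ".join(s["summary"] for s in summaries)
    )
    return summaries, final
```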
- Ensure SearXNG is Running: Verify that your SearXNG instance is running and accessible at the configured address.
- Check LLM Endpoint: Make sure your local LLM endpoint is running and accessible.
- Inspect Temporary Files: The script uses `/dev/shm/llm-websearch.txt` as a temporary file. Check this file for intermediate results and error messages.
- Examine Script Output: Pay close attention to the script's output for any error messages or unexpected behavior.
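To check the first two points quickly, a small reachability probe can be used (illustrative only; `endpoint_reachable` is not part of the project):

```python
import urllib.error
import urllib.request

def endpoint_reachable(url, timeout=3):
    """Return True if an HTTP request to `url` gets any response at all.

    An HTTP error status (404, 405, ...) still means the server is up,
    so it counts as reachable; only connection failures return False.
    """
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, just not with 2xx
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, ...

# Example: endpoint_reachable("http://searx.lan") or
#          endpoint_reachable("http://localhost:9090/v1")
```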
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
