SearXNG Multiple Engine CAPTCHA and Error Issues #410

@ariel-frischer

Describe the bug

SearXNG is hitting CAPTCHA challenges on multiple search engines (DuckDuckGo, Qwant) and is sending malformed requests to the Wikipedia API. This severely degrades AgenticSeek's search functionality because the primary search engines are being blocked.

searxng   | 2025-09-28 23:52:32,399 WARNING:searx.engines.duckduckgo: ErrorContext('searx/engines/duckduckgo.py', 363, 'raise SearxEngineCaptchaException(suspended_time=0, message=f"CAPTCHA ({resp.search_params[\'data\'].get(\'kl\')})")', 'searx.exceptions.SearxEngineCaptchaException', None, ('CAPTCHA (us-en)',)) False
searxng   | 2025-09-28 23:52:32,399 ERROR:searx.engines.duckduckgo: CAPTCHA
searxng   | Traceback (most recent call last):
searxng   |   File "/usr/local/searxng/searx/search/processors/online.py", line 160, in search
searxng   |     search_results = self._search_basic(query, params)
searxng   |   File "/usr/local/searxng/searx/search/processors/online.py", line 148, in _search_basic
searxng   |     return self.engine.response(response)
searxng   |            ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
searxng   |   File "/usr/local/searxng/searx/engines/duckduckgo.py", line 363, in response
searxng   |     raise SearxEngineCaptchaException(suspended_time=0, message=f"CAPTCHA ({resp.search_params['data'].get('kl')})")
searxng   | searx.exceptions.SearxEngineCaptchaException: CAPTCHA (us-en), suspended_time=0
searxng   | 2025-09-28 23:52:32,665 WARNING:searx.engines.wikipedia: ErrorContext('searx/engines/wikipedia.py', 187, '_network.raise_for_httperror(resp)', 'httpx.HTTPStatusError', None, ('400', 'Bad Request', 'en.wikipedia.org')) False
searxng   | 2025-09-28 23:52:32,665 ERROR:searx.engines.wikipedia: requests exception (search duration : 0.5207587419999982 s, timeout: 3.0 s) : Client error '400 Bad Request' for url 'https://en.wikipedia.org/api/rest_v1/page/summary/search%3A%20site%3Aycombinator.com/companies%20%22YC%20W24%22%20OR%20%22YC%20S24%22%20OR%20%22YC%20W25%22%20engineering'
searxng   | For more information check: https://httpstatuses.com/400
searxng   | Traceback (most recent call last):
searxng   |   File "/usr/local/searxng/searx/search/processors/online.py", line 160, in search
searxng   |     search_results = self._search_basic(query, params)
searxng   |   File "/usr/local/searxng/searx/search/processors/online.py", line 148, in _search_basic
searxng   |     return self.engine.response(response)
searxng   |            ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
searxng   |   File "/usr/local/searxng/searx/engines/wikipedia.py", line 187, in response
searxng   |     _network.raise_for_httperror(resp)
searxng   |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
searxng   |   File "/usr/local/searxng/searx/network/raise_for_httperror.py", line 77, in raise_for_httperror
searxng   |     resp.raise_for_status()
searxng   |     ~~~~~~~~~~~~~~~~~~~~~^^
searxng   |   File "/usr/local/searxng/venv/lib/python3.13/site-packages/httpx/_models.py", line 749, in raise_for_status
searxng   |     raise HTTPStatusError(message, request=request, response=self)
searxng   | httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://en.wikipedia.org/api/rest_v1/page/summary/search%3A%20site%3Aycombinator.com/companies%20%22YC%20W24%22%20OR%20%22YC%20S24%22%20OR%20%22YC%20W25%22%20engineering'
searxng   | For more information check: https://httpstatuses.com/400
searxng   | 2025-09-28 23:52:32,797 WARNING:searx.engines.qwant: ErrorContext('searx/engines/qwant.py', 199, 'raise SearxEngineCaptchaException()', 'searx.exceptions.SearxEngineCaptchaException', None, ('CAPTCHA',)) False
searxng   | 2025-09-28 23:52:32,797 ERROR:searx.engines.qwant: CAPTCHA
searxng   | Traceback (most recent call last):
searxng   |   File "/usr/local/searxng/searx/search/processors/online.py", line 160, in search
searxng   |     search_results = self._search_basic(query, params)
searxng   |   File "/usr/local/searxng/searx/search/processors/online.py", line 148, in _search_basic
searxng   |     return self.engine.response(response)
searxng   |            ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
searxng   |   File "/usr/local/searxng/searx/engines/qwant.py", line 159, in response
searxng   |     return parse_web_api(resp)
searxng   |   File "/usr/local/searxng/searx/engines/qwant.py", line 199, in parse_web_api
searxng   |     raise SearxEngineCaptchaException()
searxng   | searx.exceptions.SearxEngineCaptchaException: CAPTCHA, suspended_time=86400

To Reproduce

Steps to reproduce the behavior:

  1. Start AgenticSeek with ./start_services.sh full
  2. Send a query requiring web search, particularly with complex search operators like:
    • site:ycombinator.com/companies "YC W24" OR "YC S24" OR "YC W25" engineering
  3. Observe SearXNG logs showing CAPTCHA errors from multiple engines

Expected behavior

SearXNG should successfully execute searches without triggering CAPTCHA challenges. The system should:

  • Properly rate-limit requests to avoid triggering anti-bot measures
  • Handle search operators correctly when passing to different engines
  • Have fallback mechanisms when some engines fail
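
The rate-limiting and fallback behavior described above could look roughly like the following client-side sketch. It is not AgenticSeek or SearXNG code: `MinIntervalLimiter` and `search_with_fallback` are hypothetical helpers, and the engine callables are placeholders for whatever actually performs a search.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple


class MinIntervalLimiter:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the last permitted call

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


def search_with_fallback(
    query: str,
    engines: Sequence[Tuple[str, Callable[[str], List[str]]]],
    limiter: MinIntervalLimiter,
) -> Tuple[str, List[str]]:
    """Try engines in order; return (engine_name, results) from the first success."""
    failed = []
    for name, search in engines:
        limiter.wait()  # space out requests regardless of which engine is used
        try:
            return name, search(query)
        except Exception:  # a real client would catch narrower error types
            failed.append(name)
    raise RuntimeError(f"all engines failed: {failed}")
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real delays. Whether pacing alone avoids CAPTCHAs depends on the provider; this only illustrates the shape of the expected behavior.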

Issues Identified

  1. DuckDuckGo Engine: Receiving CAPTCHA challenge (suspended_time=0)
  2. Qwant Engine: Receiving CAPTCHA challenge (suspended_time=86400, i.e. a 24-hour suspension)
  3. Wikipedia Engine: 400 Bad Request due to malformed query with search operators
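
For triage, the per-engine failures listed above can be pulled out of the container log with a small script. This is a minimal sketch that assumes the log line shapes seen in this report; `summarize_failures` is a hypothetical helper, and the sample lines are copied from the log above.

```python
import re
from typing import Dict, Optional

# Matches e.g. "... ERROR:searx.engines.qwant: CAPTCHA"
ENGINE_RE = re.compile(r"ERROR:searx\.engines\.(\w+): (.*)")
# Matches the final exception line, e.g.
# "searx.exceptions.SearxEngineCaptchaException: CAPTCHA, suspended_time=86400"
SUSPEND_RE = re.compile(r"SearxEngineCaptchaException: .*suspended_time=(\d+)\s*$")

SAMPLE = """\
searxng   | 2025-09-28 23:52:32,399 ERROR:searx.engines.duckduckgo: CAPTCHA
searxng   | searx.exceptions.SearxEngineCaptchaException: CAPTCHA (us-en), suspended_time=0
searxng   | 2025-09-28 23:52:32,797 ERROR:searx.engines.qwant: CAPTCHA
searxng   | searx.exceptions.SearxEngineCaptchaException: CAPTCHA, suspended_time=86400
"""


def summarize_failures(log_text: str) -> Dict[str, Dict[str, Optional[object]]]:
    """Map engine name -> {'error': message, 'suspended_time': seconds or None}."""
    summary: Dict[str, Dict[str, Optional[object]]] = {}
    current: Optional[str] = None  # engine named by the most recent ERROR line
    for line in log_text.splitlines():
        engine_match = ENGINE_RE.search(line)
        if engine_match:
            current = engine_match.group(1)
            summary[current] = {
                "error": engine_match.group(2).strip(),
                "suspended_time": None,
            }
            continue
        suspend_match = SUSPEND_RE.search(line)
        if suspend_match and current is not None:
            summary[current]["suspended_time"] = int(suspend_match.group(1))
    return summary


if __name__ == "__main__":
    for engine, info in summarize_failures(SAMPLE).items():
        print(engine, info)
```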

LLM Model used

qwen/qwen3-vl-235b-a22b-instruct (via OpenRouter)

Desktop (please complete the following information):

  • OS: Linux (Arch Linux 6.16.5-arch1-1)
  • Browser: N/A (affects SearXNG backend)
  • Version: Latest main branch (commit abc7658)
  • Docker: Running in containerized environment

Additional context

  • The issue appears when using complex search queries with site: operators and boolean logic
  • Multiple engines are failing simultaneously, suggesting either:
    1. Overly aggressive querying that triggers rate limits
    2. Improper query formatting for certain engines
    3. Docker container IP being flagged by search providers
  • Qwant imposes the most severe penalty, a 24-hour suspension
  • Wikipedia errors stem from the raw query, search operators included, being sent to the Wikipedia page-summary endpoint as if it were a page title (see the request URL in the log above)
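
To see what was actually sent to Wikipedia, the path segment from the failing URL in the log can be decoded with the standard library. The sketch below also shows that `urllib.parse.quote` leaves `/` unescaped by default, which reproduces the logged path exactly. Whether SearXNG builds the URL this way is an assumption, and escaping the slash would not make the query a valid page title.

```python
from urllib.parse import quote, unquote

# Path segment after /api/rest_v1/page/summary/ in the logged request URL.
FAILING_PATH = (
    "search%3A%20site%3Aycombinator.com/companies%20%22YC%20W24%22"
    "%20OR%20%22YC%20S24%22%20OR%20%22YC%20W25%22%20engineering"
)

# The text that was used as the "page title".
decoded = unquote(FAILING_PATH)

# quote() treats "/" as safe by default, so the slash in
# "ycombinator.com/companies" survives as a literal path separator.
default_encoded = quote(decoded)

# With safe="" the slash is escaped as %2F instead.
strict_encoded = quote(decoded, safe="")

if __name__ == "__main__":
    print(decoded)
    print(default_encoded == FAILING_PATH)
    print("%2Fcompanies" in strict_encoded)
```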

Root Cause Analysis

  1. CAPTCHA Issues: The Docker container's IP may be flagged due to:

    • High query volume in short time
    • Suspicious query patterns (automated behavior)
    • Shared IP ranges associated with hosting providers
  2. Wikipedia 400 Error: The full query, including site:ycombinator.com/companies, is passed directly to the Wikipedia API, which does not understand web search operators
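
One possible mitigation for the Wikipedia case is to detect operator-style queries before they reach a title-lookup engine, then either skip that engine or send a cleaned query. The following is a minimal sketch: `has_search_operators` and `strip_search_operators` are hypothetical helpers, and the operator list is an illustrative assumption, not something SearXNG defines.

```python
import re

# Prefix operators such as site:example.com (illustrative, not exhaustive).
OPERATOR_RE = re.compile(r"\b(?:site|inurl|intitle|filetype):\S+", re.IGNORECASE)
# Upper-case boolean keywords as used in the failing query.
BOOLEAN_RE = re.compile(r"\b(?:OR|AND)\b")


def has_search_operators(query: str) -> bool:
    """Return True if the query contains web-search operators."""
    return bool(OPERATOR_RE.search(query) or BOOLEAN_RE.search(query))


def strip_search_operators(query: str) -> str:
    """Remove prefix operators, boolean keywords, and quotes; collapse whitespace."""
    cleaned = OPERATOR_RE.sub(" ", query)
    cleaned = BOOLEAN_RE.sub(" ", cleaned)
    cleaned = cleaned.replace('"', " ")
    return " ".join(cleaned.split())


if __name__ == "__main__":
    q = 'site:ycombinator.com/companies "YC W24" OR "YC S24" OR "YC W25" engineering'
    print(has_search_operators(q))
    print(strip_search_operators(q))
```

Skipping the engine when `has_search_operators` returns True is probably the safer choice, since a cleaned query is still unlikely to match a Wikipedia page title.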
