Description
Describe the bug
SearXNG is hitting CAPTCHA challenges on multiple search engines (DuckDuckGo, Qwant) and is sending malformed requests to the Wikipedia API. This severely impacts AgenticSeek's search functionality, because the primary search engines are being blocked.
searxng | 2025-09-28 23:52:32,399 WARNING:searx.engines.duckduckgo: ErrorContext('searx/engines/duckduckgo.py', 363, 'raise SearxEngineCaptchaException(suspended_time=0, message=f"CAPTCHA ({resp.search_params[\'data\'].get(\'kl\')})")', 'searx.exceptions.SearxEngineCaptchaException', None, ('CAPTCHA (us-en)',)) False
searxng | 2025-09-28 23:52:32,399 ERROR:searx.engines.duckduckgo: CAPTCHA
searxng | Traceback (most recent call last):
searxng | File "/usr/local/searxng/searx/search/processors/online.py", line 160, in search
searxng | search_results = self._search_basic(query, params)
searxng | File "/usr/local/searxng/searx/search/processors/online.py", line 148, in _search_basic
searxng | return self.engine.response(response)
searxng | ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
searxng | File "/usr/local/searxng/searx/engines/duckduckgo.py", line 363, in response
searxng | raise SearxEngineCaptchaException(suspended_time=0, message=f"CAPTCHA ({resp.search_params['data'].get('kl')})")
searxng | searx.exceptions.SearxEngineCaptchaException: CAPTCHA (us-en), suspended_time=0
searxng | 2025-09-28 23:52:32,665 WARNING:searx.engines.wikipedia: ErrorContext('searx/engines/wikipedia.py', 187, '_network.raise_for_httperror(resp)', 'httpx.HTTPStatusError', None, ('400', 'Bad Request', 'en.wikipedia.org')) False
searxng | 2025-09-28 23:52:32,665 ERROR:searx.engines.wikipedia: requests exception (search duration : 0.5207587419999982 s, timeout: 3.0 s) : Client error '400 Bad Request' for url 'https://en.wikipedia.org/api/rest_v1/page/summary/search%3A%20site%3Aycombinator.com/companies%20%22YC%20W24%22%20OR%20%22YC%20S24%22%20OR%20%22YC%20W25%22%20engineering'
searxng | For more information check: https://httpstatuses.com/400
searxng | Traceback (most recent call last):
searxng | File "/usr/local/searxng/searx/search/processors/online.py", line 160, in search
searxng | search_results = self._search_basic(query, params)
searxng | File "/usr/local/searxng/searx/search/processors/online.py", line 148, in _search_basic
searxng | return self.engine.response(response)
searxng | ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
searxng | File "/usr/local/searxng/searx/engines/wikipedia.py", line 187, in response
searxng | _network.raise_for_httperror(resp)
searxng | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
searxng | File "/usr/local/searxng/searx/network/raise_for_httperror.py", line 77, in raise_for_httperror
searxng | resp.raise_for_status()
searxng | ~~~~~~~~~~~~~~~~~~~~~^^
searxng | File "/usr/local/searxng/venv/lib/python3.13/site-packages/httpx/_models.py", line 749, in raise_for_status
searxng | raise HTTPStatusError(message, request=request, response=self)
searxng | httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://en.wikipedia.org/api/rest_v1/page/summary/search%3A%20site%3Aycombinator.com/companies%20%22YC%20W24%22%20OR%20%22YC%20S24%22%20OR%20%22YC%20W25%22%20engineering'
searxng | For more information check: https://httpstatuses.com/400
searxng | 2025-09-28 23:52:32,797 WARNING:searx.engines.qwant: ErrorContext('searx/engines/qwant.py', 199, 'raise SearxEngineCaptchaException()', 'searx.exceptions.SearxEngineCaptchaException', None, ('CAPTCHA',)) False
searxng | 2025-09-28 23:52:32,797 ERROR:searx.engines.qwant: CAPTCHA
searxng | Traceback (most recent call last):
searxng | File "/usr/local/searxng/searx/search/processors/online.py", line 160, in search
searxng | search_results = self._search_basic(query, params)
searxng | File "/usr/local/searxng/searx/search/processors/online.py", line 148, in _search_basic
searxng | return self.engine.response(response)
searxng | ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
searxng | File "/usr/local/searxng/searx/engines/qwant.py", line 159, in response
searxng | return parse_web_api(resp)
searxng | File "/usr/local/searxng/searx/engines/qwant.py", line 199, in parse_web_api
searxng | raise SearxEngineCaptchaException()
searxng | searx.exceptions.SearxEngineCaptchaException: CAPTCHA, suspended_time=86400
To Reproduce
Steps to reproduce the behavior:
- Start AgenticSeek with ./start_services.sh full
- Send a query requiring web search, particularly one with complex search operators, such as:
  site:ycombinator.com/companies "YC W24" OR "YC S24" OR "YC W25" engineering
- Observe the SearXNG logs showing CAPTCHA errors from multiple engines
Expected behavior
SearXNG should successfully execute searches without triggering CAPTCHA challenges. The system should:
- Properly rate-limit requests to avoid triggering anti-bot measures
- Handle search operators correctly when passing to different engines
- Have fallback mechanisms when some engines fail
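The spacing and fallback behavior described above could look roughly like the following client-side sketch. Everything in it is hypothetical: `EngineBlocked`, `search_with_fallback`, and the `(name, callable)` engine pairs are illustrative names, not SearXNG or AgenticSeek APIs, and the one-second spacing is an arbitrary assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple


class EngineBlocked(Exception):
    """Hypothetical stand-in for an engine reporting a CAPTCHA or HTTP error."""


def search_with_fallback(
    query: str,
    engines: Sequence[Tuple[str, Callable[[str], List[str]]]],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, List[str]]:
    """Try each engine in order, pausing between attempts.

    Returns (engine_name, results) from the first engine that succeeds.
    Raises EngineBlocked if every engine fails.
    """
    errors = {}
    for index, (name, engine) in enumerate(engines):
        if index > 0:
            # Space out consecutive attempts instead of retrying immediately.
            sleep(min_interval)
        try:
            return name, engine(query)
        except EngineBlocked as exc:
            errors[name] = str(exc)
    raise EngineBlocked(f"all engines failed: {errors}")
```

The `sleep` parameter is injectable only so the pacing can be checked without real delays; a real fix might instead live in the SearXNG configuration.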
Issues Identified
- DuckDuckGo Engine: Receiving CAPTCHA challenge (suspended_time=0)
- Qwant Engine: Receiving CAPTCHA challenge (suspended_time=86400 - 24 hour suspension)
- Wikipedia Engine: 400 Bad Request due to malformed query with search operators
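The suspension values in this list can be extracted from the container logs mechanically. The sketch below assumes the log layout shown in this report (an `ERROR:searx.engines.<name>:` line followed later by the `SearxEngineCaptchaException: ..., suspended_time=N` line); `captcha_suspensions` is a hypothetical helper, not part of SearXNG.

```python
import re
from typing import Dict

# Matches e.g. "ERROR:searx.engines.duckduckgo: CAPTCHA"
ENGINE_ERROR = re.compile(r"ERROR:searx\.engines\.(\w+):")
# Matches the final exception line of a CAPTCHA traceback.
SUSPENDED = re.compile(r"SearxEngineCaptchaException: .*suspended_time=(\d+)\s*$")


def captcha_suspensions(log_text: str) -> Dict[str, int]:
    """Map engine name -> suspended_time in seconds for CAPTCHA errors."""
    suspensions: Dict[str, int] = {}
    current_engine = None
    for line in log_text.splitlines():
        engine_match = ENGINE_ERROR.search(line)
        if engine_match:
            # Remember which engine the following traceback belongs to.
            current_engine = engine_match.group(1)
            continue
        suspended_match = SUSPENDED.search(line)
        if suspended_match and current_engine is not None:
            suspensions[current_engine] = int(suspended_match.group(1))
    return suspensions
```

Dividing the Qwant value by 3600 gives the 24-hour figure quoted above (86400 / 3600 = 24).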
LLM Model used
qwen/qwen3-vl-235b-a22b-instruct (via OpenRouter)
Desktop (please complete the following information):
- OS: Linux (Arch Linux 6.16.5-arch1-1)
- Browser: N/A (affects SearXNG backend)
- Version: Latest main branch (commit abc7658)
- Docker: Running in containerized environment
Additional context
- The issue appears when using complex search queries with site: operators and boolean logic
- Multiple engines are failing simultaneously, suggesting one or more of:
  - Overly aggressive querying that triggers rate limits
  - Improper query formatting for certain engines
  - The Docker container's IP being flagged by search providers
- Qwant has the most severe penalty, with a 24-hour suspension
- The Wikipedia errors come from the raw query, search operators included, being used as the page title in the Wikipedia REST summary URL (see the logged URL above)
Root Cause Analysis
- CAPTCHA Issues: The Docker container's IP may be flagged due to:
  - High query volume in a short time
  - Suspicious query patterns (automated behavior)
  - Shared IP ranges known to belong to hosting providers
- Wikipedia 400 Error: The query site:ycombinator.com/companies is being passed directly to the Wikipedia API, which doesn't understand Google search operators
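A minimal sketch of the Wikipedia failure and one possible mitigation follows. It assumes the page title is percent-encoded with the default settings of `urllib.parse.quote`, which leaves `/` unescaped; that assumption reproduces the logged URL exactly (including its `search: ` prefix) but has not been checked against the engine source. `strip_search_operators` and `summary_url` are hypothetical helpers, and the operator list is illustrative rather than complete.

```python
import re
from urllib.parse import quote

SUMMARY_URL = "https://en.wikipedia.org/api/rest_v1/page/summary/{title}"

# Hypothetical patterns: prefix operators such as site:, and bare boolean keywords.
_PREFIX_OPERATOR = re.compile(r"\b(?:site|inurl|intitle|filetype):\S+", re.IGNORECASE)
_BOOLEAN = re.compile(r"\b(?:OR|AND)\b")


def summary_url(title: str, safe: str = "/") -> str:
    """Build a REST summary URL; safe="/" mirrors quote()'s default."""
    return SUMMARY_URL.format(title=quote(title, safe=safe))


def strip_search_operators(query: str) -> str:
    """Drop operators, booleans and quotes so only plain keywords remain."""
    query = _PREFIX_OPERATOR.sub(" ", query)
    query = _BOOLEAN.sub(" ", query)
    query = query.replace('"', " ")
    return " ".join(query.split())


raw = 'search: site:ycombinator.com/companies "YC W24" OR "YC S24" OR "YC W25" engineering'
# The unescaped "/" adds an extra path segment to the summary URL.
print(summary_url(raw))

cleaned = strip_search_operators(
    'site:ycombinator.com/companies "YC W24" OR "YC S24" OR "YC W25" engineering'
)
# Passing safe="" also escapes any "/" that survives cleaning.
print(summary_url(cleaned, safe=""))
```

Even with a cleaned title, the summary endpoint expects an article title rather than a keyword query, so skipping the Wikipedia engine for operator-style queries may be the more appropriate fix.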