-
Notifications
You must be signed in to change notification settings - Fork 897
Fix: Arbitrary File Read Vulnerability + Feat: Security settings #1991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@jjmachan I have prepared a comprehensive fix for the vulnerability and would appreciate your review. Please let me know if you have any questions. While I have conducted preliminary functional testing, I would value your thorough testing to ensure all features remain intact and nothing breaks. I will update the pull request with the CVE-ID once it is assigned by MITRE, and you can proceed with merging the fix afterward 😃 |
hey @adithyan-ak thanks again man 🙂. I'll do the testing on my end to get this merged in this week so that we can do a release next week along with a couple of PRs we have pending as well. |
This PR introduces checks to mitigate Server-Side Request Forgery (SSRF) vulnerabilities when fetching remote images via URLs. Changes: - Before fetching http/https URLs in _download_validate_and_encode, the hostname is now resolved to its IP address(es) using socket.getaddrinfo. - Each resolved IP address is validated using the ipaddress library against a configurable policy (DISALLOWED_IP_CHECKS). - By default, requests to loopback (127.0.0.1, ::1), private (10.x, 172.16.x, 192.168.x), link-local (169.254.x.x), and reserved IPs are blocked. - A configuration flag ALLOW_INTERNAL_TARGETS (default: False) exists but disabling this check is strongly discouraged.
Explanation of ChangesSSRF Protection via IP Address Validation:
Limitations and Considerations
|
…mpt.py Added explicit binascii import and fixed the exception handler to use binascii.Error instead of base64.binascii.Error, resolving type checking errors. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Arbitrary File Read Vulnerability Fix + Security Enhancements
Summary
This document outlines the security vulnerabilities identified in the previous implementation of image handling within the
ImageTextPromptValue
class and details the remediation steps taken in the refactored code. The core issues revolved around insufficient input validation and unsafe handling of user-provided strings, leading to Arbitrary File Read (LFI), Server-Side Request Forgery (SSRF), and potential Denial of Service (DoS) vulnerabilities.Identified Vulnerabilities (Original Code)
Arbitrary File Read (LFI):
file://
URL andurlopen
: Theis_valid_url
check allowedfile://
schemes if anetloc
(hostname likelocalhost
) was provided.mimetypes.guess_type
could be tricked by appending an image extension in a URL fragment (e.g.,#fake.jpg
), whichurlopen
ignores when accessing the file system. This allowed reading arbitrary files accessible to the application user (e.g.,file://localhost/etc/passwd#fake.jpg
).open()
(Potential): Although the primary exploit usedurlopen
, the code path involvingencode_image_to_base64(item)
directly calledopen(item)
. If an attacker could bypass theis_valid_url
check but still trickmimetypes.guess_type
(e.g., using path manipulation if not properly handled beforeopen
), this path could also lead to LFI. Null bytes (\0
) were blocked by Python'sopen()
, but other techniques might exist depending on the OS and context.Server-Side Request Forgery (SSRF):
http://
/https://
URL andurlopen
: The code allowedhttp
andhttps
URLs viais_valid_url
. Ifmimetypes.guess_type
passed (e.g., URL ending in.jpg
), thedownload_and_encode_image
function would useurllib.request.urlopen
to fetch the URL. Attackers could provide URLs pointing to internal network services, cloud metadata endpoints (like AWS169.254.169.254
), or external servers, making the application act as a proxy. The fetched content was then base64 encoded and potentially returned, facilitating data exfiltration.Denial of Service (DoS):
/dev/zero
) using thefile://
LFI could exhaust memory or CPU.Weak Input Validation:
mimetypes.guess_type
based on file extensions in user-controlled strings is fundamentally insecure. It does not verify the actual content.is_valid_url
only checked for scheme and netloc presence, not which schemes were safe or allowed.is_base64
check was basic and could potentially match non-image base64 data.Remediation Strategy Implemented
The refactored code abandons the flawed
is_image
logic and implements a secure processing pipeline (_securely_process_item
) with the following principles:Explicit Input Type Handling: The code now explicitly checks for distinct input formats in a specific order:
data:image/...;base64,...
)http://
,https://
)Secure Base64 Handling:
DATA_URI_REGEX
) to strictly match the expecteddata:image/...;base64,...
format.Secure URL Fetching:
ALLOWED_URL_SCHEMES
(default:http
,https
) are processed.file://
is disallowed by default.requests
library instead ofurllib.request
for easier configuration of timeouts (REQUESTS_TIMEOUT_SECONDS
) and streaming downloads.Content-Length
header and enforcesMAX_DOWNLOAD_SIZE_BYTES
during streaming download to prevent DoS.Pillow
library (Image.open
,img.verify()
) to ensure it is actually a valid image file before encoding and returning. This prevents processing malicious non-image files delivered via allowed URLs.data:
URI is derived from the actual image format identified by Pillow, not guessed from the URL.Secure Local File Handling (Optional & Default Disabled):
ALLOW_LOCAL_FILE_ACCESS
isFalse
. Must be explicitly enabled and configured with extreme care.ALLOWED_IMAGE_BASE_DIR
. User input is treated as relative to this directory.os.path.abspath
andos.path.commonprefix
to rigorously ensure the resolved file path remains within theALLOWED_IMAGE_BASE_DIR
, preventing directory traversal attacks (../
).MAX_LOCAL_FILE_SIZE_BYTES
usingos.path.getsize
before reading the file.Removal of Insecure Functions: The original, vulnerable methods (
is_image
,get_image
,is_base64
,is_valid_url
,encode_image_to_base64
,download_and_encode_image
) have been removed or replaced by the secure processing logic.Required Configuration & Dependencies
requests
andPillow
:ALLOWED_URL_SCHEMES
MAX_DOWNLOAD_SIZE_BYTES
REQUESTS_TIMEOUT_SECONDS
ALLOW_LOCAL_FILE_ACCESS = True
.ALLOWED_IMAGE_BASE_DIR
to the absolute path of the only directory allowed for image loading. Ensure this directory has appropriate permissions and contains no sensitive files.MAX_LOCAL_FILE_SIZE_BYTES
if needed.Conclusion
The refactored code significantly enhances security by replacing insecure pattern matching and uncontrolled resource fetching with explicit validation, strict policy enforcement (schemes, paths, sizes), and content verification using trusted libraries. This approach mitigates the identified LFI, SSRF, and DoS vulnerabilities. Remember to keep dependencies (
requests
,Pillow
) updated to patch potential vulnerabilities within them.