Description
Checkboxes for prior research
- I've gone through the Developer Guide and API reference.
- I've checked AWS Forums and StackOverflow.
- I've searched for previous similar issues and didn't find any solution.
Describe the bug
When uploading a large file (~500 MB) to S3 using the @aws-sdk/lib-storage package, the upload fails with the error:
The XML you provided was not well-formed or did not validate against our published schema
The issue appears to occur during a multipart upload, specifically when the body is a readable stream. Smaller files upload successfully, and the problem seems related to the interaction between the stream and the configured partSize. When I set partSize above 500 MB (i.e. larger than the file itself), the upload succeeds without any problem.
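For reference, this is how the part count works out for a ~500 MB body at different partSize values. It is a minimal sketch assuming fixed-size parts with a smaller final part; expectedPartCount is a hypothetical helper, and the limits in it are S3's 5 MiB minimum part size (for every part except the last) and 10,000 parts per upload. When partSize is larger than the file, everything fits in one part, and my understanding is that lib-storage then sends a single PutObject instead of a multipart upload, which would be consistent with that configuration working.

```typescript
// Minimal sketch: how many parts a body of `fileSize` bytes is split into
// for a given `partSize`, assuming fixed-size parts plus a smaller last part.
const MiB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MiB; // S3 minimum for every part except the last
const MAX_PARTS = 10_000; // S3 limit on parts per multipart upload

function expectedPartCount(fileSize: number, partSize: number): number {
  if (partSize < MIN_PART_SIZE) {
    throw new Error(`partSize ${partSize} is below the ${MIN_PART_SIZE}-byte minimum`);
  }
  // Treat an empty body as a single part.
  const parts = Math.max(1, Math.ceil(fileSize / partSize));
  if (parts > MAX_PARTS) {
    throw new Error(`${parts} parts exceeds the ${MAX_PARTS}-part limit`);
  }
  return parts;
}

const fileSize = 500 * MiB;
for (const partSize of [5 * MiB, 50 * MiB, 600 * MiB]) {
  // Prints 100, 10 and 1 part(s) respectively.
  console.log(`${partSize / MiB} MiB parts -> ${expectedPartCount(fileSize, partSize)} part(s)`);
}
```

So the 5 MB and 50 MB configurations both go through the multipart path, while anything above the file size does not.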
The upload function I'm using:
import { Upload } from "@aws-sdk/lib-storage";
// S3CLIENT_NEW, getPDFFromExternalStorage, log, FileDTO, S3UploadTags and the
// status enums are project-specific and assumed to be in scope.

/**
 * Upload a file to S3 in 50 MB parts with tagging.
 */
export async function uploadFileToS3(
  bucket: string,
  file: FileDTO,
  tags?: S3UploadTags,
) {
  const functionName = "uploadFileToS3";
  const s3Client = S3CLIENT_NEW;
  log.info(`[${functionName}] Uploading file to S3: ${file.displayName}`);

  const stream: ReadableStream<Uint8Array> = await getPDFFromExternalStorage(file.url, file.uuid);
  if (!stream) {
    throw new Error("Stream is empty");
  }

  const finalTags: S3UploadTags = tags || {
    ConversionStatus: ConversionStatusEnum.PENDING,
    OCRStatus: OCRStatusEnum.SKIPPED,
  };
  // Tagging is sent as a URL query string, so keys and values are URL-encoded.
  const tagging = Object.entries(finalTags)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");

  log.info(`[${functionName}] File size: ${file.size} bytes`);

  const uploadParamsPDF = {
    Bucket: bucket,
    Key: file.uuid,
    Body: stream,
    ContentType: file.contentType,
    ContentLength: file.size,
    Metadata: {
      title: file.displayName,
    },
    Tagging: tagging,
  };

  const uploadToS3 = new Upload({
    client: s3Client,
    params: uploadParamsPDF,
    queueSize: 4,
    partSize: 50 * 1024 * 1024, // 50 MB
  });

  uploadToS3.on("httpUploadProgress", (progress) => {
    log.debug(`Progress event: ${progress.loaded}/${progress.total} bytes`);
  });
  uploadToS3.addListener("error", (err) => {
    log.error(`Upload error: ${err}`);
  });

  log.info(`[${functionName}] Starting upload`);
  return uploadToS3.done();
}
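One thing worth ruling out is a mismatch between ContentLength (file.size) and the number of bytes the stream from getPDFFromExternalStorage actually delivers. A minimal sketch, assuming the body is a web ReadableStream<Uint8Array> as in the function above; withByteCounter is a hypothetical helper that passes chunks through unchanged and reports the total once the stream ends.

```typescript
// Hypothetical helper: passes every chunk through unchanged and reports the
// total number of bytes once the source stream has ended.
function withByteCounter(
  source: ReadableStream<Uint8Array>,
  onDone: (totalBytes: number) => void,
): ReadableStream<Uint8Array> {
  let totalBytes = 0;
  return source.pipeThrough(
    new TransformStream<Uint8Array, Uint8Array>({
      transform(chunk, controller) {
        totalBytes += chunk.byteLength;
        controller.enqueue(chunk);
      },
      // flush runs after the last chunk, before the output stream closes.
      flush() {
        onDone(totalBytes);
      },
    }),
  );
}
```

In uploadFileToS3 this would wrap the stream before it is passed as Body, for example Body: withByteCounter(stream, (n) => log.info(`streamed ${n} bytes, expected ${file.size}`)), so the logged total can be compared with file.size.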
Regression Issue
- Select this option if this issue appears to be a regression.
SDK version number
@aws-sdk/lib-storage@3.474.0
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
v23.5.0
Reproduction Steps
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { Readable } from "node:stream";

const s3Client = new S3Client({ region: "us-east-1" });

export async function uploadFileToS3(bucket: string, file: { uuid: string; size: number }) {
  // Simulate a Readable stream (replace this with your actual stream source).
  // It emits a single 500 MiB buffer (524288000 bytes) and then ends.
  const stream = new Readable({
    read() {
      this.push(Buffer.alloc(1024 * 1024 * 500));
      this.push(null);
    },
  });

  if (!stream) {
    throw new Error("Stream is empty");
  }

  const uploadParams = {
    Bucket: bucket,
    Key: file.uuid,
    Body: stream, // The stream to upload
    ContentType: "application/pdf", // Simplified content type
    ContentLength: file.size, // Expected file size (524288000 for the simulated stream)
  };

  const upload = new Upload({
    client: s3Client,
    params: uploadParams,
    queueSize: 4,
    partSize: 5 * 1024 * 1024, // 5 MB, the S3 minimum part size
  });

  upload.on("httpUploadProgress", (progress) => {
    console.log(`Progress: ${progress.loaded}/${progress.total} bytes`);
  });

  try {
    console.log("Uploading file to S3...");
    const result = await upload.done();
    console.log("Upload complete:", result);
  } catch (err) {
    console.error("Upload error:", err);
  }
}
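To check the stream side without S3 involved, the body can be split into fixed-size parts locally and the part sizes and total compared with ContentLength. This is a simplified, hypothetical chunker (splitIntoParts, plus a summarizeParts usage helper), not lib-storage's own implementation; it accepts any AsyncIterable<Uint8Array>, which covers Node Readable streams.

```typescript
import { Readable } from "node:stream";

// Simplified, hypothetical chunker (not lib-storage's implementation):
// groups bytes from any async iterable into parts of exactly `partSize`
// bytes, with a smaller final part.
async function* splitIntoParts(
  source: AsyncIterable<Uint8Array>,
  partSize: number,
): AsyncGenerator<{ partNumber: number; data: Buffer }> {
  if (!Number.isInteger(partSize) || partSize <= 0) {
    throw new Error("partSize must be a positive integer");
  }
  let pending: Uint8Array[] = [];
  let pendingBytes = 0;
  let partNumber = 1;
  for await (const chunk of source) {
    let offset = 0;
    while (offset < chunk.byteLength) {
      // Take only as many bytes as the current part still needs.
      const take = Math.min(partSize - pendingBytes, chunk.byteLength - offset);
      pending.push(chunk.subarray(offset, offset + take));
      pendingBytes += take;
      offset += take;
      if (pendingBytes === partSize) {
        // Buffer.concat copies, so each part is independent of the source chunks.
        yield { partNumber: partNumber++, data: Buffer.concat(pending) };
        pending = [];
        pendingBytes = 0;
      }
    }
  }
  if (pendingBytes > 0) {
    yield { partNumber, data: Buffer.concat(pending) };
  }
}

// Usage: report part sizes and the total for a stream, e.g. to compare with ContentLength.
async function summarizeParts(stream: Readable, partSize: number) {
  const sizes: number[] = [];
  for await (const part of splitIntoParts(stream, partSize)) {
    sizes.push(part.data.length);
  }
  return { partCount: sizes.length, totalBytes: sizes.reduce((a, b) => a + b, 0), sizes };
}
```

For the simulated 500 MiB stream above with partSize 5 * 1024 * 1024, this should yield 100 parts of 5242880 bytes each, 524288000 bytes in total.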
Observed Behavior
[handleConversion] pUCV6phH120ZruXxV3uzqXzBvQgWHwnGcVK5hrGm: Error during conversion process The XML you provided was not well-formed or did not validate against our published schema
Error: The XML you provided was not well-formed or did not validate against our published schema
at Worker.<anonymous> (file:///app/dist/alexandria/server.js:1061:16)
at Worker.emit (node:events:507:28)
at MessagePort.<anonymous> (node:internal/worker:267:53)
at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
at MessagePort.callbackTrampoline (node:internal/async_hooks:130:17)
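The trace shows the Error being re-created in a Worker message handler on the main thread (server.js:1061), so only the message seems to reach the caller. A hypothetical helper (serializeS3Error) that copies the useful fields of an SDK v3 service exception into a plain object before it is posted back would make the failing request easier to trace: its name, which normally holds the S3 error code, and the request IDs on $metadata. Custom properties such as $metadata are assumed not to survive the worker boundary on their own.

```typescript
// Plain, cloneable summary of an AWS SDK v3 service error.
interface SerializedS3Error {
  name: string;
  message: string;
  httpStatusCode?: number;
  requestId?: string;
  extendedRequestId?: string;
}

// Hypothetical helper: flatten the error into plain data so it can be sent
// through postMessage (or JSON) without losing the request details.
function serializeS3Error(err: unknown): SerializedS3Error {
  if (!(err instanceof Error)) {
    return { name: "UnknownError", message: String(err) };
  }
  // SDK v3 service exceptions expose response metadata on `$metadata`.
  const metadata = (err as {
    $metadata?: { httpStatusCode?: number; requestId?: string; extendedRequestId?: string };
  }).$metadata;
  return {
    name: err.name,
    message: err.message,
    httpStatusCode: metadata?.httpStatusCode,
    requestId: metadata?.requestId,
    extendedRequestId: metadata?.extendedRequestId,
  };
}
```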
Expected Behavior
I expected the file to be uploaded in parts and assembled by S3.
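S3 assembles the object when CompleteMultipartUpload is called with the list of uploaded parts (a PartNumber and ETag for each), and that list must be in ascending part-number order. One commonly reported trigger for this exact MalformedXML message is that list arriving empty. The sketch below is a hypothetical pre-check for code that drives the multipart upload itself; UploadedPart and buildCompletedParts are illustrative names, and the 1..N numbering check is stricter than S3 requires, matching what a sequential uploader produces.

```typescript
// Illustrative shape of one uploaded part: the PartNumber sent with UploadPart
// and the ETag S3 returned for it.
interface UploadedPart {
  PartNumber: number;
  ETag: string;
}

// Hypothetical pre-check before CompleteMultipartUpload.
function buildCompletedParts(parts: UploadedPart[]): UploadedPart[] {
  if (parts.length === 0) {
    throw new Error("No parts were uploaded; the CompleteMultipartUpload part list would be empty");
  }
  // S3 requires the list in ascending PartNumber order.
  const sorted = [...parts].sort((a, b) => a.PartNumber - b.PartNumber);
  sorted.forEach((part, index) => {
    // Stricter than S3: a sequential uploader should produce parts 1..N with no gaps.
    if (part.PartNumber !== index + 1) {
      throw new Error(`Expected part ${index + 1} but found part ${part.PartNumber}`);
    }
    if (!part.ETag) {
      throw new Error(`Part ${part.PartNumber} has no ETag`);
    }
  });
  return sorted;
}
```

The returned list would go into MultipartUpload: { Parts } on the CompleteMultipartUploadCommand input.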
Possible Solution
No response
Additional Information/Context
No response