S3 Upload Fails with "The XML you provided was not well-formed or did not validate against our published schema" #6849

Closed
@arthikek

Description

Describe the bug

When uploading a large file (~500 MB) to S3 using the @aws-sdk/lib-storage package, the upload fails with the error:

The XML you provided was not well-formed or did not validate against our published schema

The failure appears to occur during a multipart upload, specifically when a Readable stream is used as the Body. Smaller files upload successfully, so the problem seems related to how the provided stream interacts with the configured partSize. When I set the chunk size (partSize) above 500 MB, the upload works without any problem.
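To make the part sizes in this report concrete, here is a minimal sketch of the part arithmetic for a ~500 MB body. `planParts` is a hypothetical helper written for this illustration and is not part of @aws-sdk/lib-storage. The limits in the comments are the general S3 multipart limits as documented by AWS; whether they are related to this error is not established here.

```typescript
// Documented S3 multipart limits: each part is 5 MiB to 5 GiB (the last
// part may be smaller), and one upload may contain at most 10,000 parts.
const MIN_PART_SIZE = 5 * 1024 * 1024;
const MAX_PARTS = 10000;
const MiB = 1024 * 1024;

interface PartPlan {
  partCount: number;     // number of parts the body is split into
  lastPartSize: number;  // size in bytes of the final part
  withinLimits: boolean; // true if part size and part count respect the limits above
}

// Hypothetical helper: computes how a body of totalBytes splits at partSize.
function planParts(totalBytes: number, partSize: number): PartPlan {
  const partCount = Math.max(1, Math.ceil(totalBytes / partSize));
  const lastPartSize = totalBytes - (partCount - 1) * partSize;
  const withinLimits = partSize >= MIN_PART_SIZE && partCount <= MAX_PARTS;
  return { partCount, lastPartSize, withinLimits };
}

console.log(planParts(500 * MiB, 5 * MiB));   // partSize from the reproduction steps
console.log(planParts(500 * MiB, 50 * MiB));  // partSize from the application code
console.log(planParts(500 * MiB, 600 * MiB)); // partSize larger than the file
```

With a 500 MiB body, a 5 MiB part size gives 100 parts, a 50 MiB part size gives 10 parts, and any part size above the file size gives a single part.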

/**
 * Upload a file to S3 as a multipart upload (50 MB parts) with tagging.
 */
export async function uploadFileToS3(
  bucket: string,
  file: FileDTO,
  tags?: S3UploadTags,
) {
  const functionName = "uploadFileToS3";
  const s3Client = S3CLIENT_NEW;
  log.info(`[${functionName}] Uploading file to S3: ${file.displayName}`);

  const stream: ReadableStream<Uint8Array> = await getPDFFromExternalStorage(file.url, file.uuid);

  const finalTags: S3UploadTags = tags || {
    ConversionStatus: ConversionStatusEnum.PENDING,
    OCRStatus: OCRStatusEnum.SKIPPED,
  };

  const tagSet = Object.entries(finalTags).map(([Key, Value]) => ({
    Key,
    Value,
  }));


  console.log("Uploading file to S3 with file size: ", file.size);

  if (!stream) {
    throw new Error("Stream is empty");
  }

  const uploadParamsPDF = {
    Bucket: bucket,
    Key: file.uuid,
    Body: stream,
    ContentType: file.contentType,
    ContentLength: file.size,
    Metadata: {
      title: file.displayName,
    },
    Tagging: tagSet.map((tag) => `${tag.Key}=${tag.Value}`).join("&"),
  };

  const uploadToS3 = new Upload({
    client: s3Client,
    params: uploadParamsPDF,
    queueSize: 4,
    partSize: 50 * 1024 * 1024 // 50MB
  });


  uploadToS3.on("httpUploadProgress", (progress) => {
    log.debug(`Progress event: ${progress.loaded}/${progress.total} bytes`);
  });

  uploadToS3.addListener("error", (err) => {
    log.error(`Upload error: ${err}`);
  });

  console.log("Uploading file to S3");
  const result = await uploadToS3.done();


}
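The `Tagging` parameter in the function above is built by joining `Key=Value` pairs with `&` and no escaping. S3 documents this parameter as URL query-encoded, so a hedged sketch of an encoding helper follows. `toTaggingString` is a hypothetical name introduced here, and nothing in this report shows that tag encoding is involved in the failure (the reproduction steps do not set `Tagging` at all).

```typescript
// Hypothetical helper: builds a URL query-encoded tag string from a
// plain object of tag keys and values.
function toTaggingString(tags: Record<string, string>): string {
  return Object.entries(tags)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

// Values without reserved characters are unchanged; others are escaped.
console.log(toTaggingString({ ConversionStatus: "PENDING", Note: "a b&c" }));
```

For tag values such as the enum strings used in the function above, the output is the same as the plain concatenation; the helper only matters if a key or value contains characters like `&`, `=`, or spaces.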

SDK version number

@aws-sdk/lib-storage@3.474.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v23.5.0

Reproduction Steps

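import { Readable } from "node:stream";
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const s3Client = new S3Client({ region: "us-east-1" });
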
export async function uploadFileToS3(bucket, file) {
  // Simulate a Readable stream (replace this with your actual stream source)
  const stream = new Readable({
    read() {
      this.push(Buffer.alloc(1024 * 1024 * 500)); 
      this.push(null); 
    },
  });

  if (!stream) {
    throw new Error("Stream is empty");
  }

  const uploadParams = {
    Bucket: bucket,
    Key: file.uuid,
    Body: stream, // The stream to upload
    ContentType: "application/pdf", // Simplified content type
    ContentLength: file.size, // Expected file size
  };

  const upload = new Upload({
    client: s3Client,
    params: uploadParams,
    queueSize: 4,
    partSize: 5 * 1024 * 1024, 
  });

  upload.on("httpUploadProgress", (progress) => {
    console.log(`Progress: ${progress.loaded}/${progress.total} bytes`);
  });

  try {
    console.log("Uploading file to S3...");
    const result = await upload.done();
    console.log("Upload complete:", result);
  } catch (err) {
    console.error("Upload error:", err);
  }
}

Observed Behavior

[handleConversion] pUCV6phH120ZruXxV3uzqXzBvQgWHwnGcVK5hrGm: Error during conversion process The XML you provided was not well-formed or did not validate against our published schema
Error: The XML you provided was not well-formed or did not validate against our published schema
    at Worker.<anonymous> (file:///app/dist/alexandria/server.js:1061:16)
    at Worker.emit (node:events:507:28)
    at MessagePort.<anonymous> (node:internal/worker:267:53)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
    at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
    at MessagePort.callbackTrampoline (node:internal/async_hooks:130:17)

Expected Behavior

I expected the file to be uploaded in parts and assembled by S3.
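As an illustration of the expected behavior, the sketch below splits incoming chunks into fixed-size parts, with only the final part allowed to be smaller. `splitIntoParts` is a hypothetical helper and is not the implementation used by @aws-sdk/lib-storage. It works on a synchronous iterable to stay short; a real stream would be consumed with `for await` instead.

```typescript
// Hypothetical helper: regroups arbitrarily sized chunks into parts of
// exactly partSize bytes, followed by one smaller final part if needed.
function* splitIntoParts(
  chunks: Iterable<Uint8Array>,
  partSize: number,
): Generator<Uint8Array> {
  if (partSize <= 0) throw new RangeError("partSize must be positive");
  let part = new Uint8Array(partSize);
  let filled = 0; // bytes written into the current part so far

  for (const chunk of chunks) {
    let offset = 0; // bytes of this chunk already consumed
    while (offset < chunk.length) {
      const take = Math.min(partSize - filled, chunk.length - offset);
      part.set(chunk.subarray(offset, offset + take), filled);
      filled += take;
      offset += take;
      if (filled === partSize) {
        yield part; // a full part is ready
        part = new Uint8Array(partSize);
        filled = 0;
      }
    }
  }
  if (filled > 0) yield part.subarray(0, filled); // trailing partial part
}

// 13 bytes arriving as chunks of 12 and 1, regrouped into parts of 5.
const partSizes = Array.from(
  splitIntoParts([new Uint8Array(12), new Uint8Array(1)], 5),
  (p) => p.length,
);
console.log(partSizes);
```

Here the 13 input bytes come out as parts of 5, 5, and 3 bytes, regardless of how the source delivered them.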

Possible Solution

No response

Additional Information/Context

No response

Metadata

Labels

bug (This issue is a bug.), closed-for-staleness, p2 (This is a standard priority issue)
