@MUDASSIR-75

Description

This PR adds a new SupadataLoader to @langchain/community’s document loaders.

The loader is a thin wrapper around the official @supadata/js SDK and supports two Supadata operations:

  • transcript – fetches text transcripts for a URL (e.g. YouTube).
  • metadata – fetches structured metadata for a URL (YouTube via youtube.video, other URLs via web.scrape).

Implementation details:

  • Loader lives at: libs/langchain-community/src/document_loaders/web/supadata.ts.
  • Exposes a typed SupadataLoaderParams interface with:
    • urls (required array of URLs),
    • operation ("transcript" | "metadata", default "transcript"),
    • lang, text, mode,
    • params for arbitrary extra Supadata options.
  • API key resolution:
    • Uses the apiKey constructor param when provided.
    • Falls back to the SUPADATA_API_KEY environment variable via getEnvironmentVariable.
  • Returns LangChain Document instances with:
    • metadata.source set to the URL,
    • metadata.supadataOperation set to "transcript", "transcript_job", or "metadata",
    • for transcripts, pageContent is the transcript text (or a message indicating an async job with jobId).

Usage

import { SupadataLoader } from "@langchain/community/document_loaders/web/supadata";

const loader = new SupadataLoader({
  urls: ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  operation: "transcript",
  apiKey: process.env.SUPADATA_API_KEY,
  lang: "en",
  text: true,
  mode: "auto",
});

const docs = await loader.load();
console.log(docs[0].pageContent);

Metadata example:

const loader = new SupadataLoader({
  urls: ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  operation: "metadata",
  apiKey: process.env.SUPADATA_API_KEY,
});

const [doc] = await loader.load();
console.log(doc.metadata.supadataOperation); // "metadata"
console.log(doc.pageContent); // JSON string with Supadata metadata
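
Because the metadata operation returns pageContent as a JSON string, callers will usually want to parse it before use. The helper below is a hedged sketch: parseMetadataContent is a hypothetical name, and the title field in the usage comment is an assumption about what Supadata might return, not something this PR documents.

```typescript
// Hypothetical helper: parse a metadata document's pageContent.
// Returns null when the content is not a JSON object.
function parseMetadataContent(pageContent: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(pageContent);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    // Not valid JSON (for example, a plain-text transcript).
    return null;
  }
}

// Usage with the metadata example above (field name is illustrative only):
// const meta = parseMetadataContent(doc.pageContent);
// console.log(meta?.["title"]);
```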

Tests

From libs/langchain-community:

pnpm test -- supadata

This runs src/document_loaders/tests/supadata.test.ts, which:

  • Mocks @supadata/js and verifies the Supadata client is constructed with the correct apiKey.
  • Asserts that transcript responses are converted into Document objects with the expected pageContent.
  • Asserts that metadata responses are converted into Document objects with the expected pageContent and metadata.supadataOperation.
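
The mocking strategy in these tests can be illustrated without a test framework. The sketch below is not the PR's test file. FakeSupadata, loadWith, and the client interface are hypothetical, and the fake's transcript method only approximates the real SDK. It shows the two ideas being tested: the client receives the expected apiKey, and each response becomes a document.

```typescript
// Hypothetical, framework-free sketch of the mocking approach.
interface ClientConfig {
  apiKey: string;
}

interface TranscriptClient {
  transcript(req: { url: string }): Promise<{ content: string }>;
}

interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Records every config the fake client is constructed with.
const constructedWith: ClientConfig[] = [];

class FakeSupadata implements TranscriptClient {
  constructor(config: ClientConfig) {
    constructedWith.push(config);
  }

  async transcript(req: { url: string }): Promise<{ content: string }> {
    return { content: `transcript for ${req.url}` };
  }
}

// Simplified loader core that accepts the client constructor as an argument,
// so a fake can be swapped in for the real SDK class.
async function loadWith(
  Ctor: new (config: ClientConfig) => TranscriptClient,
  apiKey: string,
  urls: string[]
): Promise<DocLike[]> {
  const client = new Ctor({ apiKey });
  const docs: DocLike[] = [];
  for (const url of urls) {
    const result = await client.transcript({ url });
    docs.push({
      pageContent: result.content,
      metadata: { source: url, supadataOperation: "transcript" },
    });
  }
  return docs;
}
```

The real tests mock the @supadata/js module instead of injecting a constructor. Injection is used here only to keep the sketch self-contained.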

All tests are currently passing.

Issue

N/A – new integration.

Dependencies

  • Adds @supadata/js as a dev dependency for @langchain/community to support unit tests.
  • The loader dynamically imports @supadata/js at runtime (await import("@supadata/js")), so users must install @supadata/js in their own project to use this loader.
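
Since @supadata/js is resolved only at runtime, a missing install shows up as a failed dynamic import. One way to make that failure actionable is sketched below. importOptional is a hypothetical helper, and this PR does not say whether the loader wraps its import this way.

```typescript
// Hypothetical helper: run a dynamic import and replace a failure with an
// error that tells the user which package to install.
async function importOptional<T>(
  load: () => Promise<T>,
  packageName: string
): Promise<T> {
  try {
    return await load();
  } catch {
    throw new Error(
      `Failed to load ${packageName}. Install it in your project, e.g. "npm install ${packageName}".`
    );
  }
}

// Usage inside an async function (requires @supadata/js to be installed):
// const mod = await importOptional(() => import("@supadata/js"), "@supadata/js");
```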
@changeset-bot

changeset-bot bot commented Nov 29, 2025

⚠️ No Changeset found

Latest commit: 2d75c01

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions bot added the community and pkg:@langchain/community labels on Nov 29, 2025
@MUDASSIR-75
Author

Hi @hntrl, could you please help approve the workflows for this PR? Thank you!
