feat(document_loaders): add SupadataLoader #9516
Open
+282
−0
Description
This PR adds a new `SupadataLoader` to `@langchain/community`’s document loaders.
The loader is a thin wrapper around the official `@supadata/js` SDK and supports two Supadata operations:
- `transcript`: fetches text transcripts for a URL (e.g. YouTube).
- `metadata`: fetches structured metadata for a URL (YouTube via `youtube.video`, other URLs via `web.scrape`).
Implementation details:
- Source file: `libs/langchain-community/src/document_loaders/web/supadata.ts`.
- `SupadataLoaderParams` interface with:
  - `urls` (required array of URLs),
  - `operation` (`"transcript"` | `"metadata"`, default `"transcript"`),
  - `lang`, `text`, `mode`,
  - `params` for arbitrary extra Supadata options.
- API key resolution:
  - uses the `apiKey` constructor param when provided,
  - otherwise falls back to the `SUPADATA_API_KEY` environment variable via `getEnvironmentVariable`.
- Returns `Document` instances with:
  - `metadata.source` set to the URL,
  - `metadata.supadataOperation` set to `"transcript"`, `"transcript_job"`, or `"metadata"`,
  - `pageContent` set to the transcript text (or a message indicating an async job with `jobId`).
Usage
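A minimal sketch of the transcript path described above, not the PR's actual code. The `Doc` type is a hypothetical stand-in for LangChain's `Document`, the field types in `SupadataLoaderParams` are assumptions beyond the names listed in the description, and the response shape (`content` vs. `jobId`), the job message wording, and the error thrown for a missing key are all illustrative guesses.

```typescript
// Hypothetical stand-in for LangChain's Document so the sketch runs on its own.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Field names follow the PR description; the types are assumptions.
interface SupadataLoaderParams {
  urls: string[];
  apiKey?: string;
  operation?: "transcript" | "metadata";
  lang?: string;
  text?: boolean;
  mode?: string;
  params?: Record<string, unknown>;
}

// An explicit apiKey wins; otherwise fall back to SUPADATA_API_KEY.
// The env map is passed in here instead of calling getEnvironmentVariable.
function resolveApiKey(
  params: SupadataLoaderParams,
  env: Record<string, string | undefined>
): string {
  const key = params.apiKey ?? env.SUPADATA_API_KEY;
  if (!key) {
    // Assumed behaviour: fail loudly when no key is available.
    throw new Error("Supadata API key missing: pass apiKey or set SUPADATA_API_KEY.");
  }
  return key;
}

// Assumed response shapes: a finished transcript or an async job handle.
type TranscriptResult = { content: string } | { jobId: string };

function transcriptToDoc(url: string, result: TranscriptResult): Doc {
  if ("jobId" in result) {
    // Async job: pageContent carries a message that includes the jobId.
    return {
      pageContent: `Transcript is being processed (jobId: ${result.jobId})`,
      metadata: { source: url, supadataOperation: "transcript_job" },
    };
  }
  return {
    pageContent: result.content,
    metadata: { source: url, supadataOperation: "transcript" },
  };
}
```

Passing the environment in as a plain map keeps the sketch free of runtime-specific globals; the loader itself reads the variable through `getEnvironmentVariable`.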
Metadata example:
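As a rough illustration of the `metadata` operation, the sketch below routes a URL to one of the two endpoints named in the description and wraps the result in a document. The URL check, the helper names (`pickMetadataEndpoint`, `metadataToDoc`), and the choice to serialise the metadata as JSON in `pageContent` are assumptions; the PR may do any of these differently.

```typescript
// Hypothetical stand-in for LangChain's Document.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type MetadataEndpoint = "youtube.video" | "web.scrape";

// Assumed heuristic: treat youtube.com and youtu.be hosts as YouTube URLs.
function pickMetadataEndpoint(url: string): MetadataEndpoint {
  const isYouTube = /^https?:\/\/(www\.|m\.)?(youtube\.com|youtu\.be)\//i.test(url);
  return isYouTube ? "youtube.video" : "web.scrape";
}

// Assumed mapping: structured metadata is serialised to JSON for pageContent.
function metadataToDoc(url: string, data: Record<string, unknown>): Doc {
  return {
    pageContent: JSON.stringify(data, null, 2),
    metadata: { source: url, supadataOperation: "metadata" },
  };
}

// Example with made-up metadata for a non-YouTube page.
const endpoint = pickMetadataEndpoint("https://example.com/post");
const doc = metadataToDoc("https://example.com/post", { title: "Example post" });
```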
Tests
From `libs/langchain-community`, run `pnpm test -- supadata`.
This runs `src/document_loaders/tests/supadata.test.ts`, which:
- mocks `@supadata/js` and verifies the Supadata client is constructed with the correct `apiKey`,
- verifies that `transcript` responses are converted into `Document` objects with the expected `pageContent`,
- verifies that `metadata` responses are converted into `Document` objects with the expected `pageContent` and `metadata.supadataOperation`.
All tests are currently passing.
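The checks listed above can be sketched without a test framework by swapping in a hand-rolled fake client. `FakeSupadata`, `loadTranscripts`, and the `content` field are hypothetical names for illustration only; the actual test file mocks the `@supadata/js` module instead.

```typescript
// Hypothetical fake standing in for the mocked @supadata/js client class.
class FakeSupadata {
  // Records every config the constructor receives so it can be inspected later.
  static configs: Array<{ apiKey: string }> = [];

  constructor(config: { apiKey: string }) {
    FakeSupadata.configs.push(config);
  }

  // Returns a canned transcript; the `content` field name is an assumption.
  transcript(_args: { url: string }): { content: string } {
    return { content: "hello world" };
  }
}

// Simplified, synchronous stand-in for the loader's load path.
function loadTranscripts(urls: string[], apiKey: string) {
  const client = new FakeSupadata({ apiKey });
  return urls.map((url) => ({
    pageContent: client.transcript({ url }).content,
    metadata: { source: url, supadataOperation: "transcript" },
  }));
}

const docs = loadTranscripts(["https://youtu.be/abc"], "test-key");

// The two properties the real tests are described as checking:
// the client saw the right apiKey, and the transcript became pageContent.
const sawCorrectKey = FakeSupadata.configs[0].apiKey === "test-key";
const contentMatches = docs[0].pageContent === "hello world";
```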
Issue
N/A – new integration.
Dependencies
- Adds `@supadata/js` as a dev dependency for `@langchain/community` to support unit tests.
- The loader imports `@supadata/js` dynamically at runtime (`await import("@supadata/js")`), so users must install `@supadata/js` in their own project to use this loader.
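One common way to handle an optional runtime dependency like this is to wrap the dynamic import and rethrow with an install hint. The sketch below assumes that pattern; `loadOptional` and `missingDependencyMessage` are hypothetical helpers, and the PR's actual error handling is not shown in the description.

```typescript
// Builds the hint shown when an optional package is not installed.
function missingDependencyMessage(pkg: string): string {
  return `Failed to load ${pkg}. Please install it in your project to use this loader.`;
}

// Runs the supplied dynamic import and converts any failure into a clearer error.
// The importer is injected so the sketch compiles without the package present.
async function loadOptional<T>(pkg: string, importer: () => Promise<T>): Promise<T> {
  try {
    return await importer();
  } catch {
    throw new Error(missingDependencyMessage(pkg));
  }
}

// Intended use inside a loader (commented out so the sketch has no dependency):
// const sdk = await loadOptional("@supadata/js", () => import("@supadata/js"));
```

Injecting the importer also makes the failure path easy to exercise in tests by passing a function that rejects.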