test(ingest): add unit tests for file, web, and feed ingestors #239
Merged
KaifAhmad1 merged 1 commit into Hawksight-AI:main on Jan 28, 2026
KaifAhmad1 added a commit that referenced this pull request on Jan 28, 2026
KaifAhmad1 added a commit that referenced this pull request on Jan 28, 2026
KaifAhmad1 added a commit that referenced this pull request on Feb 1, 2026
Description
This PR addresses issue #232 by adding comprehensive unit tests for the ingestion modules (semantica.ingest). It improves code coverage and ensures reliability for file, web, and feed ingestion strategies. The tests cover happy paths, edge cases, and error handling while mocking external dependencies to ensure fast and isolated execution.
Type of Change
Related Issues
Fixes #232
Changes Made
tests/ingest/test_file_ingestor.py: Covers local file scanning, recursion, filtering, and cloud storage (S3/GCS/Azure) logic.
tests/ingest/test_web_ingestor.py: Covers URL ingestion, sitemap crawling, metadata extraction, and robots.txt compliance.
tests/ingest/test_feed_ingestor.py: Covers RSS/Atom parsing, date normalization, and feed discovery.
Uses sys.modules patching to mock optional cloud dependencies (boto3, google-cloud-storage, azure-storage-blob) so the tests pass without those packages installed.
Testing
(python -m build)
Test Commands
Coverage Results:
Documentation
Breaking Changes
Breaking Changes: [No]
Checklist
Additional Notes
The tests use unittest.mock extensively to avoid making real network requests or requiring cloud credentials. sys.modules is patched in test_file_ingestor.py to handle environments where optional cloud SDKs are not installed.
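
For readers unfamiliar with the two mocking techniques mentioned above, here is a minimal sketch of how they can look. It is not the test code from this PR: list_s3_keys, fetch_text, and both run_* functions are hypothetical names made up for illustration, and it assumes the code under test imports boto3 lazily and fetches pages through urllib.request.urlopen.

```python
import sys
import urllib.request
from unittest import mock


def list_s3_keys(bucket):
    """Hypothetical helper: imports boto3 lazily, as an optional dependency."""
    import boto3  # resolved through sys.modules at call time

    client = boto3.client("s3")
    response = client.list_objects_v2(Bucket=bucket)
    return [item["Key"] for item in response.get("Contents", [])]


def fetch_text(url):
    """Hypothetical helper: downloads a URL and decodes the body as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def run_list_s3_keys_with_fake_boto3():
    # Build a stand-in module; boto3 itself does not need to be installed.
    fake_boto3 = mock.MagicMock()
    fake_boto3.client.return_value.list_objects_v2.return_value = {
        "Contents": [{"Key": "a.txt"}, {"Key": "b.txt"}]
    }
    # patch.dict restores sys.modules to its previous state on exit.
    with mock.patch.dict(sys.modules, {"boto3": fake_boto3}):
        keys = list_s3_keys("example-bucket")
    fake_boto3.client.assert_called_once_with("s3")
    return keys


def run_fetch_text_without_network():
    # The fake response doubles as its own context manager.
    fake_response = mock.MagicMock()
    fake_response.read.return_value = b"<html>ok</html>"
    fake_response.__enter__.return_value = fake_response
    with mock.patch(
        "urllib.request.urlopen", return_value=fake_response
    ) as fake_open:
        text = fetch_text("https://example.com/page")
    fake_open.assert_called_once_with("https://example.com/page")
    return text
```

Because patch.dict undoes its changes when the with block exits, the fake boto3 entry does not leak into other tests, and the same pattern works whether or not the real SDK is installed.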