test(ingest): add unit tests for file, web, and feed ingestors #239

Merged
KaifAhmad1 merged 1 commit into Hawksight-AI:main from Mohammed2372:test/ingest-unit-tests
Jan 28, 2026

Conversation

@Mohammed2372
Contributor

Description

This PR addresses issue #232 by adding comprehensive unit tests for the ingestion modules (semantica.ingest). It improves code coverage and ensures reliability for file, web, and feed ingestion strategies. The tests cover happy paths, edge cases, and error handling while mocking external dependencies to ensure fast and isolated execution.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring

Related Issues

Fixes #232

Changes Made

  • Created tests/ingest/test_file_ingestor.py: Covers local file scanning, recursion, filtering, and cloud storage (S3/GCS/Azure) logic.
  • Created tests/ingest/test_web_ingestor.py: Covers URL ingestion, sitemap crawling, metadata extraction, and robots.txt compliance.
  • Created tests/ingest/test_feed_ingestor.py: Covers RSS/Atom parsing, date normalization, and feed discovery.
  • Implemented sys.modules patching to mock the optional cloud dependencies (boto3, google-cloud-storage, azure-storage-blob), so the tests pass even when those SDKs are not installed.
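
For readers unfamiliar with the sys.modules patching mentioned in the last item, here is a minimal sketch of the general technique. It is not the code from this PR: upload_with_boto3 and run_with_fake_boto3 are hypothetical names used only for illustration, and the sketch assumes the code under test imports boto3 lazily inside a function.

```python
import sys
from unittest.mock import MagicMock, patch


def upload_with_boto3(bucket: str, key: str, body: bytes) -> str:
    """Hypothetical helper that imports boto3 lazily (not semantica's API)."""
    import boto3  # resolved through sys.modules at call time

    client = boto3.client("s3")
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


def run_with_fake_boto3() -> MagicMock:
    """Run the helper against a fake boto3 and return the fake for inspection."""
    fake_boto3 = MagicMock(name="boto3")
    # patch.dict restores sys.modules on exit, so the fake does not leak
    # into other tests, whether or not the real boto3 is installed.
    with patch.dict(sys.modules, {"boto3": fake_boto3}):
        uri = upload_with_boto3("demo-bucket", "a.txt", b"hello")
    assert uri == "s3://demo-bucket/a.txt"
    return fake_boto3
```

Because the import statement consults sys.modules first, the fake module is returned even in an environment where the real SDK is missing, and the recorded calls on the fake can then be asserted.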

Testing

  • Tested locally
  • Added tests for new functionality
  • Package builds successfully (python -m build)

Test Commands

# Verify all tests pass
pytest tests/ingest/

# Check coverage
pytest --cov=semantica tests/ingest/

Coverage Results:

  • file_ingestor.py: 86%
  • web_ingestor.py: 86%
  • feed_ingestor.py: 80%

Documentation

  • Updated relevant documentation
  • Added code examples if applicable
  • Updated API reference if adding new APIs
  • Updated cookbook if adding new examples
  • No documentation changes needed

Breaking Changes

Breaking Changes: No

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • Package builds successfully

Additional Notes

The tests use unittest.mock extensively to avoid making real network requests or requiring cloud credentials. sys.modules is patched in test_file_ingestor.py to handle environments where optional cloud SDKs are not installed.
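
To illustrate how unittest.mock can keep a test off the network, here is a small sketch under stated assumptions. fetch_text is a hypothetical stand-in for an ingestor's fetch step, not a function from semantica.ingest, and the sketch assumes the fetch goes through urllib.request.urlopen; the actual tests may patch a different HTTP client.

```python
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_text(url: str) -> str:
    """Hypothetical fetch step (not semantica's API)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def test_fetch_text_without_network() -> None:
    fake_response = MagicMock()
    fake_response.read.return_value = b"<html><title>Example</title></html>"
    # urlopen is used as a context manager, so __enter__ must hand back
    # the fake response object itself.
    fake_response.__enter__.return_value = fake_response

    # While the patch is active, no real HTTP request is made.
    with patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        text = fetch_text("https://example.com/page")

    assert "<title>Example</title>" in text
    fake_urlopen.assert_called_once_with("https://example.com/page", timeout=10)
```

Patching at the point of lookup (here the urllib.request module attribute) is what makes the replacement take effect; a client imported under a different name would need the patch target adjusted accordingly.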

KaifAhmad1 merged commit 15b32f4 into Hawksight-AI:main on Jan 28, 2026
1 check passed
Mohammed2372 deleted the test/ingest-unit-tests branch on January 28, 2026 at 11:11
KaifAhmad1 added a commit that referenced this pull request on Feb 1, 2026