
Releases: Hawksight-AI/semantica

v0.3.0

10 Mar 22:07


🧠 Semantica v0.3.0 — First Stable Release

Released: 2026-03-10  |  PyPI: pip install semantica  |  Python: 3.8 – 3.12  |  License: MIT

The first Production/Stable release of Semantica — an open-source framework for building context graphs and decision intelligence layers for AI agents. This release consolidates everything shipped across three stages: 0.3.0-alpha (2026-02-19), 0.3.0-beta (2026-03-07), and 0.3.0 stable (2026-03-10).

pip install --upgrade semantica

No breaking changes. All new parameters carry safe defaults. All new methods are purely additive.


🚦 Release Highlights

  • 🕐 Temporal Validity — valid_from/valid_until on nodes & edges; query what's active at any point in time
  • 🔗 Cross-Graph Navigation — link separate ContextGraph instances; navigate across them; survives save/load
  • ⚖️ Weighted BFS Traversal — filter multi-hop queries by edge confidence with min_weight
  • 🧠 Decision Intelligence — full lifecycle: record → causal chain → impact analysis → precedent search → policy enforcement
  • 🔄 Delta Processing — SPARQL-based incremental graph diffs; only changed data flows through the pipeline
  • 🗃️ Deduplication v2 — 6.98x faster semantic dedup, 63.6% faster candidate generation
  • 📤 New Export Formats — ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks ready)
  • 🗄️ Graph Backends — Apache AGE, PgVector, AWS Neptune, FalkorDB
  • 886+ tests passing — 0 failures

👥 Contributors

| Contributor | Areas |
| --- | --- |
| @KaifAhmad1 | Lead maintainer — context graph, decision intelligence, KG algorithms, semantic extraction, pipeline, provenance, bug fixes, release management |
| @ZohaibHassan16 | Deduplication v2 suite, incremental/delta processing, benchmark suite |
| @Sameer6305 | Apache AGE backend, PgVector store, Snowflake connector, Apache Arrow export |
| @tibisabau | ArangoDB AQL export, Apache Parquet export |
| @d4ndr4d3 | ResourceScheduler deadlock fix |

✨ v0.3.0 Stable — Context Graph Feature Completeness

Shipped 2026-03-10 · All changes by @KaifAhmad1

🕐 Temporal Validity Windows

Nodes and edges now carry first-class valid_from / valid_until ISO datetime fields — stored directly on the ContextNode and ContextEdge dataclasses, not buried in metadata.

New API:

  • add_node(valid_from=..., valid_until=...) and add_edge(valid_from=..., valid_until=...) — set validity window at creation
  • node.is_active(at_time=None) and edge.is_active(at_time=None) — returns True if live at the given time (defaults to now)
  • graph.find_active_nodes(node_type=None, at_time=None) — filters entire graph to active nodes only

Bug fixes:

  • is_active() crashed with TypeError on tz-aware datetime inputs — fixed by normalising to tz-naive UTC via new _parse_iso_dt() helper
  • Validity fields silently lost during serialisation — fixed across all four paths: add_nodes(), add_edges(), to_dict(), from_dict()
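The tz-aware datetime fix can be sketched as follows. This is an illustrative reconstruction of the normalisation approach (the helper name `_parse_iso_dt` comes from the notes; the exact implementation is assumed):

```python
from datetime import datetime, timezone

def parse_iso_dt(value: str) -> datetime:
    """Parse an ISO datetime string and normalise to tz-naive UTC.

    Sketch of the fix described above: tz-aware inputs are converted
    to UTC and stripped of tzinfo, so comparisons against tz-naive
    timestamps no longer raise TypeError.
    """
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt
```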

🔗 Cross-Graph Navigation

Separate ContextGraph instances can now be linked and navigated between. Links are fully durable — they survive save_to_file() / load_from_file() and reconnect via a registry.

New API:

  • graph.graph_id — stable UUID assigned at init; persisted to JSON
  • link_graph(other_graph, source_node_id, target_node_id) — creates a navigable bridge; returns link_id
  • navigate_to(link_id) — returns (other_graph, target_node_id)
  • resolve_links({graph_id: instance}) — reconnects links after load; returns count resolved
  • save_to_file() — now writes a links section alongside nodes and edges
  • load_from_file() — restores graph_id and populates _unresolved_links
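The link/navigate/resolve lifecycle can be illustrated with a toy stand-in. The class below is not the real ContextGraph; it only mirrors the id-based indirection that lets links survive save/load:

```python
import uuid

class MiniGraph:
    """Toy stand-in illustrating the link -> navigate -> resolve flow."""

    def __init__(self):
        self.graph_id = str(uuid.uuid4())  # stable id, persisted to JSON
        self._links = {}                   # link_id -> (target_graph_id, target_node_id)
        self._registry = {}                # graph_id -> live instance

    def link_graph(self, other, target_node_id):
        # Store only ids, so the link survives serialisation; the live
        # instance is reattached later via resolve_links().
        link_id = str(uuid.uuid4())
        self._links[link_id] = (other.graph_id, target_node_id)
        self._registry[other.graph_id] = other
        return link_id

    def resolve_links(self, registry):
        # After loading from disk, reconnect graph ids to live instances.
        resolved = 0
        for gid in {gid for gid, _ in self._links.values()}:
            if gid in registry:
                self._registry[gid] = registry[gid]
                resolved += 1
        return resolved

    def navigate_to(self, link_id):
        gid, node_id = self._links[link_id]
        return self._registry[gid], node_id
```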

Bug fix: Previous implementation auto-created marker targets as phantom "entity" nodes — fixed by pre-creating a "cross_graph_link" typed ContextNode before inserting the marker edge.

14 new tests in tests/context/test_cross_graph_navigation.py covering link creation, phantom-node prevention, partial registry resolution, and full save/load round-trips.


⚖️ Weighted Multi-Hop BFS Traversal

get_neighbors() now accepts a min_weight threshold to confine traversal to high-confidence causal links only. Default 0.0 passes all edges — fully backward-compatible.
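The threshold semantics can be sketched with a plain BFS over an adjacency map. This is an illustrative mirror of the min_weight behaviour, not the library's internal traversal code:

```python
from collections import deque

def weighted_neighbors(adj, start, max_hops=2, min_weight=0.0):
    """Multi-hop BFS that only follows edges whose weight meets the
    threshold. `adj` maps node -> list of (neighbor, weight) pairs;
    min_weight=0.0 passes every edge, matching the backward-compatible
    default described above."""
    seen, frontier, result = {start}, deque([(start, 0)]), []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr, w in adj.get(node, []):
            if w < min_weight or nbr in seen:
                continue
            seen.add(nbr)
            result.append(nbr)
            frontier.append((nbr, depth + 1))
    return result
```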


🔧 Additional Fixes in v0.3.0 Stable

  • PipelineBuilder.add_step() return type annotation corrected from "PipelineBuilder" to "PipelineStep"
  • test_hybrid_search_performance fixed to accumulate a true search_times list; threshold relaxed to < 5.0s for real sentence-transformers latency

🔧 v0.3.0-beta — Semantic Extraction, Deduplication v2, New Export Formats

Shipped 2026-03-07

🧩 Semantic Extraction Fixes — @KaifAhmad1 (PR #354, #355)

LLM Relation Extraction:

  • Unmatched subjects/objects now produce a synthetic UNKNOWN entity instead of silently dropping the relation
  • Removed an orphaned legacy block in _parse_relation_result that appended every relation twice
  • extraction_method parameter added — typed extraction paths now record "llm_typed" instead of "llm"

Reasoner Pattern Matching:

  • _match_pattern in reasoner.py fully rewritten — splits patterns on ?var placeholders, escapes only literal segments, uses backreferences for repeated variables and non-greedy .+? to prevent over-consumption

RDF Export Aliases:

  • RDFExporter now accepts "ttl", "nt", "xml", "rdf", and "json-ld" as format aliases — zero API changes

Tests added: tests/reasoning/test_reasoner.py (4 tests), tests/semantic_extract/test_relation_extractor.py (6 tests), tests/export/test_rdf_exporter.py (8 tests)


🔄 Incremental / Delta Processing — @ZohaibHassan16, @KaifAhmad1 (PR #349)

  • Native SPARQL-based diff between graph snapshots — only changed triples enter the pipeline
  • delta_mode flag in PipelineBuilder for near-real-time incremental workloads
  • Version snapshot management with graph URI tracking and per-snapshot metadata storage
  • prune_versions() for automatic retention cleanup of old snapshots
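Conceptually, the delta step is a set difference between snapshots. The sketch below shows the idea in Python sets for clarity; the actual implementation computes this natively in SPARQL between graph snapshots:

```python
def triple_delta(old_snapshot, new_snapshot):
    """Compute added/removed triples between two snapshots, so only
    changed data needs to flow through the pipeline."""
    old, new = set(old_snapshot), set(new_snapshot)
    return {"added": new - old, "removed": old - new}
```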

Bug fixes: corrected SPARQL variable order, fixed class references, resolved duplicate dictionary keys.


🗃️ Deduplication v2 Suite — @ZohaibHassan16, @KaifAhmad1 (PR #338, #339, #340, #344)

Three independently opt-in tiers — legacy mode remains the default, fully backward-compatible.

Candidate Generation v2 (PR #338):

  • New blocking_v2 and hybrid_v2 strategies replace O(N²) pair enumeration
  • Multi-key blocking with normalised token prefixes, type-aware keys, and optional phonetic (Soundex) matching
  • Deterministic max_candidates_per_entity budgeting with stable sorting
  • 63.6% faster in worst-case scenarios (0.259s → 0.094s for 100 entities)

Two-Stage Scoring Prefilter (PR #339):

  • Fast gates for type mismatch, name-length ratio, and token overlap eliminate expensive semantic scoring for obvious non-matches
  • Configurable thresholds: min_length_ratio, min_token_overlap_ratio, required_shared_token
  • 18–25% faster batch processing when enabled (prefilter_enabled=False by default)
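The two-stage idea is to run cheap gates before any expensive semantic scoring. The sketch below uses the threshold names from the notes, but the concrete gate logic is an assumption for illustration:

```python
def passes_prefilter(a, b, min_length_ratio=0.5, min_token_overlap_ratio=0.2):
    """Cheap gates that eliminate obvious non-matches before semantic
    scoring: type mismatch, name-length ratio, and token overlap."""
    if a["type"] != b["type"]:                        # type mismatch gate
        return False
    la, lb = len(a["name"]), len(b["name"])
    if min(la, lb) / max(la, lb) < min_length_ratio:  # name-length ratio gate
        return False
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    overlap = len(ta & tb) / max(len(ta | tb), 1)     # token overlap gate
    return overlap >= min_token_overlap_ratio
```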

Semantic Relationship Deduplication v2 (PR #340):

  • Canonicalisation engine with predicate synonym mapping (e.g. works_for → employed_by)
  • O(1) hash matching for exact canonical signatures before any semantic scoring
  • Weighted scoring: 60% predicate + 40% object with explainable semantic_match_score in metadata
  • 6.98x faster than legacy mode (83ms vs 579ms)
  • dedup_triplets() infinite recursion bug fixed; promoted to first-class API in methods.py
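The canonical-signature pre-check can be sketched as follows. The synonym map and normalisation rules here are illustrative, not the library's actual tables:

```python
SYNONYMS = {"works_for": "employed_by"}  # illustrative synonym map

seen = {}

def canonical_signature(triplet):
    """Canonicalise a (subject, predicate, object) triplet so exact
    duplicates hash to the same key."""
    s, p, o = triplet
    p = SYNONYMS.get(p.lower(), p.lower())
    return (s.lower().strip(), p, o.lower().strip())

def is_duplicate(triplet):
    """O(1) hash lookup on the canonical signature, run before any
    semantic scoring is attempted."""
    sig = canonical_signature(triplet)
    if sig in seen:
        return True
    seen[sig] = triplet
    return False
```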

Migration guide: MIGRATION_V2.md with complete examples for all v2 strategies (PR #344)


📤 New Export Formats — @tibisabau (PR #342, #343)

ArangoDB AQL Export (PR #342):

  • Full AQL INSERT statement generation for vertices and edges
  • Configurable collection names with validation and sanitisation; batch processing (default: 1000)
  • export_arango() convenience function; .aql auto-detection in the unified exporter
  • 17 tests — 100% pass rate

Apache Parquet Export (PR #343):

  • Columnar storage with configurable compression: snappy, gzip, brotli, zstd, lz4, none
  • Explicit Apache Arrow schemas with type safety and field normalisation
  • Analytics-ready: pandas, Spark, Snowflake, BigQuery, Databricks
  • export_parquet() convenience function; .parquet auto-detection
  • 25 tests — 100% pass rate

🐛 Beta Bug Fixes — @KaifAhmad1

Context module:

  • retrieve_decision_precedents — entity extraction correctly gated on use_hybrid_search=True
  • _extract_entities_from_query — switched to word[0].isupper() to capture mixed-case identifiers like CreditCard
  • Added missing expand_context() (BFS traversal) and _get_decision_query() methods
  • Fixed hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly for correct single-pass BFS
  • Fixed _retrieve_from_vector fallback to prevent empty content and negative similarity scores

KG module:

  • calculate_pagerank — added alpha/max_iter aliases; return format structured to {"centrality": scores, "rankings": sorted_list}
  • community_detector._to_networkx — fixed silent edge-loss when a NetworkX graph is passed directly
  • Added 9 domain-specific tracking methods to AlgorithmTrackerWithProvenance
  • Created provenance_tracker.py with ProvenanceTracker; correctly exported from semantica.kg

Pipeline module:

  • Retry loop fixed — now correctly iterate...

v0.3.0-beta

07 Mar 11:28


Pre-release

Semantica v0.3.0-beta — Release Notes

Date: 2026-03-07 | Tag: v0.3.0-beta | Status: Internal Beta (Pre-release)

Consolidates all alpha and unreleased features for internal validation ahead of the public 0.3.0 launch.


What's New

Semantic Extraction & Reasoning

  • Multi-Founder LLM Extraction Fix (#354) — Unmatched relation subjects/objects now produce synthetic UNKNOWN entities instead of being silently dropped; all LLM-returned co-founders preserved
  • Reasoner Pattern Matching Rewrite (#354) — _match_pattern correctly handles multi-word values, pre-bound variables, repeated variable backreferences, and non-greedy separators

Export

  • RDF / TTL Alias Fix (#355) — format="ttl", "nt", "xml", "rdf", "json-ld" all resolve without breaking existing callers
  • ArangoDB AQL Export (#342) — Full AQL INSERT generation for vertices and edges; configurable batching; 17 tests passing
  • Apache Parquet Export (#343) — Columnar storage with configurable compression (snappy, gzip, brotli, zstd, lz4); explicit Arrow schemas; 25 tests passing

Deduplication v2 (Epic #333)

  • Candidate Generation v2 (#338) — blocking_v2 / hybrid_v2 strategies with multi-key and phonetic blocking; 63.6% faster worst-case
  • Two-Stage Scoring Prefilter (#339) — Fast prefilter gates before expensive semantic scoring; 18–25% faster batch processing
  • Semantic Deduplication v2 (#340) — Opt-in semantic_v2 with canonicalization, O(1) hash matching, weighted scoring; 6.98x speedup; fixed infinite recursion bug
  • Migration Guide (#344) — MIGRATION_V2.md with full examples; 5.86x speedup confirmed; backward compatible

Incremental / Delta Processing

  • Delta Processing (#349) — Native SPARQL delta computation between graph snapshots; delta_mode pipeline config; prune_versions() for snapshot retention; production-ready for near real-time pipelines

Bug Fixes

  • NameError — missing Type import in utils/helpers.py; removed unused import from config_manager.py
  • Context module — fixed retrieve_decision_precedents, hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly, _retrieve_from_vector, _extract_entities_from_query; added missing expand_context and _get_decision_query methods
  • Knowledge Graph module — fixed calculate_pagerank, community_detector._to_networkx, detect_communities, _build_adjacency; added ProvenanceTracker and 9 domain-specific tracking methods
  • Pipeline module — fixed retry loop in execution_engine; added RecoveryAction with LINEAR / EXPONENTIAL / FIXED backoff; fixed add_step return value; added validate alias
  • Test files — replaced emoji with ASCII for Windows cp1252 compatibility; fixed assertion ordering and loop bugs across 4 test files

Test Results

| Passing | Skipped (external services) | Failed |
| --- | --- | --- |
| ~840 | 36 | 0 |

Contributors

@KaifAhmad1 · @ZohaibHassan16 · @tibisabau

v0.3.0-alpha

19 Feb 18:46


Pre-release

🎉 Semantica v0.3.0-alpha Release

This alpha release introduces comprehensive decision tracking capabilities, advanced knowledge graph algorithms, and production-ready architecture for testing.

🚀 Major Features

Decision Tracking System

  • Complete decision lifecycle management with audit trails
  • Provenance tracking and lineage management
  • Policy compliance and exception handling
  • Decision influence analysis and impact scoring

Advanced Knowledge Graph Algorithms

  • Node2Vec embeddings for semantic similarity
  • Centrality analysis (degree, betweenness, closeness, eigenvector)
  • Community detection and graph analytics
  • Path finding and link prediction

Enhanced Context Module

  • Unified AgentContext with granular feature flags
  • Decision tracking integration
  • Production-ready architecture with validation
  • GraphStore capability validation

Vector Store Features

  • Hybrid search combining semantic, structural, and category similarity
  • Advanced retrieval with configurable weights
  • FastEmbed integration for efficient operations

🧪 Testing & Quality

  • 113+ tests passing across context and core modules
  • Comprehensive decision tracking test coverage
  • Enhanced error handling and edge case testing
  • Fixed all critical test failures for release readiness

📦 Installation

pip install semantica==0.3.0a0

Semantica 0.2.7

09 Feb 07:26


Overview

Release 0.2.7 adds Snowflake integration, Apache Arrow export, and benchmark suite.

🚀 New Features

Snowflake Connector for Data Ingestion

PR #276 by @Sameer6305

Native Snowflake connector with multi-authentication support (password, OAuth, key-pair, SSO). Includes table/query ingestion, schema introspection, and SQL injection prevention.

Tests: 24/24 passing
Optional dependency: db-snowflake

Apache Arrow Export Support

PR #273 by @Sameer6305

High-performance columnar export with explicit schemas, compression, and Pandas/DuckDB compatibility.

Tests: 20/20 passing
Optional dependency: db-arrow

Comprehensive Benchmark Suite

PR #289 by @ZohaibHassan16, @KaifAhmad1

137+ benchmarks across all modules with regression detection and CI/CD integration.

Features: Statistical analysis, environment-agnostic design, CLI tool

📊 Quality Assurance

  • Total Tests: 44/44 passing
  • Breaking Changes: None
  • Backward Compatible: Yes

🛠 Installation

pip install semantica==0.2.7
pip install semantica[db-snowflake,db-arrow]==0.2.7


📈 Performance

  • Text Processing: >10,000 ops/sec
  • Arrow Export: 10x faster
  • Benchmark Coverage: 137+ tests

Thanks to all contributors for making this release possible!

Semantica v0.2.6

03 Feb 05:10


Semantica v0.2.6

Release Date: February 3, 2026

We're excited to announce Semantica v0.2.6, featuring major enhancements in provenance tracking, change management, and several important bug fixes!


🎉 Highlights

Major Features

  • W3C PROV-O Compliant Provenance Tracking - Enterprise-grade lineage tracking across all 17 modules
  • Enhanced Change Management - Version control for knowledge graphs and ontologies
  • CSV Ingestion Improvements - Auto-detection and robust error handling
  • Comprehensive Test Coverage - 80-86% coverage for ingestion modules

Bug Fixes

  • Temperature compatibility for LLM providers
  • JenaStore empty graph initialization

✨ New Features & Enhancements

W3C PROV-O Compliant Provenance Tracking

PRs: #254, #246 | Contributor: @KaifAhmad1

A comprehensive provenance tracking system with W3C PROV-O compliance across all 17 Semantica modules.

Core Module:

  • ProvenanceManager for centralized tracking
  • W3C PROV-O schemas (Activity, Entity, Agent)
  • Storage backends: InMemory and SQLite
  • SHA-256 integrity verification

Module Integrations:

  • Semantic Extract, LLMs (Groq, OpenAI, HuggingFace, LiteLLM)
  • Pipeline, Context, Ingest, Embeddings
  • Graph/Vector/Triplet stores
  • Reasoning, Conflicts, Deduplication
  • Export, Parse, Normalize, Ontology, Visualization

Features:

  • Complete lineage tracking: Document → Chunk → Entity → Relationship → Graph
  • LLM tracking: tokens, costs, latency
  • Source tracking and bridge axioms for domain transformations

Compliance:

  • W3C PROV-O, FDA 21 CFR Part 11, SOX, HIPAA, TNFD

Testing:

  • 237 tests covering core functionality, all 17 module integrations, edge cases, backward compatibility

Design:

  • Opt-in with provenance=False by default
  • Zero breaking changes
  • No new dependencies

Enhanced Change Management Module

PRs: #248, #243 | Contributor: @KaifAhmad1

Enterprise-grade version control for knowledge graphs and ontologies with persistent storage and audit trails.

Core Classes:

  • TemporalVersionManager - Knowledge graph versioning
  • OntologyVersionManager - Ontology versioning
  • ChangeLogEntry - Change metadata tracking

Storage:

  • SQLite (persistent) and in-memory backends
  • Thread-safe operations

Features:

  • SHA-256 checksums for integrity
  • Detailed entity/relationship diffs
  • Structural ontology comparison
  • Email validation

Compliance:

  • HIPAA, SOX, FDA 21 CFR Part 11
  • Immutable audit trails

Testing:

  • 104 tests (100% pass)
  • Unit, integration, compliance, performance, edge cases

Performance:

  • 17.6ms for 10k entities
  • 510+ ops/sec concurrent
  • Handles 5k+ entity graphs

Migration:

  • Backward compatible
  • Simplified class names
  • Zero external dependencies

CSV Ingestion Enhancements

PR: #244 | Contributor: @saloni0318

Robust CSV parsing with auto-detection and error handling.

Features:

  • Auto-detect CSV encoding using chardet
  • Auto-detect delimiter using csv.Sniffer
  • Tolerant decoding and malformed-row handling (on_bad_lines='warn')
  • Optional chunked reading for large files
  • Metadata tracks detected values
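Delimiter auto-detection can be shown with the standard library's csv.Sniffer; encoding detection via chardet works analogously on the raw bytes. A minimal sketch (the candidate delimiter set is an assumption):

```python
import csv

def detect_delimiter(sample: str) -> str:
    """Sniff the delimiter from a sample of the file, as the ingestor
    does; restricting the candidate set makes detection more robust."""
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return dialect.delimiter
```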

Testing:

  • Expanded unit tests covering:
    • Multiple delimiters
    • Quoted/multiline fields
    • Header overrides
    • Chunked reading
    • NaN preservation

Comprehensive Test Coverage

TextNormalizer Tests

PR: #242 | Contributor: @ZohaibHassan16

Added focused test coverage for TextNormalizer behavior across various inputs.

Integration Test Improvements

PR: #241 | Contributor: @KaifAhmad1

  • Introduced integration test marker
  • Reduced noisy warnings in ingest tests

Ingest Unit Tests

PRs: #239, #232 | Contributor: @Mohammed2372

Comprehensive unit tests for ingestion modules (file, web, and feed ingestors).

Coverage:

  • File scanning: local/cloud (S3/GCS/Azure)
  • Web ingestion: URL/sitemap/robots.txt
  • RSS/Atom feed parsing

Testing:

  • 998 lines of test code
  • Mocked external dependencies for fast, isolated execution

Results:

  • file_ingestor: 86% coverage
  • web_ingestor: 86% coverage
  • feed_ingestor: 80% coverage

Covers happy paths, edge cases, and error handling.


🐛 Bug Fixes

Temperature Compatibility Fix

PRs: #256, #252 | Contributors: @F0rt1s, @IGES-Institut

Fixed hardcoded temperature=0.3 that broke compatibility with models requiring specific temperature values (e.g., gpt-5-mini).

Changes:

  • Added _add_if_set helper method to BaseProvider
  • Only passes parameters when explicitly set
  • When temperature=None, parameter is omitted allowing APIs to use model defaults
  • Updated all 5 providers: OpenAI, Groq, Gemini, Ollama, DeepSeek

Impact:

  • Reduced code by ~85 lines with cleaner parameter handling
  • Comprehensive test coverage added (10 temperature tests, all passing)
  • Backward compatible - no breaking changes

JenaStore Empty Graph Bug

PRs: #257, #258 | Contributor: @ZohaibHassan16

Fixed ProcessingError: Graph not initialized when operating on empty (but initialized) graphs.

Changes:

  • Replaced implicit if not self.graph: checks with explicit if self.graph is None: validation
  • Updated 5 methods: add_triplets, get_triplets, delete_triplet, execute_sparql, serialize
  • Properly distinguishes None (uninitialized) from empty graphs (initialized with 0 triplets)
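The bug pattern is easy to reproduce with any falsy-when-empty container (rdflib graphs behave this way): `if not self.graph` rejects an initialized-but-empty graph, while `is None` does not. A minimal stand-in, not the actual JenaStore code:

```python
class Store:
    """Minimal illustration of the None-vs-empty distinction."""

    def __init__(self, graph=None):
        self.graph = graph  # None = uninitialized; [] = empty but valid

    def add_triplets(self, triplets):
        # Explicit check: only a truly uninitialized store fails.
        # `if not self.graph` would also (wrongly) fail on an empty graph.
        if self.graph is None:
            raise RuntimeError("Graph not initialized")
        self.graph.extend(triplets)
```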

Impact:

  • Unblocks benchmarking suite
  • Enables fresh deployments
  • Improves testing workflows

📦 Installation

pip install semantica==0.2.6

Or upgrade from a previous version:

pip install --upgrade semantica

🙏 Contributors

Special thanks to all contributors who made this release possible.



🚀 What's Next?

Stay tuned for upcoming features in future releases. Check our GitHub Issues to see what we're working on!


Full Changelog: v0.2.5...v0.2.6

Deep Extraction, BYOM & Pinecone Support (v0.2.5)

27 Jan 16:26


Semantica v0.2.5

🚀 Release Highlights

This release brings native Pinecone Vector Store support, configurable LLM retry logic, and major enhancements to the Semantic Extraction module, including robust support for custom Hugging Face models (BYOM), improved NER/Relation extraction, and completed Triplet extraction logic.

🌟 New Features

Pinecone Vector Store Support

  • Implemented native PineconeStore with full CRUD capabilities.
  • Support for serverless and pod-based indexes, namespaces, and metadata filtering.
  • Fully integrated with the unified VectorStore interface and registry.
  • (Closes #219, Resolves #220)

Configurable LLM Retry Logic

  • Exposed max_retries parameter in NERExtractor, RelationExtractor, and TripletExtractor.
  • Defaults to 3 retries to handle JSON validation failures or API timeouts gracefully.
  • Propagated retry configuration through chunked processing helpers for consistent long-document handling.
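The retry behaviour can be sketched as a small wrapper. This is an illustration of the max_retries semantics in spirit; the extractors' actual loop may differ:

```python
def extract_with_retry(call, max_retries=3):
    """Re-invoke an extraction call on validation errors (e.g. malformed
    JSON from the LLM) up to max_retries times, then re-raise."""
    last_error = None
    for _ in range(max_retries):
        try:
            return call()
        except ValueError as exc:  # e.g. JSON validation failure
            last_error = exc
    raise last_error
```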

Bring Your Own Model (BYOM) Support

  • Custom Hugging Face Models: Enabled full support for custom models in NERExtractor, RelationExtractor, and TripletExtractor.
  • Custom Tokenizers: Added support for models with non-standard tokenization requirements.
  • Runtime Overrides: extract(model=...) now correctly overrides configuration defaults.

Enhanced Extraction Capabilities

  • NER: Added configurable aggregation strategies (simple, first, average, max) and robust IOB/BILOU parsing.
  • Relation Extraction: Implemented standard entity marker techniques (<subj>, <obj>) and structured output parsing.
  • Triplet Extraction: Added specialized parsing for Seq2Seq models (e.g., REBEL) to generate structured triplets directly from text.

🐛 Bug Fixes

  • LLM Extraction Stability: Fixed infinite retry loops by strictly enforcing max_retries limits.
  • Model Parameter Precedence: Resolved issues where config defaults overrode runtime arguments.
  • Import Handling: Fixed circular import issues in test suites via improved mocking strategies.

📦 Installation

pip install semantica==0.2.5

Semantica v0.2.4

22 Jan 07:20


Added

  • Ontology Ingestion Module:
    • Implemented OntologyIngestor for parsing RDF/OWL files (Turtle, RDF/XML, JSON-LD, N3).
    • Added ingest_ontology and unified ingest(source_type="ontology") interface.
    • Added recursive directory scanning for batch ontology ingestion.
    • Added OntologyData dataclass for consistent metadata.
  • Documentation:
    • Updated ontology_usage.md and ontology.md with usage examples and API details.
  • Tests:
    • Added comprehensive test suite tests/ingest/test_ontology_ingestor.py.
    • Added examples/demo_ontology_ingest.py for end-to-end demonstration.

Semantica v0.2.3

20 Jan 06:39


We are excited to announce Semantica v0.2.3! This release focuses on stability, performance, and developer experience improvements, including critical fixes for LLM relation extraction, high-performance vector store ingestion, and resolved circular dependencies.

🚀 Added

Vector Store High-Performance Ingestion

  • New add_documents API: Added high-throughput ingestion with automatic embedding generation, batching, and parallel processing.
  • embed_batch Helper: Efficiently generate embeddings for lists of texts without immediate storage.
  • Parallel Defaults: Enabled default parallel ingestion in VectorStore (default: max_workers=6) for faster processing.
  • Documentation: Added dedicated guide docs/vector_store_usage.md for high-performance configuration.
  • Tests: Added tests/vector_store/test_vector_store_parallel.py covering parallel vs. sequential performance and edge cases.

Amazon Neptune Dev Environment

  • CloudFormation Template: Added cookbook/introduction/neptune-setup.yaml to provision a development Neptune cluster with public endpoints and IAM auth.
  • Documentation: Updated cookbook/introduction/21_Amazon_Neptune_Store.ipynb with deployment guides, cost estimates, and IAM best practices.
  • Linting: Added cfn-lint to pre-commit hooks for CloudFormation validation.

Comprehensive Test Suite

  • Unit Tests: Added tests/test_relations_llm.py covering typed and structured response paths for relation extraction.
  • Integration Tests: Added tests/integration/test_relations_groq.py for real Groq API validation.

🐛 Fixed

LLM Relation Extraction Parsing

  • Zero Relations Fix: Resolved issue where relation extraction returned zero results despite successful API calls.
  • Response Normalization: Normalized typed responses from Instructor/OpenAI/Groq to a consistent dictionary format.
  • JSON Fallback: Added structured JSON fallback when typed generation yields empty results.
  • Parameter Cleanup: Removed unsupported kwargs (max_tokens, max_entities_prompt) from internal calls to prevent API errors.

Pipeline Circular Import

  • Resolved Import Cycles: Fixed circular dependency between pipeline_builder and pipeline_validator (Issues #192, #193).
  • Lazy Loading: Implemented lazy loading for PipelineValidator to ensure stable imports.

JupyterLab Stability

  • Progress Output Control: Added SEMANTICA_DISABLE_JUPYTER_PROGRESS environment variable.
  • Memory Fix: Fallback to console-style output when enabled to prevent JupyterLab out-of-memory errors from infinite scrolling tables (Issue #181).

⚡ Changed

Relation Extraction API

  • Simplified Interface: Removed unused kwargs to prevent parameter leakage.
  • Better Debugging: Improved error handling and verbose logging for extraction workflows.
  • Robust Parsing: Enhanced post-response parsing stability across different LLM providers.

Vector Store Defaults

  • Standardized Concurrency: Set default max_workers=6 for VectorStore parallel ingestion.
  • Simplified Usage: Updated documentation to rely on smart defaults rather than manual configuration.

Semantica 0.2.2

14 Jan 19:13


Highlights

  • High-throughput parallel extraction engine across all core extractors.
  • Major performance improvements (~1.89x speedup) for real-world extraction workloads.
  • Stronger security hygiene in examples and caching.
  • Updated Gemini SDK integration and dependency constraints for more stable installs.

Added

  • Parallel Extraction Engine

    • Implemented parallel batch processing across all core extractors:
      • NERExtractor, RelationExtractor, TripletExtractor
      • EventDetector, SemanticNetworkExtractor
    • Added max_workers parameter to all extractor extract() methods so users can tune concurrency based on CPU or rate limits.
    • Enabled parallel chunking for large documents in:
      • _extract_entities_chunked
      • _extract_relations_chunked
    • Enhanced ProgressTracker to be thread-safe for concurrent batch updates.
  • Semantic Extract Performance & Regression

    • Added regression suite for:
      • Max worker defaults
      • LLM prompt entity filtering
      • Extractor reuse scenarios
    • Added a runnable benchmark script for batch latency across:
      • NERExtractor, RelationExtractor, TripletExtractor
      • EventDetector, SemanticAnalyzer, SemanticNetworkExtractor
    • Added Groq LLM smoke tests for entities/relations/triplets when GROQ_API_KEY is set.

Security

  • Credential Sanitization

    • Removed hardcoded API keys from 8 cookbook notebooks to prevent secret leakage.
    • Enforced environment variable usage for GROQ_API_KEY across all examples.
  • Secure Caching

    • Updated ExtractionCache to exclude sensitive parameters (api_key, token, password, etc.) from cache keys, enabling safe cache sharing.
    • Upgraded cache key hashing from MD5 to SHA-256 for stronger collision resistance.
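The safe cache-key construction can be sketched as follows. The sensitive-parameter list and serialisation format are assumptions for illustration; only the two documented properties (credential exclusion, SHA-256 hashing) come from the notes:

```python
import hashlib
import json

SENSITIVE = {"api_key", "token", "password"}

def cache_key(method: str, params: dict) -> str:
    """Build a cache key that excludes credentials and hashes with
    SHA-256, so caches can be shared safely across users."""
    safe = {k: v for k, v in params.items() if k not in SENSITIVE}
    payload = json.dumps([method, safe], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```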

Changed

  • Gemini SDK Migration

    • Migrated GeminiProvider to the new google-genai SDK (v0.1.0+) to address deprecations.
    • Added graceful fallback to google.generativeai for backward compatibility.
  • Dependency Resolution

    • Pinned opentelemetry-api and opentelemetry-sdk to 1.37.0 to resolve pip conflicts.
    • Updated protobuf and grpcio constraints for better stability.
  • Entity Filtering Scope

    • Removed entity filtering from non-LLM extraction flows to avoid accuracy regressions.
    • Limited entity downselection to LLM relation prompt construction, while still matching returned entities against the full original list.
  • Batch Concurrency Defaults

    • Standardized max_workers defaults across semantic_extract:
      • ML-backed methods default to single worker.
      • Pattern/regex/rules/LLM/HuggingFace methods use higher parallelism, capped by CPU.
    • Increased global optimization.max_workers default to 8 for better throughput on batch workloads.

Performance

  • Bottleneck Optimization (GitHub Issue #186)

    • Replaced sequential loops with parallel execution for:
      • Document-level batches
      • Intra-document chunks
    • Achieved roughly 1.89x speedup in real-world extraction scenarios
      (benchmarked with Groq llama-3.3-70b-versatile).
  • Low-Latency Entity Matching

    • Reduced reliance on heavyweight embedding stacks for common cases by:
      • Improving fast matching heuristics.
      • Short-circuiting before embedding similarity when possible.
    • Optimized entity matching to:
      • Prefer exact / substring / word-boundary matches.
      • Only fall back to embedding similarity when necessary, reducing CPU overhead.

Release v0.2.1: Stability Fixes

12 Jan 12:36
ccaadf6


🚀 Summary

This release addresses critical LLM extraction failures on long documents (Bug #176) and fixes the Earnings Call Analysis cookbook (Bug #177).

🛠 Key Changes

  • LLM Stability (Fixes #176):
    • Solved incomplete JSON outputs by correctly propagating max_tokens.
    • Added auto-retry with reduced chunk sizes when token limits are hit.
    • Standardized default chunk size to 64k for Groq, OpenAI, and Anthropic.
  • Cookbook Fix (Fixes #177):
    • Resolved TypeError in 03_Earnings_Call_Analysis.ipynb by fixing SourceReference usage.
  • Improvements:
    • Added max_completion_tokens support for newer provider APIs.
    • Removed hardcoded length constraints from semantic classes.

✅ Verification

  • Verified max_tokens propagation and error handling via new tests.
  • Validated Groq Llama 3.3 70B integration manually.
  • PyPI Release: Successfully built and uploaded semantica-0.2.1.