Conversation

@veerababu1729 veerababu1729 commented Aug 3, 2025

Title: Improve Error Handling for FAISS Index and Metadata Operations
Path: src/vanna/faiss/faiss.py

Description:
This PR adds robust error handling to the remove_training_data and remove_collection methods in the FAISS integration. Previously, failures during index rebuilding or metadata saving (e.g., disk errors, permission issues, corrupted files) would either fail silently or return False without diagnostic information. This made debugging and reliability difficult.

Changes:

  • Wrap critical FAISS index and metadata operations in try/except blocks.
  • Log errors to the console for visibility.
  • Raise a DependencyError if saving the FAISS index or metadata fails, surfacing issues immediately.
  • Return False only for non-critical errors, ensuring the system does not silently lose data.

Before (Old Output):

  • If saving the index failed, the method would just return False:
    result = faiss_instance.remove_training_data(some_id)
    # Output: False
    # No error message, no exception, hard to debug
    

After (New Output):

  • If saving the index fails, you get a clear error message and exception:
    result = faiss_instance.remove_training_data(some_id)
    # Output: Error saving FAISS index or metadata: [Errno 13] Permission denied: 'sql_index.faiss'
    # Raises DependencyError: Failed to save FAISS index or metadata: [Errno 13] Permission denied: 'sql_index.faiss'
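With the raise in place, callers can handle the failure explicitly instead of checking a bare boolean. A minimal sketch of the caller side (the `DependencyError` class and `remove_training_data` below are illustrative stand-ins, not vanna's actual implementation):

```python
class DependencyError(Exception):
    """Stand-in for vanna's DependencyError, for illustration only."""

def remove_training_data(some_id):
    # simulates the new behavior when persistence fails
    raise DependencyError(
        "Failed to save FAISS index or metadata: [Errno 13] Permission denied"
    )

try:
    remove_training_data("some-id")
except DependencyError as e:
    # the caller now sees the root cause instead of a bare False
    print(f"cleanup failed, keeping old index: {e}")
```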
    

Why This Should Be Accepted:

  • Makes failures visible and actionable for maintainers and users.
  • Prevents silent data loss or corruption.
  • Follows best practices for robust production code.
  • Minimal, non-breaking change that does not affect main logic or API.
  • Uses the project's existing DependencyError class for consistency.

Summary:
This improvement ensures that any failure in index rebuilding or saving is logged and surfaced, making your system more reliable and maintainable. It directly benefits both development and production environments by making error states explicit and easier to diagnose.

I'm always open to further discussion and happy to provide any clarification needed; I'd love to keep contributing to this repo.


@llamapreview llamapreview bot left a comment


Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

  • Primary purpose and scope: Improves error handling for FAISS index and metadata operations in the Vanna AI system, specifically focusing on the remove_training_data and remove_collection methods.
  • Key components modified: FAISS integration methods in src/vanna/faiss/faiss.py
  • Cross-component impacts: Error handling changes affect the entire RAG pipeline's reliability and debugging capabilities
  • Business value alignment: Directly improves system reliability and maintainability, reducing debugging time and preventing silent data corruption in production environments

1.2 Technical Architecture

  • System design modifications: Introduces layered error handling for index operations with immediate error surfacing
  • Component interaction changes: Error states now properly propagate through the system rather than being silently swallowed
  • Integration points impact: Changes how errors are handled during FAISS index operations which are critical for the RAG functionality
  • Dependency changes and implications: Uses existing DependencyError class consistently for error reporting

2. Critical Findings

2.1 Must Fix (P0🔴)

Issue: Inconsistent error handling levels causing critical errors to be swallowed

  • Analysis Confidence: High
  • Impact: Critical persistence errors become simple False return values, potentially hiding serious issues like data corruption or permission problems
  • Resolution: Restructure the exception handling hierarchy to properly propagate critical errors while handling non-critical ones appropriately. See suggested code in the Technical Analysis section.

Issue: Changed embedding generation logic that may cause data inconsistency

  • Analysis Confidence: High
  • Impact: Partial index rebuilds leading to data inconsistency and potential information loss during critical operations
  • Resolution: Revert to the original JSON-based embedding generation approach to maintain consistent indexing behavior

2.2 Should Fix (P1🟡)

Issue: Improper logging mechanism using print statements

  • Analysis Confidence: High
  • Impact: Production diagnostics difficulty and console pollution in operational environments
  • Suggested Solution: Replace print statements with proper logging using Python's logging module
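A sketch of that suggestion; the logger name "vanna.faiss" and the in-memory handler are illustrative (a real application would configure its own handlers):

```python
import io
import logging

# module-level logger instead of print(); name is illustrative
logger = logging.getLogger("vanna.faiss")
logger.setLevel(logging.ERROR)

# attach a StringIO handler just to demonstrate the captured output
buf = io.StringIO()
logger.addHandler(logging.StreamHandler(buf))

logger.error("Error saving FAISS index or metadata: %s",
             "[Errno 13] Permission denied")
```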

Issue: Overly broad exception catching that hides error specifics

  • Analysis Confidence: High
  • Impact: Makes root cause analysis difficult due to lack of specific error information
  • Suggested Solution: Catch specific exceptions rather than the generic Exception class to provide better error context
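One way to narrow the catch, sketched with a hypothetical `save_metadata_safely` helper (not vanna's actual API); anticipated I/O errors are handled, everything else propagates:

```python
def save_metadata_safely(path, payload):
    """Hypothetical helper illustrating specific exception handling."""
    try:
        with open(path, "w") as f:
            f.write(payload)
        return True
    except PermissionError:
        # critical: surface immediately rather than returning False
        raise
    except OSError:
        # other I/O failures (missing directory, disk full) are non-critical here
        return False
```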

Issue: Potential state inconsistency between index and metadata

  • Analysis Confidence: Medium
  • Impact: Risk of index/metadata mismatch on save failures leading to corrupted state
  • Suggested Solution: Implement a transactional approach to state management where changes are only committed after successful persistence
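A common way to get transactional behavior for the metadata file is write-then-rename; a minimal sketch (function name and file layout are illustrative):

```python
import json
import os
import tempfile

def atomic_save_metadata(metadata, path):
    # write to a temp file in the target directory, then atomically
    # replace the old file; readers never observe a half-written file
    dirname = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(metadata, f)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```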

2.3 Consider (P2🟢)

Area: Index rebuilding optimization

  • Analysis Confidence: High
  • Improvement Opportunity: Current full index rebuilds could be optimized to use incremental updates with FAISS ID mappings for better performance

Area: Enhanced error recovery mechanisms

  • Analysis Confidence: Medium
  • Improvement Opportunity: Could add automatic retry for transient errors and corruption detection with fallback to last good state
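A retry wrapper for transient I/O errors could look like this (attempt count and backoff values are illustrative, not a recommendation from the PR):

```python
import time

def with_retries(operation, attempts=3, base_delay=0.05):
    # retry transient OSErrors with exponential backoff;
    # the last failure is re-raised so callers still see the root cause
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```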

Area: Enhanced diagnostics in error messages

  • Analysis Confidence: High
  • Improvement Opportunity: Including error codes and operation context would improve programmatic handling and debugging

Area: Resource management during index operations

  • Analysis Confidence: Medium
  • Improvement Opportunity: Current approach creates full index copies which could be optimized to reuse existing indexes with in-place modifications

2.4 Summary of Action Items

  1. Critical Fixes Needed Before Merge:

    • Restructure exception handling hierarchy to properly propagate critical errors (P0)
    • Revert embedding generation to JSON-based approach to prevent data inconsistency (P0)
  2. Important Improvements:

    • Replace print statements with proper logging (P1)
    • Implement specific exception catching (P1)
    • Add transactional state management (P1)
  3. Optional Enhancements:

    • Optimize index rebuilding process (P2)
    • Add error recovery mechanisms (P2)
    • Enhance diagnostic information in errors (P2)

3. Technical Analysis

3.1 Code Logic Analysis

📁 src/vanna/faiss/faiss.py - remove_training_data()

  • Submitted PR Code:
  try:
      for metadata_list, index, index_name in [
          (self.sql_metadata, self.sql_index, 'sql_index.faiss'),
          (self.ddl_metadata, self.ddl_index, 'ddl_index.faiss'),
          (self.doc_metadata, self.doc_index, 'doc_index.faiss')
      ]:
          for i, item in enumerate(metadata_list):
              if item['id'] == id:
                  del metadata_list[i]
                  new_index = faiss.IndexFlatL2(self.embedding_dim)
                  if index_name == 'sql_index.faiss':
                      embeddings = [self.generate_embedding(m["question"] + " " + m["sql"]) for m in metadata_list if "question" in m and "sql" in m]
                  elif index_name == 'ddl_index.faiss':
                      embeddings = [self.generate_embedding(m["ddl"]) for m in metadata_list if "ddl" in m]
                  elif index_name == 'doc_index.faiss':
                      embeddings = [self.generate_embedding(m["documentation"]) for m in metadata_list if "documentation" in m]
                  else:
                      embeddings = []
                  if embeddings:
                      new_index.add(np.array(embeddings, dtype=np.float32))
                  setattr(self, index_name.split('.')[0], new_index)
                  if self.curr_client == 'persistent':
                      try:
                          self._save_index(new_index, index_name)
                          self._save_metadata(metadata_list, f"{index_name.split('.')[0]}_metadata.json")
                      except Exception as e:
                          print(f"Error saving FAISS index or metadata: {e}")
                          raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
                  return True
      return False
  except Exception as e:
      print(f"Error in remove_training_data: {e}")
      return False
  • Analysis:

    • The new error handling structure is a significant improvement but has some critical issues
    • The outer exception handler swallows all errors including critical DependencyError cases
    • The embedding generation logic has changed from the original JSON-based approach to field-specific handling which may cause inconsistencies
    • Error messages are printed rather than properly logged
    • The structure could lead to state inconsistencies between the index and metadata
  • LlamaPReview Suggested Improvements:

  def remove_training_data(self, id: str, **kwargs) -> bool:
      # assumes module-level imports: json, logging (logger = logging.getLogger(__name__))
      try:
          for metadata_list, index, index_name in [
              (self.sql_metadata, self.sql_index, 'sql_index.faiss'),
              (self.ddl_metadata, self.ddl_index, 'ddl_index.faiss'),
              (self.doc_metadata, self.doc_index, 'doc_index.faiss')
          ]:
              for i, item in enumerate(metadata_list):
                  if item['id'] == id:
                      del metadata_list[i]
                      new_index = faiss.IndexFlatL2(self.embedding_dim)
                      # Revert to original embedding approach for consistency
                      embeddings = [self.generate_embedding(json.dumps(m)) for m in metadata_list]
                      if embeddings:
                          new_index.add(np.array(embeddings, dtype=np.float32))
                      setattr(self, index_name.split('.')[0], new_index)
                      if self.curr_client == 'persistent':
                          try:
                              self._save_index(new_index, index_name)
                              self._save_metadata(metadata_list, f"{index_name.split('.')[0]}_metadata.json")
                          except Exception as e:
                              logger.error(f"Error saving FAISS index or metadata: {e}")
                              raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
                      return True
          return False
      except DependencyError:
          raise  # Re-raise critical persistence errors
      except Exception as e:
          logger.error(f"Non-critical error in remove_training_data: {e}")
          return False
  • Improvement rationale:
    • Proper error propagation for critical issues while handling non-critical errors appropriately
    • Consistent embedding generation prevents data loss and maintains vector space consistency
    • Proper logging instead of print statements
    • Clear separation between critical and non-critical error handling

📁 src/vanna/faiss/faiss.py - remove_collection()

  • Submitted PR Code:
  def remove_collection(self, collection_name: str) -> bool:
      try:
          if collection_name in ["sql", "ddl", "documentation"]:
              setattr(self, f"{collection_name}_index", faiss.IndexFlatL2(self.embedding_dim))
              setattr(self, f"{collection_name}_metadata", [])
              if self.curr_client == 'persistent':
                  try:
                      self._save_index(getattr(self, f"{collection_name}_index"), f"{collection_name}_index.faiss")
                      self._save_metadata([], f"{collection_name}_metadata.json")
                  except Exception as e:
                      print(f"Error saving FAISS index or metadata: {e}")
                      raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
              return True
          return False
      except Exception as e:
          print(f"Error in remove_collection: {e}")
          return False
  • Analysis:

    • Similar structure to remove_training_data with the same issues
    • State changes occur before persistence operations which could lead to inconsistencies
    • Uses print statements for error reporting
    • Catches overly broad exceptions
  • LlamaPReview Suggested Improvements:

  def remove_collection(self, collection_name: str) -> bool:
      # assumes module-level: logger = logging.getLogger(__name__)
      if collection_name not in ["sql", "ddl", "documentation"]:
          return False

      try:
          # Create temporary state
          temp_index = faiss.IndexFlatL2(self.embedding_dim)
          temp_metadata = []

          # Attempt persistence first
          if self.curr_client == 'persistent':
              try:
                  self._save_index(temp_index, f"{collection_name}_index.faiss")
                  self._save_metadata(temp_metadata, f"{collection_name}_metadata.json")
              except Exception as e:
                  logger.error(f"Error saving FAISS data for {collection_name}: {e}")
                  raise DependencyError(f"Failed to save FAISS data for {collection_name}: {e}")

          # Only commit if persistence succeeds
          setattr(self, f"{collection_name}_index", temp_index)
          setattr(self, f"{collection_name}_metadata", temp_metadata)
          return True

      except DependencyError:
          raise
      except Exception as e:
          logger.error(f"Non-critical error in remove_collection: {e}")
          return False
  • Improvement rationale:
    • Transactional approach prevents state inconsistencies
    • Proper error propagation and logging
    • Clear separation of critical and non-critical errors
    • More robust state management

3.2 Key Quality Aspects

  • System scalability considerations: The current implementation could be optimized for better performance with large indexes
  • Performance bottlenecks and optimizations: Full index rebuilds on each removal could be optimized to use incremental updates
  • Testing strategy and coverage: Should include tests for various error scenarios (permission issues, disk full, corrupted files)
  • Documentation needs: Should document the new error handling behavior and exception types that can be raised
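An error-scenario test along those lines could use a minimal stand-in for the store; names like `FakeStore`, `_save_index`, and the `DependencyError` stub below are illustrative, not vanna's actual API:

```python
class DependencyError(Exception):
    """Stand-in for vanna's DependencyError."""

class FakeStore:
    """Minimal stand-in mimicking the PR's error-handling contract."""

    def __init__(self, fail_save=False):
        self.fail_save = fail_save
        self.metadata = [{"id": "a"}]

    def _save_index(self):
        # simulate a persistence failure such as a permission problem
        if self.fail_save:
            raise PermissionError(13, "Permission denied")

    def remove_training_data(self, id):
        self.metadata = [m for m in self.metadata if m["id"] != id]
        try:
            self._save_index()
        except OSError as e:
            # critical persistence errors are surfaced, not swallowed
            raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
        return True
```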

4. Overall Evaluation

  • Technical assessment: The PR addresses critical reliability issues but needs fixes to prevent introducing new problems
  • Business impact: Significant improvement to system reliability and maintainability
  • Risk evaluation: Medium risk due to potential data consistency issues that need to be addressed
  • Notable positive aspects and good practices: Proper error classification, use of existing error types, backward compatibility
  • Implementation quality: Good conceptual approach but needs refinement in execution
  • Final recommendation: Request Changes - The PR needs critical fixes before merging, particularly around error handling structure and embedding generation logic


@veerababu1729 (Author) commented:

Can someone review this?

