Conversation

@veerababu1729 veerababu1729 commented Aug 3, 2025

Title: Improve Error Handling for FAISS Index and Metadata Operations
Path: src/vanna/faiss/faiss.py

Description:
This PR adds robust error handling to the remove_training_data and remove_collection methods in the FAISS integration. Previously, failures during index rebuilding or metadata saving (e.g., disk errors, permission issues, corrupted files) would either fail silently or return False without diagnostic information. This made debugging and reliability difficult.

Changes:

  • Wrap critical FAISS index and metadata operations in try/except blocks.
  • Log errors to the console for visibility.
  • Raise a DependencyError if saving the FAISS index or metadata fails, surfacing issues immediately.
  • Return False only for non-critical errors, ensuring the system does not silently lose data.

Before (Old Output):

  • If saving the index failed, the method would just return False:
    result = faiss_instance.remove_training_data(some_id)
    # Output: False
    # No error message, no exception, hard to debug
    

After (New Output):

  • If saving the index fails, you get a clear error message and exception:
    result = faiss_instance.remove_training_data(some_id)
    # Output: Error saving FAISS index or metadata: [Errno 13] Permission denied: 'sql_index.faiss'
    # Raises DependencyError: Failed to save FAISS index or metadata: [Errno 13] Permission denied: 'sql_index.faiss'
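With the raise in place, callers can handle the failure explicitly instead of checking a bare boolean. A minimal sketch of the caller side (the `DependencyError` class and `remove_training_data` below are illustrative stand-ins, not vanna's actual implementation):

```python
class DependencyError(Exception):
    """Stand-in for vanna's DependencyError, for illustration only."""

def remove_training_data(some_id):
    # simulates the new behavior when persistence fails
    raise DependencyError(
        "Failed to save FAISS index or metadata: [Errno 13] Permission denied"
    )

try:
    remove_training_data("some-id")
except DependencyError as e:
    # the caller now sees the root cause instead of a bare False
    print(f"cleanup failed, keeping old index: {e}")
```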
    

Why This Should Be Accepted:

  • Makes failures visible and actionable for maintainers and users.
  • Prevents silent data loss or corruption.
  • Follows best practices for robust production code.
  • Minimal, non-breaking change that does not affect main logic or API.
  • Uses the project's existing DependencyError class for consistency.

Summary:
This improvement ensures that any failure in index rebuilding or saving is logged and surfaced, making your system more reliable and maintainable. It directly benefits both development and production environments by making error states explicit and easier to diagnose.

I'm always open to further discussion and happy to provide any clarification needed; I'd love to keep contributing to this repo.


@llamapreview llamapreview bot left a comment


Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

  • Primary purpose and scope: Improves error handling for FAISS index and metadata operations in the Vanna AI system, specifically focusing on the remove_training_data and remove_collection methods.
  • Key components modified: FAISS integration methods in src/vanna/faiss/faiss.py
  • Cross-component impacts: Error handling changes affect the entire RAG pipeline's reliability and debugging capabilities
  • Business value alignment: Directly improves system reliability and maintainability, reducing debugging time and preventing silent data corruption in production environments

1.2 Technical Architecture

  • System design modifications: Introduces layered error handling for index operations with immediate error surfacing
  • Component interaction changes: Error states now properly propagate through the system rather than being silently swallowed
  • Integration points impact: Changes how errors are handled during FAISS index operations which are critical for the RAG functionality
  • Dependency changes and implications: Uses existing DependencyError class consistently for error reporting

2. Critical Findings

2.1 Must Fix (P0🔴)

Issue: Inconsistent error handling levels causing critical errors to be swallowed

  • Analysis Confidence: High
  • Impact: Critical persistence errors become simple False return values, potentially hiding serious issues like data corruption or permission problems
  • Resolution: Restructure the exception handling hierarchy to properly propagate critical errors while handling non-critical ones appropriately. See suggested code in the Technical Analysis section.

Issue: Changed embedding generation logic that may cause data inconsistency

  • Analysis Confidence: High
  • Impact: Partial index rebuilds leading to data inconsistency and potential information loss during critical operations
  • Resolution: Revert to the original JSON-based embedding generation approach to maintain consistent indexing behavior

2.2 Should Fix (P1🟡)

Issue: Improper logging mechanism using print statements

  • Analysis Confidence: High
  • Impact: Production diagnostics difficulty and console pollution in operational environments
  • Suggested Solution: Replace print statements with proper logging using Python's logging module
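A sketch of that suggestion; the logger name "vanna.faiss" and the in-memory handler are illustrative (a real application would configure its own handlers):

```python
import io
import logging

# module-level logger instead of print(); name is illustrative
logger = logging.getLogger("vanna.faiss")
logger.setLevel(logging.ERROR)

# attach a StringIO handler just to demonstrate the captured output
buf = io.StringIO()
logger.addHandler(logging.StreamHandler(buf))

logger.error("Error saving FAISS index or metadata: %s",
             "[Errno 13] Permission denied")
```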

Issue: Overly broad exception catching that hides error specifics

  • Analysis Confidence: High
  • Impact: Makes root cause analysis difficult due to lack of specific error information
  • Suggested Solution: Catch specific exceptions rather than the generic Exception class to provide better error context
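One way to narrow the catch, sketched with a hypothetical `save_metadata_safely` helper (not vanna's actual API); anticipated I/O errors are handled, everything else propagates:

```python
def save_metadata_safely(path, payload):
    """Hypothetical helper illustrating specific exception handling."""
    try:
        with open(path, "w") as f:
            f.write(payload)
        return True
    except PermissionError:
        # critical: surface immediately rather than returning False
        raise
    except OSError:
        # other I/O failures (missing directory, disk full) are non-critical here
        return False
```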

Issue: Potential state inconsistency between index and metadata

  • Analysis Confidence: Medium
  • Impact: Risk of index/metadata mismatch on save failures leading to corrupted state
  • Suggested Solution: Implement a transactional approach to state management where changes are only committed after successful persistence
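A common way to get transactional behavior for the metadata file is write-then-rename; a minimal sketch (function name and file layout are illustrative):

```python
import json
import os
import tempfile

def atomic_save_metadata(metadata, path):
    # write to a temp file in the target directory, then atomically
    # replace the old file; readers never observe a half-written file
    dirname = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(metadata, f)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```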

2.3 Consider (P2🟢)

Area: Index rebuilding optimization

  • Analysis Confidence: High
  • Improvement Opportunity: Current full index rebuilds could be optimized to use incremental updates with FAISS ID mappings for better performance

Area: Enhanced error recovery mechanisms

  • Analysis Confidence: Medium
  • Improvement Opportunity: Could add automatic retry for transient errors and corruption detection with fallback to last good state
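A retry wrapper for transient I/O errors could look like this (attempt count and backoff values are illustrative, not a recommendation from the PR):

```python
import time

def with_retries(operation, attempts=3, base_delay=0.05):
    # retry transient OSErrors with exponential backoff;
    # the last failure is re-raised so callers still see the root cause
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```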

Area: Enhanced diagnostics in error messages

  • Analysis Confidence: High
  • Improvement Opportunity: Including error codes and operation context would improve programmatic handling and debugging

Area: Resource management during index operations

  • Analysis Confidence: Medium
  • Improvement Opportunity: Current approach creates full index copies which could be optimized to reuse existing indexes with in-place modifications

2.4 Summary of Action Items

  1. Critical Fixes Needed Before Merge:

    • Restructure exception handling hierarchy to properly propagate critical errors (P0)
    • Revert embedding generation to JSON-based approach to prevent data inconsistency (P0)
  2. Important Improvements:

    • Replace print statements with proper logging (P1)
    • Implement specific exception catching (P1)
    • Add transactional state management (P1)
  3. Optional Enhancements:

    • Optimize index rebuilding process (P2)
    • Add error recovery mechanisms (P2)
    • Enhance diagnostic information in errors (P2)

3. Technical Analysis

3.1 Code Logic Analysis

📁 src/vanna/faiss/faiss.py - remove_training_data()

  • Submitted PR Code:
  try:
      for metadata_list, index, index_name in [
          (self.sql_metadata, self.sql_index, 'sql_index.faiss'),
          (self.ddl_metadata, self.ddl_index, 'ddl_index.faiss'),
          (self.doc_metadata, self.doc_index, 'doc_index.faiss')
      ]:
          for i, item in enumerate(metadata_list):
              if item['id'] == id:
                  del metadata_list[i]
                  new_index = faiss.IndexFlatL2(self.embedding_dim)
                  if index_name == 'sql_index.faiss':
                      embeddings = [self.generate_embedding(m["question"] + " " + m["sql"]) for m in metadata_list if "question" in m and "sql" in m]
                  elif index_name == 'ddl_index.faiss':
                      embeddings = [self.generate_embedding(m["ddl"]) for m in metadata_list if "ddl" in m]
                  elif index_name == 'doc_index.faiss':
                      embeddings = [self.generate_embedding(m["documentation"]) for m in metadata_list if "documentation" in m]
                  else:
                      embeddings = []
                  if embeddings:
                      new_index.add(np.array(embeddings, dtype=np.float32))
                  setattr(self, index_name.split('.')[0], new_index)
                  if self.curr_client == 'persistent':
                      try:
                          self._save_index(new_index, index_name)
                          self._save_metadata(metadata_list, f"{index_name.split('.')[0]}_metadata.json")
                      except Exception as e:
                          print(f"Error saving FAISS index or metadata: {e}")
                          raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
                  return True
      return False
  except Exception as e:
      print(f"Error in remove_training_data: {e}")
      return False
  • Analysis:

    • The new error handling structure is a significant improvement but has some critical issues
    • The outer exception handler swallows all errors including critical DependencyError cases
    • The embedding generation logic has changed from the original JSON-based approach to field-specific handling which may cause inconsistencies
    • Error messages are printed rather than properly logged
    • The structure could lead to state inconsistencies between the index and metadata
  • LlamaPReview Suggested Improvements:

  def remove_training_data(self, id: str, **kwargs) -> bool:
      # assumes module-level imports: json, logging (logger = logging.getLogger(__name__))
      try:
          for metadata_list, index, index_name in [
              (self.sql_metadata, self.sql_index, 'sql_index.faiss'),
              (self.ddl_metadata, self.ddl_index, 'ddl_index.faiss'),
              (self.doc_metadata, self.doc_index, 'doc_index.faiss')
          ]:
              for i, item in enumerate(metadata_list):
                  if item['id'] == id:
                      del metadata_list[i]
                      new_index = faiss.IndexFlatL2(self.embedding_dim)
                      # Revert to original embedding approach for consistency
                      embeddings = [self.generate_embedding(json.dumps(m)) for m in metadata_list]
                      if embeddings:
                          new_index.add(np.array(embeddings, dtype=np.float32))
                      setattr(self, index_name.split('.')[0], new_index)
                      if self.curr_client == 'persistent':
                          try:
                              self._save_index(new_index, index_name)
                              self._save_metadata(metadata_list, f"{index_name.split('.')[0]}_metadata.json")
                          except Exception as e:
                              logger.error(f"Error saving FAISS index or metadata: {e}")
                              raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
                      return True
          return False
      except DependencyError:
          raise  # Re-raise critical persistence errors
      except Exception as e:
          logger.error(f"Non-critical error in remove_training_data: {e}")
          return False
  • Improvement rationale:
    • Proper error propagation for critical issues while handling non-critical errors appropriately
    • Consistent embedding generation prevents data loss and maintains vector space consistency
    • Proper logging instead of print statements
    • Clear separation between critical and non-critical error handling

📁 src/vanna/faiss/faiss.py - remove_collection()

  • Submitted PR Code:
  def remove_collection(self, collection_name: str) -> bool:
      try:
          if collection_name in ["sql", "ddl", "documentation"]:
              setattr(self, f"{collection_name}_index", faiss.IndexFlatL2(self.embedding_dim))
              setattr(self, f"{collection_name}_metadata", [])
              if self.curr_client == 'persistent':
                  try:
                      self._save_index(getattr(self, f"{collection_name}_index"), f"{collection_name}_index.faiss")
                      self._save_metadata([], f"{collection_name}_metadata.json")
                  except Exception as e:
                      print(f"Error saving FAISS index or metadata: {e}")
                      raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
              return True
          return False
      except Exception as e:
          print(f"Error in remove_collection: {e}")
          return False
  • Analysis:

    • Similar structure to remove_training_data with the same issues
    • State changes occur before persistence operations which could lead to inconsistencies
    • Uses print statements for error reporting
    • Catches overly broad exceptions
  • LlamaPReview Suggested Improvements:

  def remove_collection(self, collection_name: str) -> bool:
      # assumes module-level: logger = logging.getLogger(__name__)
      if collection_name not in ["sql", "ddl", "documentation"]:
          return False

      try:
          # Create temporary state
          temp_index = faiss.IndexFlatL2(self.embedding_dim)
          temp_metadata = []

          # Attempt persistence first
          if self.curr_client == 'persistent':
              try:
                  self._save_index(temp_index, f"{collection_name}_index.faiss")
                  self._save_metadata(temp_metadata, f"{collection_name}_metadata.json")
              except Exception as e:
                  logger.error(f"Error saving FAISS data for {collection_name}: {e}")
                  raise DependencyError(f"Failed to save FAISS data for {collection_name}: {e}")

          # Only commit if persistence succeeds
          setattr(self, f"{collection_name}_index", temp_index)
          setattr(self, f"{collection_name}_metadata", temp_metadata)
          return True

      except DependencyError:
          raise
      except Exception as e:
          logger.error(f"Non-critical error in remove_collection: {e}")
          return False
  • Improvement rationale:
    • Transactional approach prevents state inconsistencies
    • Proper error propagation and logging
    • Clear separation of critical and non-critical errors
    • More robust state management

3.2 Key Quality Aspects

  • System scalability considerations: The current implementation could be optimized for better performance with large indexes
  • Performance bottlenecks and optimizations: Full index rebuilds on each removal could be optimized to use incremental updates
  • Testing strategy and coverage: Should include tests for various error scenarios (permission issues, disk full, corrupted files)
  • Documentation needs: Should document the new error handling behavior and exception types that can be raised
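An error-scenario test along those lines could use a minimal stand-in for the store; names like `FakeStore`, `_save_index`, and the `DependencyError` stub below are illustrative, not vanna's actual API:

```python
class DependencyError(Exception):
    """Stand-in for vanna's DependencyError."""

class FakeStore:
    """Minimal stand-in mimicking the PR's error-handling contract."""

    def __init__(self, fail_save=False):
        self.fail_save = fail_save
        self.metadata = [{"id": "a"}]

    def _save_index(self):
        # simulate a persistence failure such as a permission problem
        if self.fail_save:
            raise PermissionError(13, "Permission denied")

    def remove_training_data(self, id):
        self.metadata = [m for m in self.metadata if m["id"] != id]
        try:
            self._save_index()
        except OSError as e:
            # critical persistence errors are surfaced, not swallowed
            raise DependencyError(f"Failed to save FAISS index or metadata: {e}")
        return True
```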

4. Overall Evaluation

  • Technical assessment: The PR addresses critical reliability issues but needs fixes to prevent introducing new problems
  • Business impact: Significant improvement to system reliability and maintainability
  • Risk evaluation: Medium risk due to potential data consistency issues that need to be addressed
  • Notable positive aspects and good practices: Proper error classification, use of existing error types, backward compatibility
  • Implementation quality: Good conceptual approach but needs refinement in execution
  • Final recommendation: Request Changes - The PR needs critical fixes before merging, particularly around error handling structure and embedding generation logic


@veerababu1729 (Author) commented:

Can someone review this?

