Conversation

@suke1-tal

fix issue #971


@llamapreview bot left a comment


AI Code Review by LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Approve with suggestions

This PR fixes a critical TypeError that crashes the application when empty training data is provided, but the fix introduces minor code-style and type-safety inconsistencies and could mask data quality issues by silently skipping None entries.

🌟 Strengths

  • Directly resolves a runtime crash reported in issue #971, improving application stability.

Priority | File | Category | Impact Summary | Anchors
---------|------|----------|----------------|--------
P2 | src/vanna/base/base.py | Bug | Silently skips None documentation, masking data issues | -
P2 | src/vanna/base/base.py | Maintainability | Method lacks input validation, remains vulnerable | -
P2 | src/vanna/base/base.py | Architecture | Type hint mismatch with actual data flow | -

🔍 Notable Themes

  • Data Validation Gaps: Multiple findings highlight insufficient input validation and type consistency in the documentation processing flow, suggesting a need for broader defensive programming improvements.

💡 Have feedback? We'd love to hear it in our GitHub Discussions.
✨ This review was generated by LlamaPReview Advanced, which is free for all open-source projects. Learn more.

if len(documentation_list) > 0:
    initial_prompt += "\n===Additional Context \n\n"

    for documentation in documentation_list:


P2 | Confidence: High

The fix addresses the immediate TypeError by adding a None check before calling str_to_approx_token_count(). However, the solution introduces a potential logical inconsistency: when documentation is None, it is silently skipped without any indication or logging. This could mask data quality issues in the training pipeline. Additionally, the code formatting is inconsistent with the rest of the condition structure.

Suggested change
for documentation in documentation_list:
    if documentation is not None and (
        self.str_to_approx_token_count(initial_prompt)
        + self.str_to_approx_token_count(documentation)
        < max_tokens
    ):

Evidence: method:add_documentation_to_prompt, method:str_to_approx_token_count
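
A minimal sketch of the logging alternative described above; the helper name filter_documentation and the module-level logger are hypothetical illustrations, not part of this PR or of vanna:

import logging
from typing import Optional

logger = logging.getLogger(__name__)

def filter_documentation(documentation_list: list[Optional[str]]) -> list[str]:
    """Drop None entries, but log each one so data-quality issues stay visible."""
    cleaned: list[str] = []
    for index, documentation in enumerate(documentation_list):
        if documentation is None:
            # Surfaces the bad entry instead of silently masking it.
            logger.warning("documentation_list[%d] is None; skipping it", index)
            continue
        cleaned.append(documentation)
    return cleaned

The existing token-budget loop could then iterate over the cleaned list unchanged.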

    max_tokens: int = 14000,
) -> str:
    if len(documentation_list) > 0:
        initial_prompt += "\n===Additional Context \n\n"


[Contextual Comment]
This comment refers to code near line 515; it is anchored to the nearest changed line (541).


P2 | Confidence: High

Speculative: The root cause analysis suggests the str_to_approx_token_count method lacks input validation. While the immediate fix handles the None case at the call site, the underlying method remains vulnerable to similar issues if called elsewhere with invalid inputs. Adding defensive programming at the method level would make the code more robust.

Code Suggestion:

def str_to_approx_token_count(self, string: str) -> int:
    if string is None:
        return 0
    return len(string) / 4

Evidence: method:str_to_approx_token_count
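
As a quick sanity check, the guard can be exercised with a standalone copy of the suggested body (this mirrors the suggestion above; it is not the class method itself):

def str_to_approx_token_count(string):
    if string is None:
        return 0
    return len(string) / 4

assert str_to_approx_token_count(None) == 0                         # no more TypeError
assert str_to_approx_token_count("SELECT 1 FROM users;") == 5.0     # 20 characters / 4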


[Contextual Comment]
This comment refers to code near line 534; it is anchored to the nearest changed line (541).


P2 | Confidence: Medium

Speculative: The method signature indicates it accepts list[str] but the actual data flow includes None values, suggesting a type contract violation. This indicates either the type hint is incorrect or there's a data integrity issue upstream. The method should either validate input types or update the type hint to reflect the actual contract (list[Optional[str]]).

Code Suggestion:

# Requires `from typing import Optional` at module scope
# (or `list[str | None]` on Python 3.10+).
def add_documentation_to_prompt(
    self,
    initial_prompt: str,
    documentation_list: list[Optional[str]],  # Updated type hint
    max_tokens: int = 14000,
) -> str:

Evidence: method:add_documentation_to_prompt
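
For the validate-the-input option, a hedged sketch (the helper name validate_documentation_list is hypothetical, not part of the codebase):

from typing import Optional

def validate_documentation_list(documentation_list: list[Optional[str]]) -> list[str]:
    """Enforce the list[Optional[str]] contract before prompt assembly."""
    valid: list[str] = []
    for index, entry in enumerate(documentation_list):
        if entry is None:
            continue  # tolerated under the relaxed type hint
        if not isinstance(entry, str):
            raise TypeError(
                f"documentation_list[{index}] must be str or None, "
                f"got {type(entry).__name__}"
            )
        valid.append(entry)
    return valid

Calling a check like this at the top of add_documentation_to_prompt would keep the None handling in one place and fail fast on anything that is neither str nor None.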

