Skip to content

Conversation

@EleftheriaBatsou
Copy link

Summary

Addresses incorrect visuals reported in issue #952 where categorical-only or mixed data often produced misleading charts (e.g., line plots over the default 0..N-1 index).
Adds post-execution validation of LLM-generated figures:
Categorical-only: always replace with count-based charts (pie if ≤10 categories, otherwise bar).
Mixed data (categorical + numeric): if LLM emits a generic line over the index, replace with bar(x=categorical, y=first numeric).
Improves fallback when LLM code fails: categorical-only also uses count-based charts.
Implementation

src/vanna/base/base.py:
After executing LLM plotly_code, detect categorical-only or “line over index” patterns and override figure accordingly.
Fallback logic for categorical-only data aggregates counts to avoid equal-sized slices.
tests/test_graphics.py:
test_categorical_only_uses_counts_in_chart
test_categorical_only_overrides_llm_line_chart
test_mixed_data_overrides_llm_line_chart_to_bar
CI/Testing

Tests are deterministic and do not require API keys or network.
Run locally:
tox
or: pytest -q
Backwards compatibility

If LLM returns a sensible, explicit chart (e.g., bar with categorical x and numeric y), validation does not override it.
Closes

Closes #952

EleftheriaBatsou and others added 4 commits September 18, 2025 12:40
…tests for validation

Co-authored-by: Genie <genie@cosine.sh>
…plots

Co-authored-by: Genie <genie@cosine.sh>
…f categorical and numeric data are present

Co-authored-by: Genie <genie@cosine.sh>
…f categorical and numeric data are present

Co-authored-by: Genie <genie@cosine.sh>

Fix graphics generation for categorical data in plots
Copy link

@llamapreview llamapreview bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto Pull Request Review from LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Approve with suggestions
This PR effectively addresses misleading chart generation for categorical and mixed data with robust fallbacks and comprehensive tests, but includes minor suggestions for code quality and robustness improvements.

🌟 Strengths

  • Fixes a critical user-reported issue with incorrect visuals, enhancing data representation.
  • Adds deterministic tests that ensure reliability without external dependencies.
Priority File Category Impact Summary Anchors
P2 src/vanna/base/base.py Bug May incorrectly override valid charts due to index matching.
P2 src/vanna/base/base.py Maintainability DRY violation in type detection logic increases maintenance.
P2 src/vanna/base/base.py Robustness Exception swallowing masks potential bugs without logging.
P2 src/vanna/base/base.py Bug Assumes value_counts output format, risking breakage on changes.
P2 tests/test_graphics.py Testing Brittle test helper may fail with new chart types.

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

Comment on lines +2094 to +2097
numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
categorical_cols = df.select_dtypes(
include=["object", "category"]
).columns.tolist()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: High

The column type detection logic is duplicated in both the try and except blocks. This violates DRY principles and increases maintenance overhead. Future changes to type handling would require updates in multiple places.

Code Suggestion:

numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
    categorical_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
    # Use these precomputed lists throughout the function
Comment on lines +2114 to +2118
looks_like_default_index = (
len(x_values) == len(df)
and all(isinstance(v, (int, float)) for v in x_values)
and x_values == list(range(len(df)))
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: Medium

The heuristic for detecting default index plots may produce false positives if the data contains integer values matching the index range (e.g., a column with values [0,1,2,3] in a 4-row DataFrame). This could incorrectly override valid LLM-generated charts. The condition relies on exact index matching, which might not account for legitimate integer-based categorical data.

Suggested change
looks_like_default_index = (
len(x_values) == len(df)
and all(isinstance(v, (int, float)) for v in x_values)
and x_values == list(range(len(df)))
)
looks_like_default_index = (
len(x_values) == len(df)
and all(isinstance(v, (int, float)) for v in x_values)
and x_values == df.index.tolist() # More explicit index match
)
Comment on lines +2121 to +2123
except Exception:
# If inspection fails, leave the LLM chart as-is
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: Medium

Silently swallowing exceptions during figure inspection could mask underlying issues (e.g., malformed figures). While intended to preserve the original chart on errors, this might hide bugs that should be logged for debugging.

Code Suggestion:

except Exception as inspect_error:
        # Log debug information but keep the original chart
        logging.debug(f"Figure inspection failed: {inspect_error}")
Comment on lines +49 to +54
def _extract_counts(fig):
trace = fig.data[0]
values = getattr(trace, "values", None)
if values is None:
values = getattr(trace, "y", None)
return sorted(list(values))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: Medium

The test helper function assumes the first trace contains either 'values' (pie) or 'y' (bar) data. This might not hold for future chart types or multi-trace figures. Tests could become brittle if the visualization approach changes.

Code Suggestion:

def _extract_counts(fig):
    if hasattr(fig.data[0], 'values'):  # Pie chart
        return sorted(fig.data[0].values)
    elif hasattr(fig.data[0], 'y'):     # Bar chart
        return sorted(fig.data[0].y)
    else:
        raise ValueError("Unsupported chart type in test")
Comment on lines +2101 to +2102
vc = df[category].value_counts(dropna=False).reset_index()
vc.columns = [category, "count"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: Low

The column renaming assumes value_counts() returns exactly two columns ('index' and the count). This may break if pandas changes its output format or if the category column is named 'index'. While currently stable, this implicit coupling is fragile.

Suggested change
vc = df[category].value_counts(dropna=False).reset_index()
vc.columns = [category, "count"]
vc = df[category].value_counts(dropna=False).reset_index(name='count')
vc = vc.rename(columns={'index': category}) # Handle potential column name conflicts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant