Conversation

@betodealmeida
Member

SUMMARY

Fixes #34894

When using get_time_filter(remove_filter=True) in virtual datasets, time filters were incorrectly applied twice:

  1. Once in the inner query (from the virtual dataset's Jinja template)
  2. Again in the outer query WHERE clause (added by Superset's query generator)

Root Cause: Timeseries charts use the special __timestamp alias for time filters, but virtual dataset Jinja templates use actual column names (e.g., "dttm"). The removed_filters check didn't recognize these as referring to the same column, causing the filter to be re-applied in the outer query.

Solution: Enhanced the filter skip logic in superset/models/helpers.py to check both the __timestamp alias and the actual datetime column name when determining if a filter should be skipped.
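
The gist of the change, as a simplified sketch (not the exact diff; the names flt_col, time_col, and removed_filters stand in for whatever the surrounding code in helpers.py actually uses):

from __future__ import annotations

DTTM_ALIAS = "__timestamp"  # alias timeseries charts use for the time column

def should_skip_time_filter(
    flt_col: str,
    time_col: str | None,
    removed_filters: set[str],
) -> bool:
    # Skip the outer-query filter when the Jinja template already consumed it
    # via get_time_filter(..., remove_filter=True), either under the real
    # column name or under the __timestamp alias used by timeseries charts.
    if flt_col in removed_filters:
        return True
    if flt_col == DTTM_ALIAS and time_col and time_col in removed_filters:
        return True
    return False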

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

BEFORE (Bug): Virtual dataset with get_time_filter('dttm', remove_filter=True) would generate SQL like:

SELECT * FROM (
  SELECT * FROM my_table WHERE dttm >= '2024-01-01' AND dttm < '2024-01-31'
) AS virtual_table
WHERE dttm >= '2024-01-01' AND dttm < '2024-01-31'  -- Filter applied again!

AFTER (Fixed): The filter is only applied once in the inner query:

SELECT * FROM (
  SELECT * FROM my_table WHERE dttm >= '2024-01-01' AND dttm < '2024-01-31'
) AS virtual_table
-- No duplicate filter in outer query
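
For context, a virtual dataset along these lines produces the inner query above (a sketch; from_expr and to_expr are the documented attributes of the object returned by get_time_filter):

{% set time_filter = get_time_filter("dttm", remove_filter=True) %}
SELECT *
FROM my_table
WHERE dttm >= {{ time_filter.from_expr }}
  AND dttm < {{ time_filter.to_expr }}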

TESTING INSTRUCTIONS

  1. Run the new unit tests to verify the fix:

    pytest tests/unit_tests/models/test_time_filter_double_application.py -v
  2. Run existing time filter tests to ensure no regressions:

    pytest tests/unit_tests/jinja_context_test.py -k test_get_time_filter -v
  3. Manual test (optional):

    • Create a virtual dataset with SQL using get_time_filter('your_time_col', remove_filter=True, target_type='DATE')
    • Create a timeseries chart from this virtual dataset
    • Add the chart to a dashboard with a time filter applied
    • Inspect the generated SQL (via the query inspector or logs): the time filter should appear only in the subquery, not duplicated in the outer query (a sketch of such a check follows below)
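
To automate that last check, a minimal sketch of the kind of assertion the new tests make (hypothetical helper, not the actual test code):

def assert_time_filter_applied_once(generated_sql: str, column: str = "dttm") -> None:
    # The lower-bound predicate should appear exactly once: in the inner
    # (virtual dataset) query, never again in the outer WHERE clause.
    occurrences = generated_sql.count(f"{column} >=")
    assert occurrences == 1, (
        f"Expected one time filter on {column!r}, found {occurrences}:\n{generated_sql}"
    )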

ADDITIONAL INFORMATION

Code Changes:

  • superset/models/helpers.py: Added logic to handle __timestamp alias when checking removed_filters (10 lines)
  • tests/unit_tests/models/test_time_filter_double_application.py: Added comprehensive unit tests (360 lines)

Testing Coverage:

  • Test for basic virtual dataset with remove_filter=True
  • Test for __timestamp alias handling in timeseries queries
  • Test for simple non-nested cases

All pre-commit checks pass (mypy, ruff, pylint).
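
The same checks can be reproduced locally with:

    pre-commit run --all-files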


bito-code-review bot commented Oct 29, 2025

Code Review Agent Run #103684

Actionable Suggestions - 0
Additional Suggestions - 7
  • tests/unit_tests/models/test_time_filter_double_application.py - 7
    • Third party imports in type checking · Line 23-23
      Move third-party imports `Flask` and `MockerFixture` into a type-checking block to improve runtime performance. Multiple similar imports need the same fix.
      Code suggestion
       @@ -19,8 +19,12 @@
       from __future__ import annotations
      
       from datetime import datetime
      +from typing import TYPE_CHECKING
      
      -from flask import Flask
       from freezegun import freeze_time
      -from pytest_mock import MockerFixture
      
      +if TYPE_CHECKING:
      +    from flask import Flask
      +    from pytest_mock import MockerFixture
      +
       from superset.connectors.sqla.models import SqlaTable, TableColumn
    • DateTime objects missing timezone information · Line 96-96
      Add `tzinfo` argument to `datetime()` calls to avoid timezone-naive datetime objects. Multiple similar issues exist (lines 96, 97, 207, 208, 310, 311).
      Code suggestion
       @@ -21,1 +21,2 @@
       from datetime import datetime
      +from datetime import timezone
       @@ -96,2 +97,2 @@
      -        "from_dttm": datetime(2024, 1, 1),
      -        "to_dttm": datetime(2024, 1, 31),
      +        "from_dttm": datetime(2024, 1, 1, tzinfo=timezone.utc),
      +        "to_dttm": datetime(2024, 1, 31, tzinfo=timezone.utc),
    • Magic number used in comparison logic · Line 251-251
      Replace magic number `3` with a named constant for better code maintainability.
      Code suggestion
       @@ -247,5 +247,7 @@
               # Count occurrences of date filter patterns
               dttm_count = generated_sql.count("dttm")
      
      +        # Expected: SELECT, WHERE from template, possibly one more context
      +        MAX_EXPECTED_DTTM_OCCURRENCES = 3
               # We expect to see dttm in the SELECT and WHERE (from template only)
               # But NOT duplicated by SQLAlchemy's filter application
      -        assert dttm_count <= 3, (
      +        assert dttm_count <= MAX_EXPECTED_DTTM_OCCURRENCES, (
    • Module docstring missing ending period · Line 17-17
      Module docstring should end with a period for consistency with documentation standards.
      Code suggestion
       @@ -17,1 +17,1 @@
      -"""Test for double time filter application in virtual datasets (Issue #34894)"""
      +"""Test for double time filter application in virtual datasets (Issue #34894)."""
    • Missing trailing comma in function parameters · Line 32-32
      Add trailing comma after function parameter for consistency. Multiple similar issues exist throughout the file (lines 32, 104, 121, 123, 125, 159, 215, 232, 234, 236, 318, 335, 337, 339).
      Code suggestion
       @@ -32,1 +32,1 @@
      -    mocker: MockerFixture, app: Flask
      +    mocker: MockerFixture, app: Flask,
    • Docstring formatting needs blank line separation · Line 34-34
      Add blank line between docstring summary and description for better readability. Multiple similar issues exist (lines 161, 167).
      Code suggestion
       @@ -34,6 +34,7 @@
           """
           Test that time filter is applied only once when using get_time_filter with
           remove_filter=True in a virtual dataset.
      +
           This test reproduces the issue described in GitHub issue #34894 where time
    • Print statement used for debugging output · Line 132-132
      Remove `print` statement or replace with proper logging. Multiple similar issues exist (lines 132, 242, 345).
      Code suggestion
       @@ -132,1 +132,0 @@
      -        print(f"\n\nGenerated SQL:\n{generated_sql}\n\n")
Review Details
  • Files reviewed - 2 · Commit Range: 502c812..502c812
    • superset/models/helpers.py
    • tests/unit_tests/models/test_time_filter_double_application.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


@dosubot dosubot bot added the data:dataset (Related to dataset configurations) label Oct 29, 2025

@korbit-ai korbit-ai bot left a comment

I've completed my review and didn't find any issues.

Files scanned:
  • superset/models/helpers.py (reviewed)

@bito-code-review
Contributor

Interaction Diagram by Bito
sequenceDiagram
participant User as User Request
participant API as REST API<br/>
participant Explore as ExploreMixin.get_sqla_query<br/>🔄 Updated | ●●○ Medium
participant TP as TemplateProcessor
participant Filter as Filter Logic<br/>🔄 Updated | ●●○ Medium
participant DB as Database
User->>API: Query with time filter
API->>Explore: get_sqla_query(query_obj)
Explore->>TP: Process Jinja template
TP-->>Explore: Returns removed_filters list
Explore->>Filter: Check filter_col_name & __timestamp
Note over Filter: If flt_col == __timestamp, check both<br/>alias and actual column name
Filter-->>Explore: should_skip_filter decision
Explore->>DB: Build WHERE clause (skip if in removed_filters)
DB-->>API: Return compiled SQL
API-->>User: Response

Critical path: User Request -> REST API -> ExploreMixin.get_sqla_query -> Filter Logic -> Database

Note: The fix enhances filter-skipping logic to handle the __timestamp alias correctly. When a Jinja template uses get_time_filter(remove_filter=True), the filter is marked for removal. The updated code now checks both the alias and the actual column name to prevent double-application of time filters in virtual datasets.

@rusackas rusackas requested a review from Copilot October 29, 2025 17:32
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses issue #34894 where time filters are being applied twice in virtual datasets - once in the inner query (from the Jinja template with remove_filter=True) and again in the outer query. The fix ensures that when remove_filter=True is used with get_time_filter() in a Jinja template, the filter is recognized and skipped in the outer query, particularly when the __timestamp alias is used in timeseries queries.

Key Changes

  • Enhanced filter skip logic to handle __timestamp alias properly by checking both the alias and the actual column name
  • Added comprehensive test coverage for the fix with three test scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Files and descriptions:
  • superset/models/helpers.py: Added logic to recognize the __timestamp alias as equivalent to the actual datetime column when checking if a filter should be skipped
  • tests/unit_tests/models/test_time_filter_double_application.py: Added comprehensive test suite to verify the time filter is only applied once when using remove_filter=True
Comment on lines +132 to +133
print(f"\n\nGenerated SQL:\n{generated_sql}\n\n")


Copilot AI Oct 29, 2025

Debug print statements should be removed from test code or replaced with proper logging. Consider using logger.debug() or removing this entirely.

Suggested change
print(f"\n\nGenerated SQL:\n{generated_sql}\n\n")
Comment on lines +242 to +243
print(f"\n\nSimple case SQL:\n{generated_sql}\n\n")


Copilot AI Oct 29, 2025

Debug print statements should be removed from test code or replaced with proper logging. Consider using logger.debug() or removing this entirely.

Suggested change
print(f"\n\nSimple case SQL:\n{generated_sql}\n\n")
Comment on lines +345 to +346
print(f"\n\nTimestamp alias case SQL:\n{generated_sql}\n\n")


Copilot AI Oct 29, 2025

Debug print statements should be removed from test code or replaced with proper logging. Consider using logger.debug() or removing this entirely.

Suggested change
print(f"\n\nTimestamp alias case SQL:\n{generated_sql}\n\n")

Labels

data:dataset (Related to dataset configurations), size/L

1 participant