
Conversation


@cipheraxat cipheraxat commented Oct 30, 2025

Closes #36614

Issue Components

  • Component: Beam YAML

Changes Made

  1. Added example YAML pipeline in `sdks/python/apache_beam/yaml/examples/transforms/sql/calcite_connection_properties.yaml`

    • Shows how to provide `calcite_connection_properties` under the top-level `options:` key
    • Demonstrates both YAML mapping format (preferred) and JSON string format (for compatibility)
    • Includes a working SQL transform example using PostgreSQL functions
  2. Updated YAML docs generator in `sdks/python/apache_beam/yaml/generate_yaml_docs.py`

    • Added special handling for the SQL transform to include a callout about calcite connection properties
    • The generated transform catalog page will now include clear documentation and examples
    • Shows both approaches for providing connection properties

Example Usage

Preferred YAML mapping approach:
```yaml
options:
  calcite_connection_properties:
    fun: postgresql
```

Alternative JSON string approach:
```yaml
options:
  calcite_connection_properties: '{"fun": "postgresql"}'
```
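
For illustration, here is a minimal sketch of the mapping form in the context of a complete pipeline. The transforms, input data, and query below are placeholders and are not copied from the committed example file; they only show where the top-level `options:` block sits relative to `pipeline:`, and assume a single Sql input is visible as `PCOLLECTION` as with SqlTransform.

```yaml
# Illustrative only: create a few rows, run a SQL query over them, and log
# the output, with calcite_connection_properties supplied as a mapping under
# the top-level options: key.
pipeline:
  transforms:
    - type: Create
      name: CreateScores
      config:
        elements:
          - {player: "alice", score: 3}
          - {player: "bob", score: 7}
    - type: Sql
      name: FilterHighScores
      input: CreateScores
      config:
        # Assumes the single input is referenced as PCOLLECTION, as with SqlTransform.
        query: "SELECT player, score FROM PCOLLECTION WHERE score > 5"
    - type: LogForTesting
      input: FilterHighScores

options:
  calcite_connection_properties:
    fun: postgresql
```

If `fun: postgresql` is honored, Calcite additionally enables its PostgreSQL function library on top of the standard operators; the query above sticks to standard SQL, so it works either way and is only there to give the options a concrete context.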

Testing

  • Syntax validation passed for the modified Python generator
  • Example YAML follows existing patterns in the examples directory
  • Changes are backwards compatible and don't affect existing functionality
…_properties

- Add example YAML pipeline showing how to provide calcite_connection_properties
  in sdks/python/apache_beam/yaml/examples/transforms/sql/calcite_connection_properties.yaml
- Update YAML docs generator to include callout on SQL transform page explaining
  how to use calcite_connection_properties via options: key
- Shows both YAML mapping and JSON string approaches for compatibility

This addresses the issue where users had difficulty knowing how to specify
calcite_connection_properties for SQL transforms in YAML pipelines.
The transform catalog will now include examples and clear guidance.
@gemini-code-assist
Contributor

Summary of Changes

Hello @cipheraxat, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the usability of Beam YAML for SQL transforms by adding comprehensive documentation and an example for "calcite_connection_properties". This ensures users can easily configure dialect-specific SQL functions, with clear guidance on using both YAML mapping and JSON string formats for these properties, thereby enhancing pipeline flexibility and clarity.

Highlights

  • New Example Pipeline: Introduced a YAML pipeline example demonstrating how to configure "calcite_connection_properties" for SQL transforms, showcasing both YAML mapping and JSON string formats.
  • Documentation Enhancement: Modified the YAML documentation generator to automatically include a specific callout for the SQL transform, detailing how to provide "calcite_connection_properties" in the generated docs.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mohamedawnallah
Contributor

@cipheraxat – Just curious, which Beam issue does this PR close?

@cipheraxat
Author

@mohamedawnallah - This is the issue: #36614

@mohamedawnallah
Contributor

> @mohamedawnallah - This is the issue: #36614

Let's add "Closes #36614" to the PR description. That way it's clear to reviewers and anyone viewing the PR what it closes. It would also be nice for that issue to be closed automatically when this PR is merged.

@mohamedawnallah
Contributor

Also, it seems the formatting workflow is complaining. This can be fixed by following the CONTRIBUTING.md guide, which references the following:
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks

@mohamedawnallah
Contributor

mohamedawnallah commented Oct 30, 2025

It would also be nice to state how someone is supposed to test this (any unit/integration tests) and/or whether it is tested automatically in CI.

@mohamedawnallah
Contributor

> It would also be nice to state how someone is supposed to test this (any unit/integration tests) and/or whether it is tested automatically in CI.

Let me know if you need any helping hands regarding that.

@cipheraxat cipheraxat changed the title [BEAM YAML] Add documentation and examples for SQL calcite_connection… Oct 30, 2025
@cipheraxat
Author

Sure @mohamedawnallah, let me work through the formatting issues. I will reach out if I get stuck anywhere.

@cipheraxat cipheraxat changed the title Closes #36614 : Add documentation and examples for SQL calcite_connection… Oct 30, 2025
@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@github-actions
Contributor

Assigning reviewers:

R: @jrmccluskey for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@mohamedawnallah
Contributor

waiting on author

@mohamedawnallah
Contributor

At first glance, it looks like we need to register the example YAML file with a test preprocessor in examples_test.py. Perhaps see test_sqlserver_to_bigquery_yaml for how the corresponding example file is created and registered in examples_test.py:

Where It is Created:

```yaml
# This is an example of a Beam YAML pipeline that reads from spanner database
# and writes to GCS avro files. This matches the Dataflow Template located
# here - https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-spanner-to-avro
pipeline:
  type: composite
  transforms:
    # Step 1: Reading data from SqlServer
    - type: ReadFromSqlServer
      name: ReadFromSqlServer
      config:
        url: "jdbc:sqlserver://localhost:12345;databaseName=shipment;user=apple;password=apple123;encrypt=false;trustServerCertificate=true"
        query: "SELECT * FROM shipments"
        driver_class_name: "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Step 2: Write records out to BigQuery
    - type: WriteToBigQuery
      name: WriteShipments
      input: ReadFromSqlServer
      config:
        table: "apache-beam-testing.yaml_test.shipments"
        create_disposition: "CREATE_NEVER"
        write_disposition: "WRITE_APPEND"
        error_handling:
          output: "deadLetterQueue"
        num_streams: 1
    # Step 3: Write the failed messages to BQ to a dead letter queue JSON file
    - type: WriteToJson
      input: WriteShipments.deadLetterQueue
      config:
        path: "gs://my-bucket/yaml-123/writingToBigQueryErrors.json"

options:
  temp_location: "gs://apache-beam-testing/temp"

# Expected:
# Row(shipment_id='S1', customer_id='C1', shipment_date='2023-05-01', shipment_cost=150.0, customer_name='Alice', customer_email='alice@example.com')
# Row(shipment_id='S2', customer_id='C2', shipment_date='2023-06-12', shipment_cost=300.0, customer_name='Bob', customer_email='bob@example.com')
# Row(shipment_id='S3', customer_id='C1', shipment_date='2023-05-10', shipment_cost=20.0, customer_name='Alice', customer_email='alice@example.com')
# Row(shipment_id='S4', customer_id='C4', shipment_date='2024-07-01', shipment_cost=150.0, customer_name='Derek', customer_email='derek@example.com')
# Row(shipment_id='S5', customer_id='C5', shipment_date='2023-05-09', shipment_cost=300.0, customer_name='Erin', customer_email='erin@example.com')
# Row(shipment_id='S6', customer_id='C4', shipment_date='2024-07-02', shipment_cost=150.0, customer_name='Derek', customer_email='derek@example.com')
```

Where It is Registered:

```python
@YamlExamplesTestSuite.register_test_preprocessor([
    'test_sqlserver_to_bigquery_yaml',
])
def __sqlserver_io_read_test_preprocessor(
    test_spec: dict, expected: List[str], env: TestEnvironment):
  """
  Preprocessor for tests that involve reading from SqlServer.
  url syntax: 'jdbc:sqlserver://<host>:<port>;databaseName=<database>;
  user=<user>;password=<password>;encrypt=false;trustServerCertificate=true'
  """
  return _db_io_read_test_processor(
      test_spec, lambda url: url.split(';')[1].split('=')[-1], 'SqlServer')
```


Relatedly:
https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples#testing
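
For this PR specifically, since the new example presumably reads from an in-memory Create rather than an external database, it may not need a URL-rewriting preprocessor at all; a trailing `# Expected:` block might be enough for the harness to assert on the logged rows, though I'm not certain. A rough, hypothetical sketch of how the tail of `calcite_connection_properties.yaml` could follow that convention (the rows below are invented and would have to match whatever the example actually produces):

```yaml
# Hypothetical tail of the example file, following the examples convention of
# listing the expected logged rows in trailing comments.
options:
  calcite_connection_properties:
    fun: postgresql

# Expected:
# Row(player='bob', score=7)
```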
