fix(sql_generation): handle scenario where table columns have "from" keyword in query #1600

ArslanSaleem · 2025-02-11T11:27:07Z

Important

Refactor SQL parsing and sanitization to handle 'from' keyword in table names and add dialect support for SQL safety checks.

Behavior:
- Move extract_table_names into SQLParser in sql_parser.py for more robust SQL parsing.
- Add dialect support to is_sql_query_safe in sql_sanitizer.py and to serialize in dataframe_serializer.py.
- Handle SQL queries with 'from' keyword in table names.

Refactoring:
- Remove extract_table_names from helpers/sql.py and integrate it into SQLParser.
- Update imports and usage in code_cleaning.py, local_loader.py, sql_loader.py, and view_loader.py.

Testing:
- Add tests for extract_table_names and dialect support in test_sql_parser.py.
- Update tests in test_code_cleaning.py, test_sql_loader.py, and test_dataframe_serializer.py to reflect changes.
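The core fix is to take table names from the structure of the SQL, not from a keyword search. Below is a stdlib-only sketch of the failure mode and of the idea behind the fix. Its pieces are assumptions:

- `NAIVE_FROM` is a hypothetical keyword regex standing in for the old helper, whose code is not shown here.
- `tokenize` and `extract_table_names_sketch` are a small hand-written tokenizer. It is swapped in for the sqlglot-based SQLParser that the review comments refer to.
- The `IDENT_QUOTES` mapping and the `'postgres'` default are illustrative.

```python
import re

# Hypothetical "before" extractor, for contrast only: a keyword regex that
# cannot tell a real FROM clause from the word "from" inside a quoted name.
NAIVE_FROM = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

# Assumed identifier quote character per dialect (illustrative subset).
IDENT_QUOTES = {"postgres": '"', "duckdb": '"', "mysql": "`"}


def tokenize(sql, dialect="postgres"):
    """Split SQL into (kind, text) tokens. Quoted identifiers and string
    literals stay whole, so a 'from' inside them is never a keyword."""
    quote = IDENT_QUOTES[dialect]
    tokens = []
    i = 0
    while i < len(sql):
        ch = sql[i]
        if ch.isspace():
            i += 1
        elif ch in (quote, "'"):
            # Find the closing quote; escaped quotes are not handled here.
            end = sql.index(ch, i + 1)
            kind = "ident" if ch == quote else "string"
            tokens.append((kind, sql[i + 1:end]))
            i = end + 1
        elif ch.isalnum() or ch == "_":
            # Bare word, allowing dots for qualified names like schema.table.
            j = i
            while j < len(sql) and (sql[j].isalnum() or sql[j] in "_."):
                j += 1
            tokens.append(("word", sql[i:j]))
            i = j
        else:
            tokens.append(("punct", ch))
            i += 1
    return tokens


def extract_table_names_sketch(sql, dialect="postgres"):
    """Return table names that directly follow a bare FROM or JOIN keyword."""
    tokens = tokenize(sql, dialect)
    tables = []
    for (kind, text), (next_kind, next_text) in zip(tokens, tokens[1:]):
        if kind == "word" and text.upper() in ("FROM", "JOIN"):
            if next_kind in ("word", "ident") and next_text not in tables:
                tables.append(next_text)
    return tables


query = 'SELECT "valid from date", amount FROM sales JOIN regions ON sales.region_id = regions.id'
print(NAIVE_FROM.findall(query))          # ['date', 'sales', 'regions']
print(extract_table_names_sketch(query))  # ['sales', 'regions']
```

The naive regex mistakes the `from date` inside the quoted column name for a FROM clause. The tokenizer treats the whole quoted name as one token. A real parser such as sqlglot also handles subqueries, CTEs, escaped quotes and quoted schema-qualified names, which this sketch does not.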

This description was created by Ellipsis for bb24501. It will automatically update as commits are pushed.


ellipsis-dev

👍 Looks good to me! Reviewed everything up to bb24501 in 1 minute and 44 seconds


Looked at 367 lines of code in 13 files
Skipped 0 files when reviewing.
Skipped posting 12 drafted comments based on config settings.

1. pandasai/core/code_generation/code_cleaning.py:56

Draft comment:
Consider whether to explicitly pass a dialect to SQLParser.extract_table_names for clarity. Currently it defaults to 'postgres', which is fine if that’s intended.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

2. pandasai/data_loader/local_loader.py:96

Draft comment:
Good job passing the dialect argument ('duckdb') to is_sql_query_safe to ensure query validation specific to DuckDB.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

3. pandasai/data_loader/sql_loader.py:46

Draft comment:
Passing the source_type as the dialect to is_sql_query_safe is a clear improvement. Make sure that source_type always aligns with the dialect expected by sqlglot.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

4. pandasai/data_loader/view_loader.py:95

Draft comment:
Passing dialect=source_type to is_sql_query_safe in execute_query helps maintain consistency across source types. Ensure that the dialect names in source_type meet the sqlglot specifications.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

5. pandasai/helpers/dataframe_serializer.py:12

Draft comment:
The addition of the 'dialect' parameter to serialize enhances flexibility. Please ensure the new parameter is documented in the module docs.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

6. pandasai/query_builders/sql_parser.py:67

Draft comment:
Extending extract_table_names with a dialect parameter is a useful enhancement. Confirm that test cases cover dialect-specific differences adequately.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

7. tests/unit_tests/data_loader/test_sql_loader.py:125

Draft comment:
The test now verifies that is_sql_query_safe is called with the correct dialect ("mysql"). This improves clarity and robustness in SQL sanitization.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

8. tests/unit_tests/prompts/test_sql_prompt.py:63

Draft comment:
The prompt now includes the dialect attribute in the table tag. Ensure consistency in formatting and consider using a utility for normalizing line breaks in tests across platforms.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%

9. pandasai/core/code_generation/code_cleaning.py:55

Draft comment:
Consider whether the SQL dialect should be configurable when extracting table names. Currently, SQLParser.extract_table_names is called without an explicit dialect (defaults to 'postgres'). If supporting multiple dialects, pass the appropriate dialect from context.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%

10. pandasai/helpers/dataframe_serializer.py:11

Draft comment:
Update the docstring for the serialize method to mention the new 'dialect' parameter and its default value to improve clarity for API users.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%

11. pandasai/data_loader/sql_loader.py:46

Draft comment:
Ensure that the 'source_type' used as the dialect parameter for is_sql_query_safe is supported by sqlglot. If there is variability in supported dialects (e.g., 'mysql', 'duckdb'), consider adding validation or documentation.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%

12. pandasai/query_builders/sql_parser.py:67

Draft comment:
Good update adding the dialect parameter to extract_table_names. Verify that all invocations of SQLParser methods across the project consistently pass or use the intended dialect.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%

Workflow ID: wflow_jGpmImMEK40Yu2PE

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
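Several draft comments stress that the dialect passed to is_sql_query_safe must match the data source. One concrete reason is that quoting rules differ between dialects:

- MySQL quotes identifiers with backticks and, by default, treats double quotes as string delimiters.
- Postgres and DuckDB use double quotes for identifiers and single quotes for strings.

The sketch below shows a dialect-aware keyword check. `is_query_safe_sketch`, `FORBIDDEN` and `QUOTES` are hypothetical names. This is not pandasai's is_sql_query_safe, which, going by the comments, hands the dialect to sqlglot.

```python
import re

# Hypothetical deny-list; the real pandasai check may use different rules.
FORBIDDEN = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "TRUNCATE", "GRANT"}

# Assumed quote characters (string literals and identifiers) per dialect.
# MySQL treats "..." as a string literal by default and `...` as an identifier;
# Postgres and DuckDB use '...' for strings and "..." for identifiers.
QUOTES = {"postgres": ("'", '"'), "duckdb": ("'", '"'), "mysql": ("'", '"', "`")}


def strip_quoted(sql, dialect):
    """Replace every quoted span (per the dialect's quoting rules) with a space."""
    parts = [f"{re.escape(c)}[^{re.escape(c)}]*{re.escape(c)}" for c in QUOTES[dialect]]
    return re.sub("|".join(parts), " ", sql)


def is_query_safe_sketch(sql, dialect="postgres"):
    """Reject forbidden keywords and stacked statements outside quoted text."""
    bare = strip_quoted(sql, dialect)
    words = {w.upper() for w in re.findall(r"\w+", bare)}
    # Allow one trailing semicolon, but no statement separator elsewhere.
    stacked = ";" in bare.rstrip().rstrip(";")
    return not (words & FORBIDDEN) and not stacked
```

With `dialect="mysql"`, ``SELECT `drop` FROM t`` passes because `` `drop` `` is a quoted identifier. With `dialect="postgres"`, the same text is rejected because backticks are not quote characters there. A source_type that does not match the real engine can therefore make a check like this misjudge a query.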

codecov · 2025-02-11T11:30:06Z

Codecov Report

Attention: Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.66%. Comparing base (d37a2ca) to head (bb24501).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
pandasai/data_loader/local_loader.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1600      +/-   ##
==========================================
+ Coverage   89.62%   89.66%   +0.03%     
==========================================
  Files          72       71       -1     
  Lines        2594     2604      +10     
==========================================
+ Hits         2325     2335      +10     
  Misses        269      269

Flag	Coverage Δ
unittests	`89.66% <95.65%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown.
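The Codecov figures can be reproduced from the diff table. Coverage is hits divided by lines. The sketch computes it exactly with fractions and then cuts it to the displayed precision.

Truncating (rounding down) reproduces every reported number. Conventional rounding would instead give 89.63% for main, 89.67% for this PR and +0.04% for the change. Truncation appears to match Codecov's default `round: down` setting. The 23 changed lines are inferred from the report: 1 missing line at 95.65217% patch coverage implies 22 of 23 lines covered.

```python
from fractions import Fraction
import math


def coverage(hits, lines):
    """Exact coverage percentage; fractions avoid float error before truncation."""
    return Fraction(hits * 100, lines)


def truncate(value, places=2):
    """Drop (not round) digits beyond `places` decimals."""
    scale = 10 ** places
    return math.floor(value * scale) / scale


base = coverage(2325, 2594)   # main (d37a2ca)
head = coverage(2335, 2604)   # this PR (bb24501)
patch = coverage(22, 23)      # inferred: 23 changed lines, 1 missing

print(truncate(base), truncate(head), truncate(head - base), truncate(patch, 5))
# 89.62 89.66 0.03 95.65217
```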


gventuri and others added 5 commits February 10, 2025 12:35

chore: remove leftover sqlite export from sql connector

9113e19

Merge branch 'main' of https://github.com/gventuri/pandas-ai

bce1803

fix(from_statement): refactor to generate query in one dialect and transpile later

7b99be2

fix: remove extra push

50fc2d1

remove print statement

bb24501

ArslanSaleem requested a review from gventuri February 11, 2025 11:27

ellipsis-dev bot reviewed Feb 11, 2025


gventuri requested a review from scaliseraoul-sinaptik February 11, 2025 11:41

gventuri approved these changes Feb 11, 2025


scaliseraoul approved these changes Feb 11, 2025


gventuri merged commit 266c79e into main Feb 11, 2025
15 checks passed
