Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sql_generation): handle scenario where table columns have "from" keyword in query #1600

Merged
merged 5 commits into from
Feb 11, 2025

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Feb 11, 2025

Important

Refactor SQL parsing and sanitization to handle 'from' keyword in table names and add dialect support for SQL safety checks.

  • Behavior:
    • Refactor extract_table_names to SQLParser in sql_parser.py for better SQL parsing.
    • Add dialect support to is_sql_query_safe in sql_sanitizer.py and serialize in dataframe_serializer.py.
    • Handle SQL queries with 'from' keyword in table names.
  • Refactoring:
    • Remove extract_table_names from helpers/sql.py and integrate into SQLParser.
    • Update imports and usage in code_cleaning.py, local_loader.py, sql_loader.py, and view_loader.py.
  • Testing:
    • Add tests for extract_table_names and dialect support in test_sql_parser.py.
    • Update tests in test_code_cleaning.py, test_sql_loader.py, and test_dataframe_serializer.py to reflect changes.

This description was created by Ellipsis for bb24501. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to bb24501 in 1 minute and 44 seconds

More details
  • Looked at 367 lines of code in 13 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. pandasai/core/code_generation/code_cleaning.py:56
  • Draft comment:
    Consider whether to explicitly pass a dialect to SQLParser.extract_table_names for clarity. Currently it defaults to 'postgres', which is fine if that’s intended.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. pandasai/data_loader/local_loader.py:96
  • Draft comment:
    Good job passing the dialect argument ('duckdb') to is_sql_query_safe to ensure query validation specific to DuckDB.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
3. pandasai/data_loader/sql_loader.py:46
  • Draft comment:
    Passing the source_type as the dialect to is_sql_query_safe is a clear improvement. Make sure that source_type always aligns with the dialect expected by sqlglot.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
4. pandasai/data_loader/view_loader.py:95
  • Draft comment:
    Passing dialect=source_type to is_sql_query_safe in execute_query helps maintain consistency across source types. Ensure that the dialect names in source_type meet the sqlglot specifications.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
5. pandasai/helpers/dataframe_serializer.py:12
  • Draft comment:
    The addition of the 'dialect' parameter to serialize enhances flexibility. Please ensure the new parameter is documented in the module docs.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
6. pandasai/query_builders/sql_parser.py:67
  • Draft comment:
    Extending extract_table_names with a dialect parameter is a useful enhancement. Confirm that test cases cover dialect-specific differences adequately.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
7. tests/unit_tests/data_loader/test_sql_loader.py:125
  • Draft comment:
    The test now verifies that is_sql_query_safe is called with the correct dialect ("mysql"). This improves clarity and robustness in SQL sanitization.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
8. tests/unit_tests/prompts/test_sql_prompt.py:63
  • Draft comment:
    The prompt now includes the dialect attribute in the table tag. Ensure consistency in formatting and consider using a utility for normalizing line breaks in tests across platforms.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
9. pandasai/core/code_generation/code_cleaning.py:55
  • Draft comment:
    Consider whether the SQL dialect should be configurable when extracting table names. Currently, SQLParser.extract_table_names is called without an explicit dialect (defaults to 'postgres'). If supporting multiple dialects, pass the appropriate dialect from context.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
10. pandasai/helpers/dataframe_serializer.py:11
  • Draft comment:
    Update the docstring for the serialize method to mention the new 'dialect' parameter and its default value to improve clarity for API users.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
11. pandasai/data_loader/sql_loader.py:46
  • Draft comment:
    Ensure that the 'source_type' used as the dialect parameter for is_sql_query_safe is supported by sqlglot. If there is variability in supported dialects (e.g., 'mysql', 'duckdb'), consider adding validation or documentation.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
12. pandasai/query_builders/sql_parser.py:67
  • Draft comment:
    Good update adding the dialect parameter to extract_table_names. Verify that all invocations of SQLParser methods across the project consistently pass or use the intended dialect.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None

Workflow ID: wflow_jGpmImMEK40Yu2PE


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.66%. Comparing base (d37a2ca) to head (bb24501).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pandasai/data_loader/local_loader.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1600      +/-   ##
==========================================
+ Coverage   89.62%   89.66%   +0.03%     
==========================================
  Files          72       71       -1     
  Lines        2594     2604      +10     
==========================================
+ Hits         2325     2335      +10     
  Misses        269      269              
Flag Coverage Δ
unittests 89.66% <95.65%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gventuri gventuri merged commit 266c79e into main Feb 11, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants