-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql_generation): handle scenario where table columns have "from" keyword in query #1600
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to bb24501 in 1 minute and 44 seconds
More details
- Looked at
367
lines of code in13
files - Skipped
0
files when reviewing. - Skipped posting
12
drafted comments based on config settings.
1. pandasai/core/code_generation/code_cleaning.py:56
- Draft comment:
Consider whether to explicitly pass a dialect to SQLParser.extract_table_names for clarity. Currently it defaults to 'postgres', which is fine if that’s intended. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
2. pandasai/data_loader/local_loader.py:96
- Draft comment:
Good job passing the dialect argument ('duckdb') to is_sql_query_safe to ensure query validation specific to DuckDB. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
3. pandasai/data_loader/sql_loader.py:46
- Draft comment:
Passing the source_type as the dialect to is_sql_query_safe is a clear improvement. Make sure that source_type always aligns with the dialect expected by sqlglot. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
4. pandasai/data_loader/view_loader.py:95
- Draft comment:
Passing dialect=source_type to is_sql_query_safe in execute_query helps maintain consistency across source types. Ensure that the dialect names in source_type meet the sqlglot specifications. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
5. pandasai/helpers/dataframe_serializer.py:12
- Draft comment:
The addition of the 'dialect' parameter to serialize enhances flexibility. Please ensure the new parameter is documented in the module docs. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
6. pandasai/query_builders/sql_parser.py:67
- Draft comment:
Extending extract_table_names with a dialect parameter is a useful enhancement. Confirm that test cases cover dialect-specific differences adequately. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
7. tests/unit_tests/data_loader/test_sql_loader.py:125
- Draft comment:
The test now verifies that is_sql_query_safe is called with the correct dialect ("mysql"). This improves clarity and robustness in SQL sanitization. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
8. tests/unit_tests/prompts/test_sql_prompt.py:63
- Draft comment:
The prompt now includes the dialect attribute in the table tag. Ensure consistency in formatting and consider using a utility for normalizing line breaks in tests across platforms. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
9. pandasai/core/code_generation/code_cleaning.py:55
- Draft comment:
Consider whether the SQL dialect should be configurable when extracting table names. Currently, SQLParser.extract_table_names is called without an explicit dialect (defaults to 'postgres'). If supporting multiple dialects, pass the appropriate dialect from context. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
10. pandasai/helpers/dataframe_serializer.py:11
- Draft comment:
Update the docstring for the serialize method to mention the new 'dialect' parameter and its default value to improve clarity for API users. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
11. pandasai/data_loader/sql_loader.py:46
- Draft comment:
Ensure that the 'source_type' used as the dialect parameter for is_sql_query_safe is supported by sqlglot. If there is variability in supported dialects (e.g., 'mysql', 'duckdb'), consider adding validation or documentation. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
12. pandasai/query_builders/sql_parser.py:67
- Draft comment:
Good update adding the dialect parameter to extract_table_names. Verify that all invocations of SQLParser methods across the project consistently pass or use the intended dialect. - Reason this comment was not posted:
Confidence changes required:33%
<= threshold50%
None
Workflow ID: wflow_jGpmImMEK40Yu2PE
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1600 +/- ##
==========================================
+ Coverage 89.62% 89.66% +0.03%
==========================================
Files 72 71 -1
Lines 2594 2604 +10
==========================================
+ Hits 2325 2335 +10
Misses 269 269
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Important
Refactor SQL parsing and sanitization to handle 'from' keyword in table names and add dialect support for SQL safety checks.
extract_table_names
toSQLParser
insql_parser.py
for better SQL parsing.is_sql_query_safe
insql_sanitizer.py
andserialize
indataframe_serializer.py
.extract_table_names
fromhelpers/sql.py
and integrate intoSQLParser
.code_cleaning.py
,local_loader.py
,sql_loader.py
, andview_loader.py
.extract_table_names
and dialect support intest_sql_parser.py
.test_code_cleaning.py
,test_sql_loader.py
, andtest_dataframe_serializer.py
to reflect changes.This description was created by
for bb24501. It will automatically update as commits are pushed.