Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Export to CSV with UTF-8 encoding #826

Closed

Conversation

kinshuksinghbist
Copy link

Export a pandas DataFrame to a CSV file with proper UTF-8 encoding.
This ensures that non-ASCII text(such as Chinese characters) are correctly rendered in the exported file.
Fixes #784

Screenshot from 2025-03-27 16-14-09

Copy link

@llamapreview llamapreview bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

  • Primary purpose and scope: Implements CSV export functionality with UTF-8 encoding to resolve character encoding issues for non-ASCII text
  • Key components modified: Added export_to_csv method in base.py
  • Cross-component impacts: Affects all DataFrame export operations
  • Business value alignment: Enhances internationalization support and resolves critical user-reported encoding issues

1.2 Technical Architecture

  • System design modifications: New utility method added to core base class
  • Integration points impact: Direct integration with pandas DataFrame operations

2. Critical Findings

2.1 Must Fix (P0🔴)

No critical blocking issues found. The implementation correctly addresses the reported encoding problem.

2.2 Should Fix (P1🟡)

Issue: Missing input validation in export_to_csv

  • Analysis Confidence: High
  • Impact: Potential TypeErrors and unclear error messages for invalid inputs
  • Suggested Solution:
  if not isinstance(df, pd.DataFrame):
      raise TypeError("df must be a pandas DataFrame")
  if not isinstance(filepath, str):
      raise TypeError("filepath must be a string")

Issue: Lack of test coverage

  • Analysis Confidence: Medium
  • Impact: Risk of regression and undetected edge cases
  • Suggested Solution: Add unit tests verifying UTF-8 encoding, error handling, and kwargs override

2.3 Consider (P2🟢)

Area: Docstring improvement

  • Analysis Confidence: High
  • Improvement Opportunity: Better communicate encoding solution to users

Area: Error handling specificity

  • Analysis Confidence: Medium
  • Improvement Opportunity: More granular error catching for file operations

2.4 Summary of Action Items

  1. Add input validation (P1 - immediate)
  2. Implement test cases (P1 - pre-release)
  3. Enhance documentation (P2 - next sprint)

3. Technical Analysis

3.1 Code Logic Analysis

📁 src/vanna/base/base.py - export_to_csv

  • Submitted PR Code:
    def export_to_csv(self, df: pd.DataFrame, filepath: str, **kwargs) -> None:
        if 'encoding' not in kwargs:
            kwargs['encoding'] = 'utf-8'
        if 'index' not in kwargs:
            kwargs['index'] = False
        try:
            df.to_csv(filepath, **kwargs)
        except Exception as e:
            self.log(f"Error exporting DataFrame to CSV: {e}", title="Export Error")
            raise
  • Analysis:
    • Current implementation lacks input validation for DataFrame and filepath
    • No handling of directory creation/path validation
    • Generic exception catching may obscure root causes
  • LlamaPReview Suggested Improvements:
    def export_to_csv(self, df: pd.DataFrame, filepath: str, **kwargs) -> None:
        # Input validation
        if not isinstance(df, pd.DataFrame):
            raise TypeError("Expected pandas DataFrame")
        if not isinstance(filepath, str):
            raise TypeError("Filepath must be string")
        
        # Set defaults
        kwargs.setdefault('encoding', 'utf-8')
        kwargs.setdefault('index', False)
        
        try:
            # Create parent directories if needed
            Path(filepath).parent.mkdir(parents=True, exist_ok=True)
            df.to_csv(filepath, **kwargs)
        except OSError as e:
            self.log(f"File system error: {e}", title="Export Error")
            raise
        except pd.errors.PandasError as e:
            self.log(f"Data export error: {e}", title="Export Error")
            raise
  • Improvement rationale:
    • Prevents invalid parameter types upfront
    • Handles common file system errors explicitly
    • Provides more actionable error categorization

3.2 Key Quality Aspects

  • Testing strategy: Requires validation of various encoding scenarios
  • Documentation needs: Explicit mention of UTF-8 default in public docs

4. Overall Evaluation

  • Technical assessment: Well-structured solution addressing core issue
  • Business impact: Critical improvement for international users
  • Risk evaluation: Low risk with suggested validations added
  • Notable positives: Clean implementation with proper default handling
  • Implementation quality: Good foundation needing complementary tests
  • Final recommendation: Approve with follow-up issues for validation and testing

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The generated CSV does not support Chinese
1 participant