Explorytics Documentation

Table of Contents

  1. Introduction
  2. Getting Started
  3. Key Components
  4. Usage Examples
  5. API Reference
  6. Contributing
  7. License

Introduction

Explorytics is a Python library that simplifies exploratory data analysis (EDA). It automates statistical summaries, visualizations, and the extraction of key insights behind a single interface, so that complex datasets can be analyzed quickly.


Getting Started

Installation

Prerequisites

  • Python 3.8 or higher
  • Recommended: Jupyter Notebook for interactive exploration

Installation Steps

Install Explorytics via pip:

pip install explorytics

To upgrade an existing installation:

pip install --upgrade explorytics
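
To confirm which version pip installed, the standard-library `importlib.metadata` module (available from Python 3.8, matching the prerequisite above) can be queried. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of Explorytics.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("explorytics")
    print(version if version else "explorytics is not installed")
```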

Key Components

DataAnalyzer

DataAnalyzer is the primary interface for analyzing datasets.

Initialization

from explorytics import DataAnalyzer

# dataframe is an existing pandas.DataFrame
analyzer = DataAnalyzer(dataframe)

Key Features

  1. Statistical Analysis: Computes numeric summaries, including mean, median, standard deviation, and percentiles.
  2. Correlation Analysis: Identifies relationships between variables.
  3. Outlier Detection: Detects outliers based on statistical thresholds.
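
The three features above correspond to standard computations. The sketch below shows them in plain Python for intuition only. It is not the library's implementation: this document does not state which correlation coefficient or outlier threshold DataAnalyzer uses, so Pearson correlation and the 1.5 × IQR rule are assumptions, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, median, sample standard deviation, and quartiles of one column."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
    }


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long columns."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Values lying more than k * IQR below Q1 or above Q3 (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

For the column `[1, 2, 3, 4, 100]`, the quartiles are 2 and 4, so the bounds are -1 and 7 and only `100` is flagged.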

Visualizer

The Visualizer provides plotting methods and is accessed through the visualizer attribute of a DataAnalyzer instance.

Common Visualizations

  • Distribution Plots:

    analyzer.visualizer.plot_distribution(feature_name, kde=True)
  • Correlation Heatmaps:

    analyzer.visualizer.plot_correlation_matrix()
  • Scatter Plots:

    analyzer.visualizer.plot_scatter(x, y, color="column_name")
  • Box Plots:

    analyzer.visualizer.plot_boxplot(feature_name)

Usage Examples

Dataset Preparation

  1. Load the dataset:

    import pandas as pd
    from sklearn.datasets import load_wine
    
    # Load the wine dataset
    wine = load_wine()
    df = pd.DataFrame(wine.data, columns=wine.feature_names)
    df['wine_class'] = wine.target
  2. Initialize the analyzer:

    from explorytics import DataAnalyzer
    
    analyzer = DataAnalyzer(df)

Comprehensive Analysis

  1. Perform analysis:

    results = analyzer.analyze()
  2. Display basic statistics:

    print("Basic Statistics:")
    print(results.basic_stats)
  3. Identify correlations:

    print("Correlations:")
    print(results.correlations)
  4. Analyze outliers:

    print("Outlier Information:")
    print(results.outliers)
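
The three print steps above can be folded into one helper. This is a sketch that relies only on the attribute names shown in this section (`basic_stats`, `correlations`, `outliers`); the function `format_report` and the stand-in `demo` object are hypothetical and exist so the example runs without Explorytics installed.

```python
from types import SimpleNamespace


def format_report(results) -> str:
    """Join the three result sections into one printable string.

    `results` is any object exposing `basic_stats`, `correlations`,
    and `outliers` attributes.
    """
    sections = [
        ("Basic Statistics", results.basic_stats),
        ("Correlations", results.correlations),
        ("Outlier Information", results.outliers),
    ]
    return "\n\n".join(f"{title}:\n{value}" for title, value in sections)


# Stand-in for the object returned by analyzer.analyze() (an assumption
# made for illustration; the real attributes hold the computed results).
demo = SimpleNamespace(basic_stats="(stats)", correlations="(corr)", outliers="(none)")
print(format_report(demo))
```

With a real analyzer, `print(format_report(analyzer.analyze()))` would produce the same layout with the computed values.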

Visualization

  1. Plot feature distribution:

    analyzer.visualizer.plot_distribution('alcohol', kde=True).show()
  2. Plot correlation matrix:

    analyzer.visualizer.plot_correlation_matrix().show()
  3. Visualize scatter relationships:

    analyzer.visualizer.plot_scatter('alcohol', 'color_intensity', color='wine_class').show()
  4. Create a boxplot for outliers:

    analyzer.visualizer.plot_boxplot('malic_acid').show()

API Reference

DataAnalyzer

Initialization

DataAnalyzer(dataframe)
  • Parameters:
    • dataframe (pandas.DataFrame): Dataset to analyze.

Methods

  • analyze(): Performs a comprehensive analysis and returns a results object with the following attributes:

    • basic_stats: Summary statistics for numeric columns.
    • correlations: Pairwise correlation matrix.
    • outliers: Information on detected outliers.
  • get_feature_summary(feature_name): Returns detailed statistics for a specific feature.
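
A possible way to use `get_feature_summary` across several columns is sketched below. This document does not specify the method's return type, so the sketch simply stores whatever it returns; `summarize_features` and `_StubAnalyzer` are hypothetical names, and the stub replaces a real DataAnalyzer so the example is self-contained.

```python
from typing import Dict, Iterable


def summarize_features(analyzer, feature_names: Iterable[str]) -> Dict[str, object]:
    """Map each feature name to the value returned by get_feature_summary."""
    return {name: analyzer.get_feature_summary(name) for name in feature_names}


class _StubAnalyzer:
    """Hypothetical stand-in; a real DataAnalyzer would compute statistics."""

    def get_feature_summary(self, feature_name):
        return {"feature": feature_name}


summaries = summarize_features(_StubAnalyzer(), ["alcohol", "color_intensity"])
print(summaries)
```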

Visualizer

Methods

  • plot_distribution(feature_name, kde=False): Plots the distribution of a feature.

  • plot_correlation_matrix(): Plots a heatmap of the pairwise correlation matrix.

  • plot_scatter(x, y, color=None): Creates a scatter plot for two features.

  • plot_boxplot(feature_name): Generates a boxplot for a feature.


Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.

  2. Create a new branch:

    git checkout -b feature-name
  3. Commit your changes and push the branch to your fork.

  4. Submit a pull request.


License

Explorytics is licensed under the MIT License. See LICENSE for details.