Explorytics is a Python library designed to simplify exploratory data analysis (EDA). It automates statistical summaries, visualizations, and the extraction of key insights, providing an intuitive interface for analyzing complex datasets quickly.
- Python 3.8 or higher
- Recommended: Jupyter Notebook for interactive exploration
Install Explorytics via pip:
pip install explorytics
To upgrade an existing installation:
pip install --upgrade explorytics
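After installing or upgrading, it can be handy to confirm which version is active and that the interpreter meets the requirement listed above. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `python_is_supported` are illustrative and not part of Explorytics.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="explorytics"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum=(3, 8)):
    """Check the running interpreter against the minimum version listed above."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("explorytics version:", installed_version() or "not installed")
```

If the second line prints "not installed", the pip command was probably run in a different environment than the one executing the script.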
DataAnalyzer is the primary interface for analyzing datasets.
from explorytics import DataAnalyzer
analyzer = DataAnalyzer(dataframe)
- Statistical Analysis: Computes numeric summaries, including mean, median, standard deviation, and percentiles.
- Correlation Analysis: Identifies relationships between variables.
- Outlier Detection: Detects outliers based on statistical thresholds.
The Visualizer submodule provides tools for creating plots.
- Distribution Plots:
    analyzer.visualizer.plot_distribution(feature_name, kde=True)
- Correlation Heatmaps:
    analyzer.visualizer.plot_correlation_matrix()
- Scatter Plots:
    analyzer.visualizer.plot_scatter(x, y, color="column_name")
- Box Plots:
    analyzer.visualizer.plot_boxplot(feature_name)
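The `kde=True` option in the distribution plot presumably overlays a kernel density estimate on the histogram. As a sketch of what such a curve is, the snippet below evaluates a Gaussian KDE by hand with NumPy. The kernel, the normal-reference bandwidth rule, and the function name `gaussian_kde_curve` are assumptions made for illustration; the library may compute its curve differently.

```python
import numpy as np


def gaussian_kde_curve(values, grid_points=200):
    """Evaluate a Gaussian kernel density estimate on an evenly spaced grid."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two values")
    std = x.std(ddof=1)
    if std == 0:
        raise ValueError("values must not all be identical")
    # Normal-reference rule of thumb for the bandwidth (an assumed choice).
    h = 1.06 * std * n ** (-1 / 5)
    # Extend the grid three bandwidths past the data so the tails are covered.
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_points)
    # Average one Gaussian bump per observation at every grid point.
    z = (grid[:, None] - x[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return grid, density
```

Plotting `density` against `grid` on top of a normalized histogram gives the smooth curve usually shown in a distribution plot.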
- Load the dataset:
    import pandas as pd
    from sklearn.datasets import load_wine
    # Load the wine dataset
    wine = load_wine()
    df = pd.DataFrame(wine.data, columns=wine.feature_names)
    df['wine_class'] = wine.target
- Initialize the analyzer:
    from explorytics import DataAnalyzer
    analyzer = DataAnalyzer(df)
- Perform analysis:
    results = analyzer.analyze()
- Display basic statistics:
    print("Basic Statistics:")
    print(results.basic_stats)
- Identify correlations:
    print("Correlations:")
    print(results.correlations)
- Analyze outliers:
    print("Outlier Information:")
    print(results.outliers)
- Plot feature distribution:
    analyzer.visualizer.plot_distribution('alcohol', kde=True).show()
- Plot correlation matrix:
    analyzer.visualizer.plot_correlation_matrix().show()
- Visualize scatter relationships:
    analyzer.visualizer.plot_scatter('alcohol', 'color_intensity', color='wine_class').show()
- Create a boxplot for outliers:
    analyzer.visualizer.plot_boxplot('malic_acid').show()
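A full correlation matrix can be hard to scan when a dataset has many features, as the wine data does. The helper below is a small sketch that ranks feature pairs by absolute correlation. It assumes the matrix is a square pandas DataFrame, as `DataFrame.corr()` returns; whether `results.correlations` has exactly that type is an assumption, and `strongest_pairs` is a hypothetical name, not part of Explorytics.

```python
import numpy as np
import pandas as pd


def strongest_pairs(corr, top=5):
    """List feature pairs by absolute correlation, strongest first."""
    # Strict upper triangle: each pair once, diagonal (self-correlation) excluded.
    rows, cols = np.triu_indices(len(corr), k=1)
    values = corr.to_numpy()[rows, cols]
    # Sort by descending absolute value; NaN entries end up last.
    order = np.argsort(-np.abs(values), kind="stable")[:top]
    return pd.DataFrame({
        "feature_a": np.asarray(corr.index)[rows[order]],
        "feature_b": np.asarray(corr.columns)[cols[order]],
        "correlation": values[order],
    })


if __name__ == "__main__":
    demo = pd.DataFrame({
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10.5],
        "z": [5, 3, 4, 1, 2],
    })
    print(strongest_pairs(demo.corr(), top=2))
```

With the wine example, passing the correlation matrix from the analysis step to `strongest_pairs` would list the most strongly related features first, provided that matrix is a pandas DataFrame.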
DataAnalyzer(dataframe)
- Parameters:
  - dataframe (pandas.DataFrame): Dataset to analyze.
- analyze(): Performs comprehensive analysis, returning:
  - basic_stats: Summary statistics for numeric columns.
  - correlations: Pairwise correlation matrix.
  - outliers: Information on detected outliers.
- get_feature_summary(feature_name): Returns detailed statistics for a specific feature.
Visualizer (accessed as analyzer.visualizer)
- plot_distribution(feature_name, kde=False): Plots the distribution of a feature.
- plot_correlation_matrix(): Displays a heatmap of correlations.
- plot_scatter(x, y, color=None): Creates a scatter plot for two features.
- plot_boxplot(feature_name): Generates a boxplot for a feature.
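The reference above does not list which fields `get_feature_summary` returns. As a rough idea of what a per-feature summary can contain, here is a pandas-only sketch; the function name `feature_summary` and the chosen fields are assumptions for illustration, not the library's actual output.

```python
import pandas as pd


def feature_summary(df, feature_name):
    """Collect a few descriptive statistics for one numeric column."""
    col = df[feature_name]
    return {
        "count": int(col.count()),        # non-missing values
        "missing": int(col.isna().sum()),
        "mean": float(col.mean()),
        "median": float(col.median()),
        "std": float(col.std()),          # sample standard deviation
        "min": float(col.min()),
        "max": float(col.max()),
        "skew": float(col.skew()),
    }


if __name__ == "__main__":
    demo = pd.DataFrame({"v": [1.0, 2.0, 3.0, None, 10.0]})
    print(feature_summary(demo, "v"))
```

Missing values are counted separately and skipped by the other statistics, which is pandas' default behavior.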
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch:
    git checkout -b feature-name
- Submit a pull request.
Explorytics is licensed under the MIT License. See LICENSE for details.