mr-pylin/datasets

🗃️ Datasets

A collection of datasets for Data Visualization, Data Analysis, and Machine Learning tasks

🔗 Useful Links

Dataset Retrieval

Popular sources for finding and downloading datasets for your projects:

| Source | Description |
| --- | --- |
| Kaggle Datasets | A comprehensive collection of datasets across various domains, including machine learning, computer vision, NLP, and more. |
| UCI Machine Learning Repository | A well-known collection of datasets for machine learning tasks. |
| Google Dataset Search | A search engine that helps you find datasets stored across the web. |
| AWS Public Datasets | Amazon's collection of public datasets, including data related to machine learning, genomics, and more. |
| OpenML | An open platform for sharing datasets, machine learning algorithms, and experiments. |
| Data.gov | A U.S. government site offering datasets for a wide range of public sectors, including health, agriculture, and energy. |
| Zenodo | A general-purpose repository for research datasets, articles, and software, with great support for open data. |

Data Tools

Here are some essential tools and libraries for working with datasets:

| Category | Library | Description |
| --- | --- | --- |
| Data Manipulation & Analysis | Pandas | A powerful library for data manipulation and analysis, providing data structures like DataFrames. |
| Data Manipulation & Analysis | NumPy | A fundamental library for numerical computing in Python, supporting large, multi-dimensional arrays and matrices. |
| Data Manipulation & Analysis | Dask | A parallel computing library that scales Pandas and NumPy workflows for larger-than-memory datasets. |
| Data Visualization | Matplotlib | A popular library for creating static, animated, and interactive visualizations in Python. |
| Data Visualization | Seaborn | Built on top of Matplotlib, it provides a high-level interface for drawing attractive statistical graphics. |
| Data Visualization | Plotly | An interactive graphing library for Python, useful for creating web-based visualizations. |
| Data Visualization | Bokeh | A visualization library for creating interactive plots and dashboards. |
| Machine Learning & Data Science | Scikit-learn | A simple and efficient library for machine learning in Python. |
| Machine Learning & Data Science | TensorFlow | An open-source framework for building and training machine learning models. |
| Machine Learning & Data Science | PyTorch | A deep learning framework offering flexibility and speed. |
| Machine Learning & Data Science | XGBoost | A highly efficient gradient boosting library for regression, classification, and ranking tasks. |
| Data Preprocessing | OpenCV | A powerful library for computer vision tasks, including image processing and feature extraction. |
| Data Preprocessing | Librosa | A Python package for music and audio analysis. |
| Data Preprocessing | NLTK | A library for natural language processing with easy-to-use tools for text analysis. |
| Data Cleaning & Transformation | Cleanlab | A library for automatically detecting and correcting data errors in real-world datasets. |
| Data Cleaning & Transformation | Great Expectations | A framework for data testing, documentation, and profiling. |
| Data Cleaning & Transformation | Pyjanitor | A Python library for cleaning and transforming datasets, providing a simple and efficient API. |
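
To give a feel for how the first two libraries in the table fit together, here is a minimal sketch, assuming Pandas and NumPy are installed. The CSV text, the column names, and the `summarize` helper are invented for illustration and do not refer to any dataset in this repository.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a downloaded dataset file.
CSV_TEXT = """species,length_cm,width_cm
a,5.0,2.0
a,7.0,
b,4.0,1.0
b,6.0,3.0
"""


def summarize(csv_text: str) -> pd.DataFrame:
    """Load CSV text, fill missing widths with the column median,
    and return the per-species mean of each measurement."""
    df = pd.read_csv(io.StringIO(csv_text))
    # The empty field in the second row is parsed as NaN; replace it.
    df["width_cm"] = df["width_cm"].fillna(df["width_cm"].median())
    return df.groupby("species")[["length_cm", "width_cm"]].mean()


summary = summarize(CSV_TEXT)
# Hand the result to NumPy as a plain 2-D array (rows: species, columns: measurements).
values = summary.to_numpy()
print(summary)
```

The same pattern applies to a real file by passing its path to `pd.read_csv` instead of an in-memory buffer.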

🔍 Find Me

Any mistakes, suggestions, or contributions? Feel free to reach out to me at:

I look forward to connecting with you! 🏃‍♂️

📄 License

This project is licensed under the MIT License.
The datasets in the ./data/ directory may be subject to their own licenses and usage restrictions, which are specified within each respective folder.
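
Since each dataset folder may carry its own license, it can help to list those files before reusing the data. The sketch below is one way to do that with the standard library only. It assumes license files have a name starting with "license" (in any letter case), which is a guess about the naming and not something this README specifies; `find_license_files` is a hypothetical helper.

```python
from pathlib import Path


def find_license_files(data_dir: str) -> dict[str, list[str]]:
    """Map each dataset folder directly under data_dir to the
    license-like files found inside it."""
    result: dict[str, list[str]] = {}
    for folder in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        # Assumption: a license file's name starts with "license", any case.
        result[folder.name] = sorted(
            f.name
            for f in folder.iterdir()
            if f.is_file() and f.name.lower().startswith("license")
        )
    return result


if __name__ == "__main__":
    # Only runs when a ./data directory exists in the working directory.
    if Path("./data").is_dir():
        for name, files in find_license_files("./data").items():
            print(name, files or "no license file found")
```

Folders that map to an empty list are the ones to check by hand.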
