This project analyzes the U.S. real estate market using listing data from Realtor.com, a leading real estate listing website. It applies statistical and machine learning techniques to examine housing market trends, price determinants, and market segmentation, with the aim of helping real estate professionals, investors, and policymakers make data-driven decisions.
- Exploratory Data Analysis
- Feature Engineering
- Statistical Testing (Chi-Square, ANOVA, T-tests)
- Data Visualization
- Clustering Analysis
- Linear Regression
- Time Series Analysis
- Correlation Analysis
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- SciPy
- Jupyter Notebooks
The analysis is based on a dataset containing over 2.2 million real estate listings across the United States. Key features include:
- Property characteristics (bedrooms, bathrooms, lot size, living area)
- Location data (city, state, zip code)
- Price information
- Property status (for sale, sold, ready to build)
- Exploratory Data Analysis (notebooks/Exploratory_Data_Analysis_of_of_USA_Real_Estate_Properties.ipynb)
  - Statistical analysis of property characteristics
  - Distribution analysis of prices and features
  - Correlation analysis between variables
  - Geographical distribution of listings
- High-Level Analysis (notebooks/High_level_analysis.ipynb)
  - Clustering Analysis
    - Identified 5 distinct market segments
    - Market segmentation insights
  - Statistical Tests
    - Chi-square tests for state-status relationships
    - ANOVA for price differences
    - T-tests for price comparisons
  - Linear Regression
    - Price prediction model
    - Feature importance analysis
  - Time Series Analysis
    - Price trends and patterns
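The three statistical tests listed above can be sketched with SciPy as follows. This is a minimal illustration on synthetic data, not the code from the notebooks: the column names (`state`, `status`, `price`), the state and status labels, and the random prices are all assumptions made for the example, so the resulting statistics say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the listings table (assumed column names and values).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "state": rng.choice(["Texas", "Ohio", "Florida"], size=n),
    "status": rng.choice(["for_sale", "sold", "ready_to_build"], size=n),
    "price": rng.lognormal(mean=12.5, sigma=0.5, size=n),
})

# Chi-square test of independence between state and property status.
contingency = pd.crosstab(df["state"], df["status"])
chi2, chi2_p, dof, _expected = stats.chi2_contingency(contingency)

# One-way ANOVA: does mean price differ across states?
price_by_state = [g["price"].to_numpy() for _, g in df.groupby("state")]
f_stat, anova_p = stats.f_oneway(*price_by_state)

# Welch's t-test: compare prices of two status groups without
# assuming equal variances.
for_sale = df.loc[df["status"] == "for_sale", "price"]
sold = df.loc[df["status"] == "sold", "price"]
t_stat, ttest_p = stats.ttest_ind(for_sale, sold, equal_var=False)

print(f"chi-square: chi2={chi2:.2f}, dof={dof}, p={chi2_p:.3f}")
print(f"ANOVA:      F={f_stat:.2f}, p={anova_p:.3f}")
print(f"t-test:     t={t_stat:.2f}, p={ttest_p:.3f}")
```

Listing prices are typically right-skewed, so running the ANOVA and t-test on log-transformed prices may be a reasonable variation; whether the notebooks do this is not stated here.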
- Clustering Analysis
- Significant price variations across different states and property statuses
- Strong market segmentation with distinct property clusters
- Dynamic market conditions in states like Texas, with high construction activity
- Weak to moderate correlations between price and physical characteristics
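To illustrate how findings like the segmentation and correlation results above can be produced, here is a hedged sketch using pandas and scikit-learn. The data is synthetic and the column names (`price`, `bedrooms`, `living_area`) are assumptions; only the choice of 5 clusters comes from the analysis summary. The exact features, scaling, and clustering settings used in the notebooks may differ.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic listings (assumed column names; values are made up).
rng = np.random.default_rng(42)
n = 500
bedrooms = rng.integers(1, 6, size=n)
living_area = 400 * bedrooms + rng.normal(0, 300, size=n)
price = 150 * living_area + rng.normal(0, 80_000, size=n)
features = pd.DataFrame(
    {"price": price, "bedrooms": bedrooms, "living_area": living_area}
)

# Pearson correlation of each physical characteristic with price.
corr_with_price = features.corr(method="pearson")["price"].drop("price")

# Standardize so that price (large values) does not dominate the
# Euclidean distances used by k-means, then fit 5 segments.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
features["segment"] = kmeans.fit_predict(scaled)

# Median profile of each segment, useful for naming the clusters.
segment_summary = features.groupby("segment")[
    ["price", "bedrooms", "living_area"]
].median()

print(corr_with_price)
print(segment_summary)
```

The synthetic prices here are built directly from living area, so the correlations in this toy example are much stronger than the weak to moderate ones reported for the real data.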
- Python 3.7+
- Required packages listed in requirements.txt
- Clone this repository: git clone https://github.com/slfagrouche/usa-real-estate-analysis.git
- Install required packages: pip install -r requirements.txt
- Source: Realtor.com dataset (via Kaggle)
- Format: CSV file containing 2,226,382 entries
- Key columns: price, bedrooms, bathrooms, location data, property status
- Raw data location: data/raw/realtor-data.zip.csv
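A minimal sketch of loading the raw file with pandas is shown below. The helper `load_listings` is hypothetical (it is not taken from `src/data/process_data.py`), and the column names `price`, `status`, and `state` are assumptions; check the actual CSV header before relying on them. The example runs on a small in-memory sample so it does not need the full dataset.

```python
import io

import pandas as pd


def load_listings(source, required=("price", "status", "state")):
    """Read a listings CSV and drop rows without a usable price.

    `source` can be a file path or a file-like object. The `required`
    column names are assumptions for this sketch.
    """
    df = pd.read_csv(source)
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Remove rows with a missing or non-positive price.
    df = df.dropna(subset=["price"])
    df = df[df["price"] > 0]
    return df.reset_index(drop=True)


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO(
    "price,status,state\n"
    "250000,for_sale,Texas\n"
    ",sold,Ohio\n"
    "0,for_sale,Texas\n"
    "410000,sold,Ohio\n"
)
listings = load_listings(sample)
print(listings)
```

With the repository checked out, the same helper could be pointed at the raw file, for example `load_listings("data/raw/realtor-data.zip.csv")`, assuming the file is a plain CSV despite the `.zip` in its name.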
usa-real-estate-analysis/
├── Final Report.pdf          # Comprehensive project report
├── LICENSE                   # MIT License
├── README.md                 # Project documentation
├── data/
│   └── raw/                  # Original dataset
│       └── realtor-data.zip.csv
├── docs/                     # Additional documentation
├── notebooks/
│   ├── Exploratory_Data_Analysis_of_of_USA_Real_Estate_Properties.ipynb
│   └── High_level_analysis.ipynb
├── requirements.txt          # Project dependencies
└── src/
    ├── __init__.py
    ├── data/                 # Data processing scripts
    │   ├── __init__.py
    │   └── process_data.py
    └── visualization/        # Visualization functions
        ├── __init__.py
        └── visualize.py
- Final Report - Comprehensive analysis and findings
- Exploratory Data Analysis
- High-Level Analysis
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Said Lfagrouche - GitHub Profile
This project is licensed under the MIT License; see the LICENSE file for details.
- Data provided by Realtor.com
- Analysis conducted for educational purposes and hands-on experience