📈 S&P 500 Stocks Time Series Regression

🚀 Introduction

Hey there! 👋 I'm diving deep into the world of S&P 500 stock market trends, building a time series regression model to predict stock movements. This project is a mix of data exploration, feature engineering, and machine learning to gain valuable financial insights.

📊 Data Overview

I've got three datasets that power this analysis:

1️⃣ S&P 500 Companies (`sp500_companies.csv`)

This dataset contains details about S&P 500 companies, including their sector, industry, market cap, financials, and stock weights.

🔍 Initial Findings:

The big tech giants dominate: Apple, NVIDIA, Microsoft, and Alphabet hold the highest market cap.
NVIDIA is a growth beast, showing 122.4% revenue growth, outpacing others.
Apple has the highest weight (0.0635) in the index, meaning it significantly impacts overall movements.

2️⃣ S&P 500 Index (`sp500_index.csv`)

This dataset tracks the historical performance of the S&P 500 index.

📌 First few entries:

The earliest date is October 6, 2014, with the index at 1,964.82 points.
The daily fluctuations reflect economic trends, earnings reports, and investor sentiment.

💡 Next step: Plot the S&P 500 index over time to see the long-term trend.

3️⃣ S&P 500 Stock Prices (`sp500_stocks.csv`)

This dataset contains historical stock prices for every S&P 500 company.

🔍 Example (3M - MMM on Jan 4, 2010):

Open: 69.47, Close: 69.41
High: 69.77, Low: 69.12
Volume: 3.64M shares traded

📌 Why this matters:

The "Adj Close" column accounts for dividends/splits, making it crucial for accurate return calculations.

Here's how I'd write this step in GitHub README.md format while keeping the tone like yours:

🛠️ Step 2: Data Preprocessing

Before diving into analysis, I cleaned and prepped the data to make it ready for modeling. Here’s what I did:

🔄 Convert Date Columns to Datetime Format

Since I'm working with time series data, it's crucial to ensure that dates are in the proper format.

sp500_index['Date'] = pd.to_datetime(sp500_index['Date'])
sp500_stocks['Date'] = pd.to_datetime(sp500_stocks['Date'])

📌 Sort Data by Date

Sorting helps in sequential analysis and forecasting.

sp500_index = sp500_index.sort_values('Date')
sp500_stocks = sp500_stocks.sort_values('Date')

🚨 Handle Missing Values

Missing values can mess up ML models, so I filled them with zero (0) for now.

sp500_companies.fillna(0, inplace=True)
sp500_stocks.fillna(0, inplace=True)

📊 Set the Index for Time Series Analysis

Setting the 'Date' column as an index makes it easier to visualize and apply time series models.

sp500_index.set_index('Date', inplace=True)

✅ Now the data is clean and ready for exploration!

📊 Step 3: Exploratory Data Analysis (EDA)

📈 S&P 500 Index Trend

To understand the historical performance of the S&P 500 Index, I plotted the trend over time.

📉 AAPL Stock Price Trend

I also analyzed Apple's (AAPL) stock price movements over time.

Code Snippets

Below are the Python code snippets used for visualization:

# S&P 500 Index Trend
plt.figure(figsize=(12, 6))
plt.plot(sp500_index.index, sp500_index['S&P500'], label="S&P 500 Index")
plt.title("S&P 500 Index Trend")
plt.xlabel('Year')
plt.ylabel('Index Value')
plt.legend()
plt.show()

# AAPL Stock Price Trend
aapl_stock = sp500_stocks[sp500_stocks['Symbol'] == 'AAPL']
plt.figure(figsize=(12, 6))


plt.plot(aapl_stock['Date'], aapl_stock['Close'], label='AAPL Close Price')
plt.title('AAPL Stock Price Trend')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()

⏳ Step 4: Time Series Forecasting

🔮 ARIMA Model for S&P 500 Prediction

To predict future trends in the S&P 500 Index, I used the Auto ARIMA model, which automatically selects the best ARIMA parameters based on AIC (Akaike Information Criterion).

📌 Stationarity Check (ADF Test)

Before applying ARIMA, we checked if the time series is stationary using the Augmented Dickey-Fuller (ADF) Test:

result = sm.tsa.adfuller(sp500_index['S&P500'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

📊 Results:

ADF Statistic: 0.4040 p-value: 0.9816 (Greater than 0.05 → The time series is non-stationary) Since the p-value is high, we apply differencing to make the series stationary.

⚙️ Auto ARIMA Model The Auto ARIMA function automatically finds the best parameters:

sp500_diff = sp500_index['S&P500'].diff().dropna()
arima_model = auto_arima(sp500_diff, seasonal=False, trace=True, stepwise=True)

📌 Best ARIMA Model Found: ARIMA(2,0,2) with intercept This model was selected based on the lowest AIC score (25188.691).

n_periods = 30
forecast, conf_int = arima_model.predict(n_periods=n_periods, return_conf_int=True)
forecast_dates = pd.date_range(sp500_index.index[-1], periods=n_periods, freq='B')

plt.figure(figsize=(10,6))
plt.plot(sp500_index.index, sp500_index['S&P500'], label='S&P 500')
plt.plot(forecast_dates, forecast.cumsum() + sp500_index['S&P500'].iloc[-1], 
         label='Forecast', color='red')
plt.fill_between(forecast_dates, conf_int[:, 0].cumsum() + sp500_index['S&P500'].iloc[-1], 
                 conf_int[:, 1].cumsum() + sp500_index['S&P500'].iloc[-1], color='red', alpha=0.3)
plt.title('S&P 500 Forecast')
plt.legend()
plt.show()

📊 Visualization: S&P 500 Forecast Below is the forecasted trend for the next 30 business days:

🔍 Insights:

The forecast shows a projected upward/downward trend for the S&P 500. The red shaded region represents the confidence interval, showing possible fluctuations.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
SP 500 index trend.png		SP 500 index trend.png
SP appl stock.png		SP appl stock.png
s-p-500-stocks-time-series-regression.ipynb		s-p-500-stocks-time-series-regression.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📈 S&P 500 Stocks Time Series Regression

🚀 Introduction

📊 Data Overview

1️⃣ S&P 500 Companies (`sp500_companies.csv`)

2️⃣ S&P 500 Index (`sp500_index.csv`)

3️⃣ S&P 500 Stock Prices (`sp500_stocks.csv`)

🛠️ Step 2: Data Preprocessing

🔄 Convert Date Columns to Datetime Format

📌 Sort Data by Date

🚨 Handle Missing Values

📊 Set the Index for Time Series Analysis

📊 Step 3: Exploratory Data Analysis (EDA)

📈 S&P 500 Index Trend

📉 AAPL Stock Price Trend

Code Snippets

⏳ Step 4: Time Series Forecasting

🔮 ARIMA Model for S&P 500 Prediction

📌 Stationarity Check (ADF Test)

About

Releases

Packages

Languages

Salma0-8/S-P-500-Stocks-Time-Series-Regression

Folders and files

Latest commit

History

Repository files navigation

📈 S&P 500 Stocks Time Series Regression

🚀 Introduction

📊 Data Overview

1️⃣ S&P 500 Companies (sp500_companies.csv)

2️⃣ S&P 500 Index (sp500_index.csv)

3️⃣ S&P 500 Stock Prices (sp500_stocks.csv)

🛠️ Step 2: Data Preprocessing

🔄 Convert Date Columns to Datetime Format

📌 Sort Data by Date

🚨 Handle Missing Values

📊 Set the Index for Time Series Analysis

📊 Step 3: Exploratory Data Analysis (EDA)

📈 S&P 500 Index Trend

📉 AAPL Stock Price Trend

Code Snippets

⏳ Step 4: Time Series Forecasting

🔮 ARIMA Model for S&P 500 Prediction

📌 Stationarity Check (ADF Test)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

1️⃣ S&P 500 Companies (`sp500_companies.csv`)

2️⃣ S&P 500 Index (`sp500_index.csv`)

3️⃣ S&P 500 Stock Prices (`sp500_stocks.csv`)

Packages