Using economic data to try to predict housing market trends or crashes. We also use sentiment analysis of live tweets to track general feelings about the housing market and what people are saying about it.

jrrameau2000/Housing_Market_Prediction

Housing Market Prediction

We use economic data to predict housing market trends, and crashes in particular. We also run sentiment analysis on live tweets to track general feelings about the housing market and what the conversation around it looks like. We start with the sentiment analysis and then dive into the economic data.

Sentiment Analysis on Housing Market

The sentiment analysis relies largely on Twitter. We use VADER for the sentiment scoring, and we also build a custom list of stopwords. Those custom words are added to the stopword list imported from the NLTK library and used later to create a word cloud.

Set Up API

Set up Tweepy with the required tokens and access keys. Using the API, we created a function that pulls Tweets from Twitter and runs a sentiment analysis on them.

Keyword search

We created a function that takes any keyword (or hashtag) and the number of Tweets requested, then searches for that many Tweets matching the keyword. The output is a list of raw Tweets containing the keyword.

Sentiment Analysis

We created a dataframe for each sentiment class, holding the Tweets with "positive", "negative", and "neutral" sentiment. A helper function returns the number of Tweets in each dataframe.
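As a rough sketch of the bucketing step, the snippet below sorts tweets into the three sentiment groups from a VADER-style compound score and counts each group. The ±0.05 cut-offs are the thresholds commonly suggested for VADER, not necessarily the ones used in this project, and `label_sentiment` and `count_sentiments` are hypothetical names. The scores are passed in directly here; in practice they could come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def count_sentiments(scored_tweets):
    """Count tweets per label; scored_tweets is an iterable of (text, compound) pairs."""
    counts = Counter({"positive": 0, "negative": 0, "neutral": 0})
    counts.update(label_sentiment(score) for _, score in scored_tweets)
    return dict(counts)


# Made-up example scores, for illustration only.
sample = [
    ("Prices keep climbing, great time to sell", 0.62),
    ("This market is a disaster waiting to happen", -0.57),
    ("Mortgage rates were announced today", 0.0),
]
sentiment_counts = count_sentiments(sample)
```

Starting the counter with all three labels at zero means a label with no tweets still shows up in the result.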

Raw Tweets

The variable "tweet_list" holds the most recent Tweets matching the parameters passed to the keyword search.

Stopwords

Stopwords were imported from nltk.corpus. We also wrote a for loop that iterates through each tweet to find frequently mentioned words. These can be adverbs, hashtags, or verbs that add little meaning to the analysis, for example: "a, #housingmarket, realestate, isn't." The goal of finding frequently mentioned words was to build a custom list of stopwords, so that we could surface the more "valuable" words that appear alongside a specific keyword.
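The frequency pass described above could look roughly like the following. This is a minimal sketch, not the project's code: `most_common_words` is a hypothetical helper, the tokenizing regex is an assumption, and `BASE_STOPWORDS` is a small stand-in for the NLTK English list (`nltk.corpus.stopwords.words("english")`) so the example runs without downloads.

```python
import re
from collections import Counter

# Stand-in for the NLTK English stopword list.
BASE_STOPWORDS = {"a", "the", "is", "in", "to", "and", "of"}


def most_common_words(tweets, top_n=5):
    """Return the top_n most frequent lowercase tokens across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Keep hashtags and contractions such as "isn't" as single tokens.
        counts.update(re.findall(r"#?[a-z0-9']+", tweet.lower()))
    return counts.most_common(top_n)


tweets = [
    "The #housingmarket is hot",
    "Is the #housingmarket in a bubble?",
    "realestate prices and the #housingmarket",
]
frequent = most_common_words(tweets, top_n=3)

# Frequent tokens that carry little meaning are promoted to custom stopwords.
custom_stopwords = BASE_STOPWORDS | {"#housingmarket", "realestate"}
```

Choosing which frequent tokens become stopwords remains a manual step: the search keyword itself will nearly always top the list, which is why it is a natural candidate.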

Processing Tweets

We created a function that cleans tweets and removes stopwords.
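A cleaning function of this kind might be sketched as follows. The exact rules used in the project are not shown here, so the steps below (lowercasing, stripping links and @mentions, dropping stopwords) are assumptions, and `clean_tweet` is a hypothetical name.

```python
import re


def clean_tweet(text, stopwords):
    """Lowercase a tweet, strip URLs, mentions and punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    tokens = re.findall(r"#?[a-z0-9']+", text)  # keep hashtags as single tokens
    return [tok for tok in tokens if tok not in stopwords]


# Example stopword set, for illustration only.
stopwords = {"the", "is", "a", "in", "rt", "#housingmarket"}
cleaned = clean_tweet(
    "RT @someone: The #housingmarket is in a bubble! https://example.com/x",
    stopwords,
)
```

Returning a list of tokens rather than a joined string keeps the output easy to feed into a word counter or a wordcloud generator.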

Wordcloud

After each tweet has been cleaned and stripped of stopwords, a wordcloud is generated from the words that show up most often. The goal of the wordcloud is to see what people are saying in tweets that match a given keyword.

Wordcloud #2

This wordcloud was generated a week later to see whether the same words showed up.

Economic Data on Housing Market

Most of the data used is public data from Freddie Mac. The data contains fixed and adjustable mortgage rates for houses starting from 1971. It also contains 15-year and 30-year interest rates, as well as the profit margin that banks make on those loans. We then run various regression models to create predictions and understand trends.

Median Home Price

image

image

Number of Homes Sold (in Millions)

image

image

Number of New Homes Sold

image

Mortgage Applications Submitted

image

Interest Rates

The interest rate data was later merged with another dataframe containing the number of houses purchased in each region of the US, starting from the 1970s. Merging the two made the data easier to inspect and let us fit a linear regression model.
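A merge like this is typically a join on a shared date column. The sketch below assumes pandas and a common `year` key; the column names and all values are made up for illustration and are not taken from the project's data.

```python
import pandas as pd

# Made-up values, for illustration only.
rates = pd.DataFrame({
    "year": [1975, 1976, 1977],
    "interest_rate": [9.05, 8.87, 8.85],
    "margin": [2.5, 2.6, 2.7],
})
purchases = pd.DataFrame({
    "year": [1976, 1977, 1978],
    "houses_bought": [646, 819, 817],
})

# An inner join keeps only the years present in both frames.
merged = rates.merge(purchases, on="year", how="inner")
```

An inner join drops years that appear in only one source, which avoids missing values in the regression inputs at the cost of losing those rows.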

Random Forest Regressor

Using the previously mentioned dataframe, we ran a random forest regression to create a predictive model. A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default); otherwise the whole dataset is used to build each tree. X contained interest rates and margin, and y contained houses bought in the US. According to the results, purchases were more strongly related to margin than to interest rates.
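One common way to compare how much each input matters in a random forest is its impurity-based feature importances. Whether the project used this measure is an assumption. The sketch below uses scikit-learn on synthetic data in which the target is deliberately built to depend mostly on the second column, so it illustrates the technique only and says nothing about the real housing data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: column 0 plays "interest rate", column 1 plays "margin".
X = rng.uniform(0.0, 10.0, size=(200, 2))
# The target is built to depend mostly on column 1, so its importance should dominate.
y = 0.2 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0.0, 0.5, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = dict(zip(["interest_rate", "margin"], model.feature_importances_))
```

Importances describe how the fitted trees split the data, not causal influence, so they are best read alongside a held-out score.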

Linear Regression

We also ran a linear regression. The model explains about 48% of the variance in houses bought (an R² of roughly 0.48).
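A linear regression's fit is usually summarised by R², the share of the variance in the target that the fitted line explains, so a figure like 48% is most naturally read that way. The sketch below computes it by hand for a single predictor using $R^2 = 1 - SS_{res} / SS_{tot}$. The function names are hypothetical and the numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up numbers, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
slope, intercept = fit_line(xs, ys)
score = r_squared(xs, ys, slope, intercept)
```

For these numbers the fitted line is y = 0.8x + 0.6 and the score is 0.64, meaning 64% of the variance in `ys` is explained.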

Deep Learning Model

To understand the links between multiple economic indicators and their influence on mortgage rates, we used 8 datasets to build this model: inflation (CPI), changes in mortgage-backed securities prices, average wages, the Fed Funds rate, number of houses sold, unemployment rates, and average adjustable-rate and fixed-rate mortgage rates.

DL_Code DL_Code2 DL_Code3

Inflation

Inflation_df

Mortgage Backed Securities

MBS_df

Fed Funds Rate

fed_funds

All Dataframes Combined

Library_data Combined_df

Relationship between Fixed and Adjustable Rate Mortgages

FvsA_df FvsA

Price to Interest Rate Relationship

PricevsInterest

Results

DL_Results DL_df

Conclusion

Although sentiment may say the US housing market is on the verge of a crash, the data says otherwise. With the Fed keeping interest rates astronomically low, there is no reason to predict that prices will go down. Despite other economic indicators, including rising GDP, rising inflation, low unemployment, more government spending, and increasing wages, the Federal Reserve is intent on keeping interest rates low to keep both the stock and housing markets on the rise.
