We use economic data to predict housing market trends, and crashes in particular. We also run sentiment analysis on live Tweets to track the general mood about the housing market and what people are saying about it. We start with the sentiment analysis and then dive into the economic data.
The sentiment analysis relies largely on Twitter. We use VADER for the sentiment scoring and also build a custom list of stopwords. Those custom words are added to the stopword list imported from the NLTK library and later used when creating a word cloud.
We set up Tweepy with the required tokens and access keys. Using the API object, we created a function that pulls Tweets from Twitter and runs a sentiment analysis on them.
We created a function that takes any keyword (or hashtag) and a requested number of Tweets, then searches for that many Tweets related to the keyword. The output is a list of raw Tweets containing the keyword.
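A minimal sketch of the setup and keyword search described above, assuming Tweepy 4.x, where the search call is API.search_tweets (older 3.x releases name it API.search). The function names build_api and keyword_search and the credential argument names are hypothetical, not taken from the project.

```python
def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Authenticate with Twitter and return a Tweepy API client."""
    import tweepy  # imported here so keyword_search can be used without it

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth, wait_on_rate_limit=True)


def keyword_search(api, keyword, count=100):
    """Return the raw text of up to `count` recent Tweets matching `keyword`.

    `keyword` may be a plain word or a hashtag such as "#housingmarket".
    A single standard search request returns at most 100 Tweets.
    """
    statuses = api.search_tweets(q=keyword, count=count)
    return [status.text for status in statuses]
```

With real credentials, a call such as keyword_search(build_api(...), "#housingmarket", 100) would return a plain list of Tweet texts. Asking for more than 100 Tweets would require paging, for example with tweepy.Cursor.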
Dataframes were created containing the Tweets with "positive", "negative", and "neutral" sentiment. We also created a function that returns the number of Tweets in each dataframe.
The variable "tweet_list" contains a list of the most recent Tweets matching the parameters passed to the "keyword search".
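One way to split Tweets by sentiment is to threshold VADER's compound score. The sketch below assumes the commonly used cutoffs of +0.05 and -0.05 and accepts any analyzer object that exposes polarity_scores (for example NLTK's SentimentIntensityAnalyzer). The function names are illustrative, not the project's own.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(tweets, analyzer):
    """Return one row per Tweet: its text, compound score, and label."""
    rows = []
    for text in tweets:
        compound = analyzer.polarity_scores(text)["compound"]
        rows.append({
            "text": text,
            "compound": compound,
            "sentiment": label_sentiment(compound),
        })
    return rows


def sentiment_counts(rows):
    """Count how many Tweets fall under each sentiment label."""
    counts = Counter(row["sentiment"] for row in rows)
    return {label: counts.get(label, 0)
            for label in ("positive", "negative", "neutral")}
```

Passing the rows to pandas.DataFrame and filtering on the "sentiment" column would give the three per-sentiment dataframes.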
Stopwords were imported from nltk.corpus. We also wrote a for loop that iterates through each Tweet to find frequently mentioned words. These are typically adverbs, hashtags, or verbs that add little meaning to the analysis, for example: "a, #housingmarket, realestate, isn't." The goal in finding frequently mentioned words was to build a custom list of stopwords, so that we could surface the more "valuable" words that appear alongside a specific keyword.
We created a function that cleans Tweets and removes stopwords.
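The frequency pass can be sketched as follows. The tokenizing pattern and the name most_common_words are assumptions made for illustration; the project's own loop may differ.

```python
import re
from collections import Counter

# Keeps hashtags, mentions, and contractions such as "isn't" as single tokens.
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:'\w+)?")


def most_common_words(tweets, top_n=20):
    """Return the `top_n` most frequent lowercase tokens across all Tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(TOKEN_PATTERN.findall(tweet.lower()))
    return counts.most_common(top_n)
```

Words near the top of this ranking that carry no information of their own are the candidates for the custom stopword list.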
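A cleaning function along these lines might look like the sketch below. To stay runnable without downloading the NLTK corpus, it uses a small stand-in set where the project would use nltk.corpus.stopwords.words("english"). The regular expressions and the name clean_tweet are assumptions.

```python
import re

# Small stand-in for nltk.corpus.stopwords.words("english").
BASE_STOPWORDS = {"a", "an", "the", "is", "isn't", "and", "of", "to", "in"}
# Custom additions; these two examples come from the text above.
CUSTOM_STOPWORDS = {"#housingmarket", "realestate"}
STOPWORDS = BASE_STOPWORDS | CUSTOM_STOPWORDS

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN_PATTERN = re.compile(r"#?\w+(?:'\w+)?")


def clean_tweet(text, stopwords=STOPWORDS):
    """Lowercase a Tweet, strip URLs and mentions, and drop stopwords."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    tokens = TOKEN_PATTERN.findall(text)
    return " ".join(tok for tok in tokens if tok not in stopwords)
```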
After each Tweet has been cleaned and stripped of stopwords, a word cloud is generated from the words that appear most often. The goal of the word cloud is to see what people say when they mention a specific keyword.
A second word cloud was generated a week later to see whether the same words showed up.
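As a rough sketch, the word cloud step reduces to counting words and handing the counts to the third-party wordcloud package. The helper names and the image size below are illustrative choices, not the project's settings.

```python
from collections import Counter


def word_frequencies(cleaned_tweets):
    """Count how often each word appears across the cleaned Tweets."""
    counts = Counter()
    for tweet in cleaned_tweets:
        counts.update(tweet.split())
    return dict(counts)


def build_wordcloud(frequencies, width=800, height=400):
    """Render a word cloud in which more frequent words appear larger."""
    from wordcloud import WordCloud  # third-party; imported only when needed

    cloud = WordCloud(width=width, height=height, background_color="white")
    return cloud.generate_from_frequencies(frequencies)
```

The returned object can be saved with its to_file method or displayed with matplotlib's imshow.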
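A simple way to make the week-over-week comparison concrete is to intersect the top-ranked words of the two frequency tables. This is a hypothetical helper, not part of the original project, and the cutoff of 25 words is arbitrary.

```python
def shared_top_words(first_freqs, second_freqs, top_n=25):
    """Return the words that rank in the top `top_n` of both tables."""
    def top(freqs):
        ranked = sorted(freqs, key=freqs.get, reverse=True)
        return set(ranked[:top_n])

    return top(first_freqs) & top(second_freqs)
```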
Most of the data used is public data from Freddie Mac. The data contains fixed and adjustable mortgage rates for houses starting from 1971. It also contains 15-year and 30-year interest rates, as well as the profit margin that banks make on those loans. We then run various regression models to create predictions and understand trends.
The interest rate data was later merged with another dataframe containing the number of houses purchased in each region of the US, starting in the 1970s. Merging the two made the data easier to inspect and allowed us to fit a linear regression model.
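The merge can be sketched with pandas as below. The column names, the choice of "year" as the join key, and every value in the two small frames are made-up placeholders for illustration; they are not the project's data.

```python
import pandas as pd

# Placeholder values for illustration only.
rates = pd.DataFrame({
    "year": [1971, 1972, 1973],
    "rate_30yr": [7.0, 7.5, 8.0],
    "margin": [2.5, 2.6, 2.7],
})
sales = pd.DataFrame({
    "year": [1972, 1973, 1974],
    "houses_sold": [100, 110, 120],
})

# An inner join keeps only the years present in both dataframes.
merged = pd.merge(rates, sales, on="year", how="inner")
```

An inner join is one choice among several. Passing how="outer" would instead keep every year and leave missing values where one source has no entry.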
Using the merged dataframe described above, we ran a random forest regression to create a predictive model. A random forest regressor is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control overfitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (the default); otherwise the whole dataset is used to build each tree. X contained the interest rates and margin, and y contained the number of houses bought in the US. According to the results, purchases were more strongly related to margin than to mortgage interest rates.
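The sketch below shows how such a comparison can be read off a fitted scikit-learn model through its feature_importances_ attribute. The data is synthetic and deliberately constructed so that the target depends mostly on margin; it only illustrates the mechanics and says nothing about the real housing data. The hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data, NOT the project's data: the target is built to
# depend strongly on "margin" and only weakly on "rate".
rng = np.random.default_rng(0)
rate = rng.uniform(3.0, 18.0, size=300)
margin = rng.uniform(2.0, 3.0, size=300)
X = np.column_stack([rate, margin])
y = 500.0 * margin + 1.0 * rate + rng.normal(0.0, 5.0, size=300)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1; a larger share means the trees
# relied more on that feature when choosing splits.
importances = dict(zip(["rate", "margin"], model.feature_importances_))
```

Impurity-based importances can be biased toward features with many distinct values, so sklearn.inspection.permutation_importance is a useful cross-check.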
We also ran a linear regression and found an R-squared of about 0.48, meaning the model explains roughly 48% of the variation in houses bought.
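Assuming the 48% figure refers to the coefficient of determination (R-squared), the quantity can be computed as one minus the ratio of residual to total sum of squares. The sketch below uses a single predictor for brevity, whereas the project's regression used more than one; the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Share of the variance in `y_true` that `y_pred` accounts for."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect fit gives 1.0 and a model that always predicts the mean gives 0.0, so a value near 0.48 leaves about half of the variation unexplained.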
To understand the links between multiple economic indicators and their influence on mortgage rates, we used 8 datasets to create this model: inflation (CPI), changes in mortgage-backed securities prices, average wages, the Fed funds rate, number of houses sold, unemployment rates, and average adjustable-rate and fixed-rate mortgages.
Although sentiment may suggest that the US housing market is on the verge of a crash, the data says otherwise. With the Fed keeping interest rates extremely low, there is no reason to predict that prices will go down. Despite other economic indicators, including rising GDP, rising inflation, low unemployment, increased government spending, and rising wages, the Federal Reserve is intent on keeping interest rates low to keep both the stock and housing markets on the rise.