Demo video: https://www.youtube.com/watch?v=O5vtwq2URxQ
On Wall Street, the industry standard for financial data is the Bloomberg Terminal, which offers customers extensive, in-depth data. However, it is expensive and difficult to use: it requires a $24,000 annual subscription, a specialized keyboard, and considerable practice to master its keyboard commands and cluttered interface.
With this project, we wanted to bring the benefits of financial data to retail investors in a more familiar form: simply asking a question. Using a Bidirectional Attentive Memory Network (BAMnet, described below), we created SimpliFi, a tool that lets users ask for details about a company in natural language.
However, while implementing SimpliFi, we found that building a knowledge base was complicated and time-consuming, and that we had to build the parsing, training, and querying tools from scratch. We believe that both creating and using KBQA (knowledge base question answering) models should be easy. Since we already had the code we used to build SimpliFi's knowledge base, we also created a service that automatically builds and trains a BAMnet model from a single input file in a simplified format.
Both SimpliFi and the KBQA SaaS use a Bidirectional Attentive Memory Network to answer questions. In its original paper (Chen et al., 2019), this architecture significantly outperformed previous information-retrieval-based methods on the WebQuestions benchmark while remaining competitive with (hand-crafted) semantic-parsing-based methods.
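BAMnet itself (Chen et al., 2019) stacks two layers of bidirectional attention with several extra modules, so the sketch below is not the SimpliFi code. It only illustrates the generic idea of attending in both directions between an encoded question and encoded knowledge-base memory, assuming plain dot-product attention and mean pooling as simplifications:

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def bidirectional_attention(question: np.ndarray, memory: np.ndarray):
    """question: (n_words, d) encoded question; memory: (n_slots, d) encoded KB memory."""
    # Pairwise relevance between every question word and every memory slot
    scores = question @ memory.T                      # (n_words, n_slots)
    # Question-to-memory: each question word attends over the memory slots
    q2m = softmax(scores, axis=1) @ memory            # (n_words, d)
    # Memory-to-question: each memory slot attends over the question words
    m2q = softmax(scores, axis=0).T @ question        # (n_slots, d)
    # Score each memory slot (candidate answer) against a pooled, memory-aware question
    pooled_question = q2m.mean(axis=0)                # (d,)
    answer_scores = (memory + m2q) @ pooled_question  # (n_slots,)
    return answer_scores, q2m, m2q
```

The index of the best-scoring slot, `int(np.argmax(answer_scores))`, would point at the predicted answer in this toy setup.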
Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases
Prerequisites:
- pip
- Python 3.6+
- virtualenv
- npm
- wget (CLI tool for downloading large files)
- Docker
SimpliFi also requires one of the following:
- Train the model through the KBQA SaaS React frontend, or
- Download the built data and pretrained model (see Step 2 of running SimpliFi)
NOTE (9/21/20): I switched hosts for my website, and the pretrained model was lost in the transition. You can train your own model with the files in this repository. Otherwise, the demo video linked at the top of this README, recorded shortly after the project was completed, shows it in action.
KBQA SaaS
Input: A dataset in the specified format. One is provided in the root directory: result_spy.json, which contains data for the stocks in the $SPY ETF.
Output: A trained model that can be queried via the API endpoint. The same model is also used by SimpliFi. Example: localhost:5000/answer?question=what_is_the_revenue_of_$aapl_?
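The dataset format is not spelled out here, so a quick way to study it is to look at the provided result_spy.json. This Python sketch assumes only that the file is a single JSON document and makes no assumption about its schema; `summarize` and `summarize_json` are hypothetical helper names:

```python
import json
from typing import Any, Dict


def summarize(data: Any, max_keys: int = 5) -> Dict[str, Any]:
    """Describe the top-level shape of already-parsed JSON data."""
    summary: Dict[str, Any] = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["length"] = len(data)
    if isinstance(data, dict):
        summary["sample_keys"] = list(data)[:max_keys]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # For a list of records, show the fields of the first record
        summary["sample_keys"] = list(data[0])[:max_keys]
    return summary


def summarize_json(path: str, max_keys: int = 5) -> Dict[str, Any]:
    """Load a JSON file and summarize its top-level structure."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f), max_keys)
```

From the repository root, `print(summarize_json("result_spy.json"))` shows the top-level type and a few keys to model your own dataset on.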
Step 1. Clone the repository.
git clone https://github.com/AndrewAcomb/KBQA-SaaS-SimpliFi.git
cd KBQA-SaaS-SimpliFi
Step 2. Download the pretrained word vectors into kbqa-saas-flask/qa.
cd kbqa-saas-flask/qa && { wget --no-check-certificate -r 'https://drive.google.com/uc?id=1DVouJLo_K5cs4iVjNlkF5Ed9NlsP9G9G&export=download' -O glove.840B.300d.w2v.zip ; unzip glove.840B.300d.w2v.zip ; rm glove.840B.300d.w2v.zip ; cd - ; }
The download should take about 4 to 6 minutes.
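Judging by its name, the archive holds the GloVe 840B 300-dimensional vectors converted to word2vec format. Assuming the unzipped file is word2vec *text* format (not binary), and noting that its exact filename after unzipping is an assumption, a minimal reader for inspecting it could look like this (hypothetical helper names, not the project's loader):

```python
from typing import Dict, Iterable, List


def parse_word2vec_lines(lines: Iterable[str], limit: int = 0) -> Dict[str, List[float]]:
    """Parse word2vec text format: a 'count dim' header, then 'token v1 ... v_dim' per line."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors: Dict[str, List[float]] = {}
    for i, line in enumerate(it):
        if limit and i >= limit:
            break  # the full file is several GB, so allow reading only a prefix
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip().split(" ")
        # The last `dim` fields are the vector; anything before them is the token,
        # which tolerates the few GloVe tokens that contain spaces
        vectors[" ".join(parts[:-dim])] = [float(x) for x in parts[-dim:]]
    return vectors


def load_word2vec_text(path: str, limit: int = 0) -> Dict[str, List[float]]:
    with open(path, encoding="utf-8", errors="replace") as f:
        return parse_word2vec_lines(f, limit)
```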
Step 3. From the repository root, create a virtual environment and install the dependencies.
virtualenv venv && { source venv/bin/activate ; pip install --no-cache-dir -r requirements.txt ; python3 -m nltk.downloader stopwords ; }
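Before starting the server, you may want to confirm that the virtual environment has what the project needs. This hedged sketch checks the Python version, whether some modules are importable, and whether the NLTK stopwords corpus was downloaded. The module list is an assumption (Flask and NLTK are implied by this README; PyTorch is a guess based on the original BAMnet implementation), so compare it against requirements.txt:

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_environment(modules: Iterable[str], min_python=(3, 6)) -> Dict[str, bool]:
    """Report which top-level modules are importable in the active interpreter."""
    if sys.version_info < min_python:
        raise RuntimeError(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {sys.version.split()[0]}"
        )
    # find_spec returns None for top-level modules that are not installed
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def has_nltk_stopwords() -> bool:
    """True if NLTK is installed and its stopwords corpus has been downloaded."""
    try:
        import nltk
        nltk.data.find("corpora/stopwords")
        return True
    except (ImportError, LookupError):
        return False
```

With the virtual environment active, `check_environment(["flask", "nltk", "torch"])` returns a dict such as `{"flask": True, ...}`; any `False` entry points at a missing dependency.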
Step 4. With the virtual environment still active, start the Flask server.
cd kbqa-saas-flask
python3 app.py
Step 5. In a new terminal window, navigate back to the repository root and start the React frontend.
cd kbqa-saas-react && { npm install ; npm start ; }
Step 6. Go to http://localhost:3000/, click 'Select File', select the starter data, and click 'Start Upload'.
Starter data: result_spy.json in the repository root.
Adapting the word2vec embeddings to the data takes about 10 minutes, and training the model takes about 15 minutes.
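The README does not say exactly what the embedding step does. A common approach, and purely our assumption here, is to keep only the pretrained vectors for words in the dataset's vocabulary and randomly initialize the rest, so training never has to load the full 840B-token file. A minimal sketch with a hypothetical `build_embedding_matrix` helper:

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


def build_embedding_matrix(vocab: Sequence[str], vectors: Dict[str, List[float]],
                           dim: int, seed: int = 0) -> Tuple[np.ndarray, int]:
    """Row i holds the pretrained vector for vocab[i], or a small random vector if missing."""
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    hits = 0
    for i, word in enumerate(vocab):
        vec = vectors.get(word)
        if vec is None:
            vec = vectors.get(word.lower())  # fall back to a lowercase lookup
        if vec is not None:
            matrix[i] = vec
            hits += 1
    return matrix, hits
```

The `hits` count shows how much of the vocabulary the pretrained vectors cover; a low ratio would suggest tokenization mismatches between the data and the embeddings.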
Step 7. The model you just trained can now be queried through the API. Enter your question as a URL in the following format: localhost:5000/answer?question=what_is_the_revenue_of_$aapl_?
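To query the model from a script instead of the browser, a small client can build the same URL. Lowercasing and the underscore encoding are inferred from the example URL; the server may also accept ordinary percent-encoded spaces, and the response format is not documented here, so `ask` just returns the raw body. Both function names are hypothetical:

```python
import re
import urllib.request
from urllib.parse import quote


def to_query_url(question: str, base: str = "http://localhost:5000/answer") -> str:
    """Turn a natural-language question into the README's URL format."""
    # Lowercase, make '?' its own token, and join the tokens with underscores
    text = re.sub(r"\?", " ? ", question.lower())
    encoded = quote("_".join(text.split()), safe="$_?")
    return f"{base}?question={encoded}"


def ask(question: str, base: str = "http://localhost:5000/answer", timeout: float = 30.0) -> str:
    """Send the question to a running server and return the raw response body."""
    with urllib.request.urlopen(to_query_url(question, base), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `to_query_url("What is the revenue of $AAPL?")` reproduces the example URL above, and `ask(...)` sends it once the Flask server is running.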
SimpliFi
Input: A question containing a valid stock ticker (e.g., What is the number of employees $AAPL has?)
Output: An answer containing the requested detail about the company (revenue, industry, market cap, etc.)
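Because every question must contain a ticker, a client could check for one before sending the question. This is a hypothetical pre-check, not part of the repository; the ticker shape (a `$` followed by 1 to 5 letters, optionally with a class suffix such as `$BRK.B`) is an assumption, and whether a ticker is actually in the knowledge base still has to be checked against the data:

```python
import re
from typing import List

# '$' followed by 1-5 letters, optionally with a one-letter class suffix (assumed ticker shape)
TICKER_RE = re.compile(r"\$([A-Za-z]{1,5}(?:\.[A-Za-z])?)\b")


def extract_tickers(question: str) -> List[str]:
    """Return the $-prefixed tickers in a question, uppercased, in order of appearance."""
    return [match.upper() for match in TICKER_RE.findall(question)]
```

A question with no match, such as "What is Apple's revenue?", could be rejected with a prompt to add a ticker like $AAPL.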
Docker image: aca7964/simplifi on Docker Hub
Dockerfile: in the repository root
To use the prebuilt image, run the two commands below and then skip to Step 4.
docker pull aca7964/simplifi:initialcommit
docker run -p 5000:5000 aca7964/simplifi:initialcommit
Step 1. Clone the repository.
git clone https://github.com/AndrewAcomb/KBQA-SaaS-SimpliFi.git
cd KBQA-SaaS-SimpliFi
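Step 2. Download the built data and the pretrained model. (Per the 9/21/20 note above, the pretrained model is no longer hosted.)
cd kbqa-saas-flask && { curl -O http://andrewacomb.me/data.zip ; unzip data.zip ; rm data.zip ; cd models ; curl -O http://andrewacomb.me/bamnet.md ; cd ../.. ; }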
Step 3. Build the Docker image and run it in the background.
docker build -t simplifi .
docker run -d -p 5000:5000 simplifi
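The container may need a moment before it accepts connections. A small polling helper (hypothetical, Python standard library only) can wait for port 5000 before you open the browser in Step 4:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:5000/", timeout: float = 120.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except OSError:
            time.sleep(interval)  # not listening yet (URLError is an OSError); retry
    return False
```

`wait_for_server()` returns `True` once the container responds and `False` if it never comes up within two minutes, in which case `docker logs` on the container is the next place to look.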
Step 4. Open your browser and go to http://localhost:5000/
Type your query into the search bar. Make sure to include the stock ticker of a public company, such as $FB (Facebook) or $XOM (ExxonMobil).
Yu Chen, Lingfei Wu, and Mohammed J. Zaki. "Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases." In Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), June 2019.