Visualizing Crisis News Briefs

The goal of this project was to take texts containing summaries of important news events around the world ("Crisis News Briefs") and extract meaningful data to reveal spatial, temporal and categorical trends. Working with our client, UNICEF, we found that it was important to show broad trends while still keeping the most granular parts of the data accessible to users. We processed the raw text provided by the client with natural language processing tools, including Latent Dirichlet Allocation (LDA), Named Entity Recognition (NER), text parsing tools and k-means clustering. Briefs are sent out daily, which provided reliable date information for temporal trends. Geospatial data was extracted from the texts and mapped at the country level, unless a document specified that a news story was regional or global in scope.
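
As a rough illustration of the country-level mapping rule described above, the sketch below tags a brief as global, regional, or tied to the countries it mentions. The keyword lists, the `assign_scope` function and the sample sentences are hypothetical and are not taken from the project's scripts, which rely on NER and other tools rather than plain keyword matching.

```python
import re

# Hypothetical lookup tables for illustration only; a real pipeline
# would use a complete gazetteer and NER output instead.
COUNTRIES = ["Nepal", "Syria", "Yemen", "South Sudan"]
REGIONAL_MARKERS = ["regional", "west africa", "middle east"]
GLOBAL_MARKERS = ["global", "worldwide"]


def _mentions(text, terms):
    """Return the terms that occur in text as whole words (case-insensitive)."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


def assign_scope(brief_text):
    """Return (scope, countries), where scope is 'global', 'regional',
    'country' or 'unknown'. Global and regional markers take precedence
    over individual country mentions."""
    if _mentions(brief_text, GLOBAL_MARKERS):
        return ("global", [])
    if _mentions(brief_text, REGIONAL_MARKERS):
        return ("regional", [])
    countries = _mentions(brief_text, COUNTRIES)
    if countries:
        return ("country", countries)
    return ("unknown", [])
```

Only briefs that come back with the `country` scope would be placed on a country-level map; the others would be handled separately.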

Link to working demo

http://adysevy.github.io/unicef/WebApp/

Our full GitHub repository, including the original data and the scripts to parse text files, generate entities, etc., is here.

A live demo can also be found through this repository here.

Run visualization locally

Clone the repository
Navigate to unicef/WebApp
Run a local web server: python -m SimpleHTTPServer 8888 & (this is the Python 2 module; on Python 3 the equivalent is python3 -m http.server 8888 &)
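
If you prefer to start the server from a script, the steps above can be approximated with the Python 3 standard library. This is a minimal sketch, not part of the repository: the `make_server` and `serve` helpers are hypothetical names, and it assumes Python 3.7 or later (for the `directory` argument of `SimpleHTTPRequestHandler`).

```python
import functools
import http.server
import socketserver


def make_server(directory=".", port=8888):
    """Build a TCP server that serves static files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, since this is meant for local viewing.
    return socketserver.TCPServer(("127.0.0.1", port), handler)


def serve(directory=".", port=8888):
    """Serve `directory` until interrupted (Ctrl+C)."""
    with make_server(directory, port) as httpd:
        print(f"Serving {directory} at http://127.0.0.1:{port}/")
        httpd.serve_forever()
```

Calling `serve()` from inside unicef/WebApp and opening http://127.0.0.1:8888/ should behave like the command above.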

Data Preprocessing

Running the preprocessing scripts is not necessary for running the app, but for reproducibility we outline the stages below. The data preprocessing consists of several Python scripts that should be run from the "Preproccessing" folder:

Parsing the data, assigning categories and getting geo location: create_map_categories_df.py
Extracting entities: extract_ner.py
Normalizing entities: normalize_entities.py
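
The three stages above can be chained with a small driver script. The sketch below is not part of the repository: it assumes the scripts take no command-line arguments, that they must run in the order listed, and that `python` on your PATH is the interpreter they were written for. `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import subprocess

# Script names and order as listed above.
PIPELINE = [
    "create_map_categories_df.py",  # parse data, assign categories, geolocate
    "extract_ner.py",               # extract entities
    "normalize_entities.py",        # normalize entities
]


def build_commands(python="python"):
    """Return one command (as an argument list) per pipeline stage."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(workdir="Preproccessing", python="python"):
    """Run each stage in order from `workdir`, stopping at the first failure."""
    for command in build_commands(python):
        print("Running:", " ".join(command))
        subprocess.run(command, cwd=workdir, check=True)
```

Because `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code, a failing stage prevents later stages from running on incomplete output.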

In order to run those scripts, some environment configurations are required:

Downloading Packages

    easy_install python-docx
    sudo pip install -U nltk
    easy_install -U gensim

Configure Packages

python-docx: edit the file according to python-openxml/python-docx#85
NLTK: download the required NLTK data:

    import nltk
    nltk.download() #download stop words from Corpora and punkt from Models
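
The interactive downloader above can also be bypassed by requesting the two resources by name. This is a sketch under the assumption that the "stop words" corpus and the "punkt" model correspond to the NLTK package identifiers `stopwords` and `punkt`; `download_nltk_resources` is a hypothetical helper, not part of the repository.

```python
# Package identifiers assumed to match the resources mentioned above.
REQUIRED_NLTK_RESOURCES = ["stopwords", "punkt"]


def download_nltk_resources(resources=REQUIRED_NLTK_RESOURCES):
    """Download each resource and return {name: True/False} for success."""
    import nltk  # imported lazily so this module loads even without NLTK

    return {name: nltk.download(name, quiet=True) for name in resources}
```

Running `download_nltk_resources()` once after installing NLTK should be enough; a `False` value in the result marks a resource that could not be fetched.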

Repository: NYU-CS6313-Projects/sp2015-group9
