Click me to read the NYC SafeGo Application article.
Click me for the presentation slides.
NAME | RAW DATA LOCATION ON DUMBO |
---|---|
crime | `hdfs:///user/yl6127/project/NYPD_Complaint_Data_Current.csv`<br>`hdfs:///user/yl6127/project/NYPD_Complaint_Data_Historic.csv` |
311 | `hdfs:///user/jl8456/BDAD/project/311_data.csv` |
Weather | `hdfs:///user/cy1505/proj/noaa_nyc_weather_2010-2019.csv` |
NYC Street Centerline (CSCL) | `hdfs:///user/cy1505/proj/Centerline.csv` |
```
├── data ingest
│   ├── crime_data_Ingest.txt
│   ├── 311_data_Ingest.txt
│   ├── weather_data_Ingest.txt
│   └── map_data_Injest.txt
├── profiling_code
│   ├── 311
│   │   └── 311_profile.scala
│   ├── NYPD_Complaint
│   │   └── crime_profile.scala
│   ├── weather
│   │   └── weather_profile.scala
│   └── map
│       └── map_profile.scala
├── etl_code
│   ├── 311
│   │   └── 311_etl.scala
│   ├── NYPD_Complaint
│   │   └── crime_etl.scala
│   ├── weather
│   │   └── weather_etl.scala
│   └── map
│       └── map_etl.scala
├── app_code
│   ├── crime_analysis.scala
│   ├── machine_learning
│   ├── backend
│   └── frontend
├── screenshots
└── Readme
```
NAME | DESCRIPTION |
---|---|
data ingest | Upload the datasets from the local file system to the Dumbo cluster, then ingest them into HDFS |
profiling_code | Profile the data using the Spark shell |
etl_code | Clean the data using the Spark shell |
app_code | Conduct data analytics using Spark |
- data ingest: Use the `scp` command to upload the datasets from the local file system to the Dumbo cluster. On Dumbo, run `hdfs dfs -put <file> /user/yourNetID/project` to ingest the data into HDFS.
- profiling_code: Use Spark to profile the four datasets. (A minimal illustrative sketch follows the command list below.)
  - Compile and run the code for the 311 data:
    - `spark2-shell -i 311_profile.scala`
  - Compile and run the code for the NYPD Complaint data:
    - `spark2-shell -i crime_profile.scala`
  - Compile and run the code for the Weather data:
    - `spark2-shell -i weather_profile.scala`
  - Compile and run the code for the Map data:
    - `spark2-shell -i map_profile.scala`
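For reference, a profiling script run this way typically contains a handful of Spark actions. The sketch below is illustrative only, not the repo's actual `crime_profile.scala`; the input path comes from the data table above, and the checks are generic:

```scala
// Minimal profiling sketch for spark2-shell (illustrative, not the actual repo script).
import org.apache.spark.sql.functions._

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///user/yl6127/project/NYPD_Complaint_Data_Current.csv")

df.printSchema()                       // column names and inferred types
println(s"row count: ${df.count()}")   // dataset size
df.describe().show()                   // min/max/mean/stddev for numeric columns

// Null counts per column, a common data-quality check.
df.select(df.columns.map(c => sum(col(c).isNull.cast("int")).alias(c)): _*).show()
```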
- etl_code: Use Spark to clean the four datasets. (An illustrative sketch follows the command list below.)
  - Compile and run the code for the 311 data:
    - `spark2-shell -i 311_etl.scala`
  - Compile and run the code for the NYPD Complaint data:
    - `spark2-shell -i crime_etl.scala`
  - Compile and run the code for the Weather data:
    - `spark2-shell -i weather_etl.scala`
  - Compile and run the code for the Map data:
    - `spark2-shell -i map_etl.scala`
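As a rough illustration of the cleaning step, an ETL script might look like the following. The column names, date format, and output path are assumptions for the sketch, not the repo's actual schema (see `311_etl.scala` for the real logic):

```scala
// Illustrative ETL sketch (assumed columns and output path, not the repo's actual code).
import org.apache.spark.sql.functions._

val raw = spark.read.option("header", "true")
  .csv("hdfs:///user/jl8456/BDAD/project/311_data.csv")

val cleaned = raw
  .na.drop(Seq("Latitude", "Longitude"))                           // drop rows missing coordinates
  .withColumn("created", to_date(col("Created Date"), "MM/dd/yyyy hh:mm:ss a"))
  .filter(year(col("created")) >= 2010)                            // keep the study period only
  .dropDuplicates("Unique Key")                                    // de-duplicate complaints

cleaned.write.mode("overwrite")
  .parquet("hdfs:///user/jl8456/BDAD/project/311_cleaned.parquet") // hypothetical output path
```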
- app_code:
  - crime_analysis.scala:
    - `spark2-shell -i crime_analysis.scala`
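The analytics step ultimately feeds the per-street `crimeIndex` used by the backend (described below). As a hedged sketch of the kind of aggregation such a script might perform — the input path, the `sid` tagging, and the normalization are assumptions, not the repo's code:

```scala
// Illustrative per-street aggregation (assumed schema: one row per complaint,
// already tagged with the nearest street segment id "sid").
import org.apache.spark.sql.functions._

val crimes = spark.read.parquet("hdfs:///user/yl6127/project/crime_cleaned.parquet") // hypothetical path

// A simple crime index: complaint count per street, normalized to [0, 1].
val counts = crimes.groupBy("sid").agg(count("*").alias("n"))
val maxN   = counts.agg(max("n")).first().getLong(0)
val index  = counts.withColumn("crimeIndex", col("n") / lit(maxN.toDouble))

index.orderBy(desc("crimeIndex")).show(10)   // ten most crime-heavy segments
```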
- backend
  - The backend server is implemented using Flask.
  - app.py: our backend application, which implements the Safe Route API.
    - API inputs:
      - start point latitude
      - start point longitude
      - end point latitude
      - end point longitude
    - API output:
      - A JSON object with two fields describing the safest route and the shortest route between the start point and the end point. One field is named `safest` and the other is named `shortest`. Each field is an array of street objects (see more about the street object below):
        `{ "safest": [...], ... "shortest": [...], ... }`
  - Both routes are computed using Dijkstra's SSSP (Single Source Shortest Path) algorithm; a minimal sketch of the idea follows the field list below.
  - The street object returned by the API contains the following fields:
    - `sid`: each street has a unique numeric id; this field is unique.
    - `streetLen`: a number denoting the physical length of the street.
    - `crimeIndex`: a number denoting how safe the street is.
    - `Polygons`: an array of latitude/longitude numbers representing the vertices of the street.
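Both route types can be served by one Dijkstra routine parameterized by an edge-weight function over the fields above. The sketch below is illustrative only: the actual implementation lives in `app.py` (Python/Flask), and the weight formulas — in particular `streetLen * (1 + crimeIndex)` for the safest route — are assumptions, not the repo's code:

```scala
// Dijkstra over the street graph (illustrative sketch; the real backend is app.py).
// Assumed weights: shortest = streetLen; safest = streetLen * (1 + crimeIndex).
import scala.collection.mutable

case class Edge(to: Int, streetLen: Double, crimeIndex: Double)

def dijkstra(graph: Map[Int, Seq[Edge]], source: Int,
             weight: Edge => Double): Map[Int, Double] = {
  val dist = mutable.Map[Int, Double]().withDefaultValue(Double.PositiveInfinity)
  val pq   = mutable.PriorityQueue.empty[(Double, Int)](
               Ordering.by[(Double, Int), Double](_._1).reverse)  // min-heap on distance
  dist(source) = 0.0
  pq.enqueue((0.0, source))
  while (pq.nonEmpty) {
    val (d, u) = pq.dequeue()
    if (d <= dist(u)) {                                           // skip stale queue entries
      for (e <- graph.getOrElse(u, Nil)) {
        val nd = d + weight(e)
        if (nd < dist(e.to)) { dist(e.to) = nd; pq.enqueue((nd, e.to)) }
      }
    }
  }
  dist.toMap
}

// The two route types differ only in the edge-weight function.
val shortestWeight = (e: Edge) => e.streetLen
val safestWeight   = (e: Edge) => e.streetLen * (1.0 + e.crimeIndex)
```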
  - How to run this application:
    - We have deployed this application online. Simply visit the deployed website of this application.
    - Or, if you prefer to run only the backend locally, you can start the Flask backend with the following command:
      - `python3 app.py`
  - How to extract the results:
    - The backend API takes `GET` requests on the following path: `/index?startLat=[a]&startLng=[b]&endLat=[c]&endLng=[d]`
    - Replace `[a]` with the start point latitude, `[b]` with the start point longitude, `[c]` with the end point latitude, and `[d]` with the end point longitude.
    - Once the parameters are passed correctly, the user receives a 200 status code and the result is sent back in the JSON form described above; otherwise the server returns a 400 status code with a `Bad Request` body. (An example call follows this list.)
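For example, the API can be queried like this; the host/port (Flask's default is 5000) and the coordinates are placeholders, not real endpoints or data:

```scala
// Hypothetical example request (placeholder host and coordinates).
val url = "http://localhost:5000/index?startLat=40.6943&startLng=-73.9865&endLat=40.6929&endLng=-73.9895"
val body = scala.io.Source.fromURL(url).mkString  // JSON with "safest" and "shortest" fields
println(body)
```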
- frontend
  - The webpage frontend of the overall application.
  - `src/components/GoogleMapLoader.vue`: a component that uses the Google Maps API.
  - `src/components/Search.vue`: a search component that enables users to type in a start point and a destination.
  - `src/components/Route.vue`: displays the route sent back by the backend.
  - If you want to run the frontend locally and separately, use the following commands:
    - `npm install`
    - `npm run serve`
  - If you want to build the frontend for distributed deployment, use the following commands:
    - `npm install`
    - `npm run build`
- machine_learning
  - The machine learning portion is implemented using Spark MLlib. The model is trained on Dumbo, then downloaded and integrated into the application. (An illustrative training sketch follows this list.)
  - To run the machine learning code independently, use the following commands (on Dumbo):
    - Under the project root, use `sbt package` to build the `*.jar` files.
    - To train the crime model: `spark2-shell -i train.scala`
    - To run inference in the shell: `spark2-shell -i inference.scala`
    - To run inference as a Spark job: `spark2-submit --class inference --master yarn --deploy-mode client inference.jar`
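For orientation, MLlib training of a crime model can look like the sketch below; the features, label, model type, and paths are all assumptions for illustration and not necessarily what `train.scala` does:

```scala
// Illustrative MLlib training sketch (assumed features, model, and paths; see train.scala).
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.RandomForestRegressor

val data = spark.read.parquet("hdfs:///user/cy1505/proj/training_data.parquet") // hypothetical path

val assembler = new VectorAssembler()
  .setInputCols(Array("hour", "dayOfWeek", "temperature", "precipitation"))    // assumed features
  .setOutputCol("features")

val rf = new RandomForestRegressor()
  .setLabelCol("crimeCount")                                                   // assumed label
  .setFeaturesCol("features")

val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)
val model = new Pipeline().setStages(Array(assembler, rf)).fit(train)

val rmse = new RegressionEvaluator()
  .setLabelCol("crimeCount").setMetricName("rmse")
  .evaluate(model.transform(test))
println(s"test RMSE: $rmse")

model.write.overwrite().save("hdfs:///user/cy1505/proj/crime_model")           // hypothetical path
```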
- Yue Luo: 1/3 analytics + website frontend
- Jiaqi Liu: 1/3 analytics + backend
- Cong Yu: 1/3 analytics + Machine Learning