tutorial/setup_project.md
+169-26
@@ -1,14 +1,29 @@
1
1
Set up a project using MLV-tools
2
2
===============================
3
3
4
-
**GOAL**: TODO
4
+
5
+
The aim of this tutorial is to understand how to set up a Machine Learning project
6
+
development environment using MLV-tools. It explains how to:
7
+
8
+
- Generate Python 3 scripts and a DVC pipeline from Jupyter Notebooks
9
+
- Re-use pipeline steps with different I/O and parameters
10
+
- Create an experiment using git branches
11
+
- Re-run a pipeline with input changes
12
+
13
+
5
14
6
15
Project example
7
16
----------------
8
17
9
18
This tutorial is based on a text classification pipeline.
10
19
11
-
**Dataset:** a set of labeled reviews from TripAdvisor (TODO ref) (review + rating)
20
+
**Dataset:** a set of labeled reviews from TripAdvisor.
21
+
22
+
> This dataset is a cleaned extract (2) of the publicly available TripAdvisor dataset (1).
23
+
24
+
> (1) Wang, H., Lu, Y., Zhai, C.: Latent aspect rating analysis on review text data: A rating regression approach. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010). pp. 783–792. Washington, US (2010)
25
+
26
+
> (2) Marcheggiani, D., Täckström, O., Esuli, A., Sebastiani, F.: Hierarchical Multi-Label Conditional Random Fields for Aspect-Oriented Opinion Mining. In: Proceedings of the 36th European Conference on Information Retrieval (ECIR 2014).
12
27
13
28
Each review has an associated star rating from 1 to 5. We treat these values as categorical and tackle the problem as
14
29
a classification problem given the small number of labels.
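The choice to treat star ratings as categorical labels can be sketched as follows. This is only an illustration and not code from the tutorial: the `rating_to_label` helper, the `LABELS` tuple and the sample reviews are all hypothetical, and the actual pipeline may encode its labels differently.

```python
from collections import Counter

# The five possible star ratings, treated as unordered class labels
# rather than as numeric values (an assumption made for illustration).
LABELS = ("1", "2", "3", "4", "5")


def rating_to_label(rating: int) -> str:
    """Map a star rating (1 to 5) to a categorical class label."""
    label = str(rating)
    if label not in LABELS:
        raise ValueError(f"unexpected rating: {rating!r}")
    return label


# Hypothetical (review text, star rating) pairs, not taken from the dataset.
sample_reviews = [
    ("Great location and friendly staff", 5),
    ("Room was dirty and noisy", 1),
    ("Average stay, nothing special", 3),
    ("Lovely breakfast, would come back", 5),
]

labels = [rating_to_label(rating) for _, rating in sample_reviews]
label_counts = Counter(labels)  # class distribution over the sample
```

With only five possible classes, checking the class distribution (here `label_counts`) is a cheap way to spot imbalance before training a classifier.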
@@ -45,9 +60,6 @@ Create a workdir and copy resources:
45
60
git add .
46
61
git commit -m 'Project Initialization'
47
62
48
-
dvc init
49
-
git commit -m 'Project DVC Initialization'
50
-
51
63
52
64
**Project structure:**
53
65
@@ -75,19 +87,31 @@ Setup the environment
75
87
76
88
77
89
cd ..
78
-
virtualenv venv -p /usr/bin/python3
90
+
virtualenv venv -p /usr/bin/python3.6  # in the provided Docker image: /usr/local/bin/python3.6
79
91
. ./venv/bin/activate
80
92
81
93
- Or a conda env
82
94
95
+
83
96
cd ..
84
97
conda create -n venv python=3 pip
85
98
conda activate venv
86
99
87
100
Install dependencies:
88
101
89
-
make -C project setup
90
102
cd ./project
103
+
make setup
104
+
105
+
106
+
Initialize DVC
107
+
---------------
108
+
109
+
In `sandbox/project`, run:
110
+
111
+
dvc init
112
+
git commit -m 'Project DVC Initialization'
113
+
114
+
91
115
92
116
Step 1: create the project configuration
93
117
----------------------------------------
@@ -280,6 +304,16 @@ Perform the **Step 3** for all remaining notebooks.
280
304
MLV-tools notebooks are available in: `./resources/setup_project/solution/mlvtools`