Fixing Release 003

mzattera · mzattera · commit 6edc4fb5339e · 2021-09-09T15:03:07.000+02:00
diff --git a/README.md b/README.md
@@ -145,9 +145,9 @@ The class can build a BoW where dimensions can be (see `BagOfWordsMode`):
 Notice this class is `Clusterable`, thus can be used with the Apache clustering API where subclasses of `Clusterer<T extends Clusterable>`
 are used to cluster set of `Clusterable` instances.
 
-#### K-Means Clustering
+#### K-Means Clustering - `io.github.mattera.v4j.util.clustering`
 
-Below an example of how BoW insances can be clustered:
+Below an example of how BoW instances can be clustered:
 
 ```Java
 // Distance measure for clustering
@@ -191,10 +191,6 @@ clusters.get(i).getPoints();
 ...
 ```
 
-#### K-Means Clustering
-
-TODO
-
 
 ### Useful Stuff - `io.github.mattera.v4j.util`
 
diff --git a/docs/003/index.md b/docs/003/index.md
@@ -232,7 +232,7 @@ Courier's languages reflect language differences in the underlying "clear" text.
 these differences should be kept in mind when performing statistical analysis of the text or when trying it decipherment.
 
   For this reason v4j library provides means to classify pages accordingly to above considerations, the resulting clusters are shown below
-  (also refer to[`PageHeader`]() class.
+  (also refer to the [`PageHeader`](https://github.com/mzattera/v4j/blob/v.3.0.0/eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/ivtff/PageHeader.java) class).
  
   ![Cluster size in words](images/Clusters.PNG)
 
@@ -243,17 +243,19 @@ these differences should be kept in mind when performing statistical analysis of
 
 <a id="Note1">**{1}**</a> See [v4j README](https://github.com/mzattera/v4j#alphabet).
 
-<a id="Note2">**{2}**</a> The class [`OutlierDetection`]() is used to calculate average distance of each page from other
-pages in the text. The output of the class (`PageEmbeddingDistance.xlsx`) can be found in the [analysis folder]().
+<a id="Note2">**{2}**</a> The class
+[`OutlierDetection`](https://github.com/mzattera/v4j/blob/v.3.0.0/eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mzattera/v4j/applications/clustering/OutlierDetection.java)
+is used to calculate the average distance of each page from the other pages in the text. The output of the class (`PageEmbeddingDistance.xlsx`) can be found in the
+[analysis folder](https://github.com/mzattera/v4j/tree/master/resources/analysis).
 
 <a id="Note3">**{3}**</a> The class [`BuildBoW`]() can be used to generate data suitable for visualization that can
 be uploaded to the TensorFlow projector. The output of this class, in the form of a "vector" and "metadata" .TSV files,
-can be found in [this folder]() both for single pages or entire parchments.
+can be found in [this folder](https://github.com/mzattera/v4j/tree/master/docs/003/data), both for single pages and for entire parchments.
 
 <a id="Note4">**{4}**</a> Class `KMeansClusterByWords`
 does the K-Means clustering and prints out a report that can be easily converted in an Excel file.
 The class can be parameterized to run different types of experiments; its outputs, with some additional data,
-can be found as Excel files in the [analysis folder]().
+can be found as Excel files in the [analysis folder](https://github.com/mzattera/v4j/tree/master/resources/analysis).
 Keep in mind K-Means algorithm include some randomness, therefore slightly different clustering might result at each experiment.
  
 ---
diff --git a/docs/index.md b/docs/index.md
@@ -1,4 +1,4 @@
-## Welcome to GitHub Pages
+## Welcome
 
 Hi, in these pages I store thoughts, working notes, rants and frustrations about the [Voynich manuscript](https://en.wikipedia.org/wiki/Voynich_manuscript)
 as resulting from my work with the [v4j library](https://github.com/mzattera/v4j).
@@ -7,20 +7,20 @@ as resulting from my work with the [v4j library](https://github.com/mzattera/v4j
 
 In the below notes, we will try to be consistent with following terminology.
 
-- A **token** in the Voynich is a single sequence of characters, separated by spaces. A **term** represent the set of identical tokens.
+- A **token** in a text is a single sequence of characters, separated by spaces. A **term** represents a set of identical tokens.
 In other terms, a token is an instance of a term. For example the below line in the Voynich text:
 
   ```
   <f1r.15,+P0;m> daiin shckhey ckhor chor shey kol chol chol kor chol
   ```
   
   Contains 10 tokens ("daiin", "shckhey", "ckhor", "chor", "shey", "kol", "chol", "chol", "kor", "chol") which are instances of 
-  8 terms ("daiin", "shckhey", "ckhor", "chor", "shey", "kol", "chol" "kor").
+  8 terms ("daiin", "shckhey", "ckhor", "chor", "shey", "kol", "chol", "kor").
   
   When the distinction is not relevant, I might loosely use "word" (often in quotes) to refer to either tokens or terms. 
 
-- Terms "transcription" and "transliteration" are used more or less interchangeably, though the latter is more correct.
-In both case we refer either to the process of capturing a text (typically the Voynich) in a file or to the outcome of such process.
+- The terms "**transcription**" and "**transliteration**" are used more or less interchangeably, though the latter is more correct.
+In both cases, we refer either to the process of capturing a text (typically the Voynich) in a file or to the outcome of such a process.
 
 ### Working Notes
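
Review note on the `docs/index.md` hunk: the token/term distinction it defines can be checked with a few lines of Java. The sketch below is only an illustration and does not use the v4j API; the class `TokensAndTerms` and its methods are hypothetical names. It also assumes the locus indicator (`<f1r.15,+P0;m>`) has already been stripped from the line. Tokens are the whitespace-separated character sequences, and terms are the distinct tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of v4j: illustrates tokens vs. terms.
public class TokensAndTerms {

    /** Splits a line of text into tokens, i.e. whitespace-separated character sequences. */
    public static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // no tokens in an empty line
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Returns the terms, i.e. the distinct tokens, in order of first appearance. */
    public static Set<String> terms(List<String> tokens) {
        return new LinkedHashSet<>(tokens);
    }

    public static void main(String[] args) {
        // Text of line f1r.15 as quoted in the docs, without the locus indicator (assumption).
        String line = "daiin shckhey ckhor chor shey kol chol chol kor chol";

        List<String> tokens = tokens(line);
        Set<String> terms = terms(tokens);

        System.out.println(tokens.size() + " tokens, " + terms.size() + " terms");
        System.out.println(terms);
    }
}
```

On the example line this reports 10 tokens and 8 terms, matching the counts given in the hunk: "chol" occurs three times as a token but counts once as a term.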