Commit 1ff6594

Finalising release 3.0
1 parent 24477f6 commit 1ff6594

7 files changed (24 additions, 76 deletions)


README.md

Lines changed: 15 additions & 3 deletions
@@ -34,7 +34,7 @@ punctuation) one of them is chosen as the default "space" character (returned by
 
 Special characters also include "unreadable" characters that are used (e.g. in the EVA alphabet) to mark illegible characters in the original text.
 
-The `Alphabet` class is abstract; to provide an actual implementation simply extend this class and provide methods that
+The `Alphabet` class is abstract; to provide an actual implementation, simply extend this class and provide methods that
 list characters accordingly to their category.
 
 The `Alphabet` class provides some static fields to access already defined alphabets:
@@ -98,13 +98,25 @@ which contains IVTFF metadata for the line, namely the locus identifier and the
 copies of the same line exists with different transcribers.
 
 In addition to inherited methods `filterElements()` and `splitElements()`, the methods `filterPages()`, `filterLines()`, `splitPages()`, and `splitLines()`
-can be used to create IVTFF documents by filtering and/or splitting content of an existing document. Again, please refer to JavaDoc fro more details.
+can be used to create IVTFF documents by filtering and/or splitting content of an existing document. Again, please refer to JavaDoc for more details.
+Also notice that, based on [working note 003](https://mzattera.github.io/v4j/003/), `PageHeader` exposes a cluster for each page in the manuscript;
+this information can be used to filter or split the manuscripts into clusters.
 
 ```Java
-/* Get all biological pages (MAJORITY transcription) */
+/* Get a document containing all and only biological pages (MAJORITY transcription) */
 
 IvtffText doc = VoynichFactory.getDocument(TranscriptionType.MAJORITY);
 doc = doc.filterPages(new PageFilter.Builder().illustrationType("B").build());
+
+/*
+Split the manuscript into clusters (see https://mzattera.github.io/v4j/003/)
+
+clusterMap will match any cluster name (see PageHeader.CLUSTERS) with a IvfttText
+with pages in that cluster.
+*/
+
+IvtffText = VoynichFactory.getDocument(TranscriptionType.CONCORDANCE);
+Map<String, IvtffText> clusterMap = doc.splitPages(new PageSplitter.Builder().byCluster().build());
 ```
 
 ### Other (Regular) Texts - `io.github.mattera.v4j.text.txt`
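
The README describes the extension pattern only in prose: subclass the abstract `Alphabet` and list the characters belonging to each category. The sketch below shows that pattern in a self-contained form so it compiles without v4j. `SimpleAlphabet`, `ToyAlphabet`, `getRegularChars()`, `isRegular()` and `isUnreadable()` are hypothetical names chosen for illustration; only `getCodeString()` and `getUnreadableChars()` mirror methods visible in this diff, and the default "space" character mentioned in the README is not modelled here.

```java
/**
 * Hypothetical, simplified stand-in for an abstract alphabet (NOT the v4j Alphabet class).
 * Concrete subclasses only list which characters belong to each category.
 */
abstract class SimpleAlphabet {

    /** Short code identifying the alphabet. */
    public abstract String getCodeString();

    /** Hypothetical category: characters that can appear inside words. */
    public abstract char[] getRegularChars();

    /** Characters used to mark illegible glyphs in the original text. */
    public abstract char[] getUnreadableChars();

    /** Returns true if c is listed as a regular character by the subclass. */
    public boolean isRegular(char c) {
        return contains(getRegularChars(), c);
    }

    /** Returns true if c is listed as an unreadable character by the subclass. */
    public boolean isUnreadable(char c) {
        return contains(getUnreadableChars(), c);
    }

    private static boolean contains(char[] chars, char c) {
        for (char x : chars) {
            if (x == c) {
                return true;
            }
        }
        return false;
    }
}

/** Toy alphabet (hypothetical): 'a' to 'e' are regular, '?' marks an unreadable glyph. */
final class ToyAlphabet extends SimpleAlphabet {

    private static final char[] REGULAR = { 'a', 'b', 'c', 'd', 'e' };
    private static final char[] UNREADABLE = { '?' };

    @Override
    public String getCodeString() {
        return "TOY";
    }

    @Override
    public char[] getRegularChars() {
        return REGULAR.clone(); // defensive copy so callers cannot alter the alphabet
    }

    @Override
    public char[] getUnreadableChars() {
        return UNREADABLE.clone();
    }
}
```

Keeping the membership checks in the abstract class means each concrete alphabet only has to supply its character lists.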

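The new README text says `PageHeader` exposes a cluster for each page, so a document can be filtered by illustration type and split into one document per cluster. Note that the committed snippet's second assignment (`IvtffText = VoynichFactory.getDocument(...)`) has no variable name and would not compile as written; assigning to `doc` appears to be the intent. Since the v4j calls cannot be reproduced here, the sketch below mimics the same filter-then-split logic on plain Java 8 collections; `Page`, `ClusterSketch` and the cluster labels used in testing are hypothetical stand-ins, not v4j types or data from note 003.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical page carrying only the header fields used here (NOT the v4j PageHeader). */
final class Page {
    private final String id;
    private final String illustrationType;
    private final String cluster;

    Page(String id, String illustrationType, String cluster) {
        this.id = id;
        this.illustrationType = illustrationType;
        this.cluster = cluster;
    }

    String getId() { return id; }
    String getIllustrationType() { return illustrationType; }
    String getCluster() { return cluster; }
}

/** Filter-then-split logic analogous to filterPages() / splitPages(), on plain lists. */
final class ClusterSketch {

    /** Keeps only pages with the given illustration type, preserving page order. */
    static List<Page> filterByIllustrationType(List<Page> pages, String type) {
        return pages.stream()
                .filter(p -> p.getIllustrationType().equals(type))
                .collect(Collectors.toList());
    }

    /** Maps each cluster name to its pages; keys are sorted, pages keep their original order. */
    static Map<String, List<Page>> splitByCluster(List<Page> pages) {
        return pages.stream()
                .collect(Collectors.groupingBy(Page::getCluster, TreeMap::new, Collectors.toList()));
    }
}
```

Using a `TreeMap` as the map factory keeps cluster names sorted, which makes the split deterministic; a `HashMap` would work equally well if ordering does not matter.
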
docs/index.md

Lines changed: 6 additions & 0 deletions
@@ -30,7 +30,13 @@ In other terms, a token is an instance of a term. For example the below line in
 - [Note 002 - Some Basic Statistics](./002)
 
 An Excel file with basic page statistics, useful to build pivots.
+
+- [Note 003 - Clustering](./003)
 
+Application of t-SNE visualization and K-Means clustering to the Voynich, showing how page with same illustration type and
+Courier's language also share similar words.
+
+This should be considered when applying statistical analysis methods to the manuscript.
 
 
 ---
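
Note 003 applies K-Means clustering to the manuscript pages. As a reminder of how the algorithm itself works, here is a minimal sketch of Lloyd's iteration on numeric feature vectors (for instance, per-page word counts). The class name, the deterministic "first k points" initialisation and the toy data are assumptions for illustration, not the configuration used in the note.

```java
import java.util.Arrays;

/** Minimal Lloyd's k-means sketch (hypothetical; not the code used in working note 003). */
final class KMeansSketch {

    /** Returns the cluster index (0..k-1) assigned to each point. */
    static int[] cluster(double[][] points, int k, int maxIter) {
        if (k <= 0 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int n = points.length;
        int dim = points[0].length;

        // Deterministic initialisation: the first k points become the initial centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }

        int[] assign = new int[n];
        Arrays.fill(assign, -1);

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }

            // Update step: move each centroid to the mean of the points assigned to it.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assign[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave an empty cluster's centroid where it is
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assign;
    }
}
```

Because Lloyd's iteration only reaches a local optimum, the result depends on initialisation; practical implementations usually use k-means++ seeding and several restarts.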

eclipse/io.github.mattera.v4j/.classpath

Lines changed: 0 additions & 1 deletion
@@ -25,7 +25,6 @@
 </classpathentry>
 <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.8">
 <attributes>
-<attribute name="module" value="true"/>
 <attribute name="maven.pomderived" value="true"/>
 </attributes>
 </classpathentry>

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/alphabet/Alphabet.java

Lines changed: 1 addition & 3 deletions
@@ -16,12 +16,10 @@
 */
 public abstract class Alphabet {
 
-public final static Alphabet EVA = new Eva();
+public final static Alphabet EVA = new EvaAlphabet();
 
 public final static Alphabet UTF_16 = new JavaCharset();
 
-public final static Alphabet SLOT = new Slot();
-
 /**
 * @return a string code for this alphabet, same as that used in the IVTFF file.
 */

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/alphabet/Eva.java renamed to eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/alphabet/EvaAlphabet.java

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
 * @author Massimiliano "Maxi" Zattera
 */
 // TODO rename to EVA or EVA extended based on what we really support
-public final class Eva extends Alphabet {
+public final class EvaAlphabet extends Alphabet {
 
 @Override
 public String getCodeString() {
@@ -82,6 +82,6 @@ public char[] getUnreadableChars() {
 return unreadableChars;
 }
 
-protected Eva() {
+protected EvaAlphabet() {
 }
 }

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/alphabet/Slot.java

Lines changed: 0 additions & 66 deletions
This file was deleted.

eclipse/io.github.mzattera.v4j-apps/.classpath

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@
 </classpathentry>
 <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.8">
 <attributes>
-<attribute name="module" value="true"/>
 <attribute name="maven.pomderived" value="true"/>
 </attributes>
 </classpathentry>
