Commit f3bfd31

Release 4.0
1 parent e04c3dc commit f3bfd31

11 files changed

+114
-9
lines changed

docs/001/index.md

Lines changed: 3 additions & 2 deletions
@@ -2,8 +2,9 @@
 
 _Last updated Sep. 6th, 2021._
 
-_This note refers to [release v.1.0.0](https://github.com/mzattera/v4j/tree/v.1.0.0) of v4j.
-Some of the content might not apply to more recent versions of the library._
+_This note refers to [release v.1.0.0](https://github.com/mzattera/v4j/tree/v.1.0.0) of v4j;
+**links to classes and files refer to this release** and files might have been deleted or removed in the current master branch.
+In addition, some of this note content might have become obsolete in more recent versions of the library._
 
 _Working notes are not providing detailed description of algorithms and classes used, for this, please refer to the
 library code and JavaDoc._

docs/002/index.md

Lines changed: 4 additions & 3 deletions
@@ -2,8 +2,9 @@
 
 _Last updated Sep. 6th, 2021._
 
-_This note refers to [release v.2.0.0](https://github.com/mzattera/v4j/tree/v.2.0.0) of v4j.
-Some of the content might not apply to more recent versions of the library._
+_This note refers to [release v.2.0.0](https://github.com/mzattera/v4j/tree/v.2.0.0) of v4j;
+**links to classes and files refer to this release** and files might have been deleted or removed in the current master branch.
+In addition, some of this note content might have become obsolete in more recent versions of the library._
 
 _Working notes are not providing detailed description of algorithms and classes used, for this, please refer to the
 library code and JavaDoc._
@@ -17,7 +18,7 @@ The class
 prints out some statistics for the Voynich pages, such as the illustration type, the Voynich "language", etc. in .CSV format.
 
 An Excel file ("`PageStatistics.xlsx`") with a collection of these statistics can be found under the
-[analysis folder](https://github.com/mzattera/v4j/tree/v.2.0.0/resources/analysis).
+[analysis folder](https://github.com/mzattera/v4j/tree/master/resources/analysis).
 
 ---

docs/003/index.md

Lines changed: 3 additions & 2 deletions
@@ -2,8 +2,9 @@
 
 _Last updated Sep. 9th, 2021._
 
-_This note refers to [release v.3.0.0](https://github.com/mzattera/v4j/tree/v.3.0.0) of v4j.
-Some of the content might not apply to more recent versions of the library._
+_This note refers to [release v.3.0.0](https://github.com/mzattera/v4j/tree/v.3.0.0) of v4j;
+**links to classes and files refer to this release** and files might have been deleted or removed in the current master branch.
+In addition, some of this note content might have become obsolete in more recent versions of the library._
 
 _Working notes are not providing detailed description of algorithms and classes used, for this, please refer to the
 library code and JavaDoc._

docs/004/images/Terms.PNG

24.3 KB

docs/004/images/Unique.PNG

7.3 KB

docs/004/index.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+## Note 004 - On Terms
+
+_Last updated Sep. 9th, 2021._
+
+_This note refers to [release v.4.0.0](https://github.com/mzattera/v4j/tree/v.4.0.0) of v4j;
+**links to classes and files refer to this release** and files might have been deleted or removed in the current master branch.
+In addition, some of this note content might have become obsolete in more recent versions of the library._
+
+_Working notes are not providing detailed description of algorithms and classes used, for this, please refer to the
+library code and JavaDoc._
+
+[**<< Home**](..)
+
+---
+
+The class
+['MostUsedTerms']()
+finds the top 20 most used terms for each cluster defined in [Note 003](../003) and prints out the result in .CSV format.
+
+An Excel file ("`MostUsedTerms.xlsx`") containing this data can be found under the
+[analysis folder](https://github.com/mzattera/v4j/tree/master/resources/analysis).
+
+The table below summarizes the results.
+
+![Most used terms](images/Terms.PNG)
+
+As expected from cluster analysis, besides terms that appear frequently in all clusters (such as **daiin**, **dar**, **dy**, **ol**, **or**),
+there are terms characteristic of a single cluster; the table below shows them.
+
+![Terms characteristic of a single cluster](images/Unique.PNG)
+
+---
+
+[**<< Home**](..)
+
+Copyright Massimiliano Zattera.
+
+<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

docs/index.md

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,9 @@ In both cases, we refer either to the process of capturing a text (typically the
 
 This should be considered when applying statistical analysis methods to the manuscript.
 
+- [Note 004 - On Terms](./004)
+
+List of most common Voynichese terms and how they are split across different clusters.
 
 ---

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+/**
+ *
+ */
+package io.github.mattera.v4j.applications;
+
+import java.util.List;
+import java.util.Map.Entry;
+
+import io.github.mattera.v4j.text.ivtff.IvtffText;
+import io.github.mattera.v4j.text.ivtff.PageFilter;
+import io.github.mattera.v4j.text.ivtff.PageHeader;
+import io.github.mattera.v4j.text.ivtff.VoynichFactory;
+import io.github.mattera.v4j.text.ivtff.VoynichFactory.Transcription;
+import io.github.mattera.v4j.text.ivtff.VoynichFactory.TranscriptionType;
+import io.github.mattera.v4j.util.Counter;
+
+/**
+ * Shows for each cluster the list of most frequent words.
+ *
+ * @author Massimiliano "Maxi" Zattera
+ *
+ */
+public final class MostUsedTerms {
+
+    private final static Transcription TRANSCRIPTION = Transcription.MZ;
+    private final static TranscriptionType TRANSCRIPTION_TYPE = TranscriptionType.MAJORITY;
+
+    private MostUsedTerms() {
+    }
+
+    /**
+     * @param args
+     */
+    public static void main(String[] args) {
+        try {
+            System.out.println("Using transcription : " + TRANSCRIPTION);
+            System.out.println("Using transcription type: " + TRANSCRIPTION_TYPE);
+
+            IvtffText voy = VoynichFactory.getDocument(TRANSCRIPTION, TRANSCRIPTION_TYPE);
+            System.out.println("\nCluster;Term;Count;Rel. Freq.");
+
+            // Most used terms for each cluster
+            for (String cluster : PageHeader.CLUSTERS) {
+                IvtffText doc = voy.filterPages(new PageFilter.Builder().cluster(cluster).build());
+
+                Counter<String> readableTokens = doc.getWords(true);
+                int totTokens = doc.getWords(false).getTotalCounted();
+                List<Entry<String, Integer>> words = readableTokens.reversed();
+                int m = Math.min(words.size(), 20);
+                for (int i = 0; i < m; ++i) {
+                    System.out.println(cluster + ";" + words.get(i).getKey() + ";" + words.get(i).getValue() + ";"
+                            + ((double) words.get(i).getValue() / totTokens));
+                }
+            }
+        } catch (Exception e) {
+            e.printStackTrace();
+        } finally {
+            System.out.println("Completed.");
+        }
+    }
+}
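
The ranking step in `MostUsedTerms` (count terms, sort by descending count, keep at most 20, print a relative frequency) can be illustrated without the v4j library. The following is only a minimal sketch under stated assumptions: `TopTermsSketch` and `topTerms` are hypothetical names that do not exist in v4j, a plain `HashMap` stands in for the library's `Counter`, ties are broken alphabetically (the library's ordering of ties is not shown in this commit), and the relative frequency is computed against the size of the supplied token list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of v4j. */
public final class TopTermsSketch {

    private TopTermsSketch() {
    }

    /**
     * Returns at most n (term, count) entries, highest count first.
     * Ties are broken alphabetically (an assumption made for this sketch).
     */
    public static List<Map.Entry<String, Integer>> topTerms(List<String> tokens, int n) {
        // Count how many times each term occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        // Sort entries by descending count, then by term.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first n entries (or fewer, if there are not enough terms).
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Toy token list; real input would come from a transcription.
        List<String> tokens = List.of("daiin", "ol", "daiin", "dar", "ol", "daiin");

        System.out.println("Term;Count;Rel. Freq.");
        for (Map.Entry<String, Integer> e : topTerms(tokens, 20)) {
            System.out.println(e.getKey() + ";" + e.getValue() + ";"
                    + ((double) e.getValue() / tokens.size()));
        }
    }
}
```

Capping the result with `Math.min` mirrors the committed class, which also guards against clusters that contain fewer than 20 distinct terms.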
Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 /**
  *
  */
-package io.github.mattera.v4j.applications;
+package io.github.mattera.v4j.applications.text;
 
 import java.io.File;
 import java.io.IOException;
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-package io.github.mattera.v4j.applications;
+package io.github.mattera.v4j.applications.text;
 
 import java.io.BufferedReader;
 import java.io.BufferedWriter;

resources/analysis/MostUsedTerms.xlsx

27.2 KB
Binary file not shown.
