Commit ba25bf0

Preparing Release 5.0
1 parent 9c26195 commit ba25bf0

18 files changed

+50217
-157
lines changed

README.md

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -42,6 +42,8 @@ The `Alphabet` class provides some static fields to access already defined alpha
4242
- `Alphabet.EVA` is the Basic EVA alphabet.
4343

4444
- `Alphabet.UTF_16` is the UTF-16 char-set used in Java. This is the alphabet to be used to process "normal" (as non-Voynich) text files and strings.
45+
46+
- `Alphabet.SLOT` is the Slot alphabet as defined in [this working note](https://mzattera.github.io/v4j/005/).
4547

4648

4749
### `io.github.mattera.v4j.text`
@@ -78,7 +80,7 @@ where multiple versions of each line in the manuscript are provided, one per aut
7880

7981
- **`AUGMENTED`**: This is an "augmented" version of the LSI transliteration where two "artificial" transcribers were created,
8082
each corresponding to one of `IvtffText.TranscriptionType` values; `IvtffText.TranscriptionType` can be used in factory methods described below to
81-
get one of these transcriptions.
83+
get one of these transcriptions. This transliteration is available in both the EVA and Slot alphabets.
8284

8385
- **`CONCORDANCE`**: each line of this transliteration is created by merging readings from all available transcribers. Only characters that appear to be read
8486
in the same way by all authors are considered; other characters (read differently by one or more transcribers) are marked as unreadable.
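
The concordance rule described above can be illustrated with a minimal sketch. This is not the library's `BuildConcordanceVersion` code: the class `ConcordanceSketch`, the method `merge`, and the use of `?` as the unreadable marker are assumptions made for illustration, and the readings are assumed to be already aligned to the same length.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the concordance idea; not part of v4j.
class ConcordanceSketch {

    // Assumed marker for an unreadable character.
    static final char UNREADABLE = '?';

    // Merges aligned readings of one line: a character is kept only
    // if every transcriber read it the same way.
    static String merge(List<String> readings) {
        if (readings.isEmpty())
            throw new IllegalArgumentException("No readings provided.");
        int len = readings.get(0).length();
        for (String r : readings)
            if (r.length() != len)
                throw new IllegalArgumentException("Readings are not aligned.");

        StringBuilder merged = new StringBuilder(len);
        for (int i = 0; i < len; ++i) {
            char c = readings.get(0).charAt(i);
            for (String r : readings) {
                if (r.charAt(i) != c) {
                    c = UNREADABLE; // at least one transcriber disagrees
                    break;
                }
            }
            merged.append(c);
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        // Two transcribers read "daiin", one reads "dairn".
        System.out.println(merge(Arrays.asList("daiin", "daiin", "dairn"))); // prints "dai?n"
    }
}
```

The real transliteration works on full IVTFF lines, so the alignment step that this sketch takes for granted is where most of the actual effort lies.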

docs/001/index.md

Lines changed: 12 additions & 8 deletions
@@ -1,8 +1,8 @@
11
## Note 001 - The Text
22

3-
_Last updated Sep. 6th, 2021._
3+
_Last updated Sep. 19th, 2021._
44

5-
_This note refers to [release v.1.0.0](https://github.com/mzattera/v4j/tree/v.1.0.0) of v4j;
5+
_This note refers to [release v.5.0.0](https://github.com/mzattera/v4j/tree/v.5.0.0) of v4j;
66
**links to classes and files refer to this release** and files might have been changed, deleted or moved in the current master branch.
77
In addition, some of this note content might have become obsolete in more recent versions of the library._
88

@@ -22,25 +22,29 @@ obtain an `IvtffText` instance with the Voynich text. At present the library pro
2222
Landini-Stolfi Interlinear file (**LSI**) and an augmented version of it, containing concordance and majority versions of the text.
2323

2424
The corresponding IVTFF files (which are read by the factory) can be found in the
25-
[resource folder](https://github.com/mzattera/v4j/tree/v.1.0.0/eclipse/io.github.mattera.v4j/src/main/resources/Transcriptions)
25+
[resource folder]()
2626
of the library.
2727

28-
The "augmented" version is created using class
29-
[`BuildConcordanceVersion`](https://github.com/mzattera/v4j/blob/d7b349c08c780214bebe3b515623f54951bb3886/eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mattera/v4j/applications/BuildConcordanceVersion.java);
28+
The "augmented" EVA version is created using class
29+
[`BuildConcordanceVersion`]();
3030
the input for the class is a slightly modified version of LSI that can be found in the
31-
[v4j-apps resource folder](https://github.com/mzattera/v4j/tree/v.1.0.0/eclipse/io.github.mzattera.v4j-apps/src/main/resources/Transcriptions).
31+
[v4j-apps resource folder]().
3232
In this version, minor changes that do not alter the text content are made, in order to make sure
3333
all the different versions of the lines align properly, as required by `BuildConcordanceVersion` code.
3434

35+
Class
36+
[`BuildSlotVersion`]()
37+
is then used to transcribe the "augmented" version from EVA into the Slot alphabet.
38+
3539
### The Bible Text
3640

3741
Similarly, class
38-
[`BuildBibleTranscription`](https://github.com/mzattera/v4j/blob/d7b349c08c780214bebe3b515623f54951bb3886/eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mattera/v4j/applications/BuildBibleTranscription.java)
42+
[`BuildBibleTranscription`]()
3943
is used to produce a .txt version of the Bible from XML files that can be found in the
4044
[v4j-apps resource folder](https://github.com/mzattera/v4j/tree/v.1.0.0/eclipse/io.github.mzattera.v4j-apps/src/main/resources/Transcriptions).
4145

4246
The corresponding IVTFF files (which are read by the factory) can be found in the
43-
[resource folder](https://github.com/mzattera/v4j/tree/v.1.0.0/eclipse/io.github.mattera.v4j/src/main/resources/Transcriptions)
47+
[resource folder]()
4448
of the library.
4549

4650
---
64.2 KB

docs/005/images/Rare.PNG

1.53 KB

docs/005/images/Slot Alphabet.PNG

-442 Bytes

docs/005/index.md

Lines changed: 23 additions & 27 deletions
@@ -1,6 +1,6 @@
11
# Note 005 - Slots and a New Alphabet
22

3-
_Last updated Sep. 18th, 2021._
3+
_Last updated Sep. 19th, 2021._
44

55
_This note refers to [release v.5.0.0](https://github.com/mzattera/v4j/tree/v.5.0.0) of v4j;
66
**links to classes and files refer to this release**; files might have been changed, deleted or moved in the current master branch.
@@ -29,11 +29,9 @@ exists in any modern text as well. However, I will try to focus on claims that a
2929

3030
## Previous Works
3131

32-
Either here or at the end as "Comparison with other works".
32+
I am not the first to analyze the internal structure of Voynich words.
3333

34-
**TODO** https://briancham1994.com/2014/12/17/curve-line-system/.
35-
36-
- This approach is easier to explain and has more implications.
34+
One day I will create a [working note](../006) to compare this analysis with others.
3735

3836

3937
## Methodology
@@ -112,11 +110,9 @@ where each of these parts is a regular term. I will call these tokens "**separab
112110
(or the space between them was not read correctly by the transcriber of the text).
113111
When I need to distinguish these terms from other separable terms, I will call them **verified separable** or simply **verified**.
114112

115-
**TODO** check the length of the parts and see if only short terms are joined. Check if separable tends to appear in tight spaces.
116-
117113
- Remaining 618 tokens (2.0% of total), corresponding to 429 different terms (8.4% of total), are marked as "**unstructured**".
118114

119-
**TODO** Show that vast majority of unstructured words appear only once in the text. This is probably true for separable too.
115+
Notice that 366 out of these 429 terms appear only once in the text.
120116

121117
- Sometimes I contrast regular and separable terms with unstructured ones by calling the former ***structured***.
122118

@@ -129,12 +125,12 @@ The below table summarizes these findings.
129125
In short, almost 9 out of 10 tokens in the Voynich text exhibit a "slot" structure. Of the remaining ones, a fair number can be decomposed into two parts, each corresponding to regular terms
130126
appearing elsewhere in the text. The remaining cases (2 out of 100) are mostly words appearing only once in the text.
131127

132-
**TODO** Char count by slot
128+
The table below shows the occurrences of glyphs in slots for the regular terms [{2}](#Note2).
133129

134-
**TODO** Decomposition by cluster.
130+
![Table with glyph count by slot.](images/Char Count by Slot.PNG)
135131

136132

137-
### The Voynich Alphabet
133+
## The Voynich Alphabet
138134

139135
The definition of the Voynich alphabet, that is of which glyphs should be considered a single Voynich character in the text, is still open.
140136
Each transcriber must continuously decide what symbols in the manuscript constitute instances of the same glyph and how each glyph needs to be mapped into
@@ -148,10 +144,11 @@ Below I analyze more in detail some relationships between glyphs, as they appear
148144

149145
#### Rare Characters
150146

151-
The EVA characters 'g', 'x', 'v', and 'u' appear in the text only very few times, mostly as single characters, as shown in the table below.
147+
Some EVA characters appear very seldom in the original interlinear transliteration, and even less frequently in the concordance version used,
148+
where they appear mostly as single characters, as shown in the table below.
152149
For this reason, I decided to ignore these characters and mark them as "unreadable character" for this analysis.
153150

154-
![Statistics about 'g', 'x', 'v', and 'u'](images/Rare.PNG)
151+
![Statistics about rare characters](images/Rare.PNG)
155152

156153
Notice that throughout the Voynich there are several glyphs which cannot be directly transliterated into EVA characters (so called "weirdoes");
157154
they are mostly ignored in any analysis of the text.
@@ -202,27 +199,23 @@ The below table defines the Slots alphabet and compares it with other transliter
202199

203200
![The Slot alphabet and a comparison with other transliteration alphabets](images/Slot Alphabet.PNG)
204201

205-
**TODO** i ii iii change in some alphabets depending on how m r n are treated....highlight this in the table.
206-
207-
**TODO** Create transliteration.
202+
* These alphabets treat sequences of EVA 'i' differently, depending on the letter following the sequence. Therefore there is no unique way to transliterate
203+
sequences of 'i' into these alphabets.
208204

209-
**TODO** Create HTML version.
205+
A transliteration of the Landini-Stolfi interlinear file is available within the [v4j library](https://github.com/mzattera/v4j) and is accessible using `VoynichFactory` factory methods.
210206

211207

212208
## Conclusions
213209

214-
- Inner structure of words, easier for me to explain than Core or automata.
215-
216-
- Excludes any (simple) substitution cypher.
217-
218-
- This is the only alphabet that uses data-backed evidence in defining the char-set
210+
- I think the slots easily describe the inner structure of Voynich words.
219211

220-
- Voynich/EVA chars in slot cells constitute a morphological unit (character).
212+
- Given that they reveal a structure in Voynich words that is not found in other languages, any attempt to propose a substitution cypher for the Voynich should not be accepted.
221213

222-
- It is important both for attacking the cypher and performing statistical analysis to have 1:1 mapping between Voynich and transliteration characters.
223-
224-
225-
214+
- I think it is important, both for attacking the Voynich cypher and performing statistical analysis of the manuscript,
215+
to have a one-to-one mapping between the Voynich characters and those in the transliteration alphabet.
216+
As far as I know, the Slot alphabet is the first one created from empirical data about the structure of Voynich words, trying to capture the intent of the
217+
Voynich author.
218+
226219

227220
---
228221

@@ -231,6 +224,9 @@ The below table defines the Slots alphabet and compares it with other transliter
231224
<a id="Note1">**{1}**</a> Class [`Slots`]() has been used to perform this analysis. An Excel with its output can be found in the
232225
[analysis folder]().
233226

227+
<a id="Note2">**{2}**</a> Class [`CountCharsBySlot`]() has been used to produce this table.
228+
229+
234230
---
235231

236232
[**<< Home**](..)

docs/006/index.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
# Note 006 - Other Works on Word Structure
2+
3+
_Last updated Sep. 19th, 2021._
4+
5+
_This note refers to [release v.5.0.0](https://github.com/mzattera/v4j/tree/v.5.0.0) of v4j;
6+
**links to classes and files refer to this release**; files might have been changed, deleted or moved in the current master branch.
7+
In addition, some of this note content might have become obsolete in more recent versions of the library._
8+
9+
_Working notes do not provide a detailed description of the algorithms and classes used; for this, please refer to the
10+
library code and JavaDoc._
11+
12+
_Please refer to the [home page](..) for a set of definitions that might be relevant for this working note._
13+
14+
[**<< Home**](..)
15+
16+
---
17+
18+
19+
## Abstract
20+
21+
This is Work in Progress.
22+
23+
The idea is to compare the [slot concept](https://briancham1994.com/2014/12/17/curve-line-system/) with other works in this area.
24+
25+
26+
27+
---
28+
29+
[**<< Home**](..)
30+
31+
Copyright Massimiliano Zattera.
32+
33+
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

docs/index.md

Lines changed: 4 additions & 0 deletions
@@ -61,6 +61,10 @@ In other words, a token is an instance of a term. For example; the below line in
6161

6262
List of most common Voynichese terms and how they are split across different clusters.
6363

64+
- [Note 005 - Slots and a New Alphabet](./005)
65+
66+
I show how the structure of Voynich words can be explained by some simple rules, and how these can be used to derive the original Voynich alphabet.
67+
6468
---
6569

6670
Copyright Massimiliano Zattera.

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/alphabet/SlotAlphabet.java

Lines changed: 14 additions & 13 deletions
@@ -6,6 +6,9 @@
66
import java.util.List;
77
import java.util.Map;
88

9+
import io.github.mattera.v4j.text.ivtff.IvtffLine;
10+
import io.github.mattera.v4j.text.ivtff.ParseException;
11+
912
/**
1013
* "Slot" alphabet based on "slot" theory.
1114
*
@@ -86,7 +89,7 @@ public String toString() {
8689

8790
@Override
8891
public String getCodeString() {
89-
return "Slt-";
92+
return "Slot";
9093
}
9194

9295
private final static char[] regularChars = { 'o', 'e', 'E', 'B', 'C', 'S', 'y', 'a', 'd', 'i', 'J', 'U', 'k', 'K',
@@ -220,15 +223,15 @@ protected SlotAlphabet() {
220223
}
221224

222225
/**
223-
* Converts a text from Basic EVA alphabet. It only works for plain texts (see
224-
* Text.getPlainText()).
226+
* Converts a text from Basic EVA alphabet.
225227
*
226-
* @param txt Plain text to be converted.
228+
* @param txt text to be converted.
229+
* @throws ParseException if text is not proper IVTFF text.
227230
*/
228-
public static String fromEva(String txt) {
229-
for (char c : txt.toCharArray())
230-
if (!Alphabet.EVA.isRegularOrSeparator(c) && !Alphabet.EVA.isUreadableChar(c))
231-
throw new IllegalArgumentException("Text is not a plain EVA text.");
231+
public static String fromEva(String txt) throws ParseException {
232+
233+
// Remove comments as they might interfere with replacement
234+
txt = IvtffLine.removeComments(txt);
232235

233236
// TODO add support for illegible words
234237

@@ -257,13 +260,11 @@ public static String fromEva(String txt) {
257260
txt = txt.replace("v", "?");
258261
txt = txt.replace("x", "?");
259262
txt = txt.replace("u", "?");
263+
txt = txt.replace("j", "?");
264+
txt = txt.replace("b", "?");
265+
txt = txt.replace("z", "?");
260266
txt = txt.replace("'", "?");
261267

262-
// TODO test - REMOVEME
263-
for (char c : txt.toCharArray())
264-
if (!Alphabet.SLOT.isRegularOrSeparator(c) && !Alphabet.SLOT.isUreadableChar(c))
265-
throw new UnsupportedOperationException("Something went wrong in conversion");
266-
267268
return txt;
268269
}
269270
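
As a rough illustration of the last step of `fromEva` shown in this hunk, the sketch below maps the rare EVA characters to the unreadable marker `?`. It covers only the replacements visible in the diff (`v`, `x`, `u`, `j`, `b`, `z` and `'`); the class name `RareCharMaskSketch` and the method `maskRare` are hypothetical, and the comment removal and the EVA-to-Slot glyph substitutions performed by the real method are deliberately left out.

```java
// Hypothetical sketch; not the library's SlotAlphabet implementation.
class RareCharMaskSketch {

    // Rare EVA characters that the hunk above replaces with '?'.
    private static final String RARE = "vxujbz'";

    // Replaces every rare character in txt with the unreadable marker.
    static String maskRare(String txt) {
        for (char c : RARE.toCharArray())
            txt = txt.replace(String.valueOf(c), "?");
        return txt;
    }

    public static void main(String[] args) {
        System.out.println(maskRare("dav'b")); // prints "da???"
    }
}
```

Keeping the rare characters in a single string makes it easy to see at a glance which glyphs are excluded from the analysis.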

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/ivtff/IvtffLine.java

Lines changed: 9 additions & 32 deletions
@@ -70,30 +70,6 @@ public IvtffLine(IvtffLine other) {
7070
setParent(other.getParent());
7171
}
7272

73-
/**
74-
* Locus identifiers have the following format:
75-
*
76-
* < page . num , code >
77-
*
78-
* Or : < page . num , code ; T >
79-
*
80-
* Whitespace is not allowed inside locus identifiers, but it is used in the
81-
* patterns above for clarity. The fields have the following meaning:
82-
*
83-
* page The page name, which has to match the most recent page header.
84-
*
85-
* num A sequence number, incrementing from 1 for each page. The highest number
86-
* that presently occurs is 160.
87-
*
88-
* code A 3-character code, which is a 1-character locator followed by a
89-
* 2-character locus type
90-
*
91-
* T An optional single-character transcriber ID. Only used in interlinear files
92-
* that include several parallel transcriptions.
93-
*/
94-
private final static Pattern locusIdentifier = Pattern
95-
.compile("<(f[0-9]{1,3}[rv][0-9]?|fRos)\\.([0-9]{1,3}[a-z]?),([\\+\\*\\-=&~@/][PLCR].)(;.)?>");
96-
9773
/**
9874
* Creates a new instance parsing given input string.
9975
*
@@ -124,13 +100,8 @@ public IvtffLine(String txt) throws ParseException {
124100
public IvtffLine(String row, int rowNum, Alphabet a) throws ParseException {
125101
super(a);
126102

127-
if (!row.startsWith("<"))
128-
throw new ParseException("Missing locus indentifier", row, rowNum);
129-
130-
row = row.trim();
131-
132103
// TODO check right combination of generic and complete type for the locus type
133-
Matcher m = locusIdentifier.matcher(row);
104+
Matcher m = IvtffText.LOCUS_IDENTIFIER_PATTERN.matcher(row);
134105
if (!m.find() || (m.start() != 0)) {
135106
throw new ParseException("Missing or malformed locus identifier", row, rowNum);
136107
}
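
To make the locus identifier format concrete, here is a small standalone sketch that applies the regular expression removed from `IvtffLine` in the hunk above. It assumes that `IvtffText.LOCUS_IDENTIFIER_PATTERN` holds the same expression, which this diff does not show; the class `LocusIdSketch`, the method `parse` and the sample rows are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of v4j.
class LocusIdSketch {

    // Regular expression copied from the lines removed from IvtffLine.
    static final Pattern LOCUS = Pattern.compile(
            "<(f[0-9]{1,3}[rv][0-9]?|fRos)\\.([0-9]{1,3}[a-z]?),([\\+\\*\\-=&~@/][PLCR].)(;.)?>");

    // Returns { page, num, code, transcriber } for a row starting with a
    // locus identifier, or null if the row does not start with one.
    // The transcriber entry is null when the optional ";T" part is absent.
    static String[] parse(String row) {
        Matcher m = LOCUS.matcher(row);
        if (!m.find() || m.start() != 0)
            return null;
        String transcriber = (m.group(4) == null) ? null : m.group(4).substring(1);
        return new String[] { m.group(1), m.group(2), m.group(3), transcriber };
    }

    public static void main(String[] args) {
        String[] id = parse("<f1r.1,@P0;H> fachys.ykal");
        System.out.println(String.join(" | ", id)); // prints "f1r | 1 | @P0 | H"
    }
}
```

As in the constructor above, the match must start at position 0, so a row with leading text before the identifier is rejected.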
@@ -169,7 +140,7 @@ private String normalizeText(String text) throws ParseException {
169140
for (int i = 0; i < txt.length(); ++i)
170141
if (!alphabet.isRegular(txt.charAt(i)) && !alphabet.isWordSeparator(txt.charAt(i))
171142
&& !alphabet.isUreadableChar(txt.charAt(i)))
172-
throw new ParseException("Line contains invalid characters", text);
143+
throw new ParseException("Line contains invalid characters", text + " ['" + txt.charAt(i) + "']");
173144

174145
return getAlphabet().toPlainText(txt);
175146
}
@@ -485,8 +456,14 @@ public static IvtffLine merge(List<IvtffLine> lines, TranscriptionType type) thr
485456
for (IvtffLine l : lines)
486457
copy.add(new IvtffLine(l));
487458

488-
if (!align(copy))
459+
if (!align(copy)) {
460+
461+
// TODO remove debug code
462+
for (IvtffLine l : lines)
463+
System.out.println(l);
464+
489465
throw new ParseException("Cannot align the transcriptions.");
466+
}
490467

491468
IvtffLine merged = null;
492469
switch (type) {
