docs/006/index.md
Lines changed: 58 additions & 14 deletions
@@ -1,8 +1,8 @@
 # Note 006 - Works on Word Structure

-_Last updated Oct. 23rd, 2021._
+_Last updated Jan. 9th, 2022._

-_This note refers to [release v.5.0.0](https://github.com/mzattera/v4j/tree/v.5.0.0) of v4j;
+_This note refers to [release v.6.0.0](https://github.com/mzattera/v4j/tree/v.6.0.0) of v4j;
 **links to classes and files refer to this release**; files might have been changed, deleted or moved in the current master branch.
 In addition, some of this note content might have become obsolete in more recent versions of the library._

@@ -16,10 +16,11 @@ _Please refer to the [home page](..) for a set of definitions that might be rele
 ---


-In this page I will list, review, and comment works from different authors about the structure of Voynich words.
-When appropriate I will compare their findings with my [slots concept](../005).
+In this page I will list, review, and comment on works from different authors about the inner structure of Voynich words.
+When appropriate, I will compare their findings with my [slots concept](../005).

 I expect these notes to grow and refine over time (as for the others, to be honest).
+Numbers in square brackets indicate the date when the corresponding works were published (as far as I can determine it).


 # John H. Tiltman [1967]
@@ -31,6 +32,7 @@ place in an "order of precedence" within words; some symbols such as
 'o' and 'y' seem to be able to occupy two functionally different places._"


+
 # Mike Roe [1997]

 I found the below "generic word" grammar by Roe quoted by [Zandbergen](http://www.voynich.nu/a3_para.html) as published to the Voynich MS mailing list. Roe suggested that this could perhaps present evidence of grammar of the Voynich language:
@@ -40,19 +42,63 @@ Image from Zandbergen's website.



+
 # Jorge Stolfi [2000]

-[Describes](https://www.ic.unicamp.br/~stolfi/voynich/97-11-12-pms/) a decomposition of Voynichese words into three parts; prefix, midfix, and suffix.
+Stolfi initially describes a [decomposition of Voynichese words](https://www.ic.unicamp.br/~stolfi/voynich/97-11-12-pms/) into three parts: prefix, midfix, and suffix.
 Based on a classification of EVA characters into soft and hard letters, he then shows how Voynichese words can be decomposed into
 a prefix and suffix made entirely of soft letters, and a midfix made entirely of hard letters.

-This is in line with the slots model, the picture below shows glyphs in their corresponding slots and how they map
-into Stolfi definitions (red glyphs are "soft" letteres).
+This is well in line with the slots model. The picture below shows glyphs in their corresponding slots and how they map
+to Stolfi's definitions (red glyphs are "hard" letters, blue glyphs are "soft" ones).
+
+
+
+He continues his analysis with the "[OKOKOKO](https://www.ic.unicamp.br/~stolfi/voynich/Notes/017/Note-017.html)"
+paradigm to describe the fine structure of Voynichese words; finally,
+Stolfi develops these concepts into his well-known "[crust-mantle-core](https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/00-06-07-word-grammar/)"
+decomposition, which he describes by means of a [formal grammar](https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/00-06-07-word-grammar/txt.n.html).
+
+According to this model, each Voynich word can be divided into three layers nested in an onion-skin pattern: the core is at the center of the word,
+surrounded by the mantle, which in turn is surrounded by the crust. Each layer may be empty and is, in general, defined by the letters it contains.
+
+Leaving aside letters 'a', 'o', 'e', and 'y', for which Stolfi has a separate treatment, the layers can be defined as follows:

+The image below shows how glyphs in slots map to the crust-mantle-core definitions:
+
+
+
+Stolfi comments: "_The distribution of the "circles", the EVA letters { a o y }, is rather complex. They may occur anywhere within the three main layers_ ... _We have arbitrarily chosen to parse each circle as if it were a modifier of the next non-circle letter; except that a circle at the end of the word (usually a y) is parsed as a letter by itself. ... the rules about which circles may appear in each position seem to be fairly complex_". I think, in light of the slots model, this is an unnecessary complication,
+as 'a', 'o' and 'y' can be unambiguously assigned to the crust layer in most cases; furthermore,
+it is clear in which positions they can appear (slots 1, 8, and 11). Similarly, I do not understand the complicated parsing of isolated 'e' ("_we have chosen to parse isolated e letters as part of the preceding mantle or core letter_ ... _Very rarely ... e occurs alone, surrounded by crust letters; in which case we parse it as the only letter in the mantle layer_"), when the slots model
+indicates that 'e', 'ee', and 'eee' play the same role in word structure.
+
+Undoubtedly, the interesting aspect of this model is that it proposes an "onion-like" structure
+for Voynich words. In Stolfi's own words: "_The grammar not only specifies the valid words, but also defines a parse tree for each word, which in turn implies a nested division of the same into smaller parts ... we believe that our parsing of each word into three nested layers must correspond to a major feature of the VMS encoding or of its underlying plaintext_"; however, I would argue this is not what the grammar indicates.
+
+For example, by comparison with the slots model, and as Stolfi himself admits, "_the crust is not homogeneous_": it is composed of a "left" part, which constitutes word prefixes, and a "right" part, which constitutes word suffixes, and these parts are quite different; e.g. 'q' appears only in prefixes, while the 'ai*' or 'oi*' sequences (like '-aiin', '-am', etc., which Stolfi calls IN clusters) appear only in suffixes.
+
+Similarly, it can be seen that gallows in slots 3 and 7, which belong to the core layer, could well enclose pedestals or 'ee' in slots 4 and 6, which are classified as mantle, contrary to the onion-skin pattern. Again, Stolfi comments: "_The implied structure of the mantle is probably the weakest part of our paradigm. Actually, we still do not know whether the isolated e after the core is indeed a modifier for the gallows letter (as the grammar implies); or whether the pedestal of a platform gallows is to be counted as part of the mantle_".
+
+Stolfi notes: "_When designing the grammar, we tried to strike a useful balance between a simple and informative model and one that would cover as much of the corpus as possible. ... Conversely, the grammar is probably too permissive in many points, so that many words that it classifies as normal are in fact errors or non-word constructs_". It should be noted that the grammar is very good at parsing Voynichese
+(according to Stolfi, it covers "_over 96.5% of all the tokens (word instances) in the text_") but,
+on the other hand, it is very poor at rejecting what is not Voynichese: the grammar accepts something in the order of 1.4e20 (about 140 billion billion) different terms, only about 4'500 of which are terms in the manuscript ([concordance version](https://github.com/mzattera/v4j#ivtff)). Just for comparison, all the words that can be generated by the slot model amount to a total of 16'753'291 (13 orders of magnitude fewer), of which around 2'800 are Voynich terms; the model covers slightly more than 88% of tokens (98% considering separable terms), but it is much easier to describe and understand.
+
+In summary, I agree with Stolfi (and other authors) that the order in which characters appear in Voynich
+words is not arbitrary, but I think his model is misleading in suggesting a "layered" structure; for example,
+word prefixes and suffixes, which in Stolfi's model both belong to the same layer (the crust), are in fact very different, and assigning them to the same layer looks completely arbitrary; ultimately, it seems
+the grammar suggests a "sequence" of possible characters, rather than an "onion-like" structure for words.
+If this is the case, it must be said that the other, much simpler, models on this page show the same overall
+structure of Voynich words, albeit in less detail or with less coverage of Voynich terms.
+Regarding the fine details, these might not be so relevant, as Stolfi himself admits that "_one should not give too much weight to the finer divisions and associations implied by our parse trees_". It should also be mentioned that the grammar
+looks unnecessarily complex, mostly because of the way it handles "circles"; this makes it very difficult to grasp the structure of Voynichese below the most superficial levels by looking at the grammar. This is further complicated by the fact that the vast majority of the words the grammar describes are clearly very different from those found in the text.

-Stolfi develops some "paradigms" of Voynich words, like the [OKOKOKO](https://www.ic.unicamp.br/~stolfi/voynich/Notes/017/Note-017.html) paradigm and the crust-core-mantle decomposition
-which, in his words, are incorporated and refined into a [grammar for Voynichese words](https://www.ic.unicamp.br/~stolfi/voynich/00-06-07-word-grammar/).


 # Philip Neal [?]
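Stolfi's onion-skin reading (core inside mantle inside crust) can be phrased as a simple property of a word: going left to right, the layer depth may only rise towards the core and then only fall back. The Java sketch below checks that property under loud assumptions. The glyph-to-layer map is my own simplification: gallows and platform gallows count as core, 'ch', 'sh' and runs of 'e' count as mantle, and every other glyph counts as crust. The circles 'a', 'o' and 'y' are simply skipped, and the class and method names are hypothetical. It is not Stolfi's grammar, which treats circles and isolated 'e' differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class OnionCheck {

    // ASSUMPTION: simplified glyph-to-layer map for illustration only, not Stolfi's grammar.
    // Depth 0 = crust, 1 = mantle, 2 = core.
    static int layerOf(String glyph) {
        switch (glyph) {
            case "k": case "t": case "p": case "f":
            case "ckh": case "cth": case "cph": case "cfh":
                return 2; // gallows and platform gallows -> core
            case "ch": case "sh": case "e": case "ee": case "eee":
                return 1; // pedestals and runs of 'e' -> mantle
            default:
                return 0; // q, d, l, r, s, n, m, i, ... -> crust
        }
    }

    // Multi-character glyphs, longer ones first so that greedy matching prefers them.
    private static final String[] MULTI = {"ckh", "cth", "cph", "cfh", "eee", "ee", "ch", "sh"};

    // The "circles", which this sketch simply skips.
    private static final Set<String> CIRCLES = Set.of("a", "o", "y");

    // Greedily splits an EVA word into glyphs, dropping the circles.
    static List<String> glyphs(String word) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            String glyph = word.substring(i, i + 1);
            for (String m : MULTI) {
                if (word.startsWith(m, i)) {
                    glyph = m;
                    break;
                }
            }
            i += glyph.length();
            if (!CIRCLES.contains(glyph)) {
                out.add(glyph);
            }
        }
        return out;
    }

    // True if the layer depth only rises (towards the core) and then only falls.
    static boolean isOnion(String word) {
        int prev = -1;
        boolean falling = false;
        for (String g : glyphs(word)) {
            int depth = layerOf(g);
            if (prev >= 0 && depth < prev) {
                falling = true;
            } else if (prev >= 0 && depth > prev && falling) {
                return false; // depth rises again after having fallen
            }
            prev = depth;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"qockhey", "chedy", "daiin", "kcheky"}) {
            System.out.println(w + " " + glyphs(w) + " onion=" + isOnion(w));
        }
    }
}
```

With this simplified map, 'qockhey' splits into q, ckh, e and passes the check. The constructed string 'kcheky' has a gallows on each side of a pedestal, so it fails. This is the kind of gallows-around-pedestal arrangement discussed in the Stolfi section.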
@@ -103,7 +149,7 @@ So NEVA and the Slot alphabet have different objectives, as my proposal aims at

 # Sean B. Palmer [2004?]

-I found the below grammar attributed to Palmer by Pelling:
+I found the below grammar attributed to Palmer by [Pelling](http://ciphermysteries.com/2010/11/22/sean-palmers-voynichese-word-generator) (see also below):

 ```
 ^
@@ -121,7 +167,7 @@ A = ai*n*
 O = o
 ```

-Accordingly to Pelling, Palmer claims this grammar can generate 97% of Voynichese words, but this is clearly (as Pelling says) this generates a lot of words (potentially infinite strictly looking at the grammar).
+According to Pelling, Palmer claims this grammar can generate 97% of Voynichese words but, as Pelling says, this is clearly because it generates a great many words (potentially infinitely many, strictly looking at the grammar).


 # Elmar Vogt [2009?]
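A small count makes the contrast between a finite slot model and an open-ended grammar concrete. The Java sketch below assumes that every slot is optional and holds at most one of its alternatives. Under that assumption, the number of non-empty words is at most the product of (alternatives + 1) over all slots, minus one for the empty word. The slot contents and the class name are hypothetical, and I do not claim this is how the 16'753'291 figure quoted for the slot model was obtained. The sketch also reads Palmer's rule `A = ai*n*` as a Java regular expression, assuming `*` means "zero or more repetitions". Read that way, the rule accepts arbitrarily long words, which is why the grammar, taken literally, generates infinitely many of them.

```java
import java.util.List;
import java.util.regex.Pattern;

class ModelSize {

    // Palmer's rule "A = ai*n*", read as a regular expression (ASSUMPTION: '*' = zero or more).
    static final Pattern RULE_A = Pattern.compile("ai*n*");

    static boolean matchesRuleA(String word) {
        return RULE_A.matcher(word).matches();
    }

    // Upper bound on the distinct non-empty words of a slot-style model, ASSUMING every slot
    // is optional and holds at most one alternative. Different fillings could spell the same
    // string, so the true number of distinct words can be lower.
    static long upperBound(List<List<String>> slots) {
        long combinations = 1;
        for (List<String> alternatives : slots) {
            // each slot is either empty or filled with one of its alternatives
            combinations = Math.multiplyExact(combinations, (long) alternatives.size() + 1);
        }
        return combinations - 1; // exclude the combination where every slot is empty
    }

    public static void main(String[] args) {
        // HYPOTHETICAL slot contents, not the slot table of the slots note.
        List<List<String>> slots = List.of(
                List.of("q"),
                List.of("o", "y"),
                List.of("k", "t"));
        System.out.println(upperBound(slots)); // (1+1) * (2+1) * (2+1) - 1 = 17

        // The stars allow arbitrarily long matches, so rule A alone generates infinitely many strings.
        for (String w : new String[] {"a", "aiin", "aiiiiiiiiinnnnnnn"}) {
            System.out.println(w + " -> " + matchesRuleA(w)); // all true
        }
    }
}
```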
@@ -135,8 +181,6 @@ stars section of the Voynich, which is written in Currier's B language.
 Proposes a [Markov state machine](http://www.ciphermysteries.com/2010/11/22/sean-palmers-voynichese-word-generator)
 to generate Voynichese words.

-In his page he mentions grammars attributed to Sean Palmer, which I should investigate and describe here in more detail.
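The coverage figures quoted in this note (for instance, 88% of tokens for the slot model) rely on the token/term distinction defined on the home page. The Java sketch below illustrates that distinction. Tokens are the space-separated sequences of a text, terms are the tokens without repetitions, and coverage is the share of either that a word model accepts. The sample line and the toy model (words ending in 'dy' or 'aiin') are made up for illustration and come neither from the manuscript nor from v4j. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

class TokensAndTerms {

    // Tokens: the space-separated character sequences of a text, repetitions included.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split(" +"));
    }

    // Terms: the tokens without repetitions (first-occurrence order is kept).
    static Set<String> terms(List<String> tokens) {
        return new LinkedHashSet<>(tokens);
    }

    // Share of the given words (tokens or terms) that a word model accepts.
    static double coverage(Collection<String> words, Predicate<String> model) {
        long accepted = words.stream().filter(model).count();
        return (double) accepted / words.size();
    }

    public static void main(String[] args) {
        // HYPOTHETICAL sample line, made up for illustration (not quoted from the manuscript).
        List<String> tokens = tokens("daiin qokeedy daiin chedy qokeedy okar");
        Set<String> terms = terms(tokens); // daiin, qokeedy, chedy, okar

        // HYPOTHETICAL toy model: accepts words ending in "dy" or "aiin".
        Pattern toyModel = Pattern.compile(".*(dy|aiin)");
        Predicate<String> accepts = w -> toyModel.matcher(w).matches();

        System.out.println(tokens.size() + " tokens, " + terms.size() + " terms"); // 6 tokens, 4 terms
        System.out.println("token coverage = " + coverage(tokens, accepts));      // 5 of 6 tokens
        System.out.println("term coverage  = " + coverage(terms, accepts));       // 3 of 4 terms
    }
}
```

Token coverage counts every occurrence, so frequent terms weigh more. This is why a model can cover most tokens while accepting a smaller share of terms, as with the slot model's roughly 2'800 of 4'500 terms.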
*[Voynich Manuscript Project](https://ambertide.github.io/VoynichExplorer/index.html) by Ege Özkan.
@@ -38,7 +38,7 @@ Each symbol in the alphabet is referred as a **transliteration character** or si

 - Unless stated otherwise, pieces of transliterated Voynich script I quote use the "Basic Eva" as transliteration alphabet and are enclosed in single quotes (e.g. 'qockhey').

-- A **token** in a text is a single sequence of characters, separated by spaces. The list of **terms** is the list of tokens, without repetitions.
+- A **token** in a text is a single sequence of characters, separated by spaces. The list of **terms** is the list of tokens without repetitions.
 In other words, a token is an instance of a term. For example; the below line in the Voynich
eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mzattera/v4j/applications/slot/CountCharsBySlot.java