The table below summarizes the results, showing the relative frequency of word types in each cluster.

As expected from cluster analysis, besides word types that appear frequently in all clusters (such as 'chey', 'daiin', 'dar', 'dy', and 'or'),
there are word types characteristic of a single cluster; the table below shows them.

It might be interesting to note that:

- The most common word types in Herbal A pages (HA cluster) start with 'ch-' or 'sh-'; the latter prefix appears only here.

- Common word types in Pharmaceutical pages (PA cluster) end in '-ol', which is rare in other clusters. In addition, they seem to prefer the 'ok-' or 'qok-' prefix.

- Herbal B pages (HB cluster) prefer word types starting with 'qo-' and 'qok-'.

- Common word types in Zodiac pages (ZZ cluster) mostly start with 'ot-', which is uncommon in the clusters above. Moreover, these pages feature single characters among their common word types.
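
The relative frequencies and the prefix observations above can be computed along the following lines. This is only a sketch: it assumes each cluster is available as a plain list of tokens (a hypothetical input format; the clustering itself is not shown), and `top_n`, the cutoff for what counts as a "common" word type, is a hypothetical parameter not taken from the text.

```python
from collections import Counter


def relative_frequencies(tokens: list[str]) -> dict[str, float]:
    """Relative frequency of each word type among the tokens of one cluster."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def prefix_share(tokens: list[str], prefixes: tuple[str, ...], top_n: int = 20) -> float:
    """Fraction of the top_n most common word types starting with one of `prefixes`.

    top_n is a hypothetical cutoff for what counts as a "common" word type.
    """
    common = [word for word, _ in Counter(tokens).most_common(top_n)]
    if not common:
        return 0.0
    return sum(word.startswith(prefixes) for word in common) / len(common)


# Toy cluster for illustration only, not real manuscript data.
cluster = ["chol", "chol", "shey", "daiin", "chor", "or"]
print(round(relative_frequencies(cluster)["chol"], 3))  # 0.333
print(prefix_share(cluster, ("ch", "sh"), top_n=5))     # 0.6
```

Running `prefix_share` per cluster with prefixes such as `("qo", "qok")` or `("ot",)` gives one number per cluster that can be compared directly.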
## Abstract

I show how the structure of Voynich words can be easily described by assuming each word type is composed of "slots" that can be filled
according to simple rules, which are described below.

This in turn sheds some light on the definition of what might constitute a Voynich character (the Voynich alphabet).

I start my analysis from a concordance version of the Voynich text (see [Note 001](../001)); this is obtained from the
Landini-Stolfi Interlinear file by merging the available interlinear transcriptions of the individual transcribers. In the merging, characters that are not
read in the same way by all transcribers are marked as unreadable. This ensures that the word types I extract from the text are as accurate as possible.
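
The merging rule can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the readings of one line are already aligned character by character, '?' marks an unreadable character and '.' marks a space. It also folds in the space rule used for tokenization (a space seen by any transcriber wins over a disagreement); that precedence is itself an assumption.

```python
UNREADABLE = "?"  # assumed marker for an unreadable character
SPACE = "."       # assumed word-separator marker


def merge_readings(readings: list[str]) -> str:
    """Merge position-aligned readings of one line by several transcribers.

    A position becomes a space if any transcriber saw a space there, keeps its
    character if all transcribers agree, and is marked unreadable otherwise.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be non-empty, aligned and of equal length")
    merged = []
    for chars in zip(*readings):
        if SPACE in chars:
            merged.append(SPACE)
        elif all(c == chars[0] for c in chars):
            merged.append(chars[0])
        else:
            merged.append(UNREADABLE)
    return "".join(merged)


print(merge_readings(["daiin.chol", "daiin.chor"]))  # daiin.cho?
print(merge_readings(["chedy", "ch.dy"]))            # ch.dy
```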

For reasons explained below, any occurrence of the following characters is also replaced with an unreadable character:

- 'g', 'x', 'v', 'u', 'j', 'b', 'z' (47 occurrences in total, 13 of them being single-letter words).

- 'c' and 'h', when they do not appear in combinations such as 'ch', 'sh', 'cth', 'ckh', 'cph', 'cfh'; this affects 11 word types.
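
The masking of rare characters and stray 'c'/'h' can be sketched like this. It treats the six listed combinations as the complete list (the text says "such as", so this is an assumption) and uses '?' as the unreadable marker, also an assumption.

```python
import re

UNREADABLE = "?"       # assumed marker for an unreadable character
RARE = set("gxvujbz")  # characters that are always masked
# Combinations in which 'c' and 'h' are kept (assumed to be the complete list).
COMBOS = re.compile(r"cth|ckh|cph|cfh|ch|sh")


def mask_word(word: str) -> str:
    """Mask rare characters and any 'c' or 'h' outside the allowed combinations."""
    protected: set[int] = set()
    for match in COMBOS.finditer(word):
        protected.update(range(match.start(), match.end()))
    return "".join(
        UNREADABLE if char in RARE or (char in "ch" and i not in protected) else char
        for i, char in enumerate(word)
    )


print(mask_word("chckhy"), mask_word("dcar"), mask_word("ogam"))  # chckhy d?ar o?am
```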

As a second step, **tokens** are created by splitting the text wherever a space was detected by at least one of the transcribers; there are 31'317 tokens in the text,
ignoring those that contain an unreadable character.

The list of **word types** is the list of tokens without repetitions (this would be the "vocabulary" of the Voynich).
These 5'105 word types have then been analyzed as explained below.
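
The distinction between tokens and word types can be shown in a few lines. This sketch assumes the merged text uses '.' as the space marker and '?' for unreadable characters (both assumptions); a `Counter` keeps, for each word type, how many tokens it accounts for.

```python
from collections import Counter

UNREADABLE = "?"  # assumed unreadable marker
SPACE = "."       # assumed word separator


def tokenize(merged_text: str) -> list[str]:
    """Split merged text at spaces and drop tokens containing an unreadable character."""
    return [t for t in merged_text.split(SPACE) if t and UNREADABLE not in t]


tokens = tokenize("daiin.chol.daiin.cho?.chey")
word_types = Counter(tokens)  # word type -> number of tokens
print(len(tokens), len(word_types))  # 4 3
```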
## Considerations

By looking at the word types in the Voynich, we can see that their structure (that is, the sequence of Voynich glyphs used to write them) can be easily described
as follows:

- each word type can be considered as composed of 12 "slots"; for convenience, I will number them from 0 to 11.

- each slot can be empty or contain a single glyph.

In some cases, the word structure can be ambiguous, since a glyph can occupy either of two available slots
(e.g. the word type 'y' can be seen as a 'y' either in slot 1 or in slot 11); following some further
[analysis on word structure](../007), when decomposing a word, I always put each glyph in the rightmost possible position.
Notice this is a "weak" rule that is quite arbitrary and has no impact on which
word types can or cannot be described by this model.
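
The "rightmost possible position" rule has a simple greedy implementation: scan the glyphs from last to first and place each one in the highest admissible slot below the one chosen for the following glyph. This finds a decomposition whenever one exists, and each glyph ends up in its rightmost feasible slot. The sketch assumes the word is already segmented into glyphs. The table of allowed glyphs per slot is not reproduced here, so the toy `SLOTS` below fills only the two slots the text mentions for 'y' and leaves every other slot empty; it is not the real table.

```python
from typing import Optional


def decompose(glyphs: list[str], slots: list[set[str]]) -> Optional[list[int]]:
    """Place each glyph in a slot, keeping glyph order with strictly increasing
    slot indices and putting every glyph in its rightmost possible slot.

    Returns the slot index of each glyph, or None if the word does not fit.
    """
    positions: list[int] = []
    upper = len(slots)  # the glyph being placed must go in a slot below `upper`
    for glyph in reversed(glyphs):
        slot = next((i for i in range(upper - 1, -1, -1) if glyph in slots[i]), None)
        if slot is None:
            return None
        positions.append(slot)
        upper = slot
    positions.reverse()
    return positions


# Toy 12-slot table: only 'y' in slots 1 and 11 comes from the text.
SLOTS: list[set[str]] = [set() for _ in range(12)]
SLOTS[1].add("y")
SLOTS[11].add("y")

print(decompose(["y"], SLOTS))            # [11]
print(decompose(["y", "y"], SLOTS))       # [1, 11]
print(decompose(["y", "y", "y"], SLOTS))  # None
```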

To exemplify this concept, I show how some common word types can be decomposed in slots:

```
'daiin'
```

We can then see [{2}](#Note2) that tokens can be classified as follows:

- 27'114 tokens (86.6% of total), corresponding to 2'617 different word types (51.3% of total), can be decomposed in slots according to the above rules. I will call these tokens "**regular**".

- 3'249 tokens (10.4% of total), corresponding to 1'892 different word types (37.1% of total), can be divided in two parts, each composed of at least two Voynich glyphs,
where each of these parts is a regular word type. I will call these tokens "**separable**".

Moreover, we can see that for 2'219 separable word types (75.2% of total separable word types) their constituent parts appear as tokens in the text at least as often as the whole
separable word type. For example, the word type 'chockhy' appears 18 times in the text; it is a separable word type that can be divided in two parts, 'cho' and 'ckhy', each one being a regular word type,
which appear in the text 79 and 39 times respectively. I think this is an indication that many separable word types are possibly just two regular words that were written together
(or the space between them was not transcribed correctly).
When I need to distinguish these word types from other separable word types, I will call them "**verified separable**" or simply "**verified**".

- The remaining 954 tokens (3.0% of total), corresponding to 596 different word types (11.7% of total), are marked as "**unstructured**".

Notice that 489 out of these 596 word types, or 82%, appear only once in the text; this percentage is 59.8% for regular and separable word types considered together.
This might suggest that unstructured words are either typos or special words that are encoded differently from other words.

- Sometimes I contrast regular and separable word types with unstructured ones by calling the former "**structured**".

The tables below summarize these findings.

In short, almost 9 out of 10 tokens in the Voynich text exhibit a "slot" structure. Of the remaining ones, a fair amount can be decomposed in two parts, each corresponding to a regular word type
appearing elsewhere in the text. The remaining cases (3 out of 100) are mostly words appearing only once in the text.
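
The regular / separable / unstructured classification and the "verified" check can be sketched as below. The regularity test is passed in as a function because, in the note, it is decided by the slot model; the toy `is_regular_toy`, its set of regular word types, and the glyph segmentation of 'chockhy' as ('ch', 'o', 'ckh', 'y') are hypothetical. The counts 18, 79 and 39 are the ones quoted in the text.

```python
from collections import Counter
from typing import Callable

Glyphs = tuple[str, ...]


def splits(word: Glyphs):
    """All ways to cut a word into two parts of at least two glyphs each."""
    for cut in range(2, len(word) - 1):
        yield word[:cut], word[cut:]


def classify(word: Glyphs, is_regular: Callable[[Glyphs], bool]) -> str:
    """Return 'regular', 'separable' or 'unstructured' for one word type."""
    if is_regular(word):
        return "regular"
    if any(is_regular(a) and is_regular(b) for a, b in splits(word)):
        return "separable"
    return "unstructured"


def is_verified(word: Glyphs, is_regular: Callable[[Glyphs], bool], counts: Counter) -> bool:
    """True if some split into two regular parts has both parts occurring
    as tokens at least as often as the whole word."""
    n = counts[word]
    return any(
        is_regular(a) and is_regular(b) and counts[a] >= n and counts[b] >= n
        for a, b in splits(word)
    )


# Hypothetical regularity check and segmentation, for illustration only.
regular = {("ch", "o"), ("ckh", "y")}


def is_regular_toy(w: Glyphs) -> bool:
    return w in regular


word = ("ch", "o", "ckh", "y")  # 'chockhy'
counts = Counter({word: 18, ("ch", "o"): 79, ("ckh", "y"): 39})
print(classify(word, is_regular_toy))             # separable
print(is_verified(word, is_regular_toy, counts))  # True
```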

The table below shows the percentage occurrence of glyphs in slots for regular word types[{3}](#Note3).

<a id="GliphCountImg" />

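
A table of this kind can be derived from the slot decompositions. The sketch assumes each decomposition is a mapping from slot index to glyph for the non-empty slots, and that percentages are taken over the number of decomposed word types; note [{3}] may define the percentage differently (for example, weighted by tokens), so treat the denominator as an assumption.

```python
from collections import Counter, defaultdict


def slot_glyph_percentages(decompositions: list[dict[int, str]]) -> dict[int, dict[str, float]]:
    """Percentage of decomposed word types having each glyph in each slot."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for dec in decompositions:
        for slot, glyph in dec.items():
            counts[slot][glyph] += 1
    total = len(decompositions)
    return {
        slot: {glyph: 100 * n / total for glyph, n in glyph_counts.items()}
        for slot, glyph_counts in sorted(counts.items())
    }


# Toy decompositions ('y' and 'yy'), not real data.
print(slot_glyph_percentages([{11: "y"}, {1: "y", 11: "y"}]))  # {1: {'y': 50.0}, 11: {'y': 100.0}}
```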

Each transcriber must continuously decide which symbols in the manuscript constitute instances of the same glyph and how each glyph needs to be mapped into
one or more transliteration characters.

However, if we consider the slots defined above as relevant for the structure of word types, we can reasonably assume that each glyph appearing in a slot constitutes
a basic unit of information, that is, a character in the Voynich alphabet.
As far as I know, this is the first time that a possible Voynich alphabet is supported by empirical evidence of an inner structure of Voynich word types.

Below, I analyze in more detail some relationships between glyphs, as they appear in slots, and EVA characters.

If we look at slots 3 through 5, we might think that pedestalled gallows are indeed a combination of a gallows character followed by the pedestal, in this specific order.
However,

- The combination of a gallows in slot 3 followed by a pedestal in slot 4 is quite common in the text: it appears, written explicitly as two glyphs, in 2'185 tokens, corresponding to 419 regular word types (16% of regular word types).

- In 332 tokens, we have a pedestal followed by a pedestalled gallows. This would correspond to a double pedestal in a word (or a separable word), which contrasts with the

Finally, drawing from the above considerations, I propose a new transliteration alphabet, which I will call the **Slot alphabet** for obvious reasons.

I think that, being based on the inner structure of Voynich word types, this alphabet is more suitable than others when performing statistical analysis that relies on characters in words, or when attempting
to decipher the Voynich, where a one-to-one correspondence between the transliteration characters and the Voynich characters is paramount.

In addition, the alphabet can be easily converted into EVA and vice versa, so the two can be used interchangeably.
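
Because each character maps to exactly one EVA sequence, conversion in both directions reduces to a table lookup. The sketch below uses a hypothetical mapping that is NOT the Slot alphabet (its actual definition is not reproduced here), and it segments EVA text by greedy longest match, which is an assumption; the real conversion might need the slot decomposition to resolve ambiguous sequences.

```python
# Hypothetical mapping for illustration only: it is NOT the Slot alphabet.
# Each EVA sequence maps to exactly one symbol, so the table can be inverted.
EVA_TO_SLOT = {
    "ckh": "K", "ch": "C", "sh": "S",
    "a": "a", "d": "d", "i": "i", "n": "n", "o": "o", "y": "y",
}
SLOT_TO_EVA = {v: k for k, v in EVA_TO_SLOT.items()}


def convert(text: str, table: dict[str, str]) -> str:
    """Transliterate `text` by greedy longest match against `table`."""
    keys = sorted(table, key=len, reverse=True)
    out: list[str] = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping for {text[i:]!r}")
    return "".join(out)


slot_form = convert("chockhy", EVA_TO_SLOT)
print(slot_form, convert(slot_form, SLOT_TO_EVA))  # CoKy chockhy
```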
## Conclusions

- The majority of words in the Voynich exhibit the inner structure described here, where word types can be represented as composed of 12 "slots" that can be left empty or
populated by a single glyph chosen among a very limited group of glyphs (usually 2-3).

- 86.6% of tokens (51.3% of word types) exhibit this structure (**regular** word types).

- 10.4% of tokens (37.1% of word types) can be divided in two parts, each presenting the inner structure described above (**separable** word types).

For 68.3% of separable word types, their two constituents appear in the text at least as often as the separable word type itself (**verified separable** word types).

This seems a strong indication that separable word types are made of two regular word types written or transcribed together.

- Only 3.0% of tokens (11.7% of word types) do not exhibit this structure (**unstructured** word types).

82% of unstructured word types appear only once in the text. In other words, **only 1.5% of tokens (2.1% of word types) are unstructured word types appearing at least twice in the text**.

I argue that these can be typos or plain text words encoded in a different way than the majority of the text (e.g. because they represent proper names or uncommon words).
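
The percentages above follow directly from the counts reported in this note, and they can be checked quickly. The one step worth making explicit is that each of the 489 unstructured word types occurring only once contributes exactly one token. Subtracting 489 from both the unstructured token count and the unstructured word-type count therefore leaves the unstructured word types that occur at least twice.

```python
# Counts reported in this note.
TOKENS, WORD_TYPES = 31_317, 5_105
groups = {  # class -> (tokens, word types)
    "regular": (27_114, 2_617),
    "separable": (3_249, 1_892),
    "unstructured": (954, 596),
}
HAPAX_UNSTRUCTURED = 489  # unstructured word types occurring only once

assert sum(t for t, _ in groups.values()) == TOKENS
assert sum(w for _, w in groups.values()) == WORD_TYPES
for name, (t, w) in groups.items():
    print(f"{name}: {100 * t / TOKENS:.1f}% of tokens, {100 * w / WORD_TYPES:.1f}% of word types")

# Each hapax accounts for exactly one token, so subtract it from both counts.
repeated_tokens = groups["unstructured"][0] - HAPAX_UNSTRUCTURED
repeated_types = groups["unstructured"][1] - HAPAX_UNSTRUCTURED
print(f"{100 * repeated_tokens / TOKENS:.1f}% of tokens, {100 * repeated_types / WORD_TYPES:.1f}% of word types")
# prints: 1.5% of tokens, 2.1% of word types
```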