@@ -45,21 +45,23 @@ The set of experiments is as follows:
45
45
* First line in paragraph - first lines of paragraphs are compared with the rest of the text.
46
46
* Last line in paragraph - last lines of paragraphs are compared with the rest of the text.
47
47
* First letter in a line - initial character of first token in a line is compared with initial characters of all other tokens.
48
+
For reasons that will be clearer later, the first line of each paragraph (thus the first token) is ignored.
48
49
* Last letter in a line - final character of last token in a line is compared with last characters of all other tokens.
49
50
50
51
The results are shown in the table below[{1}](#Note1)[{2}](#Note2):
51
52
52
53

53
54
54
-
As a test, experiments have been repeated with a shuffled version of the Voynich where the layout (number of tokens in each line) has been preserved but tokens were shuffled around randomly, and the anomalies in distribution disappeared.
55
+
As a test, experiments have been repeated with a shuffled version of the Voynich where the layout (number of tokens in each line) has been preserved but tokens were shuffled around randomly,
56
+
and the anomalies in distribution disappeared.
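
The shuffling control described above can be sketched as follows. This is only a minimal illustration, not the code used for the experiments: the class `LayoutPreservingShuffle`, its method name, and the representation of the text as a list of lines (each a list of tokens) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of v4j): shuffles tokens while keeping the layout. */
final class LayoutPreservingShuffle {

    /**
     * Returns a copy of the text where tokens are randomly permuted
     * but every line keeps its original number of tokens.
     */
    static List<List<String>> shuffle(List<List<String>> lines, long seed) {
        // Put every token of the text into a single pool and shuffle it.
        List<String> pool = new ArrayList<>();
        for (List<String> line : lines) {
            pool.addAll(line);
        }
        Collections.shuffle(pool, new Random(seed));

        // Refill each line with as many tokens as it originally had.
        List<List<String>> result = new ArrayList<>();
        int next = 0;
        for (List<String> line : lines) {
            result.add(new ArrayList<>(pool.subList(next, next + line.size())));
            next += line.size();
        }
        return result;
    }
}
```

If the anomalies came only from the layout (for example, from line lengths), they should survive this shuffling; their disappearance suggests they depend on which tokens sit at which position.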
55
57
56
58
# Considerations and Previous Works
57
59
58
60
## (Pedestalled) Gallows
59
61
60
-
Before analyzing the above result, I want to discuss the distribution of "gallows" characters ('p', 't', 'k', 'f', 'P', 'T', 'K', 'F').
62
+
Before analyzing the above results, I want to discuss the distribution of "gallows" characters ('p', 't', 'k', 'f', 'P', 'T', 'K', 'F').
61
63
For this purpose, I have prepared the following set of tables[{3}](#Note3)[{4}](#Note4);
62
-
all tables have been prepared using the majority transliteration of the Voynich, using the Slot alphabet [see v4j README](https://github.com/mzattera/v4j#ivtff).
64
+
all tables have been prepared using the majority transliteration of the Voynich, using the Slot alphabet ([see v4j README](https://github.com/mzattera/v4j#ivtff)).
63
65
The analysis has been done by splitting the text in [clusters](../003) and then considering different parts of the text:
64
66
65
67
* First Word of a Paragraph
@@ -98,7 +100,8 @@ Comparing these three tables and considering the one with characters distributio
98
100
99
101
6. The "pedestalled gallows" seem to behave like their "non-pedestalled" counterparts; thus 'T' behaves like 't' (its
100
102
distribution seems more uniform across lines though), 'P' like 'p', 'F' like 'f', and 'K' like 'k',
101
-
with the exception that they tend to avoid being token initials (especially 'K').
103
+
with the exception that they tend to avoid being token initials (especially 'K');
104
+
notice how 'K', 'P', and 'T' appear significantly less often as the first character in a line.
102
105
However, this last part is difficult to confirm, given the small number of these glyphs.
103
106
104
107
7. From 75% (for Pharmaceutical) to 95% (for Herbal B) of paragraphs begin with "gallows".
@@ -112,8 +115,6 @@ see, among others,[TILTMAN (1967)](../biblio.md)[{5}](#Note5), [CURRIER (1976)](
112
115
It is interesting that, even with variations and very few exceptions, the above rules apply to all of the different clusters;
113
116
this is somewhat surprising, given that we know that "languages" for each cluster are structurally different (see [Note 009](../009)).
114
117
115
-
116
-
117
118
118
119
## First Line in a Page
119
120
@@ -122,124 +123,76 @@ For the time being, I will assume that the differences between first line of a p
122
123
the sample being much smaller for the beginning of pages, the trends are just less marked.
123
124
124
125
125
-
126
126
## First Line in a Paragraph
127
127
128
-
** See separate class doen for gallows
129
-
** S appears more frequently; nobody noticed so far, probably because of using EVA; do the test using EVA and see if c s h have some anomalies....
130
-
131
-
tokens starting with non-pedestalled gallows are almost always found as first token in a paragraph.
132
-
tokens starting with pedestalled gallows are more rare and distributed more or less evenly, but tend not to appear at the beginning of a line other than first line of paragraphs.
133
-
134
-
75-95% of first tokens in paragraphs start with a (pedestalled) gallows.
135
-
136
-
p, f, P, F appear almost exclusively in first line of a paragraph; p and f, when appearing in the first token, are almost always initials.
137
-
138
-
Other (pedestalled) gallows tend to not appear as token initials.
139
-
140
-
141
-
128
+
1. As already discussed above, 'f', 'F', 'p', 'P' tend to appear more frequently in the first line of paragraphs; the same holds true for 't', except for
129
+
the Herbal pages. 'k' and 'K' have the opposite behavior, tending to appear more frequently outside the first line. Finally, not much can be said about 'T'.
130
+
2. There is also a preference for 'S' to appear in first line of paragraphs.
131
+
3. 'e' seems to appear more frequently without repetitions in first line (see low frequencies of 'E' and 'B').
132
+
4. 'n' avoids the first line of paragraphs.
133
+
5. With the exception of the Pharmaceutical section, 'J' avoids the first line of paragraphs.
134
+
6. For the Biological and Stars sections only, 'r' and 'o' seem over-represented in the first line; the opposite is true for 's'.
142
135
136
+
To my knowledge, with the exception of point 1., these are new findings which are due to:
143
137
138
+
a. Using the Slot alphabet for the analysis rather than EVA.
139
+
140
+
For example, Slot 'S' is a single character represented in EVA as two characters ('sh'); this makes it difficult, if not impossible,
141
+
for analysis based on EVA to spot the abundance of 'sh' in first line, as the statistics will be skewed by single occurrences of EVA 's'
142
+
or EVA sequences like 'ch' 'cth', etc.
143
+
144
+
b. Performing a separate analysis for each cluster (for point 6).
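
Point a. can be illustrated with a small sketch. The class `EvaGlyphCount` and the sample tokens are made up for this example; they are not taken from v4j or from the transliteration. The only mapping assumed is the one stated above, namely that Slot 'S' corresponds to the EVA sequence 'sh'.

```java
import java.util.List;

/** Hypothetical helper (not part of v4j) contrasting per-character and per-glyph counts. */
final class EvaGlyphCount {

    /** Counts every occurrence of a single character, as a naive per-character count would. */
    static int countChar(List<String> tokens, char c) {
        int n = 0;
        for (String token : tokens) {
            for (int i = 0; i < token.length(); i++) {
                if (token.charAt(i) == c) {
                    n++;
                }
            }
        }
        return n;
    }

    /** Counts non-overlapping occurrences of a multi-character sequence such as "sh". */
    static int countSequence(List<String> tokens, String seq) {
        if (seq.isEmpty()) {
            throw new IllegalArgumentException("seq must not be empty");
        }
        int n = 0;
        for (String token : tokens) {
            int from = token.indexOf(seq);
            while (from >= 0) {
                n++;
                from = token.indexOf(seq, from + seq.length());
            }
        }
        return n;
    }
}
```

For the made-up EVA tokens `shol`, `sol`, `chol`, and `cthy`, counting single characters gives two 's' and three 'h', while the glyph written 'sh' occurs only once. Neither per-character count isolates the glyph, which is the effect described in point a.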
144
145
145
146
146
147
## Last Line in a Paragraph
147
148
148
-
149
+
1. 'f', 't', and especially 'q' and 'p' tend to avoid the last line of paragraphs.
150
+
151
+
I found no mention of this before.
149
152
150
153
151
154
## First Letter in a Line
152
155
153
-
[TILTMAN (1967)](../biblio.md) 'y' occurs quite frequently as the initial symbol of a line followed immediately by a combination of symbols which seem
154
-
to be happy without it in any part of a line away from the beginning (d).
1. "The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally".
158
-
159
-
[CURRIER (1976)](../biblio.md)
160
-
* The 'ligatures' [ cKh cTh cFh cPh ] can never occur as paragraph initial, and almost never line initial.
161
-
162
-
[CURRIER (1976)](../biblio.md)
163
-
* Skewed frequencies at beginnings of lines may be illustrated by the two letters ch and Sh.
164
-
If its occurrence as an initial were random, we would expect it to occur one seventh of the time in each token position of a line.
165
-
Actually, it is a very infrequent token initial at the beginning of a line, except when there is an intercalated o. This applies only to 'Language' A.
166
-
Other ‘tokens’ occur in this position far more frequently than expected, particularly ‘tokens’ with initial ‘dC,’ ‘qC’ etc.,
167
-
which have the appearance of ‘C’-initial ‘tokens’ suitably modified for line-initial use
168
-
-> Nobody noticed, maybe because in EVA this is treated as two characters ('sh'), which skews the statistics.
169
-
except for Currier who transcripes this as S Z.
170
-
-> In any case, also look at the differences in the percentages
156
+
To perform this analysis, the first token of each paragraph has been ignored, as we already know from the analysis above that
157
+
that token will most likely start with gallows (thus skewing our analysis).
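
A minimal sketch of the counting just described is given below. It is not the actual analysis code: the class `LineInitialStats` and the nested-list representation (paragraphs, then lines, then tokens) are assumptions made for this example, and only the first token of each paragraph is skipped.

```java
import java.util.List;

/** Hypothetical helper (not part of v4j) counting token initials by position in the line. */
final class LineInitialStats {

    /**
     * Returns {lineInitialHits, lineInitialTotal, otherHits, otherTotal} for character c,
     * where a "hit" is a token whose first character is c.
     * The first token of each paragraph is ignored.
     */
    static int[] count(List<List<List<String>>> paragraphs, char c) {
        int[] counts = new int[4];
        for (List<List<String>> paragraph : paragraphs) {
            for (int l = 0; l < paragraph.size(); l++) {
                List<String> line = paragraph.get(l);
                for (int t = 0; t < line.size(); t++) {
                    if (l == 0 && t == 0) {
                        continue; // paragraph-initial token: mostly gallows, so skip it
                    }
                    String token = line.get(t);
                    if (token.isEmpty()) {
                        continue;
                    }
                    int base = (t == 0) ? 0 : 2; // line-initial tokens vs. all other tokens
                    if (token.charAt(0) == c) {
                        counts[base]++;
                    }
                    counts[base + 1]++;
                }
            }
        }
        return counts;
    }
}
```

Comparing `lineInitialHits / lineInitialTotal` with `otherHits / otherTotal` shows whether a character is over- or under-represented at line start; deciding whether the difference is significant requires a separate statistical test, which this sketch does not include.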
158
+
159
+
1. 't' and, less markedly, 'p' are over-represented at line start; the opposite is true for 'k', confirming our analysis of gallows above.
160
+
2. 's', 'y', and 'd' (with the exception of Herbal A) are also over-represented at line start.
161
+
3. 'C' and, less markedly, 'S' are under-represented at line start.
162
+
4. 'a', 'o' (with the exception of Herbal A, where it shows the opposite behavior), and, less markedly, 'r' are under-represented at the beginning of a line.
163
+
164
+
Again, much of this is not new: [CURRIER (1976)](../biblio.md) states that
165
+
"The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally" and he noticed how
166
+
'C' and 'S' are under-represented (unless followed by 'o').
167
+
168
+
[TILTMAN (1967)](../biblio.md) noticed that "'y' occurs quite frequently as the initial symbol of a line followed immediately by a combination of symbols which seem
169
+
to be happy without it in any part of a line away from the beginning".
171
170
172
-
[BOWERN (2020)](../biblio.md)
173
-
There is a similar but less robust pattern associated with the beginning of each line. The
171
+
[BOWERN (2020)](../biblio.md) mentions that "The
174
172
first token is somewhat more likely to begin with s- s. This may be another orthographic
175
173
variant, but it appears to only occur with tokens that otherwise begin with o- o or a- a. Thus
176
-
aiin aiin, ol ol, and or or are replaced with saiin saiin, sol sol, and sor sor.
177
-
178
-
174
+
aiin aiin, ol ol, and or or are replaced with saiin saiin, sol sol, and sor sor." This is consistent with
175
+
points 2. and 4. above.
179
176
180
177
181
178
## Last Letter in a Line
182
179
183
-
[TILTMAN (1967)](../biblio.md) 'm' appears most commonly at the end of a line, rarely elsewhere (b).
1. "The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally".
188
-
189
-
2. There is, for instance, one symbol that, while it does occur elsewhere, occurs at the
190
-
end of the last ‘tokens’ of lines 85% of the time".
191
-
192
-
[BOWERN (2020)](../biblio.md)
193
-
There are also characters which usually appear at the end of the last token of the line,
194
-
particularly m. It is plausible that m m and g g are variant forms of the token-final glyphs -iin iin and -y y
195
-
However, if this is an orthographic convention, it is not applied in a consistent manner: the forms -iin iin and -y
196
-
y are also found line-finally, albeit somewhat less frequently.
197
-
198
-
[ZANDBERGEN (2021)](../biblio.md)
199
-
The third feature is similar to the second, but it is less pronounced, and could be easier to explain. This is
200
-
the character m that is a token-final character that predominantly (but again not always) appears at the
201
-
ends of lines. In this case, the letter could conceivably be a line final variant form of either r or l , but
202
-
there are some issues with that hypothesis.
203
-
204
-
205
-
206
-
## Other Patterns
207
-
208
-
[KNIGHT]
209
-
Confirms uneven char distribution but does it for the entire text
210
-
It is particularly interesting that lower frequency characters occur more at line-ends,
211
-
and higher-frequency ones at the beginnings of lines.
212
-
-> REALLY!?!?!? INTERESTING TO TEST, see io.github.mzattera.v4j.applications.chars.CharByPositionTest
213
-
214
-
Patrick Feaster CONFERENCE
215
-
Rightward and Downward Grapheme Distributions in the Voynich Manuscript.
180
+
1. 'm' is over-represented at the end of lines.
181
+
2. Conversely, 'l' and 'r' are under-represented.
182
+
3. For some clusters, 'd', 'o', 'n', and 'y' show a significant deviation in their distribution.
183
+
184
+
Point 1. is a well-known fact, reported in [TILTMAN (1967)](../biblio.md), [CURRIER (1976)](../biblio.md), [BOWERN (2020)](../biblio.md),
185
+
and [ZANDBERGEN (2021)](../biblio.md).
216
186
217
187
# Conclusions
218
188
219
189
The distribution of characters across the page presents some anomalies which are statistically significant and are summarized in the table above.
220
-
May of these anomalies have been detected by several authors in the past.
221
-
222
-
However, this is possibly the first time when it is shown that the list of characters presenting anomalies in their distribution, the extent and the direction of these anomalies
223
-
differ across different sections of the Voynich. By looking at each cluster separately, I also identified some anomalies which, as far as I know, are new.
224
-
225
-
We summarize below the main trends, but we invite to refer to the above table for a detailed analysis, case by case.
226
-
-> Most peculiar cluster: HA
227
-
228
-
**Little progress has been made since Tillman and currier on char distribution until now**
229
-
230
-
** Most evident cases: q d l o n, which behave in markedly opposite ways in different clusters**
190
+
Many of these anomalies have been detected by several authors in the past, but some are possibly new:
231
191
232
-
**Highlight char anomalies which nobody discovered before (e.g., 'a' or 'y' as first char in a line)**
233
-
234
-
If we look to behaviors that appear consistently across clusters, we can see that:
235
-
236
-
* 'k' does not appear in first line of pages and in first line of paragraphs (with a slightly less significance for BB cluster).
237
-
* 'S' and 'p' appear with high frequency in first line of paragraphs.
238
-
* 'y', 't', and 'd' tend to appear as first letter in a line; with the exception of cluster HA where 'd' has the opposite behavior.
239
-
'C', 'S', 'o', and 'a' hardly do; with the exception of cluster HA again where 'o' appears with high frequency.
240
-
* 'l' and 'r' tend not to appear as terminal letter of last token in a line.
192
+
1. 'k' and 'K' behave differently from other gallows.
193
+
2. 'f', 't', and especially 'q' and 'p' tend to avoid the last line of paragraphs.
241
194
242
-
Is Currier's lien as a functional entity valid?
195
+
In addition, it is worth mentioning that some characters behave differently in different clusters.
243
196
244
197
245
198
---
@@ -274,6 +227,10 @@ On this point, please see [Note 005](../005) where I show, given the slot struct
274
227
Still, I think there is good evidence that the initial gallows in paragraphs might be an addition to the actual token. If this is done for aesthetic reasons or is part of the encoding scheme
275
228
(as Grove suggests) I cannot tell.
276
229
230
+
<a id="Note8">**{9}**</a> John Grove seems to be the first person to notice that "First Gallows on a page can normally be detached from the first word to form a relatively normal VMS word",
231
+
suggesting these characters might be additions to the token (see also [this message](http://voynich.net/Arch/2004/09/msg00442.html) from Stolfi, which picks up on this).
eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mzattera/v4j/applications/chars/CharDistributionAnalysis.java (1 addition & 1 deletion)
@@ -93,7 +93,7 @@ public static void main(String[] args) {
93
93
System.out.print("\n\n[ Last line in paragraph ];\n");