docs/010/index.md
Lines changed: 16 additions & 12 deletions
@@ -1,4 +1,4 @@
-# Note 010 - Character distribution through the page
+# Note 010 - Character Distribution Through Clusters
 
 _Last updated Dec. 28th, 2024._
 
@@ -35,7 +35,7 @@ the character distribution of the two parts is compared using a chi-squared test
 If the test shows a statistically significant variation, then each character is tested individually (using a binomial test),
 to highlight which characters behave significantly differently in the two parts of the text.
 
-In all experiments, the [concordance version](https://github.com/mzattera/v4j/blob/master/eclipse/io.github.mzattera.v4j/src/main/resources/Transcriptions/Interlinear_slot_ivtff_1.5.txt)
+In all experiments, the [concordance version](https://github.com/mzattera/v4j#ivtff)
 of the Voynich in the Slot alphabet is used; only text appearing in paragraphs is considered (IVTFF locus type = P0 or P1).
 The experiments are done separately for each [cluster](../003).
 
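As a rough illustration of the procedure this hunk describes (a chi-squared test on the character counts of the two parts, followed by a per-character binomial test when the first test is significant), here is a minimal Python sketch. It is not the v4j implementation: the function names, the 0.05 threshold, the absence of any multiple-testing correction, and the null hypothesis chosen for the binomial test (each occurrence of a character falls in the first part with probability equal to that part's share of all characters) are assumptions made for illustration. Both parts are assumed to be non-empty strings of single-character glyphs with separators already removed. Only the standard library is used, so the chi-squared tail probability is computed from its closed form for integer degrees of freedom.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series over products of odd numbers.
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def chi2_homogeneity(part_a, part_b):
    """Chi-squared test on the 2 x k table of character counts; returns the p-value."""
    ca, cb = Counter(part_a), Counter(part_b)
    na, nb = len(part_a), len(part_b)
    chars = set(ca) | set(cb)
    stat = 0.0
    for ch in chars:
        column = ca[ch] + cb[ch]
        for observed, row in ((ca[ch], na), (cb[ch], nb)):
            expected = row * column / (na + nb)
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, len(chars) - 1)


def binomial_two_sided(k, n, p):
    """Exact two-sided binomial p-value: total mass of outcomes no likelier than k."""
    def pmf(i):
        log_q = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log(1 - p))
        return math.exp(log_q)
    observed = pmf(k)
    return min(1.0, sum(q for q in map(pmf, range(n + 1)) if q <= observed * (1 + 1e-7)))


def compare_parts(part_a, part_b, alpha=0.05):
    """Return the chi-squared p-value and the characters flagged by the binomial test."""
    p_value = chi2_homogeneity(part_a, part_b)
    flagged = {}
    if p_value < alpha:  # only drill down when the overall test is significant
        ca, cb = Counter(part_a), Counter(part_b)
        share_a = len(part_a) / (len(part_a) + len(part_b))
        for ch in sorted(set(ca) | set(cb)):
            p_ch = binomial_two_sided(ca[ch], ca[ch] + cb[ch], share_a)
            if p_ch < alpha:
                flagged[ch] = p_ch
    return p_value, flagged
```

Calling `compare_parts(first_lines, other_lines)` on two character strings returns the overall p-value and a dictionary of the characters whose split between the two parts deviates from what the part sizes alone would predict.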
@@ -61,8 +61,8 @@ and the anomalies in distribution disappeared.
 
 Before analyzing the above results, I want to discuss the distribution of "gallows" characters ('p', 't', 'k', 'f', 'P', 'T', 'K', 'F').
 For this purpose, I have prepared the following set of tables[{3}](#Note3)[{4}](#Note4);
-all tables have been prepared using the majority transliteration of the Voynich, using the Slot alphabet ([see v4j README](https://github.com/mzattera/v4j#ivtff)).
-The analysis has been done by splitting the text in [clusters](../003) and then considering different parts the text:
+all tables have been prepared using the [majority transliteration](https://github.com/mzattera/v4j#ivtff) of the Voynich, using the Slot alphabet.
+The analysis has been done by splitting the text into clusters and then considering different parts of the text:
 
 * First Word of a Paragraph
 * Remaining Words in First Line of each paragraph
@@ -135,13 +135,13 @@ being the sample much smaller for beginning of pages, the trends are just less m
 
 To my knowledge, with the exception of point 1., these are new findings which are due to:
 
-a. Using the Slot alphabet for the analysis rather than EVA.
+- Using the Slot alphabet for the analysis rather than EVA.
 
 For example, Slot 'S' is a single character represented in EVA as two characters ('sh'); this makes it difficult, if not impossible,
 for an analysis based on EVA to spot the abundance of 'sh' in the first line, as the statistics will be skewed by single occurrences of EVA 's'
 or EVA sequences like 'ch', 'cth', etc.
 
-b. Performing a separate analysis for each cluster (for point 6).
+- Performing a separate analysis for each cluster (for point 6).
 
 
 ## Last Line in a Paragraph
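The point made in this hunk about EVA 'sh' versus Slot 'S' can be shown with a toy count. This is only a sketch: the sole correspondence taken from the note is EVA 'sh' to Slot 'S'; the `EVA_TO_SLOT` table and `to_slot_like` helper are hypothetical names, a real EVA-to-Slot conversion covers more glyphs, and the four tokens are illustrative rather than a sample of the transliteration.

```python
from collections import Counter

# Only the 'sh' -> 'S' correspondence comes from the note; everything else
# in a real EVA-to-Slot conversion is deliberately left out of this sketch.
EVA_TO_SLOT = {"sh": "S"}


def to_slot_like(word):
    """Replace the EVA sequences listed in EVA_TO_SLOT with single symbols."""
    for eva, slot in EVA_TO_SLOT.items():
        word = word.replace(eva, slot)
    return word


eva_words = ["shedy", "sol", "chedy", "cthy"]  # illustrative tokens only

eva_counts = Counter("".join(eva_words))
slot_counts = Counter("".join(to_slot_like(w) for w in eva_words))

print(eva_counts["s"], eva_counts["h"])    # 2 3
print(slot_counts["S"], slot_counts["s"])  # 1 1
```

In the EVA counts, 's' includes the 's' of 'sh', and 'h' mixes 'sh', 'ch' and 'cth'. Once 'sh' is a single symbol, its frequency can be tested on its own.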
@@ -191,8 +191,15 @@ Many of these anomalies have been detected by several authors in the past, but s
 
 1. 'k' and 'K' behaving differently than other gallows.
 2. 'f', 't' and especially 'q' and 'p' tend to avoid the last line of paragraphs.
+3. 'S' prefers to appear in the first line of paragraphs.
+4. 'e' seems to appear more frequently without repetitions in the first line (see low frequencies of ‘E’ and ‘B’).
+5. 'n' avoids the first line of paragraphs.
+6. With the exception of the Pharmaceutical section, 'J' avoids the first line of paragraphs.
+7. For the Biological and Stars sections only, 'r' and 'o' seem over-represented in the first line; the opposite is true for 's'.
 
-In addition, worth mentioning as some characters behave differently in different clusters.
+Again, these possibly new insights were only possible because the Slot alphabet was used for the analysis and the analysis was conducted for each cluster separately.
+
+Last but not least, it is worth mentioning that a few characters behave differently in different clusters.
 
 
 ---
@@ -217,19 +224,16 @@ the top lines of paragraphs where there is some extra space." (p. 7).
 
 "The 'ligatures' [ "pedestalled gallows" nda ] can never occur as paragraph initial, and almost never line initial." (p. 9); this is only partially true.
 
-<a id="Note7">**{7}**</a>"(14) On most herbal folios the first line of the first paragraph begins with a very small set of symbols,
+<a id="Note7">**{7}**</a>"(14) On most herbal folios the first line of the first paragraph begins with a very small set of symbols,
 primarily 't', 'k', 'p', 'f'" (p. 28).
 
-<a id="Note8">**{8}**</a>John Grove seems to be the first person to notice that "First Gallows on a page can normally be detached from the first word to form a relatively normal VMS word",
+<a id="Note8">**{8}**</a>John Grove seems to be the first person to notice that "First Gallows on a page can normally be detached from the first word to form a relatively normal VMS word",
 suggesting these characters might be additions to the token (see also [this message](http://voynich.net/Arch/2004/09/msg00442.html) from Stolfi, which picks up on this).
 
 On this point, please see [Note 005](../005) where I show, given the slot structure of Voynich words, that removing the initial character from a word results in another valid word in a minimum of 60% of cases.
 Still, I think there is good evidence that the initial gallows in paragraphs might be an addition to the actual token. Whether this is done for aesthetic reasons or is part of the encoding scheme
 (as Grove suggests) I cannot tell.
 
-<a id="Note8">**{9}**</a>John Grove seems to be the first person to notice that "First Gallows on a page can normally be detached from the first word to form a relatively normal VMS word",
-suggesting these characters might be additions to the token (see also [this message](http://voynich.net/Arch/2004/09/msg00442.html) from Stolfi, which picks up on this).
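The kind of check referred to in note {8} (how often dropping the initial character of a word leaves another valid word) can be sketched as below. This is an illustration only, not the method of Note 005: the function name is hypothetical, "valid" is taken to mean "appears elsewhere in the same word list", the fraction is computed over distinct word types rather than tokens, and the sample words are illustrative.

```python
def fraction_valid_after_drop(words):
    """Share of distinct multi-character words whose tail (first character removed) is also a word."""
    vocabulary = set(words)
    candidates = [w for w in vocabulary if len(w) > 1]
    if not candidates:
        return 0.0
    hits = sum(1 for w in candidates if w[1:] in vocabulary)
    return hits / len(candidates)


sample = ["pchedy", "chedy", "tol", "ol", "daiin"]  # illustrative tokens only
print(fraction_valid_after_drop(sample))  # 0.4
```

Here 'pchedy' and 'tol' reduce to 'chedy' and 'ol', which are in the list, while the other three words do not, giving 2 out of 5.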