Commit 149e9de

Finished note 008
1 parent 3aba835 commit 149e9de

5 files changed (+35, −101 lines)

docs/006/index.md

Lines changed: 0 additions & 1 deletion
@@ -220,7 +220,6 @@ Brian Cham proposes a new pattern in the text of the Voynich Manuscript named th
 This pattern is fundamentally based on shapes of individual glyphs but also informs the structure of words.
 
 
-
 ---
 
 **Notes**

docs/007/index.md

Lines changed: 11 additions & 0 deletions
@@ -177,6 +177,17 @@ Noticeable difference is that, while 'l' and 'r' can be followed by the word fin
 
 This slot contains the word ending 'y' alone.
 
+
+# Conclusions
+
+This analysis shows that there is a dependency between a character and those preceding it. In other words,
+Voynich words are not generated by randomly putting suitable characters into slots.
+
+It also shows that, given a character, we have a limited choice of options for the characters following it;
+this means that the information encoded by each single character is small. Compared to modern languages,
+where a position in a text can encode 4-5 bits (it can be occupied by any of about 25 letters),
+a position in a Voynich word can be filled only by a smaller set of symbols, thus encoding less information.
+
 
 ---
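The bits-per-position argument in these conclusions can be sanity-checked with a few lines of plain Java. This is only an illustration: the 4-glyph slot size below is an assumed example for the sketch, not a measured property of the manuscript.

```java
// Back-of-the-envelope check of the information argument above:
// a position that can hold any of N equally likely symbols encodes log2(N) bits.
public class BitsPerPosition {

	static double bits(int choices) {
		return Math.log(choices) / Math.log(2);
	}

	public static void main(String[] args) {
		// about 25 letters in a modern alphabet -> roughly 4.6 bits per position
		System.out.printf("25 choices: %.2f bits%n", bits(25));
		// a hypothetical Voynich slot admitting only 4 glyphs (illustrative
		// assumption, not a measured figure) -> 2 bits per position
		System.out.printf(" 4 choices: %.2f bits%n", bits(4));
	}
}
```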

docs/008/index.md

Lines changed: 19 additions & 16 deletions
@@ -1,4 +1,4 @@
-# Note 008 - The Best Grammar for Voynichese (as far as I know)
+# Note 008 - Simply the Best Grammar for Voynichese (as far as I know)
 
 _Last updated Jan. 31st, 2021._
 
@@ -169,17 +169,16 @@ The table below compares our grammar with other models described in [Note 006](.
 | Model | Generated strings | True Positives | Positive Tokens | Precision | Recall | F1 |
 | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
 | ROE | 120 | 112 | 15.954% | <span style="color:red">0.933</span> | 0.022 | 0.043 |
-| STOLFI | 143,124,560,075,240,080,000 | 4,527 | 97.813% | 0.000 | <span style="color:red">0.881</span> | 0.000 |
-| NEAL_1a | 87,480 | 535 | 20.083% | 0.006 | 0.104 | 0.012 |
-| NEAL_1b | 174,818 | 1,782 | 66.013% | 0.010 | 0.347 | 0.020 |
-| NEAL_2 | 1,311,345 | 1,049 | 45.248% | 0.001 | 0.204 | 0.002 |
-| PALMER || 4,547 | 97.280% | 0.000 | <span style="color:red">0.884</span> | 0.000 |
-| VOGT (Recipes) | 32,575 | 424 | 58.697% | 0.013 | 0.190 | 0.024 |
-| VOGT | 32,575 | 565 | 55.734% | 0.017 | 0.110 | 0.030 |
+| STOLFI | 143'124'560'075'240'080'000 | 4'527 | 97.813% | 0.000 | <span style="color:red">0.881</span> | 0.000 |
+| NEAL_1a | 87'480 | 535 | 20.083% | 0.006 | 0.104 | 0.012 |
+| NEAL_1b | 174'818 | 1'782 | 66.013% | 0.010 | 0.347 | 0.020 |
+| NEAL_2 | 1'311'345 | 1'049 | 45.248% | 0.001 | 0.204 | 0.002 |
+| PALMER || 4'547 | 97.280% | 0.000 | <span style="color:red">0.884</span> | 0.000 |
+| VOGT (Recipes) | 32'575 | 424 | 58.697% | 0.013 | 0.190 | 0.024 |
+| VOGT | 32'575 | 565 | 55.734% | 0.017 | 0.110 | 0.030 |
 | PELLING || 259 | 32.099% | 0.000 | 0.050 | 0.000 |
-| SLOT | 4,643,467 | 2,617 | 86.447% | 0.001 | <span style="color:orange">0.509</span> | 0.001 |
-| SM | 3,110 | 1,113 | 62.040% | <span style="color:orange">0.358</span> | 0.216 | <span style="color:red">0.270</span> |
-
+| SLOT | 4'643'467 | 2'617 | 86.447% | 0.001 | <span style="color:orange">0.509</span> | 0.001 |
+| <span style="color:green">**SLOT MACHINE**</span> | 3'110 | 1'113 | 62.040% | <span style="color:orange">0.358</span> | 0.216 | <span style="color:red">0.270</span> |
 
 - **STOLFI**: Jorge Stolfi's "crust-mantle-core" model. As it is impossible to generate and test all words for this model, I assume any term in the Voynich that is not listed in Stolfi's `AbnormalWord` is a true positive.
 - There are three versions of grammars described by Philip Neal:
@@ -189,13 +188,17 @@ The table below compares our grammar with other models described in [Note 006](.
 - Vogt's model was created only for the "recipes" section (Stars B); here a comparison is provided both limited to that section and for the entire text.
 - When implementing Pelling's state machine, I assumed all arrows have the same meaning (even if some are dashed) and the red boxes are non-emitting states.
 - **SLOT** considers all terms generated by the [Slot model](../005).
-- **SM** Is the state machine I describe above.
-
-# Considerations
-
+- **SLOT MACHINE** is the state machine I describe above in this note.
 
+
 # Conclusions
 
+- The proposed grammar has the best F1, an order of magnitude above any other model I know. It is able to
+model 62% of the tokens that appear in the Voynich (that is, 1'113 terms, or 21.6% of them).
+- The proposed grammar has the second-best precision, topped only by Roe's model, which generates only 120 words (112 of them being terms of the Voynich).
+- Models with a recall higher than the proposed grammar's (Stolfi's, Palmer's, and my Slot model) generate an almost infinite number of words.
+If we ignore these, Neal's model has a slightly higher recall than the proposed one, but generates more than 1.3 million words (compared to the 3 thousand generated by my model).
+
 
 ---

@@ -212,7 +215,7 @@ In this discussion, I am ignoring it, as it is also slightly more complex than the
 
 <a id="Note3">**{3}**</a> A version of this graph that can be visualized using [Gephi](https://gephi.org/) (`StateMachine.gephi`) is stored [here](https://github.com/mzattera/v4j/blob/v.9.0.0/resources/analysis/slots/).
 
-<a id="Note1">**{4}**</a> Class [`WordModelEvaluator`](https://github.com/mzattera/v4j/blob/v.9.0.0/eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mzattera/v4j/applications/slot/WordModelEvaluator.java) was used for
+<a id="Note4">**{4}**</a> Class [`WordModelEvaluator`](https://github.com/mzattera/v4j/blob/v.9.0.0/eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mzattera/v4j/applications/slot/WordModelEvaluator.java) was used for
 this purpose.
 
 ---
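As a reading aid for the Precision/Recall/F1 columns in the comparison table above, here is a minimal sketch of how such scores are computed when a word model is treated as a classifier over the terms of a corpus. This is plain Java with toy word sets, not the actual `WordModelEvaluator` logic.

```java
import java.util.Set;

// A word model "classifies" a string as Voynichese by generating it.
// Precision = attested fraction of generated words; recall = generated
// fraction of attested terms; F1 = their harmonic mean.
public class F1Sketch {

	/** Returns { precision, recall, F1 } for generated words vs. corpus terms. */
	static double[] score(Set<String> generated, Set<String> corpusTerms) {
		long truePositives = generated.stream().filter(corpusTerms::contains).count();
		double precision = generated.isEmpty() ? 0.0 : (double) truePositives / generated.size();
		double recall = corpusTerms.isEmpty() ? 0.0 : (double) truePositives / corpusTerms.size();
		double f1 = (precision + recall) == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
		return new double[] { precision, recall, f1 };
	}

	public static void main(String[] args) {
		// Toy numbers, not the real corpus: the model "generates" three words,
		// two of which are attested terms.
		double[] s = score(Set.of("daiin", "chedy", "qqqqq"), Set.of("daiin", "chedy", "ol", "shedy"));
		System.out.printf("P=%.3f R=%.3f F1=%.3f%n", s[0], s[1], s[2]); // P=0.667 R=0.500 F1=0.571
	}
}
```

This is why a model that generates trillions of strings can reach a high recall while its precision, and hence its F1, collapses toward zero.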

docs/index.md

Lines changed: 4 additions & 0 deletions
@@ -86,6 +86,10 @@ In other words, a token is an instance of a term. For example, the below line in
 
 I create a graph showing how characters in words are connected, based on the "Slots" concept.
 
+- [Note 008 - Simply the Best Grammar for Voynichese (as far as I know)](./008)
+
+I create a grammar to explain the structure of Voynich words, showing it has the best F1 among all proposed models.
+
 
 # Bibliography and Reviews
 

eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mzattera/v4j/applications/slot/WordModelEvaluator.java

Lines changed: 1 addition & 84 deletions
@@ -21,7 +21,6 @@
 import io.github.mzattera.v4j.util.Counter;
 import io.github.mzattera.v4j.util.statemachine.SlotBasedModel;
 import io.github.mzattera.v4j.util.statemachine.StateMachine;
-import io.github.mzattera.v4j.util.statemachine.StateMachine.TrainMode;
 
 /**
  * Evaluates F1 score for models of Voynich words, considered as classifiers
@@ -351,7 +350,7 @@ private static void evaluatePalmer(Counter<String> voynichTokens) throws ParseEx
 
 			if (p.matcher(t).matches()) {
 				++tp;
-			ttp += voynichTokens.getCount(t);
+				ttp += voynichTokens.getCount(t);
 			}
 		}
 
@@ -634,88 +633,6 @@ private static void evaluateSlotMachine(Counter<String> voynichTokens) throws Pa
 		evaluate("SM", voynichTokens, SlotAlphabet.toEva(m.emit().itemSet()));
 	}
 
-	/**
-	 * Evaluate Slots state machine model. This is old model written manually.
-	 *
-	 * @param voynichTokens List of Voynich terms (EVA).
-	 */
-	private static void evaluateSlotMachineOld(Counter<String> voynichTokens) throws ParseException {
-
-		StateMachine m = new StateMachine();
-		m.setInitialState(m.addState("Start"));
-		m.addState("Slot_0");
-		m.addState("0_q", "q");
-		m.addState("0_s", "s");
-		m.addState("0_d", "d");
-		m.addState("Slot_1");
-		m.addState("1_y", "y");
-		m.addState("1_o", "o");
-		m.addState("Slot_2");
-		m.addState("2_r", "r");
-		m.addState("2_l", "l");
-		m.addState("3_Gallows", new String[] { "t", "p", "k", "f" });
-		m.addState("4_Pedestals", new String[] { "ch", "sh" });
-		m.addState("5_PedGallows", new String[] { "cth", "cph", "ckh" }); // MISSING cfh
-		m.addState("6_eSeq", new String[] { "e", "ee" }); // MISSING eee
-		m.addState("Slot_7");
-		m.addState("7_d", "d");
-		m.addState("7_s", "s");
-		// m.addState("7_Gallows", new String[] {"t","p","k","f"});
-		m.addState("7_Gallows", new String[] {}); // REMOVED
-		m.addState("Slot_8");
-		m.addState("8_a", "a");
-		m.addState("8_o", "o");
-		m.addState("9_iSeq", new String[] { "i", "ii" }); // MISSING iii
-		m.addState("Slot_10");
-		m.addState("10_d", "d");
-		m.addState("10_lr", new String[] { "l", "r" });
-		m.addState("10_mn", new String[] { "m", "n" });
-		m.addState("11_y", "y");
-		m.addState("End", true);
-
-		// ***** TODO test optional states, optional characters and splitting C and S
-
-		m.addNext("Start", new String[] { "Slot_0", "Slot_1", "Slot_2", "3_Gallows", "4_Pedestals", "5_PedGallows",
-				"7_d", "7_s", "8_a", "6_eSeq" }); // (Possibly slot 6) IT WORKS
-		m.addNext("Slot_0", new String[] { "0_q", "0_d", "0_s" });
-		m.addNext("0_q", new String[] { "1_o" });
-		m.addNext("0_s", new String[] { "1_o", "4_Pedestals" });
-		m.addNext("0_d", new String[] { "1_o", "1_y", "4_Pedestals" });
-		m.addNext("Slot_1", new String[] { "1_y", "1_o" });
-		m.addNext("1_y", new String[] { "3_Gallows", "4_Pedestals" });
-		m.addNext("1_o", new String[] { "Slot_2", "3_Gallows", "4_Pedestals", "5_PedGallows", "6_eSeq", "7_d", "8_a" });
-		m.addNext("Slot_2", new String[] { "2_l", "2_r" });
-		m.addNext("2_r", new String[] { "4_Pedestals", "Slot_8" });
-		m.addNext("2_l", new String[] { "3_Gallows", "4_Pedestals", "7_d", "Slot_8" });
-		m.addNext("3_Gallows", new String[] { "4_Pedestals", "6_eSeq", "Slot_8", "11_y" }); // (7_d, 11_y ??) 11_y WORKS
-		m.addNext("4_Pedestals", new String[] { "6_eSeq", "Slot_7", "Slot_8", "11_y" }); // consider keeping them
-				// separate? S won't connect
-				// to 7_s
-		m.addNext("5_PedGallows", new String[] { "6_eSeq", "Slot_8", "7_d", "11_y" }); // (possibly 7_d, 11_y) THEY BOTH
-				// WORK
-		m.addNext("6_eSeq", new String[] { "Slot_7", "Slot_8", "11_y", "End" }); // possibly End // WORKS
-		m.addNext("Slot_7", new String[] { "7_d", "7_s", "7_Gallows" });
-		m.addNext("7_d", new String[] { "8_o", "8_a", "11_y", "End" });
-		m.addNext("7_s", new String[] { "8_a", "11_y", "End" }); // possibly 8_o? it does in slot 1 - NOT WORKING ->
-				// looks better if 8_a is removed
-		m.addNext("7_Gallows", new String[] { "8_a", "11_y" }); // possibly 8_o? - NOT WORKING -> Looks better if
-				// removed completely
-		m.addNext("Slot_8", new String[] { "8_a", "8_o" });
-		m.addNext("8_a", new String[] { "9_iSeq", "10_lr", "10_mn" }); // Possibly END? - NOT WORKING
-		m.addNext("8_o", new String[] { "10_lr", "10_mn", "End" }); // Possibly END? - IT WORKS VERY WELL
-		m.addNext("9_iSeq", new String[] { "10_lr", "10_mn" });
-		m.addNext("Slot_10", new String[] { "10_d", "10_lr", "10_mn" });
-		m.addNext("10_d", new String[] { "11_y", "End" });
-		m.addNext("10_lr", new String[] { "11_y", "End" });
-		m.addNext("10_mn", new String[] { "End" });
-		m.addNext("11_y", new String[] { "End" });
-
-		evaluate("SMOLD", voynichTokens, m.emit().itemSet());
-
-		m.train(voynichTokens.itemSet(), TrainMode.F1);
-		evaluate("SMOLDTRN", voynichTokens, m.emit().itemSet());
-	}
-
 	/**
 	 * Evaluates and prints stats for a word generation model.
 	 *
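For readers unfamiliar with how a state machine like the deleted one turns into a word list, the following self-contained sketch (toy classes of my own, not the v4j `StateMachine` API) enumerates every path from `Start` to `End`, concatenating the glyphs emitted along the way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TinyStateMachine {

	// state -> glyphs it can emit (empty list = non-emitting state)
	final Map<String, List<String>> emits = new LinkedHashMap<>();
	// state -> successor states
	final Map<String, List<String>> next = new LinkedHashMap<>();

	void state(String name, String... glyphs) {
		emits.put(name, List.of(glyphs));
		next.put(name, new ArrayList<>());
	}

	void edge(String from, String... to) {
		next.get(from).addAll(List.of(to));
	}

	/** Enumerates every word spelled along a path from "Start" to "End". */
	List<String> words() {
		List<String> out = new ArrayList<>();
		walk("Start", "", out);
		return out;
	}

	private void walk(String state, String prefix, List<String> out) {
		if (state.equals("End")) {
			out.add(prefix);
			return;
		}
		List<String> glyphs = emits.get(state);
		// non-emitting states contribute no glyph, only structure
		for (String g : glyphs.isEmpty() ? List.of("") : glyphs)
			for (String n : next.get(state))
				walk(n, prefix + g, out);
	}

	public static void main(String[] args) {
		// A toy machine: an optional "q", then "o", then "y".
		TinyStateMachine m = new TinyStateMachine();
		m.state("Start");
		m.state("Q", "q");
		m.state("O", "o");
		m.state("Y", "y");
		m.state("End");
		m.edge("Start", "Q", "O");
		m.edge("Q", "O");
		m.edge("O", "Y");
		m.edge("Y", "End");
		System.out.println(m.words()); // prints [qoy, oy]
	}
}
```

The "Generated strings" column in the table in `docs/008/index.md` counts exactly this kind of enumeration, which is why densely connected machines explode into millions of words. This sketch assumes an acyclic graph; states with repeatable glyph sequences would need a depth bound.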
