Commit 72ab41b

First working note added, documented Bible factory related stuff.
1 parent d44353b commit 72ab41b

11 files changed: +113 -109 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -2,6 +2,8 @@
 
 This is a Java library I created to experiment with the [Voynich manuscript](https://en.wikipedia.org/wiki/Voynich_manuscript).
 
+The outcomes of my experiments are tracked on the [project pages](https://mzattera.github.io/v4j/).
+
 The folder `eclipse` contains an Eclipse workspace. The (Maven) project `io.github.mattera.v4j` holds the actual code for the Java library.
 The library content is described below. The (Maven) project `io.github.mattera.v4j-apps` contains classes I created to experiment with the
 Voynich manuscript; here you can find examples of how to use the library.
@@ -56,6 +58,7 @@ This class also provides means to get the text alphabet, split text in words, co
 (think of a book made of chapters made of paragraphs). `filterElements()` and `splitElements()` can be used to select parts of text,
 or cut the text into parts, based on rules.
 
+<a id="ivtff"></a>
 ### Getting the Voynich Text - `io.github.mattera.v4j.text.ivtff`
 
 The main class in this package is `IvtffText` that represents a text in IVTFF (Intermediate Voynich Transliteration File Format) format,
@@ -99,6 +102,15 @@ can be used to create IVTFF documents by filtering and/or splitting content of a
 IvtffText doc = VoynichFactory.getDocument(TranscriptionType.MAJORITY);
 doc = doc.filterPages(new PageFilter.Builder().illustrationType("B").build());
 ```
+### Other (Regular) Texts - `io.github.mattera.v4j.text.txt`
+
+`TextString` represents a Java string as a `Text` document, whilst `TextFile` represents a text file
+composed of numbered `TextLine`s. These classes allow processing regular (e.g. contemporary plain English)
+texts within the v4j library.
+
+Sometimes it is useful to compare Voynich statistics with those from a known text. For this reason,
+`BibleFactory` provides methods to return the text of the Bible in different languages as `TextFile`
+instances.
 
 ### Testing (under src/test/java) - `io.github.mattera.v4j.test`
 
docs/001/index.md

Lines changed: 16 additions & 23 deletions
@@ -1,37 +1,30 @@
-## Welcome to GitHub Pages
+## Note 001 - The Text
 
-You can use the [editor on GitHub](https://github.com/mzattera/v4j/edit/master/docs/index.md) to maintain and preview the content for your website in Markdown files.
+_Last updated Sep. 6th, 2021._
 
-Whenever you commit to this repository, GitHub Pages will run [Jekyll](https://jekyllrb.com/) to rebuild the pages in your site, from the content in your Markdown files.
+_This note refers to release XXXX of v4j. Some of the content might not apply to more recent versions of the library._
 
-### Markdown
+### The Voynich Text
 
-Markdown is a lightweight and easy-to-use syntax for styling your writing. It includes conventions for
+As explained in the [v4j README](https://github.com/mzattera/v4j#ivtff), the library provides factory methods to
+obtain an `IvtffText` instance with the Voynich text. At present the library provides means to obtain the
+Landini-Stolfi Interlinear file (**LSI**) and an augmented version of it, containing concordance and majority versions of the text.
 
-```markdown
-Syntax highlighted code block
+The corresponding IVTFF files (which are read by the factory) can be found in the [resource folder]() of the library.
 
-# Header 1
-## Header 2
-### Header 3
+The "augmented" version is created using class [`BuildConcordanceVersion`](); the input for the class
+is a slightly modified version of LSI that can be found in the [v4j-apps resource folder]().
+This version contains minor changes that do not alter the text content; they ensure that
+all the different versions of the lines align properly, as required by the `BuildConcordanceVersion` code.
 
-- Bulleted
-- List
+### The Bible Text
 
-1. Numbered
-2. List
+Similarly, class [`BuildBibleTranscription`]() is used to produce text versions of the Bible from
+XML files that can be found in the [v4j-apps resource folder]().
 
-**Bold** and _Italic_ and `Code` text
+The corresponding TXT files (which are read by the factory) can be found in the [resource folder]() of the library.
 
-[Link](url) and ![Image](src)
-```
 
-For more details see [GitHub Flavored Markdown](https://guides.github.com/features/mastering-markdown/).
 
-### Jekyll Themes
 
-Your Pages site will use the layout and styles from the Jekyll theme you have selected in your [repository settings](https://github.com/mzattera/v4j/settings/pages). The name of this theme is saved in the Jekyll `_config.yml` configuration file.
 
-### Support or Contact
-
-Having trouble with Pages? Check out our [documentation](https://docs.github.com/categories/github-pages-basics/) or [contact support](https://support.github.com/contact) and we’ll help you sort it out.
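
Note 001 says the augmented file adds concordance and majority versions built from aligned transcriber lines, but it does not describe the rules used. The following is only a sketch of one plausible reading, assuming a character-by-character vote over equally long lines: the class `AlignedVersions`, its methods, and the `?` placeholder are hypothetical and are not taken from `BuildConcordanceVersion`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; NOT the v4j BuildConcordanceVersion implementation.
class AlignedVersions {

    // Placeholder for positions without agreement (an assumption of this sketch).
    static final char UNKNOWN = '?';

    // Concordance: keep a character only where every transcriber agrees.
    static String concordance(List<String> lines) {
        checkAligned(lines);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.get(0).length(); i++) {
            char c = lines.get(0).charAt(i);
            boolean same = true;
            for (String l : lines) {
                if (l.charAt(i) != c) {
                    same = false;
                    break;
                }
            }
            sb.append(same ? c : UNKNOWN);
        }
        return sb.toString();
    }

    // Majority: keep the character chosen by more than half of the transcribers.
    static String majority(List<String> lines) {
        checkAligned(lines);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.get(0).length(); i++) {
            Map<Character, Integer> votes = new HashMap<>();
            for (String l : lines) {
                votes.merge(l.charAt(i), 1, Integer::sum);
            }
            char best = UNKNOWN;
            for (Map.Entry<Character, Integer> e : votes.entrySet()) {
                // At most one character can hold a strict majority.
                if (2 * e.getValue() > lines.size()) {
                    best = e.getKey();
                }
            }
            sb.append(best);
        }
        return sb.toString();
    }

    // The vote only makes sense if all versions of a line have the same length.
    static void checkAligned(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no lines");
        }
        int len = lines.get(0).length();
        for (String l : lines) {
            if (l.length() != len) {
                throw new IllegalArgumentException("lines are not aligned");
            }
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("daiin.chedy", "daiin.shedy", "daiim.shedy");
        System.out.println(concordance(lines)); // daii?.?hedy
        System.out.println(majority(lines));    // daiin.shedy
    }
}
```

The length check reflects why the note insists on alignment: a positional vote is meaningless if one transcriber's line is shifted by even one character.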

docs/index.md

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,8 @@
 ## Welcome to GitHub Pages
 
-Hi, in these pages I store thoughts, working notes, rants and frustrations about the [Voynich manuscript](https://en.wikipedia.org/wiki/Voynich_manuscript).
+Hi, in these pages I store thoughts, working notes, rants and frustrations about the [Voynich manuscript](https://en.wikipedia.org/wiki/Voynich_manuscript)
+that result from my work with the [v4j library](https://github.com/mzattera/v4j).
 
 ### Working Notes
 
-[Note 001](./001)
+[Note 001 - The Text](./001)

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/support/BuildBibleTranscription.java

Lines changed: 0 additions & 69 deletions
This file was deleted.

eclipse/io.github.mattera.v4j/src/main/java/io/github/mattera/v4j/text/txt/BibleFactory.java

Lines changed: 12 additions & 12 deletions
@@ -3,16 +3,15 @@
  */
 package io.github.mattera.v4j.text.txt;
 
-import java.io.File;
 import java.io.IOException;
 import java.net.URISyntaxException;
-import java.net.URL;
 
 import io.github.mattera.v4j.text.alphabet.Alphabet;
+import io.github.mattera.v4j.util.FileUtil;
 
 /**
- * Factory class to read different language versions of the Bible. The
- * documents are read from TXT files in resource folder.
+ * Factory class to read different language versions of the Bible. The documents
+ * are read from TXT files in resource folder.
  *
  * @author Massimiliano "Maxi" Zattera
  *
@@ -22,22 +21,23 @@ public class BibleFactory {
     /**
      * Languages in which the bible is available.
      */
-    public static final String[] LANGUAGES = {"Italian", "Latin", "German", "French"};
-
+    public static final String[] LANGUAGES = { "Italian", "Latin", "German", "French" };
+
     /**
-     * Name of folder with transcriptions (inside resource folder)
+     * Name of folder with transcriptions (inside resource folder), including
+     * trailing separator
      */
-    public static final String TRANSCRIPTION_FOLDER = "Transcriptions\\Bible";
+    public static final String TRANSCRIPTION_FOLDER = "Transcriptions/Bible/";
 
     /**
      * Returns a version of the Bible in given language.
      *
      * @return given transcription type for the MZ transcription
-     * @throws IOException
-     * @throws URISyntaxException
+     * @throws IOException
+     * @throws URISyntaxException
      */
     public static TextFile getDocument(String language) throws IOException, URISyntaxException {
-        URL url = ClassLoader.getSystemResource(TRANSCRIPTION_FOLDER + "/" + language+".txt");
-        return new TextFile(new File(url.toURI()), Alphabet.UTF_16, "UTF-8");
+        return new TextFile(FileUtil.getResourceFile(TRANSCRIPTION_FOLDER + language + ".txt"), Alphabet.UTF_16,
+                "UTF-8");
     }
 }
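
The change of `TRANSCRIPTION_FOLDER` from `"Transcriptions\\Bible"` to `"Transcriptions/Bible/"` is worth a note: Java resource names use `/` as the separator on every platform, so a name containing backslashes is not portable. The sketch below uses only the JDK to illustrate this. The class `ResourcePathDemo` is hypothetical, and `FileUtil.getResourceFile` from the diff is not reproduced because its implementation is not shown here; the lookup relies on `java/lang/Object.class`, a resource every JDK provides.

```java
import java.net.URL;

// Hypothetical demo class; it only uses the standard ClassLoader API.
class ResourcePathDemo {

    // Looks up a resource by name through the system class loader.
    static URL find(String name) {
        return ClassLoader.getSystemResource(name);
    }

    public static void main(String[] args) {
        // '/' is the resource name separator on all platforms.
        System.out.println("forward slashes: " + find("java/lang/Object.class"));
        // A backslash-separated name is typically not found (prints null).
        System.out.println("backslashes:     " + find("java\\lang\\Object.class"));
    }
}
```

Keeping the trailing `/` in the constant, as the new Javadoc states, also lets callers append a file name directly, which is what the rewritten `getDocument()` does.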

BuildBibleTranscription.java (new file, package `io.github.mattera.v4j.applications`)

@@ -0,0 +1,66 @@
+/**
+ *
+ */
+package io.github.mattera.v4j.applications;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import io.github.mattera.v4j.util.FileUtil;
+
+/**
+ * This class takes the XML files for the Bible transcription in different
+ * languages and transforms them in .TXT files.
+ *
+ * Source files are taken from: http://christos-c.com/bible/
+ *
+ * @author Massimiliano "Maxi" Zattera
+ *
+ */
+public final class BuildBibleTranscription {
+
+
+    /// MAKE SURE THIS IS CORRECT BUT DO NOT USE RESOURCE FOLDER AS IT IS READ
+    /// ONLY
+    private final static String OUTPUT_FOLDER = "D:\\";
+
+    private final static Pattern VERSE_PATTERN = Pattern
+            .compile("<seg id=[\"'][^'\"]*[\"'] type=[\"']verse[\"']>([^<]+)</seg>");
+
+    /**
+     * @param args
+     */
+    public static void main(String[] args) {
+        try {
+            convert(FileUtil.getResourceFile("Transcriptions/Bible/French.xml"));
+        } catch (Exception e) {
+            e.printStackTrace();
+        } finally {
+            System.out.println("Completed.");
+        }
+    }
+
+    private static void convert(File f) throws IOException {
+
+        // read the text in the file
+        List<String> lines = FileUtil.read(f, "UTF-8");
+        String xml = String.join("\n", lines);
+
+        // find verses in the XML
+        Matcher m = VERSE_PATTERN.matcher(xml);
+        List<String> txt = new ArrayList<>();
+        while (m.find()) {
+            String s = m.group(1).trim();
+            if (s.length() > 0)
+                txt.add(s);
+        }
+
+        // write the result as a simple file
+        File out = new File (OUTPUT_FOLDER, f.getName().replace(".xml", ".txt"));
+        FileUtil.write(txt, out.getCanonicalPath(), "UTF-8");
+    }
+}
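
To see what `VERSE_PATTERN` captures without touching the file system, here is a minimal self-contained sketch that applies the same regular expression to an inline XML sample. The sample markup and the class name `VerseExtractionDemo` are made up for illustration; only the expression itself is copied from the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expression comes from the commit.
class VerseExtractionDemo {

    // Same expression as VERSE_PATTERN in BuildBibleTranscription.
    static final Pattern VERSE_PATTERN = Pattern
            .compile("<seg id=[\"'][^'\"]*[\"'] type=[\"']verse[\"']>([^<]+)</seg>");

    // Returns the trimmed, non-empty verse texts in document order.
    static List<String> extractVerses(String xml) {
        List<String> verses = new ArrayList<>();
        Matcher m = VERSE_PATTERN.matcher(xml);
        while (m.find()) {
            String s = m.group(1).trim();
            if (!s.isEmpty()) {
                verses.add(s);
            }
        }
        return verses;
    }

    public static void main(String[] args) {
        // Invented sample input, shaped like the elements the pattern expects.
        String xml = String.join("\n",
                "<div id=\"b.GEN.1\" type=\"chapter\">",
                "<seg id=\"b.GEN.1.1\" type=\"verse\">",
                "  In principio creavit Deus caelum et terram.",
                "</seg>",
                "<seg id=\"b.GEN.1.2\" type=\"verse\">   </seg>",
                "<seg id='b.GEN.1.3' type='verse'>Dixitque Deus: Fiat lux.</seg>",
                "</div>");
        // Prints the two non-empty verses, one per line.
        extractVerses(xml).forEach(System.out::println);
    }
}
```

Because the capture group is `[^<]+`, a verse spanning several lines is matched, but a verse containing nested markup is skipped entirely; that is acceptable only if the source XML keeps verse text free of child elements.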

eclipse/io.github.mzattera.v4j-apps/src/main/java/io/github/mattera/v4j/applications/BuildConcordanceVersion.java

Lines changed: 4 additions & 3 deletions
@@ -19,15 +19,16 @@
 import io.github.mattera.v4j.util.FileUtil;
 
 /**
- * Processes an interlinear transcription to create concordance and majority versions of it (added as new "artificial" transcribers).
+ * Processes an interlinear transcription to create concordance and majority
+ * versions of it (added as new "artificial" transcribers).
  *
  * STATUS: Working & with (some) test harness.
  *
  * @author Massimiliano "Maxi" Zattera
  */
-public class BuildConcordanceVersion {
+public final class BuildConcordanceVersion {
 
-    /// MAKE SURE THIS IS CORRECT BUT DO NOT USE RESOURCE FILES; AS THEY ARE READ
+    /// MAKE SURE THIS IS CORRECT BUT DO NOT USE RESOURCE FOLDER AS IT IS READ
     /// ONLY
     private final static String OUTPUT_FOLDER = "D:\\";
 