Example project that parses Project Gutenberg books with Voikko to generate classified Finnish word lists. Idea blatantly stolen from Duukkis.

tuminoid/gutenberg-voikko-analyser

Generate Finnish wordlists from books on Gutenberg

Idea blatantly stolen from Duukkis, who publishes the finished word lists. Unless you really want to do the parsing yourself, please use those files directly.

Usage

$ make

It takes about two hours. Downloading 250 MB of books is slow, and Voikko takes its time processing roughly 500 MB of Finnish text.

  • data/books/ will contain downloaded books
  • data/lists/ will contain classified word lists when all is complete
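
The classification step that produces data/lists/ can be pictured roughly as follows. This is a minimal sketch, not the project's actual implementation: it assumes an analyzer that behaves like analyze() in the libvoikko Python binding, which returns a list of dictionaries carrying BASEFORM and CLASS keys. The names tokenize, classify_words and Analyzer are hypothetical, and the toy analyzer in the demo is made up for illustration.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Set

# Hypothetical analyzer type: word -> list of analyses, shaped like the
# output of libvoikko's Voikko.analyze() (dicts with "BASEFORM" and "CLASS").
# This is an assumption for illustration, not the project's own code.
Analyzer = Callable[[str], List[Dict[str, str]]]

# Runs of Latin letters, including the Finnish å, ä and ö,
# optionally joined by hyphens (e.g. "EU-maa").
WORD_RE = re.compile(r"[A-Za-zÅÄÖåäö]+(?:-[A-Za-zÅÄÖåäö]+)*")


def tokenize(text: str) -> Iterator[str]:
    """Yield candidate words from raw book text, keeping original case."""
    for match in WORD_RE.finditer(text):
        yield match.group(0)


def classify_words(text: str, analyze: Analyzer) -> Dict[str, List[str]]:
    """Group the base forms of all recognised words by word class."""
    lists: Dict[str, Set[str]] = defaultdict(set)
    cache: Dict[str, List[Dict[str, str]]] = {}
    for word in tokenize(text):
        # Call the analyzer only once per distinct surface form.
        if word not in cache:
            cache[word] = analyze(word)
        # Words with no analysis (empty list) are silently skipped.
        for analysis in cache[word]:
            word_class = analysis.get("CLASS")
            if word_class:
                lists[word_class].add(analysis.get("BASEFORM", word))
    # Plain code-point sort, not Finnish collation.
    return {cls: sorted(words) for cls, words in lists.items()}


if __name__ == "__main__":
    # Toy stand-in for Voikko with made-up entries, for illustration only.
    toy = {
        "kissa": [{"BASEFORM": "kissa", "CLASS": "nimisana"}],
        "kissat": [{"BASEFORM": "kissa", "CLASS": "nimisana"}],
        "juoksee": [{"BASEFORM": "juosta", "CLASS": "teonsana"}],
    }
    result = classify_words("Kissa juoksee. Kissat juoksee!",
                            lambda w: toy.get(w.lower(), []))
    print(result)  # {'nimisana': ['kissa'], 'teonsana': ['juosta']}
```

With libvoikko installed, the analyze method of a Voikko("fi") instance could be passed in place of the toy lambda. Caching per surface form is a simple way to avoid re-analysing frequent words, which matters when the input is hundreds of megabytes of text.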
