
The FAQ

Colm O'Gairbhith edited this page Apr 7, 2016 · 5 revisions

What's wrong with just using OCR?

  • OCR is great, but it's not perfect. We need perfect output, so we put the human back in the loop to guarantee quality.
    • This is an IoP (Internet of People, not Things) project.
  • OCR does not capture document structure well. Structure is the key to making information accessible, because it is what allows navigation.

You expect someone to give up their free time to help someone they don't even know?

  • Yes.
  • That's the way we are made, and even more so when helping is made easy.

Isn't someone already doing this?

  • Well, no.
  • There is the Scripto project, but it has the following flaws:
    • No OCR; it just makes image files available.
    • No EPUB/HTML formats; transcription is done in MediaWiki format.
    • No audio generation.
    • Top-down: users cannot upload documents unless they are willing to run their own server.
    • Delivered as plugins for WordPress/Drupal/Omeka.
      • Omeka example: there is no rich text, although some HTML tags appear to be allowed (e.g. <br>), and there is no WYSIWYG interface. Omeka uses OpenLayers (http://openlayers.org/) for image viewing.
    • Not a community service: it is unclear whether any user can upload content for transcription or whether this is an admin-only function.