The Ludovico Antonio Muratori (LAM) dataset is the largest line-level handwritten text recognition (HTR) dataset to date. It contains 25,823 lines from ancient Italian manuscripts edited by a single author over 60 years. The dataset is available on Kaggle and comes in two configurations: a basic split and a date-based split that takes the author's age into account. The first configuration is intended for studying HTR on ancient documents in Italian, while the second tests whether HTR systems can recognize text written by the same writer in time periods for which no training data are available.
More info on the dataset can be found on the website: https://aimagelab.ing.unimore.it/go/lam
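To illustrate the idea behind the date-based split, here is a minimal sketch that holds out whole time periods for testing. It runs on toy in-memory records: the field names (`year`, `text`), the sample values, the ten-year period length, and the `date_based_split` helper are all assumptions made for illustration, not the dataset's actual schema or official split.

```python
from collections import defaultdict

# Toy records standing in for line-level samples. The fields and values
# are hypothetical and do not reflect the real LAM file layout.
lines = [
    {"year": 1701, "text": "line a"},
    {"year": 1705, "text": "line b"},
    {"year": 1723, "text": "line c"},
    {"year": 1728, "text": "line d"},
    {"year": 1742, "text": "line e"},
]


def date_based_split(records, held_out_periods, period_length=10):
    """Send records whose period is held out to test, all others to train."""
    split = defaultdict(list)
    for rec in records:
        # Map the year to the start of its period, e.g. 1723 -> 1720.
        period = (rec["year"] // period_length) * period_length
        name = "test" if period in held_out_periods else "train"
        split[name].append(rec)
    return split["train"], split["test"]


# Hold out the 1720s: the model never sees handwriting from that decade.
train, test = date_based_split(lines, held_out_periods={1720})
```

Splitting by period rather than at random keeps every line from a held-out period out of training, which is what makes the second configuration a test of generalization across time.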
Please cite with the following BibTeX:
@inproceedings{cascianelli2022lam,
title={The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition},
author={Cascianelli, Silvia and Pippi, Vittorio and Maarand, Martin and Cornia, Marcella and Baraldi, Lorenzo and Kermorvant, Christopher and Cucchiara, Rita},
booktitle={International Conference on Pattern Recognition},
year={2022}
}