Abstract
Post-processing is a crucial step in improving the output quality of the OCR process. In this paper, we present a novel approach that explores a modified way of generating and scoring candidates at both the character level and the word level. The resulting features are combined with important features suggested by related work and used to rank candidates in a regression model. The experimental results show that our approach achieves results comparable to those of the top-performing approaches in the ICDAR 2017 Post-OCR Text Correction competition.
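To make the generate-then-rank idea concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy lexicon with made-up counts, uses Levenshtein distance as the only character-level signal and unigram log-probability as the only word-level signal, and replaces the trained regression model with hand-set linear weights. All names (`LEXICON`, `WEIGHTS`, `generate_candidates`, `features`, `rank`) are hypothetical.

```python
import math

# Hypothetical toy lexicon with made-up unigram counts (not from the paper).
LEXICON = {"the": 500, "then": 80, "than": 60, "them": 70, "hen": 5}

# Assumed hand-set weights standing in for a trained regression model:
# [character-level similarity, word-level log-probability].
WEIGHTS = [10.0, 1.0]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def generate_candidates(token: str, max_dist: int = 2) -> list[str]:
    """Candidate generation: lexicon words within max_dist edits of the token."""
    return [w for w in LEXICON if edit_distance(token, w) <= max_dist]


def features(token: str, cand: str) -> list[float]:
    """One character-level and one word-level feature for a candidate."""
    char_sim = 1 - edit_distance(token, cand) / max(len(token), len(cand))
    word_logp = math.log(LEXICON[cand] / sum(LEXICON.values()))
    return [char_sim, word_logp]


def rank(token: str) -> list[str]:
    """Rank candidates by a linear combination of their features."""
    scored = [
        (sum(w * f for w, f in zip(WEIGHTS, features(token, c))), c)
        for c in generate_candidates(token)
    ]
    return [c for _, c in sorted(scored, reverse=True)]


if __name__ == "__main__":
    print(rank("tben"))  # "then" is ranked first for this toy lexicon
```

In a real system the weights would be learned from aligned OCR and ground-truth text rather than set by hand, and the feature vector would carry many more signals than the two shown here.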