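VietOCR.NET is a character recognition program written in C#. First convert the JPG, GIF, or BMP images to TIFF; the built-in Paint program is enough for this. Then use VietOCR.NET-4.3 to merge the individual TIFF files into one multi-page TIFF. VietOCR.NET is an OCR-based application designed to help you convert typewritten or scanned images into editable text.
The main window consists of two panels. The left panel shows the image you are processing and is where you run recognition to extract text from it. The right panel shows the recognized text and is where you make any necessary corrections.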
Features
Java & .NET GUI frontends for Tesseract OCR engine
Supports all languages provided by Tesseract
Supports automatic download and installation of language packs
PDF, TIFF, JPEG, GIF, PNG, BMP image formats
Paste image from clipboard
Selection box for Region of Interest (ROI)
File drag-and-drop
Bulk & batch operations
Text replacement postprocessing
Integrated scanning support
Spellcheck with Hunspell
Make Box Files. Open a command prompt in the directory that contains orderNo.tif and enter:
C:\Program Files\Tesseract-OCR>tesseract.exe lang.jhy.exp8.TIF lang.jhy.exp8 batch.nochop makebox
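Open orderNo.tif in jTessBoxEditor. Keep in mind that the orderNo.box file generated in step 2 must be in the same directory as orderNo.tif. Correct the characters one by one, then save.
Download the jTessBoxEditor tool to correct each character (note: for a multi-page file, use next page to work through the corrections page by page).
Official description: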
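As a rough sketch of the box-file step, the Python helper below builds the same Tesseract command line as above and works out where the resulting .box file should sit so that jTessBoxEditor can find it next to the image. The function names build_makebox_command and expected_box_path are hypothetical, and the sketch assumes tesseract is on the PATH rather than run from its install directory.

```python
from pathlib import Path


def build_makebox_command(image_path, tesseract_exe="tesseract"):
    """Return the argument list for: tesseract <image> <base> batch.nochop makebox.

    The output base is the image name without its final extension,
    mirroring the command shown in the text. (Hypothetical helper.)
    """
    image = Path(image_path)
    output_base = image.with_suffix("")  # drops only the last suffix, e.g. ".TIF"
    return [tesseract_exe, str(image), str(output_base), "batch.nochop", "makebox"]


def expected_box_path(image_path):
    """Return the .box path that should sit beside the image. (Hypothetical helper.)"""
    return Path(image_path).with_suffix(".box")


# To actually run the step (assumes tesseract is installed and on PATH):
#   import subprocess
#   subprocess.run(build_makebox_command("orderNo.tif"), check=True)
#   assert expected_box_path("orderNo.tif").exists()
```

Checking expected_box_path(...).exists() before opening the image in jTessBoxEditor is a cheap way to catch the case where the box file ended up in a different directory.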
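To see how many characters need checking on each page before stepping through them in the editor, the box file can be summarized with a few lines of Python. This is a minimal sketch that assumes the usual Tesseract 3.x box layout, one entry per line in the form "symbol left bottom right top page" with a zero-based page number; parse_box_line and count_boxes_per_page are hypothetical names, not part of any tool.

```python
from collections import Counter


def parse_box_line(line):
    """Parse one box-file line, assumed to be 'symbol left bottom right top page'."""
    # Split from the right so the symbol field is kept whole.
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return {
        "symbol": symbol,
        "left": int(left),
        "bottom": int(bottom),
        "right": int(right),
        "top": int(top),
        "page": int(page),
    }


def count_boxes_per_page(lines):
    """Count box entries per page index, skipping blank lines."""
    return Counter(parse_box_line(line)["page"] for line in lines if line.strip())


# Example usage with a real file (the path is an assumption):
#   with open("orderNo.box", encoding="utf-8") as fh:
#       print(count_boxes_per_page(fh))
```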
PDF, TIFF, JPEG, GIF, PNG, BMP image formats
Multi-page TIFF images
Screenshots
Selection box
File drag-and-drop
Paste image from clipboard
Postprocessing for Vietnamese to boost accuracy rate
Vietnamese input methods
Localized user interface for many languages (Localization project)
Integrated scanning support
Watch folder monitor for support of batch processing
Custom text replacement in postprocessing
Spellcheck with Hunspell
Support for downloading and installing language data packs and appropriate spell dictionaries