
tips-OCR

  Open source OCR

google-tesseract

https://github.com/tesseract-ocr/tesseract

https://github.com/tesseract-ocr

NHocr

https://ja.osdn.net/projects/nhocr/

INSTALL

# aclocal and autoheader are not separate packages; they ship with automake and autoconf
zypper install autoconf automake libtool
zypper install gcc-c++ libjpeg8-devel libpng16-devel libtiff-devel
zypper install libicu-devel pango-devel pangomm-devel
zypper install leptonica-devel leptonica-tools
# zypper install leptonica
## zypper install tesseract-ocr tesseract-ocr-devel
zypper install libtesseract3 tesseract-ocr tesseract-ocr-traineddata-english tesseract-ocr-traineddata-japanese
pip3.6 install pyocr   # needed to call Tesseract from Python
# install 
$ git clone https://github.com/tesseract-ocr/tesseract.git
$ git clone https://github.com/tesseract-ocr/langdata.git
zypper install automake autoconf libtool

# cd tesseract
$ git checkout 4.1
# Use version 4.1 (the latest is 5.0)
$ git submodule update --init --recursive
# autoconf (an error occurs, so run it again)
$ autoconf
# autoreconf --install
# ./configure --prefix=$HOME/opt
# make
# make check
# make install
# make training
# make training-install
# # wget  tesseract-4.0.0.tar.gz
# # tar xvfz tesseract-4.0.0.tar.gz 
# # cd tesseract-4.0.0
# # ./autogen.sh
# # ./configure --prefix=$HOME/opt/tess
# # make
# # make install
# # make check


CPPFLAGS="-I$HOME/opt/include -L$HOME/opt/lib64" pip3.7 install tesserocr
# Install the trained language data
# For Japanese, both jpn and jpn_vert are required

cd $HOME/opt/share/tessdata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn.traineddata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn_vert.traineddata
## sudo mv *.traineddata /usr/local/share/tessdata/
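
With the trained data in place, Tesseract can be driven from Python. The sketch below is one minimal way to do it, shelling out to the `tesseract` command instead of going through pyocr or tesserocr. It assumes the binary is on `PATH` and that the data files sit in `$HOME/opt/share/tessdata`. The function names `build_tesseract_command` and `run_ocr`, and the default page segmentation mode, are illustrative choices and not part of these notes.

```python
import subprocess
from pathlib import Path

# Assumption: traineddata files were downloaded to $HOME/opt/share/tessdata.
TESSDATA_DIR = Path.home() / "opt" / "share" / "tessdata"


def build_tesseract_command(image_path, lang="jpn", tessdata_dir=TESSDATA_DIR, psm=6):
    """Build the argument list for one tesseract CLI call (hypothetical helper).

    "stdout" as the output base makes tesseract print the recognized text
    instead of writing a .txt file.
    """
    return [
        "tesseract", str(image_path), "stdout",
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),
    ]


def run_ocr(image_path, **kwargs):
    """Run tesseract on one image and return the recognized text."""
    cmd = build_tesseract_command(image_path, **kwargs)
    # check=True raises CalledProcessError if tesseract exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_ocr("page.png", lang="jpn")` would return the text of a hypothetical `page.png`; `lang="jpn_vert"` is the model to try for vertical text.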

Tune

ToDo

Preprocessing

Binarization
Increase the resolution
Correct the skew of text lines
Split Bregman (denoising)
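
The first two preprocessing items (binarization and raising the resolution) can be illustrated with a small pure-Python sketch. It assumes an 8-bit grayscale image held as a list of rows. Otsu's method is used for the threshold and nearest-neighbour replication for the upscaling; both are stand-ins chosen for brevity, since these notes do not name a specific method. Skew correction and Split Bregman denoising are not covered here. All function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map every pixel to 0 or 255 using the Otsu threshold of the image."""
    t = otsu_threshold(p for row in image for p in row)
    return [[255 if p > t else 0 for p in row] for row in image]


def upscale_nearest(image, factor):
    """Enlarge the image by an integer factor by repeating each pixel."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

In practice an image library with proper interpolation would likely do the upscaling; the replication above only shows where the step sits in the pipeline.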

Notes

Google Cloud Vision API

tips-GoogleAPI

pip3.7 install --upgrade google-cloud-vision
pip3.7 install --upgrade pymupdf
Create a service account (authentication key)
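
The two packages and the service-account key fit together roughly as follows. This is a sketch under assumptions, not a tested recipe: it assumes the key's JSON file path is exported in `GOOGLE_APPLICATION_CREDENTIALS`, a google-cloud-vision release that provides `vision.Image`, and a PyMuPDF release whose `get_pixmap` accepts `dpi`. The function names are hypothetical. The third-party imports are placed inside the functions so the file loads even where the packages are missing.

```python
import os
from pathlib import Path


def credentials_configured(env=os.environ):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(key_path) and Path(key_path).is_file()


def pdf_to_png_pages(pdf_path, dpi=300):
    """Render each PDF page to PNG bytes with PyMuPDF."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return [page.get_pixmap(dpi=dpi).tobytes("png") for page in doc]


def ocr_png(png_bytes, client=None):
    """Send one PNG image to Cloud Vision and return the detected text."""
    from google.cloud import vision

    # The client reads the service-account key named by
    # GOOGLE_APPLICATION_CREDENTIALS.
    client = client or vision.ImageAnnotatorClient()
    response = client.document_text_detection(image=vision.Image(content=png_bytes))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text
```

A typical flow would check `credentials_configured()` first, then call `ocr_png` on each element returned by `pdf_to_png_pages`.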
