!! Open source OCR
! google-tesseract
https://github.com/tesseract-ocr/tesseract
https://github.com/tesseract-ocr
! NHocr
https://ja.osdn.net/projects/nhocr/
!INSTALL
zypper install autoconf automake libtool
zypper install gcc-c++ libjpeg8-devel libpng16-devel libtiff-devel autoconf
zypper install libicu-devel pango-devel pangomm-devel
zypper install leptonica-devel leptonica-tools
# zypper install leptonica
## zypper install tesseract-ocr tesseract-ocr-devel
zypper install libtesseract3 tesseract-ocr tesseract-ocr-traineddata-english tesseract-ocr-traineddata-japanese
pip3.6 install pyocr :: for calling Tesseract from Python;;
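Once pyocr is installed, a call from Python might look like the sketch below. It is only an illustration: `choose_languages` and `ocr_image` are hypothetical helper names, and it assumes Pillow is installed next to pyocr and that the `tesseract` binary is on `PATH`.

```python
def choose_languages(available, wanted=("jpn", "eng")):
    """Build a Tesseract language string such as 'jpn+eng',
    keeping only the wanted languages that are actually installed."""
    picked = [lang for lang in wanted if lang in available]
    if not picked:
        raise RuntimeError(f"none of {wanted} found in {sorted(available)}")
    return "+".join(picked)


def ocr_image(path, wanted=("jpn", "eng")):
    """Run OCR on one image file with the first tool pyocr detects."""
    # Imported lazily so choose_languages() works without pyocr or Pillow.
    import pyocr
    import pyocr.builders
    from PIL import Image

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("pyocr found no OCR tool; is tesseract on PATH?")
    tool = tools[0]
    lang = choose_languages(tool.get_available_languages(), wanted)
    # tesseract_layout=6 maps to --psm 6 (one uniform block of text).
    builder = pyocr.builders.TextBuilder(tesseract_layout=6)
    return tool.image_to_string(Image.open(path), lang=lang, builder=builder)
```

For example, `print(ocr_image("page.png"))`, where `page.png` is a placeholder file name.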
# install
$ git clone https://github.com/tesseract-ocr/tesseract.git
$ git clone https://github.com/tesseract-ocr/langdata.git
zypper install automake autoconf libtool
# cd tesseract
$ git checkout 4.1
# Use version 4.1 (the latest is 5.0)
$ git submodule update --init --recursive
# autoconf (it fails with an error, so run it again)
$ autoconf
# autoreconf --install
# ./configure --prefix=$HOME/opt
# make
# make check
# make install
# make training
# make training-install
# # wget tesseract-4.0.0.tar.gz
# # tar xvfz tesseract-4.0.0.tar.gz
# # cd tesseract-4.0.0
# # ./autogen.sh
# # ./configure --prefix=$HOME/opt/tess
# # make
# # make install
# # make check
CPPFLAGS="-I$HOME/opt/include -L$HOME/opt/lib64" pip3.7 install tesserocr
# Install the trained language data
# Japanese needs two files: jpn and jpn_vert
cd $HOME/opt/share/tessdata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn.traineddata
wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn_vert.traineddata
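The same download step can be scripted so that only missing files are fetched. A minimal sketch using the standard library only; the function names are hypothetical, and the base URL is simply the one from the wget commands above.

```python
from pathlib import Path

# Base URL copied from the wget commands in these notes.
BASE_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/master"


def traineddata_url(lang):
    """URL of the .traineddata file for one language code."""
    return f"{BASE_URL}/{lang}.traineddata"


def missing_languages(tessdata_dir, langs=("eng", "jpn", "jpn_vert")):
    """List the language codes whose .traineddata file is not in tessdata_dir."""
    tessdata_dir = Path(tessdata_dir)
    return [lang for lang in langs
            if not (tessdata_dir / f"{lang}.traineddata").is_file()]


def fetch_missing(tessdata_dir, langs=("eng", "jpn", "jpn_vert")):
    """Download every missing .traineddata file into tessdata_dir."""
    from urllib.request import urlretrieve  # network access happens only here

    for lang in missing_languages(tessdata_dir, langs):
        target = Path(tessdata_dir) / f"{lang}.traineddata"
        urlretrieve(traineddata_url(lang), str(target))
```

With the prefix used in these notes, the call would be `fetch_missing(Path.home() / "opt/share/tessdata")`.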
## sudo mv *.traineddata /usr/local/share/tessdata/
! Tune
*https://laplace-daemon.com/training-tesseract/
*https://qiita.com/aki_abekawa/items/418e069038fbdb77c59e
! ToDo
*https://qiita.com/atuyosi/items/c0933b5edf605c4a7c19
*https://qiita.com/bohemian916/items/67f22ee7aeac103dd205
*==http://hadashi-gensan.hatenablog.com/entry/2013/10/14/170129==
*==http://hadashi-gensan.hatenablog.com/entry/2014/01/15/135316==
*http://a244.hateblo.jp/entry/2015/07/28/060803
*https://ebi-works.com/ocr-python/
*https://www.kkaneko.jp/tools/ubuntu/tesseract_buildout.html
! Preprocessing
Binarization
*https://kakasi.hatenablog.com/entry/2019/12/20/184812
*https://kakasi.hatenablog.com/entry/2020/03/02/151053
*https://algorithm.joho.info/programming/python/opencv-otsu-thresholding-py/
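For reference, Otsu's method picks the threshold that maximizes the between-class variance of the grayscale histogram. The sketch below is a plain-Python illustration of that idea, not the OpenCV code from the linked pages; it assumes 8-bit pixel values (0 to 255), and the function names are made up for this note.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.
    Pixels <= t form one class, pixels > t the other."""
    pixels = list(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    weight0 = 0      # number of pixels in the low class
    sum0 = 0         # sum of pixel values in the low class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue             # low class still empty
        if weight1 == 0:
            break                # high class empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance, up to the constant factor 1 / total**2.
        var = weight0 * weight1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows):
    """Binarize a 2D list of gray values: > threshold becomes 255, else 0."""
    t = otsu_threshold(p for row in rows for p in row)
    return [[255 if p > t else 0 for p in row] for row in rows]
```

In practice a vectorized library routine is much faster; this version only shows what is being computed.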
Increasing resolution
*https://tech-blog.optim.co.jp/entry/2021/02/24/100000
Deskewing text lines
*https://base64.work/so/python/2937452
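One common way to estimate the skew is the projection-profile method: try several candidate angles and keep the one for which the foreground pixels pile up into the fewest rows. The sketch below illustrates that idea in plain Python and is not necessarily what the linked page does. It assumes a binarized image given as a 2D list where nonzero means text, and `estimate_skew` is a hypothetical name.

```python
import math


def estimate_skew(binary, angles):
    """Return the candidate angle (degrees) that best aligns text rows.

    The angle is measured in image coordinates (x right, y down): a
    baseline running along y = y0 + x * tan(angle) yields that angle.
    """
    points = [(x, y)
              for y, row in enumerate(binary)
              for x, value in enumerate(row) if value]
    best_angle, best_score = None, -1
    for angle in angles:
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        profile = {}
        for x, y in points:
            # Row index of the pixel after undoing a skew of `angle`.
            r = round(y * cos_t - x * sin_t)
            profile[r] = profile.get(r, 0) + 1
        # A sharp profile (few, heavily filled rows) gives a high score.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

To correct the image, rotate it by the estimated angle in the opposite sense. Rotation sign conventions differ between libraries, so check the result on a test image first.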
Split Bregman (denoising)
*https://lp-tech.net/articles/CY2Kn
*https://lp-tech.net/articles/tkPFr
*https://qiita.com/MuAuan/items/3962e24ece1860759429
! Notes
*https://qiita.com/henjiganai/items/f99cdd541dacf6328d07
! Google Cloud Vision API
tips-GoogleAPI
pip3.7 install --upgrade google-cloud-vision
pip3.7 install --upgrade pymupdf
*https://valmore.work/cloud-vision-api-ocr/
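With both packages installed, OCR of a PDF through the Vision API could be sketched as follows: PyMuPDF renders each page to PNG and each PNG is sent to `document_text_detection`. The helper names `extract_text` and `ocr_pdf` are hypothetical, the 300 dpi default is an assumption, and working credentials are taken for granted.

```python
def extract_text(response):
    """Return the full text of a document_text_detection response,
    or raise if the API reported an error for this image."""
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def ocr_pdf(pdf_path, dpi=300):
    """OCR every page of a PDF and return one text string per page."""
    # Imported lazily so extract_text() can be used without these packages.
    import fitz  # PyMuPDF
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    texts = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            # get_pixmap(dpi=...) needs a reasonably recent PyMuPDF.
            png = page.get_pixmap(dpi=dpi).tobytes("png")
            image = vision.Image(content=png)
            response = client.document_text_detection(image=image)
            texts.append(extract_text(response))
    finally:
        doc.close()
    return texts
```

Each page is one API request, so large PDFs count against the quota page by page.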
!self
*https://qiita.com/tanreinama/items/8fc1c8af6554654aae00
Creating a service account (authentication key)
*https://cloud.google.com/docs/authentication/getting-started?hl=ja
*https://cloud.google.com/docs/authentication/production
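The client libraries look for the key file through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. A small sanity check before the first API call might look like this; the function name is hypothetical and the sketch only inspects the file, it does not contact Google.

```python
import json
import os
from pathlib import Path


def check_service_account_key(env=None):
    """Check that GOOGLE_APPLICATION_CREDENTIALS points at a service
    account key file and return the account's client_email."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"key file not found: {key_file}")
    info = json.loads(key_file.read_text(encoding="utf-8"))
    # Downloaded service account keys carry "type": "service_account".
    if info.get("type") != "service_account":
        raise ValueError(f"{key_file} is not a service account key")
    return info.get("client_email")
```

Calling `check_service_account_key()` with no argument reads the real environment; passing a dict makes it easy to try out.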