
tips-OCR

!! Open source OCR

! google-tesseract

https://github.com/tesseract-ocr/tesseract

https://github.com/tesseract-ocr


! NHocr
https://ja.osdn.net/projects/nhocr/

!INSTALL
 zypper install autoconf automake libtool   # aclocal comes with automake, autoheader with autoconf
 zypper install gcc-c++ libjpeg8-devel libpng16-devel libtiff-devel
 zypper install libicu-devel pango-devel pangomm-devel
 zypper install leptonica-devel leptonica-tools
 # zypper install leptonica
 ## zypper install tesseract-ocr tesseract-ocr-devel
 zypper install libtesseract3 tesseract-ocr tesseract-ocr-traineddata-english tesseract-ocr-traineddata-japanese
 pip3.6 install pyocr   # for calling Tesseract from Python

 # install
 # (GitHub no longer serves the unauthenticated git:// protocol, so clone over https)
 $ git clone https://github.com/tesseract-ocr/tesseract.git
 $ git clone https://github.com/tesseract-ocr/langdata.git
 $ zypper install automake autoconf libtool
 
 $ cd tesseract
 $ git checkout 4.1
 # use the 4.1 branch (the latest is 5.0)
 $ git submodule update --init --recursive

 # autoconf (an error occurs, so re-run)
 $ autoconf
 $ autoreconf --install
 $ ./configure --prefix=$HOME/opt
 $ make
 $ make check
 $ make install
 $ make training
 $ make training-install

 # # wget  tesseract-4.0.0.tar.gz
 # # tar xvfz tesseract-4.0.0.tar.gz 
 # # cd tesseract-4.0.0
 # # ./autogen.sh
 # # ./configure --prefix=$HOME/opt/tess
 # # make
 # # make install
 # # make check
 
 
 CPPFLAGS="-I$HOME/opt/include -L$HOME/opt/lib64" pip3.7 install tesserocr

 # Install the trained language data
 # Japanese needs both jpn and jpn_vert
 
 cd $HOME/opt/share/tessdata
 wget https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
 wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn.traineddata
 wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn_vert.traineddata
 ## sudo mv *.traineddata /usr/local/share/tessdata/
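
Once the language data is in place, the installation can be exercised from Python through pyocr. The following is a minimal sketch, not a definitive setup: it assumes the tesseract binary built above is on PATH, that Pillow is installed alongside pyocr, and that the tessdata directory is the one used above. The function names and the image file name are hypothetical.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir, langs=("eng", "jpn", "jpn_vert")):
    """Return the language codes whose .traineddata file is absent."""
    base = Path(tessdata_dir)
    return [lang for lang in langs
            if not (base / f"{lang}.traineddata").is_file()]


def ocr_image(image_path, lang="jpn"):
    """Run Tesseract through pyocr and return the recognized text."""
    # Third-party imports stay local so the helper above works without them.
    import pyocr
    import pyocr.builders
    from PIL import Image

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("pyocr found no OCR tool; is tesseract on PATH?")
    tool = tools[0]
    # tesseract_layout is the page segmentation mode; 6 assumes one uniform text block.
    builder = pyocr.builders.TextBuilder(tesseract_layout=6)
    return tool.image_to_string(Image.open(image_path), lang=lang, builder=builder)
```

For example, missing_traineddata(Path.home() / "opt/share/tessdata") should return an empty list after the three downloads above, and ocr_image("sample.png") would then return the text of a (hypothetical) sample image.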

! Tune
*https://laplace-daemon.com/training-tesseract/
*https://qiita.com/aki_abekawa/items/418e069038fbdb77c59e

!ToDo

*https://qiita.com/atuyosi/items/c0933b5edf605c4a7c19
*https://qiita.com/bohemian916/items/67f22ee7aeac103dd205

*==http://hadashi-gensan.hatenablog.com/entry/2013/10/14/170129==
*==http://hadashi-gensan.hatenablog.com/entry/2014/01/15/135316==

*http://a244.hateblo.jp/entry/2015/07/28/060803
*https://ebi-works.com/ocr-python/
*https://www.kkaneko.jp/tools/ubuntu/tesseract_buildout.html


! Preprocessing
 Binarization
*https://kakasi.hatenablog.com/entry/2019/12/20/184812

*https://kakasi.hatenablog.com/entry/2020/03/02/151053

*https://algorithm.joho.info/programming/python/opencv-otsu-thresholding-py/
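
As a rough illustration of what Otsu binarization does, here is a dependency-free sketch. It assumes the image has already been flattened to a sequence of 8-bit grayscale values; the function names are hypothetical. In practice OpenCV's cv2.threshold with cv2.THRESH_BINARY + cv2.THRESH_OTSU would normally be used instead.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance."""
    pixels = list(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the dark class yet
        if w1 == 0:
            break     # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to 255 and the rest to 0."""
    return [255 if p > threshold else 0 for p in pixels]
```

Values strictly above the threshold become white, which matches the convention of OpenCV's THRESH_BINARY.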

 Increasing the resolution
*https://tech-blog.optim.co.jp/entry/2021/02/24/100000

 Correcting the skew of text lines
*https://base64.work/so/python/2937452
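
One simple way to estimate the skew, shown here only as a sketch and not necessarily the method used in the link above, is to fit a straight line through points on a text baseline and take its angle. The code assumes such (x, y) points, for example the bottom centers of character bounding boxes, have already been extracted; the function name is hypothetical.

```python
import math


def estimate_skew_degrees(points):
    """Least-squares slope of (x, y) baseline points, as an angle in degrees."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points are vertically aligned; slope is undefined")
    return math.degrees(math.atan(sxy / sxx))
```

The image would then be rotated by the opposite angle. Because image coordinates usually have y pointing down, the sign convention should be checked against the rotation routine actually used.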

 Split Bregman (denoising)
*https://lp-tech.net/articles/CY2Kn
*https://lp-tech.net/articles/tkPFr
*https://qiita.com/MuAuan/items/3962e24ece1860759429


! Notes
*https://qiita.com/henjiganai/items/f99cdd541dacf6328d07


! Google Cloud Vision API
tips-GoogleAPI 

 pip3.7 install --upgrade google-cloud-vision 
 pip3.7 install --upgrade pymupdf

*https://valmore.work/cloud-vision-api-ocr/
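
A minimal sketch of calling the API with the google-cloud-vision package installed above. It assumes a 2.x or later client library and that the GOOGLE_APPLICATION_CREDENTIALS environment variable points at a service-account JSON key; the function name is hypothetical.

```python
def detect_document_text(image_path):
    """Send one image to Cloud Vision and return the full recognized text."""
    # Imported lazily so this module loads even before the package is installed.
    from google.cloud import vision

    # The client picks up credentials from GOOGLE_APPLICATION_CREDENTIALS.
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())

    # document_text_detection targets dense text such as scanned pages.
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text
```

For PDF input, each page would first have to be rendered to an image (for instance with pymupdf, installed above) before being passed to this function.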

!self 
*https://qiita.com/tanreinama/items/8fc1c8af6554654aae00

 Creating a service account (authentication key)
*https://cloud.google.com/docs/authentication/getting-started?hl=ja
*https://cloud.google.com/docs/authentication/production
