!! Open source OCR
! google-tesseract
https://github.com/tesseract-ocr/tesseract
https://github.com/tesseract-ocr
! NHocr
https://ja.osdn.net/projects/nhocr/
!INSTALL
 zypper install autoconf aclocal libtool autoheader automake
 zypper install gcc-c++ libjpeg8-devel libpng16-devel libtiff-devel autoconf
 zypper install libicu-devel pango-devel pangomm-devel
 zypper install leptonica-devel leptonica-tools
 # zypper install leptonica
 ## zypper install tesseract-ocr tesseract-ocr-devel
 zypper install libtesseract3 tesseract-ocr tesseract-ocr-traineddata-english tesseract-ocr-traineddata-japanese
 pip3.6 install pyocr    # for calling Tesseract from Python

 # install
 $ git clone https://github.com/tesseract-ocr/tesseract.git
 $ git clone https://github.com/tesseract-ocr/langdata.git
 zypper install automake autoconf libtool
 # cd tesseract
 $ git checkout 4.1    # use version 4.1 (the latest is 5.0)
 $ git submodule update --init --recursive
 # autoconf    (this fails with an error, so run it again)
 $ autoconf
 # autoreconf --install
 # ./configure --prefix=$HOME/opt
 # make
 # make check
 # make install
 # make training
 # make training-install
 # # wget tesseract-4.0.0.tar.gz
 # # tar xvfz tesseract-4.0.0.tar.gz
 # # cd tesseract-4.0.0
 # # ./autogen.sh
 # # ./configure --prefix=$HOME/opt/tess
 # # make
 # # make install
 # # make check
 CPPFLAGS="-I$HOME/opt/include -L$HOME/opt/lib64" pip3.7 install tesserocr

 # Install the trained language data
 # Japanese requires both jpn and jpn_vert
 cd $HOME/opt/share/tessdata
 wget https://github.com/tesseract-ocr/tessdata_best/raw/master/eng.traineddata
 wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn.traineddata
 wget https://github.com/tesseract-ocr/tessdata_best/raw/master/jpn_vert.traineddata
 ## sudo mv *.traineddata /usr/local/share/tessdata/
! Tune
*https://laplace-daemon.com/training-tesseract/
*https://qiita.com/aki_abekawa/items/418e069038fbdb77c59e
!ToDO
*https://qiita.com/atuyosi/items/c0933b5edf605c4a7c19
*https://qiita.com/bohemian916/items/67f22ee7aeac103dd205
*==http://hadashi-gensan.hatenablog.com/entry/2013/10/14/170129==
*==http://hadashi-gensan.hatenablog.com/entry/2014/01/15/135316==
*http://a244.hateblo.jp/entry/2015/07/28/060803
*https://ebi-works.com/ocr-python/
*https://www.kkaneko.jp/tools/ubuntu/tesseract_buildout.html
! Preprocessing
Binarization
*https://kakasi.hatenablog.com/entry/2019/12/20/184812
*https://kakasi.hatenablog.com/entry/2020/03/02/151053
*https://algorithm.joho.info/programming/python/opencv-otsu-thresholding-py/
Increasing the resolution
*https://tech-blog.optim.co.jp/entry/2021/02/24/100000
Text skew correction
*https://base64.work/so/python/2937452
Split Bregman (noise removal)
*https://lp-tech.net/articles/CY2Kn
*https://lp-tech.net/articles/tkPFr
*https://qiita.com/MuAuan/items/3962e24ece1860759429
! Notes
*https://qiita.com/henjiganai/items/f99cdd541dacf6328d07
! Google Cloud Vision API
tips-GoogleAPI
 pip3.7 install --upgrade google-cloud-vision
 pip3.7 install --upgrade pymupdf
*https://valmore.work/cloud-vision-api-ocr/
Creating a service account (authentication key)
*https://cloud.google.com/docs/authentication/getting-started?hl=ja
*https://cloud.google.com/docs/authentication/production
!self
*https://qiita.com/tanreinama/items/8fc1c8af6554654aae00
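! Preprocessing sketch: Otsu binarization
The binarization links in the preprocessing notes cover Otsu thresholding with OpenCV. As a rough illustration of what that step computes before an image is handed to Tesseract, here is a minimal pure-Python sketch. The function names otsu_threshold and binarize are hypothetical helpers invented for this note, and the input is assumed to be a flat sequence of 8-bit grayscale values (0 to 255). In a real pipeline, cv2.threshold with the THRESH_OTSU flag would normally do this work.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    Hypothetical helper for illustration only.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")

    # Histogram of the 256 possible gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t = 0
    best_var = -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class

    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue        # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break           # upper class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map every pixel above the threshold to 255 and the rest to 0."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "image": dark text pixels (10) on a bright background (200).
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("white pixels:", binarize(sample, t).count(255))
```

The loop keeps running sums so the histogram is scanned only once. For a two-dimensional image, flatten the rows into one sequence first and reshape the binarized result afterwards.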