2018-12-20: Text Recognition Demo with Tesseract (Google) in Python


Installation (macOS)

brew install tesseract
brew list tesseract

This installs tesseract and then lists the installed files:

/usr/local/Cellar/tesseract/4.0.0/bin/tesseract
/usr/local/Cellar/tesseract/4.0.0/include/tesseract/ (20 files)
/usr/local/Cellar/tesseract/4.0.0/lib/libtesseract.4.dylib
/usr/local/Cellar/tesseract/4.0.0/lib/pkgconfig/tesseract.pc
/usr/local/Cellar/tesseract/4.0.0/lib/ (2 other files)
/usr/local/Cellar/tesseract/4.0.0/share/tessdata/ (32 files)
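
The Cellar path above contains the version number, so it changes whenever Homebrew upgrades tesseract. As a minimal sketch, the binary could be located at runtime instead. The helper name find_tesseract and the fallback list are assumptions made for illustration; they are not part of the original demo.

```python
import os
import shutil

# Hypothetical fallback locations; adjust them to your own machine.
FALLBACK_PATHS = [
    "/usr/local/bin/tesseract",     # Homebrew prefix on Intel Macs
    "/opt/homebrew/bin/tesseract",  # Homebrew prefix on Apple Silicon Macs
]


def find_tesseract(fallbacks=FALLBACK_PATHS):
    """Return a path to the tesseract binary, or None if none is found."""
    # Search the directories listed in PATH first.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try each fallback path that is an executable file.
    for path in fallbacks:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```

The returned value, when it is not None, could be assigned to pytesseract.pytesseract.tesseract_cmd in place of a hard-coded path.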

Download the language packs from https://github.com/tesseract-ocr/tessdata. For Simplified Chinese, for example, these are the files chi_sim_vert.traineddata and chi_sim.traineddata. Copy them into the /usr/local/Cellar/tesseract/4.0.0/share/tessdata/ directory.
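
Before running OCR, it can help to confirm that the language files really landed in the tessdata directory. The following is a small sketch using only the standard library; the function name missing_traineddata is hypothetical.

```python
import os


def missing_traineddata(tessdata_dir, langs):
    """Return the languages whose <lang>.traineddata file is absent."""
    return [
        lang
        for lang in langs
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]
```

For the setup described here, a call such as missing_traineddata("/usr/local/Cellar/tesseract/4.0.0/share/tessdata", ["chi_sim", "chi_sim_vert"]) should return an empty list once both files are copied.
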
Then install pytesseract:

Installing via pip:
Check the pytesseract package page for more information.
$ (env)> pip install pytesseract
Or if you have git installed:
$ (env)> pip install -U git+https://github.com/madmaze/pytesseract.git
Installing from source:
$> git clone https://github.com/madmaze/pytesseract.git
$ (env)> cd pytesseract && pip install -U .
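
To check that the package is visible to the active environment after any of the install routes above, one option is to query the installed version. This is a sketch that assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pytesseract is installed, otherwise None.
print(installed_version("pytesseract"))
```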

Run

python pytesseractDemo.py

Code of pytesseractDemo.py

# -*- coding: utf-8 -*-

# @Author : github.com/foolsparadise
# @Time : 2018-12-20 18:18:19
# @desc :
# @usage : python pytesseractDemo.py

from PIL import Image
import os
import pytesseract
# tesseract/4.0.0
# https://github.com/tesseract-ocr/tesseract/wiki
# https://github.com/tesseract-ocr/tessdata
# https://github.com/madmaze/pytesseract

# Image
img = Image.open("./screenshot.png")
# Region to recognize: (left, upper, right, lower)
question = img.crop((0, 0, 600, 400))
# Path to the tesseract binary
pytesseract.pytesseract.tesseract_cmd = '/usr/local/Cellar/tesseract/4.0.0/bin/tesseract'
# Language data directory
tessdata_dir_config = '--tessdata-dir "/usr/local/Cellar/tesseract/4.0.0/share/tessdata"'
# lang selects Simplified Chinese
text = pytesseract.image_to_string(question, lang='chi_sim', config=tessdata_dir_config)
# Remove spaces and drop the first two characters
text = text.replace(" ", "")[2:]
# Print the result
print(text)
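
The script hard-codes both the crop box and the number of leading characters to drop, which fits one particular screenshot. As a hedged sketch, these two steps could be pulled into small helpers so they can be adjusted per image. The names clamp_box and clean_text are hypothetical and do not appear in the original project.

```python
def clamp_box(box, size):
    """Clip a (left, upper, right, lower) box to an image of (width, height)."""
    left, upper, right, lower = box
    width, height = size
    left, right = max(0, left), min(width, right)
    upper, lower = max(0, upper), min(height, lower)
    if left >= right or upper >= lower:
        raise ValueError("crop box lies outside the image")
    return (left, upper, right, lower)


def clean_text(text, skip=0):
    """Remove space characters, then drop the first `skip` characters."""
    return text.replace(" ", "")[skip:]
```

With Pillow, where img.size is (width, height), the crop could then be written as img.crop(clamp_box((0, 0, 600, 400), img.size)), and the cleanup step as clean_text(text, skip=2).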

License: MIT

The code is in my GitHub project:
https://github.com/foolsparadise/pytesseractDemo.python