Image Recognition for Xiamen Real Estate Online Contract Filings, Part 1

Time: 2020-10-29 08:05:03

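This post was written on February 11, Lunar New Year's Eve of the Year of the Rat. I wish everyone who reads it good health and all the best!

I had already worked out how to download the images from the Xiamen real estate online contract filing site, so the next step is to recognize the content of those images. Many articles online use pytesseract for image recognition, but after trying it I found that its recognition of Chinese text was poor. Having no better option, I looked elsewhere and found that Baidu's OCR works reasonably well. Its free quota of 5,000 calls per day is more than enough for an ordinary individual user. For how to use Baidu OCR, see this article: https://zoutao./article/details/86705491

The recognition steps follow. This post will keep being updated until everything I set out to do has been implemented.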

Step 1: Recognize the image content and write it to a CSV file

    # Using Baidu OCR
    from aip import AipOcr
    import os
    import pandas as pd
    from datetime import date

    """ API credentials """
    APP_ID = '23657473'
    API_KEY = 'WG43q2kD6vDUAjkGAse3Ei6y'
    SECRET_KEY = 'IMATPqqUmSrmYvMVrwEP1siXjUvHqf44'

    # Initialize the AipOcr client
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)


    def get_file_content(filePath):
        """ Read the image file as bytes """
        with open(filePath, 'rb') as fp:
            return fp.read()


    def img_to_str(image_path):
        """ Optional parameters """
        options = {}
        options["language_type"] = "CHN_ENG"  # mixed Chinese and English
        options["detect_direction"] = "true"  # detect text orientation
        options["detect_language"] = "true"  # detect the language
        options["probability"] = "false"  # do not return per-line confidence

        image = get_file_content(image_path)

        # Call general text recognition with the options above
        result = client.basicGeneral(image, options)

        # Keep only the recognized text; without 'words_result' there is nothing to save
        if 'words_result' not in result:
            return ''
        oldtext = '\n'.join([w['words'] for w in result['words_result']])
        text = oldtext.replace(',', '').replace(':', ',')

        # Save the text as a CSV next to the image, convert it to Excel,
        # then delete the CSV so the folder does not fill up with files
        base = os.path.splitext(image_path)[0]
        with open(base + '.csv', 'w', encoding='utf-8') as fs:
            fs.write(text)
        csv = pd.read_csv(base + '.csv', encoding='utf-8')
        csv.to_excel(base + '.xlsx', sheet_name='data')
        os.remove(base + '.csv')
        return text


    if __name__ == '__main__':
        # Process every PNG in today's download folder
        for root, dirs, files in os.walk(r'C:\data\网签备份' + '\\' + str(date.today())):
            for file in files:
                if file[-3:] == 'png':
                    filePath = root + '\\' + file
                    print(img_to_str(filePath))
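The text clean-up step in the script (drop commas, then use the colon as the column separator) can be tried without calling the OCR service at all. The following is a minimal sketch, not part of the original script: the helper names `words_result_to_rows` and `rows_to_csv_text` and the sample data are hypothetical, and the sample only imitates the `words_result` shape that the script reads. It also builds the CSV text in memory with the standard library, which is one possible way to avoid the temporary file.

```python
import csv
import io


def words_result_to_rows(result):
    """Turn an OCR-style response dict into a list of rows (hypothetical helper)."""
    rows = []
    for item in result.get("words_result", []):
        # Same clean-up as the script: remove commas, then treat ':' as the separator
        cleaned = item["words"].replace(",", "").replace(":", ",")
        rows.append(cleaned.split(","))
    return rows


def rows_to_csv_text(rows):
    """Serialize rows to CSV text in memory instead of a temporary file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample that imitates the response shape; this is not real OCR output
sample = {
    "words_result": [
        {"words": "Item:Count"},
        {"words": "Residential:1,234"},
    ]
}

rows = words_result_to_rows(sample)
csv_text = rows_to_csv_text(rows)
print(rows)
print(csv_text)
```

Removing the commas first matters because numbers such as `1,234` would otherwise be split into two columns once the colon has been turned into a comma.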
