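Scenario
During final-exam review there are piles of question banks to get through, and some teachers don't hand out a proper question bank at all, so you end up with a heap of loose documents either way. I needed a quick way to find which question bank a given question lives in, and that is how this little script came about.
What it does
As the title says, it checks whether a given phrase appears in files with the supported extensions under a directory, then tells you which document each match is in and which sentence contains it. The script searches every Excel file (.xlsx only), Word file (.docx only), and text file (.txt) in the current directory and its subdirectories.
Code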
import os

from docx import Document
from openpyxl import load_workbook


def search_files(keyword):
    # Collect every Excel (.xlsx only), Word (.docx only), and text file under
    # the current directory, subdirectories included. Names starting with '~$'
    # are temporary lock files that Office creates, so they are skipped.
    filepaths = [os.path.join(root, filename)
                 for root, _, files in os.walk('.')
                 for filename in files
                 if not filename.startswith('~$') and filename.endswith(('.xlsx', '.docx', '.txt'))]
    # Go through each file and look for the keyword
    found_in_files = []
    for filepath in filepaths:
        if filepath.endswith('.txt'):
            # errors='replace' keeps a non-UTF-8 file from aborting the whole search
            with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
                for line in f:
                    if keyword in line:
                        found_in_files.append((filepath, line.rstrip('\n')))
        elif filepath.endswith('.xlsx'):
            try:
                workbook = load_workbook(filepath)
                for sheetname in workbook.sheetnames:
                    sheet = workbook[sheetname]
                    for row in sheet.iter_rows(values_only=True):
                        for cell_value in row:
                            # Only string cells can contain the keyword
                            if isinstance(cell_value, str) and keyword in cell_value:
                                found_in_files.append((filepath, cell_value))
            except Exception as exc:
                # Report unreadable workbooks instead of skipping them silently
                print('Skipping {}: {}'.format(filepath, exc))
        elif filepath.endswith('.docx'):
            document = Document(filepath)
            # Body paragraphs first, then the text inside table cells
            for paragraph in document.paragraphs:
                if keyword in paragraph.text:
                    found_in_files.append((filepath, paragraph.text))
            for table in document.tables:
                for row in table.rows:
                    for cell in row.cells:
                        if keyword in cell.text:
                            found_in_files.append((filepath, cell.text))
    # Print the search results
    if found_in_files:
        print('Keyword "{}" appears in the following files:'.format(keyword))
        for filepath, line in found_in_files:
            print('File: {}'.format(filepath))
            print('Sentence: {}'.format(line))
            print('-' * 50)
    else:
        print('No files containing the keyword "{}" were found.'.format(keyword))


if __name__ == "__main__":
    keyword = input('Enter the keyword to search for: ')
    search_files(keyword)
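One thing to watch out for: the script opens .txt files as UTF-8, and text files saved on a Chinese-locale Windows machine may well be GBK instead. Below is a minimal sketch of how an encoding fallback could look for the text-file part only. The names `read_text_with_fallback`, `search_text_files`, and `CANDIDATE_ENCODINGS` are made up for this illustration, and the encoding list is just my guess at what is likely to show up, so adjust it for your own files.

```python
import os

# Assumed candidate encodings, tried in order; adjust for your own files.
CANDIDATE_ENCODINGS = ('utf-8', 'gbk')


def read_text_with_fallback(filepath, encodings=CANDIDATE_ENCODINGS):
    """Return the file's text decoded with the first encoding that works."""
    with open(filepath, 'rb') as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what decodes as UTF-8 and replace the rest
    return raw.decode('utf-8', errors='replace')


def search_text_files(keyword, top='.'):
    """Return (filepath, line) pairs for every .txt line containing keyword."""
    matches = []
    for root, _, files in os.walk(top):
        for filename in files:
            if not filename.lower().endswith('.txt'):
                continue
            filepath = os.path.join(root, filename)
            for line in read_text_with_fallback(filepath).splitlines():
                if keyword in line:
                    matches.append((filepath, line))
    return matches
```

Unlike `search_files`, this sketch returns the matches as a list instead of printing them, which makes it easier to reuse or check. Trying UTF-8 first matters here: UTF-8 rejects most GBK byte sequences, while GBK will happily decode many UTF-8 files into garbage.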
Results
I think it works pretty well.