Scenario

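During final-exam review there are piles of question banks to get through, and some teachers don't even hand out a proper question bank, just a heap of documents. I needed a quick way to find which document a given question appears in, and that is how this little script came about.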

Features

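As the title says, it checks whether a given phrase appears in files with the specified extensions under the specified directory, then tells you which document each match is in and which piece of text contains it. This script searches the current directory, including its subdirectories, for all Excel files (.xlsx only), Word files (.docx only), and text files (.txt).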

Code

import os
from openpyxl import load_workbook
from docx import Document

def search_files(keyword):
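    # Collect every Excel (.xlsx only), Word (.docx only), and text file under the
    # current directory; names starting with '~$' are Office lock files and are skipped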
    filepaths = [os.path.join(root, filename)
                 for root, _, files in os.walk('.')
                 for filename in files
                 if not filename.startswith('~$') and filename.endswith(('.xlsx', '.docx', '.txt'))]

    # Scan each file for the keyword
    found_in_files = []
    for filepath in filepaths:
        if filepath.endswith('.txt'):
            # errors='replace' keeps a non-UTF-8 file from aborting the whole search
            with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
                for line in f:
                    line = line.rstrip('\n')
                    if keyword in line:
                        found_in_files.append((filepath, line))
                            
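        elif filepath.endswith('.xlsx'):
            try:
                # read_only mode streams rows instead of loading the whole workbook
                workbook = load_workbook(filepath, read_only=True)
                for sheetname in workbook.sheetnames:
                    sheet = workbook[sheetname]
                    for row in sheet.iter_rows(values_only=True):
                        for cell_value in row:
                            # Only string cells can contain the keyword
                            if isinstance(cell_value, str) and keyword in cell_value:
                                found_in_files.append((filepath, cell_value))
                workbook.close()
            except Exception as exc:
                # Report unreadable workbooks instead of silently skipping them
                print('Skipping {}: {}'.format(filepath, exc))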
                
        elif filepath.endswith('.docx'):
            document = Document(filepath)
            for paragraph in document.paragraphs:
                if keyword in paragraph.text:
                    found_in_files.append((filepath, paragraph.text))
                    
            for table in document.tables:
                for row in table.rows:
                    for cell in row.cells:
                        if keyword in cell.text:
                            found_in_files.append((filepath, cell.text))

    # Print the search results
    if found_in_files:
        print('Keyword "{}" appears in the following files:'.format(keyword))
        for filepath, line in found_in_files:
            print('File: {}'.format(filepath))
            print('Matching text: {}'.format(line))
            print('-'*50)
    else:
        print('No files containing keyword "{}" were found.'.format(keyword))

if __name__ == "__main__":
    keyword = input('Enter the keyword to search for: ')
    search_files(keyword)
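
Two things in the script above can bite in practice: the extension check is case-sensitive (a file named `notes.TXT` is skipped), and text files are assumed to be UTF-8. Below is a rough, standard-library-only sketch of how those two parts might be handled. It is not part of the original script. The helper names `find_candidates`, `read_text_with_fallback` and `search_text_files` are made up for illustration, and trying GBK as the second encoding is just my assumption about what Chinese-language course files tend to use. It only covers the `.txt` branch; the Excel and Word branches would stay as they are.

```python
from pathlib import Path

SUFFIXES = {'.xlsx', '.docx', '.txt'}
# Assumption: files that are not UTF-8 are most likely GBK; adjust as needed
ENCODINGS = ('utf-8', 'gbk')


def find_candidates(root='.'):
    """Return supported files under root, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(root).rglob('*')
        if p.is_file()
        and p.suffix.lower() in SUFFIXES
        and not p.name.startswith('~$')  # skip Office lock files
    )


def read_text_with_fallback(path):
    """Try each encoding in turn; fall back to UTF-8 with replacement characters."""
    for encoding in ENCODINGS:
        try:
            return Path(path).read_text(encoding=encoding)
        except UnicodeDecodeError:
            continue
    return Path(path).read_text(encoding='utf-8', errors='replace')


def search_text_files(keyword, root='.'):
    """Return (filepath, line) pairs for every .txt line containing keyword."""
    hits = []
    for path in find_candidates(root):
        if path.suffix.lower() != '.txt':
            continue
        for line in read_text_with_fallback(path).splitlines():
            if keyword in line:
                hits.append((str(path), line))
    return hits
```

Returning the matches as a list instead of printing them right away also makes it easier to reuse the search elsewhere, for example `for filepath, line in search_text_files('keyword'): print(filepath, line)`.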

Result

2023-06-13T07:57:14.png
I think it works pretty well.