Python Forum
automatic document renaming
#1
Hi, I am trying to write a script that renames the PDF and Word documents in a folder using text taken from the documents themselves.
I need them named: LASTNAME.Firstname CLIENT
The information would come from a timesheet, so people would be entering it into a table. Would this be possible?
What I have so far is below, but I don't know how to get the information out of the table in each document, as each one would be different.

I am open to any suggestions on how this could work.

import os
from docx import Document
import PyPDF2

def extract_text_from_docx(docx_path):
    doc = Document(docx_path)
    text = ""
    for paragraph in doc.paragraphs:
        text += paragraph.text
    return text

def extract_text_from_pdf(pdf_path):
    text = ""
    with open(pdf_path, 'rb') as file:
        # PdfFileReader, numPages, getPage and extractText were removed in
        # PyPDF2 3.0; PdfReader, .pages and extract_text() replace them.
        reader = PyPDF2.PdfReader(file)
        for page in reader.pages:
            text += page.extract_text()
    return text

def main():
    directory = "path/to/your/documents"
    for filename in os.listdir(directory):
        if filename.endswith(".docx"):
            full_path = os.path.join(directory, filename)
            new_name = extract_text_from_docx(full_path)
        elif filename.endswith(".pdf"):
            full_path = os.path.join(directory, filename)
            new_name = extract_text_from_pdf(full_path)
        else:
            continue
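        # Rename the file, keeping its original extension.
        # Note: new_name is still the document's entire text at this point;
        # it has to be reduced to the name and client fields first.
        os.rename(full_path, os.path.join(directory, new_name + os.path.splitext(filename)[1]))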

if __name__ == "__main__":
    main()
Gribouillis wrote Mar-19-2024, 09:06 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
Hi,
If your timesheet has a grid with explicit lines, it might be an idea to look at pdfplumber.
I find that it handles PDFs with that particular feature very well.
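import pdfplumber

# "timesheet.pdf" is a placeholder; point it at your own file.
with pdfplumber.open("timesheet.pdf") as pdf:
    page = pdf.pages[0]
    # Use the lines actually drawn on the page as the table grid.
    set1 = {
        "vertical_strategy": "explicit",
        "horizontal_strategy": "explicit",
        "explicit_vertical_lines": page.curves + page.edges,
        "explicit_horizontal_lines": page.curves + page.edges,
    }
    text = page.extract_tables(table_settings=set1)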
You get a list of tables; each table is a list of rows, and each row is a list whose elements are the "fields" (cell texts) you can use.
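Since each timesheet may be laid out differently, one option is to search that nested list for a label instead of relying on a fixed position. This is only a sketch: the sample data is made up, `lookup` is a hypothetical helper (not part of pdfplumber), and it assumes the value sits in the cell directly to the right of its label.

```python
# Made-up data in the shape extract_tables() returns:
# list of tables -> list of rows -> list of cell strings (or None for empty cells).
tables = [
    [
        ["Name", "John Smith"],
        ["Client", "Acme"],
        ["Hours", None],
    ]
]


def lookup(tables, label):
    """Return the cell to the right of the first cell matching label (case-insensitive)."""
    wanted = label.strip().lower()
    for table in tables:
        for row in table:
            # Skip the last cell: it has no neighbour to its right.
            for i, cell in enumerate(row[:-1]):
                if cell is not None and cell.strip().lower() == wanted:
                    return row[i + 1]
    return None  # label not found in any table


name = lookup(tables, "name")
client = lookup(tables, "client")
print(name, client)
```

If the labels in your timesheets are worded differently ("Employee", "Customer", ...), you would pass those strings instead.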
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#3
I made a docx with a table and also saved it as a PDF.

Assuming:
1. you only have 1 table per document, or the info you want is in the first table
2. the name is in row 1 column 2 in the table,

this gets the name. I think the easiest approach is to put the table data in a DataFrame.

I read that PyMuPDF (imported as fitz) is more advanced than PyPDF2. Note that page.find_tables() needs PyMuPDF 1.23.0 or newer.

from docx import Document
import pandas as pd
import fitz

mydoc = '/home/pedro/myPython/pdfplumber/pdfs/table_docx.docx'
mypdf = '/home/pedro/myPython/pdfplumber/pdfs/table_docx.pdf'

# name from docx (first table only, as assumed above)
table = Document(mydoc).tables[0]
data = [[cell.text for cell in row.cells] for row in table.rows]
df = pd.DataFrame(data)
name = df.iloc[0, 1]  # row 1, column 2 -> 'John Smith'

# name from pdf (first table on the first page)
doc = fitz.open(mypdf)
tabs = doc[0].find_tables()
df = pd.DataFrame(tabs[0].extract())
name = df.iloc[0, 1]  # row 1, column 2 -> 'John Smith'
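Once the name is extracted, the remaining step is turning it into the LASTNAME.Firstname CLIENT pattern and renaming the file. Here is a rough sketch using only the standard library. `build_filename` and `rename_document` are hypothetical helpers, and it assumes the name cell reads "Firstname Lastname" (last word is the surname), that the client comes from another cell, and that the client should be upper-cased as in the requested pattern.

```python
import re
from pathlib import Path


def build_filename(full_name: str, client: str) -> str:
    """Build 'LASTNAME.Firstname CLIENT' from a 'Firstname Lastname' string.

    Assumption: the last word of full_name is the surname.
    """
    parts = full_name.split()
    if len(parts) < 2:
        raise ValueError(f"expected first and last name, got {full_name!r}")
    first, last = " ".join(parts[:-1]), parts[-1]
    stem = f"{last.upper()}.{first.title()} {client.strip().upper()}"
    # Drop characters that Windows does not allow in file names.
    return re.sub(r'[<>:"/\\|?*]', "", stem)


def rename_document(path: Path, full_name: str, client: str) -> Path:
    """Rename path in its own folder, keeping the extension; never overwrite."""
    target = path.with_name(build_filename(full_name, client) + path.suffix)
    if target.exists():
        raise FileExistsError(target)
    path.rename(target)
    return target
```

For example, `rename_document(Path(mypdf), name, "Acme")` would rename the PDF to `SMITH.John ACME.pdf` for the 'John Smith' table above. Refusing to overwrite matters because two timesheets for the same person and client would otherwise silently replace each other.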