Python Forum
builtins.TypeError: a bytes-like object is required, not 'str'


builtins.TypeError: a bytes-like object is required, not 'str' - BigOldArt - Jan-31-2019

I am just getting started with Python. Apologies if this is awkwardly presented.

The syntax below worked in the Python 2 embedded in SPSS. I am trying to use it in Python 3. I did have to change the print commands to use parentheses.

The goal is to take a set of filled-in fillable PDF forms, extract the filled-in information, and put it into a pipe-separated text file with one line per input form.
--- This is where the problem occurs. myvalue is the contents of the /V element. This is supposed to remove "\r" and replace it with "~".
if "\\r" in myvalue:
    myvalue=re.sub("\\r", "~",myvalue)
---- notes
Each element in the {} pairs a flag with some text, e.g., /T for the field name, /TU for the tooltip shown when hovering over a form field, and, most importantly, /V for the filled-in information.

As seen in the Python window output below:
/T prints okay.
/V prints okay except when the information spans more than one line. \r signals a newline. However, the pipe-separated file does not read correctly into other packages when a value contains \r, so the \r needs to be replaced. FirstName and LastName are okay. SayFewLines contains some \r, so it hits the problem.
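
To illustrate why an embedded \r matters for a one-line-per-form file, here is a minimal sketch using a made-up row (the values and the name `row` are illustrative, not taken from the real forms). Python's own `str.splitlines()` treats \r as a line break; it is an assumption that the packages reading the file behave similarly.

```python
# Hypothetical row: three pipe-separated fields, the last one holding
# a carriage return (\r) inside the value.
row = "Alfred|Alpha|first line\rsecond line|"

# str.splitlines() treats \r as a line boundary, so one record
# comes back as two lines.
broken = row.splitlines()
print(len(broken))

# Replacing \r with "~" keeps the record on a single line.
fixed = row.replace("\r", "~").splitlines()
print(len(fixed))
```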

I do not see how to attach example inputs, but I would be glad to send the empty PDF form and 4 filled-in cases with made-up data. I can also send the target.txt that is created via Python 2 in SPSS.



----- This is what was in the python window
evaluating 51 lines of code...
FirstName |
MiddleInitial |
LastName |
GPA |
SayFewLines |
CheckedCheckBox |
UncheckedCheckBox |
State |
Handedness |
ListBoxPickALL |
ListBoxPick1 |
single |
package |
NotToolTip |
var names line completed
FirstName {'/FT': '/Tx', '/T': 'FirstName', '/TU': 'Given or Chistian name', '/Ff': 20971520, '/V': 'Alfred'}
Alfred
MiddleInitial {'/FT': '/Tx', '/T': 'MiddleInitial', '/TU': '', '/V': 'A.'}
A.
LastName {'/FT': '/Tx', '/T': 'LastName', '/TU': 'Surname or Family name', '/V': 'Alpha'}
Alpha
GPA {'/FT': '/Tx', '/T': 'GPA', '/TU': 'Grade Point Average', '/Ff': 4194304, '/V': '1.00'}
1.00
SayFewLines {'/FT': '/Tx', '/T': 'SayFewLines', '/TU': 'Put in a few lines of arbitrary text', '/Ff': 4096, '/V': b"If I had a hammer, I'd hammer in the morning\rI'd hammer in the evening All over this land\rI'd hammer out danger, I'd hammer out warning\rI'd hammer out the love between My brothers and my sisters\rAll over this land."}
Traceback (most recent call last):
Python Shell, prompt 16, line 45
builtins.TypeError: a bytes-like object is required, not 'str'
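
The traceback can be reproduced without any PDF, assuming the value is a bytes object like the b"..." shown for SayFewLines above. This is only a sketch, and `value` is an illustrative name. In Python 3, testing whether a str is `in` a bytes object raises this TypeError, whereas testing bytes against bytes works.

```python
# Illustrative stand-in for a /V value that arrives as bytes.
value = b"I'd hammer in the morning\rI'd hammer in the evening"

try:
    "\\r" in value  # a str searched for inside bytes
except TypeError as exc:
    message = str(exc)
    print(message)

# Searching with a bytes pattern is allowed.
has_cr = b"\r" in value
print(has_cr)

# "\\r" is a backslash followed by the letter r, which is not
# the same as the single carriage-return character "\r".
print(len("\\r"), len("\r"))
```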


------ This is the python code
# extract filled in data from fillable PDF files
# put it into a pipe-separated TXT file
# use print commands while shaking down this job
# comment them out for large run
import sys
sys.path.append("c:/python37/lib/site-packages")
import os
import re
import glob
import PyPDF2

path = 'G:/'
outfile = open("G:/target.txt","w")

# get field names to put in first line of TXT file as variable names
firstpdf = "G:/all controls case 1.pdf"
open(firstpdf, "rb")
reader = PyPDF2.PdfFileReader(firstpdf)
page = reader.getPage(0)
text = page.extractText()
reader.numPages
for k, v in reader.getFields().items():
    myvalue = v.get('/T')
    print (myvalue, "|")
    outfile.write(myvalue)
    outfile.write("|")
outfile.write("\n")

print ("var names line completed")

# get filled in data
for file in glob.glob(path + "*.pdf"):
    if file.endswith(".pdf"):
        mycase= os.path.join(path, file)
        pdf = open(mycase, "rb")
        reader = PyPDF2.PdfFileReader(pdf)
        page = reader.getPage(0)
        text = page.extractText()
        reader.numPages
        for k, v in reader.getFields().items():
            print (k, v)
            myvalue = v.get('/V',"blank")
            if isinstance(myvalue,(list,)): 
                myvalue = ','.join(myvalue)
            if "\\r" in myvalue:
                myvalue=re.sub("\\r", "~",myvalue)
            print (myvalue)
            outfile.write(myvalue)
            outfile.write("|")
    outfile.write("\n")
outfile.close()
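
One possible way to handle the mixed value types before writing, offered as a sketch only. It assumes a bytes value can be decoded as Latin-1 (the real encoding of the form data is not known from the output above), and `normalize_value` is a hypothetical helper name, not part of PyPDF2.

```python
def normalize_value(value, encoding="latin-1"):
    """Return value as a one-line str for a pipe-separated file."""
    # List values (e.g. multi-select list boxes): normalize each item, then join.
    if isinstance(value, list):
        return ",".join(normalize_value(item, encoding) for item in value)
    # Decode bytes to str first; the encoding is an assumption.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    # str() covers any other object; then swap carriage returns for "~".
    return str(value).replace("\r", "~")


# Values shaped like the ones in the output above (shortened).
samples = ["Alfred", b"hammer in the morning\rhammer in the evening", ["a", "b"]]
line = "|".join(normalize_value(v) for v in samples)
print(line)
```

Under these assumptions, the isinstance and re.sub steps inside the loop could be replaced by a single call such as `outfile.write(normalize_value(v.get('/V', "blank")))`. A plain `str.replace` is used here instead of `re.sub` because the pattern is a fixed character.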