 Where is the wrong indentation?
#1
Hi!

I have the following code. I don't know how PDFMiner works, so the first part is somebody else's code, which I modified a bit. It seems to work, but it does everything 8 or more times instead of just once, and it gets slower with every line I add. The idea is this: there are a lot of PDFs in a folder. I want to open the ones that have specific words in their titles and extract some words into an Excel file. The concept seems to work; the problem is what I described above. I think there is a wrong indentation. Or is it something else? I am still learning. Thank you for your help.
I also had to change the variable names to fruits; I hope it doesn't look too silly. The code is far from finished, so most fruits are not handled at the end yet.

import glob, os
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from io import StringIO #first was cStringIO
import re


filename = "Data_Aquisition.csv"
f = open(filename, "a", encoding="utf-8")
headers = "company_name, apple_date, apple_designation, banana_date, banana_designation, orange_date, orange_designation, pear, cherry_date, cherry_company\n"
f.write(headers)

pdflist = glob.glob("*.pdf")

for file in pdflist:
	
	if "apple" or "banana" or "orange" or "Orange" or "pear" or "cherry" or "Cherry" or "Apple" or "Banana" or "Pear" in str(file):
	
		def convert(fname, pages=None):
			if not pages:
				pagenums = set()
			else:
				pagenums = set(pages)

			output = StringIO()
			manager = PDFResourceManager()
			converter = TextConverter(manager, output, laparams=LAParams())
			interpreter = PDFPageInterpreter(manager, converter)

			infile = open(fname, 'rb')
			for page in PDFPage.get_pages(infile, pagenums):
				interpreter.process_page(page)
			infile.close()
			converter.close()
			text = output.getvalue()
			output.close()
			return text
		   
		def convertMultiple(pdfDir, txtDir):
			if pdfDir == "": pdfDir = os.getcwd() + "\\" #if no pdfDir passed in 
			for pdf in os.listdir(pdfDir): #iterate through pdfs in pdf directory
				fileExtension = pdf.split(".")[-1]
				if fileExtension == "pdf":
					pdfFilename = pdfDir + pdf 
					text = convert(pdfFilename) #get string of text content of pdf
					text1line = str(text).replace("\n", " ")
					if "apple" or "Apple" in str(file):
						appledate = re.search('enquiry dated (.*), we can confirm', text)
						applecompanysource = re.search(' shares of (.*) registered in', text)
						applecompany = str(applecompanysource).split('PLC')[0] + " PLC"
						appledesignation = re.search('registered in the name of (.*)Voting', text1line)
						print(appledate)
						print(applecompany)
						print(appledesignation)
						f.write(str(appledate) + ",")
					elif "banana" or "Banana" in str(file):
						bananadate = re.search('- As at (.*)', text)
						print(bananadate)
						f.write(str(bananadate) + ",")
					elif "cherry" or "Cherry" in str(file):
						cherrydate = re.search('DATE : (.*)', text)
						print(cherrydate)
						f.write(str(cherrydate) + ",")

					else:
						continue
				else:
					continue

					

		pdfDir = "C:/Users/thisisme/Desktop/DataAquisition/"
		txtDir = "C:/Users/thisisme/Desktop/DataAquisition/"
		convertMultiple(pdfDir, txtDir)

		
	else:
		continue
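Apart from the condition itself, the loop structure explains the repetition: convertMultiple() is called once for every file in pdflist, and each call walks every PDF in the folder again. Assuming the script runs from the same folder as pdfDir, with N PDFs each one is converted N times, so the total work grows with N squared. (Inside convertMultiple(), the fruit tests also look at file from the outer loop rather than pdf, so the branch that runs has nothing to do with the PDF being converted.) The minimal sketch below reproduces only that loop shape; convert and convert_multiple are hypothetical stand-ins that count calls instead of parsing PDFs, and the eight file names are made up.

```python
# Hypothetical stand-ins: they only count calls, no real PDF parsing happens here.
calls = {}

def convert(fname):
    # Record how many times each file gets "converted".
    calls[fname] = calls.get(fname, 0) + 1
    return ""

def convert_multiple(pdf_names):
    # Like convertMultiple() in the post: it walks *every* PDF in the folder.
    for pdf in pdf_names:
        convert(pdf)

pdflist = [f"doc{i}.pdf" for i in range(8)]  # made-up file names

# Same shape as the original: the outer loop calls convert_multiple() once per file.
for file in pdflist:
    convert_multiple(pdflist)

print(calls["doc0.pdf"])    # 8: each PDF is converted once per outer iteration
print(sum(calls.values()))  # 64 conversions in total for 8 PDFs
```

Calling the conversion once per PDF, from a single loop, removes the repetition.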

	
#2
Your problem is in the if "apple" or "banana" or ... line (or at least, that's the first error I see). You need to have a complete expression on each side of the or (see here for details).
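To illustrate the point above: in Python, in binds more tightly than or, so the original condition is read as "apple" or "banana" or ... or ("Pear" in str(file)). A non-empty string such as "apple" is truthy, so the whole expression is always true and every PDF passes the filter. The same happens in the inner if/elif tests, which is why the first branch always runs. A minimal sketch of the difference; buggy_match and keyword_match are hypothetical helper names, and the keyword list is taken from the fruit names in the post.

```python
def buggy_match(name):
    # Parsed as: "apple" or ("banana" in name). A non-empty string is truthy,
    # so this returns "apple" for every file name.
    return "apple" or "banana" in name

def keyword_match(name, keywords=("apple", "banana", "orange", "pear", "cherry")):
    # Lower-case once so "Apple" and "apple" both match, then test each keyword.
    lowered = name.lower()
    return any(word in lowered for word in keywords)

print(bool(buggy_match("report.pdf")))   # True, even though no fruit is in the name
print(keyword_match("report.pdf"))       # False
print(keyword_match("Cherry_2019.pdf"))  # True
```

Lower-casing the name also removes the need to list "Apple" and "apple" separately.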
Craig "Ichabod" O'Brien - xenomind.com

#3
(Mar-05-2019, 03:13 PM)CaptainCsaba Wrote: if "apple" or "banana" or "orange" or "Orange" or "pear" or "cherry" or "Cherry" or "Apple" or "Banana" or "Pear" in str(file):
Read: https://python-forum.io/Thread-Multiple-...or-keyword
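Putting both fixes together, one possible restructuring loops over the PDFs once, picks the pattern from the file name, and converts each matching PDF a single time. This is a sketch under assumptions, not the poster's finished script: it assumes a recent pdfminer.six, which provides pdfminer.high_level.extract_text, in place of the hand-built TextConverter pipeline; the names KEYWORDS, PATTERNS, kind_of, extract_date and main are hypothetical; and it writes one simplified row (file name, fruit, date) per PDF instead of the original column layout. It also uses match.group(1) rather than str(match), because str() of a re.search result is the match object's representation, not the captured text. The regexes are the ones from the original post.

```python
import csv
import glob
import os
import re

# Hypothetical keyword list; only fruits with a pattern in the post are included.
KEYWORDS = ("apple", "banana", "cherry")

# Regexes copied from the original post, one per fruit.
PATTERNS = {
    "apple": r"enquiry dated (.*), we can confirm",
    "banana": r"- As at (.*)",
    "cherry": r"DATE : (.*)",
}

def pdf_to_text(path):
    # Assumption: recent pdfminer.six, which provides pdfminer.high_level.extract_text.
    # Imported here so the helpers below work even without pdfminer installed.
    from pdfminer.high_level import extract_text
    return extract_text(path)

def kind_of(filename):
    """Return the first keyword found in the file name (case-insensitive), else None."""
    lowered = os.path.basename(filename).lower()
    for word in KEYWORDS:
        if word in lowered:
            return word
    return None

def extract_date(kind, text):
    """Return the text captured by the fruit's pattern, or '' if it does not match."""
    match = re.search(PATTERNS[kind], text)
    return match.group(1).strip() if match else ""

def main(pdf_dir, out_csv="Data_Aquisition.csv"):
    # newline="" is what the csv module expects for files it writes.
    with open(out_csv, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in glob.glob(os.path.join(pdf_dir, "*.pdf")):
            kind = kind_of(path)
            if kind is None:
                continue  # the file name contains none of the keywords
            text = pdf_to_text(path)  # each PDF is converted exactly once
            writer.writerow([os.path.basename(path), kind, extract_date(kind, text)])
```

Usage, with the folder from the post: main("C:/Users/thisisme/Desktop/DataAquisition/"). Only the first keyword found in a file name is used, so a file whose name mentions several fruits would need different handling.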