Python Forum
Text formatting question
#1
Hello,
I regularly have to copy text from PDF files, which I save as a doc file as my notes to review later. Of course, the formatting is all lost, so I correct it manually. I typically keep my notes bulleted, as it looks neat.
I have been learning Python over the holidays and realised that I could probably write some code to automate this process. Is there any way to add real bullets to the output document, so that the text is laid out the way it is in MS Word when you add a bullet?

PS: I do not know whether such a tool already exists; if it does, please let me know.
#2
This is what I could come up with.
It does the job, but having real bullets would make a lot of difference.
Any suggestions are welcome.

""" Programme to format text copied from pdfs into bulleted lines.
	Input: a badly formatted text file.
	Output: a formatted text file (sentences are continuous, full stops are used as break points, each sentence receives the bullet "-->").
	Please replace all "etc." with "etc" or "etc," beforehand to avoid splitting a sentence at "etc.".
"""

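def tinyStr (read_from):
	"For small bullets, so that they are not all put in the same line later on."
	# Lines of 1 to 3 words get a full stop so they become bullets of their own.
	# Blank lines and lines already ending in '.' are written unchanged.
	with open(read_from, 'r') as source, open('x.txt', 'w') as a:
		for line in source:
			stripped = line.rstrip('\n').rstrip()
			if 0 < len(stripped.split()) < 4 and not stripped.endswith('.'):
				line = stripped + '.\n'
			a.write(line)
	with open('x.txt', 'r') as b:
		concat(b)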

def concat (read_from):
	"Converting the entire document into one big string"
	# write_file is the module-level name set at the bottom of the script;
	# reading it needs no 'global' statement.
	# Each trailing newline becomes a space so the lines join into one string.
	concat_string = ''.join(line.rstrip('\n') + ' ' for line in read_from)
	edits (concat_string, write_file)

def edits (concat_string, write_to):
	"Adds all the final edits (splitting sentences, bullets and full stops)"
	# Split on full stops and drop empty fragments, so that no blank bullets
	# are written and a final sentence without a full stop is not lost.
	list_of_sentences = [s.strip() for s in concat_string.split('.') if s.strip()]
	with open(write_to, 'w') as open_write_to:
		for sentence in list_of_sentences:
			open_write_to.write("--> " + sentence + '.' + '\n')

# Python 3: input() replaces Python 2's raw_input().
read_file = input("Enter the file to read from: ")
write_file = input("Enter the file to write to: ")
 
tinyStr(read_file)
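
The manual "etc." replacement mentioned in the docstring could perhaps be avoided by splitting only where a full stop is followed by whitespace and a capital letter. This is a minimal sketch under that assumption, not a complete sentence tokenizer: `split_sentences` is a hypothetical helper that is not part of the script above, and it would still split wrongly at something like "etc. The".

```python
import re

# Split after a full stop only when whitespace and a capital letter follow.
# This is a heuristic: "etc. The" would still be treated as a sentence break.
_SENTENCE_END = re.compile(r'(?<=\.)\s+(?=[A-Z])')

def split_sentences(text):
    """Return the sentences in text, with line breaks and extra spaces collapsed."""
    flat = ' '.join(text.split())
    return [s for s in _SENTENCE_END.split(flat) if s]

if __name__ == '__main__':
    sample = "Cells need salts, sugars etc. to grow. They divide\nquickly. Short line"
    for sentence in split_sentences(sample):
        print("--> " + sentence)
```

With a splitter like this, "sugars etc. to grow" stays in one bullet because the word after the full stop starts with a lowercase letter.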

I just found that if I paste full sentences into Word, select everything with Ctrl+A and apply bullets, it does it :D

Making it automatic would be nice too :)
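
One way to automate that step might be to write the sentences out as an HTML list, since Word can open .html files and, as far as I know, displays `<ul>` items as ordinary bullets. This is a minimal sketch under that assumption; `to_html_bullets` and the file name `notes.html` are made up for illustration.

```python
import html

def to_html_bullets(sentences):
    """Build a small HTML page with one <li> per non-empty sentence."""
    items = ''.join(
        '  <li>{}</li>\n'.format(html.escape(s.strip()))
        for s in sentences
        if s.strip()
    )
    return (
        '<html><head><meta charset="utf-8"></head><body>\n'
        '<ul>\n' + items + '</ul>\n'
        '</body></html>\n'
    )

if __name__ == '__main__':
    notes = ["Mitochondria produce ATP.", "pH < 7 is acidic."]
    # 'notes.html' is a hypothetical output name; open it in Word afterwards.
    with open('notes.html', 'w', encoding='utf-8') as out:
        out.write(to_html_bullets(notes))
```

Escaping each sentence with `html.escape` keeps characters such as `<` and `&` from being read as markup.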
#3
Check the available packages: https://pypi.python.org/pypi?%3Aaction=s...mit=search
This one looks promising: https://pypi.python.org/pypi/PDFTron-PDF...Python/5.7