Aug-14-2017, 06:05 AM
So, I'm trying to reformat text from Shakespeare's plays so that every monologue is only one line long.
For example, this is what I am given:
"<STAGEDIR>Exeunt MARK ANTONY and CLEOPATRA with
their train</STAGEDIR>
<SPEECH>
<SPEAKER>DEMETRIUS</SPEAKER>
<LINE>Is Caesar with Antonius prized so slight?</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>PHILO</SPEAKER>
<LINE>Sir, sometimes, when he is not Antony,</LINE>
<LINE>He comes too short of that great property</LINE>
<LINE>Which still should go with Antony.</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>DEMETRIUS</SPEAKER>..."
And this is what I want:
"Is Caesar with Antonius prized so slight?
Sir, sometimes, when he is not Antony, he comes too short of that great property which still should go with Antony.
..."
So far, I have gotten this far:
"Is Caesar with Antonius prized so slight?
Sir, sometimes, when he is not Antony,
He comes too short of that great property
Which still should go with Antony.
..."
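The partial output above is consistent with each line keeping its trailing newline. Iterating over a file object yields lines that still end in `\n`, so removing or replacing only the tags leaves one verse line per output line. Here is a small illustration of that, assuming a two-line excerpt of the sample and using `io.StringIO` in place of a real file; the variable names are made up for the example.

```python
import io

# Two-line excerpt of the sample input (hypothetical test data).
sample = ("<LINE>Sir, sometimes, when he is not Antony,</LINE>\n"
          "<LINE>He comes too short of that great property</LINE>\n")

# io.StringIO stands in for a file opened with open(path, 'r'); iterating
# over either one yields each line WITH its trailing "\n".
raw_lines = list(io.StringIO(sample))
for line in raw_lines:
    print(repr(line))

# Replacing only the tags leaves that "\n" in place, so each verse line
# still ends up on its own output line.
kept = [line.replace("<LINE>", "").replace("</LINE>", " ") for line in raw_lines]

# Stripping the newline first lets consecutive verse lines run together.
joined = "".join(line.rstrip("\n").replace("<LINE>", "").replace("</LINE>", " ")
                 for line in raw_lines)
print(repr(joined))
```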
And here is my code so far:
```python
import re

# Input files, all in the same corpus directory
ss = ["a_and_c.txt", "dream.txt", "hamlet.txt", "j_caesar.txt",
      "macbeth.txt", "merchant.txt", "othello.txt", "r_and_j.txt"]
size = len(ss)

# Output file that collects the dialog from every play
f = open(
    "/Users/Tuck/Documents/PyCharm_PythonPrograms/ChatBot_Test/corpus/shakespeare/ShakespearsDialogsPreProcessed.txt",
    'w')
print(f)

i = 0
while i < size:
    print(ss[i])
    path = "/Users/Tuck/Documents/PyCharm_PythonPrograms/ChatBot_Test/corpus/shakespeare/" + ss[i]
    print("working on: " + path)
    with open(path, 'r') as p:
        for line in p:
            # End of a speech: start a new output line
            if "</SPEECH>" in line:
                f.write('\n')
                print(line)
            if "<LINE>" in line:
                # Drop the opening tag, then turn the closing tag into a space
                # (the line's own trailing newline is still kept)
                line = line.split("<LINE>")
                line = "".join(str(x) for x in line)
                _line = re.sub(r"</LINE>", " ", line)
                f.write(_line)
            else:
                print("moving to next line!")
    i += 1
f.close()
```

Any thoughts/suggestions?
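One possible approach, sketched under the assumption that the files use exactly the tags shown in the sample (`<SPEECH>` blocks containing `<LINE>` elements, with no tags nested inside a `<LINE>`): read each file whole, pull out every speech with a regular expression, and join the text of its lines with spaces. `<SPEAKER>` and `<STAGEDIR>` content is simply ignored. The names `SPEECH_RE`, `LINE_RE`, `speeches_to_lines` and `convert_files` are hypothetical, made up for this sketch.

```python
import re

# Tag layout assumed from the sample in the question: each speech sits in
# <SPEECH>...</SPEECH> and each verse line in <LINE>...</LINE>, with no tags
# nested inside a <LINE>. DOTALL lets ".*?" match across newlines.
SPEECH_RE = re.compile(r"<SPEECH>(.*?)</SPEECH>", re.DOTALL)
LINE_RE = re.compile(r"<LINE>(.*?)</LINE>", re.DOTALL)


def speeches_to_lines(text):
    """Return one string per <SPEECH> block, its <LINE> texts joined by spaces."""
    result = []
    for speech in SPEECH_RE.findall(text):
        # Collapse any whitespace (including newlines) inside each verse line.
        verses = [" ".join(v.split()) for v in LINE_RE.findall(speech)]
        if verses:  # skip speeches that contain no <LINE> elements
            result.append(" ".join(verses))
    return result


def convert_files(in_paths, out_path):
    """Write every speech from every input file to out_path, one per line."""
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as src:
                for speech in speeches_to_lines(src.read()):
                    out.write(speech + "\n")


# The sample from the question, used as test input.
sample = """<STAGEDIR>Exeunt MARK ANTONY and CLEOPATRA with
their train</STAGEDIR>
<SPEECH>
<SPEAKER>DEMETRIUS</SPEAKER>
<LINE>Is Caesar with Antonius prized so slight?</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>PHILO</SPEAKER>
<LINE>Sir, sometimes, when he is not Antony,</LINE>
<LINE>He comes too short of that great property</LINE>
<LINE>Which still should go with Antony.</LINE>
</SPEECH>
"""

if __name__ == "__main__":
    for s in speeches_to_lines(sample):
        print(s)
```

With the file list from the question, something like `convert_files([base + name for name in ss], out_path)` could stand in for the `while` loop, where `base` and `out_path` are hypothetical names for the corpus directory and output path used there. The sketch keeps the capitals at the start of continuation lines ("He comes", "Which still"); lowercasing them as in the desired output would also lowercase names and "I" at the start of a verse line, so it is left out.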