May-26-2019, 05:24 AM
Maybe you can help me with this. Using requests and BeautifulSoup I can get this text:
Quote:
>>> soup = BeautifulSoup(requests.get(file_url).text)
>>> soup
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /php/uploads</title>
</head>
<body>
<h1>Index of /php/uploads</h1>
<ul><li><a href="/php/"> Parent Directory</a></li>
<li><a href="chineseYearAnimals.txt"> chineseYearAnimals.txt</a></li>
<li><a href="chineseYearAnimals_gapWords.xlsx"> chineseYearAnimals_gapWords.xlsx</a></li>
<li><a href="chineseYearAnimals_gapWords.xlsx.data"> chineseYearAnimals_gapWords.xlsx.data</a></li>
<li><a href="chineseYearAnimals_gapped.txt"> chineseYearAnimals_gapped.txt</a></li>
<li><a href="cloze2.txt"> cloze2.txt</a></li>
<li><a href="cloze3.txt"> cloze3.txt</a></li>
<li><a href="cloze4HiddenRules.txt"> cloze4HiddenRules.txt</a></li>
<li><a href="cloze4HiddenRules.txtnoPrepos"> cloze4HiddenRules.txtnoPrepos</a></li>
<li><a href="cloze4HiddenRules_again.txt"> cloze4HiddenRules_again.txt</a></li>
<li><a href="cloze4HiddenRules_gapped.txt"> cloze4HiddenRules_gapped.txt</a></li>
<li><a href="cloze4HiddenRules_gapped.txt.data"> cloze4HiddenRules_gapped.txt.data</a></li>
</ul>
</body></html>
>>>
So I just need to extract the file names from this listing, and then I have what I want.
Can I do this with a regex?
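For what it's worth, here is a minimal sketch of the regex route, assuming every link in the listing has the exact form <a href="..."> as in the output above. The names sample_html, HREF_RE and file_names are made up for illustration, and the sample is a shortened copy of the listing, not the real page.

```python
import re

# Shortened copy of the directory listing quoted above (illustrative only).
sample_html = """<ul><li><a href="/php/"> Parent Directory</a></li>
<li><a href="chineseYearAnimals.txt"> chineseYearAnimals.txt</a></li>
<li><a href="cloze2.txt"> cloze2.txt</a></li>
<li><a href="cloze4HiddenRules_gapped.txt.data"> cloze4HiddenRules_gapped.txt.data</a></li>
</ul>"""

# Capture whatever sits between the quotes of each href attribute.
# Assumption: double-quoted href directly after "<a", as in the listing.
HREF_RE = re.compile(r'<a\s+href="([^"]+)"')


def file_names(html):
    """Return the href values that look like plain file names."""
    hrefs = HREF_RE.findall(html)
    # Links containing a slash (e.g. the "Parent Directory" entry) are skipped.
    return [h for h in hrefs if "/" not in h]


print(file_names(sample_html))
```

On the real page this would be called as file_names(requests.get(file_url).text). Since the page is already parsed, a regex may not be needed at all: [a["href"] for a in soup.find_all("a", href=True)] should give the same links (including the parent directory one, which would still have to be filtered out), and it is less likely to break if the markup changes slightly.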