Python Forum
Python Challenge ~ Help
#4
Quote: one HTML like below with 100k entries
This data is not HTML as presented. Is it perhaps the browser's rendering of an HTML file?
If it is what a browser displays for an HTML file, you need to supply the actual HTML file,
or the URL of one that can be used for testing.

If it is indeed an HTML file, you're going to need to learn about parsing HTML with BeautifulSoup.
The best way to do this is to work through the web scraping tutorials written by snippsat:
part 1 here
part 2 here

Even if this is an internal HTML file, the BeautifulSoup parts of these tutorials apply.

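As for reading the file: there are FASTA files available online.
You can try a simple read on one, like:
filename = input('Enter file name: ')
maxcount = 20
# open() must be called as a function; the file is closed when the block ends
with open(filename) as f:
    # iterate over the file object directly, so a large file is never fully loaded
    for count, line in enumerate(f, start=1):
        print(line, end='')  # each line already ends with its own newline
        if count >= maxcount:
            break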
My guess is that these are newline-separated text records:
the first line being a sequence ID ('>seq0'),
and any additional lines the sequence itself (if the sequence is multi-line, that will show up in the output of the above snippet).
The snippet simply reads the file line by line and quits after maxcount lines;
it can be used as the basis for other operations.
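
If that guess about the layout holds (a '>' line carrying the ID, followed by one or more sequence lines), grouping the lines into records could look something like the sketch below. The function name read_fasta and the sample data are made up for illustration, and the real file may well differ from this assumed layout.

```python
import io


def read_fasta(handle):
    """Yield (seq_id, sequence) pairs from an open FASTA-style text file.

    Assumes each record starts with a '>' header line and that every
    following line, up to the next header, is part of the sequence.
    """
    seq_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # a new header closes the previous record, if there was one
            if seq_id is not None:
                yield seq_id, ''.join(chunks)
            seq_id = line[1:]
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    # emit the final record once the file is exhausted
    if seq_id is not None:
        yield seq_id, ''.join(chunks)


# hypothetical sample standing in for the real file
sample = io.StringIO(">seq0\nACGT\nTTGA\n>seq1\nGGCC\n")
records = list(read_fasta(sample))
print(records)
```

For a real file, pass the object returned by open(filename) instead of the StringIO sample. Because the function is a generator, it holds only one record in memory at a time, which matters with 100k entries.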

It's all doable with relative ease; we just need all the ingredients up front.


Messages In This Thread
Python Challenge ~ Help - by Takshan - Jul-07-2017, 03:32 AM
RE: Python Challenge ~ Help - by Larz60+ - Jul-07-2017, 04:01 AM
RE: Python Challenge ~ Help - by Takshan - Jul-07-2017, 04:09 AM
RE: Python Challenge ~ Help - by Larz60+ - Jul-07-2017, 10:33 AM
RE: Python Challenge ~ Help - by Takshan - Jul-07-2017, 11:01 AM
