May-31-2018, 09:04 AM
Hi there,
I've got a little problem with my code. I haven't coded in Python before, but I want to (or rather have to) for my research project.
I want to crawl a website (icobench.com) for a set of data. My research project is to gather data from that website and put it in Excel.
Here is my code so far:
import requests
from bs4 import BeautifulSoup


# Crawler for icobench.com: visits the links (sub pages) of all ICOs that have
# ended at the current time and prints their titles.
def ended_ico_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://icobench.com/icos?&filterBonus=&filterBounty=&filterMvp=&filterTeam=&filterExpert=&" \
              "filterSort=&filterCategory=all&filterRating=any&filterStatus=ended&filterPublished=&" \
              "filterCountry=any&filterRegistration=0&filterExcludeArea=none&filterPlatform=any&filterCurrency=any&" \
              "filterTrading=any&s=&filterStartAfter=&filterEndBefore=0&page=" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "lxml")
        for link in soup.findAll('a', {'class': 'name'}):
            href = "https://icobench.com/" + link.get('href')
            title = link.string
            print(title)
            get_single_ico_rating(href)
            get_single_ico_fixed_data(href)
            get_single_ico_financial_token_info(href)
            get_single_ico_financial_investment_info(href)
            # get_single_ico_whitepaper(href)
        page += 1


# Fetch the individual data blocks of each sub page. The fields are named
# after the corresponding HTML classes.
def get_single_ico_rating(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    # Data from the rating box
    for data in soup.findAll('div', {'class': ['rate color1', 'rate color2', 'rate color3', 'rate color4',
                                               'rate color5', 'col_4 col_3']}):
        print(data.text)


def get_single_ico_fixed_data(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for fixed_data in soup.findAll('div', {'class': 'col_2'}):
        print(fixed_data.text)


def get_single_ico_financial_token_info(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for financial_token_info in soup.findAll('div', {'class': 'box_left'}):
        print(financial_token_info.text)


def get_single_ico_financial_investment_info(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for investment_info in soup.findAll('div', {'class': 'box_right'}):
        print(investment_info.text)


# Here I want to find out whether a whitepaper exists on the given sub page.
# If one exists, a value X can be returned, otherwise a value Y.
# def get_single_ico_whitepaper(href):
#     source_code = requests.get(href)
#     plain_text = source_code.text
#     soup = BeautifulSoup(plain_text, "lxml")
#     for whitepaper_link in soup.findAll('div', {'class': 'onclick'}):
#         print(whitepaper_link.text)


ended_ico_spider(1)

Well, there are some parts missing, and I would be glad for any help you could offer me. Here are the points I couldn't solve even though I searched the web for hours (I guess I'm just a noob in Python):
1. I need to find out whether each ICO (sub page) has a whitepaper or not. As it's an onclick field, I don't know how to search for it and check whether a whitepaper is there.
2. The export of the data to a CSV file (for Excel): the printed output looks kind of messy at the moment. Some parts are in rows, others in columns, etc. As you might guess, I need a clean table with each ICO on a separate line and the different elements in different columns, so that I can use R or some other program to do the statistics.
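For illustration, both points can be sketched together: collect one dict per ICO instead of printing, add a whitepaper flag to it, and write the dicts out with the standard `csv` module. This is only a rough sketch under assumptions, not a tested solution for icobench.com. It assumes the whitepaper shows up as a tag whose text or `onclick` attribute contains the word "whitepaper", which has to be checked against the real HTML of a sub page. The helper names `has_whitepaper` and `write_rows_to_csv`, the column names, and the sample HTML are all made up for the example.

```python
import csv
import io

from bs4 import BeautifulSoup


def has_whitepaper(html):
    """Return True if any tag's text or onclick attribute mentions 'whitepaper'.

    Assumption: the real page marks the whitepaper this way. Inspect the
    actual HTML of a sub page and adjust the check if it does not.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(True):  # True matches every tag
        onclick = tag.get("onclick") or ""
        text = tag.string or ""
        if "whitepaper" in onclick.lower() or "whitepaper" in text.lower():
            return True
    return False


def write_rows_to_csv(rows, fieldnames, out_file):
    """Write one line per ICO (a dict) with one column per field name."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up HTML standing in for two downloaded sub pages.
page_with = '<div class="tabs"><a onclick="openTab(\'whitepaper\')">White paper</a></div>'
page_without = '<div class="tabs"><a href="/about">About</a></div>'

# One dict per ICO; in the crawler, the scraped fields would be added here.
rows = [
    {"title": "ExampleCoin", "has_whitepaper": has_whitepaper(page_with)},
    {"title": "OtherCoin", "has_whitepaper": has_whitepaper(page_without)},
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_rows_to_csv(rows, ["title", "has_whitepaper"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of the in-memory buffer, pass a file opened with `open("icos.csv", "w", newline="")` (the file name is just an example). In the crawler, each `get_single_ico_...` function would then return its values rather than print them, so they can be merged into the dict for that ICO.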
I would be glad for any help!! Thanks a lot in advance for any support.
aStudent (in urgent need of help)