Python Forum
Class CryptoCSV: scrapes a given crypto-currency's daily historical data 2013-present
#1
I also have a GitHub repo for this. Here's the code:

#! /usr/bin/python3
# Scrape coinmarketcap's historical cryptocurrency datasets.
# Write date, open, high, low, close, volume, & market capitalization
# to a tab-delimited .csv in the current directory. ex: bitcoin20181120.csv
# Data for a given currency will be written to the .csv file
# from April 28, 2013 to the current date
# usage: python3 crypto_csv_writer.py -c currency-name
# ./crypto_csv_writer.py -c bitcoin
#
import requests
import argparse
from datetime import datetime
from sys import exit
import re
import csv


class CryptoCSV:
    def __init__(self, currency_type):
        self.today = datetime.now().strftime("%Y%m%d")
        self.currency = currency_type
        self.matches = []
        self.__build_url()

    def __build_url(self):
        self.url = "https://coinmarketcap.com/currencies/"
        self.url += self.currency
        self.url += "/historical-data/?start=20130428&end="
        self.url += self.today

    def __search_regex(self):
        req = requests.get(self.url)
        reg_str = r"<td.*\w+.*</td>"
        self.matches = re.findall(reg_str, req.text)

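    def __get_data(self):
        # A cell whose first three characters are letters is taken to be
        # the date cell that begins a new row.
        csv_row = []
        for td_tag in self.matches:
            match = re.search(">(.*)<", td_tag)
            if not match:
                continue
            cell = match.group(1)
            if cell[:3].isalpha() and csv_row:
                yield csv_row
                csv_row = []
            csv_row.append(cell)
        # The final row has no following date cell to trigger a yield.
        if csv_row:
            yield csv_row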

    def create_csv(self):
        self.__search_regex()
        meta_data = [
            "date", "open", "high", "low",
            "close", "volume", "mkt_cap"
        ]
        file_name = self.currency + self.today + ".csv"
        with open(file_name, "w", newline="") as csv_file:
            writer = csv.writer(csv_file, delimiter='\t')
            writer.writerow(meta_data)
            for line in self.__get_data():
                writer.writerow(line)


if __name__ == "__main__":
    valid_currencies = [
        "bitcoin", "litecoin", "ripple", "ethereum",
        "bitcoin-cash", "reddcoin", "stellar", "eos",
        "cardano", "monero", "tron", "iota", "dash",
        "factom", "nem", "neo", "ethereum-classic",
        "tezos", "zcash", "bitcoin-gold", "ark", "vechain",
        "ontology", "dogecoin", "decred", "qtum", "lisk",
        "bytecoin", "bitcoin-diamond", "icon",
        "bitshares", "nano", "digibyte", "siacoin", "steem",
        "bytom", "waves", "metaverse", "verge", "stratis",
        "electroneum", "komodo", "cryptonex", "ardor",
        "wanchain", "monacoin", "moac", "pivx", "horizen",
        "ravencoin", "gxchain", "huobi-token"
    ]
    message = 'Enter a currency type'
    parser = argparse.ArgumentParser(description='-c currency name argument')
    parser.add_argument('-c', '--currency', required=True, help=message)
    currency_arg = vars(parser.parse_args())
    if currency_arg['currency'] not in valid_currencies:
        print("Please enter a valid cryptocurrency...")
        exit(1)  # non-zero status signals the failure to the shell
    CryptoCSV(currency_arg['currency']).create_csv()
#2
Nicely done (especially your README), but the code is a bit hard to read!

If you feel like taking it a step further, here are some recommendations for cleaning up:
  • Use an HTML parser - this will make data extraction code much simpler and clearer than using re
  • Try to avoid __private names (they look ugly and trigger name mangling); the (pretty strong) convention is a single leading underscore for private attributes
  • Let the library (requests in your case) handle the arguments in the url instead of concatenating strings yourself
  • Regarding argparse:
    • I'd use description for something more like "A script that does this:"; a usage message is already provided by argparse
    • Don't call vars(), use the dot syntax (i.e. currency_arg.currency) to access the arguments as intended.
    • You don't need the error handling code: .parse_args() will fail with a helpful message if a required argument is missing, and passing your list as choices= makes it reject unknown currency names as well.
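Here's a rough sketch of the requests and argparse points, not a drop-in replacement. The query string goes to requests as a params dict, the currency list goes to argparse as choices=, and the parsed arguments are read with dot syntax. The names parse_args, history_request, fetch_history and the shortened VALID_CURRENCIES list are made up for illustration; the URL and start date are copied from your script, and I haven't checked whether the site still answers that URL.

```python
import argparse

# Shortened, illustrative list; the real script has many more entries.
VALID_CURRENCIES = ["bitcoin", "litecoin", "ethereum"]

# URL pattern copied from the original script.
BASE_URL = "https://coinmarketcap.com/currencies/{currency}/historical-data/"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Write a currency's daily historical data to a .csv file."
    )
    # choices= makes argparse itself reject unknown currency names.
    parser.add_argument("-c", "--currency", required=True,
                        choices=VALID_CURRENCIES, help="currency name")
    return parser.parse_args(argv)


def history_request(currency, end, start="20130428"):
    """Return the page URL and its query parameters as separate pieces."""
    return BASE_URL.format(currency=currency), {"start": start, "end": end}


def fetch_history(currency, end):
    # Third-party dependency, imported here so the functions above can be
    # tried without it installed.
    import requests

    url, params = history_request(currency, end)
    # requests encodes the dict and appends it to the URL as ?start=...&end=...
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    args = parse_args(["-c", "bitcoin"])
    print(args.currency)  # dot syntax, no vars() needed
    print(history_request(args.currency, "20181120"))
```

Keeping the URL and the parameters apart also makes the date range easy to change later without touching string concatenation.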
Another thing I noticed is you're only calling one method of your class and not really using state in it.
You might want to consider just using a few functions instead (but it's mostly a preference).
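To make the HTML parser and "just a few functions" suggestions concrete, here's a minimal sketch using only the standard library's html.parser (a third-party parser would likely be shorter). The stdlib parser needs one small subclass, but the top-level flow is plain functions. It assumes each data row is a <tr> made of <td> cells, which I haven't verified against the real page. The sample table and its numbers are invented, and extract_rows / write_rows are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Gather the text of <td> cells, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text pieces of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows holding only <th> cells stay empty: skip
                self.rows.append(self._row)
            self._row, self._cell = None, None


def extract_rows(html):
    """Return every table row in the HTML as a list of cell strings."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_rows(rows, header, out):
    """Write the header and rows to an open file object, tab-delimited."""
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(header)
    writer.writerows(rows)


# Invented sample data, only here to exercise the functions.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Open</th></tr>
  <tr><td>Nov 20, 2018</td><td>1.00</td></tr>
  <tr><td>Nov 19, 2018</td><td>2.00</td></tr>
</table>
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows(extract_rows(SAMPLE), ["date", "open"], buffer)
    print(buffer.getvalue())
```

Because rows are grouped by their <tr>, there's no need to guess where a row starts from the first letters of a cell.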
#3
Thank you :D

