Issues with csv double quotes
#1
Hi guys, I wrote a BeautifulSoup script that exports numeric data to a CSV file, but I can't get rid of the double quotes around the values.
The quotes stop me from doing any calculations on the data later.
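For context, a minimal sketch of what is happening: `csv.writer` only quotes a field when the field itself contains the delimiter, so scraped numbers with thousands separators like `9,121.60` come out wrapped in double quotes (the values below are stand-ins from the table further down):

```python
import csv
import io

# A field containing a comma gets quoted by csv.writer; a plain number does not.
buf = io.StringIO()
csv.writer(buf).writerow(['Mar 07, 2020', '9,121.60', '36216930370'])
print(buf.getvalue())  # → "Mar 07, 2020","9,121.60",36216930370
```

So the quotes are not the real problem: the commas inside the values are, and pandas then reads those columns as strings.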


Bitcoin data export:
import requests
import csv

import pandas as pd
from bs4 import BeautifulSoup as bs


dateList = []
openList = []
highList = []
lowList  = []
closeList= []
volumeList= []
MCapList = []

r = requests.get('https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end=20200315')

soup = bs(r.text,'lxml')


tr = soup.findAll('tr', {'class': 'cmc-table-row'})

for item in tr:
    tds = item.find_all('td')  # look up the row's cells once
    dateList.append(tds[0].text)
    openList.append(tds[1].text)
    highList.append(tds[2].text)
    lowList.append(tds[3].text)
    closeList.append(tds[4].text)
    volumeList.append(tds[5].text)
    MCapList.append(tds[6].text)

row0 = ['Dates', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Capitalization']
rows = zip(dateList, openList, highList, lowList, closeList, volumeList, MCapList)

with open('bitcoinHistoricalPrice.csv', 'w', encoding='utf-8', newline='') as csvfile:
    links_writer = csv.writer(csvfile)
    links_writer.writerow(row0)
    for row in rows:
        links_writer.writerow(row)
        
# dfTable = pd.DataFrame({'Dates': dateList,'Open':openList ,'High':highList, 'Low': lowList, 'Close':closeList, 'Volume':volumeList, 'Market Capitalization': MCapList})
Trying to manipulate the data:

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['figure.figsize'] = (20.0, 10.0)

#Read the data
data = pd.read_csv('bitcoinHistoricalPrice.csv')
print(data.shape)
data.head()

# Can't use the values because it's in string format

#Collect X and Y
X = data['Low'].values
Y = data['High'].values

#Mean X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)
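One way around the string columns (a sketch with stand-in rows, not tested against the live site) is to strip the thousands separators and convert with `pd.to_numeric` after reading the CSV:

```python
import pandas as pd

# Stand-in for two rows of the scraped CSV; column names match the header row above.
data = pd.DataFrame({'Low': ['8,890.74', '8,105.25'],
                     'High': ['9,163.22', '8,914.34']})

# Remove the thousands separators, then convert the strings to floats.
for col in ['Low', 'High']:
    data[col] = pd.to_numeric(data[col].str.replace(',', '', regex=False))

print(data.dtypes)  # Low and High are now float64
```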
#2
Pandas can read tables straight from a website with pd.read_html, so there is no need to do the parsing yourself.
Example:
>>> import pandas as pd
...
... df = pd.read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end=20200315")
... df = df[2]

>>> df.head(10)
           Date    Open*     High      Low  Close**       Volume    Market Cap
0  Mar 16, 2020  5385.23  5385.23  4575.36  5014.48  45368026430   91633478850
1  Mar 15, 2020  5201.07  5836.65  5169.28  5392.31  33997889639   98530059890
2  Mar 14, 2020  5573.08  5625.23  5125.07  5200.37  36154506008   95014981944
3  Mar 13, 2020  5017.83  5838.11  4106.98  5563.71  74156772075  101644613038
4  Mar 12, 2020  7913.62  7929.12  4860.35  4970.79  53980357243   90804613601
5  Mar 11, 2020  7910.09  7950.81  7642.81  7911.43  38682762605  144508402671
6  Mar 10, 2020  7922.15  8136.95  7814.76  7909.73  42213940994  144465567734
7  Mar 09, 2020  8111.15  8177.79  7690.10  7923.64  46936995808  144706353758
8  Mar 08, 2020  8908.21  8914.34  8105.25  8108.12  39973102121  148060284561
9  Mar 07, 2020  9121.60  9163.22  8890.74  8909.95  36216930370  162684945903


>>> df.dtypes
Date           object
Open*         float64
High          float64
Low           float64
Close**       float64
Volume          int64
Market Cap      int64
dtype: object


>>> df['High'].max()
20089.0 
All types come out correct except Date, so the fix for the date column would be:
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df.dtypes
Date          datetime64[ns]
Open*                float64
High                 float64
Low                  float64
Close**              float64
Volume                 int64
Market Cap             int64
dtype: object
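With Date parsed as datetime64, the frame can be indexed by time and queried directly (a quick sketch with two stand-in rows from the table above):

```python
import pandas as pd

# Two stand-in rows; column names follow the read_html table above.
df = pd.DataFrame({'Date': ['Mar 15, 2020', 'Mar 16, 2020'],
                   'High': [5836.65, 5385.23]})
df['Date'] = pd.to_datetime(df['Date'])

# A datetime index allows sorting and date-based lookups.
df = df.set_index('Date').sort_index()
print(df['High'].idxmax())  # → the day with the highest 'High'
```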