Posts: 14
Threads: 5
Joined: Dec 2019
Hello there!
I'm having a problem with copying a .csv file into a .xlsx file. It does work because the information is there but the numbers do not register as numbers but as text values which causes the formulas in my .xlsx file to not work.
Keep in mind that I'm trying to copy my .csv sheet into a specific sheet in my .xlsx file, hence the sheet name. Here is my function:
import pandas as pd
import openpyxl
def write_to_target(xlsx_file, csv_file, sheet_name):
book = openpyxl.load_workbook(xlsx)
sheet_book = pd.read_csv(csv_file)
with pd.ExcelWriter(xlsx_file, engine='openpyxl') as writer:
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
sheet_book.to_excel(writer, sheet_name=sheet_name, startrow=0, startcol=0, engine = 'openpyxl') Thanks in advance!
Posts: 6,799
Threads: 20
Joined: Feb 2020
There must be something in your csv file that is confusing the reader. Do you have multi-row column headers in the csv file? Do you have ragged columns (not all the same length). Do you have column footers?
After reading the csv file print some of the data. Start with first row/first column and last row/last column. Is there any text where you expect to see a number? If so you might have to give the reader some extra help, tell it to skip some rows, or tell it what the data type is for a column
Posts: 14
Threads: 5
Joined: Dec 2019
(Mar-28-2020, 05:44 AM)deanhystad Wrote: There must be something in your csv file that is confusing the reader. Do you have multi-row column headers in the csv file? Do you have ragged columns (not all the same length). Do you have column footers?
After reading the csv file print some of the data. Start with first row/first column and last row/last column. Is there any text where you expect to see a number? If so you might have to give the reader some extra help, tell it to skip some rows, or tell it what the data type is for a column
Thanks for responding! But I don't understand what you mean. The .csv files are financial statements from Morningstar.com. The excel sheets are recieved from [url=("http://financials.morningstar.com/balance-sheet/bs.html?t="+ticker+"®ion=usa&culture=en-US")]][/url]
For example, you can get microsoft's balance sheet at http://financials.morningstar.com/balanc...ture=en-US
I basically I want the .csv files to be pasted into a sheet in an existing xlsx file. But since it pastes as text and not numbers, the formulas that reference it in other excel files don't work.
Posts: 8,160
Threads: 160
Joined: Sep 2016
When reading from csv you will get str, unless you do conversion. You don't show how you read the file, but you can do the conversion either when reading or when writing to excel
Posts: 14
Threads: 5
Joined: Dec 2019
(Mar-28-2020, 06:51 AM)buran Wrote: When reading from csv you will get str, unless you do conversion. You don't show how you read the file, but you can do the conversion either when reading or when writing to excel
So would I have to convert the .csv to an .xlsx file or other form? is there a file conversion that can keep the formating and the number values of the .csv file?
Posts: 8,160
Threads: 160
Joined: Sep 2016
Mar-28-2020, 08:24 AM
(This post was last modified: Mar-28-2020, 08:24 AM by buran.)
can you show how your csv file data looks like? I overlooked that you read the csv file in the function. pandas should detect the type correctly.
I ask because the file you refer to - MSFT balance sheet is not well suited for direct reading with pandas.read_csv
Try to replace
sheet_book = pd.read_csv(csv_file) with
sheet_book = pd.read_csv(csv_file, skiprows=1) this way you will skip the first line MICROSOFT CORP (MSFT) CashFlowFlag BALANCE SHEET and will get a proper dataframe
Posts: 14
Threads: 5
Joined: Dec 2019
Mar-28-2020, 09:38 AM
(This post was last modified: Mar-28-2020, 09:39 AM by YoshikageKira.)
Update: I figured it out
import sys
import openpyxl
import csv
import glob
from openpyxl.utils import get_column_letter
def write_to_target(target, directory, sheet_name):
book = openpyxl.load_workbook(target)
ws = book.get_sheet_by_name(sheet_name)
f = open(directory)
reader = csv.reader(f)
for row_index, row in enumerate(reader):
for column_index, cell in enumerate(row):
column_letter = get_column_letter((column_index + 1))
s = cell
try:
s=float(s)
except ValueError:
pass
ws.cell('%s%s'%(column_letter, (row_index + 1))).value = s Just replace target with the .xlsx file and directory with the .csv file
Thanks to everyone who've responded to this post
Posts: 8,160
Threads: 160
Joined: Sep 2016
Mar-28-2020, 10:38 AM
(This post was last modified: Mar-28-2020, 10:38 AM by buran.)
Assuming the xlsx file exists and there is sheet with sheet_name in it your original code would have worked with the small suggestion I made:
import pandas as pd
import openpyxl
def write_to_target(xlsx_file, csv_file, sheet_name):
book = openpyxl.load_workbook(xlsx_file)
sheet_book = pd.read_csv(csv_file, skiprows=1)
with pd.ExcelWriter(xlsx_file, engine='openpyxl') as writer:
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
sheet_book.to_excel(writer, sheet_name=sheet_name, startrow=0, startcol=0, engine = 'openpyxl')
MSFT.xlsx (Size: 7.5 KB / Downloads: 0)
|