Hello there!
I'm having a problem with copying a .csv file into a .xlsx file. It does work because the information is there but the numbers do not register as numbers but as text values which causes the formulas in my .xlsx file to not work.
Keep in mind that I'm trying to copy my .csv sheet into a specific sheet in my .xlsx file, hence the sheet name. Here is my function:
import pandas as pd
import openpyxl
def write_to_target(xlsx_file, csv_file, sheet_name):
book = openpyxl.load_workbook(xlsx)
sheet_book = pd.read_csv(csv_file)
with pd.ExcelWriter(xlsx_file, engine='openpyxl') as writer:
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
sheet_book.to_excel(writer, sheet_name=sheet_name, startrow=0, startcol=0, engine = 'openpyxl')
Thanks in advance!
There must be something in your csv file that is confusing the reader. Do you have multi-row column headers in the csv file? Do you have ragged columns (not all the same length). Do you have column footers?
After reading the csv file print some of the data. Start with first row/first column and last row/last column. Is there any text where you expect to see a number? If so you might have to give the reader some extra help, tell it to skip some rows, or tell it what the data type is for a column
(Mar-28-2020, 05:44 AM)deanhystad Wrote: [ -> ]There must be something in your csv file that is confusing the reader. Do you have multi-row column headers in the csv file? Do you have ragged columns (not all the same length). Do you have column footers?
After reading the csv file print some of the data. Start with first row/first column and last row/last column. Is there any text where you expect to see a number? If so you might have to give the reader some extra help, tell it to skip some rows, or tell it what the data type is for a column
Thanks for responding! But I don't understand what you mean. The .csv files are financial statements from Morningstar.com. The excel sheets are recieved from [url=("http://financials.morningstar.com/balance-sheet/bs.html?t="+ticker+"®ion=usa&culture=en-US")]][/url]
For example, you can get microsoft's balance sheet at
http://financials.morningstar.com/balanc...ture=en-US
I basically I want the .csv files to be pasted into a sheet in an existing xlsx file. But since it pastes as text and not numbers, the formulas that reference it in other excel files don't work.
When reading from csv you will get str, unless you do conversion. You don't show how you read the file, but you can do the conversion either when reading or when writing to excel
(Mar-28-2020, 06:51 AM)buran Wrote: [ -> ]When reading from csv you will get str, unless you do conversion. You don't show how you read the file, but you can do the conversion either when reading or when writing to excel
So would I have to convert the .csv to an .xlsx file or other form? is there a file conversion that can keep the formating and the number values of the .csv file?
can you show how your csv file data looks like? I overlooked that you read the csv file in the function. pandas should detect the type correctly.
I ask because the file you refer to - MSFT balance sheet is not well suited for direct reading with pandas.read_csv
Try to replace
sheet_book = pd.read_csv(csv_file)
with
sheet_book = pd.read_csv(csv_file, skiprows=1)
this way you will skip the first line
MICROSOFT CORP (MSFT) CashFlowFlag BALANCE SHEET
and will get a proper dataframe
Update: I figured it out
import sys
import openpyxl
import csv
import glob
from openpyxl.utils import get_column_letter
def write_to_target(target, directory, sheet_name):
book = openpyxl.load_workbook(target)
ws = book.get_sheet_by_name(sheet_name)
f = open(directory)
reader = csv.reader(f)
for row_index, row in enumerate(reader):
for column_index, cell in enumerate(row):
column_letter = get_column_letter((column_index + 1))
s = cell
try:
s=float(s)
except ValueError:
pass
ws.cell('%s%s'%(column_letter, (row_index + 1))).value = s
Just replace target with the .xlsx file and directory with the .csv file
Thanks to everyone who've responded to this post
Assuming the xlsx file exists and there is sheet with sheet_name in it your original code would have worked with the small suggestion I made:
import pandas as pd
import openpyxl
def write_to_target(xlsx_file, csv_file, sheet_name):
book = openpyxl.load_workbook(xlsx_file)
sheet_book = pd.read_csv(csv_file, skiprows=1)
with pd.ExcelWriter(xlsx_file, engine='openpyxl') as writer:
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
sheet_book.to_excel(writer, sheet_name=sheet_name, startrow=0, startcol=0, engine = 'openpyxl')
[
attachment=809]