Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 Very basic help - python & Excel
#1
Hi All,

I have an excel file with two columns: table and number.

All I want to do is use Python to load an excel file, loop through each row and write the row to a new file as many times as the respective number.

For example the excel file will have (in reality is's much bigger of course):
apples, 1
oranges, 3
grapefruit, 1
banana, 2

The new file will look like this:
apples (1)
oranges (1)
oranges (2)
oranges (3)
grapefruit (1)
bananas (1)
bananas (2)

I've spent 3 hours solid trying to find a way to do this and not even got close to a solution. I've been able to use pandas to read in the data and print it but I can't get my head around how to do the above. I'm now confused to which plugin/tool I should even use.

Going to be honest - I'm teaching myself Python and I am a beginner. I'd usually say I learn really quick but this proving tough. While I appreciate it takes time to learn the basics I do need to get this work done (ideally not manually).

I've got no code of any value to add other printing the two columns from a panda data frame which I'm not sure is even the correct way to go.

Not looking for someone to do my work for me but I could really do with a kind pointer or two to get me going.

As note I'm running Python 3 on Windows 10 installed using Anaconda.

I'm writing my code in Notepad++ and executing in Jupyter QTConsole.

Many thanks
Richard
Quote
#2
look at pandas:
https://pandas.pydata.org/pandas-docs/st...rials.html
short tutorial: https://www.learnpython.org/en/Pandas_Basics
Quote
#3
Thanks for the input Larz60+

Still can't seem to find reference in those links that help.

I have now started to get somewhere with the following:

# Import `load_workbook` module from `openpyxl`
from openpyxl import load_workbook

# Load in the workbook
wb = load_workbook('mystuff.xlsx')

# Get sheet names
print(wb.sheetnames)

#activate a sheet into array
sheet = wb['Sheet1']

#print value of A1
print(sheet['A1'].value)

for row in sheet.iter_rows():
	for cell in row:
		#if cell.value in row > '1':
		#print(cell.value)
		print(cell.value)
Having trouble applying a condition inside the for loop.

Getting closer.
Quote
#4
if cell.value in row > '1':
this checks for value greater than string '1'
id this what you want, or is it integer (no quotes)
Quote
#5
pandas is a great package - but it's not trivial, and it's recommended to learn Python some before you dive into pandas. openpyxl documentation have some good examples, as far as I've seen.

Again, without understanding at least looping in Python, you will be lost. This video is highly recommended for beginners
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Quote
#6
Thanks for the help and pointers - much appreciated. I am still working through through the datacamp lessons but it takes time.

In parallel I'm now starting to get somewhere with my project but failing to convert the row array value into an integer:

# select all populated rows and iterate through		
for row in sheet.iter_rows(min_col=1, min_row=1, max_col=2, max_row=sheet.max_row):
	# occur = int(row[1].value) 
	if row[1].value == 1:
		pass
	else:
		print(row[0].value)		
Quote
#7
You can read Excel sheets using pandas with the help of xlrd module
First, install xlrd:
pip install xlrd
Then you can read from Excel like this:
pandas.read_excel('YourExcelFile.xlsx', 'sheet1')
Check this tutorial for more about pandas and how to get data from Excel easily.
Hope that helps.
Regards,
Quote

Top Page

Forum Jump:


Users browsing this thread: 1 Guest(s)