Python Forum
Working with dataframes
#1
I have 3 "datasets":
genes_mouse_human = data frame where one of the columns is "gene symbol_human" and another is "gene symbol mouse"
data = data frame where one of the columns is "gene symbol"
genes_list_mouse = a series of gene names (I created it by filtering another data frame with some condition)

What I'm trying to do now is:
run through genes_list_mouse, and for each item:
1) find the mouse gene symbol and look up the corresponding human symbol
2) find the row in "data" that holds this gene
3) do some calculations on this row

Currently I'm having trouble completing 1 and 2.
This is what I tried to do:

for i in genes_list_mouse:
    gene_human = genes_mouse_human [genes_mouse_human ['Symbol_mouse']==i]['Symbol_human'].astype("string")
    ind = data.where(data['gene symbol']==gene_human.values)
I get the following error:
ValueError: ('Lengths must match to compare', (20530,), (1,))

Can you please help me understand why it's not working and what the best way is to do these simple steps?
Thanks!
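One way to do steps 1 to 3 without a loop is to join the tables with DataFrame.merge (a swapped-in technique, not the code from the post). This is a minimal sketch that assumes the column names Symbol_mouse and Symbol_human from the posted code plus "gene symbol" in data; the toy gene values, the "expression" column and the doubling calculation are hypothetical placeholders.

```python
import pandas as pd

# Toy stand-ins for the three objects in the post (hypothetical values).
genes_mouse_human = pd.DataFrame({
    "Symbol_mouse": ["Trp53", "Brca1", "Actb"],
    "Symbol_human": ["TP53", "BRCA1", "ACTB"],
})
data = pd.DataFrame({
    "gene symbol": ["ACTB", "TP53", "EGFR"],
    "expression": [10.0, 2.5, 7.0],  # hypothetical numeric column
})
genes_list_mouse = pd.Series(["Trp53", "Actb", "Xyz1"])  # "Xyz1" has no mapping

# Step 1: attach the human symbol to each mouse gene in the list.
mapped = genes_list_mouse.to_frame(name="Symbol_mouse").merge(
    genes_mouse_human, on="Symbol_mouse", how="inner"
)

# Step 2: pull the matching rows of `data` via the human symbol.
rows = mapped.merge(data, left_on="Symbol_human", right_on="gene symbol", how="inner")

# Step 3: placeholder calculation, applied to all matched rows at once.
rows["expression_x2"] = rows["expression"] * 2
print(rows[["Symbol_mouse", "Symbol_human", "expression", "expression_x2"]])
```

Inner joins silently drop mouse genes that have no human symbol (Xyz1 here) or no row in data; how="left" keeps them so they can be inspected.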
#3
Line 2 seems particularly problematic.
First, tidy up the code: get rid of the space between genes_mouse_human and the opening bracket in both places. That space is legal Python and purely a style issue, though, so it is not the cause of the error.
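Recognize that == compares element by element and resolves to a Series of True/False values, not to the row you are looking for. Line 2 produces a one-element Series rather than a single string, so on line 3 the comparison pits the 20530 entries of data['gene symbol'] against an array of length 1. pandas refuses to compare arrays of different lengths, hence the ValueError. Compare against a single value instead, e.g. gene_human.iloc[0].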
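The length mismatch behind this error can be reproduced with a small toy frame (hypothetical values; only the column name "gene symbol" comes from the post). Comparing a Series against a one-element array raises the same ValueError, while comparing against a scalar broadcasts it and yields a boolean mask with one entry per row.

```python
import pandas as pd

data = pd.DataFrame({"gene symbol": ["ACTB", "TP53", "EGFR"]})  # toy data
gene_human = pd.Series(["TP53"])  # one-element Series, like line 2 of the post

# Series vs. 1-element array: pandas checks the lengths first and raises.
error = None
try:
    _ = data["gene symbol"] == gene_human.values
except ValueError as exc:
    error = exc
print(repr(error))

# Series vs. scalar: the scalar is broadcast, giving one True/False per row.
mask = data["gene symbol"] == gene_human.iloc[0]
print(mask.tolist())  # [False, True, False]
print(data[mask])
```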
Look at the pandas indexers .loc and .iloc; I think those will help you find values in your dataframe properly.
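Keeping the loop from the question, a minimal sketch with .loc and .iloc could look like this. It assumes each mouse symbol appears only once in genes_mouse_human (otherwise .loc returns a Series rather than a string); the toy data, the "expression" column and the calculation are hypothetical.

```python
import pandas as pd

genes_mouse_human = pd.DataFrame({
    "Symbol_mouse": ["Trp53", "Brca1", "Actb"],
    "Symbol_human": ["TP53", "BRCA1", "ACTB"],
})
data = pd.DataFrame({
    "gene symbol": ["ACTB", "TP53", "EGFR"],
    "expression": [10.0, 2.5, 7.0],  # hypothetical numeric column
})
genes_list_mouse = pd.Series(["Trp53", "Actb", "Xyz1"])

# Index the lookup table by mouse symbol so .loc can fetch the human symbol directly.
mouse_to_human = genes_mouse_human.set_index("Symbol_mouse")["Symbol_human"]

results = {}
for mouse_symbol in genes_list_mouse:
    if mouse_symbol not in mouse_to_human.index:
        continue  # no human symbol listed for this mouse gene; skip it
    human_symbol = mouse_to_human.loc[mouse_symbol]  # a plain string if symbols are unique
    # Boolean mask with a scalar on the right-hand side: same length as `data`.
    matches = data.loc[data["gene symbol"] == human_symbol]
    if matches.empty:
        continue  # human symbol not present in `data`
    row = matches.iloc[0]  # first matching row, as a Series
    results[mouse_symbol] = float(row["expression"]) * 2  # placeholder calculation

print(results)  # {'Trp53': 5.0, 'Actb': 20.0}
```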