 Could anyone help me get the jaccard distance between my dataframes please? :)
Hi there, I'm doing a piece of homework and I'm almost done, but the last part has me pretty stumped. I have four files to work with, and I have to get the top n values from each file, n being a user-defined number, and then calculate the jaccard distance between each pair of files. So far I've done everything except the jaccard distance.

The jaccard distance is defined as (the number of unique values across the pair of files - the number of values that occur in both files) / the number of unique values across the pair of files again.

So, for example, if the two files held ten values between them and two of those occurred in both files, there would be eight unique values, so the distance would be (8-2)/8 = 0.75.
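The definition above maps fairly directly onto Python sets, so here is a minimal sketch of it. The function name `jaccard_distance` and the sample k-mers are made up for illustration, and returning 0.0 for two empty inputs is an assumption rather than part of the assignment.

```python
def jaccard_distance(a, b):
    """Return (|A union B| - |A intersect B|) / |A union B| for two iterables."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b      # every unique value across the pair
    shared = set_a & set_b     # values that occur in both
    if not union:
        return 0.0             # assumption: two empty inputs count as identical
    return (len(union) - len(shared)) / len(union)


# Hypothetical k-mers: ten values in total, two of them shared, eight unique.
file_1 = ["AAA", "AAC", "AAG", "AAT", "ACA"]
file_2 = ["AAA", "AAC", "CCC", "CCG", "CCT"]
print(jaccard_distance(file_1, file_2))  # 0.75
```

Converting to sets first means repeated values within one file are only counted once, which matches the "unique values" wording in the definition.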

I have made four separate dataframes out of the four files, each one only showing n top values. This is where I don't really know how to proceed. I have tried merging the dataframes into one big dataframe with only the relevant columns (the K-mers A-D columns) but that didn't work out for me. I don't know how to retrieve the number of unique or reoccurring values, especially when the dataframes are separated, so I'm quite stuck. I'd really appreciate any help :)

Here's my code:

import pandas as pd
import numpy as np

requested_path_A = input('Enter file path for species A:')
requested_path_B = input('Enter file path for species B:')
requested_path_C = input('Enter file path for species C:')
requested_path_D = input('Enter file path for species D:')


# Example input:
# Enter file path for species A:H:\Bioinformatics_Resit\Species_A.txt
# Enter file path for species B:H:\Bioinformatics_Resit\Species_B.txt
# Enter file path for species C:H:\Bioinformatics_Resit\Species_C.txt
# Enter file path for species D:H:\Bioinformatics_Resit\Species_D.txt

n = int(input('Enter your chosen value for n:'))

# Example input:
# Enter your chosen value for n:7

# Raw strings keep '\s+' from being read as an invalid escape sequence
df1 = pd.read_csv(requested_path_A, sep=r'\s+', names=["K-mers A", "Count A"])
df2 = pd.read_csv(requested_path_B, sep=r'\s+', names=["K-mers B", "Count B"])
df3 = pd.read_csv(requested_path_C, sep=r'\s+', names=["K-mers C", "Count C"])
df4 = pd.read_csv(requested_path_D, sep=r'\s+', names=["K-mers D", "Count D"])

# Keep only the n rows with the highest counts in each dataframe
df1 = df1.nlargest(n, 'Count A')
df2 = df2.nlargest(n, 'Count B')
df3 = df3.nlargest(n, 'Count C')
df4 = df4.nlargest(n, 'Count D')

Edit: I forgot how forums work and didn't realise my comment on a previous thread would bump it up :/
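
One possible way to handle the separate dataframes is to skip merging entirely, turn each K-mers column into a set, and loop over every pair with `itertools.combinations`. This is only a sketch: `pairwise_jaccard` and the sample k-mers are hypothetical, and plain lists stand in for the dataframe columns so the example runs on its own.

```python
from itertools import combinations


def pairwise_jaccard(kmers_by_species):
    """Map each unordered pair of species names to its Jaccard distance."""
    sets = {name: set(values) for name, values in kmers_by_species.items()}
    distances = {}
    for a, b in combinations(sets, 2):
        union = sets[a] | sets[b]     # unique values across the pair
        shared = sets[a] & sets[b]    # values present in both
        # assumption: a pair with no values at all gets distance 0.0
        distances[(a, b)] = (len(union) - len(shared)) / len(union) if union else 0.0
    return distances


# Hypothetical top-3 k-mers per species, standing in for the dataframe columns.
top_kmers = {
    "A": ["AAA", "AAC", "AAG"],
    "B": ["AAA", "AAC", "CCC"],
    "C": ["GGG", "GGT", "GGA"],
    "D": ["AAA", "AAC", "AAG"],
}
for pair, distance in pairwise_jaccard(top_kmers).items():
    print(pair, distance)
```

With the dataframes from the code above, the dictionary could presumably be built as `{"A": df1["K-mers A"], "B": df2["K-mers B"], "C": df3["K-mers C"], "D": df4["K-mers D"]}`, since a pandas Series can be passed straight to `set()`. Four species give six pairs.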
