Python Forum

I have an array of nodes and edges like this (in reality, there are over 200,000 edges):

[101_123,12312_2345,346_234,45676_345,234_457,568_234,234_346,23297823_23834734]

for example in this case, 101_123 means that there are two nodes, called 101 and 123, that have an interaction/edge between them. I would like to randomise this network using an edge swap approach, while maintaining the topology of the network (i.e. the degree of the node after randomisation should be the same before and after randomisation). I cannot do this with networkx, as the network is too big/the randomisation takes too long, so I must code it myself.

The code I have is like this:

import random
import sys
## First, make a list as described above. Put the smaller number first in all cases (is this necessary?)
list_of_edges = []
for line in open(sys.argv[1]):
    line = line.strip().split()
    if int(line[0]) < int(line[1]):
        edge = line[0] + "_" + line[1]
    elif int(line[0]) > int(line[1]):
        edge = line[1] + "_" + line[0]
    list_of_edges.append(edge)

#The aim: 5 times, take two random edges from the array (e.g. A_B and C_D) and swap them (e.g. A_C and B_D). update the list to include the new swap and then use this as the base list for the next iteration (so this time A_C and B_D could become e.g. B_C and A_D for example). Once the iteration has been done 5 times, print the final randomised network.

number_of_times = len(list_of_edges)
for i in range(5): #This number will be changed, it is just for testing
    random_edge1 = random.choice(list_of_edges)
    random_edge2 = random.choice(list_of_edges)
    list_of_edges = []
    split_random_edge1 = random_edge1.split("_")
    split_random_edge2 = random_edge2.split("_")
    random.shuffle(split_random_edge1)
    random.shuffle(split_random_edge2)
    new_list1 = [split_random_edge1[0],split_random_edge2[0]]
    new_list2 = [split_random_edge1[1],split_random_edge2[1]]
    if int(new_list1[0]) < (new_list1[1]):
        edge1 = new_list1[0] + "_" + new_list1[1]
    elif int(new_list1[0]) > int(new_list1[1]):
        edge1 = new_list1[1] + "_" + new_list1[0]
    list_of_edges.append(edge1)
    if int(new_list2[0]) < (new_list2[1]):
       edge2 = new_list2[0] + "_" + new_list2[1]
    elif int(new_list2[0]) > int(new_list2[1]):
       edge2 = new_list2[1] + "_" + new_list2[0]
    list_of_edges.append(edge2)
    print list_of_edges

print "*"
print list_of_edges

As you can see, I'm stuck because with each iteration, I do not update the list to the newest set of edges. Also, the print statement at the end does not print out the correct randomised network. If someone could demonstrate what I should have done/where I have gone wrong, I would appreciate it.

I see some problems here. For example, when you pick two edges, you could be picking the same edge.

Rather than try to mess with the list in place, I think it would be simpler to just build a new list. Shuffle the list of edges. Then each iteration, pop two edges off the list, swap the nodes, and append them both to a new list.

Thank you.

I've taken your advice on board. My algorithm now makes a list, shuffles the list. Pairs each two elements in the list (to stop the item being paired with itself); swaps them, adds all the new pairs to a new list....and repeats X amount of times. I do have one question though, this code will only work if there is an even number of items in the list (since every item is paired with next item). I'm wondering your opinion on how I could improve this to account for an uneven list length.

Example, when I have a list of interactions like this (and list is uneven in length):

The script:

import sys
import random

pair_lambda = lambda x: each_pair[x].split("_")
new_list_lambda = lambda x: pair1[x] + "_"+ pair2[x]

list_of_edges = [line.strip().split()[0] + "_" + line.strip().split()[1] for line in open(sys.argv[1])]
random.shuffle(list_of_edges)

for i in range(4):
   random.shuffle(list_of_edges)
   paired_list = zip(list_of_edges,list_of_edges[1:])[::2]
   new_list = []
   for each_pair in paired_list:
       pair1,pair2 = pair_lambda(0),pair_lambda(1)
       new_list1,new_list2 = new_list_lambda(0),new_list_lambda(1)
       new_list.append(new_list1)
       new_list.append(new_list2)
   list_of_edges = new_list
   print list_of_edges
   print "*"
print list_of_edges

The output:

['5_7', '6_8', '9_13', '10_14', '1_3', '2_4']
*
['10_5', '14_7', '9_2', '13_4', '6_1', '8_3']
*
['6_8', '1_3', '9_13', '2_4', '14_10', '7_5']
*
['7_1', '5_3', '9_14', '13_10', '2_6', '4_8']
*
['7_1', '5_3', '9_14', '13_10', '2_6', '4_8']

Removes one of the pairs from the analysis? I'm wondering how I could optimise this script. I thought about adding an if/else statement, saying if list is odd in length, just keep the last element in, so each iteration will have one interaction that was not randomised?

I would do your zip differently:

paired_lists = zip(list_of_edges[::2], list_of_edges[1::2])

Mainly because the above version works in Python 3.x, but whatever.

As to odd length lists of edges, I would just have an if clause at the end of the inner loop, and swap the extra item with the last item in the swapped list:

if len(list_of_edges) % 2:
    pair1 = pair_lambda(list_of_edges[-1])
    pair2 = pair_lambda(new_list.pop())
    # swap pairs and append to new_list

I wouldn't use lambdas for your functions, I would define actual functions. Lambdas are for one off functions, which yours are not. Also, since you are doing the swapping in the inner loop and in the if clause, I would make a function for that.

This works, my main concern is that it may not generate all possible reorderings with the same probability. Of course, if your list of edges has 200,000 items, that's probably not possible without being able to generate true random numbers.

Tom1988

ichabod801

Tom1988

ichabod801