Python Forum

Full Version: Numpy array
Pages: 1 2
Please excuse my ignorance, just trying to learn. This is my first attempt at writing a program myself. I'm self taught through courses on the web. I'm very green at programming, so I apologize in advance if I'm asking a stupid or easy question. People here seem very knowledgeable so I thought this was the place to ask.

I have a 2d array with 5 ints per row (ints 1 to 43)
d5 = np.array
I want to find out if any 2 of the ints from row 2, d5[1] (10 different permutations of 2 out of 5), are in row 1, d5[0].
I have made so many attempts at this that I can't even remember what I tried anymore. I tried breaking it down into tuples, arrays, lists... and different variations. I'll try to post some code here for someone to check. Guess I just don't know what I'm doing. BUT, I want to learn. If anyone could point me in the proper direction, it would be much appreciated!!

line_no = 1
a = 0
lay_1 = 1
while line_no <= draws_reporting:
    b = lay_1
    two_counter = 0
    draw_two = d5[a]
    # [[d5[a][0], d5[a][1]], [d5[a][0], d5[a][2]], [d5[a][0], d5[a][3]], [d5[a][0], d5[a][4]],
    #             [d5[a][1], d5[a][2]], [d5[a][1], d5[a][3]], [d5[a][1], d5[a][4]], [d5[a][2], d5[a][3]],
    #             [d5[a][2], d5[a][4]], [d5[a][3], d5[a][4]]]
    for row in d5:
        two_1 = d5[b][0], d5[b][1]
        # two_1 = ((d5[b][0], d5[b][1]), (d5[b][0], d5[b][2]), (d5[b][0], d5[b][3]), (d5[b][0], d5[b][4])
        #         , (d5[b][1], d5[b][2]), (d5[b][1], d5[b][3]), (d5[b][1], d5[b][4]), (d5[b][2], d5[b][3])
        #         , (d5[b][2], d5[b][4]), (d5[b][3], d5[b][4]))
        if two_1 not in draw_two:
            b += 1
            two_counter += 1
        else:
            break
    print(line_no, two_counter)
    line_no += 1
    a += 1
    lay_1 += 1
It would be nice to have an example: a complete "d5" and the expected result.
(Jan-11-2021, 10:48 PM)BrianPA Wrote: I have a 2d array with 5 ints per row (ints 1 to 43)

Is it a typo? Shouldn't there be 45 instead of 43? I believe there is no way numpy can make a shape with one dimension of 5 from 43 elements (or any shape other than 1D, for that matter, as 43 is prime).
Sorry, maybe I didn't explain that properly.
It is a 2d array
Only 5 integers per line
The integers can only range from 1 to 43 (In a sorted manner)
There are a few million lines(rows) in the 2d array.
Here is an example:
4 11 20 35 36
1 16 18 21 42
10 14 27 29 32
**No number can be lower than 1 or higher than 43.**

Hope that explains it better. Sorry for the confusion.

Thanks for the replies and the interest!!
Now that this is clear, I second Serafim's opinion: post some sample rows, an explanation of what you want done, and a sample result. As far as the current description goes, I don't understand what should be accomplished or what the expected result should look like.
I haven't understood what you would like to do (I agree with Serafim), but I feel that numpy.intersect1d may help you.
I will try to explain at length what I'm trying to do and also with examples:

Here is a sample of the D5 2d numpy array. Each row has 5 integers, and the integers are from 1 to 43 inclusive:
3 8 21 31 37
15 32 34 38 40
3 12 20 26 37
6 7 15 21 30
8 10 14 27 31
10 15 26 37 41
26 29 32 36 39
6 9 11 13 35
15 18 30 31 37
15 17 24 26 34
2 3 23 28 35
1 3 13 40 43
3 4 23 29 30
7 12 22 23 33
2 5 7 30 40

Here are some variables:
a = 0
b = 1
I want to find out IF two numbers from row #2: d5[b] which is ( 15 32 34 38 40 ) are in row #1 d5[a] which is ( 3 8 21 31 37 )
Now, there are 10 different permutations of 2 integers per row...eg:
d5[b] = 15 32 34 38 40
permutations are as follows:
15 32
15 34
15 38
15 40
32 34
32 38
32 40
34 38
34 40
38 40
If none of these are in d5[a], which is 3 8 21 31 37,
then b += 1 (advance to row #3 and check if any of those permutations are in d5[a]) ... continue until a match is found.
Then, when a match is found, a and b are reset to a = 1, b = 2 and the iteration starts again.
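The ten pairs listed above are, strictly speaking, combinations rather than permutations, since order does not matter (15 32 and 32 15 count as the same pair). A minimal sketch, assuming the standard-library itertools module is acceptable here, that generates them for the row shown:

```python
from itertools import combinations, permutations

row_b = [15, 32, 34, 38, 40]  # d5[b] from the example above

# Unordered pairs: 5 choose 2 = 10
pairs = list(combinations(row_b, 2))
for pair in pairs:
    print(*pair)

# Ordered pairs would give twice as many (5 * 4 = 20)
ordered_pairs = list(permutations(row_b, 2))
print(len(pairs), len(ordered_pairs))
```

Because the rows are sorted, combinations returns the pairs in the same order as the hand-written list.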

Here is what the output should be for 4 iterations of a:
1
7
2
4
Here is also a sample of code that doesn't work. But you may get a better understanding of what I'm trying to do.

line_no = 1
draws_reporting = 4
a = 0
layer_start = 1
while line_no <= draws_reporting:
    b = layer_start
    two_counter = 0
    draw_two = [d5[a][0], d5[a][1], d5[a][2], d5[a][3], d5[a][4]]
    for row in d5:
        two_1 = [[d5[b][0], d5[b][1]],   # These are the permutations
                 [d5[b][0], d5[b][2]],
                 [d5[b][0], d5[b][3]],
                 [d5[b][0], d5[b][4]],
                 [d5[b][1], d5[b][2]],
                 [d5[b][1], d5[b][3]],
                 [d5[b][1], d5[b][4]],
                 [d5[b][2], d5[b][3]],
                 [d5[b][2], d5[b][4]],
                 [d5[b][3], d5[b][4]]]
        if two_1[0] in draw_two or two_1[1] in draw_two or two_1[2] in draw_two or \
                two_1[3] in draw_two or two_1[4] in draw_two or two_1[5] in draw_two or \
                two_1[6] in draw_two or two_1[7] in draw_two or two_1[8] in draw_two or \
                two_1[9] in draw_two:
            break
        else:
            b += 1
            two_counter += 1
    print(two_counter)
    line_no += 1
    a += 1
    layer_start += 1
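Since every row holds five distinct numbers, "one of the 10 pairs from d5[b] is in d5[a]" amounts to "the two rows share at least two numbers", so the pair lists may not be needed at all. A minimal sketch under that assumption, using plain Python sets on the first ten sample rows; the helper name rows_until_match is made up for illustration:

```python
import numpy as np

# First ten rows of the sample array posted above
d5 = np.array([
    [3, 8, 21, 31, 37],
    [15, 32, 34, 38, 40],
    [3, 12, 20, 26, 37],
    [6, 7, 15, 21, 30],
    [8, 10, 14, 27, 31],
    [10, 15, 26, 37, 41],
    [26, 29, 32, 36, 39],
    [6, 9, 11, 13, 35],
    [15, 18, 30, 31, 37],
    [15, 17, 24, 26, 34],
])


def rows_until_match(data, a):
    """Count rows after row a that are skipped before one shares >= 2 numbers with it.

    Returns None if no later row matches (hypothetical helper, not from the thread).
    """
    target = set(data[a].tolist())
    for skipped, row in enumerate(data[a + 1:]):
        if len(target & set(row.tolist())) >= 2:
            return skipped
    return None


results = [rows_until_match(d5, a) for a in range(4)]
print(results)  # [1, 7, 2, 4]
```

Slicing with data[a + 1:] also means the search can never index past the last row.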
numpy, numpy... can't live with 'em, can't live without 'em. There are so many ways to do stuff in numpy that this is just overwhelming:

>>> np.
Display all 600 possibilities? (y or n)
I admit that the problem is still a mystery to me, but based on my assumptions I would define my own little problem and try to solve it:

- I have a numpy array of integers with shape (15, 5)
- I want to find how many rows have two or more integers in common with the row at index 1

So I try to divide it into subproblems and there are plenty:

- how to compare 1d arrays (rows)
- how to find number of common elements
- how to apply comparison to all rows
- how to count rows which have two or more common elements

The following is based on the sample array:

import numpy as np

arr = np.array([
                [3, 8, 21, 31, 37],
                [15, 32, 34, 38, 40],
                [3, 12, 20, 26, 37],
                [6, 7, 15, 21, 30],
                [8, 10, 14, 27, 31],
                [10, 15, 26, 37, 41],
                [26, 29, 32, 36, 39],
                [6, 9, 11, 13, 35],
                [15, 18, 30, 31, 37],
                [15, 17, 24, 26, 34],
                [2, 3, 23, 28, 35],
                [1, 3, 13, 40, 43],
                [3, 4, 23, 29, 30],
                [7, 12, 22, 23, 33],
                [2, 5, 7, 30, 40]
                ])
How to compare 1d arrays?

As paul18fr already suggested - there is intersect1d. How can one use it?

>>> np.intersect1d(arr[0], arr[1])
array([], dtype=int64)
>>> np.intersect1d(arr[9], arr[1])
array([15, 34])
Problem solved - we know how to find common integers.

How to find number of common elements?

numpy has handy .size for that. Building on solution of previous problem:

>>> np.intersect1d(arr[0], arr[1]).size
0
>>> np.intersect1d(arr[9], arr[1]).size
2


Problem solved - we know how to find the number of common integers. However, if we think about it, we don't actually want the number of common integers - we need to know whether the condition is met or not, i.e. whether there are at least 2 common integers. So we can adjust this:

>>> 2 <= np.intersect1d(arr[0], arr[1]).size
False
>>> 2 <= np.intersect1d(arr[9], arr[1]).size
True
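Another way to get the same count, offered only as an alternative sketch: np.isin marks which elements of one row occur in another, and summing the marks gives the number of common integers. This relies on the numbers within a row being distinct, which holds for the sample data.

```python
import numpy as np

row_0 = np.array([3, 8, 21, 31, 37])
row_1 = np.array([15, 32, 34, 38, 40])
row_9 = np.array([15, 17, 24, 26, 34])

# Boolean mask: True where an element of the first row also occurs in the second
common_0_1 = int(np.isin(row_0, row_1).sum())
common_9_1 = int(np.isin(row_9, row_1).sum())

print(common_0_1, common_0_1 >= 2)  # 0 False
print(common_9_1, common_9_1 >= 2)  # 2 True
```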
How to apply it to all rows?

This is nice and all, but how do we apply this to every row in the array? There is apply_along_axis, which can be used. However, in order to do so we need a function which will be applied to all rows. As we already have a solution which we want to apply to every row, we can write such a function with no effort at all:

def at_least_two_common(row, test_array):
    return 2 <= np.intersect1d(row, test_array).size
Now we apply this function to all rows:

>>> np.apply_along_axis(at_least_two_common, 1, arr, arr[1])
[False  True False False False False False False False  True False False
 False False False]
We observe that there are two Trues - the row itself and the actual match. So in order to get the correct result we should deduct the row itself.

How to count rows which have two or more common elements?

We have an array of Trues and Falses and we need to count the Trues. What do we do? We count nonzeros:

>>> np.count_nonzero(np.apply_along_axis(at_least_two_common, 1, arr, arr[1]))
2
As we observed earlier, this result includes the row itself, so a little adjustment (-1) must be made, and the whole solution will look like this (of course numpy must be imported and arr initialized too):

def at_least_two_common(row, test_array):
    return 2 <= np.intersect1d(row, test_array).size

print(np.count_nonzero(np.apply_along_axis(at_least_two_common, 1, arr, arr[1])) - 1)
This might or might not be helpful in solving your problem. As I already mentioned, numpy can be overwhelming, and if I had provided just those three lines it wouldn't have been that helpful. But knowing the steps stitched together in the last line, it should be pretty obvious what is going on, and maybe it gives some ideas on how to approach your problem (I am sure that this little problem of mine can be solved in a gazillion other ways using numpy's built-in functions).
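For comparison, the same count can probably be obtained without calling a Python function once per row. A sketch, assuming rows contain no repeated numbers: np.isin tests every element of the array against the reference row at once, and summing along axis 1 gives the number of common integers per row.

```python
import numpy as np

arr = np.array([
    [3, 8, 21, 31, 37],
    [15, 32, 34, 38, 40],
    [3, 12, 20, 26, 37],
    [6, 7, 15, 21, 30],
    [8, 10, 14, 27, 31],
    [10, 15, 26, 37, 41],
    [26, 29, 32, 36, 39],
    [6, 9, 11, 13, 35],
    [15, 18, 30, 31, 37],
    [15, 17, 24, 26, 34],
    [2, 3, 23, 28, 35],
    [1, 3, 13, 40, 43],
    [3, 4, 23, 29, 30],
    [7, 12, 22, 23, 33],
    [2, 5, 7, 30, 40],
])

reference = arr[1]

# common[i] = how many numbers row i shares with the reference row
common = np.isin(arr, reference).sum(axis=1)
matches = common >= 2

print(np.flatnonzero(matches))        # [1 9]
print(np.count_nonzero(matches) - 1)  # 1, after discounting the reference row itself
```

On an array with millions of rows this form avoids the per-row overhead of apply_along_axis, though whether it is actually faster on a given machine is worth measuring rather than assuming.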
THANKS Perfringo!!!!
You have given me some great options to investigate. np.intersect1d looks intriguing, and I will definitely have to look into that option. I did figure out how to accomplish the task at hand, and it is A LOT of written code. I will post it below.

Just yesterday I learned how to time the execution of code. I'm a self-taught newbie. I was happy with the timing results of the code: I timed it for 100 results and ran it 10 times, and the average of the 10 runs was 0.0156 seconds. As you mentioned: "There are probably a gazillion ways of going about this solution". Isn't that the truth! Since this is my first project that is not part of the online courses, I wanted to get code that works first. Then I will spend time on the efficiency of my code. I have 18 parts in my code, and there are 4 parts whose timings I'm not happy with. I really appreciate the time you spent helping me.
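The timing routine described above (several runs, then an average) can be sketched with the standard-library timeit module. The function work below is only a stand-in for the code being measured; the actual numbers depend on the machine and the data.

```python
import timeit


def work():
    # Stand-in for the code being timed (hypothetical placeholder)
    return sum(i * i for i in range(10_000))


# Run the function once per measurement and repeat the measurement 10 times
run_times = timeit.repeat(work, number=1, repeat=10)
average = sum(run_times) / len(run_times)

print(f"average of {len(run_times)} runs: {average:.4f} seconds")
print(f"best run: {min(run_times):.4f} seconds")
```

The minimum is often reported alongside the average, since slower runs tend to reflect other activity on the machine rather than the code itself.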

Here is the code that I got to work and produces the correct results. Please keep in mind how new I am to programming.

import numpy as np

d5 = np.genfromtxt('d5.txt', dtype=np.int32)
line_no = 1
draws_reporting = 100
a = 0
layer_start = 1
while line_no <= draws_reporting:
    b = layer_start
    two_counter = 0
    draw_two = ([d5[a][0], d5[a][1]],
                [d5[a][0], d5[a][2]],
                [d5[a][0], d5[a][3]],
                [d5[a][0], d5[a][4]],
                [d5[a][1], d5[a][2]],
                [d5[a][1], d5[a][3]],
                [d5[a][1], d5[a][4]],
                [d5[a][2], d5[a][3]],
                [d5[a][2], d5[a][4]],
                [d5[a][3], d5[a][4]])
    for two in d5:
        two_1 = ([d5[b][0], d5[b][1]],
                 [d5[b][0], d5[b][2]],
                 [d5[b][0], d5[b][3]],
                 [d5[b][0], d5[b][4]],
                 [d5[b][1], d5[b][2]],
                 [d5[b][1], d5[b][3]],
                 [d5[b][1], d5[b][4]],
                 [d5[b][2], d5[b][3]],
                 [d5[b][2], d5[b][4]],
                 [d5[b][3], d5[b][4]])
        if two_1[0] == draw_two[0] or two_1[1] == draw_two[0] or two_1[2] == draw_two[0] or \
                two_1[3] == draw_two[0] or two_1[4] == draw_two[0] or two_1[5] == draw_two[0] or \
                two_1[6] == draw_two[0] or two_1[7] == draw_two[0] or two_1[8] == draw_two[0] or \
                two_1[9] == draw_two[0]:
            break
        elif two_1[0] == draw_two[1] or two_1[1] == draw_two[1] or two_1[2] == draw_two[1] or \
                two_1[3] == draw_two[1] or two_1[4] == draw_two[1] or two_1[5] == draw_two[1] or \
                two_1[6] == draw_two[1] or two_1[7] == draw_two[1] or two_1[8] == draw_two[1] or \
                two_1[9] == draw_two[1]:
            break
        elif two_1[0] == draw_two[2] or two_1[1] == draw_two[2] or two_1[2] == draw_two[2] or \
                two_1[3] == draw_two[2] or two_1[4] == draw_two[2] or two_1[5] == draw_two[2] or \
                two_1[6] == draw_two[2] or two_1[7] == draw_two[2] or two_1[8] == draw_two[2] or \
                two_1[9] == draw_two[2]:
            break
        elif two_1[0] == draw_two[3] or two_1[1] == draw_two[3] or two_1[2] == draw_two[3] or \
                two_1[3] == draw_two[3] or two_1[4] == draw_two[3] or two_1[5] == draw_two[3] or \
                two_1[6] == draw_two[3] or two_1[7] == draw_two[3] or two_1[8] == draw_two[3] or \
                two_1[9] == draw_two[3]:
            break
        elif two_1[0] == draw_two[4] or two_1[1] == draw_two[4] or two_1[2] == draw_two[4] or \
                two_1[3] == draw_two[4] or two_1[4] == draw_two[4] or two_1[5] == draw_two[4] or \
                two_1[6] == draw_two[4] or two_1[7] == draw_two[4] or two_1[8] == draw_two[4] or \
                two_1[9] == draw_two[4]:
            break
        elif two_1[0] == draw_two[5] or two_1[1] == draw_two[5] or two_1[2] == draw_two[5] or \
                two_1[3] == draw_two[5] or two_1[4] == draw_two[5] or two_1[5] == draw_two[5] or \
                two_1[6] == draw_two[5] or two_1[7] == draw_two[5] or two_1[8] == draw_two[5] or \
                two_1[9] == draw_two[5]:
            break
        elif two_1[0] == draw_two[6] or two_1[1] == draw_two[6] or two_1[2] == draw_two[6] or \
                two_1[3] == draw_two[6] or two_1[4] == draw_two[6] or two_1[5] == draw_two[6] or \
                two_1[6] == draw_two[6] or two_1[7] == draw_two[6] or two_1[8] == draw_two[6] or \
                two_1[9] == draw_two[6]:
            break
        elif two_1[0] == draw_two[7] or two_1[1] == draw_two[7] or two_1[2] == draw_two[7] or \
                two_1[3] == draw_two[7] or two_1[4] == draw_two[7] or two_1[5] == draw_two[7] or \
                two_1[6] == draw_two[7] or two_1[7] == draw_two[7] or two_1[8] == draw_two[7] or \
                two_1[9] == draw_two[7]:
            break
        elif two_1[0] == draw_two[8] or two_1[1] == draw_two[8] or two_1[2] == draw_two[8] or \
                two_1[3] == draw_two[8] or two_1[4] == draw_two[8] or two_1[5] == draw_two[8] or \
                two_1[6] == draw_two[8] or two_1[7] == draw_two[8] or two_1[8] == draw_two[8] or \
                two_1[9] == draw_two[8]:
            break
        elif two_1[0] == draw_two[9] or two_1[1] == draw_two[9] or two_1[2] == draw_two[9] or \
                two_1[3] == draw_two[9] or two_1[4] == draw_two[9] or two_1[5] == draw_two[9] or \
                two_1[6] == draw_two[9] or two_1[7] == draw_two[9] or two_1[8] == draw_two[9] or \
                two_1[9] == draw_two[9]:
            break
        else:
            b += 1
            two_counter += 1
    print(two_counter)
    line_no += 1
    a += 1
    layer_start += 1
Here is a shortened version of your code. I use itertools.combinations to find all combinations of numbers in the array rows, and then I build a list of truth values for each comparison of value pairs. I also used the example array to verify the result. But I get an error if "draws_reporting" is greater than 4 (but then of course I haven't really understood the problem, I just thought the code was too long).

import numpy as np
from itertools import combinations

d5 = np.array([[3, 8, 21, 31, 37],
      [15, 32, 34, 38, 40],
      [3, 12, 20, 26, 37],
      [6, 7, 15, 21, 30],
      [8, 10, 14, 27, 31],
      [10, 15, 26, 37, 41],
      [26, 29, 32, 36, 39],
      [6, 9, 11, 13, 35],
      [15, 18, 30, 31, 37],
      [15, 17, 24, 26, 34],
      [2, 3, 23, 28, 35],
      [1, 3, 13, 40, 43],
      [3, 4, 23, 29, 30],
      [7, 12, 22, 23, 33],
      [2, 5, 7, 30, 40]])
line_no = 1
draws_reporting = 4
a = 0
layer_start = 1
while line_no <= draws_reporting:
    b = layer_start
    two_counter = 0
    draw_two = list(combinations(d5[a], 2))
    for two in d5:
        two_1 = list(combinations(d5[b], 2))
        found = False
        for draw in draw_two:
            if any([x == draw for x in two_1]):
                found = True
                break
        if found:
            # Match found: stop scanning further rows for this draw
            break
        b += 1
        two_counter += 1
    print(two_counter)
    line_no += 1
    a += 1
    layer_start += 1
A run gives the values:
1
7
2
4
If I raise the number of "draws_reporting" I get the error:
Error:
Traceback (most recent call last):
  File "/home/serafim/dicts.py", line 30, in <module>
    two_1 = list(combinations(d5[b], 2))
IndexError: index 15 is out of bounds for axis 0 with size 15
Maybe, someone more familiar with numpy arrays can explain the error.

EDIT: It took a while to realize that the error has nothing to do with numpy arrays; rather, the short example array is rapidly exhausted in the loop where I calculate the "two_1" combinations.
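One way to avoid running off the end of the array, sketched here on the assumption that a draw with no later match should simply be reported as such: stop the inner search once b reaches the number of rows. Using None for "no match found" is a choice made for this sketch, not something specified in the thread.

```python
import numpy as np
from itertools import combinations

d5 = np.array([
    [3, 8, 21, 31, 37],
    [15, 32, 34, 38, 40],
    [3, 12, 20, 26, 37],
    [6, 7, 15, 21, 30],
    [8, 10, 14, 27, 31],
    [10, 15, 26, 37, 41],
    [26, 29, 32, 36, 39],
    [6, 9, 11, 13, 35],
    [15, 18, 30, 31, 37],
    [15, 17, 24, 26, 34],
    [2, 3, 23, 28, 35],
    [1, 3, 13, 40, 43],
    [3, 4, 23, 29, 30],
    [7, 12, 22, 23, 33],
    [2, 5, 7, 30, 40],
])

draws_reporting = 6
results = []
for a in range(draws_reporting):
    # Rows are sorted, so identical pairs come out as identical tuples
    draw_two = set(combinations(d5[a].tolist(), 2))
    two_counter = 0
    found = False
    b = a + 1
    while b < len(d5):  # guard: never index past the last row
        two_1 = set(combinations(d5[b].tolist(), 2))
        if draw_two & two_1:  # at least one identical pair
            found = True
            break
        b += 1
        two_counter += 1
    results.append(two_counter if found else None)

print(results)  # [1, 7, 2, 4, None, 2]
```

The fifth draw (8 10 14 27 31) shares at most one number with any later sample row, which is why the unguarded loop ran past the end at that point.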