Python Forum
[Numpy] How to store different data type in one numpy array?
#1
I want to store different data types in one NumPy array,
b = np.array([['2024-03-22', 71.0, 'ceh'], ['2024-03-23', 63.0, 'abc']])
with a specific dtype for each element, like:
[['datetime64[D]', 'float64', 'string'], ['datetime64[D]', 'float64', 'string']]
How do I define that?
#2
Look at:

https://numpy.org/doc/stable/reference/g...array.html
https://numpy.org/doc/stable/user/basics.rec.html
#3
Here is an example using a structured array.
>>> import numpy as np
>>>
>>> dtype = [('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')]
>>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')], dtype=dtype)
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])
<U3 is a Unicode string of up to 3 characters. Longer strings are silently truncated to fit:
>>> n = np.array([('2024-03-22', 71.0, 'cehar'), ('2024-03-23', 63.0, 'abchhhhhhhhhh')], dtype=dtype)
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])
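If the truncation is not wanted, one option is to size the string field from the data before building the array. This is only a sketch: the names rows and width are made up for illustration, and it assumes all rows are available up front.

```python
import numpy as np

rows = [('2024-03-22', 71.0, 'cehar'), ('2024-03-23', 63.0, 'abchhhhhhhhhh')]

# The longest string in the third column decides the field width.
width = max(len(code) for _, _, code in rows)

dtype = [('date', 'datetime64[D]'), ('value', 'float64'), ('code', f'U{width}')]
n = np.array(rows, dtype=dtype)

# Nothing is cut off now, because the field is wide enough for every value.
print(n['code'])
```

The field width is fixed once the array exists, so a longer string assigned later would still be truncated.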
Converting to a pandas DataFrame is seamless, because pandas is built on top of NumPy.
>>> import pandas as pd
>>> 
>>> df = pd.DataFrame(n)
>>> df
        date  value code
0 2024-03-22   71.0  ceh
1 2024-03-23   63.0  abc
#4
It seems different data types can only be stored by packing them into a tuple that then becomes one array element; they can't be stored directly as standalone elements of one array.
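For comparison, a plain 2-D array can hold mixed values as standalone elements if it is created with dtype=object. The sketch below is an illustration rather than a recommendation: it uses datetime.date from the standard library for the date column (an assumption, not something from the posts above), and an object array stores ordinary Python objects, so the typed, vectorized behaviour of a structured array is lost.

```python
from datetime import date

import numpy as np

# dtype=object keeps every cell as the original Python object.
b = np.array([[date(2024, 3, 22), 71.0, 'ceh'],
              [date(2024, 3, 23), 63.0, 'abc']], dtype=object)

print(b.shape)  # two rows, three standalone elements per row
print(type(b[0, 0]), type(b[0, 1]), type(b[0, 2]))
```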
#5
It is still one array if you create it like this.
>>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')], dtype= [('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')]) 
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])
You can make it shorter like this (see the docs); the fields then get the default names f0, f1, f2.
>>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')], dtype='datetime64[D], float64, U3')
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('f0', '<M8[D]'), ('f1', '<f8'), ('f2', '<U3')])
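If the short dtype string is convenient but the default field names are not, the fields can be renamed afterwards. A minimal sketch using rename_fields from numpy.lib.recfunctions; the new names are simply the ones used earlier in this thread.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')],
             dtype='datetime64[D], float64, U3')

# Map the default names f0, f1, f2 to descriptive ones.
renamed = rfn.rename_fields(n, {'f0': 'date', 'f1': 'value', 'f2': 'code'})

print(renamed.dtype.names)
print(renamed['value'])
```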
It works the same way, and an invalid date raises an error.
>>> n = np.array([('2024-03-2299', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')], dtype='datetime64[D], float64, U3')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: Error parsing datetime string "2024-03-2299" at position 10
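To find out which rows are bad instead of stopping at the first error, the date strings can be checked one by one before the array is built. This is a rough sketch; invalid_dates is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def invalid_dates(rows):
    """Return (row_index, text) for every date string NumPy cannot parse."""
    bad = []
    for i, row in enumerate(rows):
        try:
            np.datetime64(row[0], 'D')
        except ValueError:
            bad.append((i, row[0]))
    return bad

rows = [('2024-03-2299', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')]
problems = invalid_dates(rows)
print(problems)
```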
#6
What are you planning to do with the array?
#7
I agree with @deanhystad.

For instance, if the goal is to recover data per type, I would imagine the following if you can use 2 arrays (possible?):

import numpy as np

Array1 = np.array([['2024-03-22', 71.0, 'ceh'], 
                   ['2024-03-23', 63.0, 'abc'],
                   ['2024-03-24', -50.6, 'zzzzzzz'],
                   ['2024-03-25', 13.8, 'lkj'],
                   ['2024-03-26', 5.2, 'dsfdssss'],
                   [935.2, 'hgjhg', '2024-03-27']   # different column order
                   ])

TypeArray = np.array([['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['float64', 'string', 'datetime64[D]']   # !!!!!!!!!!!!!!                  
                     ])


NumberOfTypes = np.unique(TypeArray)


# results are stored in a dictionary PER type but you can proceed differently
RecoveringDictionary = {}

for ntype in NumberOfTypes:
    Index = np.where(TypeArray == ntype)
    Extract = Array1[Index]
    
    if ntype == 'float64': Extract = Extract.astype(np.float64)
    # if ntype == 'datetime64[D]': Extract = Extract.astype(np.datetime64)   
    
    RecoveringDictionary.update({ ntype: Extract, })

    
# print results
for ntype in NumberOfTypes:
    print(f"{ntype} = {RecoveringDictionary[ntype]}\n")
Output:
datetime64[D] = ['2024-03-22' '2024-03-23' '2024-03-24' '2024-03-25' '2024-03-26' '2024-03-27']

float64 = [ 71. 63. -50.6 13.8 5.2 935.2]

string = ['ceh' 'abc' 'zzzzzzz' 'lkj' 'dsfdssss' 'hgjhg']
#8
As deanhystad posted, more info may be needed.
paul18fr, good effort, but I would say that approach looks wrong in most cases.
The TypeArray does not work (e.g. try it with a wrong date) and repeats data unnecessarily.

(Mar-23-2024, 08:55 PM)water Wrote: and specific dtype likes:
In the first post he asks how to specify a dtype in a NumPy array.
That is why we are talking about structured arrays.
Here is one more example of how structured arrays work:
import numpy as np

# Sample data: Transaction ID, Date, Amount, Transaction Type
data = [
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
]

# Define the dtype for the structured array
dtype = [
    ('trans_id', 'int32'),
    ('date', 'datetime64[D]'),
    ('amount', 'float64'),
    ('type', 'U10')  # Transaction type with up to 10 characters
]

transactions = np.array(data, dtype=dtype)
Structured arrays are particularly useful when working with tabular data that mixes different data types,
and when you want to perform efficient, vectorized operations on that data.

Take a look at the data manipulation below; it would not be possible without specifying the dtype.
# Get all dates
>>> transactions['date']
array(['2023-01-01', '2023-01-03', '2023-01-05', '2023-01-07',
       '2023-01-09'], dtype='datetime64[D]')

# Find all withdrawals
>>> withdrawals = transactions[transactions['type'] == 'Withdrawal']
>>> withdrawals
array([(1002, '2023-01-03', -100., 'Withdrawal'),
       (1004, '2023-01-07',  -50., 'Withdrawal')],
      dtype=[('trans_id', '<i4'), ('date', '<M8[D]'), ('amount', '<f8'), ('type', '<U10')])

# Calculate the total amount of deposits
>>> total_deposits = transactions[transactions['type'] == 'Deposit']['amount'].sum()
>>> total_deposits
750.0
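A few more operations in the same style, as a sketch that rebuilds the transactions array from above: filtering on the date field, sorting by one field, and a NumPy-only total per transaction type. The variable names are only illustrative.

```python
import numpy as np

dtype = [('trans_id', 'int32'), ('date', 'datetime64[D]'),
         ('amount', 'float64'), ('type', 'U10')]
transactions = np.array([
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
], dtype=dtype)

# Rows dated on or after 2023-01-05 (real date comparison, not string comparison).
recent = transactions[transactions['date'] >= np.datetime64('2023-01-05')]

# Sort whole records by the amount field, smallest first.
by_amount = np.sort(transactions, order='amount')

# Sum of amounts per transaction type, using only NumPy.
types, inverse = np.unique(transactions['type'], return_inverse=True)
totals = np.bincount(inverse, weights=transactions['amount'])

print(recent['trans_id'])
print(by_amount['trans_id'])
print(dict(zip(types.tolist(), totals.tolist())))
```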
For operations that are easily vectorized, staying within NumPy can be faster and more memory efficient.
Structured arrays can also easily be taken into pandas if you need more advanced features like grouping or plotting.
import pandas as pd

df = pd.DataFrame(transactions)
print(df)
Output:
   trans_id       date  amount        type
0      1001 2023-01-01   250.0     Deposit
1      1002 2023-01-03  -100.0  Withdrawal
2      1003 2023-01-05   200.0     Deposit
3      1004 2023-01-07   -50.0  Withdrawal
4      1005 2023-01-09   300.0     Deposit