Python Forum
How to update values in a pyarrow table?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to update values in a pyarrow table?
#1
I have a python script that reads in a parquet file using pyarrow. I'm trying to loop through the table to update values in it. If I try this:

for col_name in table2.column_names:
    if col_name in my_columns:
        print('updating values in column '  + col_name)
        
        col_data = pa.Table.column(table2, col_name)
        
        row_ct = 1
        for i in col_data:
            pa.Table.column(table2, col_name)[row_ct] = change_str(pa.StringScalar.as_py(i))
            row_ct += 1
I get this error:

Error:
TypeError: 'pyarrow.lib.ChunkedArray' object does not support item assignment
How can I update these values?

I tried using pandas, but it couldn't handle null values in the original table, and it also incorrectly translated the datatypes of the columns in the original table. Does pyarrow have a native way to edit the data?

Python 3.7.3
Debian 10
Reply
#2
I was able to get it working using these references:

http://arrow.apache.org/docs/python/gene...Table.html

http://arrow.apache.org/docs/python/gene...Field.html

https://github.com/apache/arrow/blob/mas...t_table.py

Basically it loops through the original table and creates new columns (pa.array) with the adjusted text that it appends to a new table. It's probably not the best way to do it, but it worked. Most importantly, it let me preserve the nulls and specify the data type of each column.

import sys, getopt
import random
import re
import math

import pyarrow.parquet as pq
import pyarrow.csv as pcsv
import numpy as np
import pandas as pd
import pyarrow as pa
import os.path

<a lot of other code here>

changed_ct = 0
all_cols_ct = 0
table3 = pa.Table.from_arrays([pa.array(range(0,863))], names=('0')) # CREATE TEMP COLUMN!!
#print(table3)
#exit()
changed_column_list = []
for col_name in table2.column_names:
    print('processing column: ' + col_name)
    new_list = []
    col_data = pa.Table.column(table2, col_name)
    col_data_type = table2.schema.field(col_name).type
    printed_changed_flag = False
    for i in col_data:
        # GET STRING REPRESENTATION OF THE COLUMN DATA
        if(col_data_type == 'string'):
            col_str = pa.StringScalar.as_py(i)
        elif(col_data_type == 'int32'):
            col_str = pa.Int32Scalar.as_py(i)
        elif(col_data_type == 'int64'):
            col_str = pa.Int64Scalar.as_py(i)
            
            
        if col_name in change_columns:
            if printed_changed_flag == False:
                print('changing values in column '  + col_name)
                changed_column_list.append(col_name)
                changed_ct += 1
                printed_changed_flag = True

            new_list.append(change_str(col_str))
        
        else:
            new_list.append(col_str)
        
    #set data type for the column
    if(col_data_type == 'string'):
        col_data_type = pa.string()
    elif(col_data_type == 'int32'):
        col_data_type = pa.int32()
    elif(col_data_type == 'int64'):
        col_data_type = pa.int64()
        
    arr = pa.array(new_list, type=col_data_type)
        
    new_field = pa.field(col_name, col_data_type)
    
    table3 = pa.Table.append_column(table3, new_field, arr)
        
    all_cols_ct += 1
    
#for i in table3:
#   print(i)

table3 = pa.Table.remove_column(table3, 0) # REMOVE TEMP COLUMN!!
#print(table2)
#print('-------------------')
#print(table3)
#exit()

print('changed ' + str(changed_ct) + ' columns:')
print(*changed_column_list, sep='\n')

# WRITE NEW PARQUET FILE
pa.parquet.write_table(table3, out_file)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  OSError: Unable to load libjvm when connecting to hdfs with pyarrow 3.0.0 aupres 0 131 Mar-22-2021, 10:25 AM
Last Post: aupres
Question Python + Google Sheet | Best way to update specific cells in a single Update()? Vokofe 1 286 Dec-16-2020, 05:26 AM
Last Post: Vokofe
  pyarrow throws oserror winerror 193 1 is not a valid win32 application aupres 2 785 Oct-21-2020, 01:04 AM
Last Post: aupres
  Unable to Update SQLite Table sambanerjee 5 543 Sep-30-2020, 12:21 PM
Last Post: Larz60+
  Looking for help in Parse multiple XMLs and update key node values and generate Out.. rajesh3383 0 337 Sep-15-2020, 01:42 PM
Last Post: rajesh3383
  Fetch Oracle DB rows & print it in HTML file with table's col headers in table format tssr_2001 1 665 Sep-04-2020, 01:39 PM
Last Post: ibreeden
  Mysql CREATE TABLE IF NOT EXISTS dynamic table name nisusavi 0 515 Apr-29-2020, 06:45 PM
Last Post: nisusavi
  MYSQL how to assign a table for each user in my username table database? YoshikageKira 1 999 Dec-26-2019, 05:57 AM
Last Post: buran
  update dict as per range of values anna 7 894 Sep-13-2019, 04:37 PM
Last Post: anna
  update values in list based on dictionary bunti 3 1,423 Jun-10-2019, 07:26 AM
Last Post: perfringo

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020