How to update values in a pyarrow table?

xraphael75 · (This post was last modified: Jan-25-2021, 02:14 PM by xraphael75.)

I was able to get it working using these references:

http://arrow.apache.org/docs/python/gene...Table.html

http://arrow.apache.org/docs/python/gene...Field.html

https://github.com/apache/arrow/blob/mas...t_table.py

Basically, it loops over each column of the original table, builds a new column (pa.array) with the adjusted text, and appends that column to a new table. It's probably not the best way to do it, but it worked. Most importantly, it let me preserve the nulls and specify the data type of each column.

import sys, getopt
import random
import re
import math

import pyarrow.parquet as pq
import pyarrow.csv as pcsv
import numpy as np
import pandas as pd
import pyarrow as pa
import os.path

<a lot of other code here>

changed_ct = 0
all_cols_ct = 0
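# The temp column must have the same number of rows as table2, otherwise append_column fails
table3 = pa.Table.from_arrays([pa.array(range(table2.num_rows))], names=['0']) # CREATE TEMP COLUMN!!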
#print(table3)
#exit()
changed_column_list = []
for col_name in table2.column_names:
    print('processing column: ' + col_name)
    new_list = []
    col_data = table2.column(col_name)
    col_data_type = table2.schema.field(col_name).type
    printed_changed_flag = False
    for i in col_data:
        # GET THE PYTHON VALUE OF THE SCALAR (None FOR NULLS);
        # as_py() works for any scalar type, so no per-type branch is needed
        col_str = i.as_py()
            
        if col_name in change_columns:
            if not printed_changed_flag:
                print('changing values in column '  + col_name)
                changed_column_list.append(col_name)
                changed_ct += 1
                printed_changed_flag = True

            new_list.append(change_str(col_str))
        
        else:
            new_list.append(col_str)
        
    # col_data_type is already a pa.DataType taken from the schema,
    # so it can be passed to pa.array() and pa.field() as-is
        
    arr = pa.array(new_list, type=col_data_type)
        
    new_field = pa.field(col_name, col_data_type)
    
    table3 = table3.append_column(new_field, arr)
        
    all_cols_ct += 1
    
#for i in table3:
#   print(i)

table3 = table3.remove_column(0) # REMOVE TEMP COLUMN!!
#print(table2)
#print('-------------------')
#print(table3)
#exit()

print('changed ' + str(changed_ct) + ' columns:')
print(*changed_column_list, sep='\n')

# WRITE NEW PARQUET FILE
pq.write_table(table3, out_file)
