Python Forum
How to update values in a pyarrow table?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to update values in a pyarrow table?
#2
I was able to get it working using these references:

http://arrow.apache.org/docs/python/gene...Table.html

http://arrow.apache.org/docs/python/gene...Field.html

https://github.com/apache/arrow/blob/mas...t_table.py

Basically it loops through the original table and creates new columns (pa.array) with the adjusted text that it appends to a new table. It's probably not the best way to do it, but it worked. Most importantly, it let me preserve the nulls and specify the data type of each column.

import sys, getopt
import random
import re
import math

import pyarrow.parquet as pq
import pyarrow.csv as pcsv
import numpy as np
import pandas as pd
import pyarrow as pa
import os.path

<a lot of other code here>

changed_ct = 0
all_cols_ct = 0
table3 = pa.Table.from_arrays([pa.array(range(0,863))], names=('0')) # CREATE TEMP COLUMN!!
#print(table3)
#exit()
changed_column_list = []
for col_name in table2.column_names:
    print('processing column: ' + col_name)
    new_list = []
    col_data = pa.Table.column(table2, col_name)
    col_data_type = table2.schema.field(col_name).type
    printed_changed_flag = False
    for i in col_data:
        # GET STRING REPRESENTATION OF THE COLUMN DATA
        if(col_data_type == 'string'):
            col_str = pa.StringScalar.as_py(i)
        elif(col_data_type == 'int32'):
            col_str = pa.Int32Scalar.as_py(i)
        elif(col_data_type == 'int64'):
            col_str = pa.Int64Scalar.as_py(i)
            
            
        if col_name in change_columns:
            if printed_changed_flag == False:
                print('changing values in column '  + col_name)
                changed_column_list.append(col_name)
                changed_ct += 1
                printed_changed_flag = True

            new_list.append(change_str(col_str))
        
        else:
            new_list.append(col_str)
        
    #set data type for the column
    if(col_data_type == 'string'):
        col_data_type = pa.string()
    elif(col_data_type == 'int32'):
        col_data_type = pa.int32()
    elif(col_data_type == 'int64'):
        col_data_type = pa.int64()
        
    arr = pa.array(new_list, type=col_data_type)
        
    new_field = pa.field(col_name, col_data_type)
    
    table3 = pa.Table.append_column(table3, new_field, arr)
        
    all_cols_ct += 1
    
#for i in table3:
#   print(i)

table3 = pa.Table.remove_column(table3, 0) # REMOVE TEMP COLUMN!!
#print(table2)
#print('-------------------')
#print(table3)
#exit()

print('changed ' + str(changed_ct) + ' columns:')
print(*changed_column_list, sep='\n')

# WRITE NEW PARQUET FILE
pa.parquet.write_table(table3, out_file)
Reply


Messages In This Thread
RE: How to update values in a pyarrow table? - by xraphael75 - Jan-25-2021, 02:14 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
Question Using SQLAlchemy, prevent SQLite3 table update by multiple program instances Calab 3 748 Aug-09-2023, 05:51 PM
Last Post: Calab
  pyarrow error when importing pandas sravva 1 935 Jun-06-2023, 05:09 PM
Last Post: snippsat
Photo How to select NULL and blank values from MySQL table into csv python300 9 2,442 Dec-27-2022, 09:43 PM
Last Post: deanhystad
  store all variable values into list and insert to sql_summary table mg24 3 1,143 Sep-28-2022, 09:13 AM
Last Post: Larz60+
  Sum the values in a pandas pivot table specific columns klllmmm 1 4,636 Nov-19-2021, 04:43 PM
Last Post: klllmmm
  UPDATE SQLITE TABLE - Copy a fields content to another field. andrewarles 14 4,397 May-08-2021, 04:58 PM
Last Post: ibreeden
  OSError: Unable to load libjvm when connecting to hdfs with pyarrow 3.0.0 aupres 0 3,160 Mar-22-2021, 10:25 AM
Last Post: aupres
Question Python + Google Sheet | Best way to update specific cells in a single Update()? Vokofe 1 2,684 Dec-16-2020, 05:26 AM
Last Post: Vokofe
  pyarrow throws oserror winerror 193 1 is not a valid win32 application aupres 2 3,782 Oct-21-2020, 01:04 AM
Last Post: aupres
  Unable to Update SQLite Table sambanerjee 5 2,959 Sep-30-2020, 12:21 PM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020