Python Forum
[split] Rearranging CSV columns and rows
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[split] Rearranging CSV columns and rows
#1
Thank you so much - it worked! I followed your exact instructions and it worked perfectly. I now have both values for 'entityCommonStockSharesOutstanding' written in CSV instead of just one. Again thank you so much!

The only thing is that the populated cells in each row of the CSV file do not match the CSV's column headers (as you correctly indicated that this would happen in your earlier post).

For example, where there are two values for 'entityCommonStockSharesOutstanding', then one value will be in a cell in same column as the 'EntityCommonStockSharesOutstanding' header, whilst the other would be written to next cell on right and this falls under the 'DocumentCreationDate' column header.

I tried to fix this by adding one more 'EntityCommonStockSharesOutstanding' in the code as below:
ratiofile.write( "TradingSymbol,EntityRegistrantName,EntityCentralIndexKey,CurrentFiscalYearEndDate,DocumentType,DocumentPeriodEndDate,DocumentFiscalYearFocus,DocumentFiscalPeriodFocus,EntityCommonStockSharesOutstanding,EntityCommonStockSharesOutstanding,DocumentCreationDate,CommonStockSharesIssued,CommonStockSharesIssued,TreasuryStockShares,CommonStockSharesOutstanding\n" )
And it does fix the problem, but for the other security (different row in CSV), as it does not have two values for 'entityCommonStockSharesOutstanding', it writes the values to different header columns (for example: 'commonStockSharesIssued' ends up being written under the 'DocumentCreationDate' header column)

It would be great to know how to fix this - but actually, more importantly, I am now trying to learn how I can write to CSV (one cell only) the SUM of All Values that are available under the 'entityCommonStockSharesOutstanding'.

For example, if the below code writes to CSV the following values for 'entityCommonStockSharesOutstanding': 5 and 2, in the same row, but in different cells, then is there a way I can modify the code to return the SUM, being 7, as the 'entityCommonStockSharesOutstanding' and have this value of 7 written to one cell only?

I have enclosed the full script below as it stands now:

# Copyright 2014 Altova GmbH
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#	  http
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os, re, sys
from altova import xml, xsd, xbrl

sec_ns =  '/dei/'		# was: 
fasb_ns = '/us-gaap/'	# was: 

# We need to implement a simple locking class that allows us to avoid having 
# on_xbrl_valid called from multiple threads in RaptorXML at the same time.

if sys.platform == 'win32':
	import msvcrt

	# fcntl is not available under Windows, so we'll use a 
	# file locking function from the msvcrt library instead...

	class Lock:

		def __init__(self, filename):
			self.filename = filename
			# This will create it if it does not exist already
			self.handle = open(filename, 'w')

		# Bitwise OR fcntl.LOCK_NB if you need a non-blocking lock
		def acquire(self):
			msvcrt.locking(self.handle.fileno(), msvcrt.LK_LOCK, 1)

		def release(self):
			msvcrt.locking(self.handle.fileno(), msvcrt.LK_UNLCK, 1)

		def __del__(self):
			self.handle.close()

else:
	import fcntl

	# Under Linux and MacOS we can use the fcntl library to implement
	# the simple file locking mechanism...

	class Lock:

		def __init__(self, filename):
			self.filename = filename
			# This will create it if it does not exist already
			self.handle = open(filename, 'w')

		# Bitwise OR fcntl.LOCK_NB if you need a non-blocking lock
		def acquire(self):
			fcntl.flock(self.handle, fcntl.LOCK_EX)

		def release(self):
			fcntl.flock(self.handle, fcntl.LOCK_UN)

		def __del__(self):
			self.handle.close()

def camelToSpaces( label ):
	# Utility for pretty-printing the labels
	s1 = re.sub('(.)([A-Z][a-z]+)', r'\1 \2', label)
	return re.sub('([a-z0-9])([A-Z])', r'\1 \2', s1)

def factFinder( instance, namespace, label ):
	# Locate facts in the instance document by namespace and label, ignoring facts that have a context with a segment_element
	l = []
	for f in instance.items:
		if f.qname.namespace_name.find( namespace ) and f.qname.local_name == label:
			l.append( f )
	if not l:
		print('### NOT FOUND', label, type(l))
	return l
	
def printFacts( facts, indent=1, targetDate=None ):
    my_list = []
    for fact in facts:
        if targetDate==None or fact.context.period.instant == targetDate:
            if fact.concept.item_type==fact.concept.MONETARY_ITEM_TYPE:
                my_list.append(fact.effective_numeric_value)
                #print( indent * "\t", camelToSpaces( fact.qname.local_name ).ljust(100-indent*8), "$", '{}'.format( factValue ) )
            else:
                my_list.append(fact.normalized_value)
                #print( indent * "\t", camelToSpaces( fact.qname.local_name ).ljust(100-indent*8), factValue )
    return my_list
 
def on_xbrl_valid( job, instance ):
 
    try:
        # a portable solution to get the tmp dir
        import tempfile
        tmp = tempfile.gettempdir()
        tmplk = os.path.join( tmp, "extract_ratios_lock.tmp" )
        lock = Lock(tmplk)
        lock.acquire()
 
        # Create output CSV file if it doesn't exist yet
        if not os.path.isfile( "ratios.csv" ):
            with open("ratios.csv", "a") as ratiofile:
                	ratiofile.write( "TradingSymbol,EntityRegistrantName,EntityCentralIndexKey,CurrentFiscalYearEndDate,DocumentType,DocumentPeriodEndDate,DocumentFiscalYearFocus,DocumentFiscalPeriodFocus,EntityCommonStockSharesOutstanding,EntityCommonStockSharesOutstanding,DocumentCreationDate,CommonStockSharesIssued,CommonStockSharesIssued,TreasuryStockShares,CommonStockSharesOutstanding\n" )
                	ratiofile.close()
 
   		# Extract some basic facts from the filing, such as the effective end-date for balance sheet etc.
        tradingSymbol = factFinder( instance, sec_ns, "TradingSymbol" )
        entityRegistrantName = factFinder( instance, sec_ns, "EntityRegistrantName" )
        entityCentralIndexKey = factFinder( instance, sec_ns, "EntityCentralIndexKey" )
        currentFiscalYearEndDate = factFinder( instance, sec_ns, "CurrentFiscalYearEndDate" )
        documentType = factFinder( instance, sec_ns, "DocumentType" )
        documentPeriodEndDate = factFinder( instance, sec_ns, "DocumentPeriodEndDate" )
        documentFiscalYearFocus = factFinder( instance, sec_ns, "DocumentFiscalYearFocus" )
        documentFiscalPeriodFocus = factFinder( instance, sec_ns, "DocumentFiscalPeriodFocus" )
       	entityCommonStockSharesOutstanding = factFinder( instance, sec_ns, "EntityCommonStockSharesOutstanding" )
        documentCreationDate = factFinder( instance, sec_ns, "DocumentCreationDate" )
        commonStockSharesIssued = factFinder( instance, sec_ns, "CommonStockSharesIssued" )
        treasuryStockShares = factFinder( instance, sec_ns, "TreasuryStockShares" )
        commonStockSharesOutstanding = factFinder( instance, sec_ns, "CommonStockSharesOutstanding" )
        docEndDate = "2017-07-31"
        if len(documentPeriodEndDate) > 0:
            docEndDate = documentPeriodEndDate[0].normalized_value
 
        # Print information about filing and entity
        print( "Document and Entity Information:" )
        tradingSymbol = printFacts( tradingSymbol ) if tradingSymbol else ""
        entityRegistrantName = printFacts( entityRegistrantName ) if entityRegistrantName else ""
        entityCentralIndexKey = printFacts( entityCentralIndexKey ) if entityCentralIndexKey else ""
        currentFiscalYearEndDate = printFacts( currentFiscalYearEndDate ) if currentFiscalYearEndDate else ""
        docType = printFacts( documentType )
        documentPeriodEndDate = printFacts( documentPeriodEndDate ) if documentPeriodEndDate else ""
        documentFiscalYearFocus = printFacts( documentFiscalYearFocus ) if documentFiscalYearFocus else ""
        documentFiscalPeriodFocus = printFacts( documentFiscalPeriodFocus ) if documentFiscalPeriodFocus else ""
        entityCommonStockSharesOutstanding = printFacts( entityCommonStockSharesOutstanding, 3 ) if entityCommonStockSharesOutstanding else ""
        documentCreationDate = printFacts( documentCreationDate ) if documentCreationDate else ""
        commonStockSharesIssued = printFacts( commonStockSharesIssued, 3, docEndDate ) if commonStockSharesIssued else ""
       	treasuryStockShares = printFacts( treasuryStockShares, 3, docEndDate ) if treasuryStockShares else ""
        commonStockSharesOutstanding = printFacts( commonStockSharesOutstanding, 3, docEndDate ) if commonStockSharesOutstanding else ""
 
        # Append ratios to a CSV file for further analysis
        docType = ','.join(docType)
        tradingSymbol = ','.join(tradingSymbol)
        entityRegistrantName = ','.join(entityRegistrantName)
        entityCentralIndexKey = ','.join(entityCentralIndexKey)
        currentFiscalYearEndDate = ','.join(currentFiscalYearEndDate)
        documentPeriodEndDate = ','.join(documentPeriodEndDate) 
        documentFiscalYearFocus = ','.join(documentFiscalYearFocus) 
        documentFiscalPeriodFocus = ','.join(documentFiscalPeriodFocus) 
        entityCommonStockSharesOutstanding = ','.join(entityCommonStockSharesOutstanding)
        commonStockSharesIssued = ','.join(commonStockSharesIssued)
        treasuryStockShares = ','.join(treasuryStockShares)
        commonStockSharesOutstanding = ','.join(commonStockSharesOutstanding)
        
        with open("ratios.csv", "a") as ratiofile:
        	buff = '{},{},{},{},{},{},{},{},{},{},{},{},{}\n'.format(tradingSymbol + "," + entityRegistrantName + "," + entityCentralIndexKey + ","
                                                          + currentFiscalYearEndDate + ","
                                                          + docType + "," + documentPeriodEndDate + "," + documentFiscalYearFocus + ","
                                                          + documentFiscalPeriodFocus + ","
                                                          + entityCommonStockSharesOutstanding + "," + documentCreationDate + ","
                                                          + commonStockSharesIssued + "," + treasuryStockShares + ","
                                                          + commonStockSharesOutstanding)
        	ratiofile.write(buff)
 
    finally:
        lock.release()
Reply
#2
I managed to find how to sum the values for 'entityCommonStockSharesOutstanding'. I needed to convert the string to integers and then sum it up. Below is the code that worked:

entityCommonStockSharesOutstanding = sum(list(map(int, entityCommonStockSharesOutstanding)))
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Converting a json file to a dataframe with rows and columns eyavuz21 13 4,392 Jan-29-2023, 03:59 PM
Last Post: eyavuz21
  How to properly format rows and columns in excel data from parsed .txt blocks jh67 7 1,872 Dec-12-2022, 08:22 PM
Last Post: jh67
  Check DataFrames with different sorting in columns and rows Foxyskippy 0 771 Nov-19-2022, 07:49 AM
Last Post: Foxyskippy
  Split single column to multiple columns SriRajesh 1 1,319 Jan-07-2022, 06:43 PM
Last Post: jefsummers
  The code I have written removes the desired number of rows, but wrong rows Jdesi1983 0 1,625 Dec-08-2021, 04:42 AM
Last Post: Jdesi1983
  making variables in my columns and rows in python kronhamilton 2 1,609 Oct-31-2021, 10:38 AM
Last Post: snippsat
  rows from sql query need to write to a file as columns sjcsvatt 6 2,382 Oct-09-2021, 12:45 AM
Last Post: snippsat
  Merging spreadsheets with the same columns and extracting rows with matching entries johnbernard 3 9,412 Aug-19-2021, 03:08 PM
Last Post: johnbernard
  Summing up rows and columns plumberpy 3 2,260 Aug-18-2021, 05:46 AM
Last Post: naughtyCat
  Pandas DataFrame combine rows by column value, where Date Rows are NULL rhat398 0 2,105 May-04-2021, 10:51 PM
Last Post: rhat398

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020