Python Forum
In CSV, how to write the header after writing the body?
#11
(Jan-03-2018, 11:39 PM)Gribouillis Wrote: Can't you make a two-pass program? You could read the input file once and compute the csv header without writing any output, then go back at the beginning of the input file and use a csv writer on the second pass.
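A minimal sketch of that two-pass suggestion, using a made-up records list and an in-memory buffer in place of the real files:

```python
import csv
import io

def two_pass_csv(records, out_file):
    # Pass 1: scan every record to compute the union of field names (the header)
    header = []
    for rec in records:
        for field in rec:
            if field not in header:
                header.append(field)
    # Pass 2: write the header first, then the body
    writer = csv.DictWriter(out_file, fieldnames=header, restval='')
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return header

# Made-up records standing in for parsed .bib entries
records = [{'author': 'A', 'year': '2006'},
           {'author': 'B', 'title': 'T'}]
buf = io.StringIO()
header = two_pass_csv(records, buf)
```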

Thanks for the suggestion, but I wanted it all in one file; besides, I have a problem in the current code.

(Jan-03-2018, 11:57 PM)Larz60+ Wrote: The class that I presented allows a CSV file to be read in and modified without compromising file integrity. This is demonstrated in the two Excel displays in my post: the first with the original header, the second after the header was modified.

A file is a file. So long as the separators are not messed with, things like this can be done.

In any case, I cloned your GitHub repository and will look at it after supper.

I really loved and benefited a lot from your class; I have even kept it for later reference. However, when I used it on this particular problem, the CSV cells got messed up. I do not know why, but I guess the reason is that many of the fields contain commas.
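For what it's worth, the comma problem usually comes from splitting or joining rows by hand; the csv module itself quotes any field that contains the delimiter. A minimal sketch with a made-up id and author string:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# csv.writer quotes fields that contain the delimiter, so embedded
# commas survive a round trip unchanged
writer.writerow(['6292950', 'Al-naffouri, T. Y. and Sharif, M.'])

# Reading it back recovers the original two cells
row = next(csv.reader(io.StringIO(buf.getvalue())))
```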

(Jan-04-2018, 03:02 AM)Larz60+ Wrote: Question: Did the .bib file originate from a website? If so, there may be an easier way to do this.

Yes, it is generated by an IEEE website, but I also have other .bib files that are generated by different websites, and the tags in them are not in the same order. That is why I wrote this code to consider all tags, even if they are in a different order or some are missing.
#12
Here's how I'm approaching this (almost done), I may not finish until morning as it's
1 A.M. here now, but I'm so close, I'm going to try and finish tonight.

My thoughts were to parse the .bib file, creating a dictionary that can be accessed by the ieee id. Then you can simply access all fields by key and value, for example ieee[id]['author']
or ieee[id]['abstract']
To process an entire file, you can use something like:
for key, value in ieee.items():
    ...
The final thing I have to do is handle the exception on line end.
back soon, if not, in the A.M.
#13
(Jan-04-2018, 06:14 AM)Larz60+ Wrote: Here's how I'm approaching this (almost done), I may not finish until morning as it's
1 A.M. here now, but I'm so close, I'm going to try and finish tonight.

My thoughts were to parse the .bib file, creating a dictionary that can be accessed by the ieee id. Then you can simply access all fields by key and value, for example ieee[id]['author']
or ieee[id]['abstract']
To process an entire file, you can use something like:
for key, value in ieee.items():
    ...
The final thing I have to do is handle the exception on line end.
back soon, if not, in the A.M.

Thank you so much Larz60+ for your care and efforts; all are highly appreciated. Thumbs Up Heart
Please keep in mind that not all entries are generated by IEEE; some come from other journals that are not listed in IEEE Xplore. So the .bib file does not necessarily follow the IEEE standard.
#14
Do you use any of the available packages to parse the .bib files, or did you write your own parser?
#15
Raj, just click on new thread

Tim,

OK, I got it to work. This will make life a lot easier for you.
First the code and a sample run; explanation after that:

Code:
import json


class ReadBib2:
    def __init__(self):
        self.csv_file = 'ConferencePublications.csv'
        self.bib_file = 'ConferencePublications.bib'
        self.temp = 'temp.txt'
        self.ieee = {}

    def make_dict(self):
        with open(self.bib_file, 'r', encoding="utf8") as bf:
            line1 = ''
            longline = False
            for line in bf:
                line = line.strip()
                if '@' not in line and not line.endswith('},') and not line.endswith('},}'):
                    line1 += line
                    longline = True
                    continue
                if longline:
                    longline = False
                    line = line1 + line
                    line1 = ''
                if line.startswith('@'):
                    id = line[line.index('{')+1:]
                    id = id.replace(',', '')
                    self.ieee[id] = {}
                else:
                    name = line[:line.index('=')]
                    value = line[line.index('=')+2:line.index('}')]
                    self.ieee[id][name] = value
        # Save it to json file for future use
        with open('ConferensePublications.json', 'w') as jo:
            json.dump(self.ieee, jo)


    def query_dict(self):
        # first test json load:
        with open('ConferensePublications.json', 'r') as jin:
            local_dict = json.load(jin)

        key = input('Enter ieee id: ')
        print(f'data for {key}:')

        # To get a list of keys:
        all_keys = list(local_dict.keys())
        all_keys.sort()
        print(f'allkeys: {all_keys}')

        # you can get all fields with this
        for key1, value1 in local_dict[key].items():
            print(f'    {key1}, value: {value1}')
        # Or select individual fields like this:
        print(f"\nid 6292950 abstract = {local_dict['6292950']['abstract']}")

        # Or to get entire dictionary:
        for key, value in local_dict.items():
            print(f'Data for ieee id: {key}')
            for key1, value1 in value.items():
                print(f'    {key1}: {value1}')

def main():
    rb = ReadBib2()
    rb.make_dict()
    rb.query_dict()

if __name__ == '__main__':
    main()
Test:
Output:
allkeys: ['1040292', '1188147', '1188376', '1204080', '1378454', '1506224', '4036232', '4401396', '4480099', '4518238', '4557318', '4595196', '4728502', '5069522', '5201243', '5205261', '5206022', '5407520', '5425904', '5450184', '5478770', '5502530', '5671265', '5711782', '5744885', '5745174', '5938020', '5947173', '6034093', '6058866', '6120179', '6221220', '6292950', '6292976', '6310445', '6487117', '6487241', '6487274', '6579580', '6602377', '6602383', '6612051', '6638540', '6645331', '6655322', '6655557', '6673389', '6692503', '6811744', '6811776', '681783', '6855176', '6941904', '6952399', '6952628', '6952925', '6959000', '6966079', '7037034', '7037286', '7037599', '7060279', '7075229', '7080617', '7086838', '7145750', '7151123', '7156411', '7176779', '7176783', '7178352', '7178393', '7178506', '7178517', '7178607', '7247067', '7247160', '7248768', '7249036', '7326474', '7362511', '7369548', '7383821', '7390898', '7391001', '7391114', '7394416', '7414149', '7417276', '7417578', '7417787', '7418208', '7472232', '7472609', '750925', '7510709', '7510959', '7511143', '7521914', '7536773', '7539180', '7541684', '7551750', '7760279', '778822', '7794612', '7794681', '7841677', '7841695', '7841744', '7848801', '7881128', '7905806', '7952375', '7952960', '7959945', '8006695', '8006951', '8081160', '8081408', '8081637', '8081645', '862014', '940688', '940689', '987630']
Enter ieee id: 4036232
data for 4036232:
    author, value: T. Y. Al-naffouri and M. Sharif and B. Hassibi
    booktitle, value: 2006 IEEE International Symposium on Information Theory
    title, value: How Much Does Transmit Correlation Affect the Sum-Rate of MIMO Downlink Channels?
    year, value: 2006
    volume, value:
    number, value:
    pages, value: 1574-1578
    abstract, value: This paper considers the effect of spatial correlation between transmit antennas on the sum-rate capacity of the MIMO broadcast channel (i.e., downlink of a cellular system). Specifically, for a system with a large number of users n, we analyze the scaling laws of the sum-rate for the dirty paper coding and for different types of beamforming transmission schemes. When the channel is i.i.d., it has been shown that for large n, the sum rate is equal to M log log n + M log P/M + o(1) where M is the number of transmit antennas, P is the average signal to noise ratio, and o(1) refers to terms that go to zero as n rarr infin. When the channel exhibits some spatial correlation with a covariance matrix R (non-singular with tr(R) = M), we prove that the sum rate of dirty paper coding is M log log n + M log P/M + log det(R) + o(1). We further show that the sum-rate of various beamforming schemes achieves M log log n + M log P/M + M log c + o(1) where c les 1 depends on the type of beamforming. We can in fact compute c for random beamforming proposed in M. Sharif et al. (2005) and more generally, for random beamforming with preceding in which beams are pre-multiplied by a fixed matrix. Simulation results are presented at the end of the paper
    keywords, value: MIMO systems;antenna arrays;broadcast channels;channel capacity;computational complexity;covariance matrices;radio links;transmitting antennas;wireless channels;MIMO broadcast channel;MIMO downlink channel sum-rate capacity;average signal to noise ratio;beamforming transmission schemes;covariance matrix;dirty paper coding;multiple input multiple output systems;random beamforming;spatial correlation;transmit antennas;transmit correlation;Array signal processing;Base stations;Broadband antennas;Broadcasting;Covariance matrix;Downlink;MIMO;Signal to noise ratio;Transmitters;Transmitting antennas
    doi, value: 10.1109/ISIT.2006.261541
    ISSN, value: 2157-8095
    month, value: July

id 6292950 abstract = This paper presents a maximum likelihood (ML) approach to mitigate the effect of narrow band interference (NBI) in a zero padded orthogonal frequency division multiplexing (ZP-OFDM) system. The NBI is assumed to be time variant and asynchronous with the frequency grid of the ZP-OFDM system. The proposed structure based technique uses the fact that the NBI signal is sparse as compared to the ZP-OFDM signal in the frequency domain. The structure is also useful in reducing the computational complexity of the proposed method. The paper also presents a data aided approach for improved NBI estimation. The suitability of the proposed method is demonstrated through simulations.
Look at the example. Now you can pick and choose what you would like to save to your csv file.
If you want to save the entire dictionary, you can do it either by:
  • Get a list of keys
  • Access each record by key
  • Or get the whole shebang
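As a minimal sketch of the pick-and-choose option (made-up ids and fields, not the real .bib data):

```python
import csv
import io

# Made-up stand-in for the parsed ieee dictionary
ieee = {
    '4036232': {'author': 'T. Y. Al-naffouri', 'year': '2006'},
    '6292950': {'author': 'A. Author', 'year': '2012'},
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['id', 'author', 'year'])  # only the fields you care about
for key in sorted(ieee):
    # .get() supplies '' for records that are missing a field
    writer.writerow([key, ieee[key].get('author', ''), ieee[key].get('year', '')])

rows = buf.getvalue().splitlines()
```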
#16
Add the following to the query_dict function to show how to access the entire dictionary:
        # Or to get entire dictionary:
        for key, value in local_dict.items():
            print(f'Data for ieee id: {key}')
            for key1, value1 in value.items():
                print(f'    {key1}: {value1}')
Also note that I saved the dictionary to a json file and load it back for the query;
this allows using the same data in other modules without having to parse again.
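The save-then-reload round trip is just this (sketch with a one-entry stand-in dictionary and an in-memory buffer instead of the file on disk):

```python
import io
import json

# One made-up entry standing in for the full parsed dictionary
ieee = {'4036232': {'author': 'T. Y. Al-naffouri', 'year': '2006'}}

buf = io.StringIO()          # stands in for the .json file on disk
json.dump(ieee, buf)         # save once after parsing
buf.seek(0)
local_dict = json.load(buf)  # any other module reloads without re-parsing
```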

The ideal way to do this would be to scrape the web site and create the csv directly using BeautifulSoup.
#17
Dear Larz60+, your code is next level. I'm so impressed; thank you for your time.
I'll review it and get back to you. Thumbs Up
#18
(Jan-04-2018, 07:16 AM)Larz60+ Wrote: OK, I got it to work. This will make life a lot easier for you.
First the code and a sample run; explanation after that:

Thank you so much for your code. I learned a lot from it. The idea of making a dictionary for each .bib item and then exporting it as .json was brilliant. I've amended the code to generate both a .csv file and a .json file.
The new amended code is:

import json
import csv
import os



class BIB2CSV:
    def __init__(self, CSVName, BIBName, JsonName):
        self.CSVName = CSVName
        self.BIBName = BIBName
        self.JsoName = JsonName
        self.BIBData = {}

    def make_dict(self):
        with open(self.BIBName, 'r', encoding="utf8") as BIBFilePointer:
            line1 = ''
            longline = False
            for line in BIBFilePointer:
                line = line.strip()
                if '@' not in line and not line.endswith('},') and not line.endswith('},}'):
                    line1 += line
                    longline = True
                    continue
                if longline:
                    longline = False
                    line = line1 + line
                    line1 = ''
                if line.startswith('@'):
                    EntryKey = line[line.index('{') + 1:]
                    EntryKey = EntryKey.replace(',', '')
                    self.BIBData[EntryKey] = {}
                else:
                    name = line[:line.index('=')]
                    value = line[line.index('=') + 2:line.index('}')]
                    self.BIBData[EntryKey][name] = value
        # Save it to json file for future use
        with open(self.JsoName, 'w') as jo:
            json.dump(self.BIBData, jo, indent=2)

    def CreateCSV(self):
        BIBKeys = list(self.BIBData)
        CSVHeader = ["ItemKey"]
        # Seed the header with the first record's fields; new fields are appended below
        for HeaderItem in self.BIBData[BIBKeys[0]]:
            CSVHeader.append(HeaderItem)


        with open('CSVTemp.csv', 'w', encoding='utf-8', newline='') as CSVFilePointer:
            CSVWriterPointer = csv.writer(CSVFilePointer)
            for BibKey in self.BIBData:
                CSVLineContent = []
                # This loop takes each header item and puts its value in the CSV line
                for HeaderItem in CSVHeader:
                    if HeaderItem == "ItemKey":
                        CSVLineContent.append(BibKey)
                        continue
                    if HeaderItem in self.BIBData[BibKey]:
                        CSVLineContent.append(self.BIBData[BibKey][HeaderItem])
                    else:
                        CSVLineContent.append('')

                # This loop searches for new header items and adds them to the CSV header
                for HeaderItem in self.BIBData[BibKey]:
                    if HeaderItem not in CSVHeader:
                        CSVHeader.append(HeaderItem)
                        CSVLineContent.append(self.BIBData[BibKey][HeaderItem])

                CSVWriterPointer.writerow(CSVLineContent)

        with open('CSVTemp.csv') as CSVFilePointer:
            CSVReaderPointer = csv.reader(CSVFilePointer)
            with open(self.CSVName, 'w+', newline='') as CSVFilePointer2:
                CSVWriterPointer = csv.writer(CSVFilePointer2)
                CSVWriterPointer.writerow(CSVHeader)
                for Row in CSVReaderPointer:
                    CSVWriterPointer.writerow(Row)
        os.remove('CSVTemp.csv')



def main():
    CSVName ='CSVFile.csv'
    BIBName = 'ConferencePublications.bib'
    JsonName = 'JsonFile.json'
    BibConverter = BIB2CSV(CSVName, BIBName, JsonName)
    BibConverter.make_dict()
    BibConverter.CreateCSV()


if __name__ == '__main__':
    main()
The GitHub repository is:
https://github.com/mTamim/Bib2CSV

Thank you so much Larz60+
#19
Glad I could help.

