Python Forum

Full Version: Convert email addresses to VCF format
We need to transfer some names and email addresses from the Claws Mail address book to a Samsung Galaxy Tab A. To understand what the Galaxy wanted, I exported what is in its contacts app. The export is in VCF (vCard) version 2.1. An example of that format, from https://docs.fileformat.com/email/vcf/, is as follows:

Output:
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING#QUOTED-PRINTABLE;CHARSET#UTF-8:100 Waters Edge#0D#
 #0ABaytown\, LA 30314#0D#0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING#QUOTED-PRINTABLE;CHARSET#UTF-8:42 Plantation St.#0D#0A#
 Baytown, LA 30314#0D#0AUnited States of America
EMAIL:[email protected]
REV:20080424T195243Z
END:VCARD
Claws can export the address book contents to either HTML or LDIF format. Rather than get bogged down in converting from one of those formats to VCF, I have used the script at https://r3mlab.github.io/python/2018/07/...riter.html as a starting point. After correcting a few errors, the current code is:

#!/usr/bin/env python

import csv

def vcfWriter(name, email, phone, category):
    vcfLines = []
    vcfLines.append('BEGIN:VCARD')
    vcfLines.append('VERSION:4.0')
    vcfLines.append('FN:%s' % name)
    vcfLines.append('EMAIL:%s' % email)
    vcfLines.append('TEL:%s' % phone)
    vcfLines.append('CATEGORIES:%s' % category)
    vcfLines.append('END:VCARD')
    vcfString = '\n'.join(vcfLines) + '\n'
    return vcfString

# Get data from the CSV file
csvFile = open('contacts.csv')
csvReader = csv.reader(csvFile)
csvData = list(csvReader)

# Create the output file
outputFile = open('contacts.vcf', 'w')

# Iterate over the lines of the CSV table
for row in range(len(csvData)):
    if row == 0:
        continue # Skip the first row (headers)
    else:
        # Get contact data from current row
        name = csvData[row][0]
        email = csvData[row][1]
        phone = csvData[row][2]
        category = csvData[row][3]

        # Write the corresponding vCard string to the output file:
        outputFile.write(vcfWriter(name, email, phone, category))

# Don't forget to close both files
outputFile.close()
csvFile.close()
The input data originally used TAB as the delimiter, but that needed to be changed to a COMMA to get it to work. Here is the input data, in file contacts.csv:

Output:
Alice , [email protected], 0123456789, Friends
Bob, [email protected], 0987654321, Work
and the output data in file contacts.vcf is

Output:
BEGIN:VCARD
VERSION:4.0
FN:Bob
EMAIL: [email protected]
TEL: 0987654321
CATEGORIES: Work
END:VCARD
Note that it only reads the second line in the file, not both. The version is easy to modify. The exports from Claws Mail are only in HTML or LDIF format, and both are quite different from a CSV file, which has one row per person with all the fields separated by a specific character.
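A likely cause of the missing first record: the loop skips row 0 as a header row, but the sample contacts.csv has no header line, so Alice's row is dropped. Below is a minimal sketch under that assumption. The names `vcf_writer` and `rows_to_vcards` and the `example.com` addresses are placeholders, not part of the original script. `skipinitialspace=True` drops the space after each comma (which is what puts a space after `EMAIL:` in the output above), and the `delimiter` argument lets the same function read a TAB-separated file.

```python
import csv
import io


def vcf_writer(name, email, phone, category):
    """Return one vCard as a string (same fields as the script above)."""
    lines = [
        'BEGIN:VCARD',
        'VERSION:4.0',
        'FN:%s' % name,
        'EMAIL:%s' % email,
        'TEL:%s' % phone,
        'CATEGORIES:%s' % category,
        'END:VCARD',
    ]
    return '\n'.join(lines) + '\n'


def rows_to_vcards(csv_file, delimiter=',', has_header=False):
    """Convert every data row of an open CSV file into vCard text."""
    reader = csv.reader(csv_file, delimiter=delimiter, skipinitialspace=True)
    if has_header:
        next(reader, None)  # skip the header row only if there is one
    cards = []
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or incomplete lines
        name, email, phone, category = (field.strip() for field in row[:4])
        cards.append(vcf_writer(name, email, phone, category))
    return ''.join(cards)


# Placeholder data in the same layout as contacts.csv (no header row).
sample = io.StringIO(
    'Alice , alice@example.com, 0123456789, Friends\n'
    'Bob, bob@example.com, 0987654321, Work\n'
)
vcards = rows_to_vcards(sample)
print(vcards)
```

With a real file, the same function could be called as `rows_to_vcards(open('contacts.csv', newline=''))`, or with `delimiter='\t'` for the original TAB-separated data.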

So, I definitely need to move from using CSV to some sort of flat file format. The HTML looks quite messy and possibly hard to work with, as there are cells within a table, lots of HTML code, etc. The LDIF, on the other hand, looks like this:

Output:
dn: uid=538705298
objectClass: person
objectClass: inetOrgPerson
cn: Forrest Gump
sn: Gump
givenName: Forrest
displayName: Forrest Gump
mail: [email protected]
The "dn: uid=" line indicates a new record; the unique number there is irrelevant for this purpose. So in summary, how do I ensure all records are read in the above script, and how can the CSV be replaced with some sort of flat file processing, please?

Is the second part of the modifications suitably addressed by https://pypi.org/project/ldif/ ?
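For the flat file question, here is a rough sketch of reading LDIF by hand, treating each "dn:" line as the start of a new record. The function name `parse_simple_ldif` and the sample address are made up for illustration. It assumes only plain "attribute: value" lines; base64 values ("attr:: ...") and folded continuation lines are not handled, so a dedicated LDIF parser is probably the safer choice for a real export.

```python
def parse_simple_ldif(text):
    """Split LDIF text into a list of {attribute: [values]} dicts.

    Assumption: only plain 'attribute: value' lines; base64 values
    and folded continuation lines are not handled.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(': ')
        if not sep:
            continue  # not an 'attribute: value' line
        if key == 'dn':
            current = {}  # a 'dn' line starts a new record
            records.append(current)
        elif current is not None:
            # attributes such as objectClass can repeat, so keep a list
            current.setdefault(key, []).append(value)
    return records


# Same layout as the Claws export shown above; the address is a placeholder.
sample = """\
dn: uid=538705298
objectClass: person
objectClass: inetOrgPerson
cn: Forrest Gump
sn: Gump
givenName: Forrest
displayName: Forrest Gump
mail: forrest@example.com
"""

records = parse_simple_ldif(sample)
for record in records:
    print(record['cn'][0], record['mail'][0])
```

Each value is stored in a list because an attribute can appear more than once in a record, as objectClass does here.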
It was easier to look at doing this with the LDIF format. Here is the code:

#!/usr/bin/env python

from ldif3 import LDIFParser
from pprint import pprint

parser = LDIFParser(open("claws_export.ldif", "rb"))

for dn, record in parser.parse():
    
    name = ""
    if 'cn' in record:
        name = record['cn'][0]
    
    surname = ""
    if 'sn' in record:
        surname = record['sn'][0]
    
    given_name = ""
    if 'givenName' in record:
        given_name = record['givenName'][0]
    
    display_name = ""
    if 'displayName' in record:
        display_name = record['displayName'][0]
    
    email = ""
    if 'mail' in record:
        email = record['mail'][0]

    print('BEGIN:VCARD')
    print('VERSION:2.1')
    print("N:" + surname + ";" + given_name + ";;;")
    print("FN:" + name)
    print("EMAIL;HOME:" + email)
    print("END:VCARD")
The output looks like it nearly matches what is required for import into the Galaxy contacts. If I add a
print(record)
this is the output:

Quote:
OrderedDict([('objectClass', ['inetOrgPerson']), ('cn', ['Forrest Gump']), ('sn', ['Gump']), ('displayName', ['Forrest Gump']), ('mail', ['[email protected]'])])

Can any improvements be made to the code? For example, it seems a waste having to do all those "if" statements; possibly Python has some sort of lookup function to look things up within the class?
My last question was answered at https://python-forum.io/Thread-Can-I-rep...#pid138636
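For readers who do not follow the link, one common way to avoid the repeated "if" checks is the dictionary `get` method with a default value. This is only a sketch and not necessarily what the linked thread proposes. The helper names `first_value` and `record_to_vcard` and the sample address are placeholders.

```python
from collections import OrderedDict


def first_value(record, attribute, default=""):
    """Return the first value of an attribute, or default if it is absent."""
    values = record.get(attribute) or [default]
    return values[0]


def record_to_vcard(record):
    """Build a vCard 2.1 string with the same fields the script prints."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:2.1",
        "N:" + first_value(record, "sn") + ";"
        + first_value(record, "givenName") + ";;;",
        "FN:" + first_value(record, "cn"),
        "EMAIL;HOME:" + first_value(record, "mail"),
        "END:VCARD",
    ]
    return "\n".join(lines) + "\n"


# Same shape as the print(record) output; the address is a placeholder.
record = OrderedDict([
    ("objectClass", ["inetOrgPerson"]),
    ("cn", ["Forrest Gump"]),
    ("sn", ["Gump"]),
    ("displayName", ["Forrest Gump"]),
    ("mail", ["forrest@example.com"]),
])
card = record_to_vcard(record)
print(card)
```

Because this record has no givenName entry, the helper falls back to an empty string for that field instead of raising a KeyError.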