Python Forum

Full Version: extract information from a file
Dear friends:

I need to extract information from an SDF file containing molecule data:

.
.
.
43 46 1 0 0 0 0
46 47 1 0 0 0 0
46 49 2 0 0 0 0
47 48 1 0 0 0 0
50 51 3 0 0 0 0
51 52 1 0 0 0 0
M END
> <Name>
4_pentynoic_acid

> <activity>
non


$$$$
CHAPS
MOE2011 3D
Structure written by Hyleos SD API
100103 0 0 1 0 0 0 0 0999 V2000
13.3220 1.0470 0.2750 S 0 0 0 0 0 0 0 0 0 0 0 0
.
.
.

I would like to obtain the following information for each molecule in a new file (result.txt):
Name: 4_pentynoic_acid, activity: non.

This SDF file contains 400 other molecules, each one with its own name and activity.
Could you help me write code that does this?
Thank you very much!!

This is what I have been trying:

infile = open('Substrates.sdf', 'r')
outfile = open('result.txt', 'w')
copy = False
tmpLines = []
for line in infile:
	if line == '<Name>':
	
		copy = True
		tmpLines = []
	elif line == '$$$$':
		copy = False
		for tmpLine in tmpLines:
			outfile.write(tmpLine)
	elif copy:
		tmpLines.append(line)
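A note on the attempt above: it most likely writes nothing, because iterating over a file yields each line with its trailing newline, and the tag line in the file is `> <Name>` rather than `<Name>`, so neither `line == '<Name>'` nor `line == '$$$$'` is ever true. Below is a minimal sketch of the same line-by-line idea with stripped lines. The helper name `extract_fields` and the in-memory `sample` lines are made up for illustration; with the real file you would pass the open file object instead.

```python
def extract_fields(lines):
    """Collect the value that follows each '> <Name>' / '> <activity>' line."""
    results = []
    record = {}
    pending = None  # tag whose value is expected on the next line
    for raw in lines:
        line = raw.strip()  # drop the trailing newline before comparing
        if pending is not None:
            record[pending] = line
            pending = None
        elif line.startswith('>') and '<Name>' in line:
            pending = 'Name'
        elif line.startswith('>') and '<activity>' in line:
            pending = 'activity'
        elif line == '$$$$':  # end of one molecule record
            if record:
                results.append(record)
            record = {}
    return results


# Hypothetical sample lines, shaped like the excerpt in the question.
sample = [
    "M END\n",
    "> <Name>\n",
    "4_pentynoic_acid\n",
    "\n",
    "> <activity>\n",
    "non\n",
    "\n",
    "$$$$\n",
]
for rec in extract_fields(sample):
    print(f"Name: {rec.get('Name')}, activity: {rec.get('activity')}")
```

To use it on the real file, something like `extract_fields(open('Substrates.sdf'))` should work, since a file object iterates line by line.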
Is there a location where I can download a complete sample file?
import os


def get_data():
    # work from the directory that contains this script
    os.chdir(os.path.abspath(os.path.dirname(__file__)))
    with open('Substrates.sdf', 'r') as fp:
        indata = [line.strip() for line in fp]

    with open('result.txt', 'w') as fp_out:
        # the value of each data item is on the line after its '> <tag>' header;
        # the last line is skipped as a header because nothing follows it
        for n, line in enumerate(indata[:-1]):
            if line.startswith('>') and '<Name>' in line:
                fp_out.write(f'Name: {indata[n+1]}')
            elif line.startswith('>') and '<activity>' in line:
                fp_out.write(f', activity: {indata[n+1]}\n')

if __name__ == '__main__':
    get_data()
partial results:
Output:
(try_stuff_venv) > cat src/result.txt
Name: 4_pentynoic_acid, activity: non
Name: CHAPS, activity: non
Name: D_24851, activity: non
Name: NSC109350_got, activity: non
Name: NSC118742_got, activity: non
Name: NSC122301_got, activity: non
Name: NSC132791_got, activity: non
Name: NSC139490_got, activity: non
Name: NSC144153_got, activity: non
Name: NSC145150_got, activity: non
Name: NSC152731_got, activity: non
Name: NSC161128_got, activity: non
Name: NSC167780_got, activity: non
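One thing to watch for: the script above assumes every molecule has both a Name and an activity, in that order. If a record were missing its activity, two names would end up on the same output line. A possible alternative, sketched here under the assumption that records are separated by `$$$$` and that each value sits on the line right after its `> <tag>` header, is to split the text into records first. The names `FIELD_RE` and `parse_records`, the `'?'` placeholder, and the sample text are made up for illustration.

```python
import re

# Matches a data header such as '> <Name>' and captures the next line as its value.
FIELD_RE = re.compile(r'^>\s*<([^>]+)>.*\n(.*)$', re.MULTILINE)


def parse_records(text):
    """Return one dict of data fields per '$$$$'-terminated record."""
    records = []
    for block in text.split('$$$$'):
        fields = {name: value.strip() for name, value in FIELD_RE.findall(block)}
        if fields:  # ignore the empty tail after the last '$$$$'
            records.append(fields)
    return records


# Hypothetical sample: the second record has no activity field.
sample = (
    "M END\n> <Name>\n4_pentynoic_acid\n\n> <activity>\nnon\n\n$$$$\n"
    "CHAPS\nM END\n> <Name>\nCHAPS\n\n$$$$\n"
)
for rec in parse_records(sample):
    # '?' marks a field that is absent from the record
    print(f"Name: {rec.get('Name', '?')}, activity: {rec.get('activity', '?')}")
```

With the real file, `parse_records(open('Substrates.sdf').read())` would return one dict per molecule, and the same `print` line (or a `write` to result.txt) produces the requested format.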
Thank you very much!
It's perfect!!!