Python Forum
Syntax for array splitting in a loop?
#1
I have tried numerous ways to parse a file and strip the array down to only the text inside the "". However, I seem to be unable to figure out the correct syntax for it. I assumed the issue might have been from trying to access an element which didn't exist (because no " was in the file line), but that wasn't the issue.

The code only works if I explicitly say
sa_data[1] = sa_data[1].split('"')[1]
However, that's not what I want. I want it to loop by itself.

Text.txt:
<tag1>
    <tag2 name = "tag 2">
      <tag3 name="korea"/>
      <tag4 name="china"/>
      <tag5 name="japan"/>
    </tag2>
  
</tag1>
Code:
sa_data = []

with open('Text.txt') as fin:
	s_Line = fin.readline()
	for s_Line in fin:
		sa_data.append(s_Line)

for i in sa_data:
	if range(sa_data[i].split('"')) is 3:
		sa_data[i] = sa_data[i].split('"')[1]
Ideally, what I wanted is something like:
sa_data = []

with open('Text.txt') as fin:
	s_Line = fin.readline()
	for s_Line in fin:
		sa_data.append(s_Line.split('"')[1])
Reply
#2
def get_tags():
    sa_data = []
    with open('test.txt', 'r') as f:
        for line in f:
            line = line.strip().split('"')
            if len(line) < 2:
                continue
            sa_data.append(line[1])
    return sa_data

if __name__ == '__main__':
    print(get_tags())
results:
Output:
['tag 2', 'korea', 'china', 'japan']
Reply
#3
@Larz60+, Thank you. That does the trick.

I have 2 more questions:
1-
(Apr-08-2018, 06:28 PM)Larz60+ Wrote: if len(line) < 2:
I see, so len() is for array size, and range() is for x to y within an array?

2-
(Apr-08-2018, 06:28 PM)Larz60+ Wrote: if __name__ == '__main__':
Why do you use this? This is only required in bigger programs to ensure correct scoping?
Reply
#4
Question 1:
Quote:I see, so len() is for array size, and range() is for x to y within an array?
I don't use range, so I don't know how to answer that part of the question.
If you split on a delimiter that doesn't exist in the line, the result will have a length of 1.
Otherwise it will be greater than one, and the item at index 1 will be the data you are looking for.
The best way to see this is to put a print statement right after the split (line 5 of the function above):
def get_tags():
    sa_data = []
    with open('test.txt', 'r') as f:
        for line in f:
            line = line.strip().split('"')
            print('line: {}'.format(line))
            if len(line) < 2:
                continue
            sa_data.append(line[1])
    return sa_data

if __name__ == '__main__':
    print(get_tags())
which will show:
Output:
line: ['<tag1>']
line: ['<tag2 name = ', 'tag 2', '>']
line: ['<tag3 name=', 'korea', '/>']
line: ['<tag4 name=', 'china', '/>']
line: ['<tag5 name=', 'japan', '/>']
line: ['</tag2>']
line: ['']
line: ['</tag1>']
['tag 2', 'korea', 'china', 'japan']
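On the len() versus range() part of the question, here is a small sketch (the sample lines and variable names are made up for illustration). len() returns how many items a list has, while range(n) produces the integers 0 to n-1, which is why it is often used to generate indices. It also hints at why the original loop failed: `for i in sa_data` hands you each line as a string, not an index, so `sa_data[i]` cannot work.

```python
# Sample lines, assumed to resemble the ones in Text.txt
lines = ['<tag1>', '    <tag3 name="korea"/>', '']

# Splitting on a quote gives 1 piece when there is no quote,
# and 3 pieces when the line holds one quoted value.
parts_per_line = [line.strip().split('"') for line in lines]
for parts in parts_per_line:
    print(len(parts), parts)

# range(len(parts)) yields the valid indices of the list
parts = parts_per_line[1]
for i in range(len(parts)):
    print(i, parts[i])

# enumerate() gives index and item together, which is what the
# original loop needed in order to assign back into the list
sa_data = list(lines)
for i, line in enumerate(sa_data):
    pieces = line.split('"')
    if len(pieces) == 3:
        sa_data[i] = pieces[1]
print(sa_data)
```

Lines without a quoted value are left unchanged here; the get_tags() version above skips them instead.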
Question 2:
Quote:Why do you use this? This is only required in bigger programs to ensure correct scoping?
Program size has nothing to do with it.
If your program is run from the command line, or from within an IDE, __name__ will equal '__main__'.
If it is imported from another module, __name__ will hold the module's name instead.
So this works for either case.
It's not required, just handy. You could have added the line:
get_tags()
instead.
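To see both cases side by side, here is a rough sketch. It writes a throwaway module (the name demo_mod and the variable seen_name are made up for this illustration), then runs it once as a script and once as an import, recording what __name__ was each time.

```python
import importlib.util
import os
import runpy
import tempfile

# A one-line module that just remembers what __name__ was when it ran
SOURCE = 'seen_name = __name__\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_mod.py')
    with open(path, 'w') as f:
        f.write(SOURCE)

    # Case 1: executed as a script, as "python demo_mod.py" would do
    as_script = runpy.run_path(path, run_name='__main__')['seen_name']

    # Case 2: loaded as a module, as "import demo_mod" would do
    spec = importlib.util.spec_from_file_location('demo_mod', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    as_import = module.seen_name

print(as_script)   # name seen when run as a script
print(as_import)   # name seen when imported
```

Code placed under `if __name__ == '__main__':` therefore runs only in the first case, so the file can be imported elsewhere without printing anything.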
Reply
#5
The data has the format of HTML/XML.
So it's better to use a parser, for example Beautiful Soup:
from bs4 import BeautifulSoup
import re

html = '''\
<tag1>
  <tag2 name="tag 2">
    <tag3 name="korea"/>
    <tag4 name="china"/>
    <tag5 name="japan"/>
  </tag2>
</tag1>
'''
soup = BeautifulSoup(html, 'lxml')
tag_2 = soup.find('tag2')
contry_tag = tag_2.find_all(re.compile(r"tag\d"))
for contry in contry_tag:
    print(contry.get('name'))
Output:
korea
china
japan
Reply
#6
I didn't think the data quite fit the bill as XML or HTML, considered using BeautifulSoup, but didn't because
the format didn't quite look like valid markup.
Reply
#7
I've heard of soup before. I don't like the naming of everything, but looking at what it can do, it's quite handy.

P.s. I'm not reading html. It's simply a bunch of tags.
Reply
#8
BeautifulSoup is normally used for parsing html, but it's just an xml parser, and will work fine with any sort of xml (which is what you've got).
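If installing a third-party package is not an option, the same idea can be sketched with the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is a swapped-in alternative, not what was posted above, and it assumes the file content is well-formed XML like the sample in the first post.

```python
import xml.etree.ElementTree as ET

# Same content as Text.txt from the first post
xml_text = '''\
<tag1>
    <tag2 name = "tag 2">
      <tag3 name="korea"/>
      <tag4 name="china"/>
      <tag5 name="japan"/>
    </tag2>

</tag1>
'''

root = ET.fromstring(xml_text)

# Walk every element in document order and keep the ones
# that actually carry a name attribute
names = [elem.get('name') for elem in root.iter()
         if elem.get('name') is not None]
print(names)
```

With a real file, ET.parse('Text.txt').getroot() would take the place of ET.fromstring(xml_text).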
Reply

