Python Forum
String manipulation, code structure
#1
Hey,

I'm brand new to Python and am trying to learn it.
I didn't expect it to be this difficult. I have already spent hours and hours on it, and sometimes I don't feel I'm moving forward.

In short, the goal is to collect all the useful data strings that come in over the serial line so that I can use them to make a chart.
I did some tests and managed to collect the realtime strings, save them, and chop off everything I don't need. But I don't have much Python knowledge yet, so there is certainly a better way to do it. Also, the structure of the realtime string differs from that of the other ones.
And every string has a different length: the daily string is not the same length as the 90-day string or the 84-month string, and the two words next to (in front of and after) the data word I need are constantly changing.


So many questions.
How do I start? First I need to know and understand the outline, the structure from which I can start writing the code. For example, when the data comes in, how do I get rid of everything I don't need? Should I get rid of it before saving? Maybe so, but how? And do I have enough time to collect three types of multiple strings that sometimes arrive at more or less the same time?

Any help will be highly appreciated.

Some info,

I’m using Python 2.7.14 (default, Sep 23 2017, 22:06:14)
[GCC 7.2.0] on linux2

The only data I need is highlighted
Multiple strings (only one shown)
Every 2 (odd) hours a string for the last 90 days comes in, and it looks like this:
<msg><src>CC128v1.48</src><dsb>01339</dsb><time>15:17:50</time><hist><dsw>01341</dsw><type>1</type><units>kwhr</units><data><sensor>0</sensor><d058>1.710</d058><d057>1.460</d057><d056>1.664</d056><d055>1.585</d055></data><data><sensor>1</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>2</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>3</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>4</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>5</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>6</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>7</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>8</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>9</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data></hist></msg>


Multiple strings (only one shown)
Together with it, every 2 hours, comes a string for the last 84 months, and it looks like this:
<msg><src>CC128v1.48</src><dsb>01339</dsb><time>15:21:20</time><hist><dsw>01341</dsw><type>1</type><units>kwhr</units><data><sensor>0</sensor><m020>279.750</m020><m019>304.500</m019><m018>326.000</m018><m017>308.750</m017></data><data><sensor>1</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>2</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>3</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>4</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>5</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>6</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>7</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>8</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>9</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data></hist></msg>

And there are two more strings: one realtime string every 6 seconds, and one every odd hour for the last 31 days.
#2
So you want all children of the data element, except the sensor element, but only if the sensor element's value is 0. That's the XML path /msg/hist/data, and only the first match.

The easiest way to get that sort of info is to parse it as XML and jump straight to the part you care about. They're small snippets, so using a DOM parser is easier than using SAX.

Using xml.etree.ElementTree from the standard library, we could do something like this:
messages = [
    '''<msg><src>CC128v1.48</src><dsb>01339</dsb><time>15:17:50</time><hist><dsw>01341</dsw><type>1</type><units>kwhr</units><data><sensor>0</sensor><d058>1.710</d058><d057>1.460</d057><d056>1.664</d056><d055>1.585</d055></data><data><sensor>1</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>2</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>3</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>4</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>5</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>6</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>7</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>8</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data><data><sensor>9</sensor><d058>0.000</d058><d057>0.000</d057><d056>0.000</d056><d055>0.000</d055></data></hist></msg>''',
    '''<msg><src>CC128v1.48</src><dsb>01339</dsb><time>15:21:20</time><hist><dsw>01341</dsw><type>1</type><units>kwhr</units><data><sensor>0</sensor><m020>279.750</m020><m019>304.500</m019><m018>326.000</m018><m017>308.750</m017></data><data><sensor>1</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>2</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>3</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>4</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>5</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>6</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>7</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>8</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data><data><sensor>9</sensor><m020>0.000</m020><m019>0.000</m019><m018>0.000</m018><m017>0.000</m017></data></hist></msg>'''
]

import xml.etree.ElementTree as ET

for message in messages:
    root = ET.fromstring(message)
    # every <data> element under <hist>; the first one is sensor 0
    data_elems = root.findall("./hist/data")
    data = data_elems[0]
    # list(element) gives the child elements; getchildren() is deprecated
    # and was removed in Python 3.9
    children = list(data)
    # first element is "sensor", which we don't want
    for elem in children[1:]:
        name = elem.tag
        value = elem.text
        print("{0} => {1}".format(name, value))
Which gives:
Output:
>python spam.py
d058 => 1.710
d057 => 1.460
d056 => 1.664
d055 => 1.585
m020 => 279.750
m019 => 304.500
m018 => 326.000
m017 => 308.750
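The loop above relies on sensor 0 always being the first data element. If that ordering is not guaranteed, a minimal sketch like the following could pick the element by its sensor value instead and convert the readings to floats, which is handier for charting. The helper name extract_history and the shortened sample message are made up for illustration; they are not part of the original code.

```python
import xml.etree.ElementTree as ET


def extract_history(message, sensor="0"):
    """Return {tag: float} for the <data> element of the given sensor.

    Hypothetical helper: returns an empty dict when the message has no
    matching <hist>/<data> element.
    """
    root = ET.fromstring(message)
    for data in root.findall("./hist/data"):
        # Compare the text of the <sensor> child instead of trusting position.
        if data.findtext("sensor") == sensor:
            return {
                child.tag: float(child.text)
                for child in data
                if child.tag != "sensor"
            }
    return {}


# Shortened version of the 90-day message from the first post.
sample = (
    "<msg><src>CC128v1.48</src><hist><units>kwhr</units>"
    "<data><sensor>0</sensor><d058>1.710</d058><d057>1.460</d057></data>"
    "<data><sensor>1</sensor><d058>0.000</d058><d057>0.000</d057></data>"
    "</hist></msg>"
)

readings = extract_history(sample)
for tag in sorted(readings):
    print("{0} => {1:.3f}".format(tag, readings[tag]))
```

Because the result is a plain dict of floats, it can be handed straight to whatever charting code comes next.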
You could also use BeautifulSoup for a nicer/easier way to get the data, but that'd be a package you'd need to install.
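As for the question of getting rid of everything you don't need before saving: one possible approach is to parse each line as it arrives and keep only the values you care about. The sketch below assumes that each message arrives as one complete line and that messages without a hist element (such as the realtime ones) can simply be skipped. The function name iter_rows is hypothetical, and the list called incoming only stands in for the serial port, which is not shown here.

```python
import xml.etree.ElementTree as ET


def iter_rows(lines, sensor="0"):
    """Yield (time, tag, value) tuples for history lines, skipping the rest."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            root = ET.fromstring(line)
        except ET.ParseError:
            # Incomplete or corrupted line: ignore it and carry on.
            continue
        stamp = root.findtext("time")
        # Messages without <hist>/<data> produce no rows at all.
        for data in root.findall("./hist/data"):
            if data.findtext("sensor") != sensor:
                continue
            for child in data:
                if child.tag != "sensor":
                    yield stamp, child.tag, float(child.text)


# Stand-in for lines read from the serial port (assumption for the demo).
incoming = [
    "<msg><time>15:17:50</time><hist><data><sensor>0</sensor>"
    "<d058>1.710</d058></data></hist></msg>\n",
    "<msg><time>15:17:5",  # truncated line, silently dropped
    "<msg><time>15:21:20</time><hist><data><sensor>0</sensor>"
    "<m020>279.750</m020></data></hist></msg>\n",
]

rows = list(iter_rows(incoming))
for row in rows:
    print(row)
```

Since the function is a generator, each row is available as soon as its line has been read, so only the filtered values ever need to be saved.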
#3
Have you tried Python's lxml module? It works pretty well for parsing elements in XML or HTML form. You may need to install it, though. If so, open a terminal and type one of these commands:

pip install lxml
or
apt-get install python-lxml

After that, you can follow a pretty good tutorial on using lxml here:
http://lxml.de/tutorial.html
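
For a rough idea of what that looks like, here is a small sketch. lxml.etree offers largely the same interface as the standard library's ElementTree, so the example falls back to the standard library when lxml is not installed. The shortened sample message is made up from the 84-month string in the first post, and the path predicate [sensor='0'] assumes a reasonably recent Python 3.

```python
try:
    from lxml import etree as ET  # third-party package, if installed
except ImportError:
    import xml.etree.ElementTree as ET  # standard-library fallback

# Shortened version of the 84-month message from the first post.
sample = (
    "<msg><src>CC128v1.48</src><hist><units>kwhr</units>"
    "<data><sensor>0</sensor><m020>279.750</m020><m019>304.500</m019></data>"
    "<data><sensor>1</sensor><m020>0.000</m020><m019>0.000</m019></data>"
    "</hist></msg>"
)

root = ET.fromstring(sample)
# Path predicate: the <data> element whose <sensor> child has the text "0".
data = root.find("./hist/data[sensor='0']")
values = [(child.tag, child.text) for child in data if child.tag != "sensor"]
print(values)
```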