Python Forum
Text parsing
#1
Hello!

I am parsing emails containing receipts with Python to build a database.
The idea is to concatenate any line starting with "&" or a lowercase letter with the previous line.

That part of the code is:
# Concatenate lines starting with '&' or starting with lowercase to the end of the previous line
    combined_lines = []
    for line in soup.stripped_strings:
    if (line.startswith('&') or line[0].islower()) and combined_lines:
        combined_lines[-1] += ' ' + line.lstrip()
    else:
        combined_lines.append(line)
The "&" case works, but the lowercase case does not.
What is my mistake?
deanhystad wrote Mar-11-2024, 05:01 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
As posted, the code doesn't work for "&" or for lowercase, because the body of the for loop is not indented. Maybe it is a cut/paste error and you meant this:
combined_lines = []
for line in soup.stripped_strings:
    if (line.startswith('&') or line[0].islower()) and combined_lines:
        combined_lines[-1] += ' ' + line.lstrip()
    else:
        combined_lines.append(line)
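For reference, the corrected loop can be tried without an email or BeautifulSoup. This is only a minimal sketch: `combine_lines` is a hypothetical helper name, and the made-up list of strings stands in for `soup.stripped_strings`.

```python
def combine_lines(lines):
    """Join lines starting with '&' or a lowercase letter onto the previous line."""
    combined = []
    for line in lines:
        # Only join when there is already a previous line to join onto
        if (line.startswith('&') or line[0].islower()) and combined:
            combined[-1] += ' ' + line.lstrip()
        else:
            combined.append(line)
    return combined


# Made-up sample data standing in for soup.stripped_strings
sample = ['Salt', '& pepper', 'Olive oil', 'extra virgin', 'Bread']
print(combine_lines(sample))
# → ['Salt & pepper', 'Olive oil extra virgin', 'Bread']
```

With this input both the "&" case and the lowercase case are joined, so if lowercase lines are not joined in the real data, the strings reaching the loop probably do not start with a lowercase letter.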
#3
import pandas as pd
from imap_tools import MailBox
from bs4 import BeautifulSoup
import re

# Connect to the specified mail server and login using the provided credentials
with MailBox(server).login(login, password, folder) as mailbox:
    # Fetch emails
    emails = []
    for msg in mailbox.fetch():
        # Message body
        message_body = msg.text or msg.html

        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(message_body, 'html.parser')

        # Concatenate lines starting with '&' or a lowercase letter to the end of the previous line
        combined_lines = []
        for line in soup.stripped_strings:
            if (line.startswith('&') or line[0].islower()) and combined_lines:
                combined_lines[-1] += ' ' + line.lstrip()
            else:
                combined_lines.append(line)

        combined_text = '\n'.join(combined_lines)

        # Replace tabs with line feeds
        combined_text = combined_text.replace('\t', '\n')
        
        # Remove blank rows
        combined_text = '\n'.join([line.strip() for line in combined_text.splitlines() if line.strip()])

        # Remove leading spaces and tabs in every row
        combined_text = '\n'.join([line.lstrip() for line in combined_text.splitlines()])

        # Remember the first row
        first_row = combined_text.split('\n', 1)[0]

        # Find the text between "Goods" and "Payment"
        match = re.search(r'Goods(.*?)Payment', combined_text, re.DOTALL)
        if match:
            combined_text = match.group(1).strip()

        # Take only the first 12 letters for "order"
        order_value = first_row[:12]

        # Collect the email data 
        email_data = {
            'subject': msg.subject,
            'from': msg.from_,
            'to': msg.to,
            'date': msg.date,
            'body': combined_text,
            'order': order_value
        }
        emails.append(email_data)

    # Convert the emails to a pandas DataFrame
    df = pd.DataFrame(emails)
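The text clean-up steps in the middle of that script can be checked in isolation. The sketch below is not the poster's code: `extract_goods_section` is a hypothetical helper and the sample string is invented. It folds the two whitespace passes into one, then applies the same "Goods" to "Payment" search.

```python
import re


def extract_goods_section(text):
    """Clean up the rows, then return (first_row, text between 'Goods' and 'Payment')."""
    # Tabs become line breaks, as in the posted code
    text = text.replace('\t', '\n')
    # Strip every row and drop the blank ones in a single pass
    rows = [row.strip() for row in text.splitlines() if row.strip()]
    first_row = rows[0] if rows else ''
    cleaned = '\n'.join(rows)
    # Non-greedy match so the search stops at the first 'Payment'
    match = re.search(r'Goods(.*?)Payment', cleaned, re.DOTALL)
    body = match.group(1).strip() if match else cleaned
    return first_row, body


# Invented sample text, not taken from a real receipt
sample = 'Order 000123 confirmed\n\n  Goods\n\tSalt & pepper\t2.50\nPayment\nCard'
first_row, body = extract_goods_section(sample)
print(first_row[:12])  # → Order 000123
print(body)
```

If no "Goods ... Payment" block is found, the helper falls back to the whole cleaned text, which mirrors what the posted script does.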
#4
There are no errors.
The output is correct if the line starts with "&", but if the line starts with a lowercase letter, it is not concatenated with the previous line.
#5
Do you have any lines that start with a lowercase letter? If the first character of line is whitespace (space or tab), your test will fail. Maybe you should call strip() before you check the first letter.
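A minimal sketch of that suggestion, assuming the test is pulled out into a hypothetical helper called `continues_previous`: strip the line first, then look at the first character.

```python
def continues_previous(line):
    """Return True if the line should be appended to the previous one."""
    text = line.strip()
    # An empty line has no first character, so guard before indexing
    return bool(text) and (text.startswith('&') or text[0].islower())


# Made-up candidates, including leading whitespace and a blank line
for candidate in ['& pepper', '  extra virgin', '\textra virgin', 'Bread', '   ']:
    print(repr(candidate), continues_previous(candidate))
```

Without the `strip()`, `'  extra virgin'[0]` is a space, and `' '.islower()` is `False`, which is the failure described above.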
#6
(Mar-11-2024, 03:19 PM)Arik Wrote: The output is correct if the line starts with "&", but if the line starts with a lowercase letter, it is not concatenated with the previous line.
Can you post an example block of lines where it doesn't work?
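One way to produce such an example is to print the `repr()` of each string before the test runs, so hidden characters become visible. This is only a sketch: `explain_lines` is a hypothetical helper, and the sample list stands in for whatever `soup.stripped_strings` yields for a problem email.

```python
def explain_lines(lines):
    """Return (repr of line, first character, whether the join test passes) per line."""
    report = []
    for line in lines:
        first = line[:1]  # slicing is safe even for an empty string
        joins = line.startswith('&') or first.islower()
        report.append((repr(line), first, joins))
    return report


# Made-up strings; replace with list(soup.stripped_strings) for a real message
for row in explain_lines(['Olive oil', 'extra virgin', ' extra virgin', 'Extra virgin', '& pepper']):
    print(row)
```

If a line that looks lowercase on screen shows `False` in the last column, the `repr()` in the first column shows what the string really starts with. It is also worth checking how many strings come back at all. If the message has only a plain-text body, the whole text may arrive as a single string, in which case the test only ever sees its first character.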

