Python Forum
Removing the unwanted data from a file
#3
Thanks, that worked perfectly for the format of the sample data. I did run into some problems when the data was in a different format. The code below works for the original type of sample data, plus another type. There may be a better way of addressing this, but I have tried to comment it here and there.

# Removing the unwanted data from a file. First, test the sample data.

#string = 'family/Smallville, Robert & Mary/28134: Bioelectro healing - https://t.me/bioelectromagnetic_healing'
string = 'Python Forums/Friends/35361:<a href=3D"https://python-forum.io/thread-35532.html" rel=3D"noreferrer" target=3D='

print(string)
my_list = string.split(' ')     # split the string on spaces into a list

for m in my_list:           # go through the list and print each element
    print(m)

matches = []

for match in my_list:       # collect every element that contains 'https'
    if "https" in match:
        matches.append(match)

print(matches)

# Often there are no space or dash characters around the link, only HTML
#       encoding and other strange characters, so find the position of 'https'

# Initializing string (raises IndexError if no element contained 'https')
ini_string1 = matches[0]

# Substring to find
c = "https"
# printing initial string and the substring to find
print ("initial_strings : ", ini_string1,
             "\ncharacter_to_find : ", c)

# Using the index method; it returns a 0-based offset, so res + 1 is the 1-based position
try:
    res = ini_string1.index(c)
    print ("Character {} in string {} is present at {}".format(
                                  c, ini_string1, str(res + 1)))
except ValueError:
    print ("No such character available in string {}".format(ini_string1))
Output:
Python Forums/Friends/35361:<a href=3D"https://python-forum.io/thread-35532.html" rel=3D"noreferrer" target=3D=
Python
Forums/Friends/35361:<a
href=3D"https://python-forum.io/thread-35532.html"
rel=3D"noreferrer"
target=3D=
['href=3D"https://python-forum.io/thread-35532.html"']
initial_strings : href=3D"https://python-forum.io/thread-35532.html"
character_to_find : https
Character https in string href=3D"https://python-forum.io/thread-35532.html" is present at 9
So now I have position 9 as the starting character (counting from 1; the 0-based index returned by index() is 8), from which I can strip out the URI. This still assumes the URI is complete and has no 'garbage' at the end of it. I think the code is nearly ready to run through the file and print each successfully found 'https', and I can then modify it further to cater for any gotchas.
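As a rough sketch of that next step, the snippet below cuts the URI out of each line, starting at 'https' and stopping at the first character that looks like it cannot belong to the link. The helper names extract_url and extract_urls, the URL_END_CHARS set and the sample_lines list are made up for illustration, and the choice of end characters is only an assumption based on the two sample formats in this post, so it may need adjusting for the real file.

```python
# Characters assumed to mark the end of a URL in this data (an assumption,
# adjust to suit the real file): whitespace, quotes and angle brackets.
URL_END_CHARS = ' \t\r\n"\'<>'


def extract_url(line, marker="https"):
    """Return the URL starting at the first `marker` in `line`, or None."""
    start = line.find(marker)      # 0-based offset, -1 when not found
    if start == -1:
        return None
    end = start
    # Advance until a character that cannot belong to the URL is reached.
    while end < len(line) and line[end] not in URL_END_CHARS:
        end += 1
    return line[start:end]


def extract_urls(lines):
    """Yield (line_number, url) for every line that contains a URL."""
    for number, line in enumerate(lines, start=1):
        url = extract_url(line)
        if url is not None:
            yield number, url


# The two sample formats from this post, plus a line with no link at all.
sample_lines = [
    'family/Smallville, Robert & Mary/28134: Bioelectro healing - https://t.me/bioelectromagnetic_healing',
    'Python Forums/Friends/35361:<a href=3D"https://python-forum.io/thread-35532.html" rel=3D"noreferrer" target=3D=',
    'a line without any link',
]

for number, url in extract_urls(sample_lines):
    print(number, url)
```

Because extract_urls only iterates over its argument, an open file object could be passed in place of sample_lines to run through the file line by line. Lines without 'https' are skipped rather than raising an error, which avoids the IndexError that matches[0] would give on such a line.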
Messages In This Thread
RE: Removing the unwanted data from a file - by jehoshua - Nov-14-2021, 03:27 AM
