Python Forum
Extract only certain text which are needed
#21
(Oct-10-2022, 08:07 AM)Calli Wrote: It doesn't work even after adding 4 lines

I'll reiterate: provide a bigger sample (say 10 records) and I'll sort it out. Reading 3,000,000 records into memory is not good practice, as we can do this one record at a time.
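To illustrate the "one record at a time" point, here is a minimal sketch. It assumes one record per line; the helper name iter_records and the StringIO stand-in for the real file are made up for the example.

```python
from io import StringIO


def iter_records(stream):
    """Yield one stripped, non-empty line at a time from an open text stream."""
    for line in stream:  # a file object hands out lines lazily, one per step
        line = line.strip()
        if line:
            yield line


# Stand-in for open("data") so the sketch runs without a real file (assumption).
fake_file = StringIO("record 1\nrecord 2\n\nrecord 3\n")

count = 0
for record in iter_records(fake_file):
    count += 1  # replace with the real per-record processing

print(count)
```

Only the current line is held in memory at any point, so the same loop should behave the same way for 10 records or 3,000,000.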
Sig:
>>> import this

The UNIX philosophy: "Do one thing, and do it well."

"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse

"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
#22
(Oct-10-2022, 08:11 AM)rob101 Wrote:
(Oct-10-2022, 08:07 AM)Calli Wrote: It doesn't work even after adding 4 lines

I'll reiterate: provide a bigger sample (say 10 records) and I'll sort it out. Reading 3,000,000 records into memory is not good practice, as we can do this one record at a time.

{"_index":"testdataset","_type":"_doc","_id":"11234567891098646","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16536668566434698646","orderDt":"20220527","source":0,"mchntId":"0000000002","mchntOrderNo":"01a3f2b53d16290f41f","appid":"0000000003","payChannelId":"payid","amount":300,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"google_upi","timeExpire":1653678578000,"description":"","created":1653657857000,"timePaid":1653657876000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":21,"chnlFee":9,"settleSt":1,"rutId":"0000000098","bankRspDesc":"","bankTransactionId":"20220527212419602959728091384475","credential":"214721678116","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"Brazil","areaId":"","regionId":"brazil","cityId":"toto","countyId":"","modified":1124657876000,"channlInfoId":"b0768be2248a4aee94ac747c2ab0000","email":"[email protected]","mobile":"100000012457","accountOwner":"Tom Hank","merchantParam":"game","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"2022124578950","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381196745423359","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381196745423359","orderDt":"20220528","source":0,"mchntId":"0000000070","mchntOrderNo":"205281711586206","appid":"0000000118","payChannelId":"payid","amount":500,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"google_upi","timeExpire":1653764971000,"description":"","created":1653729120000,"timePaid":1653729131000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":40,"chnlFee":15,"settleSt":1,"rutId":"0000000045","bankRspDesc":"","bankTransactionId":"20220528171201602789750098488753","credential":"214839251743","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"hs dsk","cityId":"asd","countyId":"","modified":1653729131000,"channlInfoId":"b6b32e2c2dc14fa492810e1a47387a29","email":"[email protected]","mobile":"7845147210","accountOwner":"Cotton Kate","merchantParam":"","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"20220528000001","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381191385423350","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381191385423350","orderDt":"20220528","source":0,"mchntId":"0000000002","mchntOrderNo":"01f9c97994562920a82","appid":"0000000003","payChannelId":"","amount":300,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"","timeExpire":1653815519000,"description":"","created":1653729119000,"bankType":"","paySt":0,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":21,"chnlFee":0,"settleSt":0,"rutId":"","bankRspDesc":"","bankTransactionId":"","credential":"","notifyUrl":"hhttps://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"Maharashtra","cityId":"hs","countyId":"","modified":1653729119000,"channlInfoId":"","email":"[email protected]","mobile":"1457845478","accountOwner":"Stefen James","merchantParam":"rummygold","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381191685423352","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381191685423352","orderDt":"20220528","source":0,"mchntId":"0000000037","mchntOrderNo":"42702205281711529003346340502","appid":"0000000112","payChannelId":"payid","amount":100,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"phonepe_upi","timeExpire":1653764971000,"description":"","created":1653729119000,"timePaid":1653729150000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":8,"chnlFee":3,"settleSt":1,"rutId":"0000000098","bankRspDesc":"","bankTransactionId":"20220528171203602959283068389297","credential":"214831644044","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"Himachal Pradesh","cityId":"Una","countyId":"","modified":1653729150000,"channlInfoId":"b0768be2248a4aee94ac747c2ab45878","email":"[email protected]","mobile":"1457812014","accountOwner":"Michel","merchantParam":"","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"20220528000001","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381191715423351","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381191715423351","orderDt":"20220528","source":0,"mchntId":"0000000037","mchntOrderNo":"44602205281711569669753200502","appid":"0000000112","payChannelId":"payid","amount":100,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"phonepe_upi","timeExpire":1653815519000,"description":"","created":1653729119000,"bankType":"","paySt":0,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":8,"chnlFee":3,"settleSt":0,"rutId":"0000000056","bankRspDesc":"","bankTransactionId":"20220528171202602889028098613873","credential":"","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"United kingdom","cityId":"ac","countyId":"","modified":1653729119000,"channlInfoId":"f63243ecdff349c5871c51c060a11954","email":"[email protected]","mobile":"4578412457","accountOwner":"Tom Willims","merchantParam":"","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"","startRow":0,"pageSize":0}}
#23
@Calli

The data is not consistent.

Is there anything else I need to know about, do you think?
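One possible reading of "not consistent" (an assumption, not necessarily the only issue): in the sample above, the third and fifth records have no "timePaid" key. A minimal sketch for finding keys that are missing from some records follows; the records are hypothetical, heavily trimmed stand-ins, and the helper name inconsistent_keys is invented for the example.

```python
import json
from collections import Counter

# Hypothetical, heavily trimmed records; only the key names matter here.
lines = [
    '{"_source": {"amount": 300, "timePaid": 1653657876000, "email": "a"}}',
    '{"_source": {"amount": 300, "email": "b"}}',
    '{"_source": {"amount": 100, "timePaid": 1653729150000, "email": "c"}}',
]


def inconsistent_keys(lines):
    """Return {key: count} for keys that do not appear in every record."""
    counts = Counter()
    total = 0
    for line in lines:
        counts.update(json.loads(line)["_source"].keys())
        total += 1
    return {key: n for key, n in counts.items() if n < total}


print(inconsistent_keys(lines))
```

Any key reported here cannot be relied on to be present in every record.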
#24
(Oct-10-2022, 08:07 AM)Calli Wrote: It doesn't work even after adding 4 lines
Not 4 lines, I said 4 spaces.
#25
Maybe this?

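with open('data', 'r') as f:
    content = 'start'  # any non-empty string, so the loop is entered
    while content:
        content = f.readline()  # returns '' at end of file, which ends the loop
        temp = content.split(',')  # crude split of the record into "key":value pieces
        amount = email = mobile = accountOwner = ''
        for item in temp:
            # Substring test on each piece; relies on no other piece containing these words
            if 'amount' in item:
                amount = item
            elif 'email' in item:
                email = item
            elif 'mobile' in item:
                mobile = item
            elif 'accountOwner' in item:
                accountOwner = item
            # Print once all four fields of the current record have been found
            if amount and email and mobile and accountOwner:
                print(f"{amount} {email} {mobile} {accountOwner}")
                amount = email = mobile = accountOwner = ''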
Output:
"amount":300 "email":"[email protected]" "mobile":"100000012457" "accountOwner":"Tom Hank"
"amount":500 "email":"[email protected]" "mobile":"7845147210" "accountOwner":"Cotton Kate"
"amount":300 "email":"[email protected]" "mobile":"1457845478" "accountOwner":"Stefen James"
"amount":100 "email":"[email protected]" "mobile":"1457812014" "accountOwner":"Michel"
"amount":100 "email":"[email protected]" "mobile":"4578412457" "accountOwner":"Tom Willims"
#26
(Oct-10-2022, 09:54 AM)rob101 Wrote: Maybe this?


This worked well. DM me your BTC address and your Telegram ID, if you have one.
#27
Instead of using file.readline():
with open('data', 'r') as f:
    content = 'start'
    while content:
        content = f.readline()
iterate over the file object directly; it is a lazy iterator that yields one line at a time:
with open('data', 'r') as f:
    for content in f:
        ...  # process one line here
Also, why ignore that the file is in a known format (JSON) and that a parser is available?
from io import StringIO
import json

# Simulate getting json strings from the file one line at a time.
file = StringIO(
"""{"_index":"testdataset","_type":"_doc","_id":"11234567891098646","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16536668566434698646","orderDt":"20220527","source":0,"mchntId":"0000000002","mchntOrderNo":"01a3f2b53d16290f41f","appid":"0000000003","payChannelId":"payid","amount":300,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"google_upi","timeExpire":1653678578000,"description":"","created":1653657857000,"timePaid":1653657876000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":21,"chnlFee":9,"settleSt":1,"rutId":"0000000098","bankRspDesc":"","bankTransactionId":"20220527212419602959728091384475","credential":"214721678116","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"Brazil","areaId":"","regionId":"brazil","cityId":"toto","countyId":"","modified":1124657876000,"channlInfoId":"b0768be2248a4aee94ac747c2ab0000","email":"[email protected]","mobile":"100000012457","accountOwner":"Tom Hank","merchantParam":"game","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"2022124578950","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381196745423359","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381196745423359","orderDt":"20220528","source":0,"mchntId":"0000000070","mchntOrderNo":"205281711586206","appid":"0000000118","payChannelId":"payid","amount":500,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"google_upi","timeExpire":1653764971000,"description":"","created":1653729120000,"timePaid":1653729131000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":40,"chnlFee":15,"settleSt":1,"rutId":"0000000045","bankRspDesc":"","bankTransactionId":"20220528171201602789750098488753","credential":"214839251743","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"hs dsk","cityId":"asd","countyId":"","modified":1653729131000,"channlInfoId":"b6b32e2c2dc14fa492810e1a47387a29","email":"[email protected]","mobile":"7845147210","accountOwner":"Cotton Kate","merchantParam":"","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"20220528000001","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381191385423350","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381191385423350","orderDt":"20220528","source":0,"mchntId":"0000000002","mchntOrderNo":"01f9c97994562920a82","appid":"0000000003","payChannelId":"","amount":300,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"","timeExpire":1653815519000,"description":"","created":1653729119000,"bankType":"","paySt":0,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":21,"chnlFee":0,"settleSt":0,"rutId":"","bankRspDesc":"","bankTransactionId":"","credential":"","notifyUrl":"hhttps://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"Maharashtra","cityId":"hs","countyId":"","modified":1653729119000,"channlInfoId":"","email":"[email protected]","mobile":"1457845478","accountOwner":"Stefen James","merchantParam":"rummygold","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381191685423352","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381191685423352","orderDt":"20220528","source":0,"mchntId":"0000000037","mchntOrderNo":"42702205281711529003346340502","appid":"0000000112","payChannelId":"payid","amount":100,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"phonepe_upi","timeExpire":1653764971000,"description":"","created":1653729119000,"timePaid":1653729150000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":8,"chnlFee":3,"settleSt":1,"rutId":"0000000098","bankRspDesc":"","bankTransactionId":"20220528171203602959283068389297","credential":"214831644044","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"Himachal Pradesh","cityId":"Una","countyId":"","modified":1653729150000,"channlInfoId":"b0768be2248a4aee94ac747c2ab45878","email":"[email protected]","mobile":"1457812014","accountOwner":"Michel","merchantParam":"","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"20220528000001","startRow":0,"pageSize":0}}
{"_index":"testdataset","_type":"_doc","_id":"16537381191715423351","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16537381191715423351","orderDt":"20220528","source":0,"mchntId":"0000000037","mchntOrderNo":"44602205281711569669753200502","appid":"0000000112","payChannelId":"payid","amount":100,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"phonepe_upi","timeExpire":1653815519000,"description":"","created":1653729119000,"bankType":"","paySt":0,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":8,"chnlFee":3,"settleSt":0,"rutId":"0000000056","bankRspDesc":"","bankTransactionId":"20220528171202602889028098613873","credential":"","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"UK","areaId":"","regionId":"United kingdom","cityId":"ac","countyId":"","modified":1653729119000,"channlInfoId":"f63243ecdff349c5871c51c060a11954","email":"[email protected]","mobile":"4578412457","accountOwner":"Tom Willims","merchantParam":"","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"","startRow":0,"pageSize":0}}
""")

for line in file:
    # Parse the JSON string into a dict and pull out the fields of interest.
    source = json.loads(line)["_source"]
    info = {field: source[field] for field in ["amount", "email", "mobile", "accountOwner"]}
    # Convert the resulting dict to a string and strip the surrounding braces.
    print(str(info).strip("{}"))
Output:
'amount': 300, 'email': '[email protected]', 'mobile': '100000012457', 'accountOwner': 'Tom Hank'
'amount': 500, 'email': '[email protected]', 'mobile': '7845147210', 'accountOwner': 'Cotton Kate'
'amount': 300, 'email': '[email protected]', 'mobile': '1457845478', 'accountOwner': 'Stefen James'
'amount': 100, 'email': '[email protected]', 'mobile': '1457812014', 'accountOwner': 'Michel'
'amount': 100, 'email': '[email protected]', 'mobile': '4578412457', 'accountOwner': 'Tom Willims'
When reading from the real file, you might want to catch JSON decoding errors to get past the brackets at the very start and end of the file.
import json

fields = ["amount", "email", "mobile", "accountOwner"]
with open("data.json", "r") as file:
    for line in file:
        try:
            source = json.loads(line)["_source"]
        except json.JSONDecodeError:
            continue  # skip lines that are not a complete JSON record
        info = {field: source[field] for field in fields}
        print(str(info).strip("{}"))
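If some records turn out to lack one of the fields, source[field] raises KeyError. A hedged variation, assuming an empty string is an acceptable placeholder and that CSV is a useful output format (neither is stated in the thread), could look like this. The helper name extract_rows, the sample lines, and the example.com address are all invented for the sketch.

```python
import csv
import io
import json

FIELDS = ["amount", "email", "mobile", "accountOwner"]


def extract_rows(lines, fields=FIELDS):
    """Yield one list of field values per parseable record; missing keys become ''."""
    for line in lines:
        try:
            source = json.loads(line)["_source"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # not a record line, e.g. a lone bracket
        yield [source.get(field, "") for field in fields]


# Hypothetical input: two trimmed records wrapped in stray bracket lines.
sample = [
    "[",
    '{"_source": {"amount": 300, "email": "tom@example.com", '
    '"mobile": "100000012457", "accountOwner": "Tom Hank"}}',
    '{"_source": {"amount": 100, "mobile": "1457812014", "accountOwner": "Michel"}}',
    "]",
]

out = io.StringIO()  # stand-in for open("result.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(FIELDS)
writer.writerows(extract_rows(sample))
print(out.getvalue())
```

Because extract_rows is a generator, it can be fed the open file object directly and still handles one line at a time.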