Python Forum

Extract only certain text that is needed
Say, for instance, I want to extract only some of the words from this data. How should I go about it, with or without a regular expression?

Data Sample
{"_index":"testdataset","_type":"_doc","_id":"11234567891098646","_score":1,"_source":{"_class":"net.local.host.ca","orderNo":"16536668566434698646","orderDt":"20220527","source":0,"mchntId":"0000000002","mchntOrderNo":"01a3f2b53d16290f41f","appid":"0000000003","payChannelId":"payid","amount":300,"clientIp":"192.168.0.1","currency":"0","subject":"","body":"","cpChannel":"google_upi","timeExpire":1653678578000,"description":"","created":1653657857000,"timePaid":1653657876000,"bankType":"","paySt":2,"refundSt":0,"refundedAmt":0,"checkSt":0,"fee":21,"chnlFee":9,"settleSt":1,"rutId":"0000000098","bankRspDesc":"","bankTransactionId":"20220527212419602959728091384475","credential":"214721678116","notifyUrl":"https://localhost/test/site","pageNotifyUrl":"https://localhost/test/site","notifyCnt":0,"notifySt":0,"openId":"","extra":"","countryId":"Brazil","areaId":"","regionId":"brazil","cityId":"toto","countyId":"","modified":1124657876000,"channlInfoId":"b0768be2248a4aee94ac747c2ab0000","email":"[email protected]","mobile":"100000012457","accountOwner":"Tom Hank","merchantParam":"game","transTp":0,"payAccount":"","payType":"","bankCode":"","settleBatchNo":"2022124578950","startRow":0,"pageSize":0}}
Output needed
amount:300, email:[email protected], mobile:100000012457, accountOwner:Tom Hank
Help this poor dude and leave your BTC address; I'll send some love. Thank you!
Is this data a string or a dictionary? What have you tried so far?
The data probably comes from JSON and has now been decoded into a Python dictionary.
Then work with the dictionary; no regex is needed.
data = {
  "_index": "testdataset",
  "_type": "_doc",
  "_id": "11234567891098646",
  "_score": 1,
  "_source": {
    "_class": "net.local.host.ca",
    "orderNo": "16536668566434698646",
    "orderDt": "20220527",
    "source": 0,
    "mchntId": "0000000002",
    "mchntOrderNo": "01a3f2b53d16290f41f",
    "appid": "0000000003",
    "payChannelId": "payid",
    "amount": 300,
    "clientIp": "192.168.0.1",
    "currency": "0",
    "subject": "",
    "body": "",
    "cpChannel": "google_upi",
    "timeExpire": 1653678578000,
    "description": "",
    "created": 1653657857000,
    "timePaid": 1653657876000,
    "bankType": "",
    "paySt": 2,
    "refundSt": 0,
    "refundedAmt": 0,
    "checkSt": 0,
    "fee": 21,
    "chnlFee": 9,
    "settleSt": 1,
    "rutId": "0000000098",
    "bankRspDesc": "",
    "bankTransactionId": "20220527212419602959728091384475",
    "credential": "214721678116",
    "notifyUrl": "https://localhost/test/site",
    "pageNotifyUrl": "https://localhost/test/site",
    "notifyCnt": 0,
    "notifySt": 0,
    "openId": "",
    "extra": "",
    "countryId": "Brazil",
    "areaId": "",
    "regionId": "brazil",
    "cityId": "toto",
    "countyId": "",
    "modified": 1124657876000,
    "channlInfoId": "b0768be2248a4aee94ac747c2ab0000",
    "email": "[email protected]",
    "mobile": "100000012457",
    "accountOwner": "Tom Hank",
    "merchantParam": "game",
    "transTp": 0,
    "payAccount": "",
    "payType": "",
    "bankCode": "",
    "settleBatchNo": "2022124578950",
    "startRow": 0,
    "pageSize": 0
  }
}
Usage:
>>> data['_id']
'11234567891098646'

>>> data["_source"]["amount"]
300
>>> data["_source"]["countryId"]
'Brazil'
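If what you have is still the raw JSON text from the question rather than a dictionary, the standard-library json module does the conversion. A minimal sketch, assuming the text is already in a string (the name raw is only for illustration, and the record is trimmed to a few fields):

```python
import json

# A trimmed copy of the record from the question (only a few fields kept).
raw = '{"_index":"testdataset","_id":"11234567891098646","_source":{"amount":300,"mobile":"100000012457","accountOwner":"Tom Hank","countryId":"Brazil"}}'

# json.loads turns JSON text into nested Python dicts (and lists, if any).
data = json.loads(raw)

source = data["_source"]
print(source["amount"])        # 300, an int because it is a JSON number
print(source["accountOwner"])  # Tom Hank
print(repr(data["_id"]))       # '11234567891098646', a str because it is quoted in the JSON
```

This is also why the REPL shows _id and countryId in quotes: they are strings, while amount is a number.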
(Oct-08-2022, 11:00 AM) snippsat wrote: The data probably comes from JSON and has now been decoded into a Python dictionary.
Then work with the dictionary; no regex is needed.

Yes, it's a JSON file.
I'll donate some bitcoin to whoever solves this.
To be 'literal', this will do what you've asked for...

print(f"amount: {data['_source']['amount']}, email: {data['_source']['email']}, mobile: {data['_source']['mobile']}, accountOwner: {data['_source']['accountOwner']}")
Output:
amount: 300, email: [email protected], mobile: 100000012457, accountOwner: Tom Hank
... but my guess is that you don't want to be that literal.

So, what's your search criteria?
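If the search criteria are just a fixed set of field names, one option is to keep them in a list and build the output line from it instead of hard-coding the f-string. A sketch under that assumption; WANTED and extract are hypothetical names, and the sample record is trimmed:

```python
# Hypothetical list of the fields to pull out; edit to taste.
WANTED = ["amount", "email", "mobile", "accountOwner"]

def extract(record, keys):
    """Return 'key:value' pairs from record["_source"], joined with ', '.

    Keys missing from the record are skipped instead of raising KeyError.
    """
    source = record.get("_source", {})
    return ", ".join(f"{key}:{source[key]}" for key in keys if key in source)

# Usage with a trimmed record that has no email field.
record = {"_source": {"amount": 300, "mobile": "100000012457", "accountOwner": "Tom Hank", "fee": 21}}
print(extract(record, WANTED))  # amount:300, mobile:100000012457, accountOwner:Tom Hank
```

Changing what gets printed then means editing the list, not the format string.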
(Oct-09-2022, 06:25 AM) rob101 wrote: To be 'literal', this will do what you've asked for...

NameError: name 'data' is not defined

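f = open('df.json', 'r')

# Fails with the NameError above: 'data' is never defined, and read() takes
# an optional size argument, not a format string.
content = f.read(f"amount: {data['_source']['amount']}, email: {data['_source']['email']}, mobile: {data['_source']['mobile']}, accountOwner: {data['_source']['accountOwner']}")

print(content)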
Okay, so the JSON file has yet to be translated; I'll work on that.



What about:
with open('df.json', 'r') as f:
    content = f.read()

# Crude approach: split the raw text on commas and keep the "key":value
# pieces we care about. This breaks if any value itself contains a comma.
temp = content.split(',')

for item in temp:
    if 'amount' in item:
        amount = item.strip()
    elif 'email' in item:
        email = item.strip()
    elif 'mobile' in item:
        mobile = item.strip()
    elif 'accountOwner' in item:
        accountOwner = item.strip()

print(amount, email, mobile, accountOwner)
Output:
"amount":300 "email":"[email protected]" "mobile":"100000012457" "accountOwner":"Tom Hank"
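Since the question also asked about regular expressions, here is a rough regex version of the same idea using the re module. It is only a sketch: it assumes compact JSON with the key names listed in the pattern, it does not decode escape sequences inside strings, and it is more fragile than the json module.

```python
import re

# A trimmed copy of the raw JSON text from the question.
content = '{"_source":{"amount":300,"refundedAmt":0,"mobile":"100000012457","accountOwner":"Tom Hank","payType":""}}'

# Match "key":value where key is one of the wanted names and value is either
# a JSON string (quotes included, escaped quotes allowed) or a bare token
# such as a number, running up to the next comma or closing brace.
pattern = re.compile(r'"(amount|email|mobile|accountOwner)":("(?:[^"\\]|\\.)*"|[^,}]*)')

fields = {}
for key, value in pattern.findall(content):
    if value.startswith('"'):
        value = value[1:-1]  # drop the surrounding quotes of string values
    fields[key] = value

print(", ".join(f"{k}:{v}" for k, v in fields.items()))
# amount:300, mobile:100000012457, accountOwner:Tom Hank
```

Because the pattern starts with a quote, a key such as refundedAmt cannot be mistaken for amount, unlike a plain substring test.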
(Oct-09-2022, 07:39 AM) rob101 wrote: Okay, so the JSON file has yet to be translated; I'll work on that.
I don't think there is anything wrong with the JSON. You should use the json module to do the translating.

(Oct-09-2022, 07:22 AM) Calli wrote: NameError: name 'data' is not defined
Of course: you have to define data first.
The whole problem can be solved in 4 lines of code.
import json     # First import the json module.

with open('df.json', 'r') as f:  # Opening the file: you did that right.
    data = json.load(f)
# Now "data" contains the content of the file. According to what you
# showed us, it is a nested dictionary.

# You can now print it like rob101 showed you.
print(f"amount: {data['_source']['amount']}, email: {data['_source']['email']}, mobile: {data['_source']['mobile']}, accountOwner: {data['_source']['accountOwner']}")
Output:
amount: 300, email: [email protected], mobile: 100000012457, accountOwner: Tom Hank
Oooo... does that mean I get some SATS?
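json.load expects the file to hold exactly one JSON document. If df.json ever contains several records, one per line (an assumption; the thread only shows a single record), json.load raises json.JSONDecodeError with an "Extra data" message. A sketch that reads such a file line by line; read_json_lines and summary are hypothetical helpers, and the second sample record is made up purely for illustration:

```python
import json
import os
import tempfile

def read_json_lines(path):
    """Yield one dict per non-empty line (the JSON Lines layout)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def summary(record):
    """Format the four fields from the question; missing ones show as None."""
    src = record.get("_source", {})
    return ", ".join(f"{k}:{src.get(k)}" for k in ("amount", "email", "mobile", "accountOwner"))

# Write a small sample file so the sketch runs on its own. The second record
# is made up purely for illustration.
lines = [
    '{"_source":{"amount":300,"mobile":"100000012457","accountOwner":"Tom Hank"}}',
    '{"_source":{"amount":150,"mobile":"100000000000","accountOwner":"Jane Doe"}}',
]
path = os.path.join(tempfile.mkdtemp(), "df.json")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

for record in read_json_lines(path):
    print(summary(record))
```

Using dict.get here means a record without, say, an email field prints None instead of stopping with a KeyError.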