Python Forum
faster code for my code
#11
To everyone who replied suggesting grep (Linux): is that really faster?
So must I use subprocess.run() to call grep?
As for PowerShell, I tried it: when the data is loaded into RAM, it takes about 30x the size of the original data.
If my data is 100 MB, the data loaded into RAM is 3 GB, and doing a "replace" takes ages.
That is why I am using Python. I don't know why, but the data read with open() takes roughly the same space in RAM as the file on disk.
I can show you the PowerShell code, but I am afraid I will be punished if I post a PowerShell question in a Python forum.
Reply
#12
(Aug-08-2022, 12:44 AM)kucingkembar Wrote: Anyway, is there any regex tutorial with lots of examples?

Even just reading the posts here on this forum, you'll find a tonne of regex info, but (IMHO) regex is overused. A good rule of thumb (again, IMHO) is that if you know exactly what you're looking for, then it's not pattern matching, so why use regex?

Four digits between zero and nine, back to back (such as a year, 2022), in a string is pattern matching: re.search('[0-9][0-9][0-9][0-9]', dateString) will find any four-digit number. Finding specifically the year 2022 in a string is not pattern matching, because you know what you're looking for. I'd say the same about your issue: you know what you're looking for (square brackets), so it's not pattern matching.
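A minimal sketch of that rule of thumb, assuming a made-up sample string (date_string and the bracketed text are only illustrations, not data from this thread). It contrasts a real pattern (any four digits) with known literals, where plain string methods are enough:

```python
import re

date_string = "Report generated on 08 Aug 2022"  # hypothetical sample text

# Pattern matching: any run of four digits, whatever its value.
match = re.search(r"[0-9]{4}", date_string)
found_any_year = match.group() if match else None

# Known literal: a plain substring test, no regex needed.
has_2022 = "2022" in date_string

# Known literal characters: str.replace removes the brackets directly.
no_brackets = "[this not remove]".replace("[", "").replace("]", "")

print(found_any_year, has_2022, no_brackets)
```

Note that str.replace removes every bracket in the string, not only those at the start and end, so whether it fits depends on the data.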
Sig:
>>> import this

The UNIX philosophy: "Do one thing, and do it well."

"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse

"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
Reply
#13
(Aug-08-2022, 07:47 AM)rob101 Wrote:
(Aug-08-2022, 12:44 AM)kucingkembar Wrote: Anyway, is there any regex tutorial with lots of examples?

Even just reading the posts here on this forum, you'll find a tonne of regex info, but (IMHO) regex is overused. A good rule of thumb (again, IMHO) is that if you know exactly what you're looking for, then it's not pattern matching, so why use regex?

Four digits between zero and nine, back to back (such as a year, 2022), in a string is pattern matching: re.search('[0-9][0-9][0-9][0-9]', dateString) will find any four-digit number. Finding specifically the year 2022 in a string is not pattern matching, because you know what you're looking for. I'd say the same about your issue: you know what you're looking for (square brackets), so it's not pattern matching.

I know what regex is; I have been using it since MS Office.
When I study it, there are too many theoretical explanations with minimal examples.
It would be nice if there were a page that explains it with lots of examples.
Reply
#14
What does the data look like? You wrote that you want to remove [ and ]. This is not precise enough.
The naive answer, without thinking long about it: line.lstrip().removeprefix("[") and line.rstrip().removesuffix("]").
But this also removes all whitespace from the left and right.

  1. Is there whitespace before [?
  2. Is there whitespace after ]?
  3. Should whitespace on the right side be kept?
  4. What if the line starts with [[? Are there any special cases?

Post some example data.
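To make the questions above concrete, here is a small sketch that combines the two naive calls into one helper (strip_one_bracket_pair is a hypothetical name, and the sample strings are invented). It assumes Python 3.9 or newer, where str.removeprefix and str.removesuffix exist:

```python
# Requires Python 3.9+ for str.removeprefix / str.removesuffix.
def strip_one_bracket_pair(line):
    """Drop surrounding whitespace, then at most one leading "[" and one trailing "]"."""
    return line.strip().removeprefix("[").removesuffix("]")


samples = ["  [abc]  ", "[[abc]]", "abc]", "  abc  "]  # invented test lines
results = [strip_one_bracket_pair(s) for s in samples]
print(results)
```

A line starting with [[ loses only one bracket on each side, and all surrounding whitespace is gone, which is exactly why the special cases need to be known first.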
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#15
(Aug-08-2022, 09:45 AM)kucingkembar Wrote: I know what regex is; I have been using it since MS Office.
When I study it, there are too many theoretical explanations with minimal examples.
It would be nice if there were a page that explains it with lots of examples.

Personally, I like this site:

https://www.pythontutorial.net/python-regex/
Reply
#16
(Aug-08-2022, 10:18 AM)DeaD_EyE Wrote: What does the data look like? You wrote that you want to remove [ and ]. This is not precise enough.
The naive answer, without thinking long about it: line.lstrip().removeprefix("[") and line.rstrip().removesuffix("]").
But this also removes all whitespace from the left and right.

  1. Is there whitespace before [?
  2. Is there whitespace after ]?
  3. Should whitespace on the right side be kept?
  4. What if the line starts with [[? Are there any special cases?

Post some example data.

I put some code in my post, at line 8:
lina = line.strip()
Since you asked for some data, it is similar to this:
remove this
[remove this too
this remove too]
[this not remove]
And before this code, I added this code:
byte = byte.replace(b'\x5D',b'\x5D\x0A')
byte = byte.replace(b'\x5B',b'\x0A\x5B')
so [ and ] can only appear at the start and/or end of a line.
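For readers following along, a short sketch of what those two replace calls do, assuming a made-up bytes value (data and its contents are only an illustration). b'\x5B' is "[", b'\x5D' is "]" and b'\x0A' is a newline:

```python
data = b"remove this[this not remove]this remove too]"  # invented sample bytes

data = data.replace(b"\x5D", b"\x5D\x0A")  # "]" becomes "]" + newline
data = data.replace(b"\x5B", b"\x0A\x5B")  # "[" becomes newline + "["

# After the two replaces, a bracket can only sit at the start or end of a line.
lines = data.split(b"\x0A")
print(lines)
```

The trailing empty entry comes from the newline added after the last "]".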
Reply
#17
(Aug-08-2022, 10:18 AM)rob101 Wrote:
(Aug-08-2022, 09:45 AM)kucingkembar Wrote: I know what regex is; I have been using it since MS Office.
When I study it, there are too many theoretical explanations with minimal examples.
It would be nice if there were a page that explains it with lots of examples.

Personally, I like this site:

https://www.pythontutorial.net/python-regex/

Wow, so much detail. I will study it, thank you.
rob101 likes this post
Reply
#18
Based on your input, after the strip this should keep only the lines where the square brackets are at both the start and the end of the str.

input_data = """

           remove this
     [remove this too
   this remove too]
  [this not remove]
"""

lines = input_data.splitlines()

def is_valid(line):
    return line.startswith("[") and line.endswith("]")


for line in lines:
    stripped = line.strip()
    # skipping empty lines
    if not stripped:
        continue
    if is_valid(stripped):
        print(line)
Output:
  [this not remove]
If you print(stripped) instead, you get the version without whitespace on the left and right sides.
Reply
#19
(Aug-07-2022, 07:51 AM)ndc85430 Wrote: Is there a reason you're writing this yourself and not using existing tools like, say, grep?
There is a working solution in 8 lines of Python code, so why look for another solution? There may be other issues with grep, such as handling Unicode. Do you have a working command line for this problem?
Reply
#20
Hi,
This also seems to work; I don't know about the speed:

import fnmatch
with open('test.txt', 'r') as br:
    for line in br:
        # '[*\n' is meant to match lines that start with '[' (any ending)
        if fnmatch.fnmatch(line, '[*\n'):
            print(line)
My 2 cts,
Paul

Edit: probably not a viable solution, because "[ ]" is reserved syntax in fnmatch. Looking for a workaround.
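One possible workaround, as a sketch only: fnmatch treats a bracket wrapped in brackets as a literal, so "[[]" matches "[" and "[]]" matches "]". The helper name bracketed_lines and the sample list are made up for illustration. fnmatchcase is used so the match does not depend on the operating system's case rules:

```python
import fnmatch

# "[[]" = literal "[", "*" = anything, "[]]" = literal "]"
PATTERN = "[[]*[]]"


def bracketed_lines(lines):
    """Yield stripped lines that start with "[" and end with "]"."""
    for line in lines:
        stripped = line.strip()
        if fnmatch.fnmatchcase(stripped, PATTERN):
            yield stripped


sample = [  # invented lines modelled on the data posted earlier
    "remove this\n",
    "[remove this too\n",
    "this remove too]\n",
    "[this not remove]\n",
]
kept = list(bracketed_lines(sample))
print(kept)
```

No claim about speed here; str.startswith and str.endswith do the same job without any pattern translation.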
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply