Python Forum

Full Version: faster code for my code
Pages: 1 2
To everyone who suggested grep (Linux): is that really faster?
Would I have to call it through subprocess.run?
I tried PowerShell, but once the data is loaded into RAM it takes about 30x the size of the original file:
if my data is 100 MB, it occupies 3 GB in RAM, and a "replace" takes ages.
That is why I am using Python. I don't know why, but the data read with open() takes roughly the same size in RAM as the file itself.
I can show you the PowerShell code, but I am afraid I will get punished for posting a PowerShell question in a Python forum.
(Aug-08-2022, 12:44 AM)kucingkembar Wrote: [ -> ]anyway, is there any regex tutorial with lots of examples?

Even just reading the posts here on this forum, you'll find a tonne of regex info, but (IMHO) regex is overused. A good rule of thumb (again, IMHO): if you know exactly what you're looking for, it's not pattern matching, so why use regex?

Four digits between zero and nine, back-to-back (such as a year, 2022), in a string is pattern matching: re.search('[0-9][0-9][0-9][0-9]', dateString) will find any four-digit number. Finding specifically the year 2022 in a string is not pattern matching, because you know what you're looking for. I'd say the same about your issue: you know what you're looking for (square brackets), so it's not pattern matching.
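To make that distinction concrete, here is a small sketch. The sample string and variable names are made up for illustration: a regex handles the unknown four-digit number, while plain string methods handle text that is known in advance.

```python
import re

date_string = "Report generated on 08 Aug 2022"  # hypothetical sample text

# Pattern matching: we do not know which four digits to expect.
match = re.search(r"[0-9]{4}", date_string)
found_year = match.group() if match else None

# Known text: no regex needed, the `in` operator is enough.
has_2022 = "2022" in date_string

# Same idea for the brackets: startswith/endswith test for known characters.
line = "[this not remove]"
is_bracketed = line.startswith("[") and line.endswith("]")

print(found_year, has_2022, is_bracketed)
```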
(Aug-08-2022, 07:47 AM)rob101 Wrote: [ -> ]
(Aug-08-2022, 12:44 AM)kucingkembar Wrote: [ -> ]anyway, is there any regex tutorial with lots of examples?

Even just reading the posts here on this forum, you'll find a tonne of regex info, but (IMHO) regex is overused. A good rule of thumb (again, IMHO): if you know exactly what you're looking for, it's not pattern matching, so why use regex?

Four digits between zero and nine, back-to-back (such as a year, 2022), in a string is pattern matching: re.search('[0-9][0-9][0-9][0-9]', dateString) will find any four-digit number. Finding specifically the year 2022 in a string is not pattern matching, because you know what you're looking for. I'd say the same about your issue: you know what you're looking for (square brackets), so it's not pattern matching.
I know what regex is; I have been using it since MS Office.
When I study it, there is too much theory and too few examples.
It would be nice if there were a page that explains it with lots of examples.
What does the data look like? You wrote that you want to remove [ and ], but that is not precise enough.
The naive answer, without thinking about it for long: line.lstrip().removeprefix("[") and line.rstrip().removesuffix("]").
But this will also remove all whitespace on the left and on the right.

  1. Is there whitespace before [?
  2. Is there whitespace after ]?
  3. Should whitespace on the right side be kept?
  4. What if the line starts with [[? Are there any special cases?

Post some example data.
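A short sketch of how the naive approach behaves, using made-up sample lines. Note that str.removeprefix and str.removesuffix need Python 3.9 or newer, and each removes at most one occurrence:

```python
line = "   [this not remove]  "  # hypothetical sample line

# strip() drops the whitespace on both sides first ...
stripped = line.strip()

# ... then removeprefix/removesuffix drop at most one leading "[" / trailing "]".
inner = stripped.removeprefix("[").removesuffix("]")

# A line starting with "[[" keeps its second bracket (the special case in question 4).
double = "[[nested]".removeprefix("[")

print(repr(inner))
print(repr(double))
```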
(Aug-08-2022, 09:45 AM)kucingkembar Wrote: [ -> ]I know what regex is; I have been using it since MS Office.
When I study it, there is too much theory and too few examples.
It would be nice if there were a page that explains it with lots of examples.

Personally, I like this site:

https://www.pythontutorial.net/python-regex/
(Aug-08-2022, 10:18 AM)DeaD_EyE Wrote: [ -> ]What does the data look like? You wrote that you want to remove [ and ], but that is not precise enough.
The naive answer, without thinking about it for long: line.lstrip().removeprefix("[") and line.rstrip().removesuffix("]").
But this will also remove all whitespace on the left and on the right.

  1. Is there whitespace before [?
  2. Is there whitespace after ]?
  3. Should whitespace on the right side be kept?
  4. What if the line starts with [[? Are there any special cases?

Post some example data.

Line 8 of the code in my post already handles this:
lina = line.strip()
As for the data, it looks similar to this:
remove this
[remove this too
this remove too]
[this not remove]
Before that code, I also run this:
byte = byte.replace(b'\x5D',b'\x5D\x0A')
byte = byte.replace(b'\x5B',b'\x0A\x5B')
so [ and ] can only appear at the start and/or the end of a line.
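For readers following along, this is roughly what those two replacements do to a byte string. It is only a sketch: the raw input and the variable names are invented for illustration and are not the code from the first page of the thread.

```python
# Hypothetical raw input where the brackets are not yet on line boundaries.
raw = b"remove this[this not remove]this remove too]"

# Same two replacements as in the post: newline after "]" and before "[".
raw = raw.replace(b"\x5D", b"\x5D\x0A")  # b"]" -> b"]\n"
raw = raw.replace(b"\x5B", b"\x0A\x5B")  # b"[" -> b"\n["

# Every bracket now sits at the start or the end of a line.
lines = raw.splitlines()

# Keep only the lines that are bracketed on both sides.
kept = [ln for ln in lines if ln.startswith(b"[") and ln.endswith(b"]")]
print(kept)
```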
(Aug-08-2022, 10:18 AM)rob101 Wrote: [ -> ]
(Aug-08-2022, 09:45 AM)kucingkembar Wrote: [ -> ]I know what regex is; I have been using it since MS Office.
When I study it, there is too much theory and too few examples.
It would be nice if there were a page that explains it with lots of examples.

Personally, I like this site:

https://www.pythontutorial.net/python-regex/

Wow, so much detail. I will study it, thank you.
Based on your input, after the strip this should keep only the lines that have a square bracket at both the start and the end of the str.

input_data = """

           remove this
     [remove this too
   this remove too]
  [this not remove]
"""

lines = input_data.splitlines()

def is_valid(line):
    return line.startswith("[") and line.endswith("]")


for line in lines:
    stripped = line.strip()
    # skipping empty lines
    if not stripped:
        continue
    if is_valid(stripped):
        print(line)
Output:
[this not remove]
If you print(stripped) you'll get the version without whitespace on left and right side.
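For a large file, the same check can be applied while streaming the file line by line, so that only one line is held in memory at a time. A minimal sketch, assuming a UTF-8 text file; the function name and its parameters are hypothetical, and this is not the original poster's code.

```python
def bracketed_lines(path, encoding="utf-8"):
    """Yield stripped lines that start with "[" and end with "]"."""
    with open(path, encoding=encoding) as handle:
        # File objects iterate lazily, one line at a time.
        for line in handle:
            stripped = line.strip()
            if stripped.startswith("[") and stripped.endswith("]"):
                yield stripped
```

It could be used as `for line in bracketed_lines("test.txt"): print(line)`, where test.txt stands for whatever file holds the data.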
(Aug-07-2022, 07:51 AM)ndc85430 Wrote: [ -> ]Is there a reason you're writing this yourself and not using existing tools like, say, grep?
There is a working solution in 8 lines of Python code, why look for another solution? There may be other issues with grep, such as handling unicode. Do you have a working command line for this problem?
Hi,
This also seems to work, but I don't know about the speed:

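import fnmatch

with open('test.txt', 'r') as br:
    for line in br:
        # NOTE: '[' normally opens a [seq] character set in fnmatch patterns.
        if fnmatch.fnmatch(line, '[*\n'):
            # the line already ends with a newline, so don't add another one
            print(line, end='')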
My 2 cts,
Paul

Edit: probably not a viable solution, because "[ ]" has a special meaning in fnmatch patterns. Looking for a workaround.
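One possible workaround for the special meaning of the brackets: the fnmatch documentation suggests wrapping meta-characters in brackets for a literal match. The sketch below applies that idea to the sample lines from this thread; the pattern and variable names are my own and the speed is untested.

```python
import fnmatch

# "[[]" is a character set containing only "[", and "[]]" one containing only "]",
# so the pattern reads: literal "[", then anything, then literal "]".
PATTERN = "[[]*[]]"

lines = ["remove this", "[remove this too", "this remove too]", "[this not remove]"]

# fnmatchcase skips the OS-specific case normalization that fnmatch.fnmatch applies.
kept = [ln for ln in lines if fnmatch.fnmatchcase(ln.strip(), PATTERN)]
print(kept)
```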