Python Forum
pattern recognition
#1
I was wondering whether you could help me, or direct me to a tutorial where I can learn how to extract dates from a CSV file when certain patterns appear in the data.

I'll make it more concrete:

I have a CSV file, AAPL.csv (for example), with the columns Date, Open, High, Low, Close, Adj Close, Volume.

What I want is a new file whose content is the dates from the original file (AAPL.csv) where a two-day pattern occurred (in this case: Open of the second day < Low of the first day, and Close of the second day > High of the first day).

I don't expect you to write a script for me; I'd rather have something to read that is more tailored to this kind of operation...

Thank you!
tal

By the way, I have already installed NumPy, Pandas and TA-Lib.

It's simply that I don't know how to begin, and I don't want to be all over the place with this (I mean, I want to learn something specific that will enable me to get the job done).

Thanks once again...
#2
Search here to start: https://pypi.org/search/?q=%27find+dates%27
#3
How do I write the condition(s) in context?
This all seems very confusing to me...

I read about reading & writing files using pandas, which gets me a bit further along the way,

but I don't know where to write the condition, or how to write it (using what? a for loop? nested inside another one?).

Larz60+,

I went over the page you gave me - it is about extracting dates in general;
what I want to do is extract the dates based on a condition...
#4
Can you attach a sample file?
#5
Of the CSV, you mean?

There doesn't seem to be an option for attaching files here...

Maybe if I put a link to where I download the file, then you can get the same file...

A link to Yahoo
#6
I think pandas could do this in an easier way.
But I'm not using it that much; I always have to look up the docs.


import csv


def condition(first, second):
    try:
        s_open = float(second["Open"])
        f_low = float(first["Low"])
        s_close = float(second["Close"])
        f_high = float(first["High"])
    except ValueError:
        # throw bad data away
        return False
    return s_open < f_low and s_close > f_high


def get_data(csv_file, new_csv):
    with open(csv_file, "rt", encoding="utf8", newline="") as fd_in:
        with open(new_csv, "wt", encoding="utf8", newline="") as fd_out:
            reader = csv.DictReader(fd_in)
            writer = csv.writer(fd_out)
            writer.writerow(reader.fieldnames)
            first_day, second_day = [iter(reader)] * 2
            # a little trick to get the same identical iterator twice
            for first, second in zip(first_day, second_day):
                if condition(first, second):
                    writer.writerow(first.values())
                    writer.writerow(second.values())
The newline="" is something for Windows.
On Linux I don't use this.

During testing I've seen that your data has "null" inside, which seems to represent missing values.
In the condition function I catch the ValueError coming from float() and return False to throw the bad data away.
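A minimal way to run it, assuming the code above is saved as a script next to AAPL.csv (the output name "pattern_dates.csv" is just an example):

if __name__ == "__main__":
    # input file downloaded from Yahoo, output file that will be created
    get_data("AAPL.csv", "pattern_dates.csv")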
#7
To attach a file, click on New Reply; for some reason the option doesn't appear otherwise.
But I'll get it from the link.
I have some errands that I need to run; I will get to this when I return.
In the meantime, please post what you've tried so far.
#8
Dead_Eye - what you wrote is very impressive, but I still don't know what to do with it in order to make it work... I feel ignorant...

Larz60+ - what Dead_Eye wrote is far more than I am able to write...

I tried to attach the file, but it told me the file is too large to be attached... it said the maximum for file attachments is 250k...

By the way, Dead_Eye - I am using Linux myself...
Linux Ubuntu...
#9
I know your situation.

This technique is called chunking.
Output:
0 1 2 3 4 5 6 7 8 9
Think of these numbers as your rows.
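A small sketch of the chunking idea on such numbers, using the same iter/zip trick as in the code above (my own illustration, not the original snippet):

numbers = list(range(10))
print(*numbers)  # -> 0 1 2 3 4 5 6 7 8 9

# the same identical iterator twice; zip pairs the rows up without overlap
chunks = list(zip(*[iter(numbers)] * 2))
print(chunks)  # -> [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]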

If you want to compare the previous row with the current one and step forward by one, it's called windowing.
You don't need to implement this by yourself. If you want, you could use more_itertools,
where the implementation is already done and safe against corner-case errors.
But I think you meant chunking and not windowing.

Output:
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
This pattern makes it possible to compare row 0 with row 1, row 1 with row 2, and so on...
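The pairs above are what a windowing helper produces; a sketch with more_itertools (mentioned above), which is my guess at the kind of call behind that output:

from more_itertools import windowed

numbers = range(10)
# overlapping windows of size 2: (0, 1), (1, 2), (2, 3), ...
print(list(windowed(numbers, 2)))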

The iterator thing:
my_range = range(10)
print(my_range) # -> range(0, 10)

iterators = [iter(my_range)] * 2
print(iterators) # -> [<range_iterator object at 0x7f862e51b870>, <range_iterator object at 0x7f862e51b870>] 
As you can see, the first and second range_iterator have the same ID in memory.
They refer to the same object.
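A tiny illustration of what that sharing means in practice (the example values are my own):

numbers = [10, 20, 30, 40]
a, b = [iter(numbers)] * 2   # a and b are the same iterator object

print(next(a))  # -> 10
print(next(b))  # -> 20, because b continues where a stopped
print(a is b)   # -> True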

How iter works:
sequence = (1, 2, 3, 4)

iterator = iter(sequence)

print(next(iterator)) # -> 1
print(next(iterator)) # -> 2
print(next(iterator)) # -> 3
print(next(iterator)) # -> 4
print(next(iterator)) # -> raises StopIteration
An iterable is anything that is a sequence or a collection.
An iterator has a __next__() method and knows the position where it currently is.


I commented it a bit:
import csv

 
# first is a dict
# second is a dict
def condition(first, second):
    """
    This function return True, if the condition fit, otherwise it returns False
    In the case of missing or wrong data, the function returns False.
    """
    try:
        # A ValueError in this block
        # is caught by `except ValueError`
        s_open = float(second["Open"])    # <- ValueError here
        f_low = float(first["Low"])       # <- ValueError here
        s_close = float(second["Close"])  # <- ValueError here
        f_high = float(first["High"])     # <- ValueError here
        # A KeyError cannot happen, because the data comes from
        # the DictReader. Once the keys were parsed from the header of the file,
        # they don't change.
    except ValueError: # <- this block is executed, if ValueError was raised in the try block
        # throw bad data away
        return False # <- leave the function here
    # here is the condition
    # this line is only reached if no exception happened
    return s_open < f_low and s_close > f_high
#          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#        Expression, which could be True or False

 
def get_data(csv_file, new_csv):
    """
    Open the csv_file, read always two lines of data,
    check the condition and if the condition is True,
    save the two rows of data in the new_csv
    """

    # on Linux I usually use the shorter form without encoding and newline
    # with open(csv_file) as fd_in: ...
    with open(csv_file, "rt", encoding="utf8", newline="") as fd_in:
        with open(new_csv, "wt", encoding="utf8", newline="") as fd_out:
            # two levels deep
            # we have now fd_in and fd_out, which are both file objects
            # the csv.DictReader does what the name implies:
            # it returns a dict for each iteration,
            # where the fieldnames are the keys and the data are the values
            reader = csv.DictReader(fd_in)
            # we use a normal writer (not DictWriter) to write the data back
            writer = csv.writer(fd_out)
            # then write the header to the csv file
            # the DictReader object has the attribute fieldnames, which is created
            # automatically
            writer.writerow(reader.fieldnames)

            first_day, second_day = [iter(reader)] * 2
            # a little trick to get the same identical iterator twice
            # The returned object of `iter(something)`
            # is an Iterator, which knows the actual position of the current iteration.
            # The multiplication of the list just copies the references.
            # So you get __identical__ iterators back.
            # The trick is now following:
            #   - the zip function takes one element from `first_day` and one from `second_day`
            #   - when one element has been taken from the first iterator,
            #     the second iterator has progressed by one
            #   - so the second iterator gives you the next element from the sequence

            # you always need the zip function here. Look at what it does.
            for first, second in zip(first_day, second_day):
                # checking condition, where first and second are dict
                # you see it in the function condition
                if condition(first, second):
                    # keep in mind, that the reader is a DictReader,
                    # which returns the rows as prepared dicts
                    # The keys are the fieldnames
                    # The values are the data
                    # + since Python 3.6 the order is preserved
                    # + since Python 3.7 it's a language feature
                    # before Python 3.6 the dicts are "scrambled"
                    writer.writerow(first.values())
                    writer.writerow(second.values())
I hope this does not lead you in the wrong direction.
Hopefully someone posts a two-liner with pandas :-D

But understanding iteration, and the difference between iterators and iterables, is essential to understanding how Python works.
This may not help you directly with your task, but it will in the future.
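Not quite a two-liner, but here is a sketch of how a pandas version might look. It is only a sketch: it assumes the Yahoo column names from the thread, uses shift() (the overlapping windowing variant, comparing every day with the day before), and the output file name is just an example.

import pandas as pd

df = pd.read_csv("AAPL.csv")   # "null" strings are read as NaN by default
prev = df.shift(1)             # the same frame moved down one row = the previous day

# Open (current day) < Low (previous day) and Close (current day) > High (previous day)
mask = (df["Open"] < prev["Low"]) & (df["Close"] > prev["High"])

# comparisons against NaN are False, so rows with missing data drop out on their own
df.loc[mask, "Date"].to_csv("pattern_dates.csv", index=False)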
#10
Dead_Eye - thank you so much for the consideration,

I ran the code in Python 3.7 and it did not produce any file.
If I try to put AAPL.csv where there is 'csv_file', it gives me an error like the following:

Error:
File "try6.py", line 16 def get_data(AAPL.csv, new_csv): ^ SyntaxError: invalid syntax
It might be that there is no such pattern anywhere in the chart, but I'm sure I'm getting something wrong along the way...
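A note on that error (my reading of it): the def line keeps the placeholder parameter names; the actual file names are only passed as strings when the function is called, roughly like this:

def get_data(csv_file, new_csv):   # parameter names stay as they are
    ...

get_data("AAPL.csv", "pattern_dates.csv")   # the real file names go here, as strings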

You know, the thing with learning from websites, or from books for that matter, is that they teach you bits & pieces, but they don't teach you how to assemble all of these and create a cohesive program/script... this is frustrating...

