Thread: Separating unique, stable, samples using pandas (Python Forum, Data Science forum, /thread-37504.html)
Separating unique, stable, samples using pandas - keithpfio - Jun-20-2022

I'm using pandas to process a simple two-column table that holds a series of input and output integer data samples. Not all samples are useful, because I have to wait for the data to stabilize. I'm looking to keep the unique I/O pairs that repeat back-to-back for at least 3 rows.

My first thought is to treat this the way I would process a list: iterate through it, compare the current row with "previous_row" and "row_before_that", and if all three are equal, add the row to a separate list. Then deduplicate that list. But this doesn't seem to be the pandas way, given the general vectorization principles I've been reading about. I also thought about adding new columns that are shifted once and then twice, and then comparing across them.

The traditional way would work, but is there a better way? Thanks

RE: Separating unique, stable, samples using pandas - keithpfio - Jun-20-2022

For future googlers, here's the method I figured out:

df['InputStable'] = (df['InputBus'] == df['InputBus'].shift(periods=1, fill_value=0)) & (df['InputBus'] == df['InputBus'].shift(periods=2, fill_value=0))

This creates a new boolean column that is True when the current value equals both the previous value and the one before that. The parentheses matter: & binds more tightly than ==, so without them the two shifted integer columns are bitwise ANDed first and that result is compared with the current value, which can flag rows that are not actually stable. This uses the shift() function in pandas. I also used drop() and drop_duplicates() to get rid of repeated rows.
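
For anyone who wants to check both columns rather than only the input, here is a minimal sketch of the shift-based idea from the thread. The sample data and the column name "OutputBus" are assumptions made for illustration; only "InputBus" appears in the original post. Each comparison is wrapped in its own parentheses before combining with &. Without them, an input sequence such as 5, 3, 1 would be flagged as stable, because 3 & 5 equals 1. The sketch also leaves out fill_value, so the first two rows compare against NaN and can never count as stable, even when the real data starts with 0.

```python
import pandas as pd

# Hypothetical sample data; "OutputBus" is an assumed column name.
df = pd.DataFrame({
    "InputBus":  [1, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 5, 3, 1],
    "OutputBus": [9, 9, 9, 8, 7, 7, 7, 7, 9, 9, 9, 6, 7, 9],
})

cols = ["InputBus", "OutputBus"]

# Compare each row with the row one and two positions earlier.
# A row only matches if every column in `cols` matches.
same_as_prev = (df[cols] == df[cols].shift(1)).all(axis=1)
same_as_prev2 = (df[cols] == df[cols].shift(2)).all(axis=1)

# Stable means the same I/O pair has appeared on three consecutive rows.
df["Stable"] = same_as_prev & same_as_prev2

# Keep the stable rows, then reduce them to unique I/O pairs.
stable_pairs = (
    df.loc[df["Stable"], cols]
    .drop_duplicates()
    .reset_index(drop=True)
)

print(stable_pairs)  # two rows: (1, 9) and (3, 7)
```

The pair (1, 9) is stable twice in this data but appears once in the result, since drop_duplicates() collapses repeats regardless of where they occur. For a longer stability window, the two comparisons could be generated in a loop over the shift distances instead of being written out by hand.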