python - Defining a custom pandas aggregation function using Cython


I have a big DataFrame in pandas with 3 columns: 'col1' is a string, and 'col2' and 'col3' are numpy.int64. I need to do a groupby, then apply a custom aggregation function using apply, as follows:

    pd = pandas.read_csv(...)
    groups = pd.groupby('col1').apply(my_custom_function)

Each group can be seen as a numpy array with 2 integer columns, 'col2' and 'col3'. To understand what I am doing, you can think of each row ('col2', 'col3') as a time interval; I am checking whether any intervals intersect. I first sort the array by the first column, then test whether the second column value at index i is smaller than the first column value at index i + 1.

First question: my idea is to use Cython to define the custom aggregate function. Is this a good idea?

I tried the following definition in a .pyx file:

    cimport numpy as c_np

    def c_my_custom_function(my_group_df):
        cdef Py_ssize_t l = len(my_group_df.index)
        if l < 2:
            return False

        cdef c_np.int64_t[:, :] temp_array
        # Sort the intervals by their start ('col2'), then take the raw values
        temp_array = my_group_df[['col2','col3']].sort(columns='col2').values
        cdef Py_ssize_t i
        for i in range(l - 1):
            # An interval that ends after the next one starts is an overlap
            if temp_array[i, 1] > temp_array[i + 1, 0]:
                return True
        return False

I also defined a version in pure Python/pandas:

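    def my_custom_function(my_group_df):
        l = len(my_group_df.index)
        if l < 2:
            return False

        # Sort the intervals by their start ('col2'), then take the raw values
        temp_array = my_group_df[['col2', 'col3']].sort(columns='col2').values
        for i in range(l - 1):
            # An interval that ends after the next one starts is an overlap
            if temp_array[i, 1] > temp_array[i + 1, 0]:
                return True
        return False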

Second question: I timed the 2 versions, and both take the same time. The Cython version does not seem to speed anything up. What is happening?

Bonus question: do you see a better way to implement this algorithm?

A vectorized numpy test would be:

    np.any(temp_array[:-1, 1] > temp_array[1:, 0])
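As a minimal sketch of that vector test on a small example, assuming each group has already been extracted as a 2-column integer array of (start, end) rows. The helper name `has_overlap` and the sample data are hypothetical, not from the question:

```python
import numpy as np

def has_overlap(intervals):
    """Return True if any two (start, end) rows overlap (hypothetical helper)."""
    arr = np.asarray(intervals, dtype=np.int64).reshape(-1, 2)
    # Sort rows by start value (column 0), like sorting the group on 'col2'.
    arr = arr[np.argsort(arr[:, 0], kind="stable")]
    # Compare each end with the following start. With fewer than 2 rows
    # both slices are empty and np.any returns False.
    return bool(np.any(arr[:-1, 1] > arr[1:, 0]))

disjoint = [[10, 12], [1, 3], [5, 8]]      # made-up sample intervals
overlapping = [[1, 6], [10, 12], [5, 8]]   # [1, 6] runs into [5, 8]

print(has_overlap(disjoint))
print(has_overlap(overlapping))
```

Because the empty-slice case falls out naturally, the explicit `l < 2` check is not needed here. Something like `lambda g: has_overlap(g[['col2', 'col3']].values)` could then be passed to the groupby apply, though that usage is an untested assumption about the asker's data.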

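Whether this is better than the Python or Cython iteration depends on where the True occurs, if at all. If the return happens at an early step in the iteration, the iteration is clearly better, and the Cython version won't have much of an advantage. Also, the test step will be faster than the sort step.

But if the iteration usually steps all the way through, then the vector test will be faster than the Python iteration, and faster than the sort. It may, though, be slower than a properly coded Cython iteration.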
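One way to check this trade-off on your own data is to time both cases. The following is only a sketch with synthetic intervals; the function names, array size, and repeat count are arbitrary assumptions, and the timings will vary by machine:

```python
import timeit
import numpy as np

def loop_test(arr):
    # Pure-Python iteration over a sorted array; returns at the first overlap.
    for i in range(len(arr) - 1):
        if arr[i, 1] > arr[i + 1, 0]:
            return True
    return False

def vector_test(arr):
    # Vectorized test; always scans the whole array.
    return bool(np.any(arr[:-1, 1] > arr[1:, 0]))

n = 100_000  # arbitrary size for illustration
starts = np.arange(n, dtype=np.int64) * 10
no_overlap = np.column_stack([starts, starts + 5])  # sorted, never overlaps
early_overlap = no_overlap.copy()
early_overlap[0, 1] = 15  # first interval now ends after the second starts (10)

# Shuffled copy, so the sort timing is not measured on pre-sorted input.
shuffled = no_overlap[np.random.default_rng(0).permutation(n)]
t_sort = timeit.timeit(lambda: shuffled[np.argsort(shuffled[:, 0])], number=3)
print(f"sort: {t_sort:.4f}s")

for name, arr in [("early overlap", early_overlap), ("no overlap", no_overlap)]:
    t_loop = timeit.timeit(lambda: loop_test(arr), number=3)
    t_vec = timeit.timeit(lambda: vector_test(arr), number=3)
    print(f"{name}: loop={t_loop:.4f}s vector={t_vec:.4f}s")
```

The loop here is plain Python, not Cython, so it only illustrates the early-return effect and not what a compiled loop would achieve.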

