Is there a way to use a memoryview with regexes in Python 2?


In Python 3, the re module can be used with a memoryview:

    ~$ python3
    Python 3.2.3 (default, Feb 20 2013, 14:44:27)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = b"abc"
    >>> import re
    >>> re.search(b"b", memoryview(x))
    <_sre.SRE_Match object at 0x7f14b5fb8988>

However, in Python 2, this does not seem to be the case:

    ~$ python
    Python 2.7.3 (default, Mar 13 2014, 11:03:55)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = "abc"
    >>> import re
    >>> re.search(b"b", memoryview(x))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/re.py", line 142, in search
        return _compile(pattern, flags).search(string)
    TypeError: expected string or buffer

I can cast the string to a buffer, but looking at the buffer documentation, it doesn't mention how exactly buffer works compared to a memoryview.

An empirical comparison shows that using a buffer object in Python 2 does not offer the performance benefits of using a memoryview in Python 3:

    playground$ cat speed-test.py
    import timeit

    print(timeit.timeit("regex.search(mv[10:])", setup='''
    import re
    import sys  # setup code runs in its own namespace, so sys is imported here
    regex = re.compile(b"abc")
    python_3 = sys.version_info >= (3, )
    if python_3:
        mv = memoryview(b"can count 3 or sing 'abc?'" * 1024)
    else:
        mv = buffer(b"can count 3 or sing 'abc?'" * 1024)
    '''))
    playground$ python2.7 speed-test.py
    2.33041596413
    playground$ python2.7 speed-test.py
    2.3322429657
    playground$ python3.2 speed-test.py
    0.381270170211792
    playground$ python3.2 speed-test.py
    0.3775448799133301
    playground$

If the regex.search argument is changed from mv[10:] to just mv, Python 2's performance is about the same as Python 3's, but in the code I'm writing, there's lots of repeated string slicing.
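To illustrate why the slice is the expensive part, here is a minimal sketch for Python 3 only. The variable names are made up for illustration, and a bytearray is used as the source purely so that the shared memory becomes visible: slicing a memoryview yields another view onto the same bytes, while slicing the bytes themselves produces a copy.

```python
import re

# A mutable source makes the sharing visible; the text itself is arbitrary.
data = bytearray(b"can count 3 or sing 'abc?'" * 4)
view = memoryview(data)

tail_view = view[10:]         # new view object, same underlying memory
tail_copy = bytes(data[10:])  # slicing the bytes themselves copies them

data[10] = ord("X")           # change one byte of the original in place

# The view sees the change, the copy does not.
view_sees_change = tail_view[0] == ord("X")
copy_sees_change = tail_copy[0] == ord("X")

# On Python 3, re accepts the view directly, so no copy is needed to search it.
match = re.search(b"abc", tail_view)
```

With many slices of a large string, each copy costs time proportional to the slice length, which is presumably where the difference in the timings comes from.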

Is there a way to circumvent this issue in Python 2 while still having the zero-copy performance benefits of memoryview?

The way I understand the buffer object in Python 2, you're supposed to use it without slicing:

    >>> s = b"can count 3 or sing 'abc?'"
    >>> str(buffer(s, 10))
    "3 or sing 'abc?'"

So instead of slicing the resulting buffer, you use the buffer function directly to perform the slicing, which gives fast access to the substring you are interested in:

    import re
    import sys
    import timeit

    r = re.compile(b'abc')
    s = b"can count 3 or sing 'abc?'" * 1024

    python_3 = sys.version_info >= (3, )
    if len(sys.argv) > 1:  # standard slicing
        print(timeit.timeit("r.search(s[10:])", setup='from __main__ import r, s'))
    elif python_3:  # memoryview in Python 3
        print(timeit.timeit("r.search(s[10:])", setup='from __main__ import r, s; s = memoryview(s)'))
    else:  # buffer in Python 2
        print(timeit.timeit("r.search(buffer(s, 10))", setup='from __main__ import r, s'))

I got very similar results in Python 2 and 3, which suggests that using buffer like this with the re module has a similar effect to the newer memoryview (which then seems to be a lazily evaluated buffer):

    $ python2 .\speed-test.py
    0.681979371561
    $ python3 .\speed-test.py
    0.5693422508853488

And for comparison, with standard string slicing:

    $ python2 .\speed-test.py standard-slicing
    7.92006735956
    $ python3 .\speed-test.py standard-slicing
    7.817641705304309

If you want to support slice access (so you can use the same syntax everywhere), you could create a type that dynamically creates a new buffer when you slice it:

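    class slicingbuffer(object):
        # Python 2 only; negative indices and slice steps are not handled.
        def __init__(self, source):
            self.source = source

        def __getitem__(self, index):
            if not isinstance(index, slice):
                # single item: a one-byte buffer at that offset
                return buffer(self.source, index, 1)
            start = index.start or 0  # source[:n] has a start of None
            if index.stop is None:
                # open-ended slice: everything from start to the end
                return buffer(self.source, start)
            # buffer takes a size, not an end index
            size = max(index.stop - start, 0)
            return buffer(self.source, start, size)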
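The index arithmetic in such a wrapper can be delegated to slice.indices, which resolves None and negative bounds against a given length. The following is only a sketch of that mapping; slice_to_offset_size is a hypothetical helper name and not part of the class above. It returns the (offset, size) pair that could then be passed to buffer on Python 2.

```python
def slice_to_offset_size(index, length):
    """Hypothetical helper: turn a slice into (offset, size) for a source of
    the given length, as expected by buffer(source, offset, size)."""
    start, stop, step = index.indices(length)  # resolves None and negatives
    if step != 1:
        # A strided slice is not one contiguous region, so it cannot be
        # described by a single offset and size.
        raise ValueError("only contiguous slices are supported")
    return start, max(stop - start, 0)


source = b"can count 3 or sing 'abc?'"

open_ended = slice_to_offset_size(slice(10, None), len(source))
from_start = slice_to_offset_size(slice(None, 5), len(source))
negative = slice_to_offset_size(slice(-4, None), len(source))
empty = slice_to_offset_size(slice(5, 2), len(source))
```

This adds a little work per slice, so it would need to be timed before assuming it is cheaper than the copy it avoids.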

If you only use it with the re module, it might work as a direct drop-in replacement for memoryview. However, my tests show that this still adds a large overhead. So you might want to do the opposite instead and wrap Python 3's memoryview object in a wrapper that gives you the same interface as buffer:

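    import sys
    import timeit

    def memoryviewbuffer(source, start, size=-1):
        # mimic buffer(source, start, size): a negative size means "to the end"
        if size < 0:
            return source[start:]
        return source[start:start + size]

    # r and s are the compiled pattern and the byte string from the script above
    python_3 = sys.version_info >= (3, )
    if python_3:
        b = memoryviewbuffer
        s = memoryview(s)
    else:
        b = buffer

    print(timeit.timeit("r.search(b(s, 10))", setup='from __main__ import r, s, b'))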
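One detail worth checking when emulating buffer on Python 3 is that buffer's third argument is a size (with a negative value meaning "up to the end"), not an end index. Below is a minimal sketch of a sanity check, using a hypothetical helper called view_from and a short sample string; it compares a search through the view against a search through an ordinary sliced copy.

```python
import re
import sys

if sys.version_info >= (3,):
    def view_from(data, offset, size=-1):
        # Hypothetical stand-in for Python 2's buffer(data, offset, size):
        # a negative size means "everything up to the end".
        view = memoryview(data)
        if size < 0:
            return view[offset:]
        return view[offset:offset + size]
else:
    view_from = buffer  # Python 2 only: the built-in already has this interface

pattern = re.compile(b"abc")
data = b"can count 3 or sing 'abc?'" * 4

whole_tail = view_from(data, 10)     # from offset 10 to the end
short_part = view_from(data, 10, 5)  # exactly 5 bytes starting at offset 10

match_in_view = pattern.search(whole_tail)
match_in_copy = pattern.search(data[10:])  # ordinary slicing, for comparison
match_in_short = pattern.search(short_part)
```

If the two matches start at the same position, the view behaves like the sliced copy for searching purposes, without the copy having been made.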
