Is there a way to use a memoryview with regexes in Python 2?
In Python 3, the re module can be used with a memoryview:
    ~$ python3
    Python 3.2.3 (default, Feb 20 2013, 14:44:27)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = b"abc"
    >>> import re
    >>> re.search(b"b", memoryview(x))
    <_sre.SRE_Match object at 0x7f14b5fb8988>
However, in Python 2, this does not seem to be the case:
    ~$ python
    Python 2.7.3 (default, Mar 13 2014, 11:03:55)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = "abc"
    >>> import re
    >>> re.search(b"b", memoryview(x))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/re.py", line 142, in search
        return _compile(pattern, flags).search(string)
    TypeError: expected string or buffer
I can cast the string to a buffer, but looking at the buffer documentation, it doesn't mention how exactly buffer works compared to memoryview.
Doing an empirical comparison shows that using the buffer object in Python 2 does not offer the performance benefits of using memoryview in Python 3:
    playground$ cat speed-test.py
    import timeit
    import sys

    print(timeit.timeit("regex.search(mv[10:])", setup='''
    import re
    regex = re.compile(b"abc")
    python_3 = sys.version_info >= (3, )
    if python_3:
        mv = memoryview(b"can count 3 or sing 'abc?'" * 1024)
    else:
        mv = buffer(b"can count 3 or sing 'abc?'" * 1024)
    '''))
    playground$ python2.7 speed-test.py
    2.33041596413
    playground$ python2.7 speed-test.py
    2.3322429657
    playground$ python3.2 speed-test.py
    0.381270170211792
    playground$ python3.2 speed-test.py
    0.3775448799133301
    playground$
If the regex.search argument is changed from mv[10:] to mv, Python 2's performance is about the same as Python 3's, but in the code I'm writing, there's lots of repeated string slicing.
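To make concrete what "zero-copy" slicing means here, the following is a minimal Python 3 sketch. The sample data and variable names are made up for illustration. It relies only on the fact that slicing a memoryview yields another view of the same memory, whereas slicing the underlying bytes-like object copies the data.

```python
import re

# A mutable source makes the sharing observable; the data is illustrative.
source = bytearray(b"0123456789 sing abc")
view = memoryview(source)

tail = view[10:]             # slicing a memoryview gives another view, no copy
copied = bytes(source[10:])  # slicing the bytearray itself copies the bytes

# Overwrite the last three bytes in place (same length, so no resize).
source[16:19] = b"xyz"

tail_bytes = tail.tobytes()
match_in_view = re.search(b"xyz", tail)    # the view sees the new bytes
match_in_copy = re.search(b"xyz", copied)  # the copy still holds the old ones

print(tail_bytes, match_in_view is not None, match_in_copy is not None)
```

Because the view reflects the later change while the copy does not, the slice mv[10:] in the timing test is only a new window onto the existing data.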
Is there a way to circumvent this issue in Python 2 while still having the zero-copy performance benefits of memoryview?
The way I understand the buffer object in Python 2, you’re supposed to use it without slicing:
    >>> s = b"can count 3 or sing 'abc?'"
    >>> str(buffer(s, 10))
    "3 or sing 'abc?'"
So instead of slicing the resulting buffer, you use the buffer function directly to perform the slicing, which results in fast access to the substring you are interested in:
    import timeit
    import sys
    import re

    r = re.compile(b'abc')
    s = b"can count 3 or sing 'abc?'" * 1024
    python_3 = sys.version_info >= (3, )

    if len(sys.argv) > 1: # standard slicing
        print(timeit.timeit("r.search(s[10:])", setup='from __main__ import r, s'))
    elif python_3: # memoryview in Python 3
        print(timeit.timeit("r.search(s[10:])", setup='from __main__ import r, s; s = memoryview(s)'))
    else: # buffer in Python 2
        print(timeit.timeit("r.search(buffer(s, 10))", setup='from __main__ import r, s'))
I got very similar results in Python 2 and 3, which suggests that using buffer with the re module has a similar effect to the newer memoryview (which then seems to be a lazily evaluated buffer):
    $ python2 .\speed-test.py
    0.681979371561
    $ python3 .\speed-test.py
    0.5693422508853488
And as a comparison, with standard string slicing:
    $ python2 .\speed-test.py standard-slicing
    7.92006735956
    $ python3 .\speed-test.py standard-slicing
    7.817641705304309
If you want to support slice access (so you can use the same syntax everywhere), you could create a type that dynamically creates a new buffer when you slice it:
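    class slicingbuffer:
        def __init__ (self, source):
            self.source = source

        def __getitem__ (self, index):
            # Note: an omitted or negative start is not handled here.
            if not isinstance(index, slice):
                # single index: a one-byte buffer at that offset
                return buffer(self.source, index, 1)
            elif index.stop is None:
                # open-ended slice: from start to the end of the source
                return buffer(self.source, index.start)
            else:
                # buffer() takes a size, not an end index
                end = max(index.stop - index.start, 0)
                return buffer(self.source, index.start, end)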
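The translation from a slice to the offset and size that buffer() takes can be made more general with slice.indices(), which resolves omitted and negative bounds against a given length. The helper below is a hypothetical sketch, not part of the code above, and its name is made up. It runs on both Python 2 and 3 because it does not call buffer itself.

```python
def slice_to_offset_size(index, length):
    """Translate a slice into an (offset, size) pair for a buffer-style call.

    Hypothetical helper for illustration only.
    """
    # slice.indices() clamps the bounds to [0, length] and fills in defaults.
    start, stop, step = index.indices(length)
    if step != 1:
        raise ValueError("only contiguous slices can be mapped to a buffer")
    # An empty or reversed range becomes a zero-length window.
    return start, max(stop - start, 0)


print(slice_to_offset_size(slice(10, None), 26))
print(slice_to_offset_size(slice(-3, None), 26))
```

In Python 2, the slice branch of such a __getitem__ could then pass the resulting pair straight to buffer, with len(self.source) as the length.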
If you only use it with the re module, it might work as a direct drop-in replacement for memoryview. However, my tests show that this still gives a large overhead. So you might want to do the opposite instead, and wrap Python 3’s memoryview object in a wrapper that gives you the same interface as buffer:
    # end=None means "up to the end of the view"
    def memoryviewbuffer (source, start, end = None):
        return source[start:end]

    python_3 = sys.version_info >= (3, )
    if python_3:
        b = memoryviewbuffer
        s = memoryview(s)
    else:
        b = buffer

    print(timeit.timeit("r.search(b(s, 10))", setup='from __main__ import r, s, b'))
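One detail to keep in mind with such a wrapper: Python 2's buffer(object, offset, size) takes a size as its third argument, not an end index. Below is a minimal sketch of a wrapper that mirrors that signature. The name view_from and the sample data are assumptions made for illustration, and on Python 2 the built-in buffer is simply used instead.

```python
import re
import sys


def view_from(source, offset=0, size=-1):
    """Hypothetical stand-in mirroring buffer(object, offset, size)."""
    view = memoryview(source)[offset:]
    # A negative size means "everything from offset to the end".
    return view if size < 0 else view[:size]


if sys.version_info < (3,):
    view_from = buffer  # Python 2 only: use the built-in directly

data = b"can count 3 or sing 'abc?'"
pattern = re.compile(b"abc")

whole_tail = view_from(data, 10)     # from offset 10 to the end
short_part = view_from(data, 10, 4)  # four bytes starting at offset 10
found = pattern.search(whole_tail)

print(bytes(whole_tail), bytes(short_part), found.start())
```

With a wrapper of this shape, the same call works in both versions whether or not a size is passed.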