x86 - SSE intrinsics: Convert 32-bit floats to UNSIGNED 8-bit integers
Using SSE intrinsics, I've gotten a vector of four 32-bit floats clamped to the range 0-255 and rounded to the nearest integer. I'd like to write those four out as bytes.
There is an intrinsic, _mm_cvtps_pi8, that converts 32-bit floats to 8-bit signed ints, but the problem there is that any value over 127 gets clamped to 127. I can't find any instructions that clamp to unsigned 8-bit values.
I have an intuition that I may want some combination of _mm_cvtps_pi16 and _mm_shuffle_pi8, followed by a move instruction to get the four bytes I care about into memory. What's the best way to do it? I'm going to see if I can figure out how to encode the shuffle control mask.
Update: the following appears to do what I want. Is there a better way?
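```c
#include <tmmintrin.h> // SSSE3 for _mm_shuffle_pi8 (e.g. gcc -mssse3)
#include <stdio.h>
#include <string.h>

unsigned char out[8];
// byte indices of the low byte of each 16-bit lane; 128 (high bit set) writes a zero byte
unsigned char shuf[8] = { 0, 2, 4, 6, 128, 128, 128, 128 };
float ins[4] = {500, 0, 120, 240};

int main()
{
    __m128 x = _mm_loadu_ps(ins);   // load floats (unaligned load: ins is not guaranteed to be 16-byte aligned)
    __m64 y = _mm_cvtps_pi16(x);    // convert them to 16-bit ints
    __m64 sh = *(__m64*)shuf;       // shuffle mask into a register
    y = _mm_shuffle_pi8(y, sh);     // shuffle the lower byte of each into the first 4 bytes
                                    // (no saturation here: 500 comes out as 244)
    int lo = _mm_cvtsi64_si32(y);   // lower 32 bits
    _mm_empty();                    // leave MMX state (emms)
    memcpy(out, &lo, 4);            // store without an aliasing-violating int* cast
    printf("%d\n", out[0]);
    printf("%d\n", out[1]);
    printf("%d\n", out[2]);
    printf("%d\n", out[3]);
    return 0;
}
```

Update 2: here's a better solution based on harold's answer: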
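```c
#include <emmintrin.h> // SSE2 is enough when the first pack is _mm_packs_epi32
#include <stdio.h>
#include <string.h>

unsigned char out[8];
float ins[4] = {10.4f, 10.6f, 120, 100000};

int main()
{
    __m128 x = _mm_loadu_ps(ins);     // load floats (ins is not guaranteed to be 16-byte aligned)
    __m128i y = _mm_cvtps_epi32(x);   // convert them to 32-bit ints (round to nearest by default)
    y = _mm_packs_epi32(y, y);        // pack down to 16 bits with SIGNED saturation; _mm_packus_epi32
                                      // would give 65535 for 100000, which the next pack reads as -1 -> 0
    y = _mm_packus_epi16(y, y);       // pack down to 8 bits with unsigned saturation (0..255)
    int lo = _mm_cvtsi128_si32(y);    // lower 32 bits hold the 4 bytes
    memcpy(out, &lo, 4);              // store
    printf("%d\n", out[0]);           // 10
    printf("%d\n", out[1]);           // 11
    printf("%d\n", out[2]);           // 120
    printf("%d\n", out[3]);           // 255
    return 0;
}
```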
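There is no direct conversion from float to byte; _mm_cvtps_pi8 is a composite. _mm_cvtps_pi16 is also a composite, and in this case it's doing pointless work (converting to 32-bit ints and packing them to 16 bits) that you then undo with the shuffle. Both also return annoying __m64s.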
Anyway, you can convert to dwords (signed, it doesn't matter) and then either pack them down (unsigned) or shuffle them into bytes. _mm_shuffle_(e)pi8 generates a pshufb, which Core 2 45nm and AMD processors aren't fond of, and you have to keep the mask somewhere.
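Either way, you don't have to round to the nearest integer first; the conversion does that, since cvtps2dq rounds according to the MXCSR rounding mode (round to nearest by default). At least, it does if you haven't messed with the rounding mode.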
Using packs (not tested):
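```asm
cvtps2dq xmm0, xmm0
packusdw xmm0, xmm0   ; SSE4.1; packuswb reads words as signed, so use packssdw (SSE2) if inputs can exceed 32767
packuswb xmm0, xmm0
movd somewhere, xmm0
```

Using shuffle (not tested):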
```asm
cvtps2dq xmm0, xmm0
pshufb xmm0, [shufmask]
movd somewhere, xmm0

align 16   ; a legacy-SSE 128-bit memory operand must be 16-byte aligned
shufmask: db 0, 4, 8, 12, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h
```