x86 - SSE intrinsics: Convert 32-bit floats to UNSIGNED 8-bit integers


Using SSE intrinsics, I've gotten a vector of four 32-bit floats clamped to the range 0-255 and rounded to the nearest integer. I'd now like to write those four out as bytes.

There is an intrinsic, _mm_cvtps_pi8, that converts 32-bit floats to 8-bit signed ints, but the problem there is that any value over 127 gets clamped to 127. I can't find any instructions that clamp to unsigned 8-bit values.

I have an intuition that what I may want is some combination of _mm_cvtps_pi16 and _mm_shuffle_pi8, followed by a move instruction to get the 4 bytes I care about into memory. Is that the best way to do it? I'm going to see if I can figure out how to encode the shuffle control mask.

Update: The following appears to do what I want. Is there a better way?

    #include <tmmintrin.h>
    #include <stdio.h>

    unsigned char out[8];
    unsigned char shuf[8] = { 0, 2, 4, 6, 128, 128, 128, 128 };
    float ins[4] = {500, 0, 120, 240};

    int main()
    {
        __m128 x = _mm_loadu_ps(ins);     // load floats (unaligned load: ins is not guaranteed to be 16-byte aligned)
        __m64 y = _mm_cvtps_pi16(x);      // convert them to 16-bit ints
        __m64 sh = *(__m64*)shuf;         // get the shuffle mask into a register
        y = _mm_shuffle_pi8(y, sh);       // shuffle the lower byte of each into the first 4 bytes
        *(int*)out = _mm_cvtsi64_si32(y); // store the lower 32 bits
        _mm_empty();                      // leave MMX state before any further floating-point use

        printf("%d\n", out[0]);
        printf("%d\n", out[1]);
        printf("%d\n", out[2]);
        printf("%d\n", out[3]);
        return 0;
    }

Update 2: Here's a better solution based on harold's answer:

    #include <smmintrin.h>
    #include <stdio.h>

    unsigned char out[8];
    float ins[4] = {10.4, 10.6, 120, 100000};

    int main()
    {
        __m128 x = _mm_loadu_ps(ins);      // load floats (unaligned load: ins is not guaranteed to be 16-byte aligned)
        __m128i y = _mm_cvtps_epi32(x);    // convert them to 32-bit ints
        y = _mm_packus_epi32(y, y);        // pack down to 16 bits (SSE4.1)
        y = _mm_packus_epi16(y, y);        // pack down to 8 bits
        *(int*)out = _mm_cvtsi128_si32(y); // store the lower 32 bits

        printf("%d\n", out[0]);
        printf("%d\n", out[1]);
        printf("%d\n", out[2]);
        printf("%d\n", out[3]);
        return 0;
    }

There is no direct conversion from float to byte; _mm_cvtps_pi8 is a composite. _mm_cvtps_pi16 is also a composite, and in this case it's just doing some pointless stuff that you then undo with the shuffle. They also return annoying __m64's.

Anyway, you can convert to dwords (signed, but that doesn't matter), and then pack (unsigned) or shuffle them into bytes. _mm_shuffle_(e)pi8 generates a pshufb, which Core2 45nm and AMD processors aren't too fond of, and you have to get a mask from somewhere.

Either way, you don't have to round to the nearest integer first; the convert will do that. At least, if you haven't messed with the rounding mode.
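To illustrate the rounding remark, here is a minimal sketch comparing the rounding convert with the truncating one. The helper names `convert_rounded` and `convert_truncated` are made up for this example, and it assumes the MXCSR rounding mode is still the default (round to nearest, ties to even).

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */
#include <stdint.h>

/* Hypothetical helper: converts four floats to int32 using the
   current MXCSR rounding mode (cvtps2dq). */
void convert_rounded(const float in[4], int32_t out[4])
{
    __m128 x = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(x));
}

/* Hypothetical helper: converts four floats to int32, always
   truncating toward zero regardless of MXCSR (cvttps2dq). */
void convert_truncated(const float in[4], int32_t out[4])
{
    __m128 x = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(x));
}
```

With the default rounding mode, the inputs 10.4, 10.6, 2.5, 3.5 should come out as 10, 11, 2, 4 from the first helper and 10, 10, 2, 3 from the second, so a separate rounding step before the convert is redundant.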

Using packs: (not tested)

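    cvtps2dq xmm0, xmm0    ; floats -> signed dwords, rounded per MXCSR
    packusdw xmm0, xmm0    ; dwords -> words, unsigned saturation (SSE4.1)
    packuswb xmm0, xmm0    ; words -> bytes, unsigned saturation
    movd somewhere, xmm0   ; store the low 4 bytes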
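As a variation on the pack sequence, the first pack can be swapped for the signed one (packssdw), which keeps everything within SSE2. This is a sketch rather than the answer's exact method, and the function name `floats_to_u8_sse2` is invented for illustration. The signed pack saturates to the range -32768..32767, and the final unsigned pack then clamps to 0..255, so the end result is the same for any input that fits in a signed dword.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts four floats to four bytes,
   clamping to 0..255, using only SSE2 instructions. */
void floats_to_u8_sse2(const float in[4], uint8_t out[4])
{
    __m128i y = _mm_cvtps_epi32(_mm_loadu_ps(in)); /* cvtps2dq: round to int32 */
    y = _mm_packs_epi32(y, y);   /* packssdw: int32 -> int16, signed saturation */
    y = _mm_packus_epi16(y, y);  /* packuswb: int16 -> uint8, unsigned saturation */
    int32_t lo = _mm_cvtsi128_si32(y);  /* movd: low 4 bytes of the register */
    memcpy(out, &lo, sizeof lo);        /* store without aliasing or alignment issues */
}
```

Calling it on {10.4, 10.6, 120, 100000} should give 10, 11, 120, 255, and negative inputs should come out as 0. Floats outside the signed 32-bit range are a separate concern, since cvtps2dq returns 0x80000000 for them.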

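Using shuffle: (not tested)

    cvtps2dq xmm0, xmm0        ; floats -> signed dwords, rounded per MXCSR
    pshufb xmm0, [shufmask]    ; gather the low byte of each dword (SSSE3)
    movd somewhere, xmm0       ; store the low 4 bytes

    shufmask: db 0, 4, 8, 12, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h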
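The shuffle variant could be written with intrinsics roughly as follows. This is a sketch under a few assumptions: the function name `floats_to_u8_shuffle` is made up, the `target("ssse3")` attribute is a GCC/Clang extension used here so the file builds without `-mssse3`, and the inputs are assumed to be already clamped to 0..255, because the shuffle only picks out the low byte of each dword and does no saturation.

```c
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts four floats to four bytes by taking the
   low byte of each converted dword. No clamping is performed. */
#if defined(__GNUC__)
__attribute__((target("ssse3")))
#endif
void floats_to_u8_shuffle(const float in[4], uint8_t out[4])
{
    /* Indices 0, 4, 8, 12 select the low byte of each dword; an index
       with the high bit set (0x80) makes pshufb write a zero byte. */
    const __m128i mask = _mm_setr_epi8(
        0, 4, 8, 12,
        (char)0x80, (char)0x80, (char)0x80, (char)0x80,
        (char)0x80, (char)0x80, (char)0x80, (char)0x80,
        (char)0x80, (char)0x80, (char)0x80, (char)0x80);

    __m128i y = _mm_cvtps_epi32(_mm_loadu_ps(in)); /* cvtps2dq */
    y = _mm_shuffle_epi8(y, mask);                 /* pshufb */
    int32_t lo = _mm_cvtsi128_si32(y);             /* movd: low 4 bytes */
    memcpy(out, &lo, sizeof lo);
}
```

Because nothing saturates here, an out-of-range input such as 500 (0x1F4) would wrap to its low byte, 244, rather than clamp to 255. That is the main practical difference from the pack-based version.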
