simd - ARM NEON: count compare results


I need to do a parallel compare on uint16x8_t vectors and increment a local variable (a counter) according to the result, for example an increment of 8 if all elements of the vector compare true. I implemented this algorithm:

    ...
    register int objects = 0;
    uint16x8_t vcmp0, vobj;
    uint32x2_t dobj;
    register uint32_t temp0;
    ...
    /* count the set bits in every byte of the compare mask */
    vobj = vreinterpretq_u16_u8(vcntq_u8(vreinterpretq_u8_u16(vcmp0)));
    /* pairwise widening adds: 8-bit -> 16-bit -> 32-bit -> 64-bit sums */
    vobj = vpaddlq_u8(vreinterpretq_u8_u16(vobj));
    vobj = vreinterpretq_u16_u32(vpaddlq_u16(vobj));
    vobj = vreinterpretq_u16_u64(vpaddlq_u32(vreinterpretq_u32_u16(vobj)));
    /* narrow to a D register and add the two halves */
    dobj = vmovn_u64(vreinterpretq_u64_u16(vobj));
    dobj = vreinterpret_u32_u64(vpaddl_u32(dobj));

    /* objects += (total set bits) >> 4, i.e. the number of true 16-bit lanes */
    __asm__ __volatile__
    (
        "vmov.u32  %[temp0], %[dobj][0]                      \n\t"
        "add       %[objects], %[objects], %[temp0], asr #4  \n\t"
        : [dobj]"+w"(dobj), [temp0]"=r"(temp0), [objects]"+r"(objects)
        :
        : "memory"
    );
    ...
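To make the arithmetic of the snippet above easier to check, here is a minimal scalar sketch of the same reduction in portable C. The helper names `popcount8` and `count_true_lanes_popcount` are hypothetical, and the sketch assumes that every lane of the compare result is either 0x0000 or 0xFFFF, as NEON compare instructions produce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of set bits in one byte (what vcnt.8 does per lane). */
static unsigned popcount8(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        n += b & 1u;
        b >>= 1;
    }
    return n;
}

/* Hypothetical scalar model of the popcount + pairwise-add reduction.
 * mask[] stands in for the eight 16-bit lanes of vcmp0. */
static int count_true_lanes_popcount(const uint16_t mask[8])
{
    unsigned bits = 0;
    for (int i = 0; i < 8; ++i) {
        /* assumption: a compare result lane is all ones or all zeros */
        assert(mask[i] == 0 || mask[i] == 0xFFFF);
        bits += popcount8((uint8_t)(mask[i] & 0xFF));  /* low byte  */
        bits += popcount8((uint8_t)(mask[i] >> 8));    /* high byte */
    }
    /* each true lane contributes 16 set bits, so divide by 16 */
    return (int)(bits >> 4);
}
```

The final shift by 4 corresponds to the `asr #4` in the inline assembly: a true lane contributes exactly 16 set bits to the total.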

The vector vcmp0 contains the results of the compare, vobj and dobj are used for the computation, and objects is the counter. I am using a count of set bits followed by pairwise adds for the computation. Is there a faster way to do this?
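One direction that may be worth measuring, sketched here under assumptions rather than offered as a benchmarked answer: a true lane is 0xFFFF, which is -1 as a 16-bit value, so subtracting the mask from a vector accumulator adds 1 per true lane, and the horizontal reduction can be done once per block instead of once per compare. The function name `count_equal_u16` is hypothetical, the compare is assumed to be an equality test (the original does not show how vcmp0 is produced), and the length is assumed to be a multiple of 8. A scalar fallback is included so the sketch also builds on non-NEON targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: count positions where a[i] == b[i].
 * Assumption: n is a multiple of 8. */
static int count_equal_u16(const uint16_t *a, const uint16_t *b, size_t n)
{
    assert(n % 8 == 0);
    int objects = 0;
#if defined(__ARM_NEON)
    while (n > 0) {
        size_t vecs = n / 8;
        if (vecs > 65535)
            vecs = 65535;               /* keep the 16-bit lane counters from wrapping */
        uint16x8_t acc = vdupq_n_u16(0);
        for (size_t i = 0; i < vecs; ++i) {
            uint16x8_t vcmp0 = vceqq_u16(vld1q_u16(a), vld1q_u16(b));
            acc = vsubq_u16(acc, vcmp0); /* 0xFFFF is -1: adds 1 per true lane */
            a += 8;
            b += 8;
        }
        /* one horizontal reduction per block: 16-bit -> 32-bit -> 64-bit */
        uint64x2_t sum = vpaddlq_u32(vpaddlq_u16(acc));
        objects += (int)(vgetq_lane_u64(sum, 0) + vgetq_lane_u64(sum, 1));
        n -= vecs * 8;
    }
#else
    /* scalar fallback with the same "subtract the mask" arithmetic */
    for (size_t i = 0; i < n; ++i) {
        uint16_t mask = (a[i] == b[i]) ? 0xFFFF : 0;
        objects += (uint16_t)(0 - mask); /* 1 if the lane is true, else 0 */
    }
#endif
    return objects;
}
```

Whether this is actually faster depends on the surrounding loop and the target core, so it should be profiled. If the count is needed after every single compare, the accumulator trick saves less, because the reduction still runs each time.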

