Last updated: 2012-03-02 (Fri) 12:04:32

smmintrin.h

SSE4.1 header file

Principal header file for Intel(R) Core(TM) 2 Duo processor SSE4.1 intrinsics

Functions

MACRO functions for ceil/floor intrinsics

  • _mm_ceil_pd(val) _mm_round_pd((val), _MM_FROUND_CEIL)
  • _mm_ceil_sd(dst, val) _mm_round_sd((dst), (val), _MM_FROUND_CEIL)
  • _mm_floor_pd(val) _mm_round_pd((val), _MM_FROUND_FLOOR)
  • _mm_floor_sd(dst, val) _mm_round_sd((dst), (val), _MM_FROUND_FLOOR)
  • _mm_ceil_ps(val) _mm_round_ps((val), _MM_FROUND_CEIL)
  • _mm_ceil_ss(dst, val) _mm_round_ss((dst), (val), _MM_FROUND_CEIL)
  • _mm_floor_ps(val) _mm_round_ps((val), _MM_FROUND_FLOOR)
  • _mm_floor_ss(dst, val) _mm_round_ss((dst), (val), _MM_FROUND_FLOOR)

MACRO functions for packed integer 128-bit comparison intrinsics.

  • _mm_test_all_zeros(mask, val) _mm_testz_si128((mask), (val))
  • _mm_test_all_ones(val) _mm_testc_si128((val), _mm_cmpeq_epi32((val),(val)))
  • _mm_test_mix_ones_zeros(mask, val) _mm_testnzc_si128((mask), (val))
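
The ceil/floor and test macros above can be tried with a minimal sketch like the following. The helper names (ceil_floor4, all_bits_clear, all_bits_set) and the SSE41 attribute macro are assumptions made for illustration and are not part of the header; the attribute is only there so that GCC/Clang builds do not need -msse4.1.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: element-wise ceiling and floor of four floats. */
SSE41 void ceil_floor4(const float in[4], float up[4], float down[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(up, _mm_ceil_ps(v));    /* rounds toward +infinity */
    _mm_storeu_ps(down, _mm_floor_ps(v)); /* rounds toward -infinity */
}

/* Hypothetical helper: 1 if all 128 bits are zero, else 0. */
SSE41 int all_bits_clear(const int w[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)w);
    return _mm_test_all_zeros(v, v); /* (v AND v) == 0 */
}

/* Hypothetical helper: 1 if all 128 bits are one, else 0. */
SSE41 int all_bits_set(const int w[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)w);
    return _mm_test_all_ones(v);
}
```

For example, ceil_floor4 applied to {1.5, -1.5, 2.0, -0.25} should give {2, -1, 2, -0} and {1, -2, 2, -1}.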

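Integer blend instructions - select data from 2 sources using constant/variable mask

  • __m128i _mm_blend_epi16(__m128i v1, __m128i v2, const int mask);
  • __m128i _mm_blendv_epi8(__m128i v1, __m128i v2, __m128i mask);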

Float single precision blend instructions - select data from 2 sources using constant/variable mask

  • __m128 _mm_blend_ps(__m128 v1, __m128 v2, const int mask);
  • __m128 _mm_blendv_ps(__m128 v1, __m128 v2, __m128 v3);

Float double precision blend instructions - select data from 2 sources using constant/variable mask

  • __m128d _mm_blend_pd(__m128d v1, __m128d v2, const int mask);
  • __m128d _mm_blendv_pd(__m128d v1, __m128d v2, __m128d v3);
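
As a rough sketch of the two blend forms: with a constant mask, bit i selects element i from v2 when set and from v1 otherwise; with a variable mask, the sign bit of each element of v3 does the selecting. The helper names and the SSE41 macro below are assumptions made for illustration.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: constant-mask blend of two float vectors. */
SSE41 void blend_even_from_b(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* mask 0x5 = 0101b: elements 0 and 2 come from b, 1 and 3 from a */
    _mm_storeu_ps(out, _mm_blend_ps(va, vb, 0x5));
}

/* Hypothetical helper: replace negative elements with zero. */
SSE41 void zero_negatives(const float in[4], float out[4])
{
    __m128 x = _mm_loadu_ps(in);
    /* where the sign bit of x is set, take the element from the zero vector */
    _mm_storeu_ps(out, _mm_blendv_ps(x, _mm_setzero_ps(), x));
}
```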

Dot product instructions with mask-defined summing and zeroing of result's parts

  • __m128 _mm_dp_ps(__m128 val1, __m128 val2, const int mask);
  • __m128d _mm_dp_pd(__m128d val1, __m128d val2, const int mask);
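
For _mm_dp_ps, the high four bits of the mask choose which element products enter the sum, and the low four bits choose which result elements receive it (the others are zeroed). A minimal sketch, with dot4, dot3 and the SSE41 macro as illustrative assumptions:

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: dot product of two 4-element vectors. */
SSE41 float dot4(const float a[4], const float b[4])
{
    /* 0xF1: sum all four products (F), store the sum in element 0 (1) */
    __m128 r = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1);
    return _mm_cvtss_f32(r);
}

/* Hypothetical helper: dot product that ignores element 3. */
SSE41 float dot3(const float a[4], const float b[4])
{
    /* 0x71: sum products of elements 0..2 only, store the sum in element 0 */
    __m128 r = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0x71);
    return _mm_cvtss_f32(r);
}
```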

Packed integer 64-bit comparison, zeroing or filling with ones corresponding parts of result

  • __m128i _mm_cmpeq_epi64(__m128i val1, __m128i val2);
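
A small sketch of how the all-ones/all-zeros result might be consumed: each equal 64-bit element becomes all ones, so one sign bit per element is enough to summarize the comparison. equal_lanes64 and the SSE41 macro are assumptions made for illustration.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: bit 0 is set if a[0] == b[0], bit 1 if a[1] == b[1]. */
SSE41 int equal_lanes64(const long long a[2], const long long b[2])
{
    __m128i eq = _mm_cmpeq_epi64(_mm_loadu_si128((const __m128i *)a),
                                 _mm_loadu_si128((const __m128i *)b));
    /* movemask_pd collects the top bit of each 64-bit element */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));
}
```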

Min/max packed integer instructions

  • __m128i _mm_min_epi8(__m128i val1, __m128i val2);
  • __m128i _mm_max_epi8(__m128i val1, __m128i val2);
  • __m128i _mm_min_epu16(__m128i val1, __m128i val2);
  • __m128i _mm_max_epu16(__m128i val1, __m128i val2);
  • __m128i _mm_min_epi32(__m128i val1, __m128i val2);
  • __m128i _mm_max_epi32(__m128i val1, __m128i val2);
  • __m128i _mm_min_epu32(__m128i val1, __m128i val2);
  • __m128i _mm_max_epu32(__m128i val1, __m128i val2);
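
One common use of the packed min/max pair is clamping to a range. The following is a sketch under that assumption; clamp4 and the SSE41 macro are illustrative names, not part of the header.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: clamp four signed 32-bit integers to [lo, hi]. */
SSE41 void clamp4(const int in[4], int lo, int hi, int out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_max_epi32(v, _mm_set1_epi32(lo)); /* raise values below lo */
    v = _mm_min_epi32(v, _mm_set1_epi32(hi)); /* lower values above hi */
    _mm_storeu_si128((__m128i *)out, v);
}
```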

Packed integer 32-bit multiplication with truncation of upper halves of results

  • __m128i _mm_mullo_epi32(__m128i a, __m128i b);

Packed integer 32-bit multiplication of 2 pairs of operands producing two 64-bit results

  • __m128i _mm_mul_epi32(__m128i a, __m128i b);
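
The difference between the two multiplications shows up when a product overflows 32 bits. In this sketch (multiply_demo and the SSE41 macro are illustrative assumptions), _mm_mullo_epi32 keeps the low 32 bits of all four products, while _mm_mul_epi32 multiplies only elements 0 and 2 and keeps the full signed 64-bit results.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: run both multiplications on the same inputs. */
SSE41 void multiply_demo(const int a[4], const int b[4],
                         int low[4], long long wide[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* four products, each truncated to its low 32 bits */
    _mm_storeu_si128((__m128i *)low, _mm_mullo_epi32(va, vb));
    /* a[0]*b[0] and a[2]*b[2] as full 64-bit signed products */
    _mm_storeu_si128((__m128i *)wide, _mm_mul_epi32(va, vb));
}
```

With a[0] = b[0] = 100000, the truncated product is 1410065408 (10^10 modulo 2^32), whereas the wide product is the exact 10000000000.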

Packed integer 128-bit bitwise comparison. Returns 1 if (val 'and' mask) == 0

  • int _mm_testz_si128(__m128i mask, __m128i val);

Packed integer 128-bit bitwise comparison. Returns 1 if (val 'and_not' mask) == 0, i.e. every bit set in val is also set in mask

  • int _mm_testc_si128(__m128i mask, __m128i val);

Packed integer 128-bit bitwise comparison

ZF = ((val 'and' mask) == 0), CF = ((val 'and_not' mask) == 0)
Returns 1 if both ZF and CF are 0
  • int _mm_testnzc_si128(__m128i mask, __m128i val);
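
The three tests can be read as "any", "all" and "some but not all" checks on a set of flag bits. The sketch below assumes that reading; the helper names and the SSE41 macro are illustrative. Note that for the "all" and "some but not all" checks the value under test is passed as the first argument, because the first operand is the one that gets inverted.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: 1 if val and flags share at least one set bit. */
SSE41 int has_any(const int val[4], const int flags[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)val);
    __m128i f = _mm_loadu_si128((const __m128i *)flags);
    return !_mm_testz_si128(f, v); /* testz is 1 when (f AND v) == 0 */
}

/* Hypothetical helper: 1 if every bit of flags is set in val. */
SSE41 int has_all(const int val[4], const int flags[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)val);
    __m128i f = _mm_loadu_si128((const __m128i *)flags);
    return _mm_testc_si128(v, f); /* 1 when ((NOT v) AND f) == 0 */
}

/* Hypothetical helper: 1 if some, but not all, bits of flags are set in val. */
SSE41 int has_some_not_all(const int val[4], const int flags[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)val);
    __m128i f = _mm_loadu_si128((const __m128i *)flags);
    return _mm_testnzc_si128(v, f); /* 1 when ZF == 0 and CF == 0 */
}
```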

Insert single precision float into packed single precision array element selected by index.

Bits [7:6] of the third parameter select the src element,
bits [5:4] select the dst element, and bits [3:0] are the zeroing mask for dst
  • __m128 _mm_insert_ps(__m128 dst, __m128 src, const int ndx);

Helper macro to create ndx-parameter value for _mm_insert_ps

  • _MM_MK_INSERTPS_NDX(srcField, dstField, zeroMask) (((srcField)<<6) | ((dstField)<<4) | (zeroMask))
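
To make the ndx encoding concrete, here is a minimal sketch that copies src element 2 into dst element 1 and zeroes element 3 of the result. insert_demo and the SSE41 macro are assumptions made for illustration.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: out = dst with out[1] = src[2] and out[3] = 0. */
SSE41 void insert_demo(const float dst[4], const float src[4], float out[4])
{
    __m128 d = _mm_loadu_ps(dst);
    __m128 s = _mm_loadu_ps(src);
    /* srcField = 2, dstField = 1, zeroMask = 1000b, i.e. ndx = 0x98 */
    _mm_storeu_ps(out, _mm_insert_ps(d, s, _MM_MK_INSERTPS_NDX(2, 1, 0x8)));
}
```

With dst = {1, 2, 3, 4} and src = {10, 20, 30, 40}, the result should be {1, 30, 3, 0}.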

Extract binary representation of single precision float from packed single precision array element selected by index

  • int _mm_extract_ps(__m128 src, const int ndx);

Extract single precision float from packed single precision array element selected by index into dest

  • _MM_EXTRACT_FLOAT(dest, src, ndx) *((int*)&(dest)) = _mm_extract_ps((src), (ndx))

Extract specified single precision float element into the lower part of __m128

  • _MM_PICK_OUT_PS(src, num) _mm_insert_ps(_mm_setzero_ps(), (src), _MM_MK_INSERTPS_NDX((num), 0, 0x0e))
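
Because _mm_extract_ps returns the raw bit pattern as an int, the value has to be reinterpreted to get a float back. The sketch below does this with memcpy rather than the pointer cast used by _MM_EXTRACT_FLOAT; that substitution, the helper names and the SSE41 macro are choices made for this example only.

```c
#include <smmintrin.h>
#include <string.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: raw IEEE-754 bits of element 0. */
SSE41 int element0_bits(const float in[4])
{
    return _mm_extract_ps(_mm_loadu_ps(in), 0);
}

/* Hypothetical helper: element 2 as a float. */
SSE41 float element2(const float in[4])
{
    int bits = _mm_extract_ps(_mm_loadu_ps(in), 2);
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bits, no conversion */
    return f;
}
```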

Insert integer into packed integer array element selected by index

  • __m128i _mm_insert_epi8(__m128i dst, int s, const int ndx);
  • __m128i _mm_insert_epi32(__m128i dst, int s, const int ndx);
  • __m128i _mm_insert_epi64(__m128i dst, __int64 s, const int ndx);

Extract integer from packed integer array element selected by index

  • __int64 _mm_extract_epi64(__m128i src, const int ndx);
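
A minimal sketch of element insertion, using the 32-bit and 8-bit forms so that it also builds for 32-bit targets (the 64-bit forms are generally only available when compiling for x64). The helper names and the SSE41 macro are assumptions made for illustration.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: copy in to out, replacing 32-bit element 2 with value. */
SSE41 void set_element2(const int in[4], int value, int out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_insert_epi32(v, value, 2); /* ndx must be a compile-time constant */
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical helper: copy in to out, replacing byte 15 with value. */
SSE41 void set_last_byte(const unsigned char in[16], int value,
                         unsigned char out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_insert_epi8(v, value, 15); /* only the low 8 bits of value are used */
    _mm_storeu_si128((__m128i *)out, v);
}
```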

Horizontal packed word minimum and its index in result[15:0] and result[18:16] respectively

  • __m128i _mm_minpos_epu16(__m128i shortValues);
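
The packed result can be unpacked as follows; this is a sketch that reads the low 32 bits of the result and splits them into the minimum (bits 15:0) and its index (bits 18:16). min_and_index and the SSE41 macro are illustrative assumptions.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: minimum of eight unsigned 16-bit values and its position. */
SSE41 void min_and_index(const unsigned short in[8],
                         unsigned *min, unsigned *index)
{
    __m128i r = _mm_minpos_epu16(_mm_loadu_si128((const __m128i *)in));
    unsigned packed = (unsigned)_mm_cvtsi128_si32(r); /* low 32 bits of result */
    *min = packed & 0xFFFFu;       /* result[15:0]  */
    *index = (packed >> 16) & 0x7u; /* result[18:16] */
}
```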

Packed/single float double precision rounding

  • __m128d _mm_round_pd(__m128d val, int iRoundMode);
  • __m128d _mm_round_sd(__m128d dst, __m128d val, int iRoundMode);

Packed/single float single precision rounding

  • __m128 _mm_round_ps(__m128 val, int iRoundMode);
  • __m128 _mm_round_ss(__m128 dst, __m128 val, int iRoundMode);
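
The rounding mode is built from the _MM_FROUND_* constants. The sketch below compares round-to-nearest (ties go to the even integer) with truncation, and shows that the scalar form rounds only element 0 of val while taking the upper three elements from dst. The helper names and the SSE41 macro are assumptions made for illustration.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: round four floats to nearest and toward zero. */
SSE41 void round4(const float in[4], float nearest[4], float truncated[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(nearest,
        _mm_round_ps(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
    _mm_storeu_ps(truncated,
        _mm_round_ps(v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC));
}

/* Hypothetical helper: out[0] = nearest(val[0]), out[1..3] = dst[1..3]. */
SSE41 void round_low(const float dst[4], const float val[4], float out[4])
{
    __m128 r = _mm_round_ss(_mm_loadu_ps(dst), _mm_loadu_ps(val),
                            _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_ps(out, r);
}
```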

Packed integer sign-extension

  • __m128i _mm_cvtepi8_epi32(__m128i byteValues);
  • __m128i _mm_cvtepi16_epi32(__m128i shortValues);
  • __m128i _mm_cvtepi8_epi64(__m128i byteValues);
  • __m128i _mm_cvtepi32_epi64(__m128i intValues);
  • __m128i _mm_cvtepi16_epi64(__m128i shortValues);
  • __m128i _mm_cvtepi8_epi16(__m128i byteValues);

Packed integer zero-extension

  • __m128i _mm_cvtepu8_epi32(__m128i byteValues);
  • __m128i _mm_cvtepu16_epi32(__m128i shortValues);
  • __m128i _mm_cvtepu8_epi64(__m128i byteValues);
  • __m128i _mm_cvtepu32_epi64(__m128i intValues);
  • __m128i _mm_cvtepu16_epi64(__m128i shortValues);
  • __m128i _mm_cvtepu8_epi16(__m128i byteValues);
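
Sign-extension and zero-extension differ only for inputs whose top bit is set. The following sketch widens the same four bytes both ways; only the low elements of the source register are read. widen_bytes and the SSE41 macro are illustrative assumptions.

```c
#include <smmintrin.h>
#include <string.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: widen four bytes to 32-bit, signed and unsigned. */
SSE41 void widen_bytes(const unsigned char in[4],
                       int as_signed[4], int as_unsigned[4])
{
    int packed;
    __m128i v;
    memcpy(&packed, in, sizeof packed); /* four bytes into one 32-bit value */
    v = _mm_cvtsi32_si128(packed);      /* bytes 0..3 of the register */
    _mm_storeu_si128((__m128i *)as_signed, _mm_cvtepi8_epi32(v));
    _mm_storeu_si128((__m128i *)as_unsigned, _mm_cvtepu8_epi32(v));
}
```

For the bytes {0xFF, 0x01, 0x80, 0x7F}, the signed result should be {-1, 1, -128, 127} and the unsigned result {255, 1, 128, 127}.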

Pack 8 double words from 2 operands into 8 words of result with unsigned saturation

  • __m128i _mm_packus_epi32(__m128i val1, __m128i val2);
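
Unsigned saturation here means that negative inputs become 0 and inputs above 65535 become 65535. A minimal sketch, with pack_to_u16 and the SSE41 macro as illustrative assumptions:

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: pack a[0..3] then b[0..3] into eight saturated words. */
SSE41 void pack_to_u16(const int a[4], const int b[4], unsigned short out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* words 0..3 come from a, words 4..7 from b */
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi32(va, vb));
}
```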

Sum absolute 8-bit integer difference of adjacent groups of 4 byte integers in operands. Starting offsets within operands are determined by mask

  • __m128i _mm_mpsadbw_epu8(__m128i s1, __m128i s2, const int msk);
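
One way to picture this instruction is as a sliding 4-byte pattern match: result word i is the sum of absolute differences between the 4 bytes of s1 starting at offset i and one fixed 4-byte group of s2. The sketch below uses msk = 0, so the s1 window starts at byte 0 and the s2 group is bytes 0..3; sad_slide and the SSE41 macro are illustrative assumptions.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: sad[i] = sum over j of |window[i+j] - pattern[j]|, j = 0..3. */
SSE41 void sad_slide(const unsigned char window[16],
                     const unsigned char pattern[16],
                     unsigned short sad[8])
{
    __m128i w = _mm_loadu_si128((const __m128i *)window);
    __m128i p = _mm_loadu_si128((const __m128i *)pattern);
    /* msk 0: bits [1:0] pick the s2 group, bit 2 picks the s1 start (0 or 4) */
    _mm_storeu_si128((__m128i *)sad, _mm_mpsadbw_epu8(w, p, 0));
}
```

If window holds the bytes 0, 1, 2, ... 15 and pattern starts with 2, 3, 4, 5, then sad[i] is 4 * |i - 2|, which is zero at i = 2 where the pattern matches.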

Load double quadword using non-temporal aligned hint

  • __m128i _mm_stream_load_si128(__m128i* v1);
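
The pointer passed to _mm_stream_load_si128 must be 16-byte aligned. The non-temporal hint is mainly meaningful for write-combining memory; on ordinary memory the call is expected to behave like a normal aligned load, which is what this sketch relies on. sum_stream and the SSE41 macro are assumptions made for illustration.

```c
#include <smmintrin.h>

/* Assumption: on GCC/Clang this enables SSE4.1 per function; MSVC needs nothing. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Hypothetical helper: sum all 32-bit elements of count aligned 128-bit blocks. */
SSE41 int sum_stream(__m128i *blocks, int count)
{
    __m128i acc = _mm_setzero_si128();
    int lanes[4];
    int i;
    for (i = 0; i < count; ++i) {
        /* blocks[i] is 16-byte aligned because of its __m128i type */
        acc = _mm_add_epi32(acc, _mm_stream_load_si128(&blocks[i]));
    }
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```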

Related