Last updated: 2023-08-01 (Tue) 17:51:39

VC4CL
VC4CL is an implementation of the OpenCL 1.2 standard for the VideoCore IV GPU.

https://github.com/doe300/VC4CL

Support

Documentation

About the hardware

Apparently a master's thesis

2020/04

  • What comes to mind is that the kernel uses only 4 of the 16 available SIMD elements (by using a float4 vector). Since there is no built-in auto-vectorization yet, at most 1/4 of the processing power is actually used. As always, the main factor in performance is most likely the memory interface: for example, loading/storing vectors of 16 elements instead of single words can give an 8-10x speed-up.
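The remark above can be made concrete with a small sketch. The two kernel strings below are hypothetical OpenCL C examples (the names `scale4` and `scale16` and the scaling operation are assumptions, not the kernel from the original discussion); they differ only in vector width. The C helpers just do the lane arithmetic from the quote, assuming the 16 SIMD elements it mentions. They say nothing about measured speed-ups.

```c
#include <assert.h>

/* Number of SIMD elements per instruction, taken from the quoted remark. */
#define SIMD_LANES 16u

/* Hypothetical kernel using float4: only 4 of the 16 lanes carry data. */
static const char *KERNEL_FLOAT4 =
    "__kernel void scale4(__global const float *in, __global float *out, float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    float4 v = vload4(i, in);   /* loads in[4*i .. 4*i+3] */\n"
    "    vstore4(v * k, i, out);\n"
    "}\n";

/* Same hypothetical kernel with float16: all 16 lanes carry data, and each
 * load/store moves 16 elements at once. */
static const char *KERNEL_FLOAT16 =
    "__kernel void scale16(__global const float *in, __global float *out, float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    float16 v = vload16(i, in); /* loads in[16*i .. 16*i+15] */\n"
    "    vstore16(v * k, i, out);\n"
    "}\n";

/* Fraction of SIMD lanes doing useful work for a given vector width. */
static double lane_utilization(unsigned vector_width)
{
    assert(vector_width > 0 && vector_width <= SIMD_LANES);
    return (double)vector_width / (double)SIMD_LANES;
}

/* Work-items (and therefore vector loads) needed to cover n_elements,
 * rounding up when n_elements is not a multiple of vector_width. */
static unsigned work_items_needed(unsigned n_elements, unsigned vector_width)
{
    assert(vector_width > 0);
    return (n_elements + vector_width - 1u) / vector_width;
}
```

For 1024 floats, `work_items_needed(1024, 4)` gives 256 work-items at 25% lane utilization, while `work_items_needed(1024, 16)` gives 64 work-items with every lane in use. The kernel strings would be passed to `clCreateProgramWithSource` by host code that is not shown here.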