Last updated: 2023-08-01 (Tue) 17:51:39
VC4CL
An implementation of OpenCL 1.2 for the VideoCore IV GPU.
https://github.com/doe300/VC4CL
Supported hardware
- VideoCore IV
- Raspberry Pi 1?
- Raspberry Pi 2
- Raspberry Pi 3
Documentation
About the hardware
Apparently a master's thesis
2020/04
- What comes to mind is that the kernel only uses up to 4 of the 16 available SIMD elements (by using a float4 vector). Since there is no built-in auto-vectorization yet, at most 1/4 of the processing power is actually used. And as always, the main factor in performance is most likely the memory interface: for example, loading/storing vectors of 16 elements instead of single words can give an 8-10x speed-up.
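The arithmetic behind the note above can be sketched in plain C. This is only an illustration, not VC4CL code: the 16-lane width is taken from the quote, while the helper names `lane_utilization` and `memory_ops` are hypothetical and exist only for this example. It shows why a float4 kernel fills a quarter of the lanes, and how many load/store operations are needed per vector width.

```c
/* Assumption taken from the note above: 16 SIMD elements are available. */
#define SIMD_LANES 16

/* Hypothetical helper: fraction of SIMD lanes doing useful work
 * when the kernel operates on vectors of the given width. */
static double lane_utilization(int vector_width)
{
    int used = vector_width < SIMD_LANES ? vector_width : SIMD_LANES;
    return (double)used / SIMD_LANES;
}

/* Hypothetical helper: number of load (or store) operations needed
 * to move n elements when each operation moves vector_width elements.
 * Rounds up so that a partial last vector still counts as one access. */
static int memory_ops(int n, int vector_width)
{
    return (n + vector_width - 1) / vector_width;
}
```

For example, `lane_utilization(4)` gives 0.25, matching the "at most 1/4" in the note, and `memory_ops(1024, 16)` gives 64 accesses versus 1024 for single words. The access count is not the same thing as the speed-up: the 8-10x figure in the note is an observed value, not something this sketch derives.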