~4x performance gain just from understanding how PyOpenCL arrays work! Rendering is still the slowest part: this video was captured in real time, but with rendering turned off the process takes only 52 ms.

If I can share the buffer with an OpenGL fragment shader, the data can stay on the GPU the whole time and I should be able to run at (nearly) the same speed :)
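
To make the "keep it on the GPU" idea concrete, here is a minimal sketch of the compute side only. It assumes the speedup comes from avoiding host/device copies inside the loop, which is a common pattern but not necessarily the exact change made here. The update formula, `step_on_host`, and `run_on_device` are hypothetical placeholders. The OpenGL sharing part is not shown.

```python
import numpy as np


def step_on_host(x):
    """NumPy reference for one update step (a made-up placeholder formula)."""
    return x * np.float32(0.5) + np.float32(1.0)


def run_on_device(host_data, n_steps):
    """Upload once, run n_steps on the device, download once.

    Hypothetical helper: the kernel body mirrors step_on_host above.
    """
    # Imported lazily so the NumPy reference works without an OpenCL runtime.
    import pyopencl as cl
    import pyopencl.array as cl_array
    from pyopencl.elementwise import ElementwiseKernel

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    # In-place elementwise kernel; `i` is the index PyOpenCL provides.
    step = ElementwiseKernel(
        ctx, "float *x", "x[i] = x[i] * 0.5f + 1.0f", "step"
    )

    x_dev = cl_array.to_device(queue, host_data)  # single host -> device copy
    for _ in range(n_steps):
        step(x_dev)  # data stays on the device; no .get() inside the loop
    return x_dev.get()  # single device -> host copy
```

On a machine with an OpenCL device, `run_on_device(np.zeros(1024, dtype=np.float32), 2)` should agree with applying `step_on_host` twice, up to floating-point rounding. The point of the structure is that `.get()` appears once, after the loop, rather than once per step.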