-
~4x performance gain just by understanding how PyOpenCL arrays work! Rendering is still the slowest part: this video is captured in real time, but with rendering turned off the process takes only 52 ms.
-
The algorithm isn't speeding up towards the end, btw. That's just the renderer getting faster, because every 'uncertain' tile is drawn as up to 21 tiles on top of each other (each of those again consisting of up to four rotations of the same image), so there is less to draw as tiles get resolved.
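
Rough arithmetic for why the renderer gets faster, as a minimal sketch: the 21 and 4 are the upper bounds mentioned above, while the 10x10 grid and the idea that a resolved tile costs a single draw are assumptions made up for illustration, not measurements from the actual renderer.

```python
MAX_TILES = 21      # from the post: up to 21 tiles stacked per uncertain tile
MAX_ROTATIONS = 4   # from the post: up to four rotations of the same image

# Worst-case number of images drawn for one uncertain tile.
WORST_CASE_DRAWS = MAX_TILES * MAX_ROTATIONS


def frame_draws(uncertain, resolved):
    """Upper bound on images drawn in one frame.

    Assumption: a resolved tile is drawn exactly once,
    an uncertain tile costs the worst case.
    """
    return uncertain * WORST_CASE_DRAWS + resolved


# Hypothetical 10x10 grid, at the start and at the end of a run.
cells = 10 * 10
draws_at_start = frame_draws(uncertain=cells, resolved=0)
draws_at_end = frame_draws(uncertain=0, resolved=cells)
print(WORST_CASE_DRAWS, draws_at_start, draws_at_end)
```

Under these assumptions the first frame draws up to 84 images per tile and the last frame only one, which is why the video looks like it accelerates even if the solver itself runs at a constant pace.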
-
If I can share the buffer with an OpenGL fragment shader, then the memory can stay on the GPU the whole time and I should be able to run at (nearly) the same speed :)