Godot Engine threads and optimization

Submitted by dmrokan on

This article discusses how frames per second (FPS) can be improved when heavy calculations are moved to Godot Engine threads, using an image filtering example built on the GDBlas module.

GDBlas is an extension for the Godot game engine that adds advanced real and complex matrix manipulation routines. It also exposes the ordinary differential equation solvers and geometry algorithms of Boost C++. It is implemented in C++ on top of the Eigen and Boost C++ libraries. Although it is highly performant, there is always room for further improvement.

Godot Engine covers the performance requirements of most use cases. However, it may not be enough if your application evaluates costly matrix operations. For such cases, the Godot Engine documentation proposes a few ways to overcome performance issues:

  1. Implement the costly part in C++ and expose it to GDScript through a native extension or a custom server embedded into the engine.
  2. Use threads to distribute heavy calculations across the CPU cores.
  3. Better, use both.

GDBlas takes care of the first option by implementing highly optimized linear algebra and image processing routines in C++. This article explains how to carry out the second option by improving the 3D demo project provided with GDBlas.

The 3D demo runs an image filtering algorithm and displays filtered versions of what the player sees in small rectangles at the corners of the game window. Briefly, the steps of the algorithm are:

  1. Capture the texture in the main viewport as an image
  2. Downscale the image
  3. Separate the color channels (RGB), which converts each channel to a matrix with 384 columns and 216 rows
  4. Combine RGB matrices to obtain a gray scale image
  5. Filter the gray scale image by using GDBlas's conv method with predefined image filtering kernels
  6. Convert the filtered matrices to gray scale images which can be displayed in Sprite2D nodes

[Figure: Screenshot from GDBlas's 3D demo]

The image at the top left is the result of applying an edge detection filter with a 5x5 kernel. The image at the bottom right is the result of applying a motion blur filter, also with a 5x5 kernel.
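
To make steps 3 to 5 above more concrete, here is a minimal sketch in plain Python. It does not use GDBlas or any Godot API. The gray scale weights and both 5x5 kernels are my own assumptions chosen for illustration, not the values used in the demo, and the function names are hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for GDBlas matrices.

# Assumed 5x5 kernels (not taken from the demo).
EDGE_KERNEL = [
    [24.0 if (u, v) == (2, 2) else -1.0 for v in range(5)] for u in range(5)
]
MOTION_BLUR_KERNEL = [
    [0.2 if u == v else 0.0 for v in range(5)] for u in range(5)
]


def to_gray(r, g, b, weights=(0.299, 0.587, 0.114)):
    """Combine three channel matrices into one gray scale matrix.

    The default weights are the common BT.601 luma coefficients,
    which is an assumption made for this sketch.
    """
    wr, wg, wb = weights
    return [
        [wr * r[i][j] + wg * g[i][j] + wb * b[i][j] for j in range(len(r[0]))]
        for i in range(len(r))
    ]


def conv2d_same(image, kernel):
    """2D convolution with zero padding; the output has the size of image."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(krows):
                for v in range(kcols):
                    # Index arithmetic of a true convolution (flipped kernel).
                    y = i + cr - u
                    x = j + cc - v
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    # A tiny 6x8 test image instead of 216x384, to keep pure Python fast.
    red = [[(i + j) / 14.0 for j in range(8)] for i in range(6)]
    green = [[0.5] * 8 for _ in range(6)]
    blue = [[0.25] * 8 for _ in range(6)]
    gray = to_gray(red, green, blue)
    edges = conv2d_same(gray, EDGE_KERNEL)
    blurred = conv2d_same(gray, MOTION_BLUR_KERNEL)
    print(len(edges), "x", len(edges[0]), "edge image")
    print(len(blurred), "x", len(blurred[0]), "blurred image")
```

The four nested loops perform 25 multiplications per pixel for a 5x5 kernel, which gives a feeling for why this step is worth implementing in C++ and moving off the main loop.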

Improvement

Separating the color channels, calculating the gray scale image and applying the convolutions take around 15-20 milliseconds on my machine, which has an Intel i7-8750H @ 2.20GHz processor. In the non-threaded version, this work is done in Godot's main processing loop, inside the _process function. When the same computation is moved to a separate thread, an FPS improvement of more than 50% can be observed.
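
The general shape of the threaded version can be sketched as follows. This is a language-agnostic illustration written in Python, not the GDScript code of the demo. FilterWorker, heavy_filter and run_main_loop are hypothetical names, and a 15 ms sleep stands in for the real filtering work. In CPython, pure Python CPU-bound code would not actually run in parallel because of the global interpreter lock, so the sketch only shows the hand-off pattern: the per-frame loop submits the newest frame, never waits, and picks up a result whenever one is ready.

```python
import queue
import threading
import time


def heavy_filter(frame):
    """Stand-in for channel separation, gray scale conversion and convolution."""
    time.sleep(0.015)  # pretend the work takes about 15 ms
    return frame * 2   # placeholder result so the data flow can be followed


class FilterWorker:
    """Runs a work function on a background thread."""

    def __init__(self, work=heavy_filter):
        self._work = work
        self._jobs = queue.Queue(maxsize=1)  # at most one frame waiting
        self._results = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            frame = self._jobs.get()
            if frame is None:  # sentinel value: shut down
                break
            self._results.put(self._work(frame))

    def submit(self, frame):
        """Queue a frame; drop it if another frame is already waiting."""
        try:
            self._jobs.put_nowait(frame)
            return True
        except queue.Full:
            return False

    def poll(self):
        """Return a finished result, or None if nothing is ready yet."""
        try:
            return self._results.get_nowait()
        except queue.Empty:
            return None

    def stop(self):
        """Ask the worker to finish and wait for the thread to exit."""
        self._jobs.put(None)
        self._thread.join()


def run_main_loop(worker, frames=20, frame_time=0.005):
    """Simulated per-frame loop, the role _process plays in Godot."""
    shown = []
    for n in range(1, frames + 1):
        worker.submit(n)          # never blocks the frame
        result = worker.poll()    # never blocks the frame
        if result is not None:
            shown.append(result)  # here the sprite would be updated
        time.sleep(frame_time)    # rest of the frame: rendering, input, ...
    return shown


if __name__ == "__main__":
    worker = FilterWorker()
    start = time.perf_counter()
    shown = run_main_loop(worker)
    elapsed = time.perf_counter() - start
    worker.stop()
    print("frames with a fresh filter result:", len(shown))
    print("loop time in seconds:", round(elapsed, 3))
```

The trade-off in this design is that the filtered images may lag the main view by a frame or more, and frames submitted while one is already waiting are skipped. For small preview images in the corners of the window, that is usually acceptable in exchange for a main loop that no longer waits for the filter.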

You can see the difference between the two implementations in this diff.