On Linux hosts with an NVIDIA GPU and CUDA support enabled, a CUDA
kernel converts captured RGBA frames to NV12 before encoding. This
kernel contained a bug that degraded image quality, most visibly when
rendering high-contrast colored text and sharp lines. See [1] for more
information.

This commit fixes the format conversion kernel so that each 2x2 block
of RGBA pixels produces 4 luma (Y) values and 1 chroma (UV) pair, i.e.
12 bits per pixel YUV 4:2:0 (NV12). The previous code incorrectly
generated 1 UV pair for every 2 pixels.

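For illustration only, the 2x2 block layout described above can be
sketched as a CPU reference in C++. This is not the kernel from this
commit: the names Nv12Frame, clamp_u8 and rgba_to_nv12 are hypothetical,
the BT.601 limited-range coefficients are an assumption, and averaging
the four pixels to obtain the chroma pair is one possible choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for an NV12 frame (not a Sunshine type).
struct Nv12Frame {
    int width;
    int height;
    std::vector<std::uint8_t> y;   // width * height luma samples
    std::vector<std::uint8_t> uv;  // interleaved U,V pairs, one per 2x2 block
};

// Round to nearest and clamp to the 8-bit range.
static std::uint8_t clamp_u8(float v) {
    return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// Assumes even width and height and tightly packed RGBA rows.
// Coefficients are BT.601 limited range (an assumption for this sketch).
Nv12Frame rgba_to_nv12(const std::vector<std::uint8_t>& rgba, int width, int height) {
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    Nv12Frame out{width, height,
                  std::vector<std::uint8_t>(pixels),
                  std::vector<std::uint8_t>(pixels / 2)};

    for (int by = 0; by < height; by += 2) {
        for (int bx = 0; bx < width; bx += 2) {
            float sum_r = 0.0f, sum_g = 0.0f, sum_b = 0.0f;

            // One luma value per pixel: 4 Y values per 2x2 block.
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const std::size_t idx =
                        static_cast<std::size_t>(by + dy) * width + (bx + dx);
                    const float r = rgba[4 * idx + 0];
                    const float g = rgba[4 * idx + 1];
                    const float b = rgba[4 * idx + 2];
                    out.y[idx] = clamp_u8(16.0f + 0.257f * r + 0.504f * g + 0.098f * b);
                    sum_r += r;
                    sum_g += g;
                    sum_b += b;
                }
            }

            // One chroma pair per 2x2 block, from the averaged color.
            const float r = sum_r / 4.0f;
            const float g = sum_g / 4.0f;
            const float b = sum_b / 4.0f;
            // Each UV row holds width bytes: width/2 interleaved pairs.
            const std::size_t uv_idx = static_cast<std::size_t>(by / 2) * width + bx;
            out.uv[uv_idx + 0] = clamp_u8(128.0f - 0.148f * r - 0.291f * g + 0.439f * b);
            out.uv[uv_idx + 1] = clamp_u8(128.0f + 0.439f * r - 0.368f * g - 0.071f * b);
        }
    }
    return out;
}
```

With this layout a 4x2 frame yields 8 Y bytes and 4 UV bytes, which is
12 bytes for 8 pixels, or 12 bits per pixel.
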
[1] https://github.com/LizardByte/Sunshine/issues/154