AMD recently published a patent for distributing the rendering workload across several GPU chiplets. The game scene is divided into individual blocks that are distributed to the chiplets to make better use of the shaders in games. To do this, AMD uses two-level binning.

AMD publishes patent for GPU chiplet implementation to make better use of shader technology

A newly published AMD patent reveals more about what the company plans to do with next-generation GPU and CPU technology in the coming years. At the end of June, it was revealed that fifty-four patent applications would be published. It is not known which of these more than fifty applications will actually factor into AMD's plans, but together they hint at the company's direction in the years ahead.

One application, flagged on ComputerBase by community member @ETI1120, is patent number US20220207827. It describes binning image data in two stages to efficiently distribute GPU rendering workloads across multiple chiplets. AMD first filed it with the US Patent Office late last year.

When a GPU rasterizes image data in the conventional way, the arithmetic logic units (ALUs) in its shader units all perform the same job: assigning a color value to individual pixels. To do this, the textured polygons that cover a specific pixel in a given game scene are mapped onto that pixel. The task itself stays the same for every pixel and differs only in its input, such as the texture found at each pixel. This method is called SIMD, or Single Instruction, Multiple Data.
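As a rough illustration of the SIMD idea, the sketch below runs one shading routine over several pixels, where only the input texel differs per pixel. The texture contents, the `shade` and `shade_all` names, and the light-scaling "shader" are all hypothetical; real GPUs execute this in hardware lanes, not in a Python loop.

```python
# Hypothetical 2x2 texture: one RGB color per pixel position (x, y).
TEXTURE = {
    (0, 0): (255, 0, 0),
    (1, 0): (0, 255, 0),
    (0, 1): (0, 0, 255),
    (1, 1): (255, 255, 255),
}


def shade(texel, light):
    """The single instruction stream: scale a texel's color by a light factor."""
    return tuple(min(255, int(channel * light)) for channel in texel)


def shade_all(pixels, light):
    """Run the same shade() on every pixel; only the input texel differs per pixel."""
    return {pixel: shade(TEXTURE[pixel], light) for pixel in pixels}


if __name__ == "__main__":
    print(shade_all(list(TEXTURE), 0.5))
    # {(0, 0): (127, 0, 0), (1, 0): (0, 127, 0), (0, 1): (0, 0, 127), (1, 1): (127, 127, 127)}
```

Every pixel goes through identical logic, which is what lets a GPU process many of them in lockstep.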

In most current games, shading is not the only task the GPU performs. Several post-processing passes are applied after the initial shading, for example anti-aliasing, shadows, and ambient occlusion. Ray tracing, meanwhile, runs alongside shading and adds another kind of calculation.

The rendering load in current games is highly parallel: on GPUs, it scales almost ideally across a few thousand compute units. This differs from CPUs, where programs must be specifically written to take advantage of additional cores. A scheduler handles the distribution by dividing the work into smaller tasks that the GPU's compute units process; this division is also called binning. The frame being rendered is split into separate blocks, each containing a certain number of pixels. Each block is processed by a GPU subunit, and pixels waiting to be calculated are assigned to blocks until the subunit is fully utilized. The block size takes into account the computational power, memory bandwidth, and cache sizes of the shaders.
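A minimal sketch of single-level binning, assuming a fixed 64x64-pixel bin size and a simple round-robin policy for handing bins to compute units. Both choices are hypothetical; as described above, a real GPU would size bins according to compute power, memory bandwidth, and cache sizes.

```python
def make_bins(width, height, bin_w, bin_h):
    """Split a width x height frame into bins (x, y, w, h); edge bins may be smaller."""
    bins = []
    for y in range(0, height, bin_h):
        for x in range(0, width, bin_w):
            bins.append((x, y, min(bin_w, width - x), min(bin_h, height - y)))
    return bins


def assign_round_robin(bins, num_units):
    """Hypothetical scheduler: hand out bins to compute units in turn."""
    queues = [[] for _ in range(num_units)]
    for i, b in enumerate(bins):
        queues[i % num_units].append(b)
    return queues


if __name__ == "__main__":
    bins = make_bins(1920, 1080, 64, 64)
    queues = assign_round_robin(bins, 4)
    print(len(bins), [len(q) for q in queues])  # 510 [128, 128, 127, 127]
```

A 1920x1080 frame yields 30 x 17 = 510 bins; because 1080 is not a multiple of 64, the bottom row of bins is only 56 pixels tall.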

Source: AMD via ComputerBase

AMD explains in the patent that this partitioning requires extensive data connectivity between all elements of the GPU, which poses a problem for chiplet designs: transfers of data that is not located on the same die have high latency, which slows down processing.

CPUs made the transition to chiplets with comparatively little difficulty, because their work is already split into tasks that can be sent to multiple cores, and those cores can just as well sit on separate chiplets. GPUs do not offer the same flexibility; by comparison, their scheduling is closer to that of an entry-level dual-core processor.


AMD aims to address these issues by changing the rasterization pipeline and dispatching tasks across multiple GPU chiplets in a way similar to CPUs. To do this, the company relies on an advanced binning technique called 'two-level binning', also known as 'hybrid binning'.


In two-level binning, the screen is segmented in two separate stages instead of being split directly into fine pixel blocks. The first stage calculates the geometry, taking the 3D environment and projecting it into a two-dimensional image. This step, called vertex shading, is completed before rasterization and is performed in a reduced form on the first GPU chiplet. Once it is complete, the game scene is divided into coarse bins, each assigned to a single GPU chiplet. The usual tasks, such as rasterization and post-processing, can then begin on each chiplet.
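The two stages can be sketched roughly as follows. This is an illustrative model, not the patent's actual method: it assumes triangles have already been reduced to screen-space bounding boxes (standing in for the simplified vertex shading on the first chiplet), uses square bins whose coarse size is a multiple of the fine size, and assigns coarse bins to chiplets with a hypothetical interleaved `(col + row) % num_chiplets` rule. All function names are made up for the example.

```python
from collections import defaultdict


def bins_touched(bbox, bin_size):
    """(col, row) indices of square bins overlapped by an inclusive pixel bbox."""
    x0, y0, x1, y1 = bbox
    return [(c, r)
            for r in range(y0 // bin_size, y1 // bin_size + 1)
            for c in range(x0 // bin_size, x1 // bin_size + 1)]


def coarse_pass(triangles, coarse_size, num_chiplets):
    """Stage 1: sort triangles into coarse bins and give each bin to a chiplet.

    The interleaved (col + row) % num_chiplets assignment is hypothetical.
    """
    per_chiplet = defaultdict(lambda: defaultdict(list))
    for tri_id, bbox in triangles.items():
        for c, r in bins_touched(bbox, coarse_size):
            per_chiplet[(c + r) % num_chiplets][(c, r)].append(tri_id)
    return per_chiplet


def fine_pass(coarse_bins, triangles, coarse_size, fine_size):
    """Stage 2 (per chiplet): split each coarse bin's triangles into fine bins.

    Each bbox is clipped to its coarse bin, so a chiplet only produces
    fine bins inside the coarse bins it owns.
    """
    fine = defaultdict(list)
    for (c, r), tri_ids in coarse_bins.items():
        cx0, cy0 = c * coarse_size, r * coarse_size
        cx1, cy1 = cx0 + coarse_size - 1, cy0 + coarse_size - 1
        for tri_id in tri_ids:
            x0, y0, x1, y1 = triangles[tri_id]
            clipped = (max(x0, cx0), max(y0, cy0), min(x1, cx1), min(y1, cy1))
            for fb in bins_touched(clipped, fine_size):
                fine[fb].append(tri_id)
    return fine


if __name__ == "__main__":
    # Hypothetical triangles given as (x0, y0, x1, y1) screen-space boxes.
    triangles = {"A": (0, 0, 10, 10), "B": (60, 0, 70, 10)}
    chiplets = coarse_pass(triangles, 64, 2)
    for chiplet_id in sorted(chiplets):
        print(chiplet_id, dict(fine_pass(chiplets[chiplet_id], triangles, 64, 16)))
    # 0 {(0, 0): ['A'], (3, 0): ['B']}
    # 1 {(4, 0): ['B']}
```

Triangle "B" straddles two coarse bins, so both chiplets receive it, but each one only rasterizes the part that falls inside its own coarse bin.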


It is not known when AMD intends to start using this new process or whether the patent will be granted. Still, it offers a glimpse of how GPU processing could become more efficient in the future.

News sources: ComputerBase, Free Patents Online
