These jagged, colorful blocks resemble the concept of image compression.

Benj Edwards / Ars Technica

Last week, Swiss software engineer Matthias Bühlmann discovered that the popular image synthesis model Stable Diffusion can, with significant caveats, compress existing bitmap images with fewer visual artifacts than JPEG or WebP at high compression ratios.

Stable Diffusion is an AI image synthesis model that typically creates images from text descriptions (called “prompts”). The model learned this ability by studying millions of images pulled from the internet. During training, the model builds statistical associations between images and related words, distills key information about each image into a much smaller representation, and stores it as “weights”: the mathematical values that encode what the AI image model knows.

When Stable Diffusion analyzes images and “compresses” them into weight form, they reside in what researchers call “latent space,” which is a way of saying that they exist as a kind of fuzzy potential that can be realized as images once decoded. With Stable Diffusion 1.4, the weights file is roughly 4 GB, but it represents knowledge of hundreds of millions of images.

Examples of using Stable Diffusion to compress images.

While most people use Stable Diffusion with text prompts, Bühlmann cut out the text encoder and instead forced his images through Stable Diffusion’s image encoder, which takes a full-resolution 512×512 image and converts it into a lower-resolution 64×64 latent space representation. At that point, the image occupies far less data than the original, but it can still be expanded (decoded) back into a 512×512 image with fairly good results.
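The data savings from that encoder bottleneck can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the 64×64×4 latent shape and 8-bit quantization follow Bühlmann's write-up, while real file sizes also depend on further quantization and entropy coding.

```python
# Rough size comparison for Stable Diffusion's encoder bottleneck.
# Assumptions: 512x512 RGB input at 8 bits per channel, and a 64x64x4
# latent tensor quantized to 8 bits per value.

def raw_rgb_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed bitmap size at 8 bits per channel."""
    return width * height * channels

def latent_bytes(width: int, height: int, channels: int = 4,
                 bits_per_value: int = 8) -> int:
    """Size of the quantized latent tensor."""
    return width * height * channels * bits_per_value // 8

original = raw_rgb_bytes(512, 512)  # 786,432 bytes uncompressed
latent = latent_bytes(64, 64)       # 16,384 bytes in latent form

print(original, latent, original / latent)  # 786432 16384 48.0
```

That 48× reduction comes before any of the extra quantization tricks Bühlmann applied to reach kilobyte-scale files.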

In his tests, Bühlmann found that images compressed with Stable Diffusion subjectively looked better at high compression ratios (small file sizes) than JPEG or WebP. In one example, he shows an image of a candy store compressed to 5.68 KB with JPEG, 5.71 KB with WebP, and 4.98 KB with Stable Diffusion. The Stable Diffusion image has more resolved detail and fewer obvious compression artifacts than those compressed in the other formats.
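The conventional baselines in that comparison are easy to reproduce. The snippet below (a sketch using Pillow; the synthetic gradient test image and quality settings are illustrative assumptions, not Bühlmann's exact setup) encodes the same image with JPEG and WebP at an aggressive quality setting and reports the resulting byte counts.

```python
from io import BytesIO
from PIL import Image

def encoded_size(img: Image.Image, fmt: str, quality: int) -> int:
    """Encode the image in memory and return the compressed byte count."""
    buf = BytesIO()
    img.save(buf, format=fmt, quality=quality)
    return buf.getbuffer().nbytes

# A synthetic 512x512 test image (a stand-in for Buhlmann's photos).
img = Image.radial_gradient("L").resize((512, 512)).convert("RGB")

# Push both codecs toward their high-compression, low-quality regime.
jpeg_small = encoded_size(img, "JPEG", quality=10)
webp_small = encoded_size(img, "WEBP", quality=10)
jpeg_large = encoded_size(img, "JPEG", quality=90)

print(jpeg_small, webp_small, jpeg_large)
```

Matching the file sizes this way, rather than the quality settings, is what makes the subjective comparison meaningful: each codec gets roughly the same byte budget.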

Experimental examples of using Stable Diffusion to compress images. SD results are on the far right.

Bühlmann’s method currently comes with significant limitations, however: it doesn’t handle faces or text well, and in some cases it can hallucinate detailed features in the decoded image that weren’t in the source image. (You probably don’t want your image compressor to invent details that don’t exist.) Also, decoding requires the 4 GB Stable Diffusion weights file and extra decoding time.

While this use of Stable Diffusion is more of an unconventional, fun hack than a practical solution, it could point toward novel future uses of image synthesis models. Bühlmann’s code can be found on Google Colab, and more technical details about his experiment appear in his article on Towards AI.
