OpenAI’s DALL-E gets an open-source rival that runs on your graphics card
Image: Stable Diffusion
OpenAI’s DALL-E 2 gets free competition. Behind it are an open-source AI movement and the startup Stability AI.
Artificial intelligence capable of generating images from textual descriptions has made rapid progress since early 2021. At that time, OpenAI showed impressive results with DALL-E 1 and CLIP. The open-source community used CLIP for many alternative projects throughout the year. Then in 2022, OpenAI released the impressive DALL-E 2, Google showed off Imagen and Parti, Midjourney reached millions of users, and Craiyon flooded social media with AI-generated images.
Startup Stability AI has now announced the release of Stable Diffusion, another DALL-E 2-like system that will initially be made available in stages to researchers and other groups through a Discord server.
After a test phase, Stable Diffusion will be released for free: both the code and a trained model will be published as open source. There will also be a hosted version with a web interface that lets users try the system.
Stability AI funds a free DALL-E 2 competitor
Stable Diffusion is the result of a collaboration between researchers from Stability AI, RunwayML, LMU Munich, EleutherAI and LAION. The EleutherAI research collective is known for its open-source language models GPT-J-6B and GPT-NeoX-20B, among others, and also conducts research on multimodal models.
The non-profit organization LAION (Large-scale Artificial Intelligence Open Network) provided the training data with the open-source dataset LAION-5B, which the team filtered using human ratings in an initial test phase to create the final LAION-Aesthetics training dataset.
Patrick Esser of Runway and Robin Rombach of LMU Munich led the project, building on their earlier work in the CompVis group at the University of Heidelberg. There they created the widely used VQGAN and Latent Diffusion. The latter, combined with research from OpenAI and Google Brain, served as the basis for Stable Diffusion.
— Stable Diffusion Pics (@DiffusionPics) August 14, 2022
Stability AI, founded in 2020, is backed by mathematician and computer scientist Emad Mostaque. He worked as an analyst for various hedge funds for several years before turning to public-interest work. In 2019, he co-founded Symmitree, a project that aims to lower the cost of smartphones and Internet access for disadvantaged populations.
With Stability AI and his private fortune, Mostaque aims to foster the open-source AI research community. His startup, for example, supported the creation of the LAION-5B dataset. To train the Stable Diffusion model, Stability AI provided servers with 4,000 Nvidia A100 GPUs.
“No one has voting rights except our 75 employees – no billionaires, big money, governments or anyone controlling the company or the communities we support. We are completely independent,” Mostaque told TechCrunch. “We plan to use our compute to accelerate open source fundamental AI.”
Stable Diffusion is an open-source milestone
Currently, a beta test of Stable Diffusion is underway, with new users being admitted in waves. The results, visible on Twitter for example, show that a real competitor to DALL-E 2 is emerging here.
Unlike DALL-E 2, Stable Diffusion can generate images of celebrities and other subjects that OpenAI prohibits in DALL-E 2. Other systems such as Midjourney or Pixelz.ai can also do this, but they do not match Stable Diffusion’s quality across such a wide range of subjects, and none of the other systems is open source.
It appears that #stablediffusion can do some really impressive interpolations between text prompts if you fix the initialization noise and slerp between prompt conditioning vectors: pic.twitter.com/lWOOETYVZ3
— Xander Steenbrugge (@xsteenbrugge) August 7, 2022
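The trick described in the tweet above, keeping the starting noise fixed while spherically interpolating ("slerp") between the conditioning vectors of two prompts, relies on a standard geometric operation. A minimal sketch of slerp over NumPy arrays is shown below; the function and variable names are illustrative and not taken from any particular Stable Diffusion codebase.

```python
import numpy as np

def slerp(t, v0, v1):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t."""
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    # Angle between the two direction vectors, clipped for numerical safety.
    theta = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if np.isclose(theta, 0.0):
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return (1.0 - t) * v0 + t * v1
    return (np.sin((1.0 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# Illustrative use: sweep t from 0 to 1 between two prompt embeddings
# and generate one image per step with the same fixed noise seed.
emb_a = np.array([1.0, 0.0])  # stand-in for the first prompt's embedding
emb_b = np.array([0.0, 1.0])  # stand-in for the second prompt's embedding
frames = [slerp(t, emb_a, emb_b) for t in np.linspace(0.0, 1.0, 5)]
```

Unlike plain linear interpolation, slerp keeps the interpolated vectors at a comparable magnitude along the whole path, which is why it tends to produce smoother transitions between prompts.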
Stable Diffusion is expected to run on a single graphics card with as little as 5.1 gigabytes of VRAM, bringing to consumer hardware an AI capability that until now has only been available through cloud services. Stable Diffusion thus gives researchers and interested parties without access to GPU servers the opportunity to experiment with a modern generative AI model. The model is also said to work on MacBooks with Apple’s M1 chip, although image generation there takes several minutes instead of seconds.
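As a rough sanity check on that 5.1-gigabyte figure, the model weights alone fit comfortably in half precision. The parameter counts below are assumptions for illustration (commonly cited figures for Stable Diffusion v1, not stated in this article): roughly 860M for the UNet, 123M for the CLIP text encoder, and 84M for the VAE.

```python
def fp16_weight_gib(n_params: int) -> float:
    """GiB needed to store n_params weights at 2 bytes each (float16)."""
    return n_params * 2 / 2**30

# Assumed (not from the article) parameter counts for Stable Diffusion v1:
total_params = 860_000_000 + 123_000_000 + 84_000_000  # UNet + text encoder + VAE
print(round(fp16_weight_gib(total_params), 2))  # ~1.99 GiB of weights
```

The remainder of the quoted 5.1 GB budget would go to activations, intermediate latents, and framework overhead during sampling.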
Stability AI also wants to allow companies to train their own variants of Stable Diffusion. Multimodal models are thus following the path previously taken by large language models: away from a single supplier and towards the wide availability of numerous alternatives thanks to open source.
Runway is already researching text-to-video editing enabled by Stable Diffusion.
Working on a more permissive version and inpainting checkpoints.
— Patrick Esser (@pess_r) August 11, 2022
Stable Diffusion: Pandora’s Box and Net Benefits
Of course, with open access and the ability to run the model on a widely available GPU, the potential for abuse increases dramatically.
“A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, we believe this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society.”
Mostaque points out, however, that free availability allows the community to develop countermeasures.
“We are taking extensive security measures, including developing state-of-the-art tools to help mitigate potential harm in publishing and our own services. With hundreds of thousands of people building on this model, we are confident that the net benefit will be overwhelmingly positive, and as billions of people use this technology, the harm will be minimized.”
More information is available on the Stable Diffusion GitHub. You can find many examples of Stable Diffusion’s image-generation capabilities in the Stable Diffusion subreddit. Go here to register for the Stable Diffusion beta.