Wix: Integrating LLMs and Diffusion Models for Website Design Automation

Overview

Wix, a leading website building platform serving over 200 million users worldwide, has been actively exploring how AI-powered creativity tools can enhance the website building experience. This case study focuses on their development of the Diffusion Layout Transformer (DLT), a novel framework for automated layout generation that was presented at the International Conference on Computer Vision (ICCV) 2023. While the primary innovation here is in generative AI for graphic design rather than pure LLM operations, the work demonstrates how Wix integrates multiple AI technologies—including Large Language Models for text generation and text-to-image models like DALL-E for visual content—into a cohesive production system for website creation.

The Problem

The traditional website design process involves multiple complex steps: selecting an appropriate layout that balances text and visual content, creating compelling titles and textual content, and incorporating relevant visual assets. While Wix had already developed AI solutions for some of these challenges—such as their AI Text Creator suite that leverages LLMs for generating website copy and integration with text-to-image models like DALL-E for visual content—the layout design step remained a significant bottleneck.

Designing layouts for professional-looking websites is inherently challenging because it requires genuine design expertise and consumes considerable time. Users faced a limiting choice between selecting from pre-designed templates (which lack personalization) or hiring professional designers (which adds cost and complexity). This gap represented an opportunity for Wix’s AI research group to create a more flexible solution that could generate unique, highly personalized layouts rather than merely suggesting template options.

The Solution: Diffusion Layout Transformer (DLT)

Wix’s AI research group developed the Diffusion Layout Transformer (DLT), a general framework for conditioned layout generation. The key innovation lies in its flexible conditioning mechanism that provides users with intuitive yet detailed control over the design process while ensuring high-quality outputs.

Technical Architecture

At the core of DLT is a Transformer encoder architecture. This design choice offers two critical advantages: it makes the model non-autoregressive (meaning it can generate all components simultaneously rather than sequentially), and it allows for flexible conditioning during inference. This flexibility distinguishes DLT from previous layout generation methods.

The model operates on layout representations where each layout consists of a set of components. Each component includes several attributes: category (such as image, title, or button), position, and size. Importantly, real-world applications often incorporate additional attributes like color, text style, and content to achieve modern and creative designs, though the core research focused on the fundamental geometric and categorical attributes.
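The layout representation described above can be sketched as a small data structure. This is a minimal illustration, not Wix's implementation: the `Component` and `Layout` names, the three-entry category vocabulary, and the normalized center-based coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical vocabulary; the text names image, title, and button as examples.
CATEGORIES = ("image", "title", "button")


@dataclass
class Component:
    category: str  # discrete attribute
    x: float       # center x, normalized to [0, 1] (assumed convention)
    y: float       # center y, normalized to [0, 1]
    w: float       # width, normalized to [0, 1]
    h: float       # height, normalized to [0, 1]

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        for name in ("x", "y", "w", "h"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


@dataclass
class Layout:
    components: List[Component]

    def to_arrays(self) -> Tuple[List[int], List[List[float]]]:
        """Split the layout into discrete class indices and continuous boxes."""
        classes = [CATEGORIES.index(c.category) for c in self.components]
        boxes = [[c.x, c.y, c.w, c.h] for c in self.components]
        return classes, boxes


layout = Layout([
    Component("title", 0.5, 0.1, 0.8, 0.1),
    Component("image", 0.5, 0.5, 0.8, 0.5),
    Component("button", 0.5, 0.9, 0.3, 0.08),
])
classes, boxes = layout.to_arrays()
```

Separating the class indices from the box coordinates mirrors the split between categorical and geometric attributes that the rest of the design relies on.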

Joint Discrete-Continuous Diffusion Process

A particularly innovative aspect of DLT is its joint discrete-continuous diffusion process. Unlike previous approaches that handled either continuous or discrete layout attributes separately, DLT provides a unified generative diffusion process that operates jointly on both types:

Continuous attributes: Size and position coordinates are handled through a continuous diffusion process, similar to how image diffusion models such as DALL-E 2 iteratively denoise pixel values
Discrete attributes: Component categories (e.g., image, text, button) are handled through a discrete diffusion process

The model receives embeddings of the components along with the current diffusion timestep as inputs. After multiple denoising iterations, it outputs clean coordinates and classes for all components. Training uses a combined loss function that sums a term for the discrete attributes and a term for the continuous attributes, enabling end-to-end learning of the complete layout generation process.
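To make the joint process more concrete, the sketch below corrupts a layout at a given timestep and computes a combined loss. It is an illustration under stated assumptions rather than the method from the paper: the cosine-style schedule in `alpha_bar`, the use of a `MASK` token for corrupted classes, and the weight `lam` are all hypothetical choices.

```python
import math
import random

MASK = -1  # hypothetical token marking a corrupted (unknown) class


def alpha_bar(t, T):
    """Signal level that decays from 1 at t=0 to ~0 at t=T (assumed schedule)."""
    return math.cos(0.5 * math.pi * t / T) ** 2


def corrupt(classes, boxes, t, T, rng):
    """Jointly noise a layout: Gaussian noise on boxes, masking on classes."""
    a = alpha_bar(t, T)
    noisy_boxes = [
        [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in box]
        for box in boxes
    ]
    # Each class survives with probability a, otherwise it becomes MASK.
    noisy_classes = [c if rng.random() < a else MASK for c in classes]
    return noisy_classes, noisy_boxes


def combined_loss(pred_boxes, true_boxes, class_probs, true_classes, lam=1.0):
    """Mean squared error on boxes plus lam times cross-entropy on classes."""
    n = len(true_boxes)
    mse = sum(
        (p - q) ** 2
        for pb, tb in zip(pred_boxes, true_boxes)
        for p, q in zip(pb, tb)
    ) / (4 * n)
    ce = -sum(math.log(probs[c]) for probs, c in zip(class_probs, true_classes)) / n
    return mse + lam * ce


rng = random.Random(0)
clean_classes = [0, 2]
clean_boxes = [[0.5, 0.1, 0.8, 0.1], [0.5, 0.9, 0.3, 0.08]]
noisy_classes, noisy_boxes = corrupt(clean_classes, clean_boxes, t=50, T=100, rng=rng)
```

At `t=0` the layout is returned unchanged, and near `t=T` the boxes are almost pure noise and the classes are almost all masked, which is the starting point a sampler would denoise from.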

Training Methodology

During training, the model randomly masks certain components within layouts or specific attributes, with the objective of reconstructing the complete layout. This masking approach strengthens the model’s robustness to diverse conditioning scenarios that it might encounter in production use. The training data comes from annotated layout datasets representing various graphic design contexts.
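The random masking described above might look roughly like the following. The probabilities `p_component` and `p_attribute`, the attribute names, and the use of `None` as a placeholder for hidden values are assumptions for illustration; the source does not give these details.

```python
import random

ATTRIBUTES = ("category", "position", "size")


def sample_training_mask(num_components, rng, p_component=0.3, p_attribute=0.3):
    """Return one dict per component; True means the attribute is hidden."""
    mask = []
    for _ in range(num_components):
        if rng.random() < p_component:
            # Hide the whole component.
            mask.append({a: True for a in ATTRIBUTES})
        else:
            # Hide individual attributes independently.
            mask.append({a: rng.random() < p_attribute for a in ATTRIBUTES})
    return mask


def hide_attributes(layout, mask):
    """Replace hidden attributes with None; the model must reconstruct them."""
    return [
        {a: (None if hidden[a] else comp[a]) for a in ATTRIBUTES}
        for comp, hidden in zip(layout, mask)
    ]


layout = [
    {"category": "title", "position": (0.5, 0.1), "size": (0.8, 0.1)},
    {"category": "image", "position": (0.5, 0.5), "size": (0.8, 0.5)},
]
rng = random.Random(0)
mask = sample_training_mask(len(layout), rng)
model_input = hide_attributes(layout, mask)
```

Because the mask is resampled for every training example, the model sees many different combinations of known and unknown attributes, which is what lets a single model serve several conditioning scenarios at inference time.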

Flexible Conditioning for User Interaction

The conditioning mechanism is essential for real-world production applications involving user interaction. DLT allows practitioners and end-users to fix specific component attributes and have the system generate the remaining attributes. The framework supports multiple conditioning scenarios:

Unconditioned generation: The user specifies only the total number of components, and the model generates the complete layout from scratch. This is particularly useful for inspiration and initial design exploration.
Type-conditioned generation: Users specify the types of components they want (e.g., “I need two images, one title, and a button”) and the model generates appropriate positions and sizes.
Size-conditioned generation: Users specify component sizes and let the model determine optimal positioning.
Partial layout completion: Users place a few components manually and let the generative model complete the remaining layout.

This flexibility means the system can accommodate various user preferences and skill levels, from those who want fully automated design to those who prefer more control over specific elements.
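The conditioning scenarios listed above can be expressed as per-component flags that say which attributes the user fixes and which the model generates. This is a hedged sketch: the scenario keys, the `condition_flags` and `apply_condition` helpers, and the choice to fix only `size` in the size-conditioned case are assumptions, since the text does not specify how conditions are encoded.

```python
ATTRIBUTES = ("category", "position", "size")

# Attributes fixed by the user in each scenario (illustrative names).
SCENARIOS = {
    "unconditioned": (),
    "type_conditioned": ("category",),
    # The text mentions only sizes; whether categories are also fixed is not stated.
    "size_conditioned": ("size",),
}


def condition_flags(scenario, num_components, num_placed=0):
    """Return one dict per component: True = fixed by the user, False = generated."""
    if scenario == "partial_completion":
        # The first num_placed components are fully specified by the user.
        return [{a: i < num_placed for a in ATTRIBUTES} for i in range(num_components)]
    fixed = SCENARIOS[scenario]
    return [{a: a in fixed for a in ATTRIBUTES} for _ in range(num_components)]


def apply_condition(generated, user_values, flags):
    """Keep user-fixed attributes and take everything else from the model output."""
    return [
        {a: (u[a] if f[a] else g[a]) for a in ATTRIBUTES}
        for g, u, f in zip(generated, user_values, flags)
    ]


flags = condition_flags("type_conditioned", 2)
user_values = [
    {"category": "image", "position": None, "size": None},
    {"category": "title", "position": None, "size": None},
]
generated = [  # stand-in for a model output
    {"category": "button", "position": (0.5, 0.5), "size": (0.8, 0.5)},
    {"category": "button", "position": (0.5, 0.1), "size": (0.8, 0.1)},
]
result = apply_condition(generated, user_values, flags)
```

In a diffusion sampler, a step like `apply_condition` would typically be repeated after each denoising iteration so that the fixed attributes never drift from the user's input.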

Evaluation and Results

The research team evaluated DLT using popular layout datasets covering diverse graphic design tasks:

PubLayNet: Annotated document images representing scientific and professional documents
RICO: Android UI screens representing mobile application interfaces
Magazine: Digital image layouts representing editorial and magazine-style designs

Evaluation used common design aesthetic metrics that assess factors like alignment, spacing, overlap avoidance, and overall visual harmony. The results demonstrated that DLT outperformed existing solutions on both layout synthesis (generating layouts from scratch) and layout editing tasks (modifying or completing partial layouts). Importantly, the computational complexity remained on par with previous approaches, making it suitable for production deployment.
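To give a sense of what such aesthetic metrics measure, here are simplified overlap and alignment scores. These are illustrative variants, not the exact formulas used in the evaluation; the `(left, top, width, height)` box convention and both function names are assumptions.

```python
from itertools import combinations


def intersection_area(a, b):
    """Boxes are (left, top, width, height) in normalized coordinates."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    return max(iw, 0.0) * max(ih, 0.0)


def mean_pairwise_iou(boxes):
    """Average intersection-over-union across all pairs; lower means less overlap."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        inter = intersection_area(a, b)
        union = a[2] * a[3] + b[2] * b[3] - inter
        total += inter / union if union > 0 else 0.0
    return total / len(pairs)


def left_edge_misalignment(boxes):
    """Average distance from each left edge to the nearest other left edge."""
    if len(boxes) < 2:
        return 0.0
    return sum(
        min(abs(a[0] - b[0]) for j, b in enumerate(boxes) if j != i)
        for i, a in enumerate(boxes)
    ) / len(boxes)


stacked = [(0.1, 0.1, 0.4, 0.2), (0.1, 0.4, 0.4, 0.2)]
overlapping = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.0, 0.5, 0.5)]
```

Two left-aligned, vertically stacked boxes score zero on both metrics, while the two side-by-side boxes in `overlapping` share a third of their combined area.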

Broader Applications

While Wix’s primary focus is enhancing the website building experience, the flexible and general design of DLT makes it suitable for a wide variety of graphic design applications beyond websites. These include creating layouts for mobile app user interfaces, generating designs for information slides, magazines, scientific papers, infographics, and even indoor scene layouts. This generality suggests potential for the technology to be applied across multiple product lines or licensed for other use cases.

Integration with the Broader AI Ecosystem at Wix

It’s important to understand DLT within the context of Wix’s broader AI strategy. The company employs multiple AI technologies working together:

LLM-based text generation: The Wix AI Text Creator suite uses Large Language Models to help users create compelling titles and engaging content for their websites
Text-to-image generation: Integration with models like DALL-E allows users to generate fresh and relevant visual content for their sites
Layout generation: DLT automates the structural design aspect of website creation

The vision articulated by Wix is that AI for website creation should extend beyond design alone. It should be combined with a comprehensive system of business functionalities including SEO, analytics, payments, and more. This suggests that the production deployment of these AI systems is part of a larger platform strategy where multiple AI capabilities work together to reduce complexity and create value for users.

Considerations and Limitations

While the case study presents promising results, there are some considerations worth noting. The evaluation was conducted on academic benchmark datasets, and production deployment at scale with real user interactions may present additional challenges around latency, reliability, and user satisfaction that aren’t fully addressed in the research paper. Additionally, the aesthetic metrics used for evaluation, while standard in the research community, may not perfectly correlate with real-world user preferences and design effectiveness.

The source material is primarily a research publication rather than a production deployment report, so it provides no details about production infrastructure, A/B testing with real users, or operational metrics. However, given Wix’s scale (200+ million users) and its explicit focus on production AI systems, it is reasonable to assume that successful research outcomes like DLT are intended for eventual production integration.

Future Outlook

Wix explicitly states their belief that AI technologies will bring many opportunities to significantly improve how people and organizations create and maintain their online presence over the coming years. They position the current AI revolution as just beginning to unleash AI’s true potential, with the goal of reducing complexity and creating value for users while driving innovation in the website building space. This suggests ongoing investment in AI capabilities and continued development of production AI systems.
