In 1947, André Malraux published Le Musée Imaginaire, arguing that the photographic reproduction of artworks had created a "museum without walls" — a body of imagery freed from any single physical institution, available to be seen, compared, and contemplated by anyone with access to a book. Each subsequent generation of image technology has extended Malraux's premise. The slide projector brought the museum into the seminar room. The internet brought it into the home. High-resolution digitization brought it into the pocket. The question worth asking now is what unified multimodal artificial intelligence brings into the museum without walls — and what the museum becomes when its walls can also be generated.
On May 19, 2026, Google is expected to announce Gemini Omni, a video generation system capable of producing synchronized image, sound, voice, and on-screen text from a single written prompt. The model's significance for the art world lies less in any single capability than in the way several capabilities arrive together: visual coherence over time, accurate text rendering across scripts, controllable scene composition, and audio that is generated alongside rather than added to the image. For digital artists, museum educators, and exhibition designers, the implications deserve serious attention.
Studio Practice in a Multimodal Era
Digital artists have spent the past several years adapting their studios to generative tools. Workflows that begin with text prompts and end with stylized still imagery have entered the practice of artists working in genres from abstraction to documentary photography. The arrival of unified video generation introduces a different question: not how to produce an image, but how to produce a duration.
Several material consequences follow.
The first is that temporal compositional decisions — the rhythm of a cut, the pacing of a camera move, the relationship between sound and image — enter the practice of artists who have not previously worked with time-based media. A photographer accustomed to selecting a single moment now considers what came before and what follows. A painter whose practice has been still becomes responsible for the seconds on either side of stillness. This is less a replacement of existing skills than an extension of them, but the extension is meaningful.
The second is that the relationship between sketch and finished work changes character. Where previous tools produced output requiring further refinement, the unified generation of image, sound, and text means that the first output is closer to a complete artifact. Whether this proves liberating or constraining for individual practitioners will depend on their relationship to iteration. Artists who work through accumulated drafts may find the tool less aligned with their process than those who work through decisive single gestures.
The third is that the question of authorship becomes more explicitly philosophical. AI-generated outputs are mediated by training on the work of others, and the boundary between citation and substitution is increasingly difficult to draw. Artists, critics, and institutions will continue to negotiate this terrain for years.
Museum Education and the Generated Image
Museum education departments have been among the more thoughtful adopters of digital media over the past two decades. Audio guides became podcasts. Wall labels became interactive screens. Educational programming moved into livestreams and online courses. Each adoption raised the same question: how does new technology serve the work itself rather than overshadow it?
Unified AI video generation raises that question with new force. The capability to produce instructional or contextual video material at speeds previously impossible — visualizing a historical artist's studio, animating a static painting's compositional choices, reconstructing the physical context in which a piece was originally encountered — offers museum educators a tool of substantial reach.
Several uses suggest themselves as worth careful experimentation. Curatorial introductions to exhibitions can be produced in multiple languages without separate filming. Historical context videos can be generated for specific objects in collections too large to support filmed educational material for each piece individually. Accessibility content — audio description, sign language interpretation through generated avatars, simplified visual explanations — becomes economically feasible for institutions with limited educational production budgets.
These applications come with familiar caveats. Generated content remains subject to the accuracy limitations of the underlying model. Museum educators cannot delegate scholarly responsibility to AI systems that do not understand the material they are visualizing. The discipline of subject matter review remains essential, and the production time that AI saves may be substantially consumed by the verification needed to meet institutional content standards.
Exhibition Design and the Question of Scale
For exhibition designers — those tasked with making physical and digital encounters with art meaningful — Gemini Omni and similar tools arrive at an interesting moment. The architectural ambition of contemporary exhibition design has expanded considerably over the past decade. Immersive environments, projection-mapped installations, and multimedia introductions have become standard elements of major exhibitions in both museums and commercial galleries.
The economics of producing this content have, however, remained largely unchanged. A projection-mapped installation requires teams of designers, programmers, and post-production professionals working over months. Materials tracked through the public Gemini Omni reference index suggest that the technical capabilities arriving in this generation of AI video will not replace these teams in the near term, but will substantially expand what smaller institutions and independent designers can attempt.
This is, in the long view, consistent with how other production technologies have entered the design profession. The arrival of accessible digital projection in the 2000s did not eliminate exhibition designers but expanded the range of institutions that could attempt projection-based work. The arrival of unified multimodal generation may follow a similar trajectory, particularly for institutions producing thematic exhibitions on tighter timelines than the major retrospectives that have historically defined the field.
Continuity and Change
Each genuinely new technology in the visual arts has been received initially as a threat to existing practice, then as an extension of it, and finally as an unremarkable element of the studio, the museum, and the exhibition. Photography, video art, digital media, and immersive installation each followed this trajectory. There is no obvious reason unified AI video generation will be exceptional in this regard.
What may be exceptional is the speed. The trajectory from threat to extension to absorption that took decades for photography is taking years for AI image generation, and may take months for AI video. Critics, curators, and artists will be writing the first sustained reflections on the work this technology enables before the technology itself has stabilized.
The most useful position, for those whose lives are organized around the visual arts, is probably to remain curious. The tools will arrive. The studios, museums, and exhibitions that find ways to work thoughtfully with them will produce work worth seeing. Those that do not will find other tools, as they always have.
The museum without walls continues. What it contains is being negotiated in real time.
________________________________________
Further documentation, post-launch capability tracking, and ongoing reference material related to Google's anticipated Gemini Omni release are aggregated at gemini-omni.ai, an independent index compiled from publicly available leaks, developer reports, and official channels as new information surfaces.