Anish Athalye

An AI That Can Mimic Any Artist

Take a look at the following two pictures. One was painted by contemporary artist Leonid Afremov, and the other was painted by an algorithm mimicking his style.

[Image: Leonid Afremov's Rain Princess]
[Image: MIT's Great Dome in the style of Leonid Afremov]

The first image is Afremov’s Rain Princess, and the second is an imitation. What’s fascinating is that a computer algorithm automatically “painted” the imitation, given only a photograph of the dome and the image of Rain Princess as input.

Algorithm

The algorithm used to produce the above image is described in full in a paper titled A Neural Algorithm of Artistic Style. From the abstract:

In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality.

The algorithm is based on one key insight: neural networks trained to perform object recognition end up learning to separate content from style. Intuitively, this makes a lot of sense, because an object’s identity depends on shapes and their arrangement, and it’s largely invariant to colors, textures, and other aspects of artistic style. The authors of the paper realized this, and they figured out a way to leverage this effect to synthesize images that match the content of one image and the style of another.
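To make this concrete, here’s a minimal sketch (not the paper’s code or mine) of extracting the two kinds of features from a pretrained VGG19 through Keras. The paper does use a VGG network, and the layer choices below follow it: one deep layer for content, several earlier layers for style.

```python
import tensorflow as tf

# Sketch: extract "content" and "style" features from a pretrained VGG19.
# Deeper layers respond to what is in the image (content); activations at
# several layers, via their correlations, capture texture and color (style).
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False

content_layer = 'block4_conv2'  # conv4_2 in the paper's notation
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']

# One model that returns all the activations we need in a single pass.
outputs = [vgg.get_layer(name).output for name in [content_layer] + style_layers]
extractor = tf.keras.Model(inputs=vgg.input, outputs=outputs)
```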

At a high level, the general technique is pretty straightforward. First, define a loss function that measures how well a generated image matches the content of one image and the style of another. Then, synthesize an image that minimizes that loss. This is done by starting with a random image (or the content image) and then optimizing using backpropagation, just like what’s done for training neural nets, except that here it’s the pixels of the image that get updated rather than the weights of a network. Here’s what the optimization process looks like when starting with this image and rendering it in the style of Van Gogh’s Starry Night.
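Concretely, the paper’s loss has two terms: a content term that compares raw activations at the deep layer, and a style term that compares Gram matrices (channel-by-channel correlations) of activations at the style layers. Here’s a rough sketch of that function; the weights and details differ from what my implementation actually uses:

```python
import tensorflow as tf

def gram_matrix(features):
    # features: a (1, height, width, channels) activation tensor.
    _, h, w, c = features.shape
    flat = tf.reshape(features, (h * w, c))
    # Channel-by-channel correlations: this is what encodes "style".
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def loss(gen_content, gen_styles, target_content, target_grams,
         content_weight=1.0, style_weight=1e-2):
    # Content term: match the deep layer's activations directly.
    content_loss = tf.reduce_mean(tf.square(gen_content - target_content))
    # Style term: match Gram matrices at each style layer.
    style_loss = tf.add_n([tf.reduce_mean(tf.square(gram_matrix(g) - t))
                           for g, t in zip(gen_styles, target_grams)])
    return content_weight * content_loss + style_weight * style_loss
```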

Implementation

Following the description in the research paper, I wrote an open-source implementation of the algorithm on top of TensorFlow, Google’s new deep learning library.

Others have produced open-source implementations of the neural style algorithm, but this is the first one in TensorFlow. Because TensorFlow supports automatic differentiation and has a clean API, the description from the paper translates to code that’s pretty straightforward to follow.
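For example, the core “synthesize an image” step comes down to making the image itself a variable and letting TensorFlow differentiate the loss with respect to its pixels. Here’s a sketch of that loop, written against TensorFlow’s modern eager API (my original code predates it) and using a stand-in loss:

```python
import tensorflow as tf

# The image being synthesized is what we optimize, not any network weights.
image = tf.Variable(tf.random.uniform((1, 224, 224, 3)))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

for step in range(1000):
    with tf.GradientTape() as tape:
        # Stand-in loss; a real run would compute the content/style loss
        # on features extracted from `image`.
        loss_value = tf.reduce_mean(tf.square(image - 0.5))
    grad = tape.gradient(loss_value, image)
    optimizer.apply_gradients([(grad, image)])
    image.assign(tf.clip_by_value(image, 0.0, 1.0))  # keep pixels in range
```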

Extensions

The implementation works pretty well on still images. I thought it would be cool to try to make it work on video. Simply running the algorithm independently on every frame doesn’t produce the best results: it tends to produce jarring, flickering output, where each frame is stylized very differently from the one before it. Luckily, there’s an elegant way to fix this. Instead of starting from scratch when generating each frame, initializing the optimization with the output of the previously rendered frame preserves visual continuity, and it also speeds up render times, because each frame starts out close to a good solution.
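In code, the change amounts to one line in the per-frame loop. A sketch, where stylize() is a hypothetical wrapper around the still-image algorithm that accepts an initial image for the optimizer:

```python
# stylize(content, style, initial) is a hypothetical wrapper around the
# still-image algorithm that starts the optimization from `initial`.
def render_video(frames, style_image, stylize):
    rendered = []
    previous = None
    for frame in frames:
        # Warm-start each frame from the previous output (the first frame
        # falls back to itself); this preserves visual continuity and
        # converges faster than starting from scratch.
        initial = frame if previous is None else previous
        previous = stylize(content=frame, style=style_image, initial=initial)
        rendered.append(previous)
    return rendered
```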

The results are quite pretty (original video on the top, rendered in the style of Starry Night on the bottom):