A Step-by-Step Guide to Synthesizing Adversarial Examples

Synthesizing adversarial examples for neural networks is surprisingly easy: small, carefully-crafted perturbations to inputs can cause neural networks to misclassify inputs in arbitrarily chosen ways. Given that adversarial examples transfer to the physical world and can be made extremely robust, this is a real security concern.

In this post, we give a brief introduction to algorithms for synthesizing adversarial examples, and we walk through the process of implementing attacks in TensorFlow, building up to synthesizing a robust adversarial example following this technique.

This post is an executable Jupyter notebook: you’re encouraged to download it and experiment with the examples yourself!


We choose to attack an Inception v3 network trained on ImageNet. In this section, we load a pre-trained network from the TF-slim image classification library. This part isn’t particularly interesting, so feel free to skip this section.

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
sess = tf.InteractiveSession()

First, we set up the input image. We use a tf.Variable instead of a tf.placeholder because we will need it to be trainable. We can still feed it when we want to.

image = tf.Variable(tf.zeros((299, 299, 3)))

Next, we load the Inception v3 model.

def inception(image, reuse):
    preprocessed = tf.multiply(tf.subtract(tf.expand_dims(image, 0), 0.5), 2.0)
    arg_scope = nets.inception.inception_v3_arg_scope(weight_decay=0.0)
    with slim.arg_scope(arg_scope):
        logits, _ = nets.inception.inception_v3(
            preprocessed, 1001, is_training=False, reuse=reuse)
        logits = logits[:,1:] # ignore background class
        probs = tf.nn.softmax(logits) # probabilities
    return logits, probs

logits, probs = inception(image, reuse=False)

Next, we load pre-trained weights. This Inception v3 has a top-5 accuracy of 93.9%.

import tempfile
from urllib.request import urlretrieve
import tarfile
import os
data_dir = tempfile.mkdtemp()
inception_tarball, _ = urlretrieve(
tarfile.open(inception_tarball, 'r:gz').extractall(data_dir)
restore_vars = [
    var for var in tf.global_variables()
    if var.name.startswith('InceptionV3/')
saver = tf.train.Saver(restore_vars)
saver.restore(sess, os.path.join(data_dir, 'inception_v3.ckpt'))

Next, we write some code to show an image, classify it, and show the classification result.

import json
import matplotlib.pyplot as plt
imagenet_json, _ = urlretrieve(
with open(imagenet_json) as f:
    imagenet_labels = json.load(f)
def classify(img, correct_class=None, target_class=None):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
    p = sess.run(probs, feed_dict={image: img})[0]
    topk = list(p.argsort()[-10:][::-1])
    topprobs = p[topk]
    barlist = ax2.bar(range(10), topprobs)
    if target_class in topk:
    if correct_class in topk:
    plt.ylim([0, 1.1])
               [imagenet_labels[i][:15] for i in topk],

Example image

We load our example image and make sure it’s classified correctly.

import PIL
import numpy as np
img_path, _ = urlretrieve('http://www.anishathalye.com/media/2017/07/25/cat.jpg')
img_class = 281
img = PIL.Image.open(img_path)
big_dim = max(img.width, img.height)
wide = img.width > img.height
new_w = 299 if not wide else int(img.width * 299 / img.height)
new_h = 299 if wide else int(img.height * 299 / img.width)
img = img.resize((new_w, new_h)).crop((0, 0, 299, 299))
img = (np.asarray(img) / 255.0).astype(np.float32)
classify(img, correct_class=img_class)


Adversarial examples

Given an image , our neural network outputs a probability distribution over labels, . When we craft an adversarial input, we want to find an where is maximized for a target label : that way, our input will be misclassified as the target class. We can ensure that doesn’t look too different from the original by constraining ourselves to some box with radius , requiring that .

In this framework, an adversarial example is the solution to a constrained optimization problem that we can solve using backpropagation and projected gradient descent, basically the same techniques that are used to train networks themselves. The algorithm is simple:

We begin by initializing our adversarial example as . Then, we repeat the following until convergence:


We start with the easiest part: writing a TensorFlow op for initialization.

x = tf.placeholder(tf.float32, (299, 299, 3))

x_hat = image # our trainable adversarial input
assign_op = tf.assign(x_hat, x)

Gradient descent step

Next, we write the gradient descent step to maximize the log probability of the target class (or equivalently, minimize the cross entropy).

learning_rate = tf.placeholder(tf.float32, ())
y_hat = tf.placeholder(tf.int32, ())

labels = tf.one_hot(y_hat, 1000)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=[labels])
optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(loss, var_list=[x_hat])

Projection step

Finally, we write the projection step to keep our adversarial example visually close to the original image. Additionally, we clip to to keep it a valid image.

epsilon = tf.placeholder(tf.float32, ())

below = x - epsilon
above = x + epsilon
projected = tf.clip_by_value(tf.clip_by_value(x_hat, below, above), 0, 1)
with tf.control_dependencies([projected]):
    project_step = tf.assign(x_hat, projected)


Finally, we’re ready to synthesize an adversarial example. We arbitrarily choose “guacamole” (imagenet class 924) as our target class.

demo_epsilon = 2.0/255.0 # a really small perturbation
demo_lr = 1e-1
demo_steps = 100
demo_target = 924 # "guacamole"

# initialization step
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i+1) % 10 == 0:
        print('step %d, loss=%g' % (i+1, loss_value))

adv = x_hat.eval() # retrieve the adversarial example
step 10, loss=4.18923
step 20, loss=0.580237
step 30, loss=0.0322334
step 40, loss=0.0209522
step 50, loss=0.0159688
step 60, loss=0.0134457
step 70, loss=0.0117799
step 80, loss=0.0105757
step 90, loss=0.00962179
step 100, loss=0.00886694

This adversarial image is visually indistinguishable from the original, with no visual artifacts. However, it’s classified as “guacamole” with high probability!

classify(adv, correct_class=img_class, target_class=demo_target)


Robust adversarial examples

Now, we go through a more advanced example. We follow our approach for synthesizing robust adversarial examples to find a single perturbation of our cat image that’s simultaneously adversarial under some chosen distribution of transformations. We could choose any distribution of differentiable transformations; in this post, we’ll synthesize a single adversarial input that’s robust to rotation by .

Before we proceed, let’s check if our previous example is still adversarial if we rotate it, say by an angle of .

ex_angle = np.pi/8

angle = tf.placeholder(tf.float32, ())
rotated_image = tf.contrib.image.rotate(image, angle)
rotated_example = rotated_image.eval(feed_dict={image: adv, angle: ex_angle})
classify(rotated_example, correct_class=img_class, target_class=demo_target)


Looks like our original adversarial example is not rotation-invariant!

So, how do we make an adversarial example robust to a distribution of transformations? Given some distribution of transformations , we can maximize , subject to . We can solve this optimization problem via projected gradient descent, noting that is and approximating with samples at each gradient descent step.

Rather than manually implementing the gradient sampling, we can use a trick to get TensorFlow to do it for us: we can model our sampling-based gradient descent as doing gradient descent over an ensemble of stochastic classifiers that randomly sample from the distribution and transform their input before classifying it.

num_samples = 10
average_loss = 0
for i in range(num_samples):
    rotated = tf.contrib.image.rotate(
        image, tf.random_uniform((), minval=-np.pi/4, maxval=np.pi/4))
    rotated_logits, _ = inception(rotated, reuse=True)
    average_loss += tf.nn.softmax_cross_entropy_with_logits(
        logits=rotated_logits, labels=labels) / num_samples

We can reuse our assign_op and project_step, though we’ll have to write a new optim_step for this new objective.

optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(average_loss, var_list=[x_hat])

Finally, we’re ready to run PGD to generate our adversarial input. As in the previous example, we’ll choose “guacamole” as our target class.

demo_epsilon = 8.0/255.0 # still a pretty small perturbation
demo_lr = 2e-1
demo_steps = 300
demo_target = 924 # "guacamole"

# initialization step
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, average_loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i+1) % 50 == 0:
        print('step %d, loss=%g' % (i+1, loss_value))

adv_robust = x_hat.eval() # retrieve the adversarial example
step 50, loss=0.0804289
step 100, loss=0.0270499
step 150, loss=0.00771527
step 200, loss=0.00350717
step 250, loss=0.00656128
step 300, loss=0.00226182

This adversarial image is classified as “guacamole” with high confidence, even when it’s rotated!

rotated_example = rotated_image.eval(feed_dict={image: adv_robust, angle: ex_angle})
classify(rotated_example, correct_class=img_class, target_class=demo_target)



Let’s examine the rotation-invariance of the robust adversarial example we produced over the entire range of angles, looking at over .

thetas = np.linspace(-np.pi/4, np.pi/4, 301)

p_naive = []
p_robust = []
for theta in thetas:
    rotated = rotated_image.eval(feed_dict={image: adv_robust, angle: theta})
    p_robust.append(probs.eval(feed_dict={image: rotated})[0][demo_target])
    rotated = rotated_image.eval(feed_dict={image: adv, angle: theta})
    p_naive.append(probs.eval(feed_dict={image: rotated})[0][demo_target])

robust_line, = plt.plot(thetas, p_robust, color='b', linewidth=2, label='robust')
naive_line, = plt.plot(thetas, p_naive, color='r', linewidth=2, label='naive')
plt.ylim([0, 1.05])
plt.xlabel('rotation angle')
plt.ylabel('target class probability')
plt.legend(handles=[robust_line, naive_line], loc='lower right')


It’s super effective!


Seashells is a service that lets you pipe output from command-line programs to the web in real-time:

$ python train.py | seashells
serving at https://seashells.io/v/{random url}

There’s even a netcat API, so there’s no need to install software to use the service:

$ echo 'Hello, Seashells!' | nc seashells.io 1337
serving at https://seashells.io/v/{random url}

Any data that’s sent to Seashells is viewable in a web-based terminal, with data streamed to the browser in real-time using websockets:

Web terminal demo

It’s a super simple service that requires zero setup to use.

Seashells was created because I wanted an easy way to monitor long-running experiments: I regularly train neural nets and run related experiments, and I like to semi-obsessively track the progress of my code. From my computer, I can connect to the VPN, log in to the GPU box over SSH, and look at the output of a program running in a screen session. But I can’t do this from my phone or from another computer. All I wanted was to see a small amount of information like epoch 13/100, train loss=0.3825, test loss=0.4830 at a glance.

I didn’t find existing tools that satisfied my needs, so I built Seashells as a low-overhead way of monitoring programs that print progress to the console.

Seashells been serving my needs pretty well so far, and I hope others will find it useful too!

Testing Distributed Systems for Linearizability

Distributed systems are challenging to implement correctly because they must handle concurrency and failure. Networks can delay, duplicate, reorder, and drop packets, and machines can fail at any time. Even when designs are proven correct on paper, it is difficult to avoid subtle bugs in implementations.

Unless we want to use formal methods1, we have to test systems if we want assurance that implementations are correct. Testing distributed systems is challenging, too. Concurrency and nondeterminism make it difficult to catch bugs in tests, especially when the most subtle bugs surface only under scenarios that are uncommon in regular operation, such as simultaneous machine failure or extreme network delays.


Before we can discuss testing distributed systems for correctness, we need to define what we mean by “correct”. Even for seemingly simple systems, specifying exactly how the system is supposed to behave is an involved process2.

Consider a simple key-value store, similar to etcd, that maps strings to strings and supports two operations: Put(key, value) and Get(key). First, we consider how it behaves in the sequential case.

Sequential Specifications

We probably have a good intuitive understanding of how a key-value store is supposed to behave under sequential operation: Get operations must reflect the result of applying all previous Put operations. For example, we could run a Put("x", "y") and then a subsequent Get("x") should return "y". If the operation returned, say, a "z", that would be incorrect.

More formal than an English-language description, we can write a specification for our key-value store as executable code:

class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key, "")

The code is short, but it nails down all the important details: the start state, how the internal state is modified as a result of operations, and what values are returned as a result of calls on the key-value store. The spec solidifies some details like what happens when Get() is called on a nonexistent key, but in general, it lines up with our intuitive definition of a key-value store.


Next, we consider how our key-value store can behave under concurrent operation. Note that the sequential specification does not tell us what happens under concurrent operation. For example, the sequential spec doesn’t say how our key-value store is allowed to behave in this scenario:


It’s not immediately obvious what value the Get("x") operation should be allowed to return. Intuitively, we might say that because the Get("x") is concurrent with the Put("x", "y") and Put("x", "z"), it can return either value or even "". If we had a situation where another client executed a Get("x") much later, we might say that the operation must return "z", because that was the value written by the last write, and the last write operation was not concurrent with any other writes.

We formally specify correctness for concurrent operations based on a sequential specification using a consistency model known as linearizability. In a linearizable system, every operation appears to execute atomically and instantaneously at some point between the invocation and response. There are other consistency models besides linearizability, but many distributed systems provide linearizable behavior: linearizability is a strong consistency model, so it’s relatively easy to build other systems on top of linearizable systems.

Consider an example history with invocations and return values of operations on a key-value store:


This history is linearizable. We can show this by explicitly finding linearization points for all operations (drawn in blue below). The induced sequential history, Put("x", "0"), Get("x") -> "0", Put("x", "1"), Get("x") -> "1", is a correct history with respect to the sequential specification.


In contrast, this history is not linearizable:


There is no linearization of this history with respect to the sequential specification: there is no way to assign linearization points to operations in this history. We could start assigning linearization points to the operations from clients 1, 2, and 3, but then there would be no way to assign a linearization point for client 4: it would be observing a stale value. Similarly, we could start assigning linearization points to the operations from clients 1, 2, and 4, but then the linearization point of client 2’s operation would be after the start of client 4’s operation, and then we wouldn’t be able to assign a linearization point for client 3: it could legally only read a value of "" or "0".


With a solid definition of correctness, we can think about how to test distributed systems. The general approach is to test for correct operation while randomly injecting faults such as machine failures and network partitions. We could even simulate the entire network so it’s possible to do things like cause extremely long network delays. Because tests are randomized, we would want to run them a bunch of times to gain assurance that a system implementation is correct.

Ad-hoc testing

How do we actually test for correct operation? With the simplest software, we test it using input-output cases like assert(expected_output == f(input)). We could use a similar approach with distributed systems. For example, with our key-value store, we could have the following test where multiple clients are executing operations on the key-value store in parallel:

for client_id = 0..10 {
    spawn thread {
        for i = 0..1000 {
            value = rand()
            kvstore.put(client_id, value)
            assert(kvstore.get(client_id) == value)
wait for threads

It is certainly the case that if the above test fails, then the key-value store is not linearizable. However, this test is not that thorough: there are non-linearizable key-value stores that would always pass this test.


A better test would be to have parallel clients run completely random operations: e.g. repeatedly calling kvstore.put(rand(), rand()) and kvstore.get(rand()), perhaps limited to a small set of keys to increase contention. But in this case, how would we determine what is “correct” operation? With the simpler test, we had each client operating on a separate key, so we could always predict exactly what the output had to be.

When clients are operating concurrently on the same set of keys, things get more complicated: we can’t predict what the output of every operation has to be because there isn’t only one right answer. So we have to take an alternative approach: we can test for correctness by recording an entire history of operations on the system and then checking if the history is linearizable with respect to the sequential specification.

Linearizability Checking

A linearizability checker takes as input a sequential specification and a concurrent history, and it runs a decision procedure to check whether the history is linearizable with respect to the spec.


Unfortunately, linearizability checking is NP-complete. The proof is actually quite simple: we can show that linearizability checking is in NP, and we can show that an NP-hard problem can be reduced to linearizability checking. Clearly, linearizability checking is in NP: given a linearization, i.e. the linearization points of all operations, we can check in polynomial time if it is a valid linearization with respect to the sequential spec.

To show that linearizability checking is NP-hard, we can reduce the subset sum problem to linearizability checking. Recall that in the subset sum problem, we are given a set of non-negative integers and a target value , and we have to determine whether there exists a subset of that sums to . We can reduce this problem to linearizability checking as follows. Consider the sequential spec:

class Adder:
    def __init__(self):
        self._total = 0

    def add(self, value):
        self._total += value

    def get(self):
        return self._total

And consider this history:


This history is linearizable if and only if the answer to the subset sum problem is “yes”. If the history is linearizable, then we can take all the operations Add(s_i) that have linearization points before that of the Get() operation, and those correspond to elements in a subset whose sum is . If the set does have a subset that sums to , then we can construct a linearization by having the operations Add(s_i) corresponding to the elements in the subset take place before the Get() operation and having the rest of the operations take place after the Get() operation.


Even though linearizability checking is NP-complete, in practice, it can work pretty well on small histories. Implementations of linearizability checkers take an executable specification along with a history, and they run a search procedure to try to construct a linearization, using tricks to constrain the size of the search space.

There are existing linearizability checkers like Knossos, which is used in the Jepsen test system. Unfortunately, when trying to test an implementation of a distributed key-value store that I had written, I couldn’t get Knossos to check my histories. It seemed to work okay on histories with a couple concurrent clients, with about a hundred history events in total, but in my tests, I had tens of clients generating histories of thousands of events.

To be able to test my key-value store, I wrote Porcupine, a fast linearizability checker implemented in Go. Porcupine checks if histories are linearizable with respect to executable specifications written in Go. Empirically, Porcupine is thousands of times faster than Knossos. I was able to use it to test my key-value store because it is capable of checking histories of thousands of events in a couple seconds.


Testing linearizable distributed systems using fault injection along with linearizability checking is an effective approach.

To compare ad-hoc testing with linearizability checking using Porcupine, I tried testing my distributed key-value store using the two approaches. I tried introducing different kinds of design bugs into the implementation of the key-value store, such as modifications that would result in stale reads, and I checked to see which tests failed. The ad-hoc tests caught some of the most egregious bugs, but the tests were incapable of catching the more subtle bugs. In contrast, I couldn’t introduce a single correctness bug that the linearizability test couldn’t catch.

  1. Formal methods can provide strong guarantees about the correctness of distributed systems. For example, the UW PLSE research group has recently verified an implementation of the Raft consensus protocol using the Coq proof assistant. Unfortunately, verification requires specialized knowledge, and verifying realistic systems involves huge effort. Perhaps one day systems used in the real world will be proven correct, but for now, production systems are tested but not verified. 

  2. Ideally, all production systems would have formal specifications. Some systems that are being used in the real world today do have formal specs: for example, Raft has a formal spec written in TLA+. But unfortunately, the majority of real-world systems do not have formal specs.