Adversarial Perturbation in Logistic Regression

Machine Learning · Hard · Free problem

You have a fitted logistic regression classifier with weight vector $w$ and bias $b$. For a given input $x$ with true label $y \in \{0, 1\}$, the model predicts $\hat{y} = \sigma(w^T x + b)$, where $\sigma$ is the sigmoid function.

The cross-entropy loss is: $L(x, y) = -\left[y \log \sigma(w^T x + b) + (1 - y) \log(1 - \sigma(w^T x + b))\right]$

How would you add a small perturbation $\delta$ to the input $x$ to maximally increase the loss?
Derive the gradient $\nabla_x L$ for logistic regression.
Describe the Fast Gradient Sign Method (FGSM) and explain why it is the optimal perturbation under an $\ell_\infty$ constraint.

Hints

To increase the loss, move the input in the direction that the loss increases fastest -- that is, in the direction of $\nabla_x L$.
For logistic regression, use the chain rule: $\nabla_x L = \frac{\partial L}{\partial z} \cdot \nabla_x z$ where $z = w^T x + b$. The key identity is $\frac{\partial L}{\partial z} = \sigma(z) - y$.
Under an $\ell_\infty$ constraint on $\delta$, the optimal perturbation is $\delta = \epsilon \cdot \text{sign}(\nabla_x L)$. This follows from the duality between $\ell_\infty$ and $\ell_1$ norms.

Worked Solution

How to Think About It: This is an adversarial attack problem. You want to find the worst-case input perturbation -- the small change to $x$ that makes the model's prediction as wrong as possible. The natural approach is gradient ascent on the loss: move the input in the direction that increases the loss fastest. For logistic regression, the loss landscape is simple enough that we can compute the gradient in closed form. The result is the Fast Gradient Sign Method (FGSM), which turns out to be the optimal single-step attack under an $\ell_\infty$ constraint.

Key Insight: The gradient of the loss with respect to the input $x$ points in the direction of steepest loss increase. For logistic regression, this gradient has a beautifully simple form: it is proportional to the weight vector $w$, scaled by the prediction error.

The Method:

Step 1: Compute $\nabla_x L$.

Let $z = w^T x + b$. The cross-entropy loss is: $L = -y \log \sigma(z) - (1-y) \log(1 - \sigma(z))$

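Using the fact that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ and the chain rule: $\frac{\partial L}{\partial z} = -y \frac{\sigma'(z)}{\sigma(z)} + (1-y) \frac{\sigma'(z)}{1 - \sigma(z)} = -y(1 - \sigma(z)) + (1-y)\sigma(z) = \sigma(z) - y = \hat{y} - y$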

Since $z = w^T x + b$, we have $\nabla_x z = w$, so: $\nabla_x L = (\hat{y} - y) \cdot w$

This is the prediction error times the weight vector. Its magnitude is $|\hat{y} - y| \cdot \|w\|$: large when $\hat{y}$ is far from $y$, and close to zero when the model is confidently correct.
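
The closed form from Step 1 can be sanity-checked numerically. The sketch below compares $(\hat{y} - y) \cdot w$ against central finite differences of the loss. The weights, bias, input, and step size `h` are illustrative assumptions, not values from the problem.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y, w, b):
    """Cross-entropy loss of the logistic model at a single input x."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(x, y, w, b):
    """Closed-form input gradient: (y_hat - y) * w."""
    return (sigmoid(w @ x + b) - y) * w

# Illustrative values (assumptions, not part of the problem statement).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 1.0])
y = 1

analytic = grad_x(x, y, w, b)

# Central finite differences, perturbing one coordinate at a time.
h = 1e-6
numeric = np.array([
    (loss(x + h * e, y, w, b) - loss(x - h * e, y, w, b)) / (2 * h)
    for e in np.eye(len(x))
])

grad_matches = np.allclose(analytic, numeric, atol=1e-6)
print("analytic:", analytic)
print("numeric: ", numeric)
print("match:   ", grad_matches)
```

Here $y = 1$, so every component of the gradient has the opposite sign to the corresponding component of $w$.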

Step 2: Choose the perturbation.

To maximize the loss, we want to move $x$ in the direction of $\nabla_x L$. Under an $\ell_\infty$ constraint $\|\delta\|_\infty \le \epsilon$, the optimal perturbation is: $\delta^{*} = \epsilon \cdot \text{sign}(\nabla_x L) = \epsilon \cdot \text{sign}((\hat{y} - y) \cdot w)$

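Since $\hat{y} - y$ is a scalar, $\text{sign}((\hat{y} - y) \cdot w) = \text{sign}(\hat{y} - y) \cdot \text{sign}(w)$. Because $\hat{y} \in (0, 1)$, the scalar $\hat{y} - y$ is negative whenever $y = 1$ and positive whenever $y = 0$, whether or not the point is currently misclassified. So $\delta^{*} = -\epsilon \cdot \text{sign}(w)$ for $y = 1$ and $\delta^{*} = +\epsilon \cdot \text{sign}(w)$ for $y = 0$: the attack always pushes the logit $z$ away from the true label.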

The adversarial example is: $x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x L)$
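
A minimal sketch of this update for logistic regression, assuming NumPy and hypothetical values for $w$, $b$, $x$, $y$, and $\epsilon$. The helper name `fgsm` is illustrative and does not come from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(x, y, w, b):
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, w, b, eps):
    """One FGSM step: x + eps * sign(grad_x L), with grad_x L = (y_hat - y) * w."""
    grad = (sigmoid(w @ x + b) - y) * w
    return x + eps * np.sign(grad)

# Illustrative values (assumptions, not part of the problem statement).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 1.0])
y = 1
eps = 0.1

x_adv = fgsm(x, y, w, b, eps)
loss_clean = cross_entropy(x, y, w, b)
loss_adv = cross_entropy(x_adv, y, w, b)

print("x_adv:     ", x_adv)
print("clean loss:", loss_clean)
print("adv loss:  ", loss_adv)
```

With these numbers each coordinate moves by exactly $\epsilon$ against the sign of $w$, which lowers the logit by $\epsilon \|w\|_1 = 0.4$ and raises the loss.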

Step 3: Why sign is optimal under $\ell_\infty$.

The first-order Taylor expansion gives: $L(x + \delta) \approx L(x) + \nabla_x L^T \delta$

Maximizing $\nabla_x L^T \delta$ subject to $\|\delta\|_\infty \le \epsilon$ is a linear optimization. By the duality between $\ell_\infty$ and $\ell_1$ norms: $\max_{\|\delta\|_\infty \le \epsilon} \nabla_x L^T \delta = \epsilon \|\nabla_x L\|_1$

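This maximum is achieved by $\delta_i = \epsilon \cdot \text{sign}((\nabla_x L)_i)$. So FGSM is exactly the solution to the linearized constrained optimization problem. For logistic regression it also maximizes the true loss over the $\ell_\infty$ ball, not just its first-order approximation: $L$ is monotone in $z$, and $z(x + \delta) = z(x) + w^T \delta$ is linear in $\delta$, so the perturbation that moves $z$ furthest in the loss-increasing direction is optimal.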
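
The duality claim can be checked by brute force in low dimension. The sketch below uses a hypothetical gradient vector `g` and budget `eps` (both assumptions) and compares the FGSM value against every corner of the $\ell_\infty$ ball, against $\epsilon \|g\|_1$, and against random feasible perturbations.

```python
import itertools
import numpy as np

# Hypothetical gradient and budget, chosen only for illustration.
g = np.array([-0.3, 0.7, 0.0, -1.2])
eps = 0.05

# FGSM perturbation and the linear objective value it attains.
fgsm_delta = eps * np.sign(g)
fgsm_value = g @ fgsm_delta

# A linear objective over a box attains its maximum at a corner,
# so enumerating all 2^d sign vectors covers every candidate optimum.
best_vertex_value = max(
    g @ (eps * np.array(s))
    for s in itertools.product([-1.0, 1.0], repeat=len(g))
)

# Value predicted by the l_inf / l_1 duality.
dual_norm_value = eps * np.linalg.norm(g, 1)

# Random feasible perturbations should never beat FGSM.
rng = np.random.default_rng(0)
samples = rng.uniform(-eps, eps, size=(1000, len(g)))
max_random_value = (samples @ g).max()

print(fgsm_value, best_vertex_value, dual_norm_value, max_random_value)
```

Note that `np.sign(0.0)` is `0.0`, so the coordinate with zero gradient is left unperturbed. That is still optimal, since any value in $[-\epsilon, \epsilon]$ for that coordinate contributes nothing to the objective.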

Practical Considerations:

$\ell_2$ constraint alternative: Under $\|\delta\|_2 \le \epsilon$, the optimal perturbation is $\delta = \epsilon \cdot \nabla_x L / \|\nabla_x L\|_2$ (normalize the gradient instead of taking its sign).
For logistic regression specifically, $\nabla_x L$ is always proportional to $w$, so the adversarial direction does not depend on where $x$ lies: it is $-w$ for every input with $y = 1$ and $+w$ for every input with $y = 0$, and only the magnitude of the error term $(\hat{y} - y)$ changes. This means the adversarial vulnerability is entirely determined by the model weights.
Perturbation budget $\epsilon$: In practice, $\epsilon$ is chosen small enough that the perturbation is imperceptible (for images) or within measurement noise (for financial features).
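
The $\ell_2$ variant and the input-independence of the direction can be illustrated together. The sketch below uses hypothetical weights and inputs; the helper name `l2_perturbation` is illustrative. It assumes the gradient is nonzero, which holds whenever $\hat{y} \ne y$ in floating point.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_perturbation(x, y, w, b, eps):
    """Optimal first-order perturbation under ||delta||_2 <= eps.

    Assumes the gradient is nonzero (y_hat != y), so the division is safe.
    """
    grad = (sigmoid(w @ x + b) - y) * w
    return eps * grad / np.linalg.norm(grad)

# Illustrative values (assumptions, not part of the problem statement).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
eps = 0.1
x1 = np.array([0.2, -0.4, 1.0])
x2 = np.array([-1.0, 0.3, 2.0])

d1 = l2_perturbation(x1, 1, w, b, eps)  # label 1, first input
d2 = l2_perturbation(x2, 1, w, b, eps)  # label 1, different input
d3 = l2_perturbation(x1, 0, w, b, eps)  # label 0, first input

print("d1:", d1)
print("d2:", d2)
print("d3:", d3)
```

The two inputs with label 1 receive the same perturbation, $-\epsilon \cdot w / \|w\|_2$, and flipping the label flips its sign.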

Answer: Add perturbation $\delta = \epsilon \cdot \text{sign}(\hat{y} - y) \cdot \text{sign}(w)$, which equals $-\epsilon \cdot \text{sign}(w)$ for $y = 1$ and $+\epsilon \cdot \text{sign}(w)$ for $y = 0$. The gradient is $\nabla_x L = (\hat{y} - y) \cdot w$, and FGSM is optimal under $\ell_\infty$ because maximizing a linear function over an $\ell_\infty$ ball yields the sign of the gradient. For logistic regression, the adversarial direction is always $\pm\text{sign}(w)$ and depends on the specific input only through its label.

Intuition

Adversarial perturbations exploit the fact that a model's decision boundary is a hyperplane (in the logistic regression case) or a curved surface (in neural networks), and inputs near the boundary can be pushed across it with tiny changes. FGSM is the simplest version of this idea: take one gradient step in the direction that hurts the model most. The sign operation is not a heuristic -- it is the mathematically optimal choice under an $\ell_\infty$ budget, because you want to maximize the dot product between the gradient and the perturbation, and the $\ell_\infty$ ball's extreme points are the sign vectors.

For logistic regression, the result is particularly clean because the model is linear: the gradient with respect to the input is always proportional to $w$, so every input with a given label gets attacked in the same direction. This is both a strength (the attack is trivial to compute) and a weakness of the model (you cannot make any input robust without changing $w$ itself). In more complex models like neural networks, the gradient depends on $x$, so the attack direction varies per input, but the FGSM principle remains the starting point for gradient-based adversarial methods.
