# Comparing Numpy, Pytorch, and autograd on CPU and GPU

by Chuck Anderson, Pattern Exploration

In [1]:
import numpy as np
import time
import torch
import matplotlib.pyplot as plt
%matplotlib inline


# Very Brief Introduction to Autograd

Let’s use torch.autograd to calculate the derivative of the sine function. Here is what we should get.

In [14]:
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.sin(x)
dy = np.cos(x)
plt.plot(x, y)
plt.plot(x, dy)
plt.legend(('$\\sin(x)$', '$\\frac{d \\sin(x)}{dx} = \\cos(x)$'));


The autograd module in pytorch is designed to calculate gradients of scalar-valued functions with respect to parameters. So, how can we use autograd to calculate $\frac{d \sin(x)}{dx}$ for multiple values of $x$? Well, we can't with a single call to backward. Instead, we must pass backward a vector with as many elements as $x$ has. autograd then computes the dot product of that vector with the gradient of each output, summing the per-output gradients weighted by the vector's components. This is just what is needed when calculating the gradient of a model's mean squared error, averaged over all outputs, with respect to the model's parameters.

Back to our original problem. We can calculate the derivative of $\sin(x)$ with respect to each value of $x$ by calling backward once for each value of $x$, with the vector argument set to all zeros except for a one in the position of the value of $x$ we want the derivative for.
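The one-hot trick can be sketched without autograd at all: for the elementwise function $y = \sin(x)$, the Jacobian of $y$ with respect to $x$ is diagonal with entries $\cos(x_i)$, so a one-hot vector dotted with the Jacobian recovers exactly one derivative. A minimal numpy-only illustration (the index `i` is an arbitrary choice):

```python
import numpy as np

# For the elementwise function y = sin(x), the Jacobian dy/dx is a
# diagonal matrix with cos(x_i) on the diagonal.
x = np.linspace(-2*np.pi, 2*np.pi, 100)
J = np.diag(np.cos(x))

# A one-hot vector dotted with the Jacobian picks out one row:
# the derivative of y[i] with respect to each component of x.
i = 42                      # arbitrary position
onehot = np.zeros(100)
onehot[i] = 1.0
row = onehot @ J

assert np.isclose(row[i], np.cos(x[i]))   # the one derivative we asked for
assert np.count_nonzero(row) == 1         # all other entries are zero
```

The loop over backward calls below does the same thing, one position at a time, without ever forming the full Jacobian.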

The following bit of code illustrates this. Sharif first showed this to me. This site also helps.

In [15]:
x = torch.autograd.Variable(torch.linspace(-2*np.pi, 2*np.pi, steps=100), requires_grad=True)


After y.backward(...) is called, x.grad will contain the gradient of y (defined in the next cell) with respect to x. Any additional calls to y.backward(...) will add gradient values to the current gradient values. The following test and call to x.grad.data.zero_() zero the gradient values, to take care of the case when the next three cells are executed additional times.

In [16]:
if x.grad is not None:
    x.grad.data.zero_()

y = torch.sin(x)


In [17]:
dout = torch.zeros(100)
for i in range(100):
    dout[:] = 0
    dout[i] = 1
    y.backward(dout, retain_graph=True)


In [18]:
plt.plot(x.data.numpy(), y.data.numpy());
plt.plot(x.data.numpy(), x.grad.data.numpy())
plt.legend(('$\\sin(x)$', '$\\frac{d \\sin(x)}{dx} = \\cos(x)$'));


# Using Numpy to Fit a Polynomial to Data

Let’s try to fit a polynomial to the sine function. First, here is the parameterized polynomial model (its degree is set by the number of coefficients in w) and its derivative.

In [7]:
def poly(x, w):
    '''poly(x, w), where x is Nx1 samples and w is (D+1)x1 coefficients for x^0, x^1, ..., x^D'''
    D = w.size
    xPowers = x ** range(D)
    return xPowers @ w


The derivative of a polynomial of degree 3 with respect to its coefficients is

$$\frac{d \, (w_0 + w_1 x + w_2 x^2 + w_3 x^3)}{d w_i} = (1, x, x^2, x^3)$$

and in python it is

In [8]:
def dpoly_dw(x, w):
    D = w.size
    xPowers = x ** range(D)
    return xPowers


Let’s test these functions.

In [9]:
x = np.linspace(-5, 5, 20).reshape(-1, 1)
w = 0.1 * np.array([3.0, -2.0, -1.5, 5]).reshape(-1, 1)
x.shape, w.shape


Out[9]:
((20, 1), (4, 1))

In [10]:
poly(x, w).shape, dpoly_dw(x, w).shape


Out[10]:
((20, 1), (20, 4))
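The shapes look right. As a further sanity check, each column of dpoly_dw should match a numerical derivative of poly. A self-contained sketch using central finite differences (the step size `eps` is an arbitrary choice):

```python
import numpy as np

# Self-contained copies of the two functions above
def poly(x, w):
    return (x ** np.arange(w.size)) @ w

def dpoly_dw(x, w):
    return x ** np.arange(w.size)

x = np.linspace(-5, 5, 20).reshape(-1, 1)
w = 0.1 * np.array([3.0, -2.0, -1.5, 5]).reshape(-1, 1)

analytic = dpoly_dw(x, w)        # 20 x 4, one column per coefficient

# Perturb each coefficient and compare a central difference of poly
# to the corresponding column of the analytic derivative.
eps = 1e-6
for i in range(w.size):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps
    wm[i] -= eps
    numeric = (poly(x, wp) - poly(x, wm)) / (2 * eps)
    assert np.allclose(numeric.ravel(), analytic[:, i], atol=1e-4)
```

Because poly is linear in w, the central difference is exact up to floating-point rounding.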

In [11]:
plt.subplot(2, 1, 1)
plt.plot(poly(x, w))
plt.ylabel('poly(x,w)')
plt.subplot(2, 1, 2)
plt.plot(dpoly_dw(x, w))
plt.ylabel('d poly(x,w) / dw');


Now, some data to fit: 100 samples from part of the sine curve.

In [12]:
x = np.linspace(0, 5, 100).reshape(-1, 1)
y = np.sin(x)
plt.plot(x, y, 'o-');


Okay, ready to fit the polynomial to this data. The steps are simple: initialize w to zeros; calculate the output of the polynomial model; update w by the negative of the gradient of the mean squared error with respect to w, multiplied by a small learning rate. To plot MSE versus the number of update steps, also record the mean squared error between the model output and the data at each step.
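Before running it, we can convince ourselves that the gradient expression used in the loop, $-\frac{2}{N} X^T (y - \hat{y})$ with $X$ holding the powers of $x$, really is the derivative of the mean squared error. A self-contained numpy sketch using finite differences (the random `w` and the step `eps` are arbitrary choices for the check):

```python
import numpy as np

x = np.linspace(0, 5, 100).reshape(-1, 1)
y = np.sin(x)
X = x ** np.arange(5)                            # powers x^0 ... x^4
w = 0.1 * np.random.RandomState(0).randn(5, 1)   # arbitrary coefficients

def mse(w):
    return ((y - X @ w) ** 2).mean()

# Analytic gradient of the mean squared error with respect to w
grad = -2 / x.shape[0] * X.T @ (y - X @ w)

# Central finite differences agree, coefficient by coefficient
eps = 1e-6
for i in range(5):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps
    wm[i] -= eps
    numeric = (mse(wp) - mse(wm)) / (2 * eps)
    assert np.isclose(numeric, grad[i, 0], rtol=1e-3)
```

Since the MSE is quadratic in w, the central difference matches the analytic gradient exactly up to rounding.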

In [13]:
startTime = time.time()

nSteps = 200000
learnRate = 0.00001
degree = 4

w = np.zeros((degree+1, 1))
mseTrace = np.zeros(nSteps)
nSamples = x.shape[0]

for step in range(nSteps):

    yModel = poly(x, w)
    grad = -2/nSamples * dpoly_dw(x, w).T @ (y - yModel)

    if step == 0:
        print('First step gradient:')
        print(grad)

    w -= learnRate * grad

    mse = ((y - yModel)**2).mean()
    mseTrace[step] = mse

print('Numpy took {} seconds'.format(time.time() - startTime))


First step gradient:
[[  -0.27402023]
 [   0.98929269]
 [   7.41287669]
 [  38.07474351]
 [ 180.01689072]]
Numpy took 13.437073707580566 seconds


In [16]:
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.plot(mseTrace)

plt.subplot(1, 2, 2)
plt.plot(x, y)
plt.plot(x, yModel);


# Now, with Pytorch

Now we repeat all of the above function definitions with changes needed for implementation in torch instead of numpy.

In [17]:
dtype = torch.FloatTensor


In [23]:
def poly_torch(x, w):
    '''poly(x, w), where x is Nx1 samples and w is (D+1)x1 coefficients for x^0, x^1, ..., x^D'''
    # D = w.size
    D = w.shape[0]
    xPowers = x ** torch.arange(0.0, D)
    # return xPowers @ w
    return xPowers.mm(w)

# No changes needed from dpoly_dw

def dpoly_dw_torch(x, w):
    D = w.shape[0]
    xPowers = x ** torch.arange(0.0, D)
    return xPowers


In [24]:
x_torch = torch.from_numpy(x).type(dtype)
y_torch = torch.from_numpy(y).type(dtype)


In [25]:
startTime = time.time()

nSteps = 200000
learnRate = 0.00001
degree = 4

# w = np.zeros((degree+1, 1))
w_torch = torch.zeros((degree+1, 1)).type(dtype)

# mseTrace = np.zeros(nSteps)
mseTrace = torch.zeros(nSteps)
nSamples = x_torch.shape[0]

for step in range(nSteps):

    # yModel = poly(x, w)
    # grad = -2/nSamples * dpoly_dw(x, w).T @ (y - yModel)
    yModel = poly_torch(x_torch, w_torch)
    grad = -2/nSamples * dpoly_dw_torch(x_torch, w_torch).t().mm(y_torch - yModel)

    if step == 0:
        print('First step gradient:')
        print(grad)

    w_torch -= learnRate * grad

    mse = ((y_torch - yModel)**2).mean()
    mseTrace[step] = mse

print('Pytorch took {} seconds'.format(time.time() - startTime))


First step gradient:

  -0.2740
   0.9893
   7.4129
  38.0747
 180.0169
[torch.FloatTensor of size 5x1]

Pytorch took 11.846007585525513 seconds


In [26]:
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
plt.plot(mseTrace.numpy())

plt.subplot(1,2,2)
plt.plot(x, y)
plt.plot(x, yModel.numpy());


# Pytorch with Autograd

Now let’s remove the call to dpoly_dw_torch and use autograd to calculate the gradient of the polynomial with respect to w.

In [28]:
def poly_torch(x, w):
    '''poly(x, w), where x is Nx1 samples and w is (D+1)x1 coefficients for x^0, x^1, ..., x^D'''
    # D = w.size
    # D = w.shape[0]
    D = w.data.shape[0]
    xPowers = x ** torch.autograd.Variable(torch.arange(0.0, D))
    # return xPowers @ w
    return xPowers.mm(w)


In [29]:
startTime = time.time()

nSteps = 200000
learnRate = 0.00001
degree = 4

# w = np.zeros((degree+1, 1))
# w_torch = torch.zeros((degree+1, 1)).type(dtype)
w_torch_Var = torch.autograd.Variable(torch.zeros((degree+1, 1)).type(dtype), requires_grad=True)

x_torch_Var = torch.autograd.Variable(x_torch, requires_grad=False)
y_torch_Var = torch.autograd.Variable(y_torch, requires_grad=False)

# mseTrace = np.zeros(nSteps)
mseTrace = torch.zeros(nSteps)
nSamples = x_torch_Var.data.shape[0]

for step in range(nSteps):

    # yModel = poly(x, w)
    # grad = -2/nSamples * dpoly_dw(x, w).T @ (y - yModel)
    yModel = poly_torch(x_torch_Var, w_torch_Var)
    mse = ((y_torch_Var - yModel)**2).mean()

    if step > 0:
        w_torch_Var.grad.data.zero_()
    mse.backward()

    # grad = -2/nSamples * dpoly_dw_torch(x_torch, w_torch).t().mm(y_torch - yModel)

    if step == 0:
        print(w_torch_Var.grad.data)

    w_torch_Var.data -= learnRate * w_torch_Var.grad.data

    mseTrace[step] = mse.data[0]

print('Pytorch with autograd took {} seconds'.format(time.time() - startTime))


  -0.2740
   0.9893
   7.4129
  38.0747
 180.0169
[torch.FloatTensor of size 5x1]

Pytorch with autograd took 44.704875230789185 seconds


In [31]:
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
plt.plot(mseTrace.numpy())

plt.subplot(1,2,2)
plt.plot(x, y)
plt.plot(x, yModel.data.numpy());


# Pytorch with autograd on GPU

To run our torch implementation on the GPU, we need to change the data type and also call .cpu() on tensors to move them back to the CPU when needed.

First, here are the details of the GPU on this machine.

In [28]:
!nvidia-smi


Tue Oct 10 15:35:55 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVS 315             Off  | 00000000:02:00.0 N/A |                  N/A |
| 30%   45C    P8    N/A /  N/A |    104MiB /   964MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:03:00.0 Off |                  N/A |
| 22%   43C    P8    16W / 250W |     12MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+


In [29]:
!uptime


 15:35:55 up 10 days, 21:34,  1 user,  load average: 0.84, 0.40, 0.33


In [33]:
dtype = torch.cuda.FloatTensor


In [32]:
def poly_torch(x, w):
    '''poly(x, w), where x is Nx1 samples and w is (D+1)x1 coefficients for x^0, x^1, ..., x^D'''
    # D = w.size
    # D = w.shape[0]
    D = w.data.shape[0]
    xPowers = x ** torch.autograd.Variable(torch.arange(0.0, D).type(type(x.data)))
    # return xPowers @ w
    return xPowers.mm(w)


In [42]:
startTime = time.time()

nSteps = 200000
learnRate = 0.00001
degree = 4

# w = np.zeros((degree+1, 1))
# w_torch = torch.zeros((degree+1, 1)).type(dtype)
w_torch_Var = torch.autograd.Variable(torch.zeros((degree+1, 1)).type(dtype), requires_grad=True)

x_torch_Var = torch.autograd.Variable(x_torch.type(torch.cuda.FloatTensor), requires_grad=False)
y_torch_Var = torch.autograd.Variable(y_torch.type(torch.cuda.FloatTensor), requires_grad=False)

# mseTrace = np.zeros(nSteps)
mseTrace = torch.zeros(nSteps).type(torch.cuda.FloatTensor)
nSamples = x_torch_Var.data.shape[0]

for step in range(nSteps):

    # yModel = poly(x, w)
    # grad = -2/nSamples * dpoly_dw(x, w).T @ (y - yModel)
    yModel = poly_torch(x_torch_Var, w_torch_Var)
    mse = ((y_torch_Var - yModel)**2).mean()

    if step > 0:
        w_torch_Var.grad.data.zero_()
    mse.backward()

    # grad = -2/nSamples * dpoly_dw_torch(x_torch, w_torch).t().mm(y_torch - yModel)

    if step == 0:
        print(w_torch_Var.grad.data)

    w_torch_Var.data -= learnRate * w_torch_Var.grad.data

    mseTrace[step] = mse.data[0]

print('Pytorch with autograd on GPU took {} seconds'.format(time.time() - startTime))


  -0.2740
   0.9893
   7.4129
  38.0747
 180.0169
[torch.cuda.FloatTensor of size 5x1 (GPU 0)]

Pytorch with autograd on GPU took 243.74856853485107 seconds


In [52]:
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
plt.plot(mseTrace.cpu().numpy())

plt.subplot(1,2,2)
plt.plot(x, y)
plt.plot(x, yModel.data.cpu().numpy());


# Wrapped up in one function

We can use the type of the data passed into these functions to select code appropriate for use with numpy.ndarray, torch.FloatTensor, or torch.autograd.variable.Variable.
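The pattern is just an explicit chain of type tests. A torch-free sketch of the same dispatch idea (the `list` branch here is a hypothetical stand-in for the torch branches, purely for illustration):

```python
import numpy as np

def powers(x, D):
    '''Return x raised to the powers 0 .. D-1, choosing code by the type of x.'''
    if type(x) is np.ndarray:
        return x ** np.arange(D)                         # vectorized numpy path
    elif type(x) is list:                                # stand-in for a torch branch
        return [[xi ** d for d in range(D)] for xi in x]
    else:
        raise TypeError('unsupported type: {}'.format(type(x)))

assert np.allclose(powers(np.array([2.0]), 3), [1.0, 2.0, 4.0])
assert powers([2.0], 3) == [[1.0, 2.0, 4.0]]
```

Dispatching on `type(x)` keeps the callers identical across backends, which is exactly what the train function below relies on.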

In [43]:
def poly(x, w):
    '''poly(x, w), where x is Nx1 samples and w is (D+1)x1 coefficients for x^0, x^1, ..., x^D'''
    typex = type(x)

    if typex is torch.autograd.variable.Variable:
        D = w.data.shape[0]
        exponents = torch.autograd.Variable(torch.arange(0.0, D).type(type(x.data)))

    elif typex is torch.FloatTensor or typex is torch.cuda.FloatTensor:
        D = w.shape[0]
        exponents = torch.arange(0.0, D).type(typex)

    else:  # numpy
        D = w.shape[0]
        exponents = np.arange(D, dtype=x.dtype)

    xPowers = x ** exponents

    if typex is np.ndarray:
        return xPowers @ w
    else:
        return xPowers.mm(w)

def dpoly_dw(x, w):
    typex = type(x)

    if typex is torch.autograd.variable.Variable:
        D = w.data.shape[0]
        exponents = torch.autograd.Variable(torch.arange(0.0, D).type(type(x.data)))

    elif typex is torch.FloatTensor or typex is torch.cuda.FloatTensor:
        D = w.shape[0]
        exponents = torch.arange(0.0, D).type(typex)

    else:  # numpy
        D = w.shape[0]
        exponents = np.arange(D, dtype=x.dtype)

    return x ** exponents


In [44]:
def train(x, y, nSteps=200000, learnRate=0.00001, degree=4, use_torch=False, use_autograd=False, use_gpu=False):
    startTime = time.time()

    nSamples = x.shape[0]

    # Make sure use_torch is true if either of use_autograd or use_gpu is true
    if use_gpu:
        use_torch = True
    if use_autograd:
        use_torch = True

    # initialize weights to be all zeros
    w = np.zeros((degree+1, 1), dtype=np.float32)
    if use_torch:
        w = torch.from_numpy(w)
        if use_gpu:
            w = w.type(torch.cuda.FloatTensor)
        if use_autograd:
            w = torch.autograd.Variable(w, requires_grad=True)

    # Change type of input samples, x, and targets, y
    if use_torch:
        x = torch.from_numpy(x).type(torch.FloatTensor)
        y = torch.from_numpy(y).type(torch.FloatTensor)
        if use_gpu:
            x = x.type(torch.cuda.FloatTensor)
            y = y.type(torch.cuda.FloatTensor)
        if use_autograd:
            x = torch.autograd.Variable(x, requires_grad=False)
            y = torch.autograd.Variable(y, requires_grad=False)

    # Set up array to store trace of MSE values for plotting later
    mseTrace = np.zeros(nSteps, dtype=np.float32)
    if use_torch:
        mseTrace = torch.from_numpy(mseTrace)
        if use_gpu:
            mseTrace = mseTrace.type(torch.cuda.FloatTensor)

    # Train for nSteps passes through data set
    for step in range(nSteps):

        # Forward pass through model, for all samples in x
        yModel = poly(x, w)  # poly uses type of x to figure out what to do

        # MSE, necessary for autograd. For all, needed for mseTrace.
        mse = ((y - yModel)**2).mean()

        # Backward pass to calculate gradient, then the weight update
        if use_autograd:
            if step > 0:
                w.grad.data.zero_()
            mse.backward()
            w.data -= learnRate * w.grad.data
        elif use_torch:
            grad = -2/nSamples * dpoly_dw(x, w).t().mm(y - yModel)
            w -= learnRate * grad
        else:  # must be numpy
            grad = -2/nSamples * dpoly_dw(x, w).T @ (y - yModel)
            w -= learnRate * grad

        if use_autograd:
            mseTrace[step] = mse.data[0]
        else:
            mseTrace[step] = mse

    elapsedTime = time.time() - startTime

    return {'mseTrace': mseTrace, 'w': w, 'learnRate': learnRate, 'seconds': elapsedTime,
            'use_torch': use_torch, 'use_autograd': use_autograd, 'use_gpu': use_gpu}


In [45]:
train(x, y, nSteps=200000, learnRate=0.00001, degree=4, use_torch=False, use_autograd=False, use_gpu=False)


Out[45]:
{'learnRate': 1e-05,
 'mseTrace': array([ 0.5265038 ,  0.34783229,  0.34722364, ...,  0.00790985,
         0.00790982,  0.00790979], dtype=float32),
 'seconds': 10.86628246307373,
 'use_gpu': False,
 'use_torch': False,
 'w': array([[ 0.31165126],
        [ 0.37640169],
        [ 0.26351294],
        [-0.21930896],
        [ 0.0283497 ]], dtype=float32)}

In [46]:
train(x, y, nSteps=200000, learnRate=0.00001, degree=4, use_torch=True, use_autograd=False, use_gpu=False)


Out[46]:
{'learnRate': 1e-05, 'mseTrace':
 0.5265
 0.3478
 0.3472
   ⋮
 0.0079
 0.0079
 0.0079
[torch.FloatTensor of size 200000], 'seconds': 9.932718992233276, 'use_autograd': False, 'use_gpu': False, 'use_torch': True, 'w':
 0.3117
 0.3764
 0.2635
-0.2193
 0.0283
[torch.FloatTensor of size 5x1]}

In [47]:
train(x, y, nSteps=200000, learnRate=0.00001, degree=4, use_torch=True, use_autograd=True, use_gpu=False)


Out[47]:
{'learnRate': 1e-05, 'mseTrace':
 0.5265
 0.3478
 0.3472
   ⋮
 0.0079
 0.0079
 0.0079
[torch.FloatTensor of size 200000], 'seconds': 87.66066813468933, 'use_autograd': True, 'use_gpu': False, 'use_torch': True, 'w': Variable containing:
 0.3117
 0.3764
 0.2635
-0.2193
 0.0283
[torch.FloatTensor of size 5x1]}

In [48]:
train(x, y, nSteps=200000, learnRate=0.00001, degree=4, use_torch=True, use_autograd=False, use_gpu=True)


Out[48]:
{'learnRate': 1e-05, 'mseTrace':
 0.5265
 0.3478
 0.3472
   ⋮
 0.0079
 0.0079
 0.0079
[torch.cuda.FloatTensor of size 200000 (GPU 0)], 'seconds': 33.87844920158386, 'use_autograd': False, 'use_gpu': True, 'use_torch': True, 'w':
 0.3117
 0.3764
 0.2635
-0.2193
 0.0283
[torch.cuda.FloatTensor of size 5x1 (GPU 0)]}

In [53]:
train(x, y, nSteps=200000, learnRate=0.00001, degree=4, use_torch=True, use_autograd=True, use_gpu=True)


Out[53]:
{'learnRate': 1e-05, 'mseTrace':
 0.5265
 0.3478
 0.3472
   ⋮
 0.0079
 0.0079
 0.0079
[torch.cuda.FloatTensor of size 200000 (GPU 0)], 'seconds': 228.2244691848755, 'use_autograd': True, 'use_gpu': True, 'use_torch': True, 'w': Variable containing:
 0.3117
 0.3764
 0.2635
-0.2193
 0.0283
[torch.cuda.FloatTensor of size 5x1 (GPU 0)]}

In [55]:
for use_torch, use_autograd, use_gpu in ((False, False, False),
                                         (True, False, False),
                                         (True, False, True),
                                         (True, True, False),
                                         (True, True, True)):

    result = train(x, y, nSteps=200000, learnRate=0.00001, degree=4,
                   use_torch=use_torch, use_autograd=use_autograd, use_gpu=use_gpu)

    if not use_torch:
        label = 'numpy'
    elif not use_autograd and not use_gpu:
        label = 'torch'
    elif not use_autograd and use_gpu:
        label = 'torch-gpu'
    elif use_autograd and not use_gpu:
        label = 'torch-autograd'
    else:
        label = 'torch-autograd-gpu'

    print('{:20} {:6.2f} seconds, final error {:.4f}'.format(label, result['seconds'], result['mseTrace'][-1]))


numpy                 10.91 seconds, final error 0.0079
torch                 10.13 seconds, final error 0.0079
torch-gpu             34.50 seconds, final error 0.0079
torch-autograd        80.43 seconds, final error 0.0079
torch-autograd-gpu   233.13 seconds, final error 0.0079


These results are, of course, for a small data set and a model with very few parameters. As the size of the data set and the number of parameters increase, the advantage of the GPU should become apparent. It is, however, disappointing that autograd increases execution time about eight-fold in this simple example.

I would appreciate comments on changes to my code that will result in faster autograd execution.