Read, understand, complete, and run the following notebook. You must return the completed notebook, including your answers and illustrations (you may need to add cells to write your code or comments).
To execute a notebook, you will need to install jupyter; if you are using anaconda (strongly advised), it is already included. If you cannot or don't want to use notebooks, you can return both your python code and a report in pdf. Note that all the TPs will use python2. If you are using python3, you might need to make a few changes to the provided code.
Return your work by e-mail using a single file (ipynb or zip) with the format 'introvis17_tp1_yourname.ipynb'
The first part is about doing basic image processing in Python, the second breaks down the different steps in implementing the Canny edge detector.
Here is a set of useful libraries for image processing. You will need to refer to the online documentation of these libraries to complete the TP.
import numpy as np
# this is the key library for manipulating arrays. Use the online resources! http://www.numpy.org/
import matplotlib.pyplot as plt
# used to read images, display and plot http://matplotlib.org/api/pyplot_api.html .
# You can also check this simple intro to using ipython notebook with images
# https://matplotlib.org/users/image_tutorial.html .
%matplotlib inline
# to display directly in the notebook
import scipy.ndimage as ndimage
# one of several python libraries for image processing
plt.rcParams['image.cmap'] = 'gray'
# by default, the grayscale images are displayed with the jet colormap: use grayscale instead
Here is a simple example of how to read and display an image (you first need to download this image).
name='lena.jpg'
I=plt.imread(name)
plt.imshow(I)
I
What is the size of `I`? Its type? The type of the values it contains? What's its range?

# complete this part.
# Tip: look for numpy attributes (e.g. https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)
# and functions to do what you want (e.g. np.min)
# /!\ NB: this solution uses Python 3
from IPython.display import display, Markdown
# conventional value ranges of the numpy dtypes used for images (floats follow the [-1, 1] image convention)
dtype_range = {np.bool_.__name__: (False, True),
np.bool8.__name__: (False, True),
np.uint8.__name__: (0, 255),
np.uint16.__name__: (0, 65535),
np.int8.__name__: (-128, 127),
np.int16.__name__: (-32768, 32767),
np.int64.__name__: (-2**63, 2**63 - 1),
np.uint64.__name__: (0, 2**64 - 1),
np.int32.__name__: (-2**31, 2**31 - 1),
np.uint32.__name__: (0, 2**32 - 1),
np.float32.__name__: (-1, 1),
np.float64.__name__: (-1, 1)}
display(Markdown("- **Size of `I`:** `{}` \n"\
"- Its *type*: `{}` \n"\
"- The **type of its values**: `{}` \n"\
" - whose **range** is: `{}` \n"\
"- Its **shape**: `{}`"\
.format(I.size, type(I).__name__, I.dtype, dtype_range[str(I.dtype)], I.shape)))
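As a side note, here is a minimal sketch (assuming `I` is still the `uint8` image loaded above) showing how the integer range could be obtained programmatically with `np.iinfo` instead of hard-coding it.

# Sketch: query the range of an integer dtype directly (np.finfo would give machine limits for float types)
print(np.iinfo(I.dtype).min, np.iinfo(I.dtype).max)  # expected for uint8: 0 255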
def load_image_v0(name,crop_window=-1):
I=plt.imread(name)
if crop_window!=-1:
I=I[crop_window[0]:crop_window[1],crop_window[2]:crop_window[3]] # slices the image according to crop_window
I=I.astype('float')/255
return I
What is the line `I=I.astype('float')` doing? Use the function to extract and save the 100x50 pixels of the bottom left corner of 'lena'.

The line `I=I.astype('float')` casts the values of `I` to the type `float`, so that the division performed afterwards is a floating-point one; the resulting float image lies in the range [0, 1].
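A quick check of this claim (a sketch; `I_raw` and `I_float` are temporary names introduced here):

# Sketch: verify dtype and value range before and after the conversion + division
I_raw = plt.imread(name)               # uint8, values in [0, 255]
I_float = I_raw.astype('float') / 255  # float, values in [0, 1]
print(I_raw.dtype, I_raw.min(), I_raw.max())
print(I_float.dtype, I_float.min(), I_float.max())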
dh, dw = 100, 50 # size of the bottom-left cropped image
height, width, _ = I.shape
crop_window = [height-dh, height, 0, dw]
I_cropped = load_image_v0(name, crop_window)
plt.imshow(I_cropped)
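The question also asks to save the crop; a minimal way to do it with matplotlib (the filename below is an arbitrary choice):

# Sketch: save the cropped image to disk (hypothetical filename)
plt.imsave('lena_bottom_left.png', I_cropped)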
def load_image(name,crop_window=-1):
I = load_image_v0(name, crop_window)
# if the input is already grayscale,
# i.e. if it's a 2D array, just return it
if I.ndim == 2:
return I
else:
# it's a color image (3D array: there's an additional dimension for the RGB coordinates)
# The following dot product is a weighted sum over the RGB coordinates (last axis) of I,
# whose weights are the values of [0.2989, 0.5870, 0.1140], which is what we want
# (cf https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html)
return np.dot(I, np.array([0.2989, 0.5870, 0.1140]))
I_cropped_grayscaled = load_image(name, crop_window)
plt.imshow(I_cropped_grayscaled)
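As a sanity check (a sketch; `I_rgb`, `weights` and `manual` are names introduced here), the dot product over the last axis is indeed the same as an explicit weighted sum of the three channels:

# Sketch: np.dot over the last axis == explicit weighted sum of the R, G, B channels
I_rgb = load_image_v0(name)
weights = np.array([0.2989, 0.5870, 0.1140])
manual = weights[0]*I_rgb[..., 0] + weights[1]*I_rgb[..., 1] + weights[2]*I_rgb[..., 2]
print(np.allclose(np.dot(I_rgb, weights), manual))  # expected: True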
Use your function to load a grayscale version of lena and save it.
I = load_image(name)
plt.imshow(I)
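The question also asks to save the grayscale image; a minimal sketch (the filename is an arbitrary choice):

# Sketch: save the grayscale image as asked (hypothetical filename).
# For a 2D array, imsave applies a colormap, so we pass cmap='gray' explicitly.
plt.imsave('lena_gray.png', I, cmap='gray')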
Ib = ndimage.gaussian_filter(I, sigma=3)
display(Markdown("> The size of the output **is** {} the same as the input"\
.format("**not**"*(Ib.size != I.size))))
plt.imshow(Ib)
Border handling is controlled by the `mode` parameter, whose default value is `reflect`: when applying the filter at the borders, the values of the last rows/columns are reflected across the image's boundary.

To reproduce the black outline, the `mode` parameter has to be set to `constant` (constant padding), the constant value (determined by the `cval` parameter) being `0.0` by default.

Ib = ndimage.gaussian_filter(I, sigma=3, mode='constant')
plt.imshow(Ib)
The black outline is due to the border condition being zero padding: near the borders, the pixels of the gaussian window that fall outside the image are treated as black (constant value `0`), which darkens the weighted average.
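To make the effect of the border handling more visible, here is a small sketch comparing a few `mode` values side by side (the particular set of modes shown is an arbitrary choice):

# Sketch: compare several boundary modes of gaussian_filter side by side
fig = plt.figure(figsize=(12., 4.))
for index, mode in enumerate(['reflect', 'constant', 'nearest']):
    fig.add_subplot(1, 3, index+1).set_title("mode = '{}'".format(mode))
    plt.imshow(ndimage.gaussian_filter(I, sigma=3, mode=mode))
    plt.axis('off')
plt.show()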
k = np.array([[-1./9, -1./9, -1./9],
[-1./9, 2.-1/9., -1./9],
[-1./9, -1./9, -1./9]])
Is=ndimage.convolve(I, k)
plt.imshow(Is)
As seen in class (page 56), this linear filter sharpens the image (by accentuating the difference with the local average):
dirac_two = np.zeros((3, 3))
dirac_two[1,1] = 2
k = dirac_two - 1./9 * np.ones((3, 3))
$$k = \underbrace{\begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \\ \end{pmatrix}\;}_{\text{accentuation}} \;\; - \; \underbrace{\begin{pmatrix} \frac 1 9 & \frac 1 9 & \frac 1 9 \\ \frac 1 9 & \frac 1 9 & \frac 1 9 \\ \frac 1 9 & \frac 1 9 & \frac 1 9 \end{pmatrix}\;}_{\text{local average}}$$
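A quick sanity check of this decomposition (a sketch; `I_sharp_direct` and `I_sharp_split` are names introduced here; both `ndimage.convolve` and `ndimage.uniform_filter` use the same default boundary mode, `reflect`):

# Sketch: convolving with k is the same as 2*I minus the 3x3 local average
I_sharp_direct = ndimage.convolve(I, k)
I_sharp_split = 2*I - ndimage.uniform_filter(I, size=3)
print(np.allclose(I_sharp_direct, I_sharp_split))  # expected: True (up to floating-point error)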
Write a function `compute_gradient` that returns three images containing an image derivative along the horizontal direction, the vertical direction, and the gradient norm. It should use forward differences, and take an optional argument `sigma` that gives the scale of the derivative. What is the size of the output compared to the size of the input?

def compute_gradient(I, sigma=3):
# derivative along horizontal (resp. vertical) direction:
# we use the fact that d/dx(G * f) = (d/dx G) * f (resp. d/dy(G * f) = (d/dy G) * f)
# where G is a gaussian kernel
I_horiz = ndimage.gaussian_filter1d(I, sigma, order=1, axis=1)
I_vert = ndimage.gaussian_filter1d(I, sigma, order=1, axis=0)
# ||grad f|| = sqrt((df/dx)^2 + (df/dy)^2)
I_grad = np.sqrt(I_vert**2 + I_horiz**2)
return I_horiz, I_vert, I_grad
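Note that the question literally asks for forward differences, while the implementation above uses Gaussian derivatives (which incorporate the scale `sigma`). For reference, here is a minimal sketch of plain forward differences (the name `compute_gradient_forward` is introduced here), zero-padded so the output keeps the same size as the input:

# Sketch: plain forward differences, no smoothing (output padded to keep the input size)
def compute_gradient_forward(I):
    dx = np.zeros_like(I)
    dy = np.zeros_like(I)
    dx[:, :-1] = np.diff(I, axis=1)  # horizontal forward difference: I[i, j+1] - I[i, j]
    dy[:-1, :] = np.diff(I, axis=0)  # vertical forward difference:   I[i+1, j] - I[i, j]
    return dx, dy, np.sqrt(dx**2 + dy**2)

plt.imshow(compute_gradient_forward(I)[2])
plt.show()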
for img, label in zip(compute_gradient(I), ["Horizontal derivative", "Vertical derivative", "Gradient norm"]):
display(Markdown("#### {}".format(label)))
plt.imshow(img)
plt.show()
display(Markdown("> The size of the output **is** {} the same as the input\n \n_______"\
.format("**not**"*(img.size != I.size))))
Tip: if you want to display several images in a single cell, use `plt.show()` after each call to `plt.imshow(...)`.
sigma_range = range(1, 5)
for sigma in sigma_range:
display(Markdown("#### `sigma = {}`".format(sigma)))
fig = plt.figure(figsize = (15., 15.))
# for sigma=sigma, retrieve the images returned by compute_gradient
# and add them to a subplot
for (index, img), label in zip(enumerate(compute_gradient(I, sigma=sigma)), \
["Horizontal derivative", "Vertical derivative", "Gradient norm"]):
fig.add_subplot(1, 3, index+1)\
.set_title(label)
plt.imshow(img)
plt.axis('off')
plt.show()
display(Markdown("\n \n_______"))
The greater `sigma` is, the blurrier the image and the thicker the contours (since the gaussian standard deviation is larger).
def gradient_normalized_01(I, sigma=3):
# compute the gradient norm array
# and have its values range between 0. and 1.
I_gradient = compute_gradient(I, sigma)[2]
min_val, max_val= np.min(I_gradient), np.max(I_gradient)
I_gradient = (I_gradient - min_val)/(max_val - min_val)
return I_gradient
plt.hist(gradient_normalized_01(I).flatten())
plt.title("Histogram: values of pixels of the gradient norm, normalized to range 0-1 ")
plt.show()
def binary_threshold(I, threshold=0.5, sigma=3):
# 0 <= threshold <= 1: applied to the normalized to range 0-1 gradient norm
# Blackening the pixels whose value <= threshold (by setting them to 0.)
# Whitening those whose value > threshold (by setting them to 1.)
return (gradient_normalized_01(I, sigma) > threshold).astype('float')
sigma_range = range(1, 6)
thresholds = [0.1, 0.2, 0.4, 0.6]
len_sigma_range = len(sigma_range)
for threshold in thresholds:
display(Markdown("#### `threshold = {}`".format(threshold)))
fig = plt.figure(figsize = (17., 17.))
for index, sigma in enumerate(sigma_range):
fig.add_subplot(1, len_sigma_range, index+1)\
.set_title("sigma = {}".format(sigma))
plt.imshow(binary_threshold(I, threshold=threshold, sigma=sigma))
plt.axis('off')
plt.show()
display(Markdown("\n \n_______"))
The higher the `threshold` is, the more edges we filter out (those whose normalized gradient norm is below the threshold). The larger `sigma` is, the blurrier the image, so the detected contours are thicker and fewer fine details or noisy edges survive.
Write a function `nms(gradient, gradient_norm, threshold)` which takes as input the gradient, its norm and a threshold, and outputs a binary image with value 1 only for the pixels that correspond to a maximum in the direction of the gradient.

def nms(gradient, gradient_norm, threshold, binary_output=True):
# normalize the gradient norm to range 0-1 and threshold it
min_val, max_val = np.min(gradient_norm), np.max(gradient_norm)
gradient_norm = (gradient_norm - min_val)/(max_val - min_val)
gradient_norm *= (gradient_norm > threshold).astype('float')
def angle_to_orientation(angle):
angle_pi8 = 8*angle/np.pi
# gradient's direction: east -- west
if -1 <= angle_pi8 < 1 or 7 <= angle_pi8 or angle_pi8 < -7:
return np.array([1, 0])
# north-east -- south-west
if 1 <= angle_pi8 < 3 or -7 <= angle_pi8 < -5:
return np.array([1, 1])
# north -- south
if 3 <= angle_pi8 < 5 or -5 <= angle_pi8 < -3:
return np.array([0, 1])
# north-west -- south-east
if 5 <= angle_pi8 < 7 or -3 <= angle_pi8 < -1:
return np.array([1, -1])
gradient_angle = np.arctan2(gradient[0], gradient[1])
shape = np.array(gradient_norm.shape)
output = np.zeros(shape)
for ij, p in np.ndenumerate(gradient_norm):
if p == 0:
continue
# if the pixel p is not black, compare it to its surrounding values along the gradient direction
vec = angle_to_orientation(gradient_angle[ij])
if (np.logical_or(ij + vec < 0, ij + vec >= shape).any() or p>=gradient_norm[tuple(ij + vec)])\
and (np.logical_or(ij - vec < 0, ij - vec >= shape).any() or p>=gradient_norm[tuple(ij - vec)]):
output[ij] = 1 if binary_output else gradient_norm[ij]
# binary_output=False will be useful later in canny_edges,
# to avoid calling nms twice
return output
gradient_hor, gradient_vert, gradient_norm = compute_gradient(I)
plt.imshow(nms([gradient_hor, gradient_vert], gradient_norm, 0.2))
plt.show()
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
gradient_hor, gradient_vert, gradient_norm = compute_gradient(I)
@interact(threshold=widgets.FloatSlider(min=0.0,max=0.8,step=0.01,value=0.15))
def show_nms(threshold):
plt.figure(figsize = (10., 10.))
plt.imshow(nms([gradient_hor, gradient_vert], gradient_norm, threshold))
plt.show()
The higher the threshold is, the fewer edges there are. We have to find a compromise between:
- losing relevant contours when the threshold is too high (`threshold > 0.2`);
- keeping too many noisy edges when it is too low (`threshold < 0.13`).
Combine the previous functions to implement a function computing the Canny edges.

Here is one way to do the hysteresis thresholding. Keep a list of the edges whose neighbors you still have to visit. Initialize the list with the edges corresponding to the most discriminative (high) threshold. For each edge you are sure of (i.e. in the list), check whether its neighbors are considered edges using the less discriminative (low) threshold. If they are, add them to the output edges and to the list of edges to visit.
def canny_edges(I, sigma=3, low=0.1, high=0.4, binary_output=True):
gradient_hor, gradient_vert, gradient_norm = compute_gradient(I, sigma=sigma)
# Non-maximum suppression + suppress the edges which are below the low threshold
gradient_nms = nms([gradient_hor, gradient_vert], gradient_norm, threshold=low, binary_output=False)
shape = gradient_nms.shape
# As binary_output=False when calling nms in gradient_nms,
# no need to resort to nms again to compute the "high" edges (the following suffices)
output = gradient_nms * (gradient_nms > high)
# Initialize the list with the edges corresponding to the high threshold (the "high" edges)
stack = [(i, j) for (i, j), p in np.ndenumerate(gradient_nms) if p > high]
already_seen = set(stack)
def neighbors(ij):
i0, j0 = ij
# considering only unvisited neighboring edges
indices = set((i, j) for i in range(i0-1, i0+2) for j in range(j0-1, j0+2)\
if 0 <= i < shape[0] and 0 <= j < shape[1] and (i, j) not in already_seen)
return indices
while stack:
ij = stack.pop()
neigh = neighbors(ij)
for ind in neigh:
# check if the neighbor is a "low" edge using the low threshold
# NB: gradient_nms[ind] <= high necessarily holds here, since all the "high" edges
# are already in already_seen (and thus never returned by neighbors(ij))
if low < gradient_nms[ind]:
output[ind] = gradient_nms[ind]
stack.append(ind)
already_seen.update(neigh)
return output if not binary_output else output.astype('bool').astype('float') # binary image if wanted
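For reference, here is an alternative sketch (not the method described above) that performs the hysteresis step with `ndimage.label`: keep every connected component of the low-threshold edge map that contains at least one high-threshold pixel. The name `hysteresis_with_label` and its arguments are introduced here for illustration only.

# Sketch: hysteresis via connected components (8-connectivity)
def hysteresis_with_label(edges_low, edges_high):
    # edges_low, edges_high: boolean maps, with the high-threshold edges included in the low ones
    labels, num = ndimage.label(edges_low, structure=np.ones((3, 3)))
    keep = np.unique(labels[edges_high])   # component ids touching at least one "high" pixel
    keep = keep[keep != 0]                 # label 0 is the background
    return np.in1d(labels, keep).reshape(labels.shape)

gh, gv, gn = compute_gradient(I)
g = nms([gh, gv], gn, threshold=0.01, binary_output=False)
plt.imshow(hysteresis_with_label(g > 0.01, g > 0.15))
plt.show()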
@interact(sigma=widgets.FloatSlider(min=0.1,max=6,step=0.1,value=2),\
low=widgets.FloatSlider(min=0.,max=0.5,step=0.01,value=0.01),\
high=widgets.FloatSlider(min=0.1,max=0.7,step=0.01,value=0.15))
def show_canny(sigma, low, high):
plt.figure(figsize = (10., 10.))
plt.imshow(canny_edges(I, sigma=sigma, low=low, high=high))
plt.show()
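As an optional cross-check (a sketch, run only if scikit-image happens to be installed; its thresholds are on a different scale, so expect qualitatively similar rather than identical edges):

# Sketch: qualitative comparison with scikit-image's Canny (skipped if the library is missing)
try:
    from skimage import feature
    plt.figure(figsize=(10., 10.))
    plt.imshow(feature.canny(I, sigma=2))
    plt.show()
except ImportError:
    print("scikit-image is not installed")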
#### Influence of `sigma`

- A too small `sigma` (`sigma < 2`) results in too much noise (not eliminated because the gaussian standard deviation is too small).
- A too large `sigma` (`sigma > 4`) results in less relevant edges: as the contours in the blurred image are thicker, the non-maximum suppression may select pixels stemming from the blurring (for a given edge, it only keeps the brightest pixel).

#### Influence of the `high` and `low` thresholds

- Choosing the `high` threshold is a compromise between losing too many outlining edges (threshold too high) and increasing the number of noise edges (threshold too low).
- Given a `high` threshold, finding a good `low` threshold is a compromise between "reasonably" extending the main "high" edges (those stemming from the high threshold) and extending them to the point where unwanted noise is added.