TP1: Canny Edges

Read, understand, complete, and run the following notebook. You must return the completed notebook, including your answers and illustrations (you may need to add cells to write your code or comments).

To execute a notebook, you will need to install jupyter. If you are using anaconda (strongly advised), jupyter is already included. If you cannot or don't want to use notebooks, you can return both your python code and a report in pdf. Note that all the TPs will use python2. If you are using python3, you might need to make a few changes to the provided code.

Return your work by e-mail using a single file (ipynb or zip) with the format 'introvis17_tp1_yourname.ipynb'

The first part is about doing basic image processing in Python; the second breaks down the different steps of implementing the Canny edge detector.

1. Basic image processing in Python

Here is a set of useful libraries for image processing. You will need to refer to the online documentation of the different libraries to complete the TP.

In [1]:
import numpy as np 
# this is the key library for manipulating arrays. Use the online resources! http://www.numpy.org/

import matplotlib.pyplot as plt 
# used to read images, display and plot http://matplotlib.org/api/pyplot_api.html . 
# You can also check this simple intro to using ipython notebook with images 
# https://matplotlib.org/users/image_tutorial.html .

%matplotlib inline 
# to display directly in the notebook

import scipy.ndimage as ndimage
# one of several python libraries for image processing

plt.rcParams['image.cmap'] = 'gray' 
# by default, the grayscale images are displayed with the jet colormap: use grayscale instead

Here is a simple example of how to read and display an image (you need to first download this image)

In [2]:
name='lena.jpg'
I=plt.imread(name)
plt.imshow(I)
Out[2]:
<matplotlib.image.AxesImage at 0x1814292320>

a- What's the size of the image I? Its type? The type of the values it contains? What's its range?

In [3]:
# complete this part. 
# Tip: look for numpy attributes (e.g. https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) 
# and functions to do what you want (e.g. np.min)


# /!\ note: I will be using Python 3
from IPython.display import display, Markdown

# data type range of numpy dtypes
dtype_range = {np.bool_.__name__: (False, True),
               np.bool8.__name__: (False, True),
               np.uint8.__name__: (0, 255),
               np.uint16.__name__: (0, 65535),
               np.int8.__name__: (-128, 127),
               np.int16.__name__: (-32768, 32767),
               np.int64.__name__: (-2**63, 2**63 - 1),
               np.uint64.__name__: (0, 2**64 - 1),
               np.int32.__name__: (-2**31, 2**31 - 1),
               np.uint32.__name__: (0, 2**32 - 1),
               np.float32.__name__: (-1, 1),
               np.float64.__name__: (-1, 1)}  # (-1, 1) is the conventional range for float images


display(Markdown("- **Size of `I`:** `{}` \n"\
                 "- Its *type*: `{}` \n"\
                 "- The **type of its values**: `{}` \n"\
                 "    - whose **range** is: `{}` \n"\
                 "- Its **shape**: `{}`"\
      .format(I.size, type(I).__name__, I.dtype, dtype_range[str(I.dtype)], I.shape)))
  • Size of I: 786432
  • Its type: ndarray
  • The type of its values: uint8
    • whose range is: (0, 255)
  • Its shape: (512, 512, 3)
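
As a quicker cross-check, the actual range of the values stored in I (as opposed to the theoretical range of its dtype) can be obtained directly with np.min and np.max:

In [ ]:
# direct check of the actual pixel values of I (not just the dtype's theoretical range)
print(I.shape, I.dtype, type(I).__name__)
print(np.min(I), np.max(I))  # actual minimum and maximum of the pixel values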

b- Here is a simple function to load an image and crop it according to a bounding box. Make sure you understand every line.

In [4]:
def load_image_v0(name,crop_window=-1): 
    I=plt.imread(name)
    if crop_window!=-1:
        I=I[crop_window[0]:crop_window[1],crop_window[2]:crop_window[3]] # slices the image according to crop_window
    I=I.astype('float')/255
    return I

What is the line I=I.astype('float') doing? Use the function to extract and save the 100x50 pixels of the bottom left corner of 'lena'.

The line I=I.astype('float') casts the values of I to floating point, so that the division performed afterwards is a floating-point division; the resulting float image takes values in the range [0, 1].

In [5]:
dh, dw = 100, 50 # size of the bottom-left cropped image 
height, width, _ = I.shape 
crop_window = [height-dh, height, 0, dw]

I_cropped = load_image_v0(name, crop_window)
plt.imshow(I_cropped)
Out[5]:
<matplotlib.image.AxesImage at 0x181457f240>
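
The statement also asks to save the crop; a minimal sketch using plt.imsave (the output filename is my choice):

In [ ]:
# save the bottom-left crop to disk; the filename is arbitrary
plt.imsave('lena_bottom_left_crop.png', I_cropped)
print(I_cropped.shape)  # expected: (100, 50, 3)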

c- Modify the following function so that it can load an RGB image and convert it to a grayscale image using the formula L = 0.2989*R + 0.5870*G + 0.1140*B, where R, G and B are the three color channels of the color image and L is the luminance (i.e. the grayscale image). Include a test so that your function also works if the input image is grayscale.

In [6]:
def load_image(name,crop_window=-1): 
    I = load_image_v0(name, crop_window)
    
    # if the input is already grayscale,
    # i.e. if it's a 2D array, just return it
    if I.ndim == 2:
        return I
    else: 
        # it's a color image (3D array: there's an additional dimension for the RGB coordinates) 
        # The following dot product is a weighted sum over the RGB coordinates (last axis) of I, 
        #  whose weights are the values of [0.2989, 0.5870, 0.1140], which is what we want
        # (cf https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html)
        return np.dot(I, np.array([0.2989, 0.5870, 0.1140]))

I_cropped_grayscaled = load_image(name, crop_window)
plt.imshow(I_cropped_grayscaled)
Out[6]:
<matplotlib.image.AxesImage at 0x1814614160>
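
As a quick check of the np.dot weighted sum, the same luminance can be computed by combining the three channels explicitly (a sketch, assuming the input is a 3-channel image):

In [ ]:
# verify that the np.dot weighted sum matches an explicit channel combination
I_rgb = load_image_v0(name)  # float RGB image in [0, 1]
L_dot = np.dot(I_rgb, np.array([0.2989, 0.5870, 0.1140]))
L_explicit = 0.2989*I_rgb[..., 0] + 0.5870*I_rgb[..., 1] + 0.1140*I_rgb[..., 2]
print(np.allclose(L_dot, L_explicit))  # expected: True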

Use your function to load a grayscale version of lena and save it.

In [7]:
I = load_image(name)
plt.imshow(I)
Out[7]:
<matplotlib.image.AxesImage at 0x1814ebb898>
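
To also save the grayscale image as requested, plt.imsave can be used (the filename and the explicit cmap='gray' are my choices):

In [ ]:
# save the grayscale version of lena to disk; the filename is arbitrary
plt.imsave('lena_gray.png', I, cmap='gray')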

d- Here is an example of how to convolve an image with a Gaussian. What is the size of the output? What are the border conditions? Replace the border conditions with zero padding and comment on the result.

In [8]:
Ib = ndimage.gaussian_filter(I, sigma=3, order=1)

display(Markdown("> The size of the output **is** {} the same as the input"\
      .format("**not**"*(Ib.size != I.size))))

plt.imshow(Ib)

The size of the output is the same as the input

Out[8]:
<matplotlib.image.AxesImage at 0x1815b47748>
  1. The output is of the same shape (and size) as the input.
  2. The border conditions are determined by the mode parameter, whose default value is reflect. That is, when applying the filter near the borders, the values of the last rows/columns are reflected across the image's boundary.
  3. To replace the border conditions with zero padding, the mode parameter has to be set to constant (constant padding), the constant value (set by the cval parameter) being 0.0 by default.
In [9]:
Ib = ndimage.gaussian_filter(I, sigma=3, mode='constant')
plt.imshow(Ib)
Out[9]:
<matplotlib.image.AxesImage at 0x181634fcc0>

The black outline is due to the border conditions being set to zero padding: near the borders, the pixels of the Gaussian window that fall outside the boundary are black (constant value 0), which darkens the weighted average.
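
To make the influence of the border conditions more visible, here is a small comparison of a few of the modes documented for scipy.ndimage filters (the choice of modes and the large sigma are mine, to exaggerate the effect):

In [ ]:
# side-by-side comparison of several border modes of ndimage.gaussian_filter
fig = plt.figure(figsize=(15., 5.))
for index, mode in enumerate(['reflect', 'constant', 'nearest']):
    fig.add_subplot(1, 3, index+1).set_title("mode = '{}'".format(mode))
    plt.imshow(ndimage.gaussian_filter(I, sigma=10, mode=mode))
    plt.axis('off')
plt.show()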

e- What does the following code do?

In [10]:
k = np.array([[-1./9, -1./9, -1./9],
               [-1./9, 2.-1/9., -1./9],
               [-1./9, -1./9, -1./9]])
Is=ndimage.convolve(I, k)
plt.imshow(Is)
Out[10]:
<matplotlib.image.AxesImage at 0x18165fdc18>

As seen in class (page 56), this linear filter sharpens the image (by accentuating the difference with the local average):

dirac_two = np.zeros((3, 3))
dirac_two[1,1] = 2
k = dirac_two - 1./9 * np.ones((3, 3))

$$k = \underbrace{\begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \\ \end{pmatrix}\;}_{\text{accentuation}} \;\; - \; \underbrace{\begin{pmatrix} \frac 1 9 & \frac 1 9 & \frac 1 9 \\ \frac 1 9 & \frac 1 9 & \frac 1 9 \\ \frac 1 9 & \frac 1 9 & \frac 1 9 \end{pmatrix}\;}_{\text{local average}}$$
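
A quick sanity check of this decomposition: since the kernel sums to 1 (2 - 9·1/9 = 1), a flat region is left unchanged by the filter, while local variations are amplified:

In [ ]:
# the sharpening kernel sums to 1, so constant patches are preserved
print(np.isclose(k.sum(), 1.0))                      # expected: True
flat = 0.5 * np.ones((5, 5))
print(np.allclose(ndimage.convolve(flat, k), flat))  # a constant patch is unchanged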

f- Write a function compute_gradient that returns three images containing an image derivative along the horizontal direction, the vertical direction, and the gradient norm. It should use forward differences, and take an optional argument sigma that gives the scale of the derivative. What is the size of the output compared to the size of the input?

In [11]:
def compute_gradient(I, sigma=3):
    # derivative along horizontal (resp. vertical) direction:
    # we use the fact that d/dx(G * f) = (d/dx G) * f (resp. d/dy(G * f) = (d/dy G) * f)
    # where G is a gaussian kernel
    I_horiz = ndimage.gaussian_filter1d(I, sigma, order=1, axis=1)
    I_vert = ndimage.gaussian_filter1d(I, sigma, order=1, axis=0)
    
    # ||grad f|| = sqrt((df/dx)^2 + (df/dy)^2)
    I_grad = np.sqrt(I_vert**2 + I_horiz**2)
    
    return I_horiz, I_vert, I_grad

for img, label in zip(compute_gradient(I), ["Horizontal derivative", "Vertical derivative", "Gradient norm"]):
    display(Markdown("#### {}".format(label)))
    plt.imshow(img)
    plt.show()
    display(Markdown("> The size of the output **is** {} the same as the input\n \n_______"\
      .format("**not**"*(img.size != I.size))))

Horizontal derivative

The size of the output is the same as the input


Vertical derivative

The size of the output is the same as the input


Gradient norm

The size of the output is the same as the input
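
For comparison with the Gaussian-derivative approach used above, the plain forward differences mentioned in the statement could be written with simple array slicing (a sketch; padding the last row/column with zeros so the output keeps the input's size is my choice):

In [ ]:
# gradient by plain forward differences (no Gaussian smoothing), for reference
def compute_gradient_forward(I):
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    Ix[:, :-1] = I[:, 1:] - I[:, :-1]   # horizontal forward difference
    Iy[:-1, :] = I[1:, :] - I[:-1, :]   # vertical forward difference
    return Ix, Iy, np.sqrt(Ix**2 + Iy**2)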


g- Visualize the three images returned by your function for different values of sigma. Comment.

Tip: if you want to display several images in a single cell, use plt.show() after each call to plt.imshow(...)

In [12]:
sigma_range = range(1, 5)

for sigma in sigma_range:
    display(Markdown("#### `sigma = {}`".format(sigma)))

    fig = plt.figure(figsize = (15., 15.))
    
    # for sigma=sigma, retrieve the images returned by compute_gradient
    # and add them to a subplot
    for (index, img), label in zip(enumerate(compute_gradient(I, sigma=sigma)), \
                                   ["Horizontal derivative", "Vertical derivative", "Gradient norm"]):
        fig.add_subplot(1, 3, index+1)\
           .set_title(label)
        plt.imshow(img)
        plt.axis('off')
    plt.show()
    display(Markdown("\n \n_______"))

sigma = 1


sigma = 2


sigma = 3


sigma = 4


The larger sigma is, the blurrier the image is and the thicker the contours are, since the Gaussian standard deviation (the scale of the derivative) is larger.

2. Canny edges

a- Compute a binary image corresponding to thresholding the norm of the gradient. Discuss the parameters (there are two) and their influence on the results.

In [13]:
def gradient_normalized_01(I, sigma=3):
    # compute the gradient norm array
    # and have its values range between 0. and 1.
    I_gradient = compute_gradient(I, sigma)[2]
    min_val, max_val= np.min(I_gradient), np.max(I_gradient)
    I_gradient = (I_gradient - min_val)/(max_val - min_val)
    return I_gradient



plt.hist(gradient_normalized_01(I).flatten())
plt.title("Histogram: values of pixels of the gradient norm, normalized to range 0-1 ")
plt.show()

def binary_threshold(I, threshold=0.5, sigma=3):
    # 0 <= threshold <= 1: applied to the gradient norm normalized to the range 0-1
    # Blackening the pixels whose value <= threshold (by setting them to 0.)
    # Whitening those whose value > threshold (by setting them to 1.)
    return (gradient_normalized_01(I, sigma) > threshold).astype('float')


sigma_range = range(1, 6)
thresholds = [0.1, 0.2, 0.4, 0.6]

len_sigma_range = len(sigma_range)

for threshold in thresholds:
    display(Markdown("#### `threshold = {}`".format(threshold)))
    fig = plt.figure(figsize = (17., 17.))

    for index, sigma in enumerate(sigma_range):
        fig.add_subplot(1, len_sigma_range, index+1)\
           .set_title("sigma = {}".format(sigma))
        plt.imshow(binary_threshold(I, threshold=threshold, sigma=sigma))
        plt.axis('off')
        
    plt.show()
    display(Markdown("\n \n_______"))

threshold = 0.1


threshold = 0.2


threshold = 0.4


threshold = 0.6


  1. The greater the threshold is, the more edges are filtered out (those whose gradient norm, normalized to the range 0-1, is below the threshold).
  2. The larger sigma is:

    • the more pixels are marked as edges (the edges in the blurred image are thicker, since we average over a larger neighborhood)
    • and the thicker those edges are

b- Write a function nms(gradient,gradient_norm,threshold) which takes as input the gradient, its norm and a threshold and outputs a binary image with value 1 only for pixels that correspond to a maximum in the direction of the gradient.

In [119]:
def nms(gradient, gradient_norm, threshold, binary_output=True):
    # normalize the gradient norm to range 0-1 and threshold it
    min_val, max_val = np.min(gradient_norm), np.max(gradient_norm)
    gradient_norm = (gradient_norm - min_val)/(max_val - min_val)
    gradient_norm *= (gradient_norm > threshold).astype('float')
        
    def angle_to_orientation(angle):
        angle_pi8 = 8*angle/np.pi
        # gradient's direction: east -- west
        if -1 <= angle_pi8 < 1 or 7 <= angle_pi8 or angle_pi8 < -7:
            return np.array([1, 0])
        
        # north-east -- south-west
        if 1 <= angle_pi8 < 3 or -7 <= angle_pi8 < -5:
            return np.array([1, 1])
        
        # north -- south
        if 3 <= angle_pi8 < 5 or -5 <= angle_pi8 < -3:
            return np.array([0, 1])
        
        # north-west -- south-east
        if 5 <= angle_pi8 < 7 or -3 <= angle_pi8 < -1:
            return np.array([1, -1])
        
    gradient_angle = np.arctan2(gradient[0], gradient[1])
    
    shape = np.array(gradient_norm.shape)
    output = np.zeros(shape)
    
    for ij, p in np.ndenumerate(gradient_norm):
        if p == 0:
            continue
        # if the pixel p is not black, compare it to its surrounding values along the gradient direction
        vec = angle_to_orientation(gradient_angle[ij])
        if (np.logical_or(ij + vec < 0, ij + vec >= shape).any() or p>=gradient_norm[tuple(ij + vec)])\
        and (np.logical_or(ij - vec < 0, ij - vec >= shape).any() or p>=gradient_norm[tuple(ij - vec)]):
            output[ij] = 1 if binary_output else gradient_norm[ij]
            # binary_output (put to False) will be useful later in canny_edges,
            # not to call nms two times
    return output
        

gradient_hor, gradient_vert, gradient_norm = compute_gradient(I)
plt.imshow(nms([gradient_hor, gradient_vert], gradient_norm, 0.2))
plt.show()

c- Test your function on Lena for different values of the threshold, comment the results.

In [120]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

gradient_hor, gradient_vert, gradient_norm = compute_gradient(I)

@interact(threshold=widgets.FloatSlider(min=0.0,max=0.8,step=0.01,value=0.15))

def show_nms(threshold):
    plt.figure(figsize = (10., 10.))
    plt.imshow(nms([gradient_hor, gradient_vert], gradient_norm, threshold))
    plt.show()

The higher the threshold is, the fewer edges remain. We have to find a compromise between:

  • a higher threshold, which loses an increasing number of important edges (those outlining the big picture): particularly for threshold > 0.2
  • a lower threshold, which keeps an increasing number of noise edges: especially for threshold < 0.13

d- Add the hysteresis thresholding to c to implement a function computing the Canny edges.

Here is one way to do the hysteresis thresholding. Keep a list of the edges whose neighbors you still have to visit. Initialize the list with the edges obtained with the most discriminative (high) threshold. For each edge you are sure of (i.e. in the list), check whether its neighbors are considered edges with the less discriminative (low) threshold. If they are, add them to the output edges and to the list of edges to visit.

In [116]:
def canny_edges(I, sigma=3, low=0.1, high=0.4, binary_output=True):
    gradient_hor, gradient_vert, gradient_norm = compute_gradient(I, sigma=sigma)
    
    # Non-maximum suppression + suppress the edges which are below the low threshold
    gradient_nms = nms([gradient_hor, gradient_vert], gradient_norm, threshold=low, binary_output=False)
    shape = gradient_nms.shape
    
    # As binary_output=False when calling nms in gradient_nms,
    # no need to resort to nms again to compute the "high" edges (the following suffices)
    output = gradient_nms * (gradient_nms > high)
    
    # Initialize the list with the edges corresponding to the high threshold (the "high" edges)
    stack = [(i, j) for (i, j), p in np.ndenumerate(gradient_nms) if p > high]
    already_seen = set(stack)
    
    def neighbors(ij):
        i0, j0 = ij
        # considering only unvisited neighboring edges
        indices = set((i, j) for i in range(i0-1, i0+2) for j in range(j0-1, j0+2)\
                             if 0 <= i < shape[0] and 0 <= j < shape[1] and (i, j) not in already_seen)
        return indices
    
    while stack:
        ij = stack.pop()
        neigh = neighbors(ij)        
        for ind in neigh:
            # check if the neighbor is a "low" edge using the low threshold
            # NB: gradient_nms[ind] <= high necessarily holds here, since all the "high" edges
            # are already in already_seen (and therefore never returned by neighbors(ij))
            if low < gradient_nms[ind]:
                output[ind] = gradient_nms[ind]
                stack.append(ind)        
        already_seen.update(neigh)

    return output if not binary_output else output.astype('bool').astype('float') # binary image if wanted
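
A static call of canny_edges can serve as a quick sanity check (useful if ipywidgets is not available); the parameter values below are just one example:

In [ ]:
# non-interactive sanity check of canny_edges; parameter values are arbitrary examples
plt.figure(figsize=(10., 10.))
plt.imshow(canny_edges(I, sigma=2, low=0.05, high=0.2))
plt.show()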

e- Use your algorithm with different parameters and comment on their effects.

In [115]:
@interact(sigma=widgets.FloatSlider(min=0.1,max=6,step=0.1,value=2),\
          low=widgets.FloatSlider(min=0.,max=0.5,step=0.01,value=0.01),\
          high=widgets.FloatSlider(min=0.1,max=0.7,step=0.01,value=0.15))

def show_canny(sigma, low, high):
    plt.figure(figsize = (10., 10.))
    plt.imshow(canny_edges(I, sigma=sigma, low=low, high=high))
    plt.show()

sigma

  • too low a sigma (sigma < 2) results in too much noise (not smoothed out because the Gaussian standard deviation is too small)
  • too high a sigma (sigma > 4) results in fewer relevant edges: since the contours in the blurred image are thicker, the non-maximum suppression may keep pixels produced by the blurring itself (for a given edge, it only keeps the brightest pixel along the gradient direction).

high and low thresholds

  • finding a good high threshold is a compromise between losing too many outlining edges (threshold too high) and keeping too many noise edges (threshold too low).
  • given a fixed high threshold, finding a good low threshold is a compromise between reasonably extending the main "high" edges (those found with the high threshold) and extending them to the point where unwanted noise is added.