CS180 Project 4
Part 1: Shoot the Pictures
In this section, I took several sets of images for image mosaics and rectifications. For creating image mosaics, I took two images from different angles (left first and then right) using my phone. To ensure similar exposure throughout the set, I used exposure lock on my phone (press and hold the focus box until it shows AE lock). Unfortunately, I only used exposure lock for the daytime images. My nighttime image set was taken without the exposure lock, causing the mosaiced result to have a very obvious seam due to the exposure difference. For rectification, I took images of a quadrilateral object from an angle, so that the object appeared slanted and distorted in the image. Then, I used a series of operations to rectify the image, i.e. change the perspective such that the object appears square/rectangular in the image. Here are some examples of the photos taken:
Images for Image Mosaic
Images for Rectification
Part 2: Recover Homographies
Here we want to recover the homographies between the images. To do this, we first need to define a set of correspondences between the images. Then, we can use the correspondences to calculate the homographic transformation matrix.
Imagine we have one pair of corresponding points between the images. Here is how the homographic transformation matrix acts on it: \[ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \\ \end{bmatrix} = \begin{bmatrix} wx' \\ wy' \\ w \\ \end{bmatrix} \] Notice that the last row of the matrix multiplication gives \[ \begin{bmatrix} g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \\ \end{bmatrix} = w \quad\Longrightarrow\quad gx + hy + 1 = w \] Substituting this expression for \(w\), the system of equations becomes \[ \begin{align*} \begin{bmatrix} ax + by + c \\ dx + ey + f \\ gx + hy + 1 \end{bmatrix} &= \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} (gx + hy + 1)x' \\ (gx + hy + 1)y' \\ gx + hy + 1 \end{bmatrix} \end{align*} \] The last row is now redundant, so we drop it and move the terms involving \(g\) and \(h\) to the left-hand side: \[ \begin{align*} ax + by + c - gxx' - hyx' &= x' \\ dx + ey + f - gxy' - hyy' &= y' \end{align*} \] which in matrix form is \[ \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \] This is the system of equations that, if solved with a linear solver, gives us the elements of the transformation matrix. Since there are 8 unknowns, we need at least 8 equations, and each pair of corresponding points gives us two, so we need at least four correspondences. We simply stack each pair of equations on top of each other to obtain a fully determined system: \[ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\ x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3x_3' & -y_3x_3' \\ 0 & 0 & 0 & x_3 & y_3 & 1 & -x_3y_3' & -y_3y_3' \\ x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4x_4' & -y_4x_4' \\ 0 & 0 & 0 & x_4 & y_4 & 1 & -x_4y_4' & -y_4y_4' \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \\ x_4' \\ y_4' \end{bmatrix} \] However, to improve numerical stability, we usually provide more than four correspondences, which makes the linear system overdetermined. One downside is that an overdetermined system rarely has an exact solution, so instead we find the best fit by solving it as a least-squares optimization problem.
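As a concrete sketch of this setup (the function name compute_H and the (N, 2) point layout are my own conventions, not something specified by the project), the stacked system can be built and solved with NumPy's least-squares routine, which handles both the exact 4-point case and the overdetermined case:

```python
import numpy as np

def compute_H(pts1, pts2):
    """Estimate the 3x3 homography mapping pts1 -> pts2.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # One correspondence contributes the two rows derived above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    A, b = np.array(A, dtype=float), np.array(b, dtype=float)
    # Least squares: exact for 4 correspondences, best fit for more.
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1).reshape(3, 3)
```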
Part 3: Warp the Images
To warp the images according to the homography matrix found using the process described in Part 2, we follow the steps below with some caution (a rough code sketch follows the list):
- Use the homography matrix to find the new corners of the image after the homographic transformation. Note that we need to normalize the coordinates by the last row to turn w into 1.
- Use the new four corners and old four corners to find the smallest bounding box. To do this, we need to find the minimum/maximum x and y coordinates of the 8 points. Create the final canvas according to this shape.
- The new canvas might not be centered around (0, 0), so we need to shift the bottom-left corner of the image to (0, 0).
- Get all points enclosed within the new four corners using the polygon function in Python. Use the inverse homography matrix to transform the points back into the original image. Finally, use griddata to interpolate at the non-integer coordinates. This process should be similar to the one in project 3.
- Now we have warped the image into the perspective described by the homography matrix.
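Here is a rough sketch of those steps for a single-channel image (variable names are my own; for simplicity this sketch sizes the canvas from the warped corners only, and it returns the shift so the caller can later place the result on a larger mosaic canvas that also covers the other image's corners):

```python
import numpy as np
from skimage.draw import polygon
from scipy.interpolate import griddata

def warp_image(im, H):
    """Inverse-warp a grayscale image im by the homography H."""
    h, w = im.shape[:2]
    # Step 1: push the four corners (x, y, 1) through H and divide by w.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    # Step 2: bounding box of the warped corners defines the output canvas.
    min_x, min_y = np.floor(warped.min(axis=1)).astype(int)
    max_x, max_y = np.ceil(warped.max(axis=1)).astype(int)
    out_h, out_w = max_y - min_y + 1, max_x - min_x + 1
    # Step 3: shift so the canvas starts at (0, 0).
    shift = np.array([min_x, min_y])
    # Step 4: rasterize the warped quadrilateral, then map those pixels back
    # into the source image with the inverse homography.
    rr, cc = polygon(warped[1] - min_y, warped[0] - min_x, shape=(out_h, out_w))
    dest = np.stack([cc + min_x, rr + min_y, np.ones_like(rr)])
    src = np.linalg.inv(H) @ dest
    src = src[:2] / src[2]
    # Step 5: interpolate the source image at the non-integer coordinates
    # (simple but slow; griddata triangulates the whole source grid).
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros((out_h, out_w))
    out[rr, cc] = griddata((xs.ravel(), ys.ravel()), im.ravel(),
                           (src[0], src[1]), method="linear", fill_value=0)
    return out, shift
```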
Part 4: Image Rectification
To rectify the images, we need to define the four corners of the rectangular object and find the correspondences between these points and the four corners of a true rectangle. Then we find the homography matrix and warp the image.
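For example (the file name, the clicked pixel coordinates, and the 800 x 600 target size below are made up for illustration; compute_H and warp_image are the sketches from Parts 2 and 3):

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical image, converted to grayscale to match the warp_image sketch.
im = plt.imread("poster.jpg").mean(axis=2)

# Four clicked corners of the slanted object, (x, y), clockwise from top-left.
im_corners = np.array([[412, 310], [1180, 270], [1205, 905], [430, 980]], dtype=float)
# Where those corners should land: a true 800 x 600 rectangle.
rect_corners = np.array([[0, 0], [799, 0], [799, 599], [0, 599]], dtype=float)

H = compute_H(im_corners, rect_corners)
rectified, _ = warp_image(im, H)
```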
Part 5: Blend the images into a mosaic
Here is where the fun begins. I can finally use the sets of images displayed in Part 1 to create three beautiful image mosaics. I first warped each image into the perspective of the second image using the warpImage function from Part 3. Then I found the distance transform of each image's polygon (D1 and D2). Using D1 > D2 and D1 < D2, I found the blending masks for the first and second images. I initially attempted to blend the two images together using plain alpha blending, but the results had strong artifacts at the seams. Therefore, I decided to use a Laplacian stack for blending with the masks described above. Here are the results.
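A minimal sketch of the mask construction (assuming each warped image comes with a boolean coverage mask on the shared canvas; the Laplacian-stack blend itself is omitted here):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_masks(alpha1, alpha2):
    """Build binary blending masks from the two warped images' coverage masks.

    alpha1, alpha2: boolean arrays (same canvas size) marking which pixels
    each warped image actually covers.
    """
    # Distance from each covered pixel to the nearest edge of its polygon.
    D1 = distance_transform_edt(alpha1)
    D2 = distance_transform_edt(alpha2)
    # Each pixel is assigned to whichever image it lies "deeper" inside.
    mask1 = D1 > D2
    mask2 = ~mask1
    return mask1, mask2
```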
Here is a set of extra images.
Here is a failed attempt to blend the images together. The main reason for this was the difference in exposure between the left and right images, which caused an obvious seam in the resulting image.
Part 6: Conclusion
From this part of the project, I learned that using exposure lock is extremely important for blending images, especially for images taken at night. In addition, while it was extremely difficult to warp the images correctly, the debugging process was phenomenal and reminded me of the power of pen-and-paper debugging.
Autostitching
In this part of the project, I'm going to stitch together a mosaic using automatic approaches.
Part 7: Adaptive Non-Maximal Suppression and Harris Interest Point Detector
I first found Harris corners using the provided code. Then, I used the adaptive non-maximal suppression algorithm specified in the paper. At first, I implemented the algorithm naively with two for loops. Unfortunately, on my machine it took around 10 minutes to process one image, which forced me to think of ways to optimize the process. Using dist2 and a vector outer product, I removed the need for any for loop and achieved a dramatic performance improvement (from 10 minutes to under a second per image).
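Here is a minimal sketch of the vectorized version (function and parameter names are mine; scipy's cdist stands in for the provided dist2, and 0.9 is the robustness constant suggested in the paper):

```python
import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression, fully vectorized.

    coords: (N, 2) corner coordinates; strengths: (N,) Harris responses.
    """
    dists = cdist(coords, coords)                      # (N, N) pairwise distances
    # stronger[i, j] is True when corner j is significantly stronger than i.
    stronger = strengths[None, :] * c_robust > strengths[:, None]
    dists = np.where(stronger, dists, np.inf)          # ignore weaker neighbors
    radii = dists.min(axis=1)                          # suppression radius per corner
    keep = np.argsort(-radii)[:n_keep]                 # largest radii win
    return coords[keep]
```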
Harris Corner Detection
Adaptive Non-Maximal Suppression
Look! We have removed so many spurious points from the images!
Part 8: Feature Descriptor Extraction
Now that we have a manageable number of corners, we can extract feature descriptors for them. First, for each point, I took the neighboring 20 pixels in each direction (a 40 x 40 window with the corner point at the center). Corners that were too close to the edge were simply discarded, although this was not strictly necessary because the Harris corner function only considers points at least 20 pixels away from any edge. After identifying the windows, I rescaled them to 8 x 8 by applying a Gaussian blur and downsampling. All of this was done in a 3-channel color space. Here are some sample results for the car photo.
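A sketch of that extraction (the blur sigma, the (row, col) corner format, and the function name are my own assumptions; channel_axis requires a recent scikit-image):

```python
import numpy as np
from skimage.filters import gaussian

def extract_descriptors(im, corners, radius=20, out_size=8):
    """Pull a 40x40 color patch around each corner and shrink it to 8x8."""
    step = 2 * radius // out_size              # 40 / 8 = 5: keep every 5th pixel
    descs = []
    for r, c in corners:
        r, c = int(r), int(c)
        patch = im[r - radius:r + radius, c - radius:c + radius]
        if patch.shape[:2] != (2 * radius, 2 * radius):
            continue                           # too close to the border; discard
        # Blur first so the coarse subsampling below does not alias.
        patch = gaussian(patch, sigma=2, channel_axis=-1)
        descs.append(patch[::step, ::step].ravel())
    return np.array(descs)
```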
Part 9: Feature Matching
Let's match feature descriptors together! For each feature in image 1, we find the nearest and second-nearest neighboring features in image 2, using Euclidean distance as the criterion. After finding the two nearest neighbors, we keep only truly similar pairs by applying the Lowe ratio test, where we keep a pair only if \[ \frac{\text{dist to nn1}}{\text{dist to nn2}} < \text{threshold} \] After iterating through all the feature descriptors in image 1 (which automatically pairs them with descriptors in image 2), we have all the valid pairings in our hands.
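A minimal sketch of this matching step (the 0.6 threshold is a typical value for the ratio test, not necessarily the one used for the results shown here):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, threshold=0.6):
    """Match descriptors with Lowe's ratio test; returns (i, j) index pairs."""
    dists = cdist(desc1, desc2)                  # (N1, N2) Euclidean distances
    order = np.argsort(dists, axis=1)
    nn1, nn2 = order[:, 0], order[:, 1]          # nearest and second-nearest in image 2
    rows = np.arange(len(desc1))
    ratio = dists[rows, nn1] / dists[rows, nn2]
    keep = ratio < threshold                     # Lowe ratio test
    return np.stack([rows[keep], nn1[keep]], axis=1)
```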
Part 10: RANSAC
To find a good homography matrix, we need to use RANSAC. For n iterations (defaulting to 1000), we conduct the following operations.
- Randomly sample 4 feature pairs (done by sampling 4 indices and indexing into the feature arrays).
- Compute an exact homography from the chosen points (with the matrix inverse).
- Compute the number of inliers by checking whether the Euclidean distance between the target points and the transformed points is less than a threshold (defaulting to 1 pixel).
- Update the largest set of inliers.
This way, we end up with the largest set of points (inliers) that agree with one homography. To find that homography H, we use the least-squares solver, since the number of inliers is most likely larger than 4. From here, we can use this H to transform the images using the process described in the first half of the project. A rough sketch of the whole loop is shown below.
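In this sketch, compute_H is the least-squares solver from Part 2, which returns the exact fit when handed exactly 4 points (standing in for the matrix-inverse step above); the parameter names and defaults are my own:

```python
import numpy as np

def ransac_H(pts1, pts2, n_iters=1000, eps=1.0):
    """RANSAC over matched points pts1 -> pts2, both (N, 2) in (x, y) order."""
    N = len(pts1)
    pts1_h = np.hstack([pts1, np.ones((N, 1))]).T    # 3 x N homogeneous points
    best_inliers = np.zeros(N, dtype=bool)
    for _ in range(n_iters):
        # Sample 4 correspondences and fit them exactly.
        idx = np.random.choice(N, 4, replace=False)
        H = compute_H(pts1[idx], pts2[idx])
        # Transform all points and measure reprojection error.
        proj = H @ pts1_h
        proj = (proj[:2] / proj[2]).T
        errors = np.linalg.norm(proj - pts2, axis=1)
        inliers = errors < eps
        # Keep the largest inlier set seen so far.
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit one homography to the full inlier set with least squares.
    return compute_H(pts1[best_inliers], pts2[best_inliers]), best_inliers
```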
Here are the pairs of inliers after RANSAC.
Part 11: Auto Stitching Results
Here are some successful results using auto stitching.
For the plushie example, here are the original images.
Here are some failed results using auto stitching.
Part 12: Conclusion
This project taught me a lot about the need to vectorize operations. I could not have finished the project without vectorizing the ANMS algorithm. I also learned how powerful RANSAC is. Originally, I mistakenly used two independent sets of samples from the image 1 features and the image 2 features to find inliers, which meant I was using random, unmatched points to compute the homographies. Somehow RANSAC still produced relatively reasonable results. I am fully impressed by how powerful RANSAC is.