A firm believer that "colored photography was the future," Sergei Mikhailovich Prokudin-Gorskii, a Russian photographer, documented the colorful world by capturing a set of images, one for each of the three color channels: red, blue, and green. The technology of his time could not reconstruct these color-separated photographs into color images. Luckily, with the advent of computer vision, we can, more than 100 years later, reconstruct the color photos and look back into the distant past in vivid color.
First, I implemented an exhaustive search for the low-resolution images. I chose one color (green) as the base layer and aligned each of the other two layers to it, one at a time. To align a layer pair, I first experimented with a naive exhaustive search without any preprocessing: I looped over every shift in the x and y directions, calculated a similarity metric, and kept the shift that produced the best value. I tried both Euclidean distance and normalized cross-correlation (NCC). For Euclidean distance, I kept the shift with the lowest distance, whereas for NCC, I kept the shift with the highest score, because a pair of similar images produces a low distance and a high correlation. Since NCC performed slightly better than Euclidean distance, I used NCC for all images. However, this alone was not enough to align the layers, so I looked into preprocessing each color layer. On closer examination, I found that the ragged borders of the layers were throwing off the alignment. I therefore cropped each layer by 5% of the image height and width on every side. With this change, the image pairs finally aligned properly. Using this finalized algorithm, I was able to stack the aligned layers to produce a visually pleasing color image for most of the small images.
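The search described above can be sketched roughly as follows. This is an illustrative reconstruction, not the original project code: the function names (`crop_border`, `ncc`, `align_exhaustive`), the search radius of 15 pixels, and the use of `np.roll` to apply a shift are all assumptions. Cropping after the shift is also an assumption; it keeps the wrapped-around pixels from `np.roll` out of the score as long as the shift is smaller than the cropped margin.

```python
import numpy as np

def crop_border(img, frac=0.05):
    """Drop `frac` of the height/width from every side (5% in the write-up)."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_exhaustive(layer, base, radius=15):
    """Try every (dy, dx) in [-radius, radius]^2 and keep the highest NCC.

    `radius` is a hypothetical parameter; the write-up does not state the
    search range that was used.
    """
    base_c = crop_border(base)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(layer, (dy, dx), axis=(0, 1))
            score = ncc(crop_border(shifted), base_c)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With green as the base, one would call `align_exhaustive(red, green)` and `align_exhaustive(blue, green)`, apply each returned shift with `np.roll`, and stack the three layers with `np.dstack` to form the color image.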
While the exhaustive search approach could easily handle low-resolution images, it could not process the high-resolution (tif) files efficiently. Enter the image pyramid. Here, I implemented a recursive coarse-to-fine search: the shift is first estimated at the lowest resolution and then refined level by level up to the highest resolution (the image dimensions are halved at each level going down the pyramid), updating the optimal shift while keeping the search window small. The recursive calls start from the highest resolution and descend until they hit the base case, where the image is no larger than 10 by 10 pixels; there, the function finds an optimal shift using the exhaustive search function outlined above. After the base case returns its optimal shift, each level above refines it within a 10 by 10 window centered at the previous optimal shift. I used a 10 by 10 window throughout because the window had to be large enough to exhaustively search the base-case image, since otherwise the algorithm might not find the optimal shift at the lowest resolution, while staying as small as possible for speed. Once the recursion finishes, the returned value is the optimal shift for the original image. At the outset, I tried a 30 by 30 window as the smallest window size, but my program was too slow, so I changed to the current configuration.
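A minimal sketch of such a recursive pyramid is shown here, under several assumptions that are not stated in the write-up: downscaling is done by 2x2 block averaging, the coarse shift is doubled before it is refined at the next level, and the 10 by 10 window is modeled as offsets of up to 5 pixels around the center. All function names are hypothetical.

```python
import numpy as np

def crop_border(img, frac=0.05):
    """Drop `frac` of the height/width from every side before scoring."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search_window(layer, base, center, radius):
    """Exhaustive NCC search over shifts within `radius` of `center`."""
    base_c = crop_border(base)
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(layer, (dy, dx), axis=(0, 1))
            score = ncc(crop_border(shifted), base_c)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downsample(img):
    """Halve both dimensions by averaging 2x2 blocks (an assumed method)."""
    h, w = img.shape
    h2, w2 = h // 2 * 2, w // 2 * 2
    return img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def align_pyramid(layer, base, min_size=10, radius=5):
    """Recursive coarse-to-fine alignment of `layer` onto `base`."""
    # Base case: the image is small enough to search exhaustively.
    if min(base.shape) <= min_size:
        return search_window(layer, base, (0, 0), radius)
    # Recurse on the half-resolution images first.
    coarse = align_pyramid(downsample(layer), downsample(base),
                           min_size, radius)
    # A shift of s pixels at half resolution is about 2s pixels here.
    center = (2 * coarse[0], 2 * coarse[1])
    return search_window(layer, base, center, radius)
```

Because every level searches only a fixed small window, the cost per level grows with the image area rather than with the area times the full shift range, which is what makes the full-resolution tif files tractable.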
The only problem with the exhaustive search and pyramid combination was that it did poorly on cathedral.jpg. After inspecting the picture, I noticed that the blue layer was significantly darker than the other two layers. This, coupled with the rough edges of the grass and the poor image quality, made alignment on the raw image layers difficult. Therefore, I decided to use a Sobel edge detector to reduce the image complexity and even out each layer's intensity. With the edge detector as a preprocessing step, my algorithm successfully aligned the cathedral image.
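For illustration, a Sobel gradient magnitude can be computed with plain NumPy as sketched below. This is not necessarily how the filter was applied in the project (a library routine may well have been used); the function name `sobel_magnitude` and the edge-replicating padding are assumptions.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels.

    The borders are padded by replicating edge pixels (an assumed choice),
    so the output has the same shape as the input.
    """
    img = np.asarray(img, dtype=float)
    p = np.pad(img, 1, mode="edge")
    # Horizontal gradient: right column minus left column, weights 1, 2, 1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1, 2, 1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)
```

Each color layer would be passed through such a filter before alignment, so that the metric compares edge maps instead of raw intensities; the shift found on the edge maps is then applied to the original layers.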
The above pictures showed no discernible differences from the ones created by the regular pyramid method.
The above example illustrates the problem with the regular pyramid on the cathedral image. The version using the edge detector turned out to be much better aligned than the one from the regular approach.
Note that the quality of the waterfall image was not very high. This might be because the water was flowing, so the three layers, taken at different times, did not capture the same still frame.