When detecting edges, the width of the filter matters (for example [-1, 1] vs [-1, 0, 1]). Below is a gif of the effect of the filter width on the edge: Observation: notice that it not only dilates the edges left and right, but also introduces some artifacts at the borders of the image. I imagine this could be an issue for filters like the Gaussian that have a symmetric, odd-sized kernel, versus others like edge detection that might benefit from an even-sized kernel. If using a width-3 edge filter, we need to make sure to divide the output by 2 to normalize the data (its response spans twice the range of the width-2 filter).
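As a rough sketch of that normalization point (assuming scipy.signal.convolve2d and a grayscale image in [0, 1]; the array names are placeholders):

```python
import numpy as np
from scipy.signal import convolve2d

im = np.random.rand(64, 64)      # stand-in grayscale image in [0, 1]

dx2 = np.array([[-1, 1]])        # width-2 filter: responses span [-1, 1]
dx3 = np.array([[-1, 0, 1]])     # width-3 filter: responses span [-2, 2]

edges2 = convolve2d(im, dx2, mode='same', boundary='symm')
edges3 = convolve2d(im, dx3, mode='same', boundary='symm') / 2  # divide by 2 to match the [-1, 1] range
```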
Similarly, when convolving with a multi-row kernel, I realized that tiling the derivative across the rows also incorrectly dilated the edge-detection image. Instead, padding the kernel with rows of zeros is the correct solution.
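A minimal illustration of the two options (hypothetical 3-row kernels built from the same [-1, 1] filter):

```python
import numpy as np

dx = np.array([[-1, 1]])

# Incorrect: tiling repeats the derivative in every row, which dilates edges vertically.
dx_tiled = np.tile(dx, (3, 1))             # [[-1, 1], [-1, 1], [-1, 1]]

# Correct: zero rows above and below contribute nothing to the convolution.
dx_padded = np.pad(dx, ((1, 1), (0, 0)))   # [[0, 0], [-1, 1], [0, 0]]
```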
The threshold for a binary mask was swept and a value of 0.13 was manually chosen. Speculation: critical information seemed to be present in the extremes of both positive AND negative values, so filtering out a smooth band of the mid-range values with a Gaussian might give smoother / better results. (This was resolved later when we took the gradient magnitude of the image, since it deals in absolute values.)
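The mask itself is a one-liner; here is a sketch with the manually chosen 0.13 threshold (the function name is mine):

```python
import numpy as np

def binarize(edges, threshold=0.13):
    """Binary edge mask: keep responses above the threshold.

    Thresholding the raw response ignores strongly negative values; using
    np.abs(edges) would keep both extremes, which the gradient magnitude
    below handles more cleanly.
    """
    return (edges > threshold).astype(np.float32)
```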
Now that we have extracted the left-right and up-down edges, how do we compose them? This is done by computing the "gradient magnitude": squaring the outputs of the dx and dy filters, adding them, and then taking the square root. This makes sense: previously the binary filter ignored the negative values, whereas this is, in essence, taking the magnitude of the displacement vector.
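A minimal sketch of the gradient magnitude, assuming scipy.signal.convolve2d and simple finite-difference filters:

```python
import numpy as np
from scipy.signal import convolve2d

dx = np.array([[-1, 1]])     # horizontal finite difference
dy = np.array([[-1], [1]])   # vertical finite difference

def gradient_magnitude(im):
    gx = convolve2d(im, dx, mode='same', boundary='symm')
    gy = convolve2d(im, dy, mode='same', boundary='symm')
    return np.sqrt(gx ** 2 + gy ** 2)   # length of the (gx, gy) displacement vector
```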
Musing: Squaring and square-root operations are non-linear compared to taking the absolute value. I suspect that given the range is [-1, 1] this won't have a large influence, but it's always good to be aware that we are changing the distribution of the values.
The magnitude correctly aggregates information from the dx and dy responses, for which we do another binary-mask threshold sweep and choose 0.14.
The Gaussian sigma was chosen to avoid clipping at the edges. Since the mask may later be represented in an 8-bit (0-255) range, I designed the filter to highlight values that would round to the darkest possible shade (values less than 1/255), using green for those cases.
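A sketch of that highlighting, assuming values normalized to [0, 1]: anything below 1/255 would quantize to pure black in 8-bit, so it gets painted green (function name is mine).

```python
import numpy as np

def highlight_black_after_quantization(values):
    """RGB visualization where values that would round to 0 in 8-bit are shown in green."""
    rgb = np.dstack([values, values, values])
    rgb[values < 1 / 255] = [0.0, 1.0, 0.0]
    return rgb
```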
A Gaussian filter size sweep was done, trying to balance cleaning up the anti-aliasing issues against losing too much information. A kernel size of 7 was chosen as a starting point. Speculation: the ideal resolution of the Gaussian kernel might depend on the image size, and making it invariant to that might be a good experiment.
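One plausible way to build the kernel (assuming OpenCV is available) is the outer product of a 1D Gaussian with itself, sketched here with the size-7 starting point:

```python
import cv2

def gaussian_kernel(ksize=7, sigma=1.0):
    """2D Gaussian kernel as the outer product of two 1D Gaussians (sums to 1)."""
    g1d = cv2.getGaussianKernel(ksize, sigma)   # ksize x 1 column vector
    return g1d @ g1d.T
```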
More sweeps for the Gaussian sigma and the binary threshold were done, and a sigma of 1 and a binary threshold of 0.2 were chosen. This will be the target for our composite kernel. Observation: using the Gaussian filter helped get rid of noise and anti-aliasing issues in the image. While many improvements were made, one has to be careful that the filter does not change the 'essence' of the image.
We create a composite kernel by convolving the Gaussian kernel with the dx and dy filters, respectively:
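A sketch of those composite (derivative-of-Gaussian) kernels, assuming the Gaussian and finite-difference filters from earlier:

```python
import cv2
import numpy as np
from scipy.signal import convolve2d

G = cv2.getGaussianKernel(7, 1.0) @ cv2.getGaussianKernel(7, 1.0).T
dx = np.array([[-1, 1]])
dy = np.array([[-1], [1]])

# By associativity of convolution, convolving the image with these composite
# kernels is equivalent to blurring first and then taking the derivative.
DoG_x = convolve2d(G, dx)
DoG_y = convolve2d(G, dy)
```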
On the left we have the output of convolving with the composite filters above. On the right we have a binary delta between the two images. There are *some* differences around the border, but nothing perceptually significant, demonstrating the associative and commutative properties of convolution.
Now we continue our journey into manipulating images by exploring the frequency domain. We start by blurring an image and subtracting the blurred version from the original, which in essence is a high-pass filter of the image (removing the low frequencies).
Then we can apply the high frequency back to the original image to sharpen it. You can see that the bricks on the road and top of the building are more defined. Observation: using np.clip is critical to ensure that we are not paying attention to negative values introduced by the high frequency data.
We can now apply this technique to each of the RGB channels of the image:
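A minimal unsharp-masking sketch over the RGB channels (assuming a float image in [0, 1] and cv2.GaussianBlur; alpha controls how much high frequency is added back):

```python
import cv2
import numpy as np

def sharpen(im, sigma=1.0, alpha=1.0):
    """Unsharp masking: add back alpha * (original - blurred), channel by channel."""
    out = np.empty_like(im)
    for c in range(im.shape[2]):
        blurred = cv2.GaussianBlur(im[..., c], (0, 0), sigma)
        high_freq = im[..., c] - blurred
        out[..., c] = np.clip(im[..., c] + alpha * high_freq, 0, 1)
    return out
```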
We apply this same technique to the blurred image of Dante from Coco on the left and see that we get decently close to restoring the image. Observation: While sharpening does an incredible job for areas like the whiskers that already had high frequency data, it does not do as well for the background which was very low frequency. This is a great motivation for learning how to perform image manipulation in the frequency domain!
Hybrid images leverage the non-linear nature of human visual perception to create images that change based on viewing distance. By combining the low frequencies of one image with the high frequencies of another, we can create a single static image that appears different when viewed up close versus from afar. The low frequencies are obtained by blurring the image, while the high frequencies can be extracted in two ways: subtracting a blurred version of the image from the original, or using an impulse filter (a filter with a center value of 1) with a Gaussian filter subtracted from it, as suggested in the SIGGRAPH 2006 "Hybrid Images" paper by Oliva, Torralba, and Schyns.
Observation: I did not see a noticeable difference between using the suggested impulse filter and the high-frequency filter formed from the difference of the original and blurred images. Perhaps this image isn't complex enough to show the difference.
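A minimal sketch of the construction (grayscale float images of matching size assumed; the sigma values are placeholders):

```python
import cv2
import numpy as np

def hybrid(im_far, im_near, sigma_low=6.0, sigma_high=4.0):
    """Low frequencies of im_far (dominant from afar) plus high frequencies of im_near (dominant up close)."""
    low = cv2.GaussianBlur(im_far, (0, 0), sigma_low)
    # High frequencies as original minus blur; equivalently, convolve with (impulse - Gaussian).
    high = im_near - cv2.GaussianBlur(im_near, (0, 0), sigma_high)
    return np.clip(low + high, 0, 1)
```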
We apply the "Hybrid image" technique to two
of Mr. Incredible's expressions from the Incredibles.
Here we break down the hybrid image into its frequency analysis:
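The frequency plots are presumably the log magnitude of the shifted 2D FFT; a minimal sketch:

```python
import numpy as np

def log_magnitude_spectrum(im):
    """Log-magnitude of the centered 2D Fourier transform (small epsilon avoids log(0))."""
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(im))) + 1e-8)
```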
Given what we have learned so far, we have all the tools available to do image blending that is more sophisticated than plain alpha blending. We can use Gaussian and Laplacian stacks to blend images, leveraging the fact that human perception has an attention bias towards high frequencies. Let's use that to our advantage!
In order to achieve multi-frequency blending, we create a Gaussian and a Laplacian stack for each image and then blend them together. The Laplacian stack is created by taking the difference between each level of the Gaussian stack and the next level. This allows us to blend the images at different frequencies: in essence, we are breaking the image into different frequency bands and blending them band by band, as sketched below.
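A sketch of the stacks and the blend under those definitions (grayscale float images and a float mask in [0, 1] assumed; the per-level sigma and the level count are placeholders):

```python
import cv2

def gaussian_stack(im, levels=5, sigma=2.0):
    """Repeatedly blur without downsampling; level 0 is the original image."""
    stack = [im]
    for _ in range(levels):
        stack.append(cv2.GaussianBlur(stack[-1], (0, 0), sigma))
    return stack

def laplacian_stack(im, levels=5, sigma=2.0):
    """Band-pass levels: differences of consecutive Gaussian levels, plus the final low-pass residual."""
    g = gaussian_stack(im, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels)] + [g[-1]]

def blend(im_a, im_b, mask, levels=5, sigma=2.0):
    """Multiresolution blend: alpha-blend each Laplacian level with a progressively blurrier mask, then sum."""
    la, lb = laplacian_stack(im_a, levels, sigma), laplacian_stack(im_b, levels, sigma)
    gm = gaussian_stack(mask, levels, sigma)
    return sum(m * a + (1 - m) * b for a, b, m in zip(la, lb, gm))
```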
Now that we have created the Gaussian and Laplacian stacks, let's blend these two images together:
Input images:
Example 2:
Example 3 - Orapple:
This project and the hands-on lectures revealed the incredible complexity that can arise from simple signals interacting with each other. On a practical level, I learned that how the image values are scaled matters a lot when the range spans negative as well as positive values, and that one has to be very careful about when to use np.clip; sometimes it was critically necessary, and other times it threw away information we actually wanted. Throughout this project I kept thinking: this is so cool, "Why isn't there a Photoshop clone / blend brush that leverages the human perceptual trick of focusing mostly on higher frequencies?!"