When detecting edges, the width of the filter matters (for example [-1, 1] vs [-1, 0, 1]). Below is a gif of the effect of the filter width on the edge: Observation: notice that it not only dilates the edges left and right, but also introduces some artifacts at the borders of the image. I imagine this could be an issue for filters like the Gaussian that have a symmetric, odd-sized kernel, versus others like edge detection that might benefit from an even-sized kernel. If using a width-3 edge filter, we need to make sure to divide the output by 2 to normalize the data (its response spans twice the range of the width-2 filter).
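As a rough sketch of that normalization point (assuming scipy.signal.convolve2d and a grayscale image in [0, 1]; the array names are placeholders):

```python
import numpy as np
from scipy.signal import convolve2d

im = np.random.rand(64, 64)      # stand-in grayscale image in [0, 1]

dx2 = np.array([[-1, 1]])        # width-2 filter: responses span [-1, 1]
dx3 = np.array([[-1, 0, 1]])     # width-3 filter: responses span [-2, 2]

edges2 = convolve2d(im, dx2, mode='same', boundary='symm')
edges3 = convolve2d(im, dx3, mode='same', boundary='symm') / 2  # divide by 2 to match the [-1, 1] range
```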
Similarly, when convolving with a multi-row kernel, I realized that tiling the derivative across the rows also incorrectly dilated the edge-detection image. Instead, padding the kernel with rows of zeros is the correct solution.
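A minimal illustration of the two options (hypothetical 3-row kernels built from the same [-1, 1] filter):

```python
import numpy as np

dx = np.array([[-1, 1]])

# Incorrect: tiling repeats the derivative in every row, which dilates edges vertically.
dx_tiled = np.tile(dx, (3, 1))             # [[-1, 1], [-1, 1], [-1, 1]]

# Correct: zero rows above and below contribute nothing to the convolution.
dx_padded = np.pad(dx, ((1, 1), (0, 0)))   # [[0, 0], [-1, 1], [0, 0]]
```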
The threshold for a binary mask was swept and a value of 0.13 was manually chosen. Speculation: critical information seemed to be present in the extremes of both positive AND negative values, so filtering out a smooth band of the mid-range values with a Gaussian might give smoother / better results. (This was resolved later when we took the gradient magnitude of the image, since it deals in absolute values.)
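The mask itself is a one-liner; here is a sketch with the manually chosen 0.13 threshold (the function name is mine):

```python
import numpy as np

def binarize(edges, threshold=0.13):
    """Binary edge mask: keep responses above the threshold.

    Thresholding the raw response ignores strongly negative values; using
    np.abs(edges) would keep both extremes, which the gradient magnitude
    below handles more cleanly.
    """
    return (edges > threshold).astype(np.float32)
```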
Now that we have extracted the left-right and up-down edges, how do we compose them? This is done by computing the "gradient magnitude": squaring the outputs of the dx and dy filters, adding them, and then taking the square root. This makes sense: previously the binary filter ignored the negative values, whereas this is, in essence, taking the magnitude of the displacement vector.
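A minimal sketch of the gradient magnitude, assuming scipy.signal.convolve2d and simple finite-difference filters:

```python
import numpy as np
from scipy.signal import convolve2d

dx = np.array([[-1, 1]])     # horizontal finite difference
dy = np.array([[-1], [1]])   # vertical finite difference

def gradient_magnitude(im):
    gx = convolve2d(im, dx, mode='same', boundary='symm')
    gy = convolve2d(im, dy, mode='same', boundary='symm')
    return np.sqrt(gx ** 2 + gy ** 2)   # length of the (gx, gy) displacement vector
```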
Musing: Squaring and square-root operations are non-linear compared to taking the absolute value. I suspect that given the range is [-1, 1] this won't have a large influence, but it's always good to be aware that we are changing the distribution of the values.
The magnitude correctly aggregates information from the dx and dy responses, for which we do another binary-mask threshold sweep and choose 0.14.
The Gaussian sigma was chosen to avoid clipping at the edges. Since the mask may later be represented in an 8-bit (0-255) range, I designed the filter to highlight values that would round to the darkest possible shade (values less than 1/255), using green for those cases.
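A sketch of that highlighting, assuming values normalized to [0, 1]: anything below 1/255 would quantize to pure black in 8-bit, so it gets painted green (function name is mine).

```python
import numpy as np

def highlight_black_after_quantization(values):
    """RGB visualization where values that would round to 0 in 8-bit are shown in green."""
    rgb = np.dstack([values, values, values])
    rgb[values < 1 / 255] = [0.0, 1.0, 0.0]
    return rgb
```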
A Gaussian filter size sweep was done, trying to balance cleaning up the anti-aliasing issues against losing too much information. A kernel size of 7 was chosen as a starting point. Speculation: the ideal resolution of the Gaussian kernel might depend on the image size, and making it invariant to that might be a good experiment.
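One plausible way to build the kernel (assuming OpenCV is available) is the outer product of a 1D Gaussian with itself, sketched here with the size-7 starting point:

```python
import cv2

def gaussian_kernel(ksize=7, sigma=1.0):
    """2D Gaussian kernel as the outer product of two 1D Gaussians (sums to 1)."""
    g1d = cv2.getGaussianKernel(ksize, sigma)   # ksize x 1 column vector
    return g1d @ g1d.T
```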
More sweeps for the Gaussian sigma and the binary threshold were done, and a sigma of 1 and a binary threshold of 0.2 were chosen. This will be the target for our composite kernel. Observation: using the Gaussian filter helped get rid of noise and anti-aliasing issues in the image. While many improvements were made, one has to be careful that the filter does not change the 'essence' of the image.
We create a composite kernel by convolving the Gaussian kernel with the dx and dy filters, respectively:
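A sketch of those composite (derivative-of-Gaussian) kernels, assuming the Gaussian and finite-difference filters from earlier:

```python
import cv2
import numpy as np
from scipy.signal import convolve2d

G = cv2.getGaussianKernel(7, 1.0) @ cv2.getGaussianKernel(7, 1.0).T
dx = np.array([[-1, 1]])
dy = np.array([[-1], [1]])

# By associativity of convolution, convolving the image with these composite
# kernels is equivalent to blurring first and then taking the derivative.
DoG_x = convolve2d(G, dx)
DoG_y = convolve2d(G, dy)
```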
On the left we have the output of convolving with the composite filters above. On the right we have a binary delta between the two images. There are *some* differences around the border, but nothing perceptually significant, demonstrating the associative and commutative properties of convolution.
Now we continue our journey into manipulating images by exploring the frequency domain. We start by blurring an image and subtracting the blurred version from the original, which in essence is a high-pass filter of the image (removing the low frequencies).
Then we can apply the high frequency back to the original image to sharpen it. You can see that the bricks on the road and top of the building are more defined. Observation: using np.clip is critical to ensure that we are not paying attention to negative values introduced by the high frequency data.
We can now apply this technique to each of the RGB channels of the image:
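A minimal unsharp-masking sketch over the RGB channels (assuming a float image in [0, 1] and cv2.GaussianBlur; alpha controls how much high frequency is added back):

```python
import cv2
import numpy as np

def sharpen(im, sigma=1.0, alpha=1.0):
    """Unsharp masking: add back alpha * (original - blurred), channel by channel."""
    out = np.empty_like(im)
    for c in range(im.shape[2]):
        blurred = cv2.GaussianBlur(im[..., c], (0, 0), sigma)
        high_freq = im[..., c] - blurred
        out[..., c] = np.clip(im[..., c] + alpha * high_freq, 0, 1)
    return out
```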
We apply this same technique to the blurred image of Dante from Coco on the left and see that we get decently close to restoring the image. Observation: While sharpening does an incredible job for areas like the whiskers that already had high frequency data, it does not do as well for the background which was very low frequency. This is a great motivation for learning how to perform image manipulation in the frequency domain!
Hybrid images leverage the non-linear nature of human visual perception to create images that change based on viewing distance. By combining the low frequencies of one image with the high frequencies of another, we can create a single static image that appears different when viewed up close versus from afar. The low frequencies are obtained by blurring the image, while the high frequencies can be extracted in two ways: subtracting a blurred version of the image from the original, or using an impulse filter (a filter with a center value of 1) with a Gaussian filter subtracted from it, as suggested in the SIGGRAPH 2006 "Hybrid Images" paper by Oliva, Torralba, and Schyns.
Observation: I did not see a noticeable difference between using the suggested impulse filter and the high-frequency filter formed from the difference of the original and blurred images. Perhaps this image isn't complex enough to show the difference.
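A minimal sketch of the construction (grayscale float images of matching size assumed; the sigma values are placeholders):

```python
import cv2
import numpy as np

def hybrid(im_far, im_near, sigma_low=6.0, sigma_high=4.0):
    """Low frequencies of im_far (dominant from afar) plus high frequencies of im_near (dominant up close)."""
    low = cv2.GaussianBlur(im_far, (0, 0), sigma_low)
    # High frequencies as original minus blur; equivalently, convolve with (impulse - Gaussian).
    high = im_near - cv2.GaussianBlur(im_near, (0, 0), sigma_high)
    return np.clip(low + high, 0, 1)
```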
We apply the "Hybrid image" technique to two
of Mr. Incredible's expressions from the Incredibles.
Here we break down the hybrid image into its frequency analysis:
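The frequency plots are presumably the log magnitude of the shifted 2D FFT; a minimal sketch:

```python
import numpy as np

def log_magnitude_spectrum(im):
    """Log-magnitude of the centered 2D Fourier transform (small epsilon avoids log(0))."""
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(im))) + 1e-8)
```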
Given what we have learned so far, we have all the tools available to do image blending that is more sophisticated than plain alpha blending. We can use Gaussian and Laplacian stacks to blend images, leveraging the fact that human perception has an attention bias towards high frequencies. Let's use that to our advantage!
In order to achieve multi-frequency blending, we create a Gaussian and a Laplacian stack for each image and then blend them together. The Laplacian stack is created by taking the difference between each level of the Gaussian stack and the next level. This allows us to blend the images at different frequencies: in essence, we are breaking the image into different frequency bands and blending them band by band, as sketched below.
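A sketch of the stacks and the blend under those definitions (grayscale float images and a float mask in [0, 1] assumed; the per-level sigma and the level count are placeholders):

```python
import cv2

def gaussian_stack(im, levels=5, sigma=2.0):
    """Repeatedly blur without downsampling; level 0 is the original image."""
    stack = [im]
    for _ in range(levels):
        stack.append(cv2.GaussianBlur(stack[-1], (0, 0), sigma))
    return stack

def laplacian_stack(im, levels=5, sigma=2.0):
    """Band-pass levels: differences of consecutive Gaussian levels, plus the final low-pass residual."""
    g = gaussian_stack(im, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels)] + [g[-1]]

def blend(im_a, im_b, mask, levels=5, sigma=2.0):
    """Multiresolution blend: alpha-blend each Laplacian level with a progressively blurrier mask, then sum."""
    la, lb = laplacian_stack(im_a, levels, sigma), laplacian_stack(im_b, levels, sigma)
    gm = gaussian_stack(mask, levels, sigma)
    return sum(m * a + (1 - m) * b for a, b, m in zip(la, lb, gm))
```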
Now that we have created the Gaussian and Laplacian stacks, let's blend these two images together:
Input images:
Example 2:
Example 3 - Orapple:
This project and the hands-on lectures revealed the incredible complexity that can arise from simple signals interacting with each other. On a practical level, I learned that how the image values are scaled matters a lot when the range spans negative as well as positive values, and that one has to be very careful about when to use np.clip; sometimes it was critically necessary, and other times it threw away information we actually wanted. Throughout this project I kept thinking: this is so cool, "Why isn't there a Photoshop clone / blend brush that leverages the human perceptual trick of focusing mostly on higher frequencies?!"