Submit a zip file named CSE 252 hw03 lastname studentid.zip. The contents of the file should be:
- A pdf file with your writeup. This should have all code attached in the appendix. Name this file: CSE 252 hw03 lastname.pdf.
- All of your source code in a folder called code.
No physical hand-in for this assignment.
- In general, code does not have to be efficient. Focus on clarity, correctness and function here, and we can worry about speed in another course.
Suppose a camera calibration gives a transformation (R, T ) such that a point in the world maps to the camera by CP = R W P + T .
- Given calibrations of two cameras (a stereo pair) to a common external coordinate system, represented by R1, T1, R2, T2, provide an expression that will map points expressed in the coordinate system of the right camera to that of the left. (4 points)
- What is the length of the baseline of the stereo pair? (2 points)
- Give an expression for the Essential Matrix in terms of R1, T1, R2, T2. (4 points)
Consider two cameras whose image planes are the z = 1 plane, and whose focal points are at (−15, 0, 0) and (15, 0, 0). We'll call a point in the first camera (x, y), and a point in the second camera (u, v). Points in each camera are relative to the camera center. So, for example, if (x, y) = (0, 0), this is really the point (−15, 0, 1) in world coordinates, while if (u, v) = (0, 0) this is the point (15, 0, 1).
Figure 1: Problem 1 Setup
- Suppose the point (x, y) = (6, 6) is matched with a disparity of 5 to the point (u, v) = (1, 6). What is the 3D location of this point?
- Consider points that lie on the line x + z = 0, y = 0. Use the same stereo setup as before. Write an analytic expression giving the disparity of a point on this line after it projects onto the two images, as a function of its position in the right image. So your expression should only involve the variables u and d (for disparity). Your expression only needs to be valid for points on the line that are in front of the cameras, i.e. with z > 1.
Show that maximizing the Normalized Cross Correlation (NCC), Σ_{i,j} W̃1(i, j) · W̃2(i, j), is equivalent to minimizing the Normalized Sum Squared Distance (NSSD)
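Before writing the proof, it can help to check the claim numerically. The sketch below assumes the standard definitions: W̃1, W̃2 are windows that have been mean-subtracted and scaled to unit norm, and NSSD is Σ_{i,j} (W̃1(i, j) − W̃2(i, j))². Under those assumptions, NSSD = 2 − 2·NCC, which is what the proof should establish; this is only a numeric sanity check on random windows, not the proof itself.

```python
import numpy as np

def normalize(W):
    # Produce the "tilde" window: subtract the mean, scale to unit norm.
    W = W - W.mean()
    return W / np.linalg.norm(W)

rng = np.random.default_rng(0)
W1 = normalize(rng.random((7, 7)))
W2 = normalize(rng.random((7, 7)))

ncc = np.sum(W1 * W2)            # normalized cross correlation
nssd = np.sum((W1 - W2) ** 2)    # normalized sum of squared distances

# Mathematically, NSSD = |W1|^2 + |W2|^2 - 2<W1,W2> = 2 - 2*NCC
# for unit-norm windows, so maximizing NCC minimizes NSSD.
print(f"NCC={ncc:.4f}, NSSD={nssd:.4f}, 2-2*NCC={2 - 2 * ncc:.4f}")
```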
In this problem we will play with convolution filters. In these lecture slides, it is said that filters can be used as templates. Notice that the filter, when convolved with an image, will fire strongest on locations of an image that look like the filter. In fact, the slides show how to find Waldo by convolving a patch of Waldo with the Where’s Waldo picture. We will do something similar with pictures of cars. Although this is not a very good way to do object detection, this problem will show you some of the steps necessary to create a good object detector. The goal of this problem is to teach some pre-processing steps that make vision algorithms successful, and to expose some strengths and weaknesses of filters. Each problem will ask you to analyze and explain your results. If you do not provide an explanation of why or why not something happened, then you will not get full credit. Provide your code in the appendix.
First you will convolve a filter with a synthetic image. The filter or template is filter.jpg and the synthetic image is toy.png. These files are available on the course webpage. You may want to modify the filter image and original slightly. I suggest filter_img = filter_img - mean(filter_img(:)). To convolve the filter image with the toy example, in Matlab you will want to use conv2 and in Python you will want to use scipy.signal.fftconvolve. The output of the convolution will create an intensity image. Provide this image in the report. In the original image (not the image with its mean subtracted), draw a bounding box of the same size as the filter image around the top 3 intensity value locations in the convolved image. The outputs should look like Fig. ??. Describe how well you think this technique will work on more realistic images. Do you foresee any problems for this algorithm on more realistic images?
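The mean-subtract-then-convolve step above can be sketched as follows. This is a minimal illustration, not the assignment solution: the function name, the synthetic "bump" pattern, and the image size are all made up for the demo, and a real solution should additionally suppress peaks that are neighbors of an already-found peak before taking the top 3.

```python
import numpy as np
from scipy.signal import fftconvolve

def top_matches(image, filt, k=3):
    """Convolve a mean-subtracted filter with an image and return the
    (row, col) locations of the k largest responses."""
    filt = filt - filt.mean()
    # conv2-style convolution flips the kernel; 'same' keeps image size.
    response = fftconvolve(image, filt, mode="same")
    flat = np.argsort(response.ravel())[::-1][:k]
    return [np.unravel_index(i, response.shape) for i in flat]

# Demo: embed a symmetric bump in a blank image and recover its center.
w = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
bump = np.outer(w, w)                 # symmetric, so conv == correlation
image = np.zeros((50, 50))
image[10:15, 20:25] = bump
peaks = top_matches(image, bump, k=3)
print(peaks[0])                       # strongest response at the bump center
```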
We have now created an algorithm that produces a bounding box around a detected object. However we have no way to know if the bounding box is good or bad. In the example images shown above, the bounding boxes look reasonable, but not perfect. Given a ground truth bounding box (g) and a predicted bounding box (p), a commonly used measurement for bounding box quality is (p ∩ g)/(p ∪ g). More intuitively, this is the number of overlapping pixels between the bounding boxes divided by the total number of unique pixels of the two bounding boxes combined. Assuming that all bounding boxes
Figure 2: Example outputs for the synthetic example. (a) Example heat map. (b) Example bounding boxes
will be squares (and not diamonds), implement this error function and try it on the toy example in the previous section. Choose 3 different ground truth bounding box sizes around one of the Mickey silhouettes. In general, if the overlap is 50% or more, you may consider that the detection did a good job.
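A minimal sketch of this overlap measure for axis-aligned boxes is below. The (x1, y1, x2, y2) corner convention is a hypothetical choice for the demo; adapt it to however you store your bounding boxes.

```python
def bbox_overlap(p, g):
    """Intersection-over-union of two axis-aligned boxes.
    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Width/height of the intersection rectangle (0 if disjoint).
    ix = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    iy = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = ix * iy
    # Union = sum of areas minus the double-counted intersection.
    union = ((p[2] - p[0]) * (p[3] - p[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    return inter / union
```

With this convention, identical boxes score 1.0, disjoint boxes score 0.0, and a detection is "good" per the 50% criterion when the score is at least 0.5.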
Now that you have created an algorithm for matching templates and a function to determine the quality of the match, it is time to try some more realistic images. The file, cartemplate.jpg, will be the filter to convolve on each of the 5 other car images (car1.jpg, car2.jpg, car3.jpg, car4.jpg, car5.jpg). Each image will have an associated text file that contains two (x, y) coordinates (one pair per line). These coordinates will be the ground truth bounding box for each image. For each car image, provide the following:
- A heat map image
- A bounding box drawn on the original image.
- The bounding box overlap percent.
- A description of what pre-processing steps you needed to do to achieve this overlap.
- An explanation of why you felt these steps made sense.
For pre-processing steps, feel free to use whatever you feel makes sense. As a starter suggestion, you will want to rescale the cartemplate to various sizes, and you may want to blur either the template or the test image or both. An example output is shown for car1.jpg in Fig ??. It may not be possible to achieve 50% overlap on all the images. Your analysis of the images will be worth as much as achieving a high overlap percentage.
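The rescale-and-blur suggestion can be sketched as a crude multi-scale search. Everything here is a starter guess, not a tuned solution: the scale list and blur sigma are arbitrary, and comparing raw peak heights across scales is a rough heuristic (larger templates tend to produce larger responses unless you normalize), so treat this only as a starting point.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom
from scipy.signal import fftconvolve

def multiscale_response(image, template,
                        scales=(0.5, 0.75, 1.0, 1.5), blur=1.0):
    """Try the template at several scales (with a little blur) and keep
    the scale whose convolution response peaks highest."""
    best = None
    for s in scales:
        t = zoom(template, s)                       # rescale the template
        t = gaussian_filter(t - t.mean(), blur)     # mean-subtract + blur
        resp = fftconvolve(image, t, mode="same")
        peak = resp.max()
        if best is None or peak > best[0]:
            best = (peak, s, resp)
    return best  # (peak value, best scale, response map)

# Demo on random data, just to show the call shape.
rng = np.random.default_rng(0)
image = rng.random((40, 40))
template = rng.random((8, 8))
peak, scale, resp = multiscale_response(image, template)
```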
In computer vision there is often a desire for features or algorithms to be invariant to X. One example briefly described in class was illumination invariance. The detection algorithm that was implemented for this problem may have seemed a bit brittle. Can you describe a few things that this algorithm was not invariant to? For example, this algorithm was not scale-invariant, meaning the size of the filter with respect to the size of the object being detected mattered. One filter size should not have worked on everything.
Now that you have thought about the weaknesses of the algorithm, can you create one new template to use that improves the overlap of your detection on at least two images? You are more than welcome to use image editing tools such as Paint to create this new filter. Provide a figure showing your new template and explain why you chose to create the template this way.
In this problem we will play around with sparse stereo matching methods. You will work on two image pairs, a warrior figure and a figure from the Matrix movies (warrior2.mat and matrix2.mat). These files both contain two images, two camera matrices, and two sets of corresponding points (extracted by manually clicking the images).
For illustration, I have run my code on a third image pair (dino2.mat). This data is also provided on the webpage for you to debug your code, but you should only report results on warrior and matrix. In other words, where I include one (or a pair) of images in the assignment below, you will provide the same thing but for BOTH matrix and warrior. Note that the matrix image pair is ‘harder’, in the sense that the matching algorithms we are implementing will not work quite as well. You should expect good results, however, on warrior.
To make the TA extra happy, make the line width and marker sizes bigger than the default sizes.
The first thing we need to do is to build a corner detector. This should be done according to http://cseweb.ucsd.edu/classes/fa11/cse252A-a/lec13.pdf. Your file should be named CornerDetect.m/py, and take as input
corners = CornerDetect(Image, nCorners, smoothSTD, windowSize)
where smoothSTD is the standard deviation of the smoothing kernel, and windowSize the size of the smoothing window. In the lecture the corner detector was implemented using a hard threshold. Do not do that; instead return the nCorners strongest corners after non-maximum suppression. This way you can control exactly how many corners are returned.
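The "nCorners strongest after non-maximum suppression" step can be sketched as a greedy peak picker over the corner-response map (which your detector computes separately). The suppression radius is a hypothetical parameter, not one specified by the assignment; tune it per image.

```python
import numpy as np

def strongest_corners(response, n_corners, nms_radius=5):
    """Greedily pick the n strongest local maxima from a corner-response
    map, zeroing out a square neighborhood around each chosen corner."""
    R = response.copy()
    corners = []
    for _ in range(n_corners):
        r, c = np.unravel_index(np.argmax(R), R.shape)
        if R[r, c] <= 0:       # nothing with positive response remains
            break
        corners.append((r, c))
        r0, r1 = max(0, r - nms_radius), r + nms_radius + 1
        c0, c1 = max(0, c - nms_radius), c + nms_radius + 1
        R[r0:r1, c0:c1] = 0    # suppress the neighborhood
    return corners
```

Note the early break assumes responses of interest are positive; if your corner measure can be legitimately non-positive, shift or rescale it first.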
Run your code on all four images (with nCorners = 20) and show outputs as in fig. ??
Figure 4: Result of corner detection
Write a function SSDmatch.m that implements the SSD matching algorithm for two input windows. Include this code in your report (in appendix as usual).
Equipped with the corner detector and the SSD matching code, we are ready to start finding correspondences. One naive strategy is to try and find the best match between the two sets of corner points. Write a script that does this, namely, for each corner in image1, find the best match from the detected corners in image2 (or, if the SSD match score is too poor, return no match for that point). You will have to figure out a good threshold (SSDth) value by experimentation. Write a function naiveCorrespondanceMatching.m and call it as below. I only want you to use 10 corners so that the figure is readable in the report, but experiment with larger numbers! naiveCorrespondanceMatching.m will call your SSD matching code. Include a figure like fig. ?? in your report. The parameter ’R’ below is the radius of the patch used for matching.
ncorners = 10;
corners1 = CornerDetect(I1, ncorners, smoothSTD, windowSize);
corners2 = CornerDetect(I2, ncorners, smoothSTD, windowSize);
[I, corsSSD] = naiveCorrespondanceMatching(I1, I2, corners1, corners2, R, SSDth);
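The SSD comparison at the heart of this matching can be sketched as below. This is a minimal illustration in Python with hypothetical snake_case names, not the required SSDmatch.m interface; note that for SSD, lower is better, so "no match" here means the best score is above the threshold.

```python
import numpy as np

def ssd(w1, w2):
    """Sum of squared differences between two equal-size patches."""
    d = w1.astype(float) - w2.astype(float)
    return np.sum(d * d)

def best_match(patch, candidates, ssd_th):
    """Return the index of the candidate patch with the lowest SSD,
    or None if even the best score exceeds the threshold."""
    scores = [ssd(patch, c) for c in candidates]
    best = int(np.argmin(scores))
    return best if scores[best] <= ssd_th else None
```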
Using the provided fund.m/py together with the provided points cor1 and cor2, calculate the fundamental matrix, and then plot the epipolar lines in both image pairs as shown below. Plot the points and make sure the epipolar lines go through them. Show your output as in fig. ??. You might find the supplied linePts.m/py function useful when you are working with epipolar lines.
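Recall how an epipolar line falls out of F. Assuming the common convention that corresponding points satisfy x2ᵀ F x1 = 0 (check which convention fund.m/py actually uses), the line in image 2 for a point in image 1 is simply:

```python
import numpy as np

def epipolar_line(F, x1):
    """Epipolar line in image 2 for point x1 = (x, y) in image 1:
    l2 = F @ [x, y, 1], where l2 = (a, b, c) means a*u + b*v + c = 0."""
    return F @ np.array([x1[0], x1[1], 1.0])
```

Any correct correspondence x2 should then satisfy x2ᵀ l2 ≈ 0, which is exactly the "lines go through the points" check requested above.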
We will now use the epipolar geometry to build a better matching algorithm. First, detect 10 corners in Image1. Then, for each corner, do a line search along the corresponding epipolar line in Image2. Evaluate the SSD score for each point along this line and return the best match (or no match if all scores are below the SSDth). R is the radius (size) of the SSD patch in the code below. You do not have to run this in ‘both directions’, but only as indicated in the code below.
Figure 5: Result of naive matching procedure
Figure 6: Epipolar lines
ncorners = 10;
F = fund(cor1, cor2);
corners1 = CornerDetect(I1, ncorners, smoothSTD, windowSize);
corsSSD = correspondanceMatchingLine( I1, I2, corners1, F, R, SSDth);
Your resulting plots should look like fig. ??.
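The epipolar line search can be sketched as below. This is a rough illustration with hypothetical names: it assumes the l2 = F @ [x, y, 1] convention, grayscale float images, a non-vertical epipolar line (b ≠ 0), and it skips border handling beyond a bounds check.

```python
import numpy as np

def match_along_line(I1, I2, corner, F, R, ssd_th, step=1):
    """For a corner (x, y) in I1, walk along its epipolar line in I2
    and return the (u, v) with the lowest SSD, or None if the best
    score exceeds ssd_th."""
    x, y = corner
    a, b, c = F @ np.array([x, y, 1.0])        # line: a*u + b*v + c = 0
    patch1 = I1[y - R:y + R + 1, x - R:x + R + 1].astype(float)
    best, best_score = None, np.inf
    for u in range(R, I2.shape[1] - R, step):
        v = int(round(-(a * u + c) / b))       # assumes b != 0
        if v < R or v >= I2.shape[0] - R:
            continue                           # patch would leave image
        patch2 = I2[v - R:v + R + 1, u - R:u + R + 1].astype(float)
        score = np.sum((patch1 - patch2) ** 2)
        if score < best_score:
            best, best_score = (u, v), score
    return best if best_score <= ssd_th else None
```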
Now that you have found correspondences between the pairs of images we can triangulate the corresponding 3D points. Since we do not enforce the ordering constraint the correspondences you have found are likely to be noisy and to contain a fair amount of outliers. Using the provided camera matrices you will triangulate a 3D point for each corresponding pair of points. Then by reprojecting the 3D points you will be able to find most of the outliers.
You should implement the linear triangulation method described in lecture (supplementary material on this subject is available here: http://cseweb.ucsd.edu/classes/fa11/cse252A-a/hw3/linear_triangulation.pdf). P1 and P2 below are the camera matrices.
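For a single correspondence, the standard linear (DLT) triangulation stacks two rows per view from the constraint x × (P X) = 0 and takes the smallest right singular vector. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def triangulate_point(x1, x2, P1, P2):
    """DLT triangulation of one correspondence.
    x1, x2 are (x, y) image points; P1, P2 are 3x4 camera matrices."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],   # x * (row 3) - (row 1) from view 1
        x1[1] * P1[2] - P1[1],   # y * (row 3) - (row 2) from view 1
        x2[0] * P2[2] - P2[0],   # same two rows from view 2
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                   # null vector of A (least-squares sense)
    return X[:3] / X[3]          # dehomogenize
```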
Also write a function, findOutliers.m/py, that back-projects the world points to Image2, and then determines which points are outliers and inliers respectively. For this purpose, we will call a point an outlier if the distance between the true location, and the back-projection is more than 20 pixels.
Figure 7: Result of epipolar matching procedure
outlierTH = 20;
F = fund(cor1, cor2);
ncorners = 50;
corners1 = CornerDetect(I1, ncorners, smoothSTD, windowSize);
corsSSD = correspondanceMatchingLine(I1, I2, corners1, F, R, SSDth);
points3D = triangulate(corsSSD, P1, P2);
[ inlier, outlier ] = findOutliers(points3D, P2, outlierTH, corsSSD);
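The outlier test itself reduces to projecting each triangulated point with P2 and thresholding the pixel distance to the matched location. A sketch in Python (hypothetical snake_case names, not the required findOutliers interface):

```python
import numpy as np

def find_outliers(points3d, P2, true_pts2, outlier_th=20.0):
    """Back-project triangulated 3D points into image 2 and split the
    matches into inliers/outliers by reprojection distance in pixels."""
    inliers, outliers = [], []
    for X, x_true in zip(points3d, true_pts2):
        Xh = np.append(X, 1.0)            # homogeneous 3D point
        x = P2 @ Xh
        x = x[:2] / x[2]                  # back to pixel coordinates
        dist = np.linalg.norm(x - np.asarray(x_true, float))
        (inliers if dist <= outlier_th else outliers).append(tuple(x_true))
    return inliers, outliers
```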
Display your results by showing, for image2, the original points as black circles, the inliers as blue plus signs, and the outliers as red plus signs, as shown in fig. ??. Compare this outlier plot with fig. ??. Do the detected outliers correspond to false matches?
Optional: Plot a point cloud! Now that you have the 3D positions of your inlier correspondences you can plot a point cloud. However, since the number of inlier correspondences is probably low your point cloud will be sparse and it will be hard to identify a structure. If you would really like to get a dense point cloud, more images with projection matrices can be found at http://www. cs.washington.edu/homes/furukawa/research/mview/index.html. You can create partial point clouds between pairs of images. Since all the projection matrices are known you can merge your partial point clouds to make a dense one.
Figure 8: Result of back-projection.