Single view depth estimation from train images
|Abstract:||Depth prediction is the task of computing the distance from the camera to different points in the scene. Knowing how far away a given object is from the camera makes it possible to understand the spatial layout of the scene. Early methods used stereo pairs of images to extract depth. A stereo pair, however, requires a calibrated pair of cameras, whereas a single image requires no calibration or synchronization. For this reason, learning-based methods that estimate depth from monocular images have been introduced. Early learning-based solutions used ground-truth depth for training, usually acquired from sensors such as Kinect or Lidar. Because acquiring ground-truth depth is expensive and difficult, self-supervised methods, which do not require such ground truth for training, have appeared and have shown promising results for single-image depth estimation. In this work, we propose to estimate depth maps for images taken from the train driver's viewpoint. To do so, we use geometric constraints and standard rail parameters to compute the depth map inside the rails and provide it to the network as a supervisory signal. To this end, we first gathered a dataset of train sequences and determined their focal lengths in order to compute the depth map inside the rails. We then used this dataset and the computed focal lengths to fine-tune an existing model, Monodepth2, previously trained on the KITTI dataset. We show that the ground-truth depth map provided to the network solves the problem of the rail tracks, which otherwise appear as objects standing in front of the camera. It also improves depth estimation results on train sequences.|
|Document Type:||Master's thesis (Mémoire de maîtrise)|
|Open Access Date:||20 September 2021|
|Collection:||Theses and dissertations (Thèses et mémoires)|
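The idea of computing depth inside the rails from geometry and a known focal length can be illustrated with the pinhole camera model. What follows is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a standard gauge of 1435 mm (the abstract only says "standard rail parameters"), a flat and straight track, a camera with no roll, and per-row pixel positions of the two rails supplied by some detector outside this sketch. A segment of real width W at depth Z projects to w = f·W/Z pixels, so Z = f·W/w. The function names (`rail_row_depths`, `rail_depth_map`, `masked_l1_loss`) are hypothetical, and the masked L1 term is one common way to use a sparse depth target as supervision, not necessarily the loss used in this work.

```python
import math
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: standard gauge (1435 mm between the inner faces of the rails).
STANDARD_GAUGE_M = 1.435

# Hypothetical input format: image row -> (left_rail_x, right_rail_x) in pixels.
RailEdges = Dict[int, Tuple[float, float]]


def rail_row_depths(focal_px: float, rail_edges: RailEdges,
                    gauge_m: float = STANDARD_GAUGE_M) -> Dict[int, float]:
    """Depth per image row from the pixel distance between the two rails.

    Pinhole model: a horizontal segment of real width W at depth Z projects
    to w = f * W / Z pixels, hence Z = f * W / w.
    """
    depths: Dict[int, float] = {}
    for row, (left_x, right_x) in rail_edges.items():
        width_px = right_x - left_x
        if width_px <= 0:
            continue  # rails not separable on this row (e.g. at the vanishing point)
        depths[row] = focal_px * gauge_m / width_px
    return depths


def rail_depth_map(height: int, width: int, focal_px: float, rail_edges: RailEdges,
                   gauge_m: float = STANDARD_GAUGE_M) -> List[List[Optional[float]]]:
    """Sparse depth map: pixels between the rails get their row's depth, others None."""
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for row, z in rail_row_depths(focal_px, rail_edges, gauge_m).items():
        if not 0 <= row < height:
            continue
        left_x, right_x = rail_edges[row]
        # Assuming a flat track and no camera roll, every pixel on this row
        # between the rails lies at the same depth z.
        first_col = max(0, math.ceil(left_x))
        last_col = min(width - 1, math.floor(right_x))
        for col in range(first_col, last_col + 1):
            depth[row][col] = z
    return depth


def masked_l1_loss(pred: List[List[float]],
                   target: List[List[Optional[float]]]) -> float:
    """Mean absolute error over the pixels that have a rail-derived target only."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t is not None:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy example: focal length 1000 px, rails 143.5 px apart on row 400.
    print(rail_row_depths(1000.0, {400: (300.0, 443.5)}))
```

Because the target exists only between the rails, the loss ignores every other pixel. In a Monodepth2-style setup, one could add such a term to the self-supervised photometric loss with a weight chosen by experiment, so that the rest of the image is still learned without ground truth.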
All documents in CorpusUL are protected by the Copyright Act of Canada.