
A Stacked LSTM Based Approach for Reducing Semantic Pose Estimation Error 






IEEE Transactions on Instrumentation and Measurement

I. Introduction

SLAM stands for simultaneous localization and mapping, one of the most frequently researched problems in the robotics community. It is defined as the problem of estimating the trajectory of a robotic vehicle while incrementally building a map of the vehicle's surroundings, given the measurements perceived from the environment [1]. SLAM serves as a key enabler in a wide variety of mobile-robotics applications such as search and rescue and augmented reality [6]. Semantic SLAM is based on visual features acquired by a vision sensor. It exploits an understanding of the surrounding structure to build a highly expressive map that human operators can easily interpret. It has begun to attract a remarkable amount of attention, mostly owing to the breakthroughs in deep learning that have modernized object detection and tracking techniques [7].

Localization accuracy is the most important factor for success in robotic tasks, particularly those that involve interacting with humans; examples of such tasks include search and rescue, autonomous driving, and elderly care. Owing to its infancy, semantic SLAM has yet to achieve robustness in the presence of noisy computations, such as those occurring due to inaccurate object pose estimates with respect to the vision sensor. The unreliability of SLAM estimates may arise from measurement errors, which differ according to the adopted SLAM approach. In object-based semantic SLAM, errors most often occur while processing sensory information to compute the poses of the detected features with respect to the sensor. This procedure starts by detecting a landmark, computing its bounding box, and then calculating its centroid.

The centroid of the landmark is then used to measure the relative pose between the vision sensor and the feature. In addition, occlusion has a remarkable effect on the accuracy of estimated object poses [8]. The main aim of this paper is to reduce the joint influence of multiple error sources on the estimation accuracy of semantic SLAM. Such errors can arise from the limitations of the software and hardware components used to execute semantic SLAM, from external conditions, or from unpredictable noise. Constructing a noise model that accounts for these errors is very challenging, and some errors occur without any expectation during data collection or processing. Therefore, a stacked LSTM based neural network is used in this study to learn and capture the error patterns associated with the trajectory estimates of semantic SLAM.

By comparing the trajectory estimates with the ground truth, the network learns to minimize these errors and therefore improves the accuracy of semantic SLAM. The approach is general and can be used with any SLAM system, since it operates on trajectory estimates rather than raw measurements. It can serve various applications that require accurate localization of robotic vehicles. For instance, an accurate estimate of the trajectory enables semantic SLAM to provide a more meaningful and accurate map of the environment.
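To make the comparison with ground truth concrete, trajectory accuracy is often summarized as a root-mean-square error over corresponding positions. The sketch below is purely illustrative (it assumes the two position arrays are already time-aligned, and is not the paper's evaluation code):

```python
import numpy as np

def trajectory_rmse(estimated, ground_truth):
    """RMSE between corresponding positions of two time-aligned trajectories."""
    diff = np.asarray(estimated) - np.asarray(ground_truth)
    # Per-pose squared Euclidean error, averaged over the trajectory.
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Toy 2-D trajectories: a noisy estimate against ground truth.
est = np.array([[0.0, 0.0], [1.1, 0.1], [2.0, -0.1]])
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
err = trajectory_rmse(est, gt)  # -> 0.1
```

A learned correction network is useful precisely when it drives this kind of metric down relative to the uncorrected estimate.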

Another use case discussed in this paper lies in search-and-rescue applications. If robots acting as first responders are able to determine their location accurately, they will be more effective at rescuing victims or locating the areas that require immediate help. This paper develops a stacked LSTM based method to detect and minimize pose errors in object-based semantic SLAM. The method mitigates the influence of both predictable and unpredictable noise on the accuracy of trajectory estimates.

II. Related Work

A. Deep Learning

DNN stands for deep neural network. Such networks are trained to behave in a particular way, according to the problem at hand, when processing information. During training, the internal parameters of the network, known as weights, are adjusted to reduce the discrepancy between the desired output and the network's prediction [9]. A shallow neural network (SNN) has three layers: an input layer, a hidden layer, and an output layer. A network with two or more hidden layers is known as a DNN. A DNN is more efficient than an SNN in terms of the computational units required to model a complicated problem. This property stems from the non-linear nature of the activation functions applied at every layer of the deep neural network [10].
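The depth argument above can be made concrete with a forward pass through a small fully connected network; the layer sizes and tanh activations below are arbitrary illustrative choices, not the architecture used in the paper:

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass of a fully connected network with tanh at every layer."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)  # non-linear activation applied at each layer
    return a

rng = np.random.default_rng(1)
sizes = [3, 8, 8, 2]  # two hidden layers, so this qualifies as a (tiny) DNN
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
y = forward(rng.normal(size=3), weights, biases)
```

With only one hidden layer the same function class would need far more units to fit a comparably complicated mapping, which is the efficiency point made in [10].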

In addition, recurrent neural networks (RNNs) are artificial neural networks that are capable of inferring knowledge from context. This is achieved through loops that allow information to be fed back into the network after being processed. Nevertheless, such networks suffer from the vanishing-gradient problem, which motivated the need for LSTM cells [11]. These cells allow RNNs to retain important data and discard what is irrelevant, a property that cannot be obtained with traditional neural networks. LSTMs and DNNs have displayed state-of-the-art performance in a multitude of applications, including robotics and computer vision.
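The retain-or-discard behaviour comes from the cell's gates. Below is a NumPy sketch of a single step of a standard LSTM cell [11], with random weights purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; the gates decide what memory to keep or discard."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # pre-activations for all four gates
    i = sigmoid(z[0:n])              # input gate: admit new information
    f = sigmoid(z[n:2 * n])          # forget gate: discard stale memory
    o = sigmoid(z[2 * n:3 * n])      # output gate: expose the hidden state
    g = np.tanh(z[3 * n:4 * n])      # candidate memory content
    c = f * c_prev + i * g           # updated cell state
    h = o * np.tanh(c)               # updated hidden state
    return h, c

rng = np.random.default_rng(0)
d, n = 3, 4                          # input size and hidden size
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for _ in range(5):                   # unroll over a short random sequence
    h, c = lstm_step(rng.normal(size=d), h, c, W, U, b)
```

Stacking LSTMs, as in this paper, simply feeds the hidden-state sequence of one such layer as the input sequence of the next.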

B. SLAM and the Intervention of Deep Learning

A rich body of literature addresses the SLAM problem, and a range of reliable, efficient, and accurate algorithms has been proposed. Deep learning methods have claimed a substantial share of this work in recent years, and their ability to outperform the classical methods has been demonstrated. Additionally, deep-learning-based object detection methods have advanced the state of the art in object-based semantic SLAM, which builds on observations of semantically labelled landmarks in the environment, as in the approaches of [28], [29]. Reliably observing a landmark in the environment and accurately estimating its position w.r.t. the sensor, however, remains an open issue.

C. Enhancing SLAM Estimation Accuracy

In many SLAM applications, the accuracy of state estimation is susceptible to the influence of multiple error sources. These errors occur at one or more stages of the SLAM pipeline, such as data gathering, data processing, and optimization. Many works in the literature assume that the measurement noise follows a fixed distribution that can be formulated mathematically. However, this assumption does not hold in practical applications and may lead to severe degradation in estimation accuracy. The solutions proposed in the literature to improve localization accuracy can be categorised into: (1) controlling the environment under investigation, (2) sensor data fusion, (3) enhancing covariance estimation, and (4) correcting measurement errors, which can be further classified into learning-based and classical approaches. In [36], passive tags were used as landmarks to keep the localization accuracy within a specific range. Elsewhere, robust indoor localization was achieved by integrating sensor data to compensate for the limitations of the employed sonar [37]. Another example of fusion is presented in [38], where measurements recorded with multiple inertial sensors were used to enhance accuracy. Rather than assuming a fixed measurement noise model, the work in [39] predicts the noise model from raw measurements using a DNN. Similarly, the study in [30] relies on the detection of QR codes; the method is computationally more expensive than the Kalman filter yet achieves higher accuracy. Likewise, the method proposed in [35] enhances SLAM accuracy by devising an adaptive Gaussian particle filter in which the noise bias of the process model is compensated.
In [16], a deep-learning-based approach was employed to enhance attitude estimation for a flying robot. In addition, [45] presented wheeled-cart odometry in which the computed dynamic equations were enhanced with the help of an SNN; the network was designed to estimate the distance travelled by the vehicle. Nevertheless, because that network consists of a single hidden layer, it cannot capture all the patterns of the estimation errors. The main advantage of the proposed stacked LSTM is that it mitigates the influence of all the errors potentially experienced while performing SLAM, including computational errors, faults in data processing, and any other noise.

III. Proposed Approach

The deep-learning-based method proposed in this paper is shown in Fig. 1. In general, the trajectory estimate of semantic SLAM is passed to a neural network, trained against the ground-truth trajectory, that detects and minimizes the potential pose errors.
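Conceptually, the correction stage can be sketched as sliding a fixed-length window over the estimated trajectory and subtracting a predicted pose error at each step. In the sketch below, `error_model` and the window length are hypothetical stand-ins for the trained stacked LSTM, not the paper's network:

```python
import numpy as np

def make_windows(traj, length):
    """Slice an (N, d) trajectory into overlapping (length, d) sequences."""
    traj = np.asarray(traj, dtype=float)
    return np.stack([traj[i:i + length] for i in range(len(traj) - length + 1)])

def correct_trajectory(traj, error_model, length=4):
    """Subtract the predicted pose error for the last pose of each window.

    error_model: callable mapping a (length, d) window to a (d,) error
    estimate; here a stand-in for the trained stacked-LSTM network.
    """
    corrected = np.asarray(traj, dtype=float).copy()
    for i, window in enumerate(make_windows(traj, length)):
        corrected[i + length - 1] -= error_model(window)
    return corrected

# Toy 1-D trajectory with a constant bias that the stand-in "model" removes.
traj = np.arange(6.0).reshape(6, 1)
fixed = correct_trajectory(traj, lambda w: np.array([0.5]))
```

Operating on pose windows rather than raw sensor data is what makes the correction stage agnostic to the underlying SLAM system.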

A. Semantic SLAM

The adopted semantic SLAM system is designed mainly for ground vehicles and operates on measurements from the vehicle's wheel encoders and an RGB-D camera mounted on top. This section describes the mapping algorithm by which this is carried out.
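For context, wheel-encoder measurements are typically turned into a pose estimate by dead reckoning. The sketch below assumes a differential-drive model with made-up calibration constants (`ticks_per_m`, `wheel_base`), not the values of the vehicle used in the paper:

```python
import math

def dead_reckon(ticks_left, ticks_right, ticks_per_m=1000.0, wheel_base=0.5):
    """Integrate per-interval encoder ticks into a planar pose (x, y, theta)."""
    x = y = theta = 0.0
    for tl, tr in zip(ticks_left, ticks_right):
        dl = tl / ticks_per_m            # left wheel travel (m)
        dr = tr / ticks_per_m            # right wheel travel (m)
        d = 0.5 * (dl + dr)              # travel of the vehicle centre
        x += d * math.cos(theta)
        y += d * math.sin(theta)
        theta += (dr - dl) / wheel_base  # heading change (rad)
    return x, y, theta

# Equal tick counts on both wheels: the cart drives straight for 1 m.
pose = dead_reckon([100] * 10, [100] * 10)  # -> approximately (1.0, 0.0, 0.0)
```

Errors in such integration accumulate over time, which is one reason the trajectory estimate benefits from the learned correction described above.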

1) Landmark Pose Estimation and Data Association

2) Measurement Uncertainty

(3) multi-path interference, (4) flying pixels, and (5) the scene's characteristics

References

[1] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age,” IEEE Transactions on Robotics, vol. 32, no. 6, pp. 1309–1332, 2016.

[2] X. Chen, H. Zhang, H. Lu, J. Xiao, Q. Qiu, and Y. Li, “Robust SLAM system based on monocular vision and LiDAR for robotic urban search and rescue,” SSRR 2017 - 15th IEEE International Symposium on Safety, Security and Rescue Robotics, Conference, pp. 41–47, 2017.

[3] A. Denker and M. C. Işeri, “Design and implementation of a semiautonomous mobile search and rescue robot: SALVOR,” IDAP 2017 - International Artificial Intelligence and Data Processing Symposium, 2017.

[4] J. Casper and R. R. Murphy, “Human-robot interactions during the robot-assisted urban search and rescue response at the World Trade Center,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 3, pp. 367–385, 2003.

[5] A. Pfrunder, P. V. Borges, A. R. Romero, G. Catt, and A. Elfes, “Real-time autonomous ground vehicle navigation in heterogeneous environments using a 3D LiDAR,” IEEE International Conference on Intelligent Robots and Systems, vol. 2017-September, pp. 2601–2608, 2017.

[6] D. Ramadasan, M. Chevaldonne, and T. Chateau, “Real-time slam for static multi-objects learning and tracking applied to augmented reality applications,” in 2015 IEEE Virtual Reality (VR), March 2015, pp. 267– 268.

[7] Z. Zou, Z. Shi, Y. Guo, and J. Ye, “Object Detection in 20 Years: A Survey,” pp. 1–39, 2019.

[8] S. Soetens, A. Sarris, and K. Vansteenhuyse, “Pose Estimation Errors, the Ultimate Diagnosis,” European Space Agency, (Special Publication) ESA SP, vol. 1, no. 515, pp. 181–184, 2002.

 [9] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

 [10] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layerwise training of deep networks,” Advances in Neural Information Processing Systems, no. 1, pp. 153–160, 2007.

[11] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.

[13] C. Farabet, C. Couprie, L. Najman, and Y. Lecun, “Learning Hierarchical Features for Scene Labeling,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 8, pp. 1915–1929, 2013.

[14] A. Garcia-Perez, F. Gheriss, D. Bedford, A. Garcia-Perez, F. Gheriss, and D. Bedford, “Going deeper with convolutions,” Designing and Tracking Knowledge Management Metrics, pp. 163–182, 2019.

[15] S. Wang, R. Clark, H. Wen, and N. Trigoni, “End-to-end, sequence-tosequence probabilistic visual odometry through deep neural networks,” International Journal of Robotics Research, vol. 37, no. 4-5, pp. 513– 542, 2018.

 [16] M. K. Al-Sharman, Y. Zweiri, M. A. K. Jaradat, R. Al-Husari, D. Gan, and L. D. Seneviratne, “Deep-learning-based neural network training for state estimation enhancement: Application to attitude estimation,” IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 1, pp. 24–34, 2019.

[17] V. Peretroukhin and J. Kelly, “DPC-Net: Deep Pose Correction for Visual Localization,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 2424–2431, 2018.

[18] R. Mur-Artal and J. D. Tardos, “ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.

[19] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-Scale Direct Monocular SLAM,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8690 LNCS, no. PART 2, pp. 834–849, 2014.

[20] R. Gomez-Ojeda, F. A. Moreno, D. Zuñiga-Noël, D. Scaramuzza, and J. Gonzalez-Jimenez, “PL-SLAM: A Stereo SLAM System Through the Combination of Points and Line Segments,” IEEE Transactions on Robotics, vol. 35, no. 3, pp. 734–746, 2019.

[21] G. Costante, M. Mancini, P. Valigi, and T. A. Ciarfuglia, “Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation,” IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 18–25, 2016.

 [22] D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 International Conference on Computer Vision, ICCV 2015, pp. 2650–2658, 2015.

[23] C. Cadena, A. Dick, and I. D. Reid, “Multi-modal auto-encoders as joint estimators for robotics scene understanding,” Robotics: Science and Systems, vol. 12, 2016.

[24] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for real-time 6-dof camera relocalization,” Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 International Conference on Computer Vision, ICCV 2015, pp. 2938– 2946, 2015.

[25] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” CVPR, 2016.

[26] Y. Konishi, Y. Hanzawa, M. Kawade, and M. Hashimoto, “SSD: Single Shot MultiBox Detector,” Springer International Publishing AG, vol. 1, pp. 398–413, 2016

[27] J. Tremblay, T. To, B. Sundaralingam, Y. Xiang, D. Fox, and S. Birchfield, “Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects,” no. CoRL, pp. 1–11, 2018.

[28] S. L. Bowman, N. Atanasov, K. Daniilidis, and G. J. Pappas, “Probabilistic data association for semantic SLAM,” Proceedings - IEEE International Conference on Robotics and Automation, pp. 1722–1729, 2017.

[29] B. Mu, S. Y. Liu, L. Paull, J. Leonard, and J. P. How, “SLAM with objects using a nonparametric pose graph,” IEEE International Conference on Intelligent Robots and Systems, vol. 2016-November, pp. 4602–4609, 2016.

[30] P. Nazemzadeh, D. Fontanelli, D. Macii, and L. Palopoli, “Indoor localization of mobile robots through QR code detection and dead reckoning data fusion,” IEEE/ASME Transactions on Mechatronics, vol. 22, no. 6, pp. 2588–2599, 2017.

[31] P. Ozog and R. M. Eustice, “On the importance of modeling camera calibration uncertainty in visual SLAM,” Proceedings - IEEE International Conference on Robotics and Automation, pp. 3777–3784, 2013.

[32] J. H. Park, Y. D. Shin, J. H. Bae, and M. H. Baeg, “Spatial uncertainty model for visual features using a KinectTM sensor,” Sensors (Switzerland), vol. 12, no. 7, pp. 8640–8662, 2012.

[33] N. Sünderhauf, O. Brock, W. Scheirer, R. Hadsell, D. Fox, J. Leitner, B. Upcroft, P. Abbeel, W. Burgard, M. Milford, and P. Corke, “The limits and potentials of deep learning for robotics,” International Journal of Robotics Research, vol. 37, no. 4-5, pp. 405–420, 2018.

[34] J. Hidalgo-Carrio, D. Hennes, J. Schwendner, and F. Kirchner, “Gaussian process estimation of odometry errors for localization and mapping,” Proceedings - IEEE International Conference on Robotics and Automation, pp. 5696–5701, 2017.

[35] A. Rao and W. Han, “An Adaptive Gaussian Particle Filter based Simultaneous Localization and Mapping with dynamic process model noise bias compensation,” Proceedings of the 2015 7th IEEE International Conference on Cybernetics and Intelligent Systems, CIS 2015 and Robotics, Automation and Mechatronics, RAM 2015, pp. 210–215, 2015.

[36] V. Magnago, L. Palopoli, R. Passerone, D. Fontanelli, and D. Macii, “Effective Landmark Placement for Robot Indoor Localization with Position Uncertainty Constraints,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 11, pp. 4443–4455, 2019.

[37] H. Liu, F. Sun, B. Fang, and X. Zhang, “Robotic Room-Level Localization Using Multiple Sets of Sonar Measurements,” IEEE Transactions on Instrumentation and Measurement, vol. 66, no. 1, pp. 2–13, 2017.

[38] M. Zhang, X. Xu, Y. Chen, and M. Li, “A Lightweight and Accurate Localization Algorithm Using Multiple Inertial Measurement Units,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1508–1515, 2020.

[39] K. Liu, K. Ok, W. Vega-Brown, and N. Roy, “Deep inference for covariance estimation: Learning Gaussian noise models for state estimation,” Proceedings - IEEE International Conference on Robotics and Automation, pp. 1436–1443, 2018.

 [40] M. Brossard, A. Barrau, and S. Bonnabel, “AI-IMU Dead-Reckoning,” IEEE Transactions on Intelligent Vehicles, pp. 1–1, 2020.

 [41] M. Heshmat, M. Abdellatif, and H. Abbas, “Improving visual SLAM accuracy through deliberate camera oscillations,” ROSE 2013 - 2013 IEEE International Symposium on Robotic and Sensors Environments, Proceedings, pp. 154–159, 2013.

[42] S. Chen and C. Chen, “Probabilistic fuzzy system for uncertain localization and map building of mobile robots,” IEEE Transactions on Instrumentation and Measurement, vol. 61, no. 6, pp. 1546–1560, 2012.

[43] H. Hur and H. S. Ahn, “Unknown Input H∞ observer-based localization of a mobile robot with sensor failure,” IEEE/ASME Transactions on Mechatronics, vol. 19, no. 6, pp. 1830–1838, 2014.

 [44] J. W. Yoon and T. Park, “Maximizing localization accuracy via selfconfigurable ultrasonic sensor grouping using genetic approach,” IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 7, pp. 1518–1529, 2016.

[45] J. Toledo, J. D. Piñeiro, R. Arnay, D. Acosta, and L. Acosta, “Improving odometric accuracy for an autonomous electric cart,” Sensors (Switzerland), vol. 18, no. 1, 2018.

[46] M. Brossard and S. Bonnabel, “Learning wheel odometry and imu errors for localization,” Proceedings - IEEE International Conference on Robotics and Automation, vol. 2019-May, pp. 291–297, 2019.

 [47] J. Czarnowski, T. Laidlow, R. Clark, and A. J. Davison, “DeepFactors: Real-Time Probabilistic Dense Monocular SLAM,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 721–728, 2020.

[48] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard, and F. Dellaert, “ISAM2: Incremental smoothing and mapping using the Bayes tree,” International Journal of Robotics Research, vol. 31, no. 2, pp. 216–235, 2012.

[49] C. V. Nguyen, S. Izadi, and D. Lovell, “Modeling kinect sensor noise for improved 3D reconstruction and tracking,” Proceedings - 2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2012, pp. 524–530, 2012.

[50] T. S. B, Z. Zhou, G. Zhao, and M. Pietik, “Comparison of Kinect V1 and V2 Depth Images in Terms of Accuracy and Precision,” vol. 1, no. March, pp. 277–289, 2017.

[51] B. Zoph and Q. V. Le, “Searching for activation functions,” 6th International Conference on Learning Representations, ICLR 2018 - Workshop Track Proceedings, pp. 1–13, 2018.

[52] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.

[53] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” IEEE International Conference on Intelligent Robots and Systems, pp. 573– 580, 2012.

[54] A. Koubaa, Ed., Robot Operating System (ROS): The Complete Reference (Volume 3), ser. Studies in Computational Intelligence. Springer, 2018, vol. 778.

[55] F. Dellaert, “Factor graphs and GTSAM: A hands-on introduction,” no. September, pp. 1–26, 2012.

[56] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.

[57] R. B. Rusu and S. Cousins, “3D is here: Point Cloud Library (PCL),” Proceedings - IEEE International Conference on Robotics and Automation, pp. 1–4, 2011.

[58] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.

[59] F. Chollet et al., “Keras,” 2015.

[60] M. H. Law and J. T. Kwok, “Bayesian support vector regression,” in In Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics. Key West, 2001, pp. 239–244.
