Localizing pedestrians in 3D scene space from single RGB images is critical for many downstream applications. Current monocular approaches rely on either the pedestrian's bounding box or the visible parts of the body for localization. Both introduce additional error into the location estimate in real-world scenarios: crowded environments with multiple occl…