Physically Plausible Scene Estimation
Perceiving object poses in a cluttered scene is challenging for an embodied robot because only partial observations are available. In addition to occlusions, cluttered scenes exhibit several sources of uncertainty arising from physical object interactions, such as touching, stacking, and partial support. In this work, we study these cases of physics-based uncertainty one by one and propose methods for physically viable scene estimation. Specifically, we use Newtonian physical simulation to validate the plausibility of hypotheses within a generative probabilistic inference framework for particle filtering, MCMC, and an MCMC variant of particle filtering. Assuming that object geometries are known, we estimate the scene as a collection of object poses, inferring both a distribution over the state space of scenes and the maximum likelihood estimate. We compare against ICP-based approaches and present results for scene estimation in isolated cases of physical object interaction as well as in multi-object scenes, such that manipulation of graspable objects can be performed with a PR2 robot.
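The plausibility check can be sketched as a minimal one-dimensional example. Here a full Newtonian rigid-body simulation is replaced by a toy "settling" model, and the function names, Gaussian models, and parameters are illustrative assumptions rather than the paper's actual implementation:

```python
import math
import random

def settle(z, support_z=0.0):
    """Toy stand-in for a Newtonian simulation step: an object released at
    height z comes to rest on its support surface. A real system would run
    a full rigid-body simulation here."""
    return support_z

def plausibility(z, support_z=0.0, sigma=0.02):
    """Score how close a hypothesized pose is to its physically settled pose."""
    d = z - settle(z, support_z)
    return math.exp(-d * d / (2 * sigma * sigma))

def obs_likelihood(z, z_obs, sigma=0.05):
    """Hypothetical Gaussian observation model around the measured height."""
    d = z - z_obs
    return math.exp(-d * d / (2 * sigma * sigma))

def filter_step(particles, z_obs):
    """One particle-filter iteration: weight each hypothesis by observation
    fit AND physical plausibility, then resample proportionally."""
    weights = [obs_likelihood(z, z_obs) * plausibility(z) for z in particles]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(particles, weights=probs, k=len(particles))
```

A floating-object hypothesis (here, z far from its support) is heavily penalized even when it fits the observation, which is the role physical simulation plays in the inference.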
This work is accepted for publication at Humanoids 2016. Check the publications section for more details.
Axiomatic State Estimation
A tabletop scene can be represented as a scene graph that models interactions between objects, such as "on", "in", and "has", along with each object's pose in the 3D world. Such a scene state can be expressed as a collection of axioms. Observing a scene with a camera and estimating its axiomatic state provides the rich information required for manipulation tasks such as pick-and-place. In this project, scene estimation is approached with a standard Bayesian filtering method, which we call the Axiomatic Particle Filter (APF). Each particle is a scene hypothesis rendered as a depth image by the OpenGL graphics engine, assuming that the object geometries are known, and the observation comes from a Microsoft Kinect sensor. Filtering yields the most likely particle at every iteration, and over many iterations this estimate converges to the observation and the real scene.
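The particle-weighting step can be sketched as follows. Depth images are flattened to lists of floats, and the Gaussian pixel-wise likelihood is a hypothetical stand-in for the paper's actual observation model:

```python
import math

def depth_likelihood(rendered, observed, sigma=0.05):
    """Gaussian pixel-wise comparison of a rendered depth image against an
    observed one (illustrative model; the paper's exact likelihood may differ)."""
    sq = sum((r - o) ** 2 for r, o in zip(rendered, observed))
    return math.exp(-sq / (2 * sigma * sigma * len(rendered)))

def most_likely_particle(particles, observed):
    """Return the scene hypothesis whose rendered depth image best explains
    the Kinect observation; each particle carries its own rendering."""
    return max(particles, key=lambda p: depth_likelihood(p["depth"], observed))
```

In the full filter this scoring drives resampling each iteration; here it simply picks the maximum-likelihood hypothesis.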
This work is accepted for publication at IROS 2015. Check the publications section for more details.
Viewpoint-based Mobile Robotic Exploration Aiding Object Search in Indoor Environments
Robots must grasp and manipulate objects in the environment in order to assist with human tasks. This requires object detection followed by recognition. However, not every location the robot views contains the object, or even the likely containers of the object (tables, shelves). So even when object recognition performs at its best, maneuvering the robot to a view suited to a particular task remains challenging. To overcome such practical issues, we formulated an exploration strategy that aids object localization: it uses semantic information in the form of a map and tries to maximize the views in which object segmentation and recognition succeed.
Using a 3D occupancy map of the environment, we identify potential object locations and generate a 2D potential map. We then explore this potential map to find an object using a viewpoint-based exploration strategy.
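The exploration over the potential map can be sketched as a greedy next-best-view loop. The map, visibility sets, and view names below are toy inputs; in the real system they would be derived from the 3D occupancy map:

```python
def explore(potential, candidate_views, steps):
    """Greedy viewpoint selection on a 2D potential map: repeatedly pick the
    view whose visible cells carry the most as-yet-unobserved object
    potential (sketch; visibility would come from the occupancy map)."""
    potential = dict(potential)  # work on a copy; don't mutate the caller's map
    plan = []
    for _ in range(steps):
        view = max(candidate_views,
                   key=lambda v: sum(potential.get(c, 0.0) for c in v["visible"]))
        plan.append(view["name"])
        for c in view["visible"]:
            potential[c] = 0.0  # already-observed cells yield no new gain
    return plan
```

Zeroing out observed cells is what makes each step favor new coverage rather than revisiting the same high-potential region.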
This work is accepted for publication at ICVGIP 2012. Check the publications section for the link to this paper.
RGB-D Saliency using Kinect-like Sensors for Indoor Environments
Continuing the previous work, we want the robot to search for objects in real human environments, which contain many occlusions and distractor objects that derail the search. While studying this problem, we came across saliency models that mimic the human visual system for a specific task. We were more interested, however, in segmenting the salient object from the scene and applying that segmentation to object search. We therefore formulated a new saliency model based on the 3D shapes of the objects in the scene, and fused it with existing image-based models to form an RGB-D saliency model. This model segments the point clouds that are most salient in the scene, decreasing the search space for the robot.
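The fusion step can be sketched as a per-pixel combination of the two saliency maps. The min-max normalization, linear weighting, and threshold below are illustrative assumptions, not necessarily the fusion rule used in the paper:

```python
def fuse_saliency(rgb_sal, shape_sal, w=0.5):
    """Pixel-wise fusion of an image-based saliency map with a 3D-shape
    saliency map, after normalizing each to [0, 1]."""
    def norm(m):
        lo, hi = min(m), max(m)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in m]
    a, b = norm(rgb_sal), norm(shape_sal)
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]

def salient_mask(fused, thresh=0.5):
    """Keep only points salient enough to be object candidates, shrinking
    the robot's search space."""
    return [v >= thresh for v in fused]
```

Normalizing before fusing keeps either modality from dominating simply because of its value range.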
A version of the paper is available in the publications section.
Visual Localization and Navigation across Semantics using a Monocular Camera
This is an ongoing project, an extension of Aravindhan's semantic exploration. Making use of the semantic representation of the environment (e.g., lab, corridor), we are working on localizing the robot given this sparse prior information. It is a challenging problem: the sparse data consists of discrete images collected during the exploration stage, and localizing the robot against these images is difficult. We succeeded in building such a system under some constraints, and we are now working to relax those constraints. The video here shows the robot successfully localizing and navigating to a goal in a different semantic region.