In this report, we present the first prototype of the perception pipeline developed for the CROWDBOT project. The pipeline currently focuses on detecting and tracking pedestrians in low- to medium-density scenarios using RGB-D cameras and 2D LiDAR sensors. We begin by reviewing the major detection and tracking methods used in the pipeline, together with their ROS implementations. We then present quantitative evaluations of detection, tracking, and run-time performance. Finally, we discuss ongoing work to extend the pipeline, including interactive data annotation tools, optical-flow-aided pedestrian tracking, and detailed person analysis modules.