A key issue in social intelligence design is the realization of artifacts that can communicate fluently with people. To address this,
we propose a two-layered approach to enhance a robot’s capacity for involvement and engagement. The upper layer flexibly controls
social interaction using dynamic Bayesian networks (DBNs) that represent social interaction patterns. The lower layer improves the
robustness of the system by detecting rhythmic and repetitive gestures. We designed a listener robot that can follow and record
a human’s explanation of how to assemble or disassemble a bicycle. We describe the implementation of this system, which combines
the key algorithms presented in this paper.