A computer vision system is described that captures color image sequences, detects and recognizes static hand poses (i.e., "letters") and interprets pose sequences in terms of gestures (i.e., "words"). The hand object is detected with a double-active contour-based method. A tracking of the hand pose in a short sequence allows detecting "modified poses", like diacritic letters in national alphabets. The static hand pose set corresponds to hand signs of a thumb alphabet. Finally, by tracking hand poses in a longer image sequence, the pose sequence is interpreted in terms of gestures. Dynamic Bayesian models and their inference methods (particle filter and Viterbi search) are applied at this stage, allowing a bi-driven control of the entire system.