Detecting distinct features in modern pop music is an important problem with significant applications in areas such as multimedia entertainment. Such features can be used, for example, to give a visually coherent representation of the sound. We propose to integrate a singing voice detector into a multimedia, multi-touch game in which the user has to perform simple tasks at certain key points in the music. While the ultimate goal is to automatically create visual content in response to features extracted from the music, here we focus on detecting voice segments in songs. The presented solution extracts the Mel-Frequency Cepstral Coefficients (MFCCs) of the sound and uses a Hidden Markov Model (HMM) to infer whether the sound contains voice. The classification rate obtained is high compared with other singing voice detectors that use MFCCs.
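
As a rough illustration of the decoding step, the sketch below runs Viterbi decoding over a two-state HMM (non-vocal, vocal) to label each frame. It is a minimal sketch under stated assumptions, not the implementation evaluated here: the two-state topology, the transition probabilities, and the per-frame likelihoods are all hypothetical, and the likelihoods stand in for scores that a trained emission model would assign to each frame's Mel-Frequency Cepstral Coefficient vector.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely state path for an HMM, computed in the log domain.

    log_emis:  (T, S) log-likelihood of each frame under each state
    log_trans: (S, S) log_trans[i, j] = log P(state j at t | state i at t-1)
    log_init:  (S,)   log prior over the first state
    """
    T, S = log_emis.shape
    score = np.empty((T, S))
    back = np.zeros((T, S), dtype=int)
    score[0] = log_init + log_emis[0]
    for t in range(1, T):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(S)] + log_emis[t]
    # Trace the best path backwards from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical per-frame likelihoods (columns: non-vocal, vocal). In practice
# these would come from an emission model scored on each frame's MFCC vector.
emis = np.array([[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2],
                 [0.2, 0.8], [0.2, 0.8], [0.2, 0.8]])
# Assumed "sticky" transitions: a state tends to persist across frames.
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
init = np.array([0.5, 0.5])

labels = viterbi(np.log(emis), np.log(trans), np.log(init))
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1]
```

Labelling each frame independently by its most likely class would mark the third frame as vocal. The transition model penalises such a one-frame switch, so the decoded path keeps it non-vocal and places a single boundary where the vocal evidence becomes sustained.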