How do you train a computer to take out just the vocals and not the music? Or just certain crowd noise?
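Karaoke mixes and the like rely on the spatial/phase relationship between the two stereo channels: vocals are generally panned dead center, so they are identical in the left and right channels and cancel out when one channel is subtracted from the other. Audience recordings from concerts won't have such a clean, perfectly controlled stereo mix.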
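To make the "panned to the center" trick concrete, here is a minimal sketch of the classic center-cancellation (L minus R) approach in plain Python. It assumes the vocal is exactly identical in both channels. The function name `remove_center` and the sine-wave "vocal" and "guitar" are hypothetical stand-ins, not real audio or any particular tool's method.

```python
import math

def remove_center(left, right):
    """Karaoke-style center cancellation (hypothetical helper).

    Anything panned dead center (identical in both channels) cancels;
    anything panned off-center survives. The result is mono, and the
    division by 2 keeps the output from clipping.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [(l - r) / 2.0 for l, r in zip(left, right)]

# Toy test signals, assumed for illustration only.
sr = 8000   # sample rate in Hz
n = 800     # 0.1 s of samples
vocal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]   # panned center
guitar = [math.sin(2 * math.pi * 196 * t / sr) for t in range(n)]  # panned hard left

left = [v + g for v, g in zip(vocal, guitar)]  # left hears vocal + guitar
right = list(vocal)                            # right hears only the vocal

out = remove_center(left, right)
# The vocal cancels; what remains is the guitar at half amplitude.
```

This only works because the vocal arrives sample-for-sample identical in both channels. Anything else mixed to the center (often bass and kick drum) disappears along with it, while stereo reverb on the vocal survives. In an audience recording the voice reaches each microphone with a different delay and different room reflections, so L minus R leaves most of it in place.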