I plan to write a conversation analysis software, which will recognize the individual speakers, their pitch and intensity. Pitch and intensity are somewhat straightforward (pitch via autocorrelation).
How would I go about recognizing individual speakers, so I can record his/her features? Will storing some heuristics for each speaker's frequencies be enough? I can assume that only one person speaks at a time (strictly non-overlapping). I can also assume that for training, each speaker can record a minute's worth of data before actual analysis.