In UTF-8, a non-leading byte always has the top two bits set to 10
, so just ignore all such bytes. If you don't mind extra complexity, you can do more than that (to skip ahead across non-leading bytes based on the bit pattern of a leading byte) but in reality, it's unlikely to make much difference except for short strings (because you'll typically be close to the memory bandwidth anyway).
Edit: I originally mis-read your question as simply asking about how to count the length of a string of characters encoded in UTF-8. If you want to count character frequencies, you probably want to convert those to UTF-32/UCS-4, then you'll need some sort of sparse array to count the frequencies.
The hard part of this deals with counting code points vs. characters. For example, consider the character "À" -- the "Latin capital letter A with grave". There are at least two different ways to produce this character. You can use codepoint U+00C0, which encodes the whole thing in a single code point, or you can use codepoint U+0041 (Latin capital letter A) followed by codepoint U+0300 (Combining grave accent).
Normalizing (with respect to Unicode) means turning all such characters into the the same form. You can either combine them all into a single codepoint, or separate them all into separate code points. For your purposes, it's probably easier to combine them into into a single code point whenever possible. Writing this on your own probably isn't very practical -- I'd use the normalizer API from the ICU project.