&#151; is not an em dash; your text was mis-translated from em dash to that value.
&#8212; is the HTML decimal entity for em dash. It references Unicode code point 8212 (U+2014), which is the em dash.
- Your file is not ASCII if it contains an em dash. ASCII only encodes the decimal range 0-127, and em dash is not a character ASCII can represent. If your em dash is stored as the single byte 0x97 (151 in decimal), you probably have an ANSI text file, i.e. Windows Codepage 1252 (w-1252).
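To make that concrete, here is a small Python 3 sketch (Python is just used for illustration here) showing that the em dash is code point 8212, that ASCII cannot encode it, and that w-1252 stores it as the single byte 0x97:

```python
# Em dash vs ASCII vs Windows-1252 (cp1252)
em_dash = "\u2014"           # em dash, Unicode code point 8212 (0x2014)

print(ord(em_dash))          # 8212 -> the number behind the &#8212; entity

try:
    em_dash.encode("ascii")  # ASCII stops at 127, so this fails
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err)

print(em_dash.encode("cp1252"))  # b'\x97' -> the single byte 151 (0x97)
```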
Your first app...
The data started as an em dash encoded in w-1252. In w-1252 the em dash maps to the decimal value 151 (0x97 in hex, or 10010111 in binary).
At some point the em dash was handled by code that thought the bytes in your file were iso-8859-1 encoded text. When that code interpreted 0x97 as a string/char, it mapped 0x97 to a character according to the iso-8859-1 encoding. In iso-8859-1, 0x97 maps to the C1 control character U+0097, "End of guarded area".
Next, the string, which the code thinks is the "End of guarded area" control char, was encoded as utf-8. "End of guarded area" encoded in utf-8 is the two-byte sequence: 0xC2 0x97.
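A minimal Python sketch of that first pipeline (the variable names are mine, purely for illustration) reproduces the 0x97 -> 0xC2 0x97 result:

```python
# First app: the byte 0x97 is wrongly assumed to be iso-8859-1
raw = b"\x97"                          # em dash as stored by w-1252

as_latin1 = raw.decode("iso-8859-1")   # -> U+0097, "End of guarded area"
print(hex(ord(as_latin1)))             # 0x97

reencoded = as_latin1.encode("utf-8")  # -> b'\xc2\x97'
print(reencoded)
```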
Your second app...
The text file was correctly interpreted as w-1252, so 0x97 was recognized as an em dash and correctly encoded as the em dash in utf-8: the three-byte sequence 0xE2 0x80 0x94.
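And the same byte run through the second pipeline, decoded as w-1252 before re-encoding (again just a sketch):

```python
# Second app: the byte 0x97 is correctly decoded as Windows-1252
raw = b"\x97"

as_cp1252 = raw.decode("cp1252")       # -> U+2014, the real em dash
print(hex(ord(as_cp1252)))             # 0x2014

reencoded = as_cp1252.encode("utf-8")  # -> b'\xe2\x80\x94'
print(reencoded)
```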
What influences this behavior
Not sure if you're dealing with web apps or something else, but the concept is the same either way. We had the same 0x97 -> 0xC2 0x97 scenario in a web app where people entered data into a form. I found that the charset of the web page was declared as iso-8859-1, and the browser's best effort for the w-1252 chars was to just send them along as the raw bytes without alerting the user or the server. The server received the data, assumed it was iso-8859-1, and converted it to utf-8, resulting in 0xC2 0x97.
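If you're stuck with data that already went through the bad path, you can sometimes undo it by reversing the exact steps. This is only a sketch under the assumption that the data really was w-1252 read as iso-8859-1 and then utf-8 encoded; it is not a general-purpose fixer:

```python
# Repair sketch: undo utf-8(iso-8859-1(w-1252 bytes))
mangled = b"\xc2\x97"               # what ended up stored

step1 = mangled.decode("utf-8")     # back to the U+0097 control char
step2 = step1.encode("iso-8859-1")  # back to the original byte 0x97
fixed = step2.decode("cp1252")      # decode it the way it was meant

print(fixed)                        # the em dash, U+2014
```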
Basically any time an app touches text it needs to be told how the text is encoded, or else it might fall back to a system default. If that happens you risk data corruption.
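In Python terms, "telling the app the encoding" just means passing it explicitly instead of trusting the default (the file names below are hypothetical):

```python
import locale

# The implicit fallback you want to avoid relying on:
print(locale.getpreferredencoding())  # varies per system

# Explicit is safe: say what the file actually is.
with open("legacy_export.txt", encoding="cp1252") as f:
    text = f.read()

# ...and say what you want when writing it back out.
with open("clean_copy.txt", "w", encoding="utf-8") as f:
    f.write(text)
```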