I'm writing code to handle SMS PDUs based on the ETSI GSM specifications, and there is one thing I need to ask about. A PDU contains a User Data Length (UDL) field followed by the User Data. According to GSM 03.40, the UDL field is the number of septets of user data when the uncompressed GSM default alphabet is used. However, it also says that when the data is compressed, the UDL is the number of octets of user data.
See the quotes:
If the TP User Data is coded using the GSM 7 bit default alphabet, the TP User Data Length field gives an integer representation of the number of septets within the TP User Data field to follow.
[...]
If the TP User Data is coded using compressed GSM 7 bit default alphabet or compressed 8 bit data or compressed UCS2 [24] data, the TP User Data Length field gives an integer representation of the number of octets after compression within the TP User Data field to follow.
The problem is that when the 7-bit data is compressed and the number of octets of compressed user data is a multiple of 7, you can't tell whether the last 7 bits of the last octet are fill bits or a real character. That is, 7 octets may contain either 7 or 8 7-bit characters when compression is on. If the UDL field gives the number of octets, how can you know whether those 7 octets contain 7 or 8 characters? If UDL contained the number of septets, everything would be clear, right? So have I misunderstood the documentation, or does it really work this way?
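To make the ambiguity concrete, here is a minimal Python sketch. It only does plain 7-bit septet packing (least significant bits first, as I understand GSM 03.38); it does not implement the compression algorithm itself. `pack_septets` is a hypothetical helper name of my own, not something from the specs.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, least significant bits first.

    Assumption: this mirrors the GSM 03.38 7-bit packing as I read it.
    """
    bits = 0    # bit accumulator
    nbits = 0   # number of valid bits currently in the accumulator
    out = bytearray()
    for s in septets:
        bits |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        # Leftover bits; the unused upper bits of this octet stay zero (fill).
        out.append(bits & 0xFF)
    return bytes(out)


# 7 characters ('A' = 0x41) versus the same 7 characters plus '@' (0x00).
seven = pack_septets([0x41] * 7)
eight = pack_septets([0x41] * 7 + [0x00])

print(len(seven), len(eight))  # 7 7
print(seven == eight)          # True
```

Both inputs pack to the same 7 octets, so an octet count of 7 alone cannot tell me whether the last 7 bits are fill bits or an '@' character.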
Could anyone please explain to me how it really works? Thanks in advance!