The encoding of your SMS message directly affects how many characters you can send — and how much it costs. Understanding the difference between GSM-7 and UCS-2 (Unicode) encoding can save your business money and prevent messages from being split unexpectedly.
GSM-7: The Standard Encoding
GSM-7 is the default character set for SMS messages. It includes:
- Standard Latin letters (A-Z, a-z)
- Numbers (0-9)
- Common punctuation (. , ! ? ' " : ; / - + = @ £ $ etc.)
- Some accented letters and symbols (é, è, ñ, ü, ö, ä, ß, and Greek capitals such as Δ and Ω)
With GSM-7 encoding, a single SMS can contain up to 160 characters. If your message exceeds 160 characters, it's split into multiple segments of 153 characters each (7 characters per segment are used for concatenation headers).
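The 160/153 rule can be sketched in a few lines of Python. This is a minimal illustration, not a carrier-grade implementation; the function and constant names are made up for this example, and `length` is assumed to be the GSM-7 character count of the message.

```python
import math

GSM7_SINGLE_LIMIT = 160  # characters that fit in one standalone GSM-7 SMS
GSM7_CONCAT_LIMIT = 153  # characters per segment once the message is split


def gsm7_segments(length: int) -> int:
    """Return the number of segments for a GSM-7 message of `length` characters."""
    if length <= 0:
        return 0
    if length <= GSM7_SINGLE_LIMIT:
        return 1
    # Every segment of a split message gives up 7 characters to the header.
    return math.ceil(length / GSM7_CONCAT_LIMIT)


print(gsm7_segments(160))  # 1
print(gsm7_segments(161))  # 2
print(gsm7_segments(307))  # 3
```

Note that a 161-character message is billed as two segments even though the second segment carries only 8 characters.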
The Extended GSM Character Trap
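Some characters that look standard belong to the GSM-7 extension table. Each one is sent as an escape character followed by the character itself, so it counts as two characters in GSM-7: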
- Curly braces: { } — 2 characters each
- Square brackets: [ ] — 2 characters each
- Euro sign: € — 2 characters
- Backslash: \ — 2 characters
- Tilde: ~ — 2 characters
- Pipe: | — 2 characters
- Caret: ^ — 2 characters
A message that appears to be 158 characters is actually 162 characters if four of them are extended characters. That pushes it into two-segment territory and doubles the cost.
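A short sketch of how the double-counting plays out. It assumes the text is already GSM-7 compatible and only covers the extended characters listed above; `GSM7_EXTENDED` and `gsm7_length` are illustrative names, not part of any particular SMS library.

```python
# Extended characters from the list above; each costs two GSM-7 characters.
GSM7_EXTENDED = set("{}[]€\\~|^")


def gsm7_length(text: str) -> int:
    """Count GSM-7 characters, charging 2 for each extended character.

    Assumes every character in `text` is GSM-7 compatible.
    """
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


message = "x" * 154 + "{}[]"   # 158 characters on screen
print(len(message))            # 158
print(gsm7_length(message))    # 162
```

Counting with `len()` alone would report this message as fitting in one segment, which is why a GSM-7-aware counter is needed.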
UCS-2 (Unicode): When Things Get Expensive
If your message contains any character outside the GSM-7 set, the entire message switches to UCS-2 encoding. This reduces the single-segment limit from 160 to just 70 characters. Multipart messages drop to 67 characters per segment.
Characters that trigger UCS-2 include:
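- Accented characters outside the GSM-7 set: á, í, ó, ë, ç, ã (common in names and loanwords). A few accented letters, including é, ñ, ü and ö, are part of GSM-7 and do not trigger the switch
- Emoji: Any emoji triggers UCS-2, and most count as at least two characters toward the limit
- Smart quotes: “curly quotes” copied from Word or Google Docs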
- Em and en dashes: — and – (vs the standard hyphen -)
- Non-Latin scripts: Chinese, Arabic, Cyrillic, Hindi, etc.
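One way to check which encoding a message will need is to test every character against the GSM-7 tables. The sketch below hard-codes the default alphabet and extension table as they are commonly published for GSM 03.38; it is worth verifying against your provider's documentation, since some carriers use national language variants. The function names are illustrative.

```python
# GSM-7 default alphabet (basic table), as commonly published for GSM 03.38.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: valid GSM-7, but each character counts as two.
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def detect_encoding(text: str) -> str:
    """Return "GSM-7" if every character fits the GSM-7 tables, else "UCS-2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        return "GSM-7"
    return "UCS-2"


def offending_characters(text: str) -> list[str]:
    """List the distinct characters that would force UCS-2."""
    return sorted({ch for ch in text
                   if ch not in GSM7_BASIC and ch not in GSM7_EXTENDED})


print(detect_encoding("Your order has shipped!"))    # GSM-7
print(detect_encoding("Your order has shipped 🎉"))  # UCS-2
```

Reporting the offending characters, rather than just the encoding, makes it easier to spot a stray curly quote or dash in an otherwise plain message.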
The Cost Impact
Consider this example: a 140-character message in GSM-7 is a single SMS. Add one emoji, and it becomes a UCS-2 message of 142 characters (most emoji count as two), requiring 3 segments (142 ÷ 67 ≈ 2.12, rounded up to 3). Your cost just tripled.
For international SMS, this is particularly important. Messages in Chinese, Arabic, or Hindi are always UCS-2, meaning your effective message length is 70 characters for a single SMS, or 67 characters per segment once the message is split.
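The arithmetic behind the emoji example can be checked with a short sketch. It assumes the message has already been identified as UCS-2 and counts UTF-16 code units, which is how most emoji end up costing two characters. The function names are illustrative, and real gateways may add a segment in edge cases to avoid splitting an emoji across two segments.

```python
import math

UCS2_SINGLE_LIMIT = 70  # characters in one standalone UCS-2 SMS
UCS2_CONCAT_LIMIT = 67  # characters per segment once the message is split


def ucs2_units(text: str) -> int:
    """Count UTF-16 code units; characters outside the BMP (most emoji) take two."""
    return len(text.encode("utf-16-be")) // 2


def ucs2_segments(text: str) -> int:
    """Return the number of segments for a message sent as UCS-2."""
    units = ucs2_units(text)
    if units == 0:
        return 0
    if units <= UCS2_SINGLE_LIMIT:
        return 1
    return math.ceil(units / UCS2_CONCAT_LIMIT)


message = "x" * 140 + "🎉"
print(ucs2_units(message))     # 142
print(ucs2_segments(message))  # 3
```

Without the emoji, the same 140 characters would fit in a single GSM-7 segment, so the one extra symbol triples the number of billed segments.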
Common Encoding Pitfalls
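- Copy-pasting from Word: Microsoft Word uses smart quotes (“ ”) and em dashes (—) that trigger UCS-2. Always paste into a plain text editor first and retype any curly quotes or long dashes
- Customer names with accents: Personalised messages using names like "Zoë" or "João" will force UCS-2 encoding. Names like "José" or "Müller" stay within GSM-7, so check the specific characters rather than assuming every accent is a problem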
- Emoji in marketing: A single 🎉 in a 150-character message turns 1 SMS into 3
- Template variables: Even if your template is GSM-7, dynamic content might include Unicode characters
Best Practices
- Test your messages: Use a character counter that checks encoding before sending
- Avoid emoji in bulk sends unless the engagement benefit outweighs the 2-3x cost increase
- Strip smart quotes: Replace curly quotes (“ ” and ‘ ’) with straight quotes (" and ') before sending
- Consider your audience: If personalisation fields might include accented names, factor UCS-2 costs into your budget
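The quote-stripping step can be automated before a message is queued. The following is a minimal sketch using Python's built-in `str.translate`; the replacement table is an assumption about which substitutions are acceptable for your content and covers only a handful of common typographic characters.

```python
# Typographic characters that commonly sneak in from word processors,
# mapped to GSM-7-safe equivalents. Extend the table to suit your content.
REPLACEMENTS = {
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / typographic apostrophe
    "\u2014": "-",    # em dash
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}
_TABLE = str.maketrans(REPLACEMENTS)


def to_plain_punctuation(text: str) -> str:
    """Replace common typographic punctuation with GSM-7-safe characters."""
    return text.translate(_TABLE)


print(to_plain_punctuation("\u201cSale\u201d \u2013 don\u2019t miss out\u2026"))
# "Sale" - don't miss out...
```

Note that replacing an ellipsis adds two characters, so the length should be recounted after cleaning rather than before.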
Faretext's SMS API and Oello platform include real-time character counters and encoding detection, so you always know exactly how many segments your message will use before you send. View our pricing for per-segment rates.