Unicode visual spoofing for fun and profit


Apple recently announced Swift, an innovative new programming language for iOS and OS X.

According to Apple's product page, the language is designed for safety:

Swift eliminates entire classes of unsafe code. Variables are always initialized before use, arrays and integers are checked for overflow, and memory is managed automatically. Syntax is tuned to make it easy to define your intent — for example, simple three-character keywords define a variable (var) or constant (let).

I really like the idea of shifting the security focus closer to the developer. The next step in this evolution is to aid developers with tools and programming languages designed with security in mind.

Exciting features

A recent blog post about Swift gave me some insight into a rather fascinating aspect of the language: you can use Unicode characters in your code, even in variable and constant names… (Why, oh why?)
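To make this concrete, Swift accepts identifiers like the ones below, similar to the examples in Apple's The Swift Programming Language guide. The names and values here are arbitrary illustrations:

```swift
// Identifiers may contain letters from many scripts, and even emoji.
let π = 3.14159
let 你好 = "你好世界"
let 🐶🐮 = "dogcow"

print(π, 你好, 🐶🐮)
```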


Besides giving a whole new meaning to the concept of obfuscated code, this functionality also opens the door to more sinister mischief.

The Unicode problem

Consider doing a code review of the following code from an imaginary service:

Would you ever suspect that the ‘a’ in “administrator” could really be any of these:

  • U+0430 (Cyrillic small letter a)
  • U+0251 (Latin small letter alpha)
  • U+03B1 (Greek small letter alpha)
  • U+237A (APL functional symbol alpha)

So if someone intentionally planted such a string in your codebase and then registered as the user “administrator”, spelled with the same confusable character for ‘a’, they would become superUser.
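A minimal sketch of the string case, assuming a hypothetical `grantsSuperUser(_:)` check whose literal was planted with a Cyrillic а (written below with the escape `\u{0430}` so the difference is visible). Swift's `==` on `String` compares by Unicode canonical equivalence, and a Cyrillic а is not canonically equivalent to a Latin a, so the two spellings compare unequal even though they render alike. The helper `nonASCIIScalars(_:)` is likewise hypothetical, a crude reviewer aid that lists every scalar outside ASCII:

```swift
// The planted literal: "administrator" whose first letter is a
// Cyrillic а (U+0430), written as an escape here so it is visible.
let plantedName = "\u{0430}dministrator"

// Hypothetical backdoored check: only the spoofed spelling matches.
func grantsSuperUser(_ name: String) -> Bool {
    return name == plantedName
}

// Reviewer aid: list every non-ASCII scalar as "U+XXXX".
func nonASCIIScalars(_ s: String) -> [String] {
    return s.unicodeScalars
        .filter { !$0.isASCII }
        .map { (scalar: Unicode.Scalar) -> String in
            let hex = String(scalar.value, radix: 16, uppercase: true)
            return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        }
}

print(grantsSuperUser("administrator"))        // false: Latin a
print(grantsSuperUser("\u{0430}dministrator")) // true: Cyrillic a
print(nonASCIIScalars(plantedName))            // ["U+0430"]
```

The legitimate administrator, typing a plain Latin “administrator”, never matches; only the attacker's lookalike registration does.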

This problem is not specific to Swift; it exists in every language where Unicode is accepted as string input.

Swift homograph variable attacks

What Swift adds, however, is that this attack vector extends to the names of variables and constants: two variables can look identical yet be logically different.

Here, a backdoor could be intentionally planted in two ways:

  • The ‘a’ in the variable ‘name’ can be a confusable, so the code checks a different variable than the one you think it does.
  • The second variable ‘superUser’ is not the same as the first, due to a confusable character, so superUser is always true.

This example is of course very simplistic, but I believe it highlights the dangers that could lurk in a language where Unicode can be used in variable names.

JavaScript homograph variable attacks

It appears that Swift is not alone in this: JavaScript (ECMAScript 5.1) also supports Unicode variable names, although getting it working seems to require some fiddling with HTTP content charsets…


Those who do not remember their past are condemned to repeat their mistakes

Confusable Unicode characters have historically been used to mislead people, for example in URLs; now we can have the same problems in our code if we are not careful!

The Unicode Consortium has released a pretty cool utility [http://unicode.org/cldr/utility/confusables.jsp] that generates visually confusable Unicode strings. It is a good place to kick-start your imagination about what could be done!
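In the same spirit as that utility, though far cruder, here is a sketch that enumerates every spelling of “administrator” obtained by swapping each ‘a’ for one of the four lookalikes listed earlier. The substitution table is a hand-picked assumption, not the Unicode Consortium's confusables data, and `spoofVariants(of:using:)` is a hypothetical name:

```swift
// Hand-picked lookalikes for Latin "a" (the four code points listed earlier).
// Illustrative assumption only; this is not Unicode's confusables data.
let lookalikes: [Character: [Character]] = [
    "a": ["\u{0430}", "\u{0251}", "\u{03B1}", "\u{237A}"]
]

// Returns every string that differs from `word` only by swapping characters
// for entries in `table`. The original spelling itself is excluded.
func spoofVariants(of word: String, using table: [Character: [Character]]) -> Set<String> {
    var results: Set<String> = [""]
    for ch in word {
        // Each position may keep its character or use any listed lookalike.
        let options = [ch] + (table[ch] ?? [])
        var next: Set<String> = []
        for prefix in results {
            for option in options {
                next.insert(prefix + String(option))
            }
        }
        results = next
    }
    results.remove(word)
    return results
}

let variants = spoofVariants(of: "administrator", using: lookalikes)
print(variants.count)   // 24: two "a"s, five choices each, minus the original
```

With a real confusables table the number of variants grows quickly, since almost every Latin letter has several lookalikes.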