The amount of non-English-language web content has been growing dramatically over the past few years. However, there are still some significant challenges when it comes to making content available in other languages. One of the biggest issues is how to represent languages that don’t use the Latin script.
Soon, that may not be a problem. Google, in collaboration with partners including companies like Adobe, is working on a rather ambitious project: Noto Fonts, a font family that “aims to support all the world’s languages” and “achieve visual harmonization across languages.”
At the moment, Noto Fonts features 100 scripts and 100,000 characters, and is capable of representing 600 written languages. That’s a great start, but there’s still a ways to go. According to Ethnologue, “of the currently listed 7,105 living languages, 3,570 have a developed writing system.” Plus, there are around 3,000 languages that may or may not have writing systems of their own; we simply don’t know.
As Tanvi Misra notes on NPR’s Code Switch blog, with Noto, Google is building on the previous work of the Unicode project.
Unicode currently features 100 scripts and more than 110,000 characters. However, the project has faced allegations of cultural insensitivity in the past, particularly when the time came to encode Asian scripts. Between Chinese, Japanese and Korean, the project was running out of code points. Its solution was something called “Han unification.” As Finn Brunton, a professor at New York University, explained to NPR:
“So they were like, ‘Hey, you know, Chinese, Japanese, Korean — they’re pretty close. Can we just mash big chunks of them together?'”
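The practical effect of Han unification is easy to see from Python’s standard library. A minimal sketch (the character chosen here is just an illustrative example): a character that is drawn with visibly different strokes in Chinese, Japanese, and Korean typography nonetheless occupies a single shared code point, so the regional shape a reader sees is decided by the font, not by the text itself.

```python
import unicodedata

# U+76F4 is written differently in Chinese, Japanese, and Korean
# typographic traditions, yet Han unification assigns all of those
# regional variants one code point with one generic name.
char = "\u76f4"
print(unicodedata.name(char))  # CJK UNIFIED IDEOGRAPH-76F4

# Nothing in the code point records which language's glyph shape is
# intended; that is left to the font (e.g. a Chinese vs. Japanese
# variant of Noto) or to out-of-band language tagging.
```

This is why, as discussed below, font families like Noto ship separate regional variants even for “the same” characters.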
Obviously, people who actually use these scripts were less than pleased with the compromise. To Brunton, the dust-up over Han unification indicates a larger problem with these sorts of projects:
“There’s all these different, sort of, approaches, which are fundamentally, obviously reflecting cultural models — cultural biases. But when they get substantiated into software, they turn into exclusionary systems.”
To its credit, Noto has preserved the variations in script between the different languages. As its partner Adobe notes on its blog, “While the variations may be subtle, especially to the Western eye, they are very important to the users of each language.”
However, other languages have fared less well, according to NPR. Urdu and Persian, for example, must be written in the naskh style of the Arabic script, another case of subtle-yet-important distinctions being erased in the name of simplicity:
“The naskh script of the Arabic alphabet is more angular, linear — and incidentally, easier to code — than the nastaliq script. So that’s what is currently present in Noto for the Urdu language, even though Persian and Urdu language communities say nastaliq is a more accurate representation.”
That said, according to Google this is only a temporary situation as it works to develop a nastaliq font.
The NPR article has inspired a lively debate amongst commenters, with some accusing Noto’s critics of making the perfect the enemy of the good.
For example, Brad Zimmerman says:
“I am the last person that will defend Google, but – in my opinion – it is unreasonable to criticise a project that already has good support for a huge number of languages and is *still in development*. It’s even a bit more unreasonable considering that Google’s efforts – the fonts themselves – are free *and* released under the Apache License, a very generous and easy-to-get-along-with license.”
What do you think of Noto? Is Google doing enough to address the concerns of minority language communities?