Localization with NLP: Global Empire-Building for Fun & Profit
To establish a user base across the globe, a product needs to support a variety of locales. The challenge in supporting multiple locales is generating and maintaining localized strings, which are deeply integrated into many facets of a product. To address this challenge at Qordoba, we’re using highly scalable technologies and natural language processing (NLP) to automate the process. Specifically, we need to generate high-quality translations in many different languages and make them available in real time across platforms such as mobile, print, and web. Combining several open-source tools gives us the structure for a scalable localization platform with machine learning at its core.
In this talk, we describe the techniques we’re using to provide:
- Continuous deployment of localized strings
- Live syncing across platforms (mobile, web, Photoshop, Sketch, help desk, etc.)
- Content generation for any locale
- Emotional response
We will also share our architecture for handling billions of localized strings in many different languages, and discuss our use of:
- Scala and Akka as an orchestration layer
- Apache Cassandra and MariaDB as a storage layer
- Apache Spark, Apache PredictionIO (incubating), Apache HBase, and Elasticsearch for natural language processing
- Apache Kafka as a message bus for reporting, billing, & notifications
- Docker, Marathon, & Apache Mesos for containerized deployment
We present our natural language processing (NLP) techniques in the context of a platform that makes it feasible to build products that feel native to every user, regardless of language.