Localization with NLP: Global Empire-Building for Fun & Profit
To establish a user base across the globe, a product needs to support a variety of locales. The challenge in supporting multiple locales is generating and maintaining localized strings, which are deeply integrated into many facets of a product. To address these challenges at Qordoba, we’re using highly scalable technologies and natural language processing (NLP) to automate the process. Specifically, we need to generate high-quality translations in many different languages and make them available in real time across platforms, e.g. mobile, print, and web. The combination of various open-source tools provides the structure for a scalable localization platform with machine learning at its core.
In this talk, we describe the techniques we’re using to provide:
- Continuous deployment of localized strings
- Live syncing across platforms (mobile, web, Photoshop, Sketch, help desk, etc.)
- Content generation for any locale
- Emotional response
We will also share our architecture for handling billions of localized strings in many different languages. We will discuss our use of:
- Scala and Akka as an orchestration layer
- Apache Cassandra and MariaDB as a storage layer
- Apache Spark, Apache PredictionIO (incubating), Apache HBase, and Elasticsearch for natural language processing
- Apache Kafka as a message bus for reporting, billing, & notifications
- Docker, Marathon, & Apache Mesos for containerized deployment
We present our natural language processing (NLP) techniques in the context of a platform that makes it feasible to build products that feel native to every user, regardless of language.
Director of Data Science at Qordoba, uses machine learning to create better UX and less pain for engineers