Thursday Apr 26
14:45 – 15:30
Room 203-204

Analyzing Pwned Passwords with Apache Spark

Apache Spark aims to solve the problem of working with large-scale distributed data, and with access to over 500 million leaked passwords, we have a lot of data to dig through.

Advancements in the API make running Spark with Scala, Python, or even SQL smoother and faster than ever. This talk introduces Spark and the new way to run queries on structured, distributed data, using breached credentials as the example data set. We'll walk through how to get started with Spark and discuss the trade-offs of the different abstractions the framework provides. With the help of live code, we'll find patterns in the password data and look at how you can encourage your users to be more secure. You will see how easy and fast it is to both explore and process data using Spark SQL, and you'll leave with the tools to get started with your own distributed data... and a password manager.

live demo
Scala
programming languages
distributed systems
Spark
SQL