Build Stuff 2018 has ended
Wednesday, November 14 • 1:50pm - 2:40pm
[SLIDES] Sam Elamin @samelamin - Lessons learnt implementing scalable, fault-tolerant data pipelines with Apache Spark

ETL pipelines ingest data from a variety of sources. They must handle incorrect, incomplete or inconsistent records and produce curated, consistent data that delivers invaluable insight into customer behaviour.
 
In this talk, Sam Elamin will relate his real-life experience of building robust data processing pipelines powered by Spark that balance the considerations of extreme performance, speed of development, and cost of maintenance.

Sam will walk through building a data lake using best-practice patterns and running hundreds of jobs in parallel with open source tools including Apache Spark, Apache Airflow and Presto. These tools underlie systems that deal with £100,000 worth of transactions every hour. More importantly, he will also highlight the pitfalls to avoid while providing scalable and reliable big data solutions.

If you are curious about becoming a data engineer or fancy a move to big data then this is the talk for you!

Speakers
Sam Elamin

Data Engineer, Elamin LTD
My name is Sam and I am a Big Data Engineer as well as a Software Craftsman and Apache Spark evangelist. I am interested in Big Data, Metrics Driven Development and Continuous Delivery, and am currently exploring Real Time Analytics, as well as streaming tools and frameworks like Apache...



Wednesday November 14, 2018 1:50pm - 2:40pm EET
4. Lambda
  Session