Spark and Python for Big Data with PySpark

Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!

Created by: Jose Portilla

Produced in 2022

What you will learn

  • Use Python and Spark together to analyze Big Data
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Work on Consulting Projects that mimic real world situations!
  • Classify Customer Churn with Logistic Regression
  • Use Spark with Random Forests for Classification
  • Learn how to use Spark's Gradient Boosted Trees
  • Use Spark's MLlib to create Powerful Machine Learning Models
  • Learn about the Databricks Platform!
  • Get set up on Amazon Web Services EC2 for Big Data Analysis
  • Learn how to use AWS Elastic MapReduce Service!
  • Learn how to leverage the power of Linux with a Spark Environment!
  • Create a Spam filter using Spark and Natural Language Processing!
  • Use Spark Streaming to Analyze Tweets in Real Time!

Quality Score

Content Quality
Video Quality
Qualified Instructor
Course Pace
Course Depth & Coverage

Overall Score : 84 / 100

Course Description

Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Python!
One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!
Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market!
This course will teach the basics with a crash course in Python, continuing on to learning how to use Spark DataFrames with the latest Spark 2.0 syntax! Once we've done that we'll go through how to use the MLlib Machine Learning Library with the DataFrame syntax and Spark. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real world situation where you need to use your new skills to solve a real problem!
We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! This course also has a full 30 day money back guarantee and comes with a LinkedIn Certificate of Completion!
If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Who this course is for:
  • Someone who knows Python and would like to learn how to use it for Big Data
  • Someone who is very familiar with another programming language and needs to learn Spark

Instructor Details

Jose Portilla

Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University and years of experience as a professional instructor and trainer for Data Science and programming. He has publications and patents in various fields such as microfluidics, materials science, and data science technologies. Over the course of his career he has developed a skill set in analyzing data, and he hopes to use his experience in teaching and data science to help other people learn the power of programming and the ability to analyze data, as well as to present the data in clear and beautiful visualizations. Currently he works as the Head of Data Science for Pierian Data Inc. and provides in-person data science and Python programming training courses to employees working at top companies, including General Electric, Cigna, The New York Times, Credit Suisse, and many more. Feel free to contact him on LinkedIn for more information on in-person training sessions or group training sessions in Las Vegas, NV.



50 total reviews

Good intro course! Though in some places, it would've been better if he had explained things a bit more.

This course was exactly what I needed. I'm working on a big data project for my capstone project in school, and I had no idea how to use PySpark. This course showed me everything I needed to perform exploratory data analysis, clean the data, encode categorical features, and train all the models I needed. This course has been a lifesaver. Jose is an amazing instructor. I have several of his courses, and I've learned so much. His courses have been critical for filling in the gaps of my current school's curriculum.

Good enough for different backgrounds. I was expecting a little more theory, but it is fine. Overall, it is worth it.

Everything is clear and understandable. The instructor is to the point and clear. The focus is on setting up working examples and less on theory (which is fine). The installation guide with VirtualBox and Ubuntu is not up to date, which cost me far too much time. This could be better. I would like to know a bit more about running on clusters and how to store data Hadoop-style. If we don't cover this, there is also no need to run on Linux in a virtual box. So I hope the course delivers on this. Ask me for my opinion again at the end.

Jose has failed to explain this topic for real-world scenarios such as Spark failure and recovery mechanisms. This course is too far from real-world scenarios and doesn't really serve its purpose. The examples covered are just copied from the Spark manual, and the instructor is just reading them out and typing manually, which is very sad to see. Even core concepts are not covered thoroughly. Disappointed.

Awesome experience with this step-by-step course. It is successful in explaining and setting up core elements in Apache Spark. Great explaining by Jose, and keep it up. Thank you.

It would be better if the Spark Streaming section were bigger, but I liked the course a lot. The speaker is clear and has interesting examples, and the hands-on exercises are cool as well.

The course helps to give an introduction to Spark DataFrames and machine learning techniques using MLlib.

Maybe just some more lectures about Spark Streaming and integration with Kafka would complete the course and bring my rating to 5 stars.

Awesome course with practical and motivated consulting projects.

It lacks a real-world example working with a database like MySQL or MongoDB. The setup and installation sections for PySpark and Tweepy were really good otherwise.

The explanation is good, but a few concepts I was interested in are missing:
  1) RDDs
  2) Broadcast variables and accumulators
  3) Spark's internal architecture and how it works
  4) Memory allocation for worker nodes
Where can I find the file "Linear_Regression_Consulting_Project.ipynb"?