Scrapy: Powerful Web Scraping & Crawling with Python (Udemy.com)

Python Scrapy Tutorial - Learn how to scrape websites and build a powerful web crawler using Scrapy, Splash and Python

( 100 Reviews )

Created by: GoTrained Academy

Produced in 2022

What you will learn

Creating a web crawler in Scrapy
Crawling single or multiple pages and scraping data
Deploying & Scheduling Spiders to ScrapingHub
Logging into Websites with Scrapy
Running Scrapy as a Standalone Script
Integrating Splash with Scrapy to scrape JavaScript rendered websites
Using Scrapy with Selenium in Special Cases, e.g. to Scrape JavaScript Driven Web Pages
Building Scrapy Advanced Spider
Additional functions that Scrapy offers after a Spider is done scraping
Editing and Using Scrapy Parameters
Exporting data extracted by Scrapy into CSV, Excel, XML, or JSON files
Storing data extracted by Scrapy into MySQL and MongoDB databases
Several real-life web scraping projects, including Craigslist, LinkedIn and many others

Quality Score

Content Quality

Video Quality

Qualified Instructor

Course Pace

Course Depth & Coverage

Overall Score : 84 / 100

Course Description

Why this course?

Join the most popular course on Web Scraping with Scrapy, Selenium and Splash.
Learn from a professional instructor, Lazar Telebak, a full-time Web Scraping Consultant.
Apply real-world examples and practical projects of Web Scraping popular websites.
Get the most up-to-date course and the only course with 10+ hours of playable content.
Empower your knowledge with an active Q&A board to answer all your questions.
30-day money-back guarantee.

Scrapy is a free and open source web crawling framework, written in Python. Scrapy is useful for web scraping and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival. This Python Scrapy tutorial covers the fundamentals of Scrapy.
Web scraping is a technique for gathering data or information on web pages. You could revisit your favorite web site every time it updates for new information, or you could write a web scraper to have it do it for you!
Web crawling is usually the very first step of data research. Whether you are looking to obtain data from a website, track changes on the internet, or use a website API, web crawlers are a great way to get the data you need.
A web crawler, also known as web spider, is an application able to scan the World Wide Web and extract information in an automatic manner. While they have many components, web crawlers fundamentally use a simple process: download the raw data, process and extract it, and, if desired, store the data in a file or database. There are many ways to do this, and many languages you can build your web crawler or spider in.
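
The download, extract, and store steps described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not Scrapy code: the inline HTML string stands in for a downloaded page, and the names SAMPLE_HTML, TitleParser, extract_titles, and to_csv are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Stand-in for step 1 (download): a real crawler would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h2 class="title">First post</h2>
  <h2 class="title">Second post</h2>
  <p>Not a title</p>
</body></html>
"""


class TitleParser(HTMLParser):
    """Step 2 (extract): collect the text of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside an <h2>.
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())


def extract_titles(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def to_csv(titles):
    """Step 3 (store): write the extracted rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title"])
    for title in titles:
        writer.writerow([title])
    return buffer.getvalue()


titles = extract_titles(SAMPLE_HTML)
csv_text = to_csv(titles)
```

A framework such as Scrapy handles the same three stages, but adds request scheduling, retries, and export formats so that this plumbing does not have to be written by hand.
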
Before Scrapy, developers relied on various Python packages for this job, such as the widely used urllib2 and BeautifulSoup. Scrapy is a newer Python package that aims at easy, fast, and automated web crawling, and it has recently gained much popularity.
Scrapy is now widely requested by many employers, for both freelancing and in-house jobs. That was one important reason for creating this Python Scrapy course: to help you enhance your skills and earn more income.
In this Scrapy tutorial, you will learn how to install Scrapy. You will also build a basic and an advanced spider, and learn more about Scrapy architecture. Then you will learn about deploying spiders and logging into websites with Scrapy. We will build a generic web crawler with Scrapy, and we will also integrate Splash and Selenium with Scrapy to iterate over pages. We will build an advanced spider with an option to iterate over pages, close it out using the Close function in Scrapy, and then discuss Scrapy arguments. Finally, in this course, you will learn how to save the output to databases, MySQL and MongoDB. There is a dedicated section for diverse web scraping solved exercises... and updating.
One of the main advantages of Scrapy is that it is built on top of Twisted, an asynchronous networking framework. "Asynchronous" means that you do not have to wait for one request to finish before making another, so many requests can be in flight at the same time. Because it uses non-blocking (asynchronous) code for concurrency, Scrapy is very efficient.
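
The idea of not waiting for one request before starting the next can be illustrated with a small sketch. Note the substitution: this uses Python's built-in asyncio rather than Twisted, and asyncio.sleep stands in for network latency. The names fake_fetch and crawl and the page names are hypothetical.

```python
import asyncio
import time


async def fake_fetch(url, delay):
    # asyncio.sleep stands in for waiting on a network response.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def crawl(urls, delay=0.2):
    # All requests are started before any of them has finished.
    return await asyncio.gather(*(fake_fetch(u, delay) for u in urls))


urls = ["page-1", "page-2", "page-3"]  # hypothetical page names
start = time.perf_counter()
responses = asyncio.run(crawl(urls))
elapsed = time.perf_counter() - start
```

Fetched one after another, the three simulated requests would take about 0.6 seconds in total. Because the waits overlap here, the whole batch should finish in roughly the time of a single request.
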
It is worth noting that Scrapy tries not only to solve the content extraction (called scraping), but also the navigation to the relevant pages for the extraction (called crawling). To achieve that, a core concept in the framework is the Spider -- in practice, a Python object with a few special features, for which you write the code and the framework is responsible for triggering it.
Scrapy provides many of the functions required for downloading websites and other content on the internet, making the development process quicker and less programming-intensive. This Python Scrapy tutorial will teach you how to use Scrapy to build web crawlers and web spiders.
Scrapy is the most popular tool for web scraping and crawling written in Python. It is simple and powerful, with lots of features and possible extensions.

Python Scrapy Tutorial Topics:

This Scrapy course starts by covering the fundamentals of using Scrapy, and then concentrates on Scrapy advanced features of creating and automating web crawlers. The main topics of this Python Scrapy tutorial are as follows:

What Scrapy is, the differences between Scrapy and other Python-based web scraping libraries such as BeautifulSoup, LXML, Requests, and Selenium, and when it is better to use Scrapy.

The tutorial starts with how to create a Scrapy project and then build a basic Spider to scrape data from a website.

Exploring XPath expressions and how to use them with Scrapy to extract data.
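
As a rough illustration of what XPath-based extraction looks like, the sketch below uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath and needs well-formed markup. It is not Scrapy's selector API. In a Scrapy spider, the comparable call would be along the lines of response.xpath("//span[@class='text']/text()").getall(). The PAGE snippet and its contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="quote">
      <span class="text">Simple is better than complex.</span>
      <a href="/author/tim">Tim</a>
    </div>
    <div class="quote">
      <span class="text">Readability counts.</span>
      <a href="/author/tim">Tim</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# .//span[@class='text'] selects every <span> whose class attribute is "text".
texts = [span.text for span in root.findall(".//span[@class='text']")]

# .//div[@class='quote']/a selects the <a> children of each quote <div>.
links = [a.get("href") for a in root.findall(".//div[@class='quote']/a")]
```

The two expressions show the most common building blocks: a descendant search (//), an attribute predicate ([@class='...']), and a child step (/a).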

Building a more advanced Scrapy spider to iterate multiple pages of a website and scrape data from each page.

Scrapy Architecture: the overall layout of a Scrapy project; what each field represents and how you can use them in your spider code.

Web Scraping best practices to avoid getting banned by the websites you are scraping.
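
The listing does not say which practices the course teaches, so the following is only an assumed example of the kind of politeness settings commonly placed in a Scrapy project's settings.py. The setting names are standard Scrapy settings. The values, the crawler name, and the contact URL are illustrative assumptions.

```python
# settings.py (fragment): values are illustrative assumptions, not recommendations.

# Respect each site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Identify the crawler; the name and URL here are hypothetical.
USER_AGENT = "example-crawler (+https://example.com/contact)"

# Wait between requests to the same site, in seconds.
DOWNLOAD_DELAY = 2

# Limit parallel requests per domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Let the AutoThrottle extension adapt the delay to server response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 30
```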

In this Scrapy tutorial, you will also learn how to deploy a Scrapy web crawler to the Scrapy Cloud platform easily. Scrapy Cloud is a platform from Scrapinghub to run, automate, and manage your web crawlers in the cloud, without the need to set up your own servers.

This Scrapy tutorial also covers how to use Scrapy for web scraping authenticated (logged in) user sessions, i.e. on websites that require a username and password before displaying data.

This course concentrates mainly on how to create an advanced web crawler with Scrapy. We will cover the Scrapy CrawlSpider, which is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. We will also use a Link Extractor object, which defines how links will be extracted from each crawled page; it allows us to grab all the links on a page, no matter how many of them there are.
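
To make the rule-based link following described above more concrete, here is a standard-library sketch of what a link extractor does conceptually: collect every link on a page, resolve it against the page URL, and keep only those matching an allow pattern. This is not Scrapy's LinkExtractor or CrawlSpider. The names LinkCollector and extract_links, the sample page, and the URL pattern are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, allow=r".*"):
    """Return absolute, de-duplicated links whose URL matches `allow`."""
    collector = LinkCollector()
    collector.feed(html)
    pattern = re.compile(allow)
    seen, links = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if pattern.search(url) and url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Hypothetical page with pagination links and one unrelated link.
PAGE = """
<a href="/page/2/">Next</a>
<a href="/about">About</a>
<a href="/page/2/">Next again</a>
<a href="https://example.com/page/3/">Page 3</a>
"""

links = extract_links(PAGE, "https://example.com/", allow=r"/page/\d+/")
```

In Scrapy itself, the same intent is expressed declaratively: a CrawlSpider holds a set of rules, each pairing a link extractor with a callback, and the framework follows the matching links.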

Furthermore, there is a complete section in this Scrapy tutorial showing how to combine Splash or Selenium with Scrapy to create web crawlers for dynamic web pages. Sometimes you cannot fetch data directly from the source and instead need to load the page, fill in a form, click somewhere, scroll down, and so on. In such cases, namely when a website relies on many AJAX calls and on JavaScript execution to render its pages, it is good to use Splash or Selenium along with Scrapy.

We will also discuss more functions that Scrapy offers after the s

Time: 10.5h

Instructor Details

GoTrained Academy

4.2 Rating
100 Reviews

GoTrained is an e-learning academy aiming at creating useful content in different languages and it concentrates on technology and management.
We adopt a special approach for selecting the content we provide; we mainly focus on skills that are frequently requested by clients and jobs but are covered by only a few videos. We also try to build video series that cover not only the basics, but also the advanced areas.
Full-time scraping consultant specializing in web scraping, crawling, and indexing web pages.
Worked on projects that deal with automation and website scraping, crawling and exporting data to various data formats.
Over the years worked with 100+ different individuals/startups/companies and helped them achieve their goals.
Feel free to contact me on LinkedIn for more information on in-person training sessions.

Python for Everybody Specialization (2014)

4.7 (90 Reviews)

Provider: Coursera
Time: 25h

Free

Python Tutorial for Beginners by Corey Schafer (2017)

4.8 (28 Reviews)

Provider: YouTube
Time: 9h

Free

Google's Python Class (2010)

4.4 (18 Reviews)

Provider: Google
Time: 3h 35m

Free

More Courses

Reviews

4.2

100 total reviews

By Louis Rivas on 2 weeks ago

The process needed more detail, and the same things are repeated too many times. The LinkedIn scrape should have been done with Scrapy + Selenium. The course is a bit outdated and the audio is not very clean.

By Michael Gilbert on a month ago

I love the videos and think they are very informative. I really appreciate how the instructor goes over everything and doesn't leave a single necessary character unexplained. I was first a little frustrated with how fast the instructor was going through the videos and examples, leaving a 4/5 instead of a 5/5. HOWEVER, I later realized my video players were stuck at 1.25x speed. So, I am here to correct my review and give it the 5 stars that it deserves.

By Desmend Jetton on a month ago

Lazar goes above and beyond to answer all queries. He's very knowledgeable and I learned a ton for my side projects.

By Paradorn Boonpoor on a month ago

I finished my project because I enrolled in this course. Thank you to the teacher. I recommend this course.

By Francisco Medina on 3 months ago

This course was worth the money and the hands-on time. The instructor was very detailed on the subject and knows his stuff. The only minor issue was the audio: it sounded slightly low, so I had to rewind a few scenes, but all in all this was an excellent course.

By Davide Mizzaro on 4 months ago

I'm learning exactly the things I expected to learn. I didn't give 5 stars only because of the lessons organization (some content is repeated among the sections).

By Nicholas Dossegger on 3 weeks ago

Should Get 4 stars BUT!!!! The new Splash section is a decent dive into the tool. However, there is a MAJOR Issue for Splash if you're using Windows 10 that he hasn't resolved even when a student gave him all the resources to do so. This course can be a 5 star if some updates are made and better explanations are given for the utilization of tools like Docker.

By Swaroop Kallakuri on 5 months ago

The instructor responds quickly to each and every doubt, and all the projects he explained are awesome and sufficient to give you the knowledge to scrape the content you want.

By Phill Farrer on 4 months ago

Well detailed and help at every step. I just completed it and now going to start a project. Teacher assures they can help with Questions along the way. Well paced and very thorough. Thank you.

By Joe Wenzel on 4 weeks ago

This course is great! The instructor is responsive to questions and the content is easy to understand. This course helped me so much! You will not be disappointed!

By Vasco Meerman on 5 months ago