Almost all eCommerce or online retail websites organize their products into categories, such as brand, product type, or price range. Sometimes you only want to scrape some of these categories instead of all of them. This tutorial will show you how to click into specific categories on an eCommerce website using Conditional commands and scrape all of the products from those categories.
Initial project set-up
1. Open up the ParseHub client and click on "New Project"
2. In the textbox that appears, enter the URL of the website that you want to scrape. For this example, I will be scraping the Movies & TV section of Amazon. Click on "Start project on this URL".
Clicking into specific categories
1. Using the empty Select command automatically generated for you, click on the first category (genre) on the left-hand side of the page. It will get highlighted in green, while similar elements will get highlighted in yellow.
2. Click on one of the other genres highlighted in yellow to train ParseHub to select all of the genre links on the page. You can rename this command to something more descriptive, such as "genre".
3. We only want ParseHub to begin entries in our results file for genres that we are interested in, so click on the list icon beside "Select genre" to expand the Begin new entry command. Delete it by clicking on the x button beside it to avoid beginning entries for every single genre.
4. Click on the plus button next to "Select genre", click Advanced, and then add a Conditional command.
5. I want to click into the Comedy genre, so I will type $e.text.contains("Comedy") into the Expression textbox. We could type !$e.text.contains("Comedy") instead if we wanted to click into every genre that is not Comedy. You can replace "Comedy" in the expression above with the name of any category/genre that you want to scrape in your specific project.
If we want to scrape multiple categories from the same selection, we can join conditions with the OR operator, which is written as two vertical bars (||). For example, $e.text.contains("Comedy") || $e.text.contains("Drama") is true for both the Comedy and Drama genres.
6. Click on the plus button next to your condition, click Advanced, and choose a Begin new entry command.
This will create a new row in our CSV/JSON object for every genre that is in our condition. Rename your command to "genre".
7. Click on the plus button next to the Begin new entry command, click Advanced, and choose an Extract command. This will extract the name of the genres in our list. Rename your Extract command to "name".
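The effect of the Conditional, Begin new entry, and Extract commands above can be sketched in plain Python. This only illustrates the logic and is not how ParseHub implements it. The link texts, the function names matches and begin_entries, and the assumption that contains behaves as a substring test are all hypothetical.

```python
def matches(text, wanted):
    """Mimic a condition built from contains() checks joined with OR:
    true if any wanted name appears as a substring of the link text."""
    return any(name in text for name in wanted)


def begin_entries(link_texts, wanted):
    """Start one entry, holding the extracted name, for each selected
    link whose text passes the condition."""
    return [{"name": text} for text in link_texts if matches(text, wanted)]


# Hypothetical link texts from the genre sidebar.
genres = ["Action & Adventure", "Comedy", "Drama", "Romantic Comedy"]
entries = begin_entries(genres, ["Comedy", "Drama"])
print(entries)
```

Under the substring assumption, a link such as "Romantic Comedy" would also pass a check for "Comedy", so it is worth choosing text that matches only the categories you intend.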
Scrape products from categories
1. To go into each category and scrape its products, click on the plus button next to your Begin new entry command and choose a Click command.
2. In the pop-up that appears, choose "No" and create a new template. I will rename my new template "category".
3. Now that we are on the products page, we can follow the instructions from our tutorial on scraping product details from an eCommerce website to finish collecting the data from our categories of interest.
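Once the project has run, the JSON results can be flattened into one row per product for further analysis. The sketch below assumes a result shape in which each "genre" entry carries its "name" plus a list of products scraped by the "category" template. The "product" and "price" keys and the sample values are hypothetical, and will depend on how you name the commands in your own project.

```python
import json

# Hypothetical sample of what the results might look like; the real
# keys follow the command names chosen in your own project.
SAMPLE = """
{
  "genre": [
    {"name": "Comedy", "product": [{"name": "Movie A", "price": "$9.99"},
                                   {"name": "Movie B", "price": "$4.99"}]},
    {"name": "Drama", "product": [{"name": "Movie C", "price": "$7.49"}]}
  ]
}
"""


def flatten(results):
    """Turn nested genre -> product results into one flat row per product."""
    rows = []
    for genre in results.get("genre", []):
        for product in genre.get("product", []):
            rows.append({
                "genre": genre.get("name"),
                "product": product.get("name"),
                "price": product.get("price"),
            })
    return rows


rows = flatten(json.loads(SAMPLE))
for row in rows:
    print(row)
```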