Scrape data about reviews and ratings

In this tutorial, we'll show you how to:

  • Scrape the names of all of the reviewers of a product
  • Scrape the star rating of each review
  • Scrape the text of each review

We'll use this product from the Walmart website as an example of how to scrape review details: http://www.walmart.ca/en/ip/ninja-coffee-bar-brewer/6000195370640

Selecting and scraping each reviewer's name

1. In a new template, or with a newly created Select command, click on the name of the first reviewer on a product.

2. Click on the names of the next few reviewers to train ParseHub to select every reviewer's name.

 

3. ParseHub should automatically create a Begin New Entry command and Extract commands that will get the names of the reviewers.

Getting the star rating for each review

4. Click the plus button to the right of the Begin New Entry command and add a Relative Select command.

5. Click on one of the reviewers' names, and then click on that review's star rating, so that an arrow appears between each reviewer and their rating.

6. Click on the plus button to the right of the Relative Select command, and add an Extract command.

7. In the Extract dropdown, select title attribute.

Note that different websites store the star rating in different places, so you may need to right-click on the rating and choose Inspect Element to find which attribute holds it.
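The title attribute usually holds a short phrase rather than a bare number. As a minimal sketch, assuming the extracted value looks something like "4.5 out of 5 stars" (a hypothetical format; check what your site actually uses), the following Python turns that text into a number after you have exported your results. The function name parse_star_rating is our own, not part of ParseHub.

```python
import re
from typing import Optional


def parse_star_rating(title: str) -> Optional[float]:
    """Return the first number found in a rating title, or None if there is none."""
    # Assumption: the rating is the first number in the text,
    # as in "4.5 out of 5 stars".
    match = re.search(r"\d+(?:\.\d+)?", title)
    return float(match.group()) if match else None


print(parse_star_rating("4.5 out of 5 stars"))
print(parse_star_rating("No rating"))
```

If your site writes the rating differently (for example, the maximum score first), adjust the pattern to match.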

Getting the text of each review

8. As with the star ratings, add a new Relative Select command.

9. Click on one of the reviewers' names, and then click on the text of that review, so that an arrow appears between each reviewer and the text. You may need to zoom out to select the whole comment, which you can do by holding down Control or Command and pressing 1 or 2 while hovering over part of the comment. When the whole comment is highlighted in blue, click.

 

Using Relative Select commands like this, you could also scrape the review date, the number of positive votes the review has, and any other data that comes with it.
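Once the project has run, each reviewer becomes one entry with the related fields attached to it. As a rough sketch, assuming you download the results as JSON and that the commands were named "reviewer", "name", "rating", and "text" (hypothetical names; yours will match whatever you called your commands), you could read the reviews in Python like this. The sample data below is made up for illustration.

```python
import json

# Hypothetical sample of exported results. The keys "reviewer", "name",
# "rating", and "text" are assumed command names, not fixed by ParseHub.
sample_export = """
{
  "reviewer": [
    {"name": "Alex", "rating": "5 out of 5 stars", "text": "Great coffee."},
    {"name": "Sam", "rating": "3 out of 5 stars", "text": "Works, but loud."}
  ]
}
"""


def load_reviews(raw: str) -> list:
    """Parse the exported JSON and return the list of review entries."""
    data = json.loads(raw)
    # Fall back to an empty list if the run found no reviewers.
    return data.get("reviewer", [])


reviews = load_reviews(sample_export)
for review in reviews:
    print(f'{review["name"]}: {review["rating"]} - {review["text"]}')
```

Keeping every field under the same entry is what the Relative Select arrows give you: each rating and comment stays attached to the reviewer who wrote it.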

We have other tutorials if you need to: