Scrape product categories

Almost all eCommerce and online retail websites group their products into categories, such as brand, product type, or price. With ParseHub you can quickly grab the number of products in each category for a product analysis.

In this tutorial we will show you how to:

1) Open and go to every category on the eCommerce website.

2) Scrape the total number of products in each category and all of the product names.

3) Use some simple Regular Expressions to clean up your data.

If you're looking to scrape the details of the products in a category, or from a search result, you should follow the tutorial on eCommerce pages instead.

Getting Started:

1. Open the ParseHub desktop app and click "New Project".


2. In the text box, enter the following URL (or the URL of any other eCommerce site) and click on "Start project on this url".

Open and go to each category

3. Click the plus button on the "Select page" command. Choose the "Select tool" from the tool menu.

4. Click on the first category on the website, "Dresses & Jumpsuits".

5. Click on the second category on the website. All of the categories should now be selected for you in green.

6. Three tasks are automated for you. The "Begin new entry in selection1" command is created for you. Click on "selection1" to rename it to "categories".

7. The name and URL of each category are extracted for you automatically. Two new columns, one for the name and one for the URL, will be created in Excel. See your sample results by clicking "CSV/Excel" in the desktop app.

8. Choose the plus button on the "Begin new entry in categories" command to add a command under it. The new command will open each category and navigate to it.

9. Choose the "Click tool" from the tools menu. 

10. Choose the "Create New Template" button and type in a new template name such as "product_listing".

You are creating a template because the next page that opens will look different from the current page. The main_template was used to create instructions to get and open the categories. The new product_listing template will hold the instructions that get the number of products in each category.

11. Click "Create New Template".

A new template has now been created for you, and a new webpage showing the products of one category has opened. This behaviour will repeat automatically for every category. You only have to set it up once.

Extract the number of products in each category

12. Find and click on the "Alternative styles" text and select the number after the text. This number is the count of products in that category.

13. Rename the "Select & Extract selection1" to "Select & Extract number".

Use Regular Expressions to clean up your data and get only the number

You will notice that the selection also picks up the "Alternative styles" text. You can remove it with a regular expression by creating a new extraction on the selection of the category number.

1. From the "Select & Extract number" command, click on the plus button and choose "Advanced". In the toolbar, select the "Extract tool".

2. You will notice the advanced options appear. Check the "Use regex" box and, in the text box, write Alternative styles: (.*). The (.*) group captures everything that follows the text, which is the number you need.
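To see what this pattern does before running the project, here is a minimal sketch in Python. It only illustrates the regular expression itself; the sample strings are hypothetical, and ParseHub's own regex engine may differ in small details. The digits-only variant is an assumption added for comparison, not part of the tutorial.

```python
import re

# The pattern from step 2: capture everything after the label.
LABEL_PATTERN = re.compile(r"Alternative styles: (.*)")
# Stricter variant (an assumption, not from the tutorial): digits only.
DIGITS_PATTERN = re.compile(r"Alternative styles:\s*(\d+)")


def extract_after_label(text, pattern=LABEL_PATTERN):
    """Return the first capture group, or None if the label is missing."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


sample = "Alternative styles: 128"  # hypothetical text from a category page
print(extract_after_label(sample))
print(extract_after_label("Alternative styles: 128 items", DIGITS_PATTERN))
print(extract_after_label("No label here"))
```

The (.*) form keeps any trailing text that follows the number, so the digits-only form may be the safer choice if the page prints extra words after the count.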


Save project and run it to get data:

1. Click on the "Get Data" button.

2. Click on "Run".

3. Click on "Save and Run".

4. Wait for your results to finish being collected and for the run page to refresh. The green buttons for downloading your results as Excel or JSON will appear when your data is ready. You will also get a link to your data by email.

Your final set of data should look something like this:
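Once the results are downloaded, a short script can turn them into a simple product analysis. The sketch below is in Python and assumes a CSV export with columns named categories_name, categories_url and categories_number. Those column names and the sample rows are hypothetical, so adjust them to match your own file.

```python
import csv
import io

# Hypothetical export: the column names and values are illustrative only.
SAMPLE_CSV = """categories_name,categories_url,categories_number
Dresses & Jumpsuits,https://example.com/dresses,128
Tops,https://example.com/tops,
Shoes,https://example.com/shoes,57
"""


def load_counts(csv_text):
    """Return (category name, product count) pairs, largest count first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = []
    for row in rows:
        raw = (row.get("categories_number") or "").strip()
        if raw.isdigit():  # skip rows where no number was extracted
            counts.append((row["categories_name"], int(raw)))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


for name, count in load_counts(SAMPLE_CSV):
    print(f"{name}: {count}")
```

To use it on a real export, read the downloaded file's text and pass it to load_counts in place of SAMPLE_CSV.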



