Find and extract a specific value from a table

ParseHub can be used to extract a specific value from a table. On some websites, the layout of the table changes from listing to listing, so you need to make sure ParseHub selects the right value even when the position of the element changes.

To extract a particular value from a table, we can select the headers (titles), add a Conditional command that checks each header's text, and then extract only the value in the next column for the header we want.
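
The reason for matching on the header text rather than on position can be illustrated with a small Python sketch. It assumes each listing's table has already been reduced to a list of (header, value) pairs; the sample rows and the function name value_for_header are hypothetical and are not part of ParseHub.

```python
from typing import Optional

# Hypothetical (header, value) rows from two listings; the row order differs between them.
listing_a = [("Date Listed", "14-Mar-17"), ("Price", "$950"), ("Furnished", "No")]
listing_b = [("Date Listed", "02-Apr-17"), ("Furnished", "Yes"), ("Price", "$1,100")]


def value_for_header(rows: list[tuple[str, str]], header: str) -> Optional[str]:
    """Return the value next to the first header containing `header`, or None."""
    for row_header, value in rows:
        # Case-sensitive substring test, similar in spirit to $e.text.contains(...)
        if header in row_header:
            return value
    return None


# Taking the third row by position gives the "Furnished" value in one listing only.
print(listing_a[2][1], listing_b[2][1])  # No $1,100
# Matching on the header text finds "Furnished" wherever it appears.
print(value_for_header(listing_a, "Furnished"), value_for_header(listing_b, "Furnished"))  # No Yes
```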

In this example, we will scrape Kijiji.ca and Amazon.ca to demonstrate two different table formats that you may find on websites.

How to extract a specific value from a table

Kijiji

On Kijiji, we want to extract the values of "Date Listed" and "Furnished" from the table:

1. Click the + button next to the "Select page" command, then choose the Select tool. Click the "Date Listed" header; the other headers will be highlighted in yellow. Click a second header to select all of the headers. Then remove the "Begin new entry" and "Extract name" commands.

2. Click the + button next to the "Selection1" command, then choose the Conditional tool. In the Conditional command, enter $e.text.contains("Date Listed"). Make sure you type the header text exactly as it appears on the page, because this check is case-sensitive.

3. Click the + button next to the Conditional command and choose the Relative Select tool. Click the "Date Listed" header, which is highlighted in orange, and then click its value, "14-Mar-17". If the rest of the values are not selected (highlighted in green), click a second header, such as "Price", and then its value to give ParseHub more examples and train the relative selection.

4. To extract the "Furnished" value as well, you can either repeat steps 2 and 3, or click the Conditional command, press Command/Ctrl + C to copy it along with the commands nested under it, and press Command/Ctrl + V to paste them. Then edit the pasted Conditional command to check for "Furnished" and rename its extract command so that the value ("No" in this listing) is saved as "Furnished".

For each selected header, ParseHub evaluates the Conditional command; only when it is true does it run the relative selection and extract the value.

[Screenshot: 2017-04-13_15-14-33.png]
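
The Kijiji command tree can be modelled roughly as follows. This Python sketch is a simplification under stated assumptions: each selected header is represented by a hypothetical HeaderElement whose next_value field stands in for the cell the relative selection reaches, and the extract names date_listed and furnished are illustrative. Each entry in conditionals plays the role of one Conditional command, the second being the copied and edited one from step 4.

```python
from dataclasses import dataclass


@dataclass
class HeaderElement:
    text: str        # header text, e.g. "Date Listed"
    next_value: str  # the cell a relative selection would reach from this header


# Hypothetical result of "Selection1": every header in one Kijiji listing's table.
selection1 = [
    HeaderElement("Date Listed", "14-Mar-17"),
    HeaderElement("Price", "$950"),
    HeaderElement("Furnished", "No"),
]

# One (extract name, text passed to contains) pair per Conditional command.
# The second pair stands for the copied and edited Conditional from step 4.
conditionals = [
    ("date_listed", "Date Listed"),
    ("furnished", "Furnished"),
]


def run_commands(headers: list[HeaderElement]) -> dict[str, str]:
    result: dict[str, str] = {}
    for element in headers:                # the Conditionals run once per selected header
        for name, needle in conditionals:
            if needle in element.text:     # Conditional: $e.text.contains(needle)
                result[name] = element.next_value  # relative selection + extract
    return result


print(run_commands(selection1))  # {'date_listed': '14-Mar-17', 'furnished': 'No'}
```

Because Python's in operator is case-sensitive, like the contains check described above, a conditional written as "Date listed" would match nothing in this table.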

Amazon

On Amazon, we will extract the values of "ASIN" and "Amazon Bestsellers Rank" from the product details section.

In the product details section, selecting only the headers is not practical, because the entries do not all share the same format.

1. Scroll down to the product details section. Click the + button next to the "Select page" command, then choose the Select tool. Hover over the first entry, hold Command/Ctrl, and press 1 to zoom out until the whole li element is selected. Then click the next entries highlighted in yellow to select all of them.

[Screenshot: 2017-04-12_15-18-34.png]

2. Remove the "Begin new entry" and "Extract name" commands.

3. Click the + button next to the Select command from step 1 (named "details" in this example), then choose the Conditional tool. In the Conditional command, enter $e.text.contains("ASIN"). Make sure you type the header text exactly as it appears on the page, because this check is case-sensitive.

Now, if an element's text contains "ASIN", its value will be extracted. Because each li element contains both the label and the value, apply the regular expression ASIN:(.*) when extracting, so that only the text after "ASIN:" is kept.

4. Repeat step 3 to get the "Amazon Bestsellers Rank" value as well, or click the Conditional command, press Command/Ctrl + C to copy it along with the commands nested under it, and press Command/Ctrl + V to paste them.

Then edit the pasted Conditional command to check for "Amazon Bestsellers Rank", change the regular expression to match that label, and rename the extract command accordingly.

[Screenshot: 2017-04-12_15-39-46.png]
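
The Amazon approach (select the whole li element, check it with a Conditional, then trim it with a regular expression) can be sketched in Python as below. The li texts and their values are placeholders rather than real product data, extract_field is a hypothetical helper, and Python's re module is used only to show how a pattern shaped like ASIN:(.*) isolates the value through its first capture group.

```python
import re
from typing import Optional

# Placeholder li texts from a product details section (not real product data).
details = [
    "Product Dimensions: 10 x 5 x 2 cm",
    "ASIN: B000000000",
    "Amazon Bestsellers Rank: #1,234 in Books",
]


def extract_field(items: list[str], label: str) -> Optional[str]:
    # Builds the same shape of pattern as ASIN:(.*), with the label escaped literally.
    pattern = re.compile(re.escape(label) + r":(.*)")
    for text in items:
        if label in text:                 # Conditional: $e.text.contains(label)
            match = pattern.search(text)  # keep only what follows "label:"
            if match:
                # (.*) also captures the space after the colon, so trim it.
                return match.group(1).strip()
    return None


print(extract_field(details, "ASIN"))                     # B000000000
print(extract_field(details, "Amazon Bestsellers Rank"))  # #1,234 in Books
```

Reusing the helper only requires a different label, which mirrors copying the Conditional and editing its text and regular expression in step 4. A pattern such as ASIN:\s*(.*) would drop the leading space inside the regex instead of trimming it afterwards.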