Combining keywords from two lists (nested loops)

For some websites you may want to input two lists of data into two search fields. For example, you may have a list of keywords that you want to cross-reference with a list of locations:

  • Keywords: Plumbers, Locksmiths, Dentists, Doctors...
  • Locations: "Buffalo, NY","Portland, OH","Miami, FL"

Combining the two lists, your searches would be:

  • "Buffalo, NY" Plumbers
  • "Buffalo, NY" Locksmiths
  • "Buffalo, NY" Dentists
  • "Buffalo, NY" Doctors
  • "Portland, OH" Plumbers
  • "Portland, OH" Locksmiths
  • .... etc. 
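As a rough illustration of what "combining" means here, the full set of searches is every pairing of one location with one keyword. The sketch below is plain Python, not something ParseHub runs; the variable names are chosen only for this example.

```python
keywords = ["Plumbers", "Locksmiths", "Dentists", "Doctors"]
locations = ["Buffalo, NY", "Portland, OH", "Miami, FL"]

# The outer loop walks the locations and the inner loop walks the keywords,
# which reproduces the order of the list above.
searches = [(location, keyword) for location in locations for keyword in keywords]

for location, keyword in searches:
    print(f'"{location}" {keyword}')

# 3 locations x 4 keywords = 12 searches in total
print(len(searches))
```

The number of searches is the product of the two list lengths, so long lists multiply quickly.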

You could perform these searches on the Yellow Pages website, for example, which is what we will be doing in this tutorial.

Screen_Shot_2017-08-11_at_12.31.10_PM.png

 

Creating Your Lists

This is the format of a JSON list that you can use in ParseHub:

{
  "keywords":["Plumbers","Locksmiths","Dentists","Doctors"]
}

When you combine the two, your lists should be in this format:

{
  "keywords":["Plumbers","Locksmiths","Dentists","Doctors"],
  "locations":["Buffalo, NY","Portland, OH","Miami, FL"]
}
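If you want to check your JSON before pasting it into ParseHub, a small script can catch a missing comma or a value that is not a list. This is only a sketch using Python's standard json module; the function name check_starting_value is made up for this example and is not part of ParseHub.

```python
import json

starting_value = """
{
  "keywords": ["Plumbers", "Locksmiths", "Dentists", "Doctors"],
  "locations": ["Buffalo, NY", "Portland, OH", "Miami, FL"]
}
"""

def check_starting_value(text):
    """Return the parsed lists, or raise ValueError if the shape is wrong."""
    # json.loads raises json.JSONDecodeError (a kind of ValueError) on bad JSON
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    for name, items in data.items():
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{name!r} must be a list of strings")
    return data

lists = check_starting_value(starting_value)
print(sorted(lists))
```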

 

We recommend using a tool like Mr. Data Converter to convert a list of words into a JSON list.

In the "Input" section, type your list name (e.g. "keywords") into the first row, followed by each list item on a separate row. Change the Output to "JSON - Column Arrays" and copy the JSON list from the Output field.
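If you prefer a script to an online converter, the same conversion can be sketched in a few lines of Python. This is a stand-in under the assumption that your input follows the layout described above (list name on the first row, one item per row after it); it does not claim to reproduce Mr. Data Converter's exact output, and rows_to_json_list is a name invented for this example.

```python
import json

rows = """keywords
Plumbers
Locksmiths
Dentists
Doctors"""

def rows_to_json_list(text):
    """Treat the first non-empty row as the list name and the rest as items."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name, items = lines[0], lines[1:]
    return json.dumps({name: items})

print(rows_to_json_list(rows))
```

To build the combined starting value, you would convert each list this way and merge the results into a single JSON object.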

keywords.png

 

Building Your Project

1. Open your ParseHub client, click on "New Project" and input the URL you would like to scrape data from. For this example we will be using the Yellow Pages. If you would like to follow along, type https://www.yellowpages.com/ into your project. Click on "Start project on this URL".

 

Screen_Shot_2017-08-11_at_12.42.35_PM.png

 

2. Go to the "Settings" tab of the project.

Screen_Shot_2017-08-11_at_1.01.09_PM.png

 

3. In the "Starting Value" box, paste the JSON lists that you created in the first part of this tutorial. You will see both lists appear in the preview section at the bottom of ParseHub:

Screen_Shot_2017-08-11_at_4.31.32_PM.png

 

4. Go back to the "Commands" tab on your project. Then click on the "+" button next to "Select page" and click on the "Advanced" arrow to show more tools.

Screen_Shot_2017-08-11_at_2.13.38_PM.png

 

5. Choose the "Loop" command. The loop command iterates through a list, and is good for repeating commands multiple times.

Screen_Shot_2017-08-11_at_2.16.24_PM.png

 

6. In the text boxes, change "item" to "keyword" and type "keywords" into the list text box (without the quotation marks).

  • You can change "item" to anything you want. The item represents one keyword in your list of keywords.
  • Make sure the list name is exactly the same as your list name in JSON. If you typed in {"keywords":....}, make sure to keep the text in the text box as keywords (this is case sensitive).
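The case-sensitivity rule behaves like a dictionary lookup: "keywords" and "Keywords" are different names. The following Python sketch is an analogy only (find_list is a hypothetical helper, not a ParseHub function) and shows how a mismatched name fails to find the list.

```python
starting_value = {
    "keywords": ["Plumbers", "Locksmiths", "Dentists", "Doctors"],
    "locations": ["Buffalo, NY", "Portland, OH", "Miami, FL"],
}

def find_list(name, data):
    """Look up a list by its exact name; suggest a fix if only the case differs."""
    if name in data:
        return data[name]
    close = [key for key in data if key.lower() == name.lower()]
    hint = f" (did you mean {close[0]!r}?)" if close else ""
    raise KeyError(f"no list named {name!r}{hint}")

print(find_list("keywords", starting_value))

try:
    find_list("Keywords", starting_value)
except KeyError as error:
    print(error)
```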

Screen_Shot_2017-08-11_at_2.40.03_PM.png

 

7. Click on the "+" button next to "For each keyword in keywords", click on the "Advanced" arrow to show all the commands and select a "Begin New Entry" command. Now the results for each one of the keywords will go into a separate row in Excel and a separate scope in JSON. If you don't use this command anywhere in your project, the results scraped for each keyword will overwrite one another.
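To see why a new entry per keyword matters, compare writing every result into one shared record with starting a fresh record on each pass of the loop. The Python below is a simplified analogy, not ParseHub's internals; the field name "keyword" and the list name "jobs" are used only for illustration.

```python
keywords = ["Plumbers", "Locksmiths", "Dentists", "Doctors"]

# Without a new entry: one shared record, so each pass replaces the last value.
shared = {}
for keyword in keywords:
    shared["keyword"] = keyword

# With a new entry per pass: every keyword gets its own record (its own row).
jobs = []
for keyword in keywords:
    entry = {"keyword": keyword}
    jobs.append(entry)

print(shared)     # only the last keyword survives
print(len(jobs))  # one record per keyword
```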

Screen_Shot_2017-08-11_at_3.19.05_PM.png

 

8. Rename the "list1" name that appears next to "Begin new entry" to something else like "jobs". Make sure not to name the list command the same as the list that holds your keywords. The list command should have a unique name. 

Screen_Shot_2017-08-11_at_2.40.55_PM.png

 

9. Click on the "+" button next to "Begin new entry in jobs" (or "Begin new entry in list1" if you did not rename it in the previous step) and choose a Select command.

Screen_Shot_2017-08-11_at_3.00.15_PM.png

 

10. Click on the left-hand search box on the Yellow Pages (which says "Search by business or keyword"). ParseHub will automatically create an Input command for you. Instead of typing the actual keyword, just type in "keyword". This will tell ParseHub to add in the current keyword in your list of keywords. Also ensure that you select "expression" in the "Input type" drop-down menu so that ParseHub will read the text as an expression instead of just plain text.
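The difference between the two input types can be pictured as follows: as plain text, the word keyword is typed into the search box literally; as an expression, it is replaced by the current item of the loop. The sketch below models only that one idea in Python. The function resolve_input and the scope dictionary are assumptions made for this example, and real ParseHub expressions can do more than look up a single name.

```python
def resolve_input(value, input_type, scope):
    """Return what would be typed into the search box for a given input type."""
    if input_type == "expression":
        # Treat the value as a variable name and look up its current value.
        return scope[value]
    # Plain text: the value is used exactly as typed.
    return value

# Inside the loop, "keyword" holds the current item of the keywords list.
scope = {"keyword": "Plumbers"}

print(resolve_input("keyword", "text", scope))        # the literal word
print(resolve_input("keyword", "expression", scope))  # the current keyword
```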

Screen_Shot_2017-08-11_at_3.56.23_PM.png

 

11. [Optional] Now you have an option to extract each keyword in your list along with the related results. This step will add a new column for the keywords that you provided as the starting value. 

If you would like to do this, click on the "+" button next to "Begin new entry in jobs" (or "Begin new entry in list1" if you did not rename the command), click on the "Advanced" arrow to show all the commands and select an Extract command. Instead of $location.href enter "keyword" and rename the Extract command to "currentkeyword". Your final results will then include a column showing the associated keyword for each result.

Screen_Shot_2017-08-11_at_3.24.52_PM.png

 

12. To nest our loops, we will now repeat steps 6 to 11 for our locations list, which will be nested inside our keywords loop.

Click on the "+" button next to "Begin new entry in jobs" (or "Begin new entry in list1" if you did not rename the command), click on the "Advanced" arrow to show all the commands and select another "Loop" command. In the text boxes, change "item" to "location" and type "locations" into the list text box (without the quotation marks).

  • You can change "item" to anything you want. The item represents one location in your list of locations.
  • Make sure the list name is exactly the same as your list name in JSON. If you typed in {"locations":....}, make sure to keep the text in the text box as locations (this is case sensitive).

Screen_Shot_2017-08-11_at_3.26.49_PM.png

 

Next, click on the "+" button next to "For each location in locations", click on the "Advanced" arrow to show all the commands and select a "Begin New Entry" command. Now the results for each one of the locations will go into a separate row in Excel and a separate scope in JSON. Rename the default name that appears next to "Begin new entry" (such as "list2") to something else like "cities". Make sure not to name the list command the same as the list that holds your locations. The list command should have a unique name.

Screen_Shot_2017-08-11_at_3.27.47_PM.png

 

13. Click on the "+" button next to "Begin new entry in cities" (or "Begin new entry in list2" if you did not rename it in the previous step) and choose a Select command. Click on the right-hand search box on the Yellow Pages (which should have a current location such as "Fort Lauderdale, FL"). ParseHub will automatically create an Input command for you. Instead of typing the actual location, just type in "location". This will tell ParseHub to add in the current location in your list of locations. Also ensure that you select "expression" in the "Input type" drop-down menu so that ParseHub will read the text as an expression instead of just plain text.

Screen_Shot_2017-08-11_at_3.58.48_PM.png

 

14. [Optional] Now you have an option to extract each location in your list along with the related results. This step will add a new column for the locations that you provided as the starting value. 

If you would like to do this, click on the "+" button next to "Begin new entry in cities" (or "Begin new entry in list2" if you did not rename the command), click on the "Advanced" arrow to show all the commands and select an Extract command. Instead of $location.href enter "location" and rename the Extract command to "currentlocation". Your final results will then include a column showing the associated location for each result.

Screen_Shot_2017-08-11_at_3.32.37_PM.png

 

15. To select and click on the search button, click on the "+" button next to "Begin new entry in cities" (or "Begin new entry in list2" if you did not rename the command) and choose a Select command. Select the search button on the Yellow Pages website.

Screen_Shot_2017-08-11_at_3.36.06_PM.png

 

16. Click on the "+" button next to "Select & Extract selection3" and choose a Click command. The Click command lets you click on anything on the page to open drop-downs, tabs, etc., or to click on buttons that will take you to another page.

Screen_Shot_2017-08-11_at_3.39.33_PM.png

 

17. In the pop-up, select the "Create New Template" option, which you can call "results". Clicking on the button will produce a new page of results, so you should create a new template to hold a new set of instructions. Remember, you should use a new template for every page that looks different. Click on "Create New Template".

Screen_Shot_2017-08-11_at_3.41.27_PM.png

 

18. On the new template you can go ahead and select and extract any of the results that you want to scrape. ParseHub will repeat the instruction of searching for the keyword and scraping results for all of the keywords and locations you added into the "Starting value". 
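Put together, the project behaves like two nested loops: for each keyword, for each location, fill in both search boxes, click search and scrape the results page. The Python sketch below mirrors that flow under stated assumptions. The function placeholder_search stands in for the real search and the "results" template and returns a placeholder string rather than real data, and the nested shape of the output (a "jobs" list containing "cities" lists) is an illustration based on the list names used in this tutorial, not a guaranteed copy of ParseHub's export format.

```python
starting_value = {
    "keywords": ["Plumbers", "Locksmiths", "Dentists", "Doctors"],
    "locations": ["Buffalo, NY", "Portland, OH", "Miami, FL"],
}

def placeholder_search(keyword, location):
    """Stand-in for running the search and scraping the results template."""
    return [f"<results for {keyword} in {location}>"]

def run_searches(data, search):
    jobs = []
    for keyword in data["keywords"]:        # outer loop over keywords
        job = {"currentkeyword": keyword, "cities": []}
        for location in data["locations"]:  # inner loop over locations
            city = {
                "currentlocation": location,
                "results": search(keyword, location),
            }
            job["cities"].append(city)      # new entry in "cities"
        jobs.append(job)                    # new entry in "jobs"
    return {"jobs": jobs}

output = run_searches(starting_value, placeholder_search)
total = sum(len(job["cities"]) for job in output["jobs"])
print(total)  # 4 keywords x 3 locations = 12 searches
```

Because the locations loop sits inside the keywords loop, every keyword is searched in every location before the project moves on to the next keyword.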

 

Download this Project

You can download the project that we just created here: Yellow_Pages_-_Nested_Loops.phj

To open the project in your account, open ParseHub, go to My Projects, click on Import Project and select the file. Note that this project will work on the Yellow Pages only. However, you can customize the project to include your own keyword lists, extract different data from the results page and add more commands. This video tutorial has an example of how to extract more information from the Yellow Pages. 

 
