EXAMPLE: Create your first ParseHub project

In this tutorial we will go over the ParseHub basics and build a project from scratch to scrape one website.

After this tutorial you should be able to download data in CSV or JSON format from one page of a website.

Step 1. Download the ParseHub Desktop App

You can download ParseHub here and follow the instructions. ParseHub is available on Windows, Mac and Linux.

Step 2. Open Website & Start New Project

1. Open the ParseHub Desktop App

2. Click "New Project"

3. Enter https://news.ycombinator.com/ into the text box.

4. Click "Start project on this URL"

Step 3. Select & Extract All of the Post Titles

1. Click on the + button on the right side of the "Select page" command.

2. Choose the "Select" tool from the tool box. With this tool you can target any elements on the page to be selected. The select tool will also automatically extract common data, like text and URL. 

3. Click on the first title on the page. It should be highlighted in green, showing that it has been selected for extraction.

4. The other items should be highlighted in yellow, showing that ParseHub thinks the elements are similar. Click on the second title on the page. All of the titles should now be highlighted in green, meaning that every title on the page will be selected and extracted.

5. Click on the "Select selection1" command in the app side bar. Rename "selection1" to "posts".

6. You will notice 3 new commands were created for you in the app side bar.

  • All of the selections were automatically put in a new entry. This entry creates an empty row in Excel, or an empty scope in JSON, for each selection. All of the data you extract from a selection, and any other instructions (commands), are repeated in that selection's row or scope.
  • The text of each selection was extracted automatically, into the empty scope in JSON or into the empty row in Excel (creating a new column).
  • The URL of each selection was also extracted. It was added to the same row as the text, creating a new column beside the "name" column, and to the same JSON scope as the text (name).

7. Click on the "Extract name" command and rename it to "title".

Your CSV/Excel sample results should look like this:

Your JSON sample results should look like this:
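As a rough sketch of the structure described in item 6, each selection becomes one scope in the "posts" list and one row in the spreadsheet. The values below are placeholders rather than real results, and the helper name to_rows is hypothetical:

```python
import json

# Placeholder data in the assumed shape; these are not real scraped results.
sample_json = """
{
  "posts": [
    {"title": "First example title", "url": "https://example.com/a"},
    {"title": "Second example title", "url": "https://example.com/b"}
  ]
}
"""


def to_rows(results, list_name="posts"):
    """Flatten each scope in the list into one spreadsheet-style row."""
    header = ["title", "url"]
    rows = [header]
    for entry in results.get(list_name, []):
        # A missing field becomes an empty cell instead of raising an error.
        rows.append([entry.get(column, "") for column in header])
    return rows


rows = to_rows(json.loads(sample_json))
for row in rows:
    print(row)
```

The first row holds the column names and every later row corresponds to one selected title.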

Step 4. Select & Extract All of the Post Comments

1. Make sure the "Begin new entry in posts" command is highlighted in blue.

2. Click on the + button on the right side of the "Begin new entry in posts" command.

3. Choose the "Relative Select" tool from the tool box. This tool lets you create a relationship between data that is already selected on the page and any data that you want to attach to it.

This tool is best used for websites that list businesses, products, hotels, etc., where you need to pair the main business or product title with other repetitive data on the page, such as telephone numbers or product prices.

4. Click on the first title on the page.

5. Click on the corresponding comments number on the page. You should see an arrow created between one title and the comments. We need to refine this selection.

6. Click on the second title on the page.

7. Click on the comments number that corresponds to the second title. You should see an arrow created between all of the titles and all of the comments on the page.

8. The number of comments and the link to the comments were automatically extracted for you.

9. Rename the Relative "selection1" command by clicking on it and typing in "comments"

Your CSV/Excel sample results should look like this:

Your JSON sample results should look like this:
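The number of comments is extracted as text, so you may want to convert it to a number after downloading. Here is a minimal Python sketch, assuming the new fields are named "comments" and "comments_url" and that the text looks like "42 comments". Both are assumptions, so check them against your own sample results. The entries are placeholders and the function name comment_count is hypothetical:

```python
import re


def comment_count(text):
    """Return the first integer found in the text, or 0 if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


# Placeholder entries in the assumed shape; these are not real scraped results.
entries = [
    {
        "title": "First example title",
        "url": "https://example.com/a",
        "comments": "42 comments",
        "comments_url": "https://example.com/a/comments",
    },
    {
        "title": "Second example title",
        "url": "https://example.com/b",
        "comments": "discuss",  # assumed label for a post with no comments yet
        "comments_url": "https://example.com/b/comments",
    },
]

counts = [comment_count(entry["comments"]) for entry in entries]
print(counts)
```

Treating text without digits as zero is a design choice for this sketch. You may prefer to keep such rows separate if the distinction matters to you.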

Step 5. Run the Project

1. Click on the "Get Data" button at the bottom of the page.

2. Click on the "Run" button.

3. Click on the "Save and Run" button.

Step 6. Download Data (CSV & JSON)

1. Wait a few seconds for ParseHub to scrape the website and for your data to be ready.

2. You will see the "your data is refreshing" text while you wait. You can refresh the results faster by clicking the blue refresh button.

3. When the data is ready, the "Download CSV" and "Download JSON" buttons will be highlighted in bright green. Click on one of these buttons to download your CSV or JSON results.

You will also get an email when your run is finished. You can download your data from the link in this email.

Step 7. Connect to API

You can follow the instructions in our API reference.

To find your API Key:

1. You can get the API key from the Run page, or click on the profile button in the top right corner of the app.

2. Click on Account and copy your API key.

To find your Project Token:

1. Go back to your project.

2. Go into the "Settings" tab of the project

3. Copy and paste your project token from this page
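Once you have both values, fetching the latest results from Python might look like the minimal sketch below. It assumes the last_ready_run/data endpoint and the gzip-compressed responses described in the ParseHub API reference, so confirm both there before relying on it. The function names are hypothetical, and "YOUR_PROJECT_TOKEN" and "YOUR_API_KEY" stand in for the values you copied:

```python
import gzip
import json
import urllib.parse
import urllib.request

# Base URL assumed from the ParseHub API reference.
API_BASE = "https://www.parsehub.com/api/v2"


def last_ready_run_url(project_token, api_key, fmt="json"):
    """Build the URL for the data of the most recent finished run."""
    query = urllib.parse.urlencode({"api_key": api_key, "format": fmt})
    return f"{API_BASE}/projects/{project_token}/last_ready_run/data?{query}"


def decode_body(raw):
    """Parse a JSON response body, decompressing it first if it is gzip data."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def fetch_last_run(project_token, api_key):
    """Download and parse the latest results (performs a network request)."""
    url = last_ready_run_url(project_token, api_key)
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read())
```

Calling fetch_last_run("YOUR_PROJECT_TOKEN", "YOUR_API_KEY") after a run has finished should return the same structure as the "Download JSON" button. Keep your API key out of shared code and version control.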