Many websites embed map widgets, such as Google Maps, on their pages to display data you may want to scrape.
You should read this tutorial if you're trying to scrape:
- A site like Yellow Pages that gives maps for the locations of local stores
- Data with information about stores or sites all across the country
ParseHub can scrape this information, as long as it can be found in the HTML on the page. Before you start scraping, make sure that the data you want is actually present in the map's HTML.
Checking if a map has data in its HTML
You can right click on the map, or something nearby, in ParseHub's website tab and click on the Inspect Elements option.
This lets you look into the HTML behind the page, which is what ParseHub can scrape. Hovering over parts of the HTML will highlight, in the website view, the part of the page that the HTML displays.
Before you start looking into the HTML, check whether the data you need is available somewhere else on the page, or whether you can navigate to each result's page some other way. If so, you likely don't have to go through the hassle of selecting elements from the map.
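If you have copied the page's HTML out of the inspector, a quick text search can confirm whether the values you want are in it at all. The snippet below is a minimal sketch of that check outside of ParseHub. The sample HTML, the attribute names (data-lat, data-lng) and the helper find_missing are all hypothetical and only stand in for whatever your page contains.

```python
# Hypothetical page source; in practice, paste the HTML copied from the inspector.
SAMPLE_HTML = """
<div id="map">
  <div class="pin" title="Store A" data-lat="-26.2041" data-lng="28.0473"></div>
</div>
"""


def find_missing(html, expected_values):
    """Return the expected values that do not appear anywhere in the HTML."""
    return [value for value in expected_values if value not in html]


# "Store B" is not in the sample, so it is reported as missing.
missing = find_missing(SAMPLE_HTML, ["Store A", "-26.2041", "Store B"])
print(missing)
```

Any value reported as missing is probably loaded some other way, and would not be available to select from the map's HTML.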
Clicking on location pins in a map
If you can find a pin on a map by looking through the HTML in the page, ParseHub will be able to select it.
Some maps, like on this website, have pins that can just be selected. Try entering Johannesburg and selecting the pin on the first map that comes up.
If you zoom out with Browse Mode, you can see other store pins. Click on these too to train ParseHub to select each pin.
On sites where you can't just click to select the pin, even though it is in the HTML, you can try using the keyboard shortcuts to zoom your selection. Hold Ctrl/Cmd and press 1 or 2 while hovering over your selection to zoom in and out through a stack of overlapping elements.
Selecting and clicking on a pin will often bring up information that you can easily scrape. Pay attention to the command flow here. Your project should look a bit like this:
Scraping information embedded in a map
Some information, like latitude and longitude, can only be found within the HTML attributes of the map elements, and isn't obviously displayed. You'll definitely need to be familiar with CSS or XPath selectors to get this data.
You need to make sure that you can find the information you want in the site's HTML...
Click on the green Edit button in the options of your selection, and choose Convert to XPath.
You can input XPath that looks something like this: //1[contains(@2,'3')]
Replace 1 with the tag name (for example, div or script). Replace 2 with an attribute that distinguishes the tag. Replace 3 with a unique part of that attribute's value.
For the HTML above, we can write //script[contains(text(),'var map')]
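To see what a selector like this picks out, the sketch below reproduces the same idea outside of ParseHub with Python's standard html.parser module: it collects the text of every script tag and keeps only those that contain 'var map'. It is an approximation of the XPath, not ParseHub's own implementation, and the sample HTML and the ScriptFinder class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page source with one map script and one unrelated script.
SAMPLE_HTML = """
<html><body>
<script>var unrelated = 1;</script>
<script>
var map = {
  lat: -26.2041,
  lng: 28.0473
};
</script>
</body></html>
"""


class ScriptFinder(HTMLParser):
    """Collect the text of <script> tags whose content contains a given string."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.in_script = False
        self.buffer = []
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a <script> tag.
        if self.in_script:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_script:
            text = "".join(self.buffer)
            if self.needle in text:
                self.matches.append(text)
            self.in_script = False


finder = ScriptFinder("var map")
finder.feed(SAMPLE_HTML)
finder.close()
print(len(finder.matches))  # number of script tags containing 'var map'
```

Only the second script tag is kept, which is the same element the XPath is meant to select.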
You may also have to use Regex to cut away unnecessary parts of the extraction. Read more about how to use Regex in our docs page here.
For the HTML above, we can use the expression lat: (.*) to get latitude and lng: (.*) to get longitude.
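The sketch below tries these expressions with Python's re module on a hypothetical script body, so you can see what each capture group returns. ParseHub's regex engine may behave differently from Python's, so treat this only as an illustration. Note that (.*) captures everything up to the end of the line, including a trailing comma, so a tighter pattern that matches only the number is shown as an alternative.

```python
import re

# Hypothetical script text, as it might be extracted by the XPath selection.
SCRIPT_TEXT = """
var map = {
  lat: -26.2041,
  lng: 28.0473
};
"""

# The expressions from the tutorial: capture the rest of the line after the label.
lat_loose = re.search(r"lat: (.*)", SCRIPT_TEXT).group(1)
lng_loose = re.search(r"lng: (.*)", SCRIPT_TEXT).group(1)

# A stricter alternative (an assumption, not from the tutorial): capture only
# an optionally negative decimal number.
NUMBER = r"(-?\d+(?:\.\d+)?)"
lat = re.search(r"lat: " + NUMBER, SCRIPT_TEXT).group(1)
lng = re.search(r"lng: " + NUMBER, SCRIPT_TEXT).group(1)

print(repr(lat_loose), repr(lng_loose))
print(repr(lat), repr(lng))
```

If the latitude and longitude sit on the same line in your page's HTML, the loose expressions will capture both values together, and the stricter form is the safer choice.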