There are several ways that you can clean your data in ParseHub prior to extracting it. Most of these use simple JavaScript functions and methods and may require a little bit of background knowledge on how these work.
Replacing elements
The .replace() method allows you to replace elements in your text. It takes two parameters: the first is the text that you would like to replace and the second is the text you would like to replace it with.
For example, if I wanted to replace the text "apple" with the text "banana", I would use the method .replace("apple","banana").
When the first parameter is a plain string, this replaces only the first occurrence of the word "apple". To replace all instances of the word "apple" with "banana", use the method .replace(/apple/g,"banana"), which uses a regular expression (regex) to find every occurrence of the word "apple"; the "g" flag stands for "global".
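To make the difference between the two calls concrete, here is a minimal sketch in plain JavaScript. The sample sentence and variable names are invented for illustration; in ParseHub you would apply the method to your own extracted text.

```javascript
// Hypothetical sample text standing in for an extracted value.
const text = "apple pie, apple juice, apple tart";

// A string pattern replaces only the first match.
const firstOnly = text.replace("apple", "banana");

// A regex with the "g" (global) flag replaces every match.
const everyMatch = text.replace(/apple/g, "banana");

console.log(firstOnly);  // banana pie, apple juice, apple tart
console.log(everyMatch); // banana pie, banana juice, banana tart
```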
One frequent example occurs with data spread over multiple lines or in bullet points. If you try to extract multiple lines of data you will often see a "\n" in your text; this is the newline character, which marks a line break.
You can replace all of these by using the method .replace(/\n/g," "), which replaces each "\n" with whatever is between the quotation marks: " " would add a space and " || " would add two pipe symbols, for example.
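The sketch below shows both replacements on a made-up three-line value. The variable names and sample text are assumptions for illustration only.

```javascript
// Hypothetical extracted value containing newline characters.
const multiLine = "First bullet\nSecond bullet\nThird bullet";

// Replace every newline with a single space.
const withSpaces = multiLine.replace(/\n/g, " ");

// Replace every newline with two pipe symbols padded by spaces.
const withPipes = multiLine.replace(/\n/g, " || ");

console.log(withSpaces); // First bullet Second bullet Third bullet
console.log(withPipes);  // First bullet || Second bullet || Third bullet
```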
Extracting specific parts of an element
You can use regex in ParseHub to extract specific parts of your data. For example, if your data includes a "mailto:" or "Tel:" before an email address or phone number, you can use an Extract command, click on the "Use regex" checkbox and enter your regex in the box below to remove that prefix and extract only the email address or phone number. To extract data after a specific prefix (e.g. "mailto:" or "Tel:"), enter that prefix followed by (.*), which indicates that you want to scrape everything after it, for example: mailto:(.*) or Tel: (.*)
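To see what the (.*) capture group picks up, here is a rough plain-JavaScript equivalent. The helper extractAfter and the sample values are hypothetical, and the sketch assumes that the "Use regex" box returns the contents of the first capture group.

```javascript
// Hypothetical helper: return the first capture group, or null if no match.
function extractAfter(pattern, value) {
  const match = value.match(pattern);
  return match ? match[1] : null;
}

// Sample values invented for illustration.
const email = extractAfter(/mailto:(.*)/, "mailto:jane@example.com");
const phone = extractAfter(/Tel: (.*)/, "Tel: 555-0100");
const missing = extractAfter(/mailto:(.*)/, "no prefix here");

console.log(email);   // jane@example.com
console.log(phone);   // 555-0100
console.log(missing); // null
```

Note that the space after "Tel:" is part of the pattern, so it is left out of the captured phone number.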
You can find a more comprehensive article on using regex on ParseHub here.
Regex can be relatively complex but there are websites like Debuggex that can help you find the right regular expression for what you're trying to achieve. This cheat sheet might come in handy.
Adding your own values
You can add your own value to an Extract command by including it between quotation marks in the box of the Extract command.
You can even add in numbers or perform calculations:
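As a minimal sketch, these are the kinds of JavaScript expressions this refers to. The values and variable names are invented for illustration; in an Extract command you would enter only the expression itself.

```javascript
// A fixed text value, written between quotation marks.
const label = "In stock";

// A plain number.
const quantity = 12;

// A calculation, evaluated before the value is stored.
const total = 12 * 4;

console.log(label);    // In stock
console.log(quantity); // 12
console.log(total);    // 48
```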
Adding the date or time
You can also add in date and time information for your data, including timestamps to see when each row was extracted. For example, adding ddddd to an Extract command will get a number for the date, as the number of milliseconds since January 1, 1970, 00:00:00 UTC.
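For reference, plain JavaScript exposes the same kind of millisecond timestamp through Date.now(). The sketch below only illustrates what that number represents; whether this exact expression is what ParseHub expects in an Extract command is an assumption.

```javascript
// Milliseconds elapsed since January 1, 1970, 00:00:00 UTC.
const timestamp = Date.now();

// Convert the number back into a readable ISO 8601 string.
const readable = new Date(timestamp).toISOString();

// A timestamp of 0 corresponds to the start of the epoch itself.
const epochStart = new Date(0).toISOString();

console.log(typeof timestamp); // number
console.log(epochStart);       // 1970-01-01T00:00:00.000Z
```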
This tutorial has more details about extracting dates and you can find a comprehensive list of the different functions you can use in the DateJS library.
Combining values
You can use the + sign in an Extract command to combine data. For example, if I want to add text to my title, I can do so by adding a + sign followed by the text between quotation marks:
You can even combine data from previous Select or Extract commands by including the name of those commands in your Extract command. In the case below I've added in two pipe symbols to separate them.
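A minimal sketch of both kinds of combination follows. The names title and price stand in for previous Select or Extract commands and are hypothetical, as are the sample values.

```javascript
// Hypothetical values from earlier Select or Extract commands.
const title = "Granola Bar";
const price = "$2.50";

// Add fixed text in front of an extracted value.
const labelled = "Product: " + title;

// Join two extracted values, separated by two pipe symbols.
const combined = title + " || " + price;

console.log(labelled); // Product: Granola Bar
console.log(combined); // Granola Bar || $2.50
```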
Similarly, you can also perform calculations on any numbers in your data by combining your extracted data with functions (although generally these types of calculations are best left for post-processing). For example, if I select some nutritional values and their information in grams:
I can convert those numbers into another measure like kilograms by:
- Using JavaScript's parseFloat() function to convert it into a number
- Dividing by 1000 to convert from grams into kilograms
The result will look like this:
I can even add the text "kg" at the end using the + sign, in the same Extract command or in a new one. (Note that if two Extract commands have the same name, the data that appears in the field with that name is taken from the last command, because each command overrides the previous one.)
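The conversion steps can be sketched in plain JavaScript as follows. The sample values "250g" and "30 g" and the variable names are invented for illustration. parseFloat() reads the leading number and ignores trailing text such as the unit.

```javascript
// Hypothetical extracted nutritional values, in grams.
const fat = "250g";
const sugar = "30 g";

// Step 1: parseFloat() turns the leading digits into a number.
// Step 2: dividing by 1000 converts grams into kilograms.
const fatKg = parseFloat(fat) / 1000;
const sugarKg = parseFloat(sugar) / 1000;

// Optional: append the unit with the + sign.
const fatLabel = fatKg + " kg";

console.log(fatKg);    // 0.25
console.log(sugarKg);  // 0.03
console.log(fatLabel); // 0.25 kg
```

If the text does not begin with a number, parseFloat() returns NaN, so it is worth checking a few rows of output before relying on the converted values.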