Extract data: A comprehensive PRO guide
PRO Server offers flexible extraction constructs to extract data or fields from both emails AND/OR attachments.
This post will explore how you can setup the PRO Server to extract data using the extraction wizard (Option-1) or using regular expressions (Option-2).
Use the Global Filters tab to extract data
First off, to extract data you MUST use the Global Filters tab. This allows you to create one or more rule filters that will then be used for extraction.
Each rule is run for EVERY attachment (any type, converted to text before extraction occurs) or email body (saved as .eml or .txt/.html).
In a rule, you can specify what to look for in the email (like matching From, To, Subject etc.) and then parameters to extract one or more data into fields as will be explained below.
FIELDS: Extract data into fields {..}
Fields are what you as a user can create in the Rule that then can be used in various actions.
Extracting data in the PRO Server is performed by first specifying a field name and then providing various parameters in the Rule configuration such as:
- WHAT to extract by specifying what comes before or after the piece of data you want to extract
- HOW to extract it by specifying whether to remove whitespaces or convert it into a number etc. as will be shown
- Specifying a unique FIELD NAME (between {}) where the extracted data will be “stored”. For example: {ORDER_ID}.
FIRST: Specify the matching filters
The very FIRST STEP in extraction is to create or edit a rule in the Global Filters tab and then specify the Filters that helps narrow down the type of email we want to extract data from.
In there you MUST specify the specific email filters to select the kinds of emails you are looking to extract data out of.
As shown above we’ve used a matching Subject and a From that matches a certain address.
SECOND: Two ways to extract data in a rule…
There are two ways to extract data inside of a rule as will be explained below.
When you create or edit a rule in the Global Filters tab, you can start out by specifying Filters and then have two ways to perform extraction as shown in the below screenshot.
- Filter extraction with wizard (recommended)
- Filter extraction with regular expression
We will explore each option in this post below.
Option-1: Filter extraction with wizard
Create a new extraction by selecting
Add -> Extractions -> Field extraction with wizard
as was shown earlier.
You now have the option to name your field. A random name has been automatically selected, feel free to change this to something more meaningful.
For the purposes of this tutorial we will use {ORDER_ID}. The field name MUST be within braces {}. Then select the Source from the dropdown option as to where you want to extract data from.
NOTE: This dropdown does not include the attachment itself, we will cover this later below.
We will choose the EMAIL_SUBJECT as the source for this tutorial.
Click on the Extraction wizard button and follow these steps:
- Copy Sample Text: Once the dialog opens, lets start off by copying some text that may be a sample “Subject” of the email into the Test data area.
- Search tab: Specify here the text that may PRECEDE the actual data we want to extract. In our test data, we have Invoice # as a good prefix. So let’s use that.
PRO TIP
Latest builds now support .NET regular expressions in the following fields:– Search raw text field
– Followed by (optional)This allows you to specify more than one search term, for example, specifying (Invoices|Receipts) in this text including parentheses will search for Invoices or Receipts as the text to search until before extracting data after it.
- Extract tab: Since what follows after the invoice is a ‘-‘ hyphen, we will choose the Until any of these option and then specify the hyphen as the character to stop extracting data as shown below. You can also see that the test data now highlights the invoice number.
- Transform tab: From our previous step you can see that we have an extra space towards the end of the invoice number. This tab lets you transform the extracted data in many ways. Since we just want to remove the space, we will check the Remove leading and trailing spaces option. Feel free to experiment with the various options present here. You can also convert the extracted data from a String to an Integer or Decimal in the Convert to type dropdown.
- Validate tab: In here you can specify any validations. Since we absolutely want an invoice number to be present before we take further actions like saving the email or attachments in the rule, we will choose the No blank or empty extractions option.
- On Failure tab: When an extraction fails validation (based on what we specified in the validation tab), we want to skip this rule and proceed further.
- Click Save
Now we have completely set the parameters on how we want to extract the data.
Feel free to change the Test data and click the Run Test button to run tests. The Status below will show what was extracted. As shown above it says OK: 7265537 and that means 7265537 was what was extracted based on the Test data which is what we want.
Extract data from email attachments
Extracting data from email attachments is very similar to what has been described above with Option-1, except instead of choosing
Add -> Extractions -> Field extraction with wizard
you will be creating the field extraction in the Actions section as shown below:
Make sure that the ORDER at which the extraction occurs is correct. Any actions that are run BELOW the extraction will have the extracted field.
If you have any questions about this, please contact support@gearmage.com for help.
Option-2: Filter extraction with regular expressions
This is for advanced uses only. You MUST be well-versed with using regular expressions and also using regular expressions based replacement.
READ THIS FIRST: Regular Expressions format that is followed in this post.
Extract the Invoice # from the SUBJECT
Since we want to extract the Invoice # for further invoice processing.
With this post we will explore extracting data for invoices from the email using regular expressions. In a future post, we can walk you through how to use the Extraction Wizard if needed.
Since the Invoice # is in the Subject of the email, we will add a new extraction to the new rule we created as follows. Click on the Add dropdown -> Extractions -> Field extraction with regular expressions option as shown below.
Rename the Extraction field to something meaningful like “{INVOICE_NUMBER}” as shown below and include a regular expression to extract text.
- In the Pattern field enter ^.*Invoice #(\d+)([a-z| ]*)$. If you notice symbols like ^, $ and \d and * are indicators to what to look for in the Subject. ^ implies start of the line and $ is the end of the line. The .* means include any number of characters followed by Invoice #. Since what follows Invoice # is a number we include (\d+). This becomes our first extraction field (which we will use as the Replace text). Following that could include any number of characters or spaces ([a-z]| ]*) followed by the end of the line $.
- In the Replace textbox, specify $1 which is the first match between the parentheses () we have specified in the Pattern, in this case (\d+).
Feel free to contact support if you have a different pattern to look for and we can help you craft this as needed. Refer to the Microsoft’s regular expression reference here for more.
Then you can Test the result out as shown above, if you notice TEST SUCCESS includes our extracted Invoice #. This will save with the invoice number for invoice processing into the field name.
Finally: Uses of the extracted data
Now you can use the {ORDER_ID} or {INVOICE_NUMBER} fields we extracted in many places inside THIS rule such as:
- In the Save filename format as a Folder or part of the filename. If you want it as a folder just separate with a ‘\’ character as shown below. Click the little (i) button next to the Save filename format for a complete list of fields that can be inserted here.
- In the Send email action — in the Subject/To/CC/BCC or even the BODY you can specify the field like {ORDER_ID} and it will be replaced.
- In the Copy or rename file action in the filename format fields
- In the Move or copy email action (note: in this case the mailbox folder MUST already exist, the program will not auto-create it)
- In the Document conversion action in the filename/format where the converted document should be saved
- In the Save to database Or Extract csv and save to DB action inside the SQL query.
- In the Run script or Run command actions as arguments
and many more.
Conclusion: Extract data
We hope you found this post useful in extracting data from emails or attachments.
We highly recommend that you read the below TIPS and TRICKS post for a guide to other features in the product where you can use extracted data.
IMPORTANT: Further reading TIPS and TRICKS (READ THIS) https://gearmage.com/blog/2017/07/20/archive-emails-folders-tips-tricks/
As always, please contact us at support@gearmage.com with any questions or suggestions on this feature.
One thought on “Extract data from emails and attachments – an in-depth guide”
Comments are closed.