Gearmage Blog

Email Processing Tools and Solutions

Tag: extract data from email

Extract data from emails and attachments – an in-depth guide

Extract data: A comprehensive PRO guide

PRO Server offers flexible extraction constructs to extract data or fields from emails, attachments, or both.

This post explores how you can set up the PRO Server to extract data using the extraction wizard (Option-1) or regular expressions (Option-2).

Use the Global Filters tab to extract data

First off, to extract data you MUST use the Global Filters tab. This allows you to create one or more rule filters that will then be used for extraction.

Each rule is run for EVERY attachment (any type, converted to text before extraction occurs) or email body (saved as .eml or .txt/.html).

In a rule, you can specify what to look for in the email (such as a matching From, To or Subject) and then the parameters for extracting one or more pieces of data into fields, as explained below.


FIELDS: Extract data into fields {..}

Fields are named values that you, as a user, create in the Rule and can then use in various actions.

Extracting data in the PRO Server is performed by first specifying a field name and then providing various parameters in the Rule configuration such as:

  1. WHAT to extract, by specifying what comes before or after the piece of data you want to extract
  2. HOW to extract it, by specifying whether to remove whitespace, convert it into a number and so on, as will be shown
  3. WHERE to store it, by specifying a unique FIELD NAME (between {}) in which the extracted data will be “stored”. For example: {ORDER_ID}.

FIRST: Specify the matching filters

The very FIRST STEP in extraction is to create or edit a rule in the Global Filters tab and then specify the Filters that help narrow down the type of email we want to extract data from.

In there you MUST specify the email filters that select the kinds of emails you want to extract data from.

extract data - pick the email

As shown above we’ve used a matching Subject and a From that matches a certain address.

SECOND: Two ways to extract data in a rule…

There are two ways to extract data inside of a rule as will be explained below.

When you create or edit a rule in the Global Filters tab, you can start out by specifying Filters and then have two ways to perform extraction as shown in the below screenshot.

  1. Filter extraction with wizard (recommended)
  2. Filter extraction with regular expression

We will explore each option in this post below.

extract data - two ways


Option-1: Filter extraction with wizard

Create a new extraction by selecting
Add -> Extractions -> Field extraction with wizard
as was shown earlier.

You now have the option to name your field. A random name has been selected automatically; feel free to change it to something more meaningful.

For the purposes of this tutorial we will use {ORDER_ID}. The field name MUST be within braces {}. Then select the Source from the dropdown option as to where you want to extract data from.

extract data - select source

NOTE: This dropdown does not include the attachment itself; extracting from attachments is covered later in this post.

We will choose the EMAIL_SUBJECT as the source for this tutorial.

Click on the Extraction wizard button and follow these steps:

  1. Copy Sample Text: Once the dialog opens, let’s start off by copying some text that could be a sample “Subject” of the email into the Test data area.
     extract data - sample subject
  2. Search tab: Specify here the text that may PRECEDE the actual data we want to extract. In our test data, Invoice # is a good prefix, so let’s use that.
     extract data - specify Search text

    PRO TIP

    Latest builds now support .NET regular expressions in the following fields:

    – Search raw text field
    – Followed by (optional)

    This allows you to specify more than one search term. For example, entering (Invoices|Receipts), including the parentheses, searches for either Invoices or Receipts as the text that precedes the data to extract.

  3. Extract tab: Since what follows the invoice number is a ‘-’ hyphen, we will choose the Until any of these option and then specify the hyphen as the character at which to stop extracting data, as shown below. You can also see that the test data now highlights the invoice number.
     extract data - extract tab
  4. Transform tab: From our previous step you can see that we have an extra space at the end of the invoice number. This tab lets you transform the extracted data in many ways. Since we just want to remove the space, we will check the Remove leading and trailing spaces option. Feel free to experiment with the various options present here. You can also convert the extracted data from a String to an Integer or Decimal in the Convert to type dropdown.
     extract data - transform tab
  5. Validate tab: Here you can specify any validations. Since we absolutely want an invoice number to be present before we take further actions in the rule, such as saving the email or attachments, we will choose the No blank or empty extractions option.
     extract data - validate data
  6. On Failure tab: When an extraction fails validation (based on what we specified in the Validate tab), we want to skip this rule and proceed further.
     extract data - on failure
  7. Click Save

Now we have completely set the parameters on how we want to extract the data.

Feel free to change the Test data and click the Run Test button to run tests. The Status below will show what was extracted. As shown above it says OK: 7265537 and that means 7265537 was what was extracted based on the Test data which is what we want.
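To make the wizard steps above more concrete, here is a minimal Python sketch of the same Search, Extract, Transform and Validate sequence. It only illustrates the idea and is not how the PRO Server is implemented. The function name extract_field, its parameters and the sample subjects are assumptions made up for this example, and Python's re module stands in for .NET regular expressions.

```python
import re


def extract_field(text, search, stop_chars, strip=True, as_int=False):
    """Hypothetical stand-in for the wizard tabs (not the product's API)."""
    # Search tab: locate the text that precedes the data (treated as a regex).
    found = re.search(search, text)
    if found is None:
        return None  # On Failure: nothing matched, so the rule would be skipped
    rest = text[found.end():]

    # Extract tab: "Until any of these" stops at the first listed character.
    end = len(rest)
    for index, char in enumerate(rest):
        if char in stop_chars:
            end = index
            break
    value = rest[:end]

    # Transform tab: remove leading and trailing spaces.
    if strip:
        value = value.strip()

    # Validate tab: no blank or empty extractions.
    if value == "":
        return None

    # Transform tab: optional conversion from a string to an integer.
    if as_int:
        return int(value) if value.isdigit() else None
    return value


# Hypothetical sample subjects, chosen to resemble the tutorial's test data.
print(extract_field("Invoice #7265537 - Acme Corp", r"Invoice #", "-"))
print(extract_field("Receipts 42 - paid", r"(Invoices|Receipts)", "-"))
print(extract_field("Invoice # - missing", r"Invoice #", "-"))
```

The first call prints 7265537 and the second prints 42 (showing the alternation from the PRO TIP). The third prints None because the extraction is blank, which corresponds to the validation failure path.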

Extract data from email attachments

Extracting data from email attachments is very similar to what has been described above with Option-1, except instead of choosing
Add -> Extractions -> Field extraction with wizard

you will be creating the field extraction in the Actions section as shown below:

extract data - from attachments

Make sure that the ORDER in which the extraction occurs is correct. Only actions that run BELOW the extraction will have the extracted field available.
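The ordering rule can be pictured with a small toy model in Python. This is purely illustrative and makes no claim about the PRO Server's internals; the names run_actions, save_attachment and extract_order_id are hypothetical.

```python
from typing import Callable, Dict, List

Fields = Dict[str, str]
log: List[str] = []


def save_attachment(fields: Fields) -> None:
    # An action that uses {ORDER_ID} if an earlier action has set it.
    log.append("save -> " + fields.get("ORDER_ID", "{ORDER_ID} not set"))


def extract_order_id(fields: Fields) -> None:
    # Stand-in for a field extraction placed in the Actions section.
    fields["ORDER_ID"] = "7265537"


def run_actions(actions: List[Callable[[Fields], None]]) -> Fields:
    # Actions run top to bottom and share one set of fields.
    fields: Fields = {}
    for action in actions:
        action(fields)
    return fields


run_actions([save_attachment, extract_order_id, save_attachment])
print(log)
```

The save action placed above the extraction does not see the field, while the one placed below it does.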

If you have any questions about this, please contact support@gearmage.com for help.


Option-2: Filter extraction with regular expressions

This is for advanced use only. You MUST be well versed in regular expressions and in regular-expression-based replacement.

READ THIS FIRST: Regular Expressions format that is followed in this post.

Extract the Invoice # from the SUBJECT

We want to extract the Invoice # for further invoice processing. In this section we will explore extracting invoice data from the email using regular expressions; if you are not comfortable with them, use the Extraction Wizard described in Option-1 above.

Since the Invoice # is in the Subject of the email, we will add a new extraction to the new rule we created as follows. Click on the Add dropdown -> Extractions -> Field extraction with regular expressions option as shown below.

extract data - using regex

Rename the Extraction field to something meaningful like “{INVOICE_NUMBER}” as shown below and include a regular expression to extract text.

Extract Invoice

  • In the Pattern field enter ^.*Invoice #(\d+)([a-z| ]*)$. Symbols such as ^, $, \d and * indicate what to look for in the Subject. ^ marks the start of the line and $ the end of the line. The .* matches any number of characters, followed by the literal text Invoice #. Since what follows Invoice # is a number, we include (\d+). This becomes our first extraction group (which we will use as the Replace text). What follows can be any number of lowercase letters or spaces, ([a-z| ]*), followed by the end of the line $. Note that inside the square brackets the | is matched as a literal character, not as an alternation.
  • In the Replace textbox, specify $1 which is the first match between the parentheses () we have specified in the Pattern, in this case (\d+).
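For readers who want to try the pattern outside the product, here is a small Python sketch. It assumes Python's re module as a stand-in for .NET regular expressions (this particular pattern behaves the same in both) and borrows the sample subject “Your Invoice #853290 is ready” from the invoice processing post. The helper name invoice_number is made up for the example. Where .NET writes the first group as $1 in the Replace text, Python writes \1.

```python
import re

# The pattern from this post, unchanged.
PATTERN = re.compile(r"^.*Invoice #(\d+)([a-z| ]*)$")


def invoice_number(subject):
    """Return the digits after 'Invoice #', or None if the subject does not match."""
    match = PATTERN.match(subject)
    if match is None:
        return None
    return match.group(1)  # group 1 is the (\d+) part


subject = "Your Invoice #853290 is ready"
print(invoice_number(subject))       # 853290
print(PATTERN.sub(r"\1", subject))   # 853290, the pattern-and-replace form

# Uppercase letters after the number are outside [a-z| ], so this does not match.
print(invoice_number("Your Invoice #853290 is READY"))  # None
```

The last call shows a limitation worth testing for: the trailing group accepts only lowercase letters, the pipe character and spaces, so subjects with other characters after the number need a different pattern.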

Feel free to contact support if you have a different pattern to look for, and we can help you craft it as needed. Refer to Microsoft’s regular expression reference for more.

You can then test the result as shown above; notice that TEST SUCCESS includes our extracted Invoice #. The invoice number will be saved into the field for invoice processing.


Finally: Uses of the extracted data

Now you can use the {ORDER_ID} or {INVOICE_NUMBER} fields we extracted in many places inside THIS rule such as:

  • In the Save filename format as a folder or part of the filename. If you want it as a folder, just separate it with a ‘\’ character as shown below. Click the little (i) button next to the Save filename format for a complete list of fields that can be inserted here.
    extract data - use in save filename format
  • In the Send email action — in the Subject/To/CC/BCC or even the BODY you can specify the field like {ORDER_ID} and it will be replaced. 
  • In the Copy or rename file action in the filename format fields
  • In the Move or copy email action (note: in this case the mailbox folder MUST already exist, the program will not auto-create it)
  • In the Document conversion action in the filename/format where the converted document should be saved
  • In the Save to database or Extract csv and save to DB action, inside the SQL query
  • In the Run script or Run command actions as arguments

and many more.


Conclusion: Extract data

We hope you found this post useful in extracting data from emails or attachments.

We highly recommend that you read the below TIPS and TRICKS post for a guide to other features in the product where you can use extracted data.

IMPORTANT: Further reading TIPS and TRICKS (READ THIS) https://gearmage.com/blog/2017/07/20/archive-emails-folders-tips-tricks/

As always, please contact us at support@gearmage.com with any questions or suggestions on this feature.

Author: gearmage | Posted on: November 21, 2017 (updated June 26, 2019) | Categories: Mail Attachment Downloader, Tips | Tags: extract data, extract data from email, extract data from email attachments, transform data

Download emails for invoice processing

When using invoice processing software, there are cases where you want to download emails and email attachments to a folder in a specific format so the invoice processing software can pick them up.

There are two paths to solving this today. One approach makes use of solutions that require you to have an email client installed and that offer limited functionality. The other relies on more complex tools that take months to configure, test and deploy.

Mail Attachment Downloader PRO Server offers a sweet spot by offering flexible tooling for this purpose whilst making it easy to customize it for your specific needs. As you read on, we’ll show how this product can do more than just saving attachments to a folder or a share if your business needs it.

Let’s consider the case where you receive many invoices via email with (or without) attachments that then need to be saved for later consumption by invoice processing software.

Once the data is saved in a folder or share, you can have the PRO Server save the invoice details into specific database fields, run a custom script to hand them off to your line-of-business application, or have the invoice processing software monitor the folder, pick up the invoices and process them through its workflow. The share can also serve as a secondary backup of the invoices you’ve received.

You can also configure the program to save not only the invoices but also the email text or body, simply by checking the ‘Save email text’ or ‘Save as .eml’ checkbox in the Rule filter screen (see the screenshots below). This comes in handy if the invoice is in the body of the email itself (that is, there is no attachment).

Also, let’s say that the Subject of that email is something like “Your Invoice #853290 is ready”, and that we want to store the attachment in a folder named after the Invoice #, using the original filename prefixed by who it was From (the sender’s email address).

We will use Mail Attachment Downloader PRO Server before submitting the data to our invoice processing software. It gives you a LOT MORE flexibility in how the attachment is named, via a template mechanism, as will be shown below.

Create a Rule Filter

First, since the PRO Server version lets you create many filters, let’s create one filter for Invoices.

Click on the Global Filters tab and the Add new filter button as shown below.

Enter something meaningful as the Rule filter name; we’ve called it ‘Invoices’ as shown below.

Filter by SUBJECT

Second, since we want to only look at emails with a subject that contains the Invoice #, we need to add a filter to look for those emails. Click on the Add dropdown arrow -> Filters -> Header: Subject as shown below.

Since we know the subject contains at least “Invoice #”, we are going to include that first.

Subject Filter entered

Extract the Invoice # from the SUBJECT

Third, we want to extract the Invoice # for further invoice processing.

→ PRO Server also supports the Extraction Wizard if you are not familiar with regular expressions. It makes configuring extractions a lot easier.

We highly recommend that you read this blog entry instead of the steps below: https://gearmage.com/blog/extract-data-emails-attachments/

In this post we will explore extracting invoice data from the email using regular expressions. The Extraction Wizard is covered in the blog entry linked above.

Since the Invoice # is in the Subject of the email, we will add a new extraction to the new rule we created as follows. Click on the Add dropdown -> Extractions -> Regex Extraction.

Extraction

Rename the Extraction field to something meaningful like “{INVOICE_NUMBER}” as shown below and include a regular expression to extract text.

Extract Invoice

  • In the Pattern field enter ^.*Invoice #(\d+)([a-z| ]*)$. Symbols such as ^, $, \d and * indicate what to look for in the Subject. ^ marks the start of the line and $ the end of the line. The .* matches any number of characters, followed by the literal text Invoice #. Since what follows Invoice # is a number, we include (\d+). This becomes our first extraction group (which we will use as the Replace text). What follows can be any number of lowercase letters or spaces, ([a-z| ]*), followed by the end of the line $. Note that inside the square brackets the | is matched as a literal character, not as an alternation.
  • In the Replace textbox, specify $1 which is the first match between the parentheses () we have specified in the Pattern, in this case (\d+).

Feel free to contact support if you have a different pattern to look for, and we can help you craft it as needed. Refer to Microsoft’s regular expression reference for more.

You can then test the result as shown above; notice that TEST SUCCESS includes our extracted Invoice #. The invoice number will be saved into the field for invoice processing.

NOTE: As mentioned before, if you are not well versed in regular expressions, we recommend using the Extraction Wizard instead.

Next, we will enter the save filename format as shown below:

{INVOICE_NUMBER}\{EMAIL_FROM}_{FILENAME}_{ID}{EXT}

The INVOICE_NUMBER field is the invoice number we extracted from the Subject. We follow it with a \ to indicate that it is a folder, then EMAIL_FROM (the sender’s email address), then the FILENAME, the ID of the email and EXT for the extension.
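As a rough illustration of how such a template expands, here is a minimal Python sketch. It is not the PRO Server's implementation: the expand helper and all field values are hypothetical, and it assumes that {FILENAME} holds the name without its extension and that {EXT} includes the leading dot.

```python
import re
from pathlib import PureWindowsPath


def expand(template, fields):
    """Replace each {NAME} with its value; unknown fields are left as-is."""
    def replace(match):
        return str(fields.get(match.group(1), match.group(0)))
    return re.sub(r"\{([A-Z_]+)\}", replace, template)


template = r"{INVOICE_NUMBER}\{EMAIL_FROM}_{FILENAME}_{ID}{EXT}"

# Hypothetical values for one email; INVOICE_NUMBER comes from the extraction.
fields = {
    "INVOICE_NUMBER": "853290",
    "EMAIL_FROM": "billing@example.com",
    "FILENAME": "invoice",
    "ID": "1042",
    "EXT": ".pdf",
}

path = expand(template, fields)
print(path)                          # 853290\billing@example.com_invoice_1042.pdf
print(PureWindowsPath(path).parent)  # 853290
```

The second line of output shows the effect of the \ separator: everything before it becomes the folder, so each invoice number gets its own folder.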

Save Format specified

Lastly, click the Save button which will save the rule out and then you are ready to try downloading a few attachments to see if the extractions worked as intended.

If you are wondering what other information you can include in the Save filename format field, click on the little ? button, which opens the help for this field as shown below. As you can see, there are many options to choose from, giving you a great deal of flexibility.

Save Format field options

If you find something that is missing in the features above, let us know.

Run download actions after save

You can then use any of the extracted fields along with any email specific fields (such as {FROM}, {TO} etc.) in the numerous download actions that are supported. This includes saving the data out to a database, saving to a .csv metadata file and many more.

Once the data is saved in these folders, you can trigger the invoice processing software via a script (if you have the PRO Server, you can run a script after saving the file to a folder) or folder monitoring as needed.

Once this is complete, if you have the PRO Server version you can now install the above as a Windows Service and not have to worry about having to log in to the box to process these attachments.

The Mail Attachment Downloader Free version also enables you to do some of the above but has the following limited features:

  • For personal use only
  • The ID of the email is included in the filename. We highly recommend keeping the ID in the filename; removing it may produce duplicate filenames, which can cause files to be overwritten.
  • Email headers can be included in the filename only in a fixed, non-flexible format in the FREE version. The PRO versions offer much more flexibility in how filenames are formatted, as was shown above.
    • From like ‘From (Bran E. <bran@gearmage.com>)’
    • To like ‘To  (Atul <atul@gearmage.com>)’
    • Subject like ‘Subject (Our invoice)’
    • Date like ‘Date(Wed, 24 Feb 2016 21_44_15 -0800)’
  • The ability to create a folder based on who the email is from.
  • The ability to create a folder for every mailbox folder, e.g. “Inbox”.

While this is great, it does not allow us to be more prescriptive about how we want to store the attachments. So if you need more flexibility you should get the PRO or the PRO Server versions.

Further reading

Some interesting how-to links and posts for further exploration:

  • Save emails to a database, or even Excel/CSV files from emails to a database
  • Save to multiple folders from multiple accounts using multiple email rules.
  • Send emails automatically after downloading files or attachments or body
  • Unzip files, decrypt PDFs and run scripts
  • Extract data and download emails for invoice processing
  • How email rules work in-depth

Contact us

Contact us if you have any questions or suggestions about the functionality described in this article.

Author: gearmage | Posted on: March 21, 2016 (updated March 22, 2021) | Categories: Mail Attachment Downloader, Tips | Tags: download email invoice, email extractor, email to invoice, extract data from email, extract email invoice, invoice processing
