Move Mail into Monthly IMAP4 Folders with Python

Posted on December 6, 2017  (Last modified on December 13, 2022 )

Recently, I had to cope with a very large mail archive in a company: the email account collects all kinds of automatic messages and status reports, which have to be archived for legal reasons. Until now, an employee moved about 40,000 mails by hand each month. It is impressive that Firefox has no problems with such numbers, but it takes quite a while, and it is pretty boring to watch the program work when there are better things to do in the meantime.

So I searched the web for a few hints and wrote a small Python script that will log in via IMAP and move mails to the correct monthly folder.

Configuration File

First, create a configuration file, e.g. my_domain.com.ini. It should contain something along these lines:

[server]
hostname: imap.server.com
 
[account]
username: login
password: password

Fill in the hostname of your IMAP server, and include the username and password for the account. The script below will try to log in via TLS/SSL. If you want to use an insecure connection, you will have to change a few lines (for example, use imaplib.IMAP4 instead of imaplib.IMAP4_SSL).
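
To check a configuration file before running the real script, a minimal sketch like the following may help. The helper name load_settings and the sample text are only illustrations and not part of the script below. One assumption differs from the script: interpolation is switched off here, because configparser's default interpolation rejects values that contain a lone % character, which can easily happen in passwords.

```python
import configparser

# Sample text in the same format as the configuration file above
SAMPLE_CONFIG = """\
[server]
hostname: imap.server.com

[account]
username: login
password: pass%word
"""


def load_settings(text):
    """Parse INI text and return (hostname, username, password)."""
    # interpolation=None keeps '%' in values literal (assumption: the
    # original script uses the default and would need '%%' instead)
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return (
        config.get('server', 'hostname'),
        config.get('account', 'username'),
        config.get('account', 'password'),
    )


print(load_settings(SAMPLE_CONFIG))
```

Since the file holds a password in plain text, it is probably a good idea to make it readable only by the user who runs the script.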

The Script

The script imap_folder_per_month.py is shown here:

#!/usr/bin/python3

import configparser
import datetime
import email.utils
import imaplib
import os
import re
import sys

list_response_pattern = re.compile(r'\((?P<flags>.*?)\) "(?P<delimiter>.*)" (?P<name>.*)')


def parse_list_response(line):
    flags, delimiter, mailbox_name = list_response_pattern.match(line.decode()).groups()
    mailbox_name = mailbox_name.strip('"')
    return (flags, delimiter, mailbox_name)


def open_connection(config_file, verbose=False):
    # Read the config file
    config = configparser.ConfigParser()
    config.read([os.path.expanduser(config_file)])

    # Connect to the server
    hostname = config.get('server', 'hostname')
    if verbose:
        print('Connecting to', hostname)
    connection = imaplib.IMAP4_SSL(hostname)

    # Login to our account
    username = config.get('account', 'username')
    password = config.get('account', 'password')
    if verbose:
        print('Logging in as', username)
    connection.login(username, password)
    return connection


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Pass the name of the config file as first parameter, please!')
        sys.exit(-1)
    c = open_connection(sys.argv[1], verbose=True)
    try:
        # List of Mailboxes
        mailboxes = []
        typ, data = c.list()
        for line in data:
            flags, delimiter, mailbox_name = parse_list_response(line)
            mailboxes.append(mailbox_name)
            # print 'Parsed response:', (flags, delimiter, mailbox_name)

        # pprint.pprint(mailboxes)
        print(mailboxes)

        # get all messages from inbox
        typ, data = c.select('INBOX')
        num_msgs = int(data[0])
        print('There are %d messages in INBOX' % num_msgs)

        typ, msg_ids = c.search(None, 'ALL')
        msg_ids = msg_ids[0].decode()
        if msg_ids == '':
            msg_ids = []
        else:
            msg_ids = msg_ids.split(' ')[::-1]

        for msg_id in msg_ids:
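            typ, msg_data = c.fetch(msg_id, '(RFC822)')
            # Parse the raw bytes directly; decoding to str first can fail
            # on messages that are not valid UTF-8
            msg = email.message_from_bytes(msg_data[0][1])
            date_header = msg['Date']
            date_tuple = email.utils.parsedate_tz(date_header) if date_header else None
            if date_tuple is None:
                # No usable Date header: leave the message in the INBOX
                print('Skipping #' + msg_id + ' (no parsable Date header)')
                continue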
            # pprint.pprint(date_tuple)

            # Convert to date string
            local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
            to_mailbox = 'INBOX/' + local_date.strftime("%Y-%m")

            print('Parsing #' + msg_id + ' ' + local_date.strftime("%Y-%m-%d"))

            if to_mailbox not in mailboxes:
                print(to_mailbox + ' created')
                typ, create_response = c.create(to_mailbox)
                mailboxes.append(to_mailbox)

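            # Move message to mailbox: only flag the original for deletion
            # once the server has confirmed the copy
            typ, copy_response = c.copy(msg_id, to_mailbox)
            if typ == 'OK':
                c.store(msg_id, '+FLAGS', r'(\Deleted)')
            else:
                print('Could not copy #' + msg_id + ' to ' + to_mailbox)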

        # Clear
        c.expunge()
    finally:
        c.close()
        c.logout()

The script takes the name of the configuration file as the first command-line parameter and reads the login data from it. It will then attempt to log in (the script will fail in an ugly way if something goes wrong).
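
If the ugly failure bothers you, the login could be wrapped roughly like this. It is only a sketch: try_login is a hypothetical helper, not part of the script above, and it relies on imaplib raising imaplib.IMAP4.error when the server rejects the credentials.

```python
import imaplib


def try_login(connection, username, password):
    """Return True on success, False if the server rejects the login."""
    try:
        connection.login(username, password)
    except imaplib.IMAP4.error as exc:
        # imaplib reports rejected credentials via this exception class
        print('Login failed:', exc)
        return False
    return True
```

In the script, open_connection could call this helper instead of connection.login and exit with a readable message when it returns False. Network problems (unknown host, refused connection) occur earlier, when the connection object is created, and are not covered here.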

In the main part, the script first fetches the list of folders. We do this up front because we want to know which folders have already been created (we need subfolders named 2017-10, 2017-11, 2017-12, etc.). Then the mails in the INBOX are read, highest message number first (to cope with mails arriving while we are at it).
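
The two parsing steps can be tried without a server. The following sketch feeds made-up response lines (they are assumptions about what a typical server sends, not captured output) through a slightly simplified version of the parsing logic; mailbox_name and newest_first are illustrative names.

```python
import re

# Same idea as list_response_pattern in the script, with a non-greedy delimiter
LIST_PATTERN = re.compile(r'\((?P<flags>.*?)\) "(?P<delimiter>.*?)" (?P<name>.*)')


def mailbox_name(line):
    """Extract the folder name from one raw LIST response line (bytes)."""
    match = LIST_PATTERN.match(line.decode())
    return match.group('name').strip('"')


def newest_first(search_data):
    """Turn the data part of a SEARCH response into IDs, highest first."""
    ids = search_data[0].decode().split()
    return ids[::-1]


# Made-up sample responses for illustration only
sample_list = [
    b'(\\HasNoChildren) "/" "INBOX/2017-10"',
    b'(\\HasChildren) "/" INBOX',
]
print([mailbox_name(line) for line in sample_list])  # ['INBOX/2017-10', 'INBOX']
print(newest_first([b'1 2 3']))                      # ['3', '2', '1']
print(newest_first([b'']))                           # []
```

Note that the second field of each LIST line is the server's hierarchy delimiter. The script hard-codes '/' when it builds 'INBOX/2017-12'; on a server that reports a different delimiter (for example '.'), the folder name would have to be built with that character instead.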

For each mail, the script parses the Date header, takes the time zone offset into account, and converts the result to the local time of the machine running the script (instead of local time, you could use UTC, of course). So, we have a local time for each mail in the end and can now move it to the right folder.
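
Here is a small sketch of the UTC variant mentioned above. The function monthly_folder and its prefix parameter are illustrative names, not part of the script; the point is that a mail sent shortly after midnight in one time zone can belong to the previous month in another, so it matters which clock you pick.

```python
import datetime
import email.utils


def monthly_folder(date_header, prefix='INBOX/'):
    """Map a Date header to a folder such as 'INBOX/2017-12' (UTC based).

    Returns None if the header is missing or cannot be parsed.
    """
    if not date_header:
        return None
    date_tuple = email.utils.parsedate_tz(date_header)
    if date_tuple is None:
        return None
    # mktime_tz applies the time zone offset and yields a UTC timestamp
    timestamp = email.utils.mktime_tz(date_tuple)
    moment = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return prefix + moment.strftime('%Y-%m')


print(monthly_folder('Wed, 06 Dec 2017 10:00:00 +0100'))  # INBOX/2017-12
# 00:30 on January 1st at +0100 is still December 31st in UTC
print(monthly_folder('Mon, 01 Jan 2018 00:30:00 +0100'))  # INBOX/2017-12
```

Using UTC makes the result independent of the machine that runs the script; using local time, as the script does, matches what people in the office would expect to see.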

The script checks whether the target folder exists and creates it if not. Finally, the mail is copied to that folder and flagged as deleted in the INBOX. After all mails have been processed, the flagged mails are removed from the INBOX for good (expunge in IMAP).
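
The folder check can be made a little more careful by looking at the server's answer to the create command. The following is a sketch under the assumption that a refused creation comes back as a status other than 'OK'; ensure_mailbox is a hypothetical helper, and connection is expected to behave like the imaplib connection used in the script.

```python
def ensure_mailbox(connection, known_mailboxes, name):
    """Create `name` on the server unless it is already known.

    Returns True if the folder exists afterwards, False if creation failed.
    """
    if name in known_mailboxes:
        return True
    typ, _ = connection.create(name)
    if typ != 'OK':
        # Do not remember the folder, so nothing gets copied into it
        return False
    known_mailboxes.append(name)
    return True
```

In the loop of the script, a mail would then only be copied when ensure_mailbox(c, mailboxes, to_mailbox) returns True.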

A call of python3 imap_folder_per_month.py my_domain.com.ini should be sufficient to start the script. All modules the script imports (configparser, datetime, email, imaplib, os, re, sys) are part of Python's standard library, so nothing has to be installed via pip. If python3 itself is missing, please check the documentation of your operating system to see how to install it.

Automatic Movement of Mails with Cron

Once you have checked that everything works, you can create a cron job to move your mails regularly. Depending on how many mails the account receives, it makes sense to do this hourly or even every few minutes. Using crontab -e on a Unix-like system should work:

*/15 * * * * /path/to/imap_folder_per_month.py /path/to/my_domain.com.ini > /dev/null

In this example, the script will be called every 15 minutes. You have to adjust the path names, naturally, and because cron calls the script directly, it must be executable (chmod +x).

Finally, no more mail monster movements each month!
