Trillian/ICQ/MSN Instant Messaging Log Merger by zAlbee

IMmerge 1.05 – Improved Timestamps, Group Chats

December 18, 2011

IMmerge 1.05 is now available for download. This release further improves the accuracy of parsing plain text logs, which is useful when converting to an XML format for Trillian’s Activity History Viewer. It also adds the option of specifying a custom timestamp, in case you don’t use the default [hh:mm] timestamp given by Trillian (or its variants with AM/PM and/or seconds). Lastly, it fixes several parsing issues with group chats and some other bugs.

Improved Message Recognition Using Timestamps

Say you are having a conversation with a buddy, and in that conversation, you copy/paste a transcript of a previous chat. If the chat is saved in a plain-text format, it may be difficult to tell the pasted transcript apart from the real messages. Here’s an example:

Session Start (User:User2): Mon Nov 07 06:00:00 2011
[06:00] User2: Hello.
[06:01] User: Hi.
[06:01] User: Can you repeat back the beginning of this chat to me?
[06:02] User2: OK. Now I will copy/paste the previous messages.
[06:00] User2: Hello.
[06:01] User: Hi.
[06:01] User: Can you repeat back the beginning of this chat to me?
[06:03] User: Thank you!
Session Close (User2): Mon Nov 07 07:00:00 2011

At first glance this transcript looks like it contains 8 messages, but there are actually only 5. User2 copy/pasted the first 3 messages into the 4th message, which therefore spans 4 lines. Even a human has to look closely to figure this out, so what hope does a machine have? Well, if the log is timestamped, we can infer that a copy/paste happened because the chat seemingly went backwards in time between message #4 [06:02] and the line that looks like message #5 [06:00], which would be impossible. (Thanks to Stefan M. for pointing this out.) This logic is now built into IMmerge 1.05, and here is the result of applying it to the above text log:

Converted chat log

As you can see, IMmerge correctly identifies which message was sent by which user at what time, including the part that was copy/pasted! Of course, this is not 100% foolproof. A timestamp that appears to go “backwards” is sometimes legitimate: the chat may simply have gone past midnight into a new day. IMmerge already handled this case, but previous versions applied the new-day interpretation too often. Version 1.05 is much improved: it first eliminates all impossible options before making a decision, and prompts the user to choose only if the choice is still ambiguous.
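To make the idea concrete, here is a minimal Python sketch of the backwards-timestamp check. It is not IMmerge’s actual code: the function name, the regular expression for the default [hh:mm] prefix, and the rule “a new-day reading is only possible if it still fits before Session Close” are all assumptions made for illustration. Where the sketch silently accepts a plausible midnight rollover, a real tool might prompt the user instead.

```python
import re
from datetime import datetime, timedelta

# Assumed default Trillian-style line: "[hh:mm] Name: text"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\] ([^:]+): (.*)$")


def group_messages(lines, session_start, session_close):
    """Group raw log lines into messages (hypothetical helper).

    A line whose timestamp goes backwards is folded into the previous
    message as pasted text, unless reading it as "the next day" still
    fits before the session close.
    """
    messages = []
    current = session_start.replace(second=0, microsecond=0)
    for line in lines:
        m = LINE_RE.match(line)
        stamp = None
        if m and int(m.group(1)) < 24 and int(m.group(2)) < 60:
            stamp = current.replace(hour=int(m.group(1)), minute=int(m.group(2)))
            if stamp < current:
                # Backwards in time: only a midnight rollover could explain it.
                rollover = stamp + timedelta(days=1)
                stamp = rollover if rollover <= session_close else None
        if stamp is not None:
            messages.append({"time": stamp, "sender": m.group(3), "text": m.group(4)})
            current = stamp
        elif messages:
            # No usable timestamp: treat the line as part of the previous message.
            messages[-1]["text"] += "\n" + line
    return messages


log = [
    "[06:00] User2: Hello.",
    "[06:01] User: Hi.",
    "[06:01] User: Can you repeat back the beginning of this chat to me?",
    "[06:02] User2: OK. Now I will copy/paste the previous messages.",
    "[06:00] User2: Hello.",
    "[06:01] User: Hi.",
    "[06:01] User: Can you repeat back the beginning of this chat to me?",
    "[06:03] User: Thank you!",
]
start = datetime(2011, 11, 7, 6, 0, 0)
close = datetime(2011, 11, 7, 7, 0, 0)

for msg in group_messages(log, start, close):
    print(msg["time"].strftime("%H:%M"), msg["sender"], repr(msg["text"]))
```

On the example log, the three pasted lines end up inside User2’s 06:02 message, because reading [06:00] as the next day would land long after the 07:00 session close.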

Thanks to everyone who reported bugs and helped test. As always, please feel free to email me with any suggestions or concerns!


Filed under: Algorithms, IMmerge v1, Release
December 18th, 2011 01:10:51

IMmerge 1.03 – Smarter Display Name Resolution

March 10, 2011

Today, I’d like to announce the release of IMmerge 1.03. Several important bug fixes are included (thanks to all who reported them!), so I highly recommend you download the new version. Aside from that, this release mainly improves the accuracy and usability of the display name resolution. IMmerge now asks you for confirmation far less often than 1.02 did. In the rest of this post, I will share with you why display name resolution is needed in the first place, and how IMmerge solves the problem.

Names Without an Identity

Display name resolution is mainly needed when IMmerge converts from one log format to another. If you are simply merging one type of log and don’t need conversion, there’s no issue! IMmerge will copy over the information exactly, without caring which part is someone’s display name and which part is what they said. But when format conversion is needed, IMmerge needs to parse the log and understand it. Most plain-text formats like Trillian LOG, and even some XML formats like Windows Live Messenger (MSN), only log the person’s name next to their message, and not the user ID. However, other log formats might want to know who really sent the message (e.g. to colour code the messages, like Trillian Pro), and some formats (such as ICQ) only store whether a message is incoming or outgoing, and do not store the display name at all! So to do the conversion properly, we need a reliable way to match each display name to its user ID.

Back in 2007, I solved this problem for MSN using a heuristic algorithm that compared new names to previously seen names, with good success. When implementing Trillian LOG conversion in IMmerge 1.0, I re-used the same logic. Unfortunately, when run on real-world Trillian logs, it produced a lot of user prompts and many false positives. In IMmerge version 1.03, I have made several tweaks to the algorithm which improve its behaviour on Trillian logs.
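As a rough illustration of what “comparing new names to previously seen names” can look like, here is a small Python sketch. It is not the heuristic IMmerge uses, which this post does not spell out; it substitutes the standard library’s difflib.SequenceMatcher for the similarity measure, and the function name, the 0.6 threshold, and the example user IDs are all hypothetical.

```python
from difflib import SequenceMatcher


def resolve_display_name(name, seen, threshold=0.6):
    """Guess which user ID a display name belongs to (hypothetical helper).

    `seen` maps each user ID to the set of display names already
    attributed to it. Returns the best-matching user ID, or None when
    nothing is similar enough (a real tool might prompt the user here).
    """
    best_id, best_score = None, 0.0
    for user_id, names in seen.items():
        for known in names:
            score = SequenceMatcher(None, name.lower(), known.lower()).ratio()
            if score > best_score:
                best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None


# Example data (invented for illustration).
seen = {
    "alice@example.com": {"Alice", "Alice :)"},
    "bob@example.com": {"Bob"},
}

print(resolve_display_name("Alice (away)", seen))
print(resolve_display_name("Bob", seen))
print(resolve_display_name("Zed", seen))
```

A fixed threshold is exactly the kind of knob that trades prompts against false positives: set it too low and strangers get matched to known users, too high and every small name change triggers a question.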

The Problem with Plain Text

With Trillian plain-text logs, a single message looks like this:

Alice: Hi Bob!

MSN logs use an XML format, and it looks more like this (greatly simplified):


This takes a lot longer for a human to read, but the advantage of the structured XML format is that it delineates all the fields, so we are always sure which part is the display name and which part is the message. With Trillian logs, we need to guess based on context.
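The difference is easy to see in code. In the Python sketch below, the XML fragment is a hypothetical simplification (element and attribute names are assumptions, not the exact MSN schema), and the display name deliberately contains a colon to show where plain text becomes ambiguous.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XML message (not the exact MSN schema).
xml_msg = (
    '<Message><From><User FriendlyName="Alice: the great"/></From>'
    "<Text>Hi Bob!</Text></Message>"
)
root = ET.fromstring(xml_msg)
xml_name = root.find("From/User").get("FriendlyName")
xml_text = root.find("Text").text
print(xml_name, "|", xml_text)  # the fields are unambiguous

# The same message as a plain-text line: every ": " is a possible split point.
plain = "Alice: the great: Hi Bob!"
candidates = [
    (plain[:i], plain[i + 2:])
    for i in range(len(plain))
    if plain.startswith(": ", i)
]
for name, text in candidates:
    print(repr(name), "->", repr(text))
```

The XML version yields exactly one name and one message, while the plain-text line yields two equally valid splits, and only context (such as names seen earlier in the log) can tell which one is right.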


Filed under: Algorithms, IMmerge v1, Release
March 10th, 2011 05:30:10

IMmerge 1.02 released!

December 11, 2010

This version brings the new and improved CLI (command-line interface)! The CLI now gives you three ways to add folders: the legacy mode from IMmerge 0.66 (insert number of folders), the ability to specify each folder from a command-line argument, and an interactive menu system that will allow you to add, remove, or edit the folders before you merge. The CLI is now as powerful as the GUI.

New menu system

If you had difficulty with IMmerge asking too many questions about display names in 1.0, I have fixed the Resolve Display Names: Automatically option, which is now global. With it enabled, you won’t be able to correct IMmerge’s mistakes, so I suggest you only use it if you really can’t stand all the prompts. You can see how well it is doing in the log, whenever it prints “IMmerge thinks <name xyz> is <user>”. I’ll be working on better intelligence for the next version, a non-trivial task!

Get it from the downloads.


Filed under: IMmerge v1, Release
December 11th, 2010 14:20:43

IMmerge 1.01 Released

September 22, 2010

This is a quick maintenance release to fix a reported bug and a missing feature.

  • Skype added to folders recognized under Trillian logs
  • Fixed string index out of range in TrLogReader.parseMessage

Also, I am dropping the “beta #” naming scheme. Grab it from the downloads.


Filed under: IMmerge v1, Release
September 22nd, 2010 01:23:49

IMmerge 1.0 Beta 1 Released!

July 8, 2010

After a very long wait, I’m happy to release the first beta of IMmerge 1.0! This is the first public release with the new graphical user interface. If you have hesitated to use IMmerge in the past because of the command line, please give this one a try!

IMmerge window

This beta includes the following new features (for a complete list, see the Readme file):

  • New GUI and New engine
  • Trillian XML <-> LOG conversion
  • Nickname suggestions for those cryptic ICQ user IDs
  • Updated display name detection, with support for Trillian logs

This beta removes the following features, because there was not enough time to test them:

  • Command-line interface (CLI)

Most of the features above have been discussed in a previous post.

Nickname suggestion window

Don't recognize your friend's UIN? IMmerge makes it easy.

Please keep in mind this is a beta version, so I recommend you make a backup of any files before you replace them. It’s been tested quite a bit, but there will always be something that I didn’t catch. There are also some known issues — so if you are insistent about having perfect logs, then I suggest keeping a backup handy.

There is one major issue I know of where I would suggest waiting for a fixed version:

If you used versions of Trillian OLDER than 4.0 (including 0.74 through 3.1) in MORE THAN ONE timezone, and you run IMmerge on these logs, then you will get duplicates in your logs, and the duplicates will be in the wrong timezone. Logs from Trillian 4.0 beta (later than build 90, 2008-11-21), Trillian 4.0 FINAL and later are unaffected.

This is due to a bug in older versions of Trillian, where timezone information is not logged. Note that only those sessions that differ from the Timezone setting in IMmerge will be duplicated (and only duplicated once). If you are OK with that, then go ahead, but do make a backup.
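To see why a missing timezone matters, consider the Python sketch below. It only illustrates the general effect and is not a description of IMmerge’s internals: the helper name, the date, and the two offsets are invented, and the assumption is that a reader without logged timezone information must fall back on a single configured offset.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local, utc_offset_hours):
    """Interpret a timezone-less log time under an assumed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


# A time as written in an old log, with no timezone recorded.
logged = datetime(2008, 6, 1, 12, 0)

actual = to_utc(logged, -5)   # the session really took place at UTC-5
assumed = to_utc(logged, -8)  # the reader's Timezone setting says UTC-8

print(actual.isoformat())
print(assumed.isoformat())
print(assumed - actual)
```

The same logged time maps to two different instants three hours apart, so a session read with the wrong offset will not line up with a correctly timed copy of itself, and the two look like separate sessions.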

Finally, feel free to leave your feedback and suggestions here in the comments, in the forum or by email. Thanks go to all the people who have emailed me in the past; you have definitely helped fix a great many bugs!


Filed under: IMmerge v1, Release
July 08th, 2010 04:53:13