Today, I’d like to announce the release of IMmerge 1.03. Several important bug fixes are included (thanks to all who reported them!), so I highly recommend you download the new version. Aside from that, this release mainly improves the accuracy and usability of display name resolution: IMmerge now asks you for confirmation far less often than 1.02 did. In the rest of this post, I will explain why display name resolution is needed in the first place, and how IMmerge solves the problem.
Names Without an Identity
Display name resolution is mainly needed when IMmerge converts from one log format to another. If you are simply merging one type of log and don’t need conversion, there’s no issue! IMmerge will copy over the information exactly, without caring which part is someone’s display name and which part is what they said. But when format conversion is needed, IMmerge needs to parse the log and understand it. Most plain-text formats like Trillian LOG, and even some XML formats like Windows Live Messenger (MSN), only log the person’s name next to their message, and not the user ID. However, other log formats might want to know who really sent the message (e.g. to colour code the messages, like Trillian Pro), and some formats (such as ICQ) only store whether a message is incoming or outgoing, and do not store the display name at all! So to do the conversion properly, we need a reliable way to match each display name to its user ID.
Back in 2007, I solved this problem for MSN using a heuristic algorithm that compared new names to previously seen names, with good success. When implementing Trillian LOG conversion in IMmerge 1.0, I re-used the same logic. Unfortunately, when run on real-world Trillian logs, it produced a lot of user prompts and many false positives. In IMmerge 1.03, I have made several tweaks to the algorithm that improve its behaviour on Trillian logs.
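To give a rough feel for what "comparing new names to previously seen names" can mean, here is a minimal Python sketch. It is not IMmerge's actual algorithm: the function name resolve_display_name, the known mapping, the example addresses, and the 0.8 threshold are all made up for illustration, and the similarity measure is simply the one from Python's standard difflib module.

```python
from difflib import SequenceMatcher

def resolve_display_name(name, known_names, threshold=0.8):
    """Return the user ID whose previously seen display name is most
    similar to `name`, or None if nothing is similar enough.

    Hypothetical helper for illustration only; `threshold` is an
    arbitrary cut-off, not a value taken from IMmerge.
    """
    best_id, best_score = None, 0.0
    for seen_name, user_id in known_names.items():
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        score = SequenceMatcher(None, name.lower(), seen_name.lower()).ratio()
        if score > best_score:
            best_id, best_score = user_id, score
    # None means "not confident": the caller would ask the user to confirm.
    return best_id if best_score >= threshold else None

# Display names seen so far, mapped to (made-up) user IDs.
known = {"Alice": "alice@example.com", "Bob": "bob@example.com"}

print(resolve_display_name("Alice!", known))  # close to a known name
print(resolve_display_name("Zed", known))     # unlike any known name
```

In a sketch like this, the threshold is the trade-off mentioned above: set it too high and the user is prompted constantly, set it too low and unrelated names get matched to the wrong person.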
The Problem with Plain Text
With Trillian plain-text logs, a single message looks like this:
Alice: Hi Bob!
MSN logs use an XML format, and it looks more like this (greatly simplified):
<Message>
<From>Alice</From>
<To>Bob</To>
<Text>Hi!</Text>
</Message>
This takes a lot longer for a human to read, but the advantage of the structured XML format is that it delineates all the fields, so we are always sure which part is the display name and which part is the message. With Trillian logs, we need to guess based on context.
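The difference can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the simplified XML shown above rather than the real MSN schema, it assumes the name and message in a plain-text line are separated by ": " as in the example line, and the helper names parse_xml_message and candidate_splits are hypothetical.

```python
import xml.etree.ElementTree as ET

# The simplified XML message from above, on one line.
xml_log = "<Message><From>Alice</From><To>Bob</To><Text>Hi!</Text></Message>"

def parse_xml_message(xml_text):
    """Structured case: each field is looked up by its tag name."""
    msg = ET.fromstring(xml_text)
    return msg.findtext("From"), msg.findtext("Text")

def candidate_splits(line):
    """Plain-text case: return every (name, text) reading of the line,
    one for each occurrence of the assumed ': ' separator."""
    splits = []
    start = 0
    while True:
        idx = line.find(": ", start)
        if idx == -1:
            break
        splits.append((line[:idx], line[idx + 2:]))
        start = idx + 1
    return splits

print(parse_xml_message(xml_log))
print(candidate_splits("Alice: Hi Bob!"))
# A display name that itself contains ': ' gives more than one reading.
print(candidate_splits("Alice: brb: Hi Bob!"))
```

For the last line there are two candidates, ("Alice", "brb: Hi Bob!") and ("Alice: brb", "Hi Bob!"), and nothing in the line itself says which one is right. That is the guess the XML format never has to make.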