IMmerge
Trillian/ICQ/MSN Instant Messaging Log Merger by zAlbee

IMmerge 1.05 – Improved Timestamps, Group Chats

December 18, 2011

IMmerge 1.05 is now available for download. This release further improves the accuracy of parsing plain text logs, which is useful when converting to an XML format for Trillian’s Activity History Viewer. It also adds the option of specifying a custom timestamp, in case you don’t use the default [hh:mm] timestamp given by Trillian (or its variants with AM/PM and/or seconds). Lastly, it fixes several parsing issues with group chats and some other bugs.

Improved Message Recognition Using Timestamps

Say you are having a conversation with a buddy, and in that conversation, you copy/paste a transcript of a previous chat. If the chat is saved in a plain-text format, it may be difficult to tell the pasted lines apart from real messages. Here’s an example:

Session Start (User:User2): Mon Nov 07 06:00:00 2011
[06:00] User2: Hello.
[06:01] User: Hi.
[06:01] User: Can you repeat back the beginning of this chat to me?
[06:02] User2: OK. Now I will copy/paste the previous messages.
[06:00] User2: Hello.
[06:01] User: Hi.
[06:01] User: Can you repeat back the beginning of this chat to me?
[06:03] User: Thank you!
Session Close (User2): Mon Nov 07 07:00:00 2011

At first glance, this transcript appears to contain 8 messages, but there are actually only 5 true messages. User2 has copy/pasted the first 3 messages and included them all in the 4th message (spanning 4 lines). Even a human has to look closely to figure this out, so what hope does a machine have? Well, if the log is timestamped, we can infer a copy/paste happened because the chat seemingly went backwards in time between line 4 [06:02] and line 5 [06:00], which would be impossible. (Thanks to Stefan M. for pointing this out.) This logic is now built into IMmerge 1.05, and here is the result of applying it to the above text log:

[Image: Converted chat log]

As you can see, IMmerge correctly identifies which message was sent by which user at what time, including the part that was copy/pasted! Of course, this is not 100% foolproof. Sometimes a timestamp only appears to go “backwards” because the chat ran past midnight and we have moved on to a new day. Fortunately, IMmerge already handles this, but previous versions assumed a midnight rollover too often. Version 1.05 is much improved: it first eliminates all impossible options before making a decision, and prompts the user to choose only if the choice is still ambiguous.
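To make the idea concrete, here is a minimal Python sketch of the backwards-timestamp check. It is not IMmerge's actual code: the names `LINE_RE`, `group_messages` and `SAMPLE` are made up for illustration, it assumes the default [hh:mm] timestamp, it assumes the Session Start/Close lines have already been stripped, and it ignores the midnight case entirely by treating every backwards timestamp as pasted text.

```python
import re

# Assumed line shape: "[hh:mm] Name: text" (the default Trillian timestamp).
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\] ([^:]+): (.*)$")

SAMPLE = [
    "[06:00] User2: Hello.",
    "[06:01] User: Hi.",
    "[06:01] User: Can you repeat back the beginning of this chat to me?",
    "[06:02] User2: OK. Now I will copy/paste the previous messages.",
    "[06:00] User2: Hello.",
    "[06:01] User: Hi.",
    "[06:01] User: Can you repeat back the beginning of this chat to me?",
    "[06:03] User: Thank you!",
]


def group_messages(lines):
    """Group log lines into messages, folding pasted lines into their parent."""
    messages = []
    last_stamp = None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = (int(match.group(1)), int(match.group(2)))
            # A timestamp that does not go backwards starts a new message.
            if last_stamp is None or stamp >= last_stamp:
                messages.append(
                    {"time": stamp, "sender": match.group(3), "text": match.group(4)}
                )
                last_stamp = stamp
                continue
        # Otherwise the line has no timestamp or goes back in time, so it is
        # treated as a continuation of the previous message (if there is one).
        if messages:
            messages[-1]["text"] += "\n" + line
    return messages


if __name__ == "__main__":
    for msg in group_messages(SAMPLE):
        print(msg["time"], msg["sender"], repr(msg["text"]))
```

On the sample log this yields five messages, with the fourth one spanning four lines. A pasted line whose timestamp happens to be later than the current time would still slip through, which is one reason the check cannot be 100% foolproof.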

Thanks to everyone who reported bugs and helped test. As always, please feel free to email me with any suggestions or concerns!

-zAlbee


Filed under: Algorithms,IMmerge v1,Release | No Tag
December 18th, 2011 01:10:51

IMmerge 1.03 – Smarter Display Name Resolution

March 10, 2011

Today, I’d like to announce the release of IMmerge 1.03. Several important bug fixes are included (thanks to all who reported them!), so I highly recommend you download the new version. Aside from that, this release mainly improves the accuracy and usability of the display name resolution. IMmerge now asks you for confirmation far less often than 1.02 did. In the rest of this post, I will share with you why display name resolution is needed in the first place, and how IMmerge solves the problem.

Names Without an Identity

Display name resolution is mainly needed when IMmerge converts from one log format to another. If you are simply merging one type of log and don’t need conversion, there’s no issue! IMmerge will copy over the information exactly, without caring which part is someone’s display name and which part is what they said. But when format conversion is needed, IMmerge needs to parse the log and understand it. Most plain-text formats like Trillian LOG, and even some XML formats like Windows Live Messenger (MSN), only log the person’s name next to their message, and not the user ID. However, other log formats might want to know who really sent the message (e.g. to colour code the messages, like Trillian Pro), and some formats (such as ICQ) only store whether a message is incoming or outgoing, and do not store the display name at all! So to do the conversion properly, we need a reliable way to match each display name to its user ID.

Back in 2007, I solved this problem for MSN using a heuristic algorithm that compared new names to previously seen names, with good success. When implementing Trillian LOG conversion in IMmerge 1.0, I re-used the same logic. Unfortunately, when run on real-world Trillian logs, it produced a lot of user prompts and many false positives. In IMmerge version 1.03, I have made several tweaks to the algorithm that improve its behaviour on Trillian logs.

The Problem with Plain Text

With Trillian plain-text logs, a single message looks like this:

Alice: Hi Bob!

MSN logs use an XML format, and it looks more like this (greatly simplified):

<Message>
 <From>Alice</From>
 <To>Bob</To>
 <Text>Hi!</Text>
</Message>

This takes a lot longer for a human to read, but the advantage of the structured XML format is that it delineates all the fields, so we are always sure which part is the display name and which part is the message. With Trillian logs, we need to guess based on context.
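A small Python sketch shows the difference. The function names `parse_plain` and `parse_xml` are hypothetical, the XML element names are taken from the simplified example above rather than the real MSN schema, and splitting on the first ": " is just one possible guess, not the rule IMmerge uses.

```python
import xml.etree.ElementTree as ET


def parse_plain(line):
    """Guess (name, text) by splitting at the first ': '; None if there is none."""
    name, sep, text = line.partition(": ")
    if not sep:
        return None
    return name, text


def parse_xml(fragment):
    """Read the sender and text directly from the delimited fields."""
    message = ET.fromstring(fragment)
    return message.findtext("From"), message.findtext("Text")


if __name__ == "__main__":
    print(parse_plain("Alice: Hi Bob!"))
    # If the display name itself contains ': ', the guess lands in the wrong place.
    print(parse_plain("Alice: the Great: Hi Bob!"))
    print(parse_xml("<Message><From>Alice</From><To>Bob</To><Text>Hi!</Text></Message>"))
```

If the buddy's display name really is "Alice: the Great", the plain-text guess returns "Alice" as the name and pushes the rest into the message, while the XML version has no such ambiguity.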

(more…)


Filed under: Algorithms,IMmerge v1,Release | No Tag
March 10th, 2011 05:30:10

Solving Continuity Levels with Dijkstra’s Shortest Path

April 16, 2010

This post is probably off-topic for this blog, but it is the only blog I use, so enjoy. (By the way, the WordPress editor still greatly annoys me and so does this theme’s style. I’m going to have to fix the CSS eventually.)

Continuity is a new Flash game that I was playing recently. It combines platforming and puzzle-solving in a novel way – your character must jump from platform to platform collecting keys to open a door, but must do so inside mini-levels (squares). The macro-game is essentially an 8-puzzle (8 pieces in a 3×3 square grid), where you shift the puzzle pieces using the one empty space, such that your character can succeed in the mini-levels.

This game is perfectly solvable through logic and working backwards, though for some of the harder levels you need to keep a long sequence of steps in memory. In short, I became bored and impatient with solving the puzzles in my head, and decided after ~30 levels that it’d be more interesting to automate it by writing a solver.

I also took this opportunity to get familiar with Python, a language well known for fast prototyping and a gentle learning curve. I had only worked in Python once, during a 4-month internship 4 years ago; otherwise, my main skills are in C and Java. However, the syntax required to write good algorithms and data structures in these “classic” languages can get quite long and annoying. So I coded up a Python script to solve the final two levels.

The rest of this post details how to develop the solution. The solution is relatively simple because we can just treat each exit in a square as a point in a directed graph, where each edge A->B specifies that point B is reachable from point A, and then apply a search algorithm through the graph. I chose Dijkstra’s shortest path algorithm, which runs fast and finds the best (shortest) solution. However, my program did require a lot of manual data entry to specify the paths/exits in each puzzle piece, so it wasn’t fully automatic. Perhaps Python has some magic that could automate this :). Lastly, I skipped solving the 8-puzzle, since it’s not necessary.
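As a rough illustration of the graph idea, here is a minimal Dijkstra sketch in Python using the standard `heapq` module. It is not the original script: the node names ("start", "A", "B", "key", "door") and the edge costs are invented for the example, and the post does not say what weights the real edges carried.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (cost, path) or None if goal is unreachable."""
    queue = [(0, start, [start])]  # entries are (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical directed graph: an edge A -> B means B is reachable from A.
GRAPH = {
    "start": [("A", 1), ("B", 4)],
    "A": [("B", 1), ("key", 5)],
    "B": [("key", 1)],
    "key": [("door", 1)],
}

if __name__ == "__main__":
    print(shortest_path(GRAPH, "start", "door"))
```

With these made-up costs, the cheapest route from "start" to "door" goes through A, B and the key for a total cost of 4. If every edge had the same cost, a plain breadth-first search would find the same answer.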

(more…)


Filed under: Algorithms,Games,Puzzles | No Tag
April 16th, 2010 11:46:32