IMmerge
Trillian/ICQ/MSN Instant Messaging Log Merger by zAlbee

Pidgin Support Coming Soon

January 22, 2012

I started work on the Pidgin plugin for IMmerge. So far, I have been able to convert Pidgin text logs to Trillian successfully.

Here is the completion progress for the various parts:

  • Pidgin Txt Reader: 95%
  • Pidgin Txt Writer: 0%
  • Pidgin HTML Reader: 0%
  • Pidgin HTML Writer: 0%

How you can help:

While parsing is working on my own set of logs, there are certain cases I still don’t know about. You can help by providing me this information:

  1. Folder names for different protocols. So far I know about:
    • “icq”
    • “msn”
    • “jabber” (used for Google Talk as well)
    • “yahoo”
  2. Localization – Does Pidgin always log these strings in English, or does it translate them? If your system is using another language, simply search inside your logs to see if these have been translated or not.
    • <Name> entered the room.
    • <Name> left the room.
    • Offering to send <filename> to <Name>
    • <Name> is offering to send file <filename>
    • Starting transfer of <filename> from <Name>
    • Transfer of file <filename> complete
    • <UserID> is now known as <Name>
  3. Is it possible to customize (or remove) the logging timestamps? If so, in what way?

If you can answer any of these, just leave a comment or send me an email. Thanks!

-zAlbee

 


Filed under: IMmerge v1 | No Tag
January 22nd, 2012 12:54:18

IMmerge 1.05 – Improved Timestamps, Group Chats

December 18, 2011

IMmerge 1.05 is now available for download. This release further improves the accuracy of parsing plain text logs, which is useful when converting to an XML format for Trillian’s Activity History Viewer. It also adds the option of specifying a custom timestamp, in case you don’t use the default [hh:mm] timestamp given by Trillian (or its variants with AM/PM and/or seconds). Lastly, it fixes several parsing issues with group chats and some other bugs.
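To show what "trying several timestamp variants" can look like, here is a minimal Java sketch (Java because IMmerge itself is written in Java). The exact strings Trillian writes for the AM/PM and seconds variants are assumptions, and so is the `customPattern` parameter. The parameter uses `java.time` pattern syntax and is not IMmerge's real option format.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class TimestampParser {
    // Assumed renderings of the default [hh:mm] timestamp and its variants;
    // the exact strings Trillian writes (e.g. spacing before AM/PM) may differ.
    private static final List<String> DEFAULT_PATTERNS =
            List.of("HH:mm", "HH:mm:ss", "h:mm a", "h:mm:ss a");

    // Parses "[...]" into a LocalTime. The hypothetical customPattern (valid
    // java.time pattern syntax, or null) is tried before the defaults.
    static Optional<LocalTime> parse(String bracketed, String customPattern) {
        if (bracketed.length() < 2 || !bracketed.startsWith("[") || !bracketed.endsWith("]")) {
            return Optional.empty();
        }
        // Strip the brackets: in java.time patterns "[" and "]" mean optional sections.
        String inner = bracketed.substring(1, bracketed.length() - 1);
        List<String> patterns = new ArrayList<>();
        if (customPattern != null) {
            patterns.add(customPattern);
        }
        patterns.addAll(DEFAULT_PATTERNS);
        for (String pattern : patterns) {
            try {
                DateTimeFormatter f = DateTimeFormatter.ofPattern(pattern, Locale.US);
                return Optional.of(LocalTime.parse(inner, f));
            } catch (DateTimeParseException e) {
                // Not this variant; try the next pattern.
            }
        }
        return Optional.empty();
    }
}
```

Each pattern must consume the whole timestamp, so `[06:00:15]` fails `HH:mm` and falls through to `HH:mm:ss`.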

Improved Message Recognition Using Timestamps

Say you are having a conversation with a buddy, and during that conversation, someone copy/pastes a transcript of a previous chat. If the chat is saved in a plain-text format, it can be difficult to tell the pasted lines apart from the real messages. Here’s an example:

Session Start (User:User2): Mon Nov 07 06:00:00 2011
[06:00] User2: Hello.
[06:01] User: Hi.
[06:01] User: Can you repeat back the beginning of this chat to me?
[06:02] User2: OK. Now I will copy/paste the previous messages.
[06:00] User2: Hello.
[06:01] User: Hi.
[06:01] User: Can you repeat back the beginning of this chat to me?
[06:03] User: Thank you!
Session Close (User2): Mon Nov 07 07:00:00 2011

At first glance, this transcript appears to contain 8 messages, but there are actually only 5 true messages. User2 has copy/pasted the first 3 messages into the 4th message, which therefore spans 4 lines. Even a human has to look closely to figure this out, so what hope does a machine have? Well, if the log is timestamped, we can infer that a copy/paste happened. The chat seemingly went backwards in time between the 4th line [06:02] and the 5th line [06:00], which would be impossible. (Thanks to Stefan M. for pointing this out.) This logic is now built into IMmerge 1.05, and here is the result of applying it to the above text log:

Converted chat log

As you can see, IMmerge correctly identifies which message was sent by which user at what time, including the part that was copy/pasted! Of course, this is not 100% foolproof. In some valid cases, a timestamp only appears to go “backwards” because the chat went past midnight and moved on to a new day. IMmerge already handled this case, but previous versions assumed a new day far too often. Version 1.05 is much improved. It first eliminates all impossible interpretations. It then either makes a decision or, if the choice is still ambiguous, prompts the user to choose.
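The two rules above can be sketched in Java as follows. This is a minimal illustration, not IMmerge's code. It assumes the Session Start/Close lines have already been stripped and handles only 24-hour `[hh:mm]` stamps. The rule for eliminating the "new day" reading (it must not land after the session's close time) is a hypothetical example of an "impossible option".

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackwardsTimestamps {
    // "[hh:mm] Name: text" only; Session Start/Close lines are assumed stripped.
    private static final Pattern LINE =
            Pattern.compile("^\\[(\\d{2}):(\\d{2})\\] ([^:]+): (.*)$");

    static final class Message {
        final int minute;          // minutes since midnight
        final String sender;
        final StringBuilder text;  // may grow to several lines
        Message(int minute, String sender, String text) {
            this.minute = minute;
            this.sender = sender;
            this.text = new StringBuilder(text);
        }
    }

    // A line stamped EARLIER than the last real message cannot start a new
    // message, so it is appended to that message as pasted text. Lines without
    // a timestamp are treated as continuations too. Midnight is ignored here.
    static List<Message> group(List<String> lines) {
        List<Message> out = new ArrayList<>();
        for (String line : lines) {
            Message last = out.isEmpty() ? null : out.get(out.size() - 1);
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                int minute = Integer.parseInt(m.group(1)) * 60 + Integer.parseInt(m.group(2));
                if (last == null || minute >= last.minute) {
                    out.add(new Message(minute, m.group(3), m.group(4)));
                    continue;
                }
            }
            if (last != null) {
                last.text.append('\n').append(line);  // continuation or pasted line
            }
        }
        return out;
    }

    enum Reading { PASTED_TEXT, NEXT_DAY }

    // Lists the readings of a backwards timestamp that are still possible.
    // Illustrative elimination rule: NEXT_DAY is impossible if it would place
    // the message after the "Session Close" time from the log.
    static List<Reading> possibleReadings(LocalDateTime previous, LocalTime current,
                                          LocalDateTime sessionClose) {
        List<Reading> options = new ArrayList<>();
        options.add(Reading.PASTED_TEXT);
        LocalDateTime nextDay = previous.toLocalDate().plusDays(1).atTime(current);
        if (!nextDay.isAfter(sessionClose)) {
            options.add(Reading.NEXT_DAY);
        }
        return options;  // one option: decide automatically; two: ask the user
    }
}
```

On the sample log, `group` yields 5 messages. The 4th message spans 4 lines. Because that session closes at 07:00 on the same day, `possibleReadings` leaves only `PASTED_TEXT` for the jump from [06:02] back to [06:00].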

Thanks to all the people who reported bugs and helped test. As always, please feel free to email me with any suggestions or concerns!

-zAlbee


Filed under: Algorithms,IMmerge v1,Release | No Tag
December 18th, 2011 01:10:51

IMmerge 1.03 – Smarter Display Name Resolution

March 10, 2011

Today, I’d like to announce the release of IMmerge 1.03. Several important bug fixes are included (thanks to all who reported them!), so I highly recommend you download the new version. Aside from that, this release mainly improves the accuracy and usability of display name resolution. IMmerge now asks you for confirmation far less often than version 1.02 did. In the rest of this post, I will explain why display name resolution is needed in the first place and how IMmerge solves the problem.

Names Without an Identity

Display name resolution is mainly needed when IMmerge converts from one log format to another. If you are simply merging one type of log and don’t need conversion, there’s no issue! IMmerge will copy over the information exactly, without caring which part is someone’s display name and which part is what they said. But when format conversion is needed, IMmerge needs to parse the log and understand it. Most plain-text formats like Trillian LOG, and even some XML formats like Windows Live Messenger (MSN), only log the person’s name next to their message, and not the user ID. However, other log formats might want to know who really sent the message (e.g. to colour code the messages, like Trillian Pro), and some formats (such as ICQ) only store whether a message is incoming or outgoing, and do not store the display name at all! So to do the conversion properly, we need a reliable way to match each display name to its user ID.

Back in 2007, I solved this problem for MSN with good success, using a heuristic algorithm that compared new names to previously seen names. When implementing Trillian LOG conversion in IMmerge 1.0, I re-used the same logic. Unfortunately, when run on real-world Trillian logs, it produced a lot of user prompts and many false positives. In IMmerge 1.03, I have made several tweaks to the algorithm that improve its behaviour on Trillian logs.
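The post doesn't spell out the heuristic itself. As a deliberately simple stand-in, not IMmerge's algorithm, here is a Java sketch of "compare new names to previously seen names". It ignores case and punctuation when comparing names, answers only when exactly one candidate ID has used a matching name before, and otherwise signals that the user should be prompted. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class NameResolver {
    // Normalized display names already confirmed for each user ID.
    private final Map<String, Set<String>> seen = new HashMap<>();

    // Records that userId has appeared under displayName (e.g. after a user prompt).
    void confirm(String userId, String displayName) {
        seen.computeIfAbsent(userId, k -> new HashSet<>()).add(normalize(displayName));
    }

    // Returns the single candidate that has used a matching name before, or
    // empty if none or several match (the caller would then prompt the user).
    Optional<String> resolve(String displayName, Set<String> candidates) {
        String key = normalize(displayName);
        String match = null;
        for (String id : candidates) {
            if (seen.getOrDefault(id, Set.of()).contains(key)) {
                if (match != null) {
                    return Optional.empty();  // ambiguous
                }
                match = id;
            }
        }
        return Optional.ofNullable(match);
    }

    // Ignores case and any character that is not a letter or digit, so
    // "Alice :)" and "alice" compare equal.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }
}
```

In a one-on-one session, `candidates` would be the two participants' IDs. Refusing to guess when both have used the name trades a few extra prompts for fewer false positives.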

The Problem with Plain Text

With Trillian plain-text logs, a single message looks like this:

Alice: Hi Bob!

MSN logs use an XML format, and it looks more like this (greatly simplified):

<Message>
 <From>Alice</From>
 <To>Bob</To>
 <Text>Hi!</Text>
</Message>

This takes a lot longer for a human to read, but the advantage of the structured XML format is that it delineates all the fields, so we are always sure which part is the display name and which part is the message. With Trillian logs, we need to guess based on context.
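To make "guess based on context" concrete, here is a minimal Java sketch assuming the only context available is a set of display names seen so far. A line such as `Bob: at work: back soon` can be cut at either `": "`. The sketch prefers a cut whose name part is already known and otherwise falls back to the first separator. The names and the fallback rule are illustrative, not IMmerge's actual parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PlainTextSplitter {
    // Every way to cut a line into (name, text) at an occurrence of ": ".
    static List<String[]> candidates(String line) {
        List<String[]> out = new ArrayList<>();
        for (int i = line.indexOf(": "); i >= 0; i = line.indexOf(": ", i + 1)) {
            if (i > 0) {  // skip an empty name
                out.add(new String[] { line.substring(0, i), line.substring(i + 2) });
            }
        }
        return out;
    }

    // Prefers the cut whose name part is a known display name; otherwise falls
    // back to the first ": ". Returns null when the line has no usable ": ".
    static String[] split(String line, Set<String> knownNames) {
        List<String[]> all = candidates(line);
        for (String[] c : all) {
            if (knownNames.contains(c[0])) {
                return c;
            }
        }
        return all.isEmpty() ? null : all.get(0);
    }
}
```

With no known names, `Alice: Hi Bob!` splits into `Alice` and `Hi Bob!`. If `Bob: at work` is a known display name, the second line above splits after it rather than after `Bob`.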



Filed under: Algorithms,IMmerge v1,Release | No Tag
March 10th, 2011 05:30:10

IMmerge 1.02 released!

December 11, 2010

This version brings the new and improved CLI (command-line interface)! The CLI now gives you three ways to add folders: the legacy mode from IMmerge 0.66 (enter the number of folders), specifying each folder as a command-line argument, and an interactive menu system that lets you add, remove, or edit the folders before you merge. The CLI is now as powerful as the GUI.

New menu system
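As a rough illustration of combining folders given as arguments with an interactive menu, here is a minimal Java sketch. The command words (`add`, `remove`, `edit`, `merge`) and their syntax are hypothetical, not IMmerge's actual menu. The legacy "number of folders" mode is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class FolderMenu {
    // Starts from folders given as command-line arguments, then reads menu
    // commands until "merge". Command syntax here is hypothetical:
    //   add <path> | remove <n> | edit <n> <path> | merge   (n is 1-based)
    static List<String> run(Scanner in, List<String> fromArgs) {
        List<String> folders = new ArrayList<>(fromArgs);
        while (in.hasNextLine()) {
            String[] parts = in.nextLine().trim().split("\\s+", 2);
            String cmd = parts[0];
            String rest = parts.length > 1 ? parts[1] : "";
            try {
                if (cmd.equals("merge")) {
                    return folders;
                } else if (cmd.equals("add") && !rest.isEmpty()) {
                    folders.add(rest);  // paths may contain spaces
                } else if (cmd.equals("remove")) {
                    folders.remove(Integer.parseInt(rest) - 1);
                } else if (cmd.equals("edit")) {
                    String[] arg = rest.split("\\s+", 2);
                    folders.set(Integer.parseInt(arg[0]) - 1, arg[1]);
                } else {
                    System.out.println("Unknown or incomplete command: " + cmd);
                }
            } catch (NumberFormatException | IndexOutOfBoundsException e) {
                System.out.println("Invalid folder number or missing path");
            }
        }
        return folders;  // input ended without "merge"
    }
}
```

In a real program, `in` would be `new Scanner(System.in)` and `fromArgs` would be `List.of(args)`. Passing a `Scanner` makes the menu easy to drive from a script or a test.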

If IMmerge 1.0 asked you too many questions about display names, I have fixed the Resolve Display Names: Automatically option, which is now a global setting. With it enabled, you won’t be able to correct IMmerge’s mistakes, so I suggest you only use it if you really can’t stand all the prompts. You can see how well it is doing in the log, whenever it prints “IMmerge thinks <name xyz> is <user>”. I’ll be working on better intelligence for the next version, a non-trivial task!

Get it from the downloads.

-zAlbee


Filed under: IMmerge v1,Release | No Tag
December 11th, 2010 14:20:43

IMmerge 1.01 Released

September 22, 2010

This is a quick maintenance release to fix a reported bug and a missing feature.

  • Skype added to folders recognized under Trillian logs
  • Fixed string index out of range in TrLogReader.parseMessage

Also, I am dropping the “beta #” naming scheme. Grab it from the downloads.

-zAlbee


Filed under: IMmerge v1,Release | No Tag
September 22nd, 2010 01:23:49

IMmerge 1.0 Beta 1 Released!

July 8, 2010

After a very long wait, I’m happy to release the first beta of IMmerge 1.0! This is the first public release with the new graphical user interface. If you have hesitated to use IMmerge in the past because of the command line, please give this one a try!

IMmerge window

This beta includes the following new features (for a complete list, see the Readme file):

  • New GUI and new engine
  • Trillian XML <-> LOG conversion
  • Nickname suggestions for those cryptic ICQ user IDs
  • Updated display name detection, with support for Trillian logs

This beta removes the following feature, because there was not enough time to test it:

  • Command-line interface (CLI)

Most of the features above have been discussed in a previous post.

Nickname suggestion window

Don't recognize your friend's UIN? IMmerge makes it easy.

Please keep in mind this is a beta version, so I recommend you make a backup of any files before you replace them. It’s been tested quite a bit, but there will always be something I didn’t catch. There are also some known issues, so if you insist on having perfect logs, keep that backup handy.

There is one major known issue for which I suggest waiting for a fixed version:

If you used versions of Trillian OLDER than 4.0 (including 0.74 through 3.1) in MORE THAN ONE timezone and you run IMmerge on these logs, you will get duplicate sessions in your logs, placed in the wrong timezone. Logs from Trillian 4.0 beta (later than build 90, 2008-11-21), Trillian 4.0 FINAL, and later are unaffected.

This is due to a bug in older versions of Trillian, where timezone information is not logged. Note that only those sessions that differ from the Timezone setting in IMmerge will be duplicated (and only duplicated once). If you are OK with that, then go ahead, but do make a backup.
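Here is a small Java illustration of why missing timezone information matters. It assumes that sessions are matched by their absolute start time, which this post does not say about IMmerge. A zone-less log records only the wall-clock time, so the merger has to assume a zone, standing in here for the Timezone setting. If the session really happened elsewhere, the assumed instant is off by the difference between the two zones, and the two copies no longer look like the same session.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class ZoneAssumption {
    // A zone-less log records only wall-clock time, so any comparison with
    // other logs must assume a zone (a stand-in for IMmerge's Timezone setting).
    static Instant toInstant(LocalDateTime wallClock, ZoneId assumedZone) {
        return wallClock.atZone(assumedZone).toInstant();
    }

    // How far apart two readings of the same session end up.
    static long hoursApart(Instant a, Instant b) {
        return Duration.between(a, b).abs().toHours();
    }
}
```

A session logged at 10:00 in Vancouver but read with a Toronto setting lands 3 hours early. A session whose zone matches the setting maps to the same instant, which fits the note that only sessions differing from the Timezone setting are affected.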

Finally, feel free to leave your feedback and suggestions here in the comments, in the forum, or by email. Thanks go to all the people who have emailed me in the past; you have helped me fix a great many bugs!

-Albert


Filed under: IMmerge v1,Release | No Tag
July 08th, 2010 04:53:13

Project Status

June 17, 2010

This week, I took a vacation for the first time in 6 months, and coincidentally this will also be my first blog post about IMmerge in 6 months. During some downtime on the plane, I decided to whip out my laptop and figure out what is needed to get IMmerge finished. There’s nothing quite like breaking your daily routine to get you kickstarted on some new (or old) projects!

Interesting stats:

Last sync to IMmerge source code repository: Sun Mar 21 16:30:26 CST 2010
Last IMmerge-related blog post: January 11, 2010

So here is the current status. IMmerge 1.0 uses a new merge engine, so much of the code had to be rewritten. I am satisfied that all the features from 0.66 work in the new 1.0 engine. Next, I will release a public beta of IMmerge 1.0, which will include the GUI and Trillian LOG-XML interconversion, but no other new features. Before I release it, the following issues must be fixed:

  • TrLogReader: multiple blank lines are collapsed into one
  • TrLogReader: fails to parse message timestamp containing seconds
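As a generic illustration of how the first bug can arise, here is a short Java comparison. This is not necessarily the cause in TrLogReader. Splitting text on a run of newlines silently merges consecutive blank lines, while splitting on a single newline with a negative limit keeps them.

```java
import java.util.Arrays;
import java.util.List;

class BlankLines {
    // Splitting on a run of newlines treats "\n\n\n" as one separator,
    // so the blank lines in between disappear.
    static List<String> lossy(String text) {
        return Arrays.asList(text.split("\n+"));
    }

    // Splitting on a single newline with limit -1 keeps every empty line,
    // including trailing ones.
    static List<String> preserving(String text) {
        return Arrays.asList(text.split("\n", -1));
    }
}
```

For `"a\n\n\nb"`, the first method returns `[a, b]` and the second returns `[a, , , b]`. Windows `\r\n` line endings would need extra handling in either case.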

The following feature needs to be implemented:

  • TrLogReader: does not distinguish between incoming/outgoing messages yet

The first 2 issues can be fixed quickly, while the 3rd (a feature) will take some time. Note that all of these deal with TrLogReader parsing, which is only used when a session in a Trillian .LOG file needs to be converted into another format (currently only XML is possible).
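One possible approach to the incoming/outgoing feature, sketched in Java under stated assumptions, is to read the participants from the session header. The sketch assumes the header lists the local account first, as the sample log in the IMmerge 1.05 post suggests (`Session Start (User:User2)`). A message then counts as outgoing when its sender matches the local ID or one of the local user's known display names. All names here are hypothetical.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DirectionGuesser {
    enum Direction { INCOMING, OUTGOING }

    // Assumes the header names the local account first, as in
    // "Session Start (User:User2): Mon Nov 07 06:00:00 2011".
    private static final Pattern HEADER =
            Pattern.compile("^Session Start \\(([^:]+):([^)]+)\\):");

    // Returns {localId, remoteId}, or null if the line is not a session header.
    static String[] participants(String header) {
        Matcher m = HEADER.matcher(header);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // Outgoing if the sender is the local ID itself or one of the local user's
    // known display names (e.g. names confirmed during display name resolution).
    static Direction direction(String sender, String localId, Set<String> localNames) {
        boolean mine = sender.equals(localId) || localNames.contains(sender);
        return mine ? Direction.OUTGOING : Direction.INCOMING;
    }
}
```

This breaks down when the buddy uses the same display name as the local user. That is one reason the feature ties into display name resolution and takes longer than the two parsing fixes.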

Question: Would you rather wait until the incoming/outgoing feature is finished, or is it acceptable to skip it for the first beta?


Filed under: IMmerge v1 | No Tag
June 17th, 2010 23:08:55

New Features in IMmerge 1.0

January 11, 2010

Here is a summary of new features to expect in the next major version of IMmerge, 1.0.

Graphical User Interface

The most obvious change from the user’s perspective is the introduction of a GUI at last! No more moving and renaming your log folders: you can now select them with your mouse. It is written in Java Swing and will use the native look-and-feel of your OS where available. (I suggest you keep Java up-to-date; many GUI improvements were made in Java 6.) It is also threaded properly, so if the merge operation errors or hangs (*knocks on wood*), it won’t take down the entire program. As before, a log of IMmerge’s actions will be available.

Screenshots: main window, log window, options window
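The two Swing points above, the native look-and-feel and keeping the merge off the UI thread, can be sketched with standard Swing APIs. `UIManager.getSystemLookAndFeelClassName()` picks the OS look-and-feel, and `SwingWorker` runs the work on a background thread. The merge task itself is a hypothetical `Callable`; this is not IMmerge's code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import javax.swing.SwingWorker;
import javax.swing.UIManager;

class MergeRunner {
    // Use the OS look-and-feel where available; otherwise keep Swing's default.
    static void useNativeLookAndFeel() {
        try {
            UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
        } catch (Exception e) {
            // Native look-and-feel unavailable: fall back to the default.
        }
    }

    // Runs a (hypothetical) merge task on a background thread, so an exception
    // or a hang inside the merge does not freeze the Event Dispatch Thread.
    static SwingWorker<String, Void> startMerge(Callable<String> mergeTask) {
        SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
            @Override
            protected String doInBackground() throws Exception {
                return mergeTask.call();
            }

            @Override
            protected void done() {  // runs on the Event Dispatch Thread
                try {
                    System.out.println("Merge finished: " + get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    System.out.println("Merge failed: " + e.getCause());
                }
            }
        };
        worker.execute();
        return worker;
    }
}
```

An exception thrown by the task is not raised on the UI thread. It is caught by the worker and surfaces in `done()` as an `ExecutionException`, so the window stays usable and can report the failure in its log.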

New Command-Line Interfaces



Filed under: IMmerge v1 | No Tag
January 11th, 2010 12:11:25

Teaser for IMmerge 1.0

November 1, 2009

Now for the real reason to start a blog… Here is a taste of things to come!

To answer some quick questions: It will still be in Java and hence still be cross-platform, but with native look-and-feel (pictured on Vista). Also, this is a working version, not a mock-up.

-zAlbee


Filed under: IMmerge v1 | No Tag
November 01st, 2009 02:45:40

Welcome to the blog!

October 31, 2009

IMmerge is a program that I wrote in my spare time — a small command-line utility for merging IM logs (mainly Trillian). This could have easily been a throw-away program to forget about (the source code for the currently released version is shamefully unmodular!), but over the years I have maintained, supported, and added features to it. Unfortunately, I had left the website in this simplistic state.

It is time to update the website, which has been decidedly Web 1.0 for too long now. I still believe that having static webpages is the best way to provide information to new users, and I will continue to support this. However, the user input available through blogs and forums is too valuable to continue missing out on. So here we go — a blog to provide news and updates on ongoing development, and hopefully gain some user insight.

w00t!

–Albert Choi (zAlbee)
Program Author


Filed under: IMmerge,Site | No Tag
October 31st, 2009 03:46:11