zAlbee's IMmerge
Trillian/ICQ/MSN Instant Messaging Log Merger

by: zAlbee
email: zalbee -AT- gmail -DOT- com
site: http://zalbee.intricus.net/immerge

version 1.0 beta 1 (2010.07.08)
=============================================================

What it does:
-------------
Do you use Trillian on multiple computers to chat? IMmerge can synchronize
your chat logs together, in chronological order. It can even merge in
certain logs from the original clients (ICQ Pro/2003b, ICQ Lite/5.1,
MSN 7.5/Windows Live Messenger). All merged logs come out as
Trillian-format.

New in 1.0: GUI, XML <-> LOG conversion, new engine, and more!

IMmerge supports:
- merging classic text logs (.LOG) from all versions of Trillian
  0.74 - 4.1, Basic and Pro
- merging new XML logs (.XML) from Trillian 3.0+ Basic and Pro. This means
  you can continue to view your history with the Trillian 3.0+ Pro history
  viewer.
- converting ICQ Pro (98-2003b) logs into Trillian LOG and XML,
  automatically during merging
- converting ICQ Lite (4.0-5.1) logs into Trillian LOG and XML,
  automatically during merging
- converting MSN (7.5) and WLM (8.0+) logs into Trillian LOG and XML,
  automatically during merging

Notes for this beta:
This is a pre-release version of IMmerge 1.0! This Readme may be
incomplete. It includes GUI only. Final version will have these new
features:
- Three user interfaces:
  1. New graphical user interface (GUI)
  2. New command-line arguments for automation
  3. Old interactive mode is still supported - though order of prompts may
     have changed
- New, revamped merge pipeline and framework:
  - Pipeline is now: read -> organize -> merge -> write/convert
  - Select any number of input or output folders, and the merge will be
    done in 1 pass (assuming similar log organization)
  - Internal priority system prefers logs of the same format, so that
    conversion is only done when necessary
  - New pipeline will make it easier to support new formats in the future
    (not finalized)
  - Options specified per-format or per-folder with inheritance

Format support in 1.0:

Format              Read  Write
------              ----  -----
Trillian Log        Y     Y
Trillian XML        Y     Y
ICQ Lite 4.0-5.1    Y     -
ICQ Pro 98-2003     Y*    -
MSN/Windows Live    Y     -

*via ICQ History Reader

What it doesn't do:
-------------------
It doesn't compare the message text itself. It only compares session
timestamps.

It doesn't re-sort the messages. If for some reason your messages are out
of order, they will stay in that order. (It is fine if two logs overlap, as
long as each individual log is in correct chronological order.)

Does not support logs from any version of MSN Messenger for Mac or
Microsoft Messenger for Mac.

"Probably" does not support ICQ 6.0 or later logs - you can try.

Requirements:
-------------
Java - http://www.java.com
  - version 6 (build 1.6) or greater

If you plan to merge ICQ Pro logs:
- ICQ History Reader
  - converts ICQ 98/99/2000/2001/2002 databases to HTML format
  - this is not necessary for ICQ Lite
  - http://hitu.host.sk/icqhr.php
    or http://www.headstrong.de/cgi-bin/download.cgi?icqhr

How to install:
---------------
Simply extract the contents of the ZIP file into a folder of your choice.
For example, "C:\immerge". Make sure to extract with original paths.

How to run:
-----------
This section is being written.

Technical Notes:
----------------
- Currently, the program merges Trillian logs found in the following
  folders:
  - AIM
  - ICQ
  - IRC
  - MSN
  - YAHOO
  - ASTRA
  - GOOGLE
  - JABBER
  - FACEBOOK
  - MYSPACE
  - XFIRE
  - SAMETIME
- Note that ICQ Pro/98-2003b and ICQ Lite/4.0+ single messages are
  transformed into Trillian-format sessions by IMmerge. The way it works is
  all messages within 24 hours of the first message are put in the same
  session. Then a new session is started, and the process is repeated.
- When using ICQHR to convert your history, you should leave the ICQHR
  TimeZone setting on default, otherwise it may omit large portions of your
  history.
- [this is improved in 1.0] The conversion for MSN to Trillian LOG/XML
  converts all chat messages, but does not convert information-type
  messages (such as User sends wink, File Transfer
  Requested/Received/Sent, User Joins/Leaves Conversation). If this is
  important to you, you can wait for a newer version of IMmerge that
  supports this, and then re-merge the original MSN logs (they will have
  priority).
- The display name detection algorithm scores the similarity of two display
  names by the longest common substring with penalty for matches further
  from the beginning. If you are interested in helping me refine this
  metric, contact me.
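
The 24-hour session rule for ICQ messages described in the notes above can
be sketched roughly as follows. This is only an illustration, not IMmerge's
actual code: the class name SessionGrouper, the method group, and the use
of plain millisecond timestamps are all hypothetical, and it assumes that a
message exactly 24 hours after the first one already starts a new session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and boundary handling are assumptions.
class SessionGrouper {
    static final long WINDOW_MS = 24L * 60 * 60 * 1000;

    // Groups chronologically ordered message timestamps (in ms) into sessions.
    // Each session holds all messages less than 24 hours after its first one.
    static List<List<Long>> group(List<Long> timestamps) {
        List<List<Long>> sessions = new ArrayList<List<Long>>();
        List<Long> current = null;
        long sessionStart = 0L;
        for (long t : timestamps) {
            // Open a new session for the first message, or when this message
            // falls outside the window measured from the session's first message.
            if (current == null || t - sessionStart >= WINDOW_MS) {
                current = new ArrayList<Long>();
                sessions.add(current);
                sessionStart = t;
            }
            current.add(t);
        }
        return sessions;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        List<Long> stamps = Arrays.asList(
                0L, 5 * hour, 23 * hour, 30 * hour, 31 * hour, 60 * hour);
        for (List<Long> session : group(stamps)) {
            System.out.println(session.size() + " message(s), first at hour "
                    + session.get(0) / hour);
        }
    }
}
```

Note that the window is measured from the first message of each session,
not from the previous message, so a long continuous conversation is still
split once it passes the 24-hour mark.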
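
To give an idea of what a display name similarity score of the kind
mentioned above might look like, here is a minimal sketch. This Readme does
not state the exact formula, so everything specific here is an assumption:
the class name NameSimilarity, the linear penalty, and the PENALTY weight
of 0.5 are invented for illustration and are not the metric IMmerge uses.

```java
// Hypothetical sketch; the scoring formula and weight are assumptions.
class NameSimilarity {
    // Assumed cost per character of offset from the start of either name.
    static final double PENALTY = 0.5;

    // Scores two names by their common substrings: substring length minus a
    // penalty that grows with how far the match starts from the beginning
    // of each name. Returns the best score found, or 0 if nothing matches.
    static double score(String a, String b) {
        // len[i][j] = length of the common substring ending at a[i-1], b[j-1]
        int[][] len = new int[a.length() + 1][b.length() + 1];
        double best = 0.0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    int startA = i - len[i][j];
                    int startB = j - len[i][j];
                    double s = len[i][j] - PENALTY * (startA + startB);
                    if (s > best) {
                        best = s;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(score("zAlbee", "zAlbee :)"));
        System.out.println(score("zAlbee", "~zAlbee"));
        System.out.println(score("abc", "xyz"));
    }
}
```

With this assumed formula, a match at the very start of both names scores
its full length, while the same match shifted by a prefix character scores
slightly lower.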

Changelog:
----------
- v1.0 beta 1 (July 8, 2010)
  - New Graphical User Interface
  - New, revamped merge pipeline and framework
  - Can now specify any log folders; No need to rename
  - multiple output folders supported
  - options are now set per-format or per-folder with inheritance
  - will no longer ask the user for missing options until we absolutely
    need it
  - Trillian LOG and XML files are now merged at the same time; any missing
    sessions in one is automatically filled by the other
  - log->xml conversion will intelligently distinguish between
    incoming/outgoing messages
  - display name detection is updated, fewer prompts and messages
  - added merging of Trillian (Pro/Astra) medium: SAMETIME
  - added nickname suggestions for unknown ICQ UINs using Trillian logs and
    ICQHR first/last name
  - changed the way we interpret ICQ History Reader timezones. No longer
    need to specify offset.
  - added conversion of MSN file-transfer and chatroom join/leave messages
    to Trillian informational messages
  - improved Trillian XML group message format writer to conform better to
    Trillian's format
- v0.66 (Sep 22, 2009):
  - fixed Trillian XML log size growing exponentially, in the rare case
    where text occurs before the first session, creating duplicates (if you
    experience this, manually delete duplicates from the top of the
    affected files).
  - improved merge speed of Trillian LOG and XML by using StringBuffer.
- v0.65 (Dec 19, 2008):
  - read new timezone introduced in Trillian Astra 4.0 Build 90 (Nov 21).
    Previous IMmerge versions are unable to read the date correctly from
    *.LOG files after Build 90. Update strongly suggested.
  - write new timezone in output.
  - better detection of missing session close marker in Trillian *.LOG
    files. Fixes incorrect ordering in some cases. Also add blank lines
    here for visual clarity.
- v0.64 (Oct 1, 2008):
  - fixed unable to detect UTF-8 header in Trillian *.LOG files on some
    systems.
  - added commandline to choose character set for *.LOG files that do not
    contain UTF-8 BOM. Default is based on Java's default for your
    locale/region. It is unlikely you need to use this unless you have very
    old (Trillian 0.74) logs.
    Use: java immerge -defaultcharset CHARSETNAME
- v0.63 (Feb 25, 2008):
  - now uses case-insensitive comparison of filenames (user IDs), while
    retaining the original case on output.
  - added option to skip an MSN Messenger log if you don't know the email
    address.
- v0.62 (Dec 8, 2007):
  - fixed bug where sessions might not be in the correct order in the
    following situation: if merging 3 or more logs, reading the last
    session in one log (A), 2 other logs (B, C) have the same session as A,
    but C's next session comes before B's next session.
  - added version string
  - added merging of Trillian (Pro/Astra) mediums: ASTRA, FACEBOOK, GOOGLE,
    JABBER, MYSPACE, XFIRE
- v0.61 (May 21, 2007):
  - decreased memory use, fixed out-of-memory error with large number
    (1000's) of log files. (Avoid allocating read buffers before needed.)
  - fixed unable to read any sessions from ICQ Lite logs created from ICQ
    2003b history using ICQ Conversion Tool. Linebreaks in XML files no
    longer matter.
  - fixed potential early termination if empty events found in ICQ Lite
    logs.
- v0.6 (Apr 6, 2007):
  - added MSN/WLM log conversion/merging into Trillian LOG + XML (messages
    only).
    -> detects most user IDs (hotmail.com, msn.com, gmail.com) from
       hashcode.
    -> supports correct incoming/outgoing message formatting, by detecting
       the owner of a display name with a custom similarity metric.
       User-correctable.
  - now encodes display names and user IDs into entities when converting to
    Trillian XML (only encoded message text before).
- v0.5 (Feb 24, 2007):
  - added ICQ Lite log conversion/merging into Trillian LOG + XML.
  - added immerge.ini file to save ICQ nicknames.
    -> Thanks to Prasad P. Khandekar for the freely available INIFile
       class.
  - added support for Channel subfolder in Trillian 3.0+.
    -> on input, will find logs in Channel subfolder.
    -> if using "Output to Query/Channel" option, auto-detects MSN Group
       Conversations and IRC channels (with # in name), and places them in
       Channel subfolder.
- v0.45 (Jan 24, 2007):
  - greatly improved the conversion from ICQ Pro to Trillian XML logs.
    -> fixed the way non-ASCII characters were converted from ICQ Pro to
       character entities in Trillian XML logs, including UTF-8 entity
       sequences. Also fixes certain combinations of HTML entities.
    -> fixed a bug where merging ICQ Pro with Trillian XML logs would leave
       all Trillian sessions at the beginning and ICQ Pro at the end (date
       was in seconds, should be ms)
  - now adds a space in front of subsequent lines of a multi-line message,
    when converting from ICQ Pro. This is more like the newer Trillian
    behaviour (in both LOG and XML files)
  - now requires JRE 1.4 (for regex support)
- v0.4 (Jul 13, 2006):
  - added XML log merging
  - also produces XML logs from ICQ history files, and then merges
  - fixed the read/write methods for UTF-8 files when detected. It was
    using the system default character encoding before. IMmerge is now
    completely safe to use on logs with foreign Unicode characters.
- v0.3 (Nov 30, 2005):
  - fixed Trillian log timestamp-parsing in other locales/language regions.
  - added option to use 24hr display
  - added option to change timezone correction of ICQHR
- v0.22 (Nov 27, 2005):
  - important fix! did not fix corrupt UTF-8 header before, now it does.
- v0.21 (Nov 26, 2005):
  - no longer needs JRE 1.5. now runs on 1.2 and up.
- v0.2 (Nov 23, 2005):
  - now looks in Query folder + regular folder
  - added option to output to Query folder
  - now keeps true to all the text in the original logs, even if it's
    formatted incorrectly, not in a session, etc.
  - fixed nested session handling
  - now outputs own UTF-8 header
  - (--fixes logs with corrupt header-- note: was not actually implemented)
  - now works with ICQ History Reader logs that don't have user info
- v0.1 (Oct 4, 2005):
  - first version, a little buggy

Known Issues:
-------------
- [TrLogReader] Only accepts one timezone. If you used versions of Trillian
  OLDER than 4.0 (including 0.74 -- 3.1) in MORE THAN ONE timezone, and you
  run IMmerge on these logs, then you will get duplicates in your logs in
  the wrong timezone. Logs from Trillian 4.0 beta (later than build 90,
  2008-11-21), Trillian 4.0 FINAL and later are unaffected.
- Windows Live Messenger "archive" logs are ignored
- Hash algorithm does not work for Windows Live Messenger logs for Yahoo!
  contacts
- Cannot really cancel merge (GUI)
- Cannot give feedback for false positives in display name detection
- No option to change name detection sensitivity for Trillian yet
- IMmerge remembers your responses to questions until exit (GUI)

Things that are being worked on:
--------------------------------
* more complete MSN conversion (information messages)
* output formats other than Trillian (two-way conversion)
* Pidgin (GAIM) support
* Sametime support

Requested Features:
-------------------
- AIM+ (?) support
- Miranda support
- mIRC support
- MySpace IM (?) support
- Pidgin (GAIM) support
- Yahoo Messenger support
- checksum (CRC or MD5) to skip merging unchanged files
- IMsplit - split logs by session (or by date)
- find ICQ names and MSN IDs from Trillian's Buddies.xml