zAlbee's IMmerge
Trillian/ICQ/MSN Instant Messaging Log Merger

by: zAlbee
email: zalbee -AT- gmail -DOT- com
site: http://zalbee.intricus.net/immerge
version 1.05 (2011.12.17)
=============================================================

What it does:
-------------
Do you use Trillian on multiple computers to chat? IMmerge can synchronize
your chat logs together, in chronological order. It can even merge in certain
logs from the original clients (ICQ Pro/2003b, ICQ Lite/5.1, MSN 7.5/Windows
Live Messenger). All merged logs come out as Trillian-format.

New in 1.0: GUI, XML <-> LOG conversion, new engine, and more!
New in 1.05: Fixes and improvements in timestamp, group chat and IRC parsing.

IMmerge supports:
- merging classic text logs (.LOG) from all versions of Trillian 0.74 - 4.2,
  Basic and Pro
- merging new XML logs (.XML) from Trillian 3.0+ Basic and Pro. This means
  you can continue to view your history with the Trillian 3.0+ Pro history
  viewer.
- converting between Trillian plain-text (.LOG) and XML, automatically during
  merging
- converting ICQ Pro (98-2003b) logs into Trillian LOG and XML, automatically
  during merging
- converting ICQ Lite (4.0-5.1) logs into Trillian LOG and XML, automatically
  during merging
- converting MSN (7.5) and WLM (8.0+) logs into Trillian LOG and XML,
  automatically during merging

Notes for this beta:
This is a pre-release version of IMmerge! This Readme may be incomplete.
Since this is a beta version, please remember to back up your data.

IMmerge 1.0 brings these new features:
- Three user interfaces:
  1. New graphical user interface (GUI)
  2. New command-line arguments for automation
  3. Old interactive mode is still supported - though order of prompts may
     have changed
- New, revamped merge pipeline and framework:
  - Pipeline is now: read -> organize -> merge -> write/convert
  - Select any number of input or output folders, and the merge will be done
    in 1 pass (assuming similar log organization)
  - Internal priority system prefers logs of the same format, so that
    conversion is only done when necessary
  - New pipeline will make it easier to support new formats in the future
    (not finalized)
  - Options specified per-format or per-folder with inheritance

Format support in 1.0:

  Format             Read  Write
  ------             ----  -----
  Trillian Log       Y     Y
  Trillian XML       Y     Y
  ICQ Lite 4.0-5.1   Y     -
  ICQ Pro 98-2003    Y*    -
  MSN/Windows Live   Y     -

  *via ICQ History Reader

What it doesn't do:
-------------------
It doesn't compare the message text itself. It only compares session
timestamps.

It doesn't re-sort the messages. That means if you decide to put a random
session into the middle, it will stay there. IMmerge will merge your logs in
chronological order as best it can without messing up the original order.

Does not support logs from any version of MSN Messenger for Mac or Microsoft
Messenger for Mac.

"Probably" does not support ICQ 6.0 or later logs - you can try.

Requirements:
-------------
Java - http://www.java.com - version 6 (build 1.6) or greater

If you plan to merge ICQ Pro logs:
- ICQ History Reader
  - converts ICQ 98/99/2000/2001/2002 databases to HTML format
  - this is not necessary for ICQ Lite
  - http://hitu.host.sk/icqhr.php or
    http://www.headstrong.de/cgi-bin/download.cgi?icqhr

How to install:
---------------
Simply extract the contents of the ZIP file into a folder of your choice.
For example, "C:\immerge". Make sure to extract with original paths.

How to run:
-------------
To run the graphical version, just double-click on IMmerge.jar.

To run the command line version, make sure you have extracted the contents of
the ZIP file into a folder of your choice.
Run immerge or immerge.bat. Use the -h option for help.

Specify all the log folders that you wish to merge together in the Inputs.
Then specify where you want the new logs to be as one Output folder. The
location of the Output must be different from the Inputs. Click merge to
start the process.

Input Folder locations:
-----------------------
Trillian: Please select the folder named "logs".
- v.4+ (Windows Vista, 7): C:\Users\\AppData\Roaming\Trillian\users\\logs
- v.4+ (Windows XP): C:\Documents and Settings\\Application Data\Trillian\users\\logs
- v.3 or older: C:\Program Files\Trillian\users\\logs

ICQ Lite (4.0-5.1): Please select the folder named with your ICQ Number.
- Windows Vista, 7: C:\Users\\AppData\Roaming\ICQLite\HistoryDB\
- Windows XP, 2003: C:\Documents and Settings\\Application Data\ICQLite\HistoryDB\
- Windows 98, ME: C:\WINDOWS\Profiles\\Application Data\ICQLite\HistoryDB\

MSN Messenger 7.5 / Windows Live Messenger: Please select the folder named
"History" with XML files in it.
- Windows Vista, 7: C:\Users\\Documents\My Received Files\\History
- Windows XP, 2003: C:\Documents and Settings\\My Documents\My Received Files\\History
- Windows 98, ME: C:\My Documents\My Received Files\\History

Technical Notes:
----------------
- Currently, the program merges Trillian logs found in the following folders:
  - AIM
  - ICQ
  - IRC
  - MSN
  - YAHOO
  - ASTRA
  - GOOGLE
  - JABBER
  - FACEBOOK
  - MYSPACE
  - XFIRE
  - SAMETIME
  - SKYPE
- Note that ICQ Pro/98-2003b and ICQ Lite/4.0+ single messages are
  transformed into Trillian-format sessions by IMmerge. The way it works is
  all messages within 24 hours of the first message are put in the same
  session. Then a new session is started, and the process is repeated.
- When using ICQHR to convert your history, you should leave the ICQHR
  TimeZone setting on default, otherwise it may omit large portions of your
  history.
- The display name detection algorithm scores the similarity of two display
  names by the longest common substring with penalty for matches further from
  the beginning. If you are interested in helping me refine this metric,
  contact me.

Changelog:
----------
- v1.05 (Dec 17, 2011)
  - Fixed: String index out of range error in TrXMLReader.parseAttr.
  - Fixed: Using automatic name detection level didn't select "Neither"
  - Fixed: Conversion from MSN group chats to Trillian XML swapped the
    incoming/outgoing colours
  - Fixed: Trillian plain-text logs weren't being recognized as group chats
  - Improved: Clearer choices when prompting about a display name from a
    group chat, esp. in IRC
  - Added: Custom message timestamp option for Trillian Log inputs
  - Improved Trillian message timestamp parsing:
    - Now properly handles the case where message timestamp is earlier than
      session timestamp.
    - Can now detect when someone copy/pasted a chat transcript in some cases
      (based on timestamps).
    - Fixed bug where the date would be incorrectly bumped up by one day in
      those cases.
    - Add ability for user to select the correct timestamp when ambiguous.
- v1.04 beta (Jul 3, 2011)
  - Slight change to the way names in a group chat are remembered internally.
  - Fixed: Skype chatrooms not being recognized as group chats
  - Fixed: Conversion from Trillian XML may have removed tags that should
    have been part of the conversation
  - Fixed: Conversion from Trillian XML omitted the user name in
    " buzzed you" or " performed an action" messages.
  - Trillian Log Reader: Will no longer prompt on names without a timestamp
    (for logs that have timestamps)
- v1.03 beta (Mar 10, 2011)
  - Display name resolution improvements:
    - Now compares session header to vastly improve accuracy and reduce user
      prompts
    - Ability to mark a detected display name as a false positive
    - New tie-break scheme when certainty is low
    - Under General Options: you can seed your display name again
  - Conversion from Trillian LOG to XML now makes links clickable in History
    Viewer
  - Some user interface improvements
  - Fixed: group conversations getting put into Query folder
  - Fixed: [regression] original case-sensitive file name lost since 1.0
  - Fixed: IMmerge inserts more blank lines on repeated merging
  - Fixed: "String index out of range" error
- v1.02 beta (Dec 11, 2010)
  - Fixed: Name detection prompt level wasn't working since 1.0. Move name
    detection options from plugin-specific to global
  - Fixed: MSN folder -> edit global options -> cannot press OK on folder
  - New: Command line interface is back, with interactive menu. You can now
    review and edit folders from the command line.
  - New: Will automatically find and merge archived logs produced by WLM/MSN
    messenger (untested)
- v1.01 beta (Sep 22, 2010)
  - Skype added to folders recognized under Trillian logs
  - Fixed string index out of range in TrLogReader.parseMessage
- v1.0 beta 1 (July 8, 2010)
  - New Graphical User Interface
  - New, revamped merge pipeline and framework
    - Can now specify any log folders; No need to rename
    - multiple output folders supported
    - options are now set per-format or per-folder with inheritance
    - will no longer ask the user for missing options until we absolutely
      need it
  - Trillian LOG and XML files are now merged at the same time; any missing
    sessions in one is automatically filled by the other
  - log->xml conversion will intelligently distinguish between
    incoming/outgoing messages
  - display name detection is updated, fewer prompts and messages
  - added merging of Trillian (Pro/Astra) medium: SAMETIME
  - added nickname suggestions for unknown ICQ UINs using Trillian logs and
    ICQHR first/last name
  - changed the way we interpret ICQ History Reader timezones. No longer need
    to specify offset.
  - added conversion of MSN file-transfer and chatroom join/leave messages to
    Trillian informational messages
  - improved Trillian XML group message format writer to conform better to
    Trillian's format
- v0.66 (Sep 22, 2009):
  - fixed Trillian XML log size growing exponentially, in the rare case where
    text occurs before the first session, creating duplicates (if you
    experience this, manually delete duplicates from the top of the affected
    files).
  - improved merge speed of Trillian LOG and XML by using StringBuffer.
- v0.65 (Dec 19, 2008):
  - read new timezone introduced in Trillian Astra 4.0 Build 90 (Nov 21).
    Previous IMmerge versions are unable to read the date correctly from
    *.LOG files after Build 90. Update strongly suggested.
  - write new timezone in output.
  - better detection of missing session close marker in Trillian *.LOG files.
    Fixes incorrect ordering in some cases.
    Also add blank lines here for visual clarity.
- v0.64 (Oct 1, 2008):
  - fixed unable to detect UTF-8 header in Trillian *.LOG files on some
    systems.
  - added commandline to choose character set for *.LOG files that do not
    contain UTF-8 BOM. Default is based on Java's default for your
    locale/region. It is unlikely you need to use this unless you have very
    old (Trillian 0.74) logs.
    Use: java immerge -defaultcharset CHARSETNAME
- v0.63 (Feb 25, 2008):
  - now uses case-insensitive comparison of filenames (user IDs), while
    retaining the original case on output.
  - added option to skip an MSN Messenger log if you don't know the email
    address.
- v0.62 (Dec 8, 2007):
  - fixed bug where sessions might not be in the correct order in the
    following situation: if merging 3 or more logs, reading the last session
    in one log (A), 2 other logs (B, C) have the same session as A, but C's
    next session comes before B's next session.
  - added version string
  - added merging of Trillian (Pro/Astra) mediums: ASTRA, FACEBOOK, GOOGLE,
    JABBER, MYSPACE, XFIRE
- v0.61 (May 21, 2007):
  - decreased memory use, fixed out-of-memory error with large number
    (1000's) of log files. (Avoid allocating read buffers before needed.)
  - fixed unable to read any sessions from ICQ Lite logs created from ICQ
    2003b history using ICQ Conversion Tool. Linebreaks in XML files no
    longer matter.
  - fixed potential early termination if empty events found in ICQ Lite logs.
- v0.6 (Apr 6, 2007):
  - added MSN/WLM log conversion/merging into Trillian LOG + XML (messages
    only).
    -> detects most user IDs (hotmail.com, msn.com, gmail.com) from hashcode.
    -> supports correct incoming/outgoing message formatting, by detecting
       the owner of a display name with a custom similarity metric.
       User-correctable.
  - now encodes display names and user IDs into entities when converting to
    Trillian XML (only encoded message text before).
- v0.5 (Feb 24, 2007):
  - added ICQ Lite log conversion/merging into Trillian LOG + XML.
  - added immerge.ini file to save ICQ nicknames.
    -> Thanks to Prasad P. Khandekar for the freely available INIFile class.
  - added support for Channel subfolder in Trillian 3.0+.
    -> on input, will find logs in Channel subfolder.
    -> if using "Output to Query/Channel" option, auto-detects MSN Group
       Conversations and IRC channels (with # in name), and places them in
       Channel subfolder.
- v0.45 (Jan 24, 2007):
  - greatly improved the conversion from ICQ Pro to Trillian XML logs.
    -> fixed the way non-ASCII characters were converted from ICQ Pro to
       character entities in Trillian XML logs, including UTF-8 entity
       sequences. Also fixes certain combinations of HTML entities.
    -> fixed a bug where merging ICQ Pro with Trillian XML logs would leave
       all Trillian sessions at the beginning and ICQ Pro at the end (date
       was in seconds, should be ms)
  - now adds a space in front of subsequent lines of a multi-line message,
    when converting from ICQ Pro. This is more like the newer Trillian
    behaviour (in both LOG and XML files)
  - now requires JRE 1.4 (for regex support)
- v0.4 (Jul 13, 2006):
  - added XML log merging
  - also produces XML logs from ICQ history files, and then merges
  - fixed the read/write methods for UTF-8 files when detected. It was using
    the system default character encoding before. IMmerge is now completely
    safe to use on logs with foreign Unicode characters.
- v0.3 (Nov 30, 2005):
  - fixed Trillian log timestamp-parsing in other locales/language regions.
  - added option to use 24hr display
  - added option to change timezone correction of ICQHR
- v0.22 (Nov 27, 2005):
  - important fix! did not fix corrupt UTF-8 header before, now it does.
- v0.21 (Nov 26, 2005):
  - no longer needs JRE 1.5. now runs on 1.2 and up.
- v0.2 (Nov 23, 2005):
  - now looks in Query folder + regular folder
  - added option to output to Query folder
  - now keeps true to all the text in the original logs, even if it's
    formatted incorrectly, not in a session, etc.
  - fixed nested session handling
  - now outputs own UTF-8 header
  - (--fixes logs with corrupt header-- note: was not actually implemented)
  - now works with ICQ History Reader logs that don't have user info
- v0.1 (Oct 4, 2005):
  - first version, little buggy

Known Issues:
-------------
- If you used Trillian in a different time zone than the one you are in now,
  you may get duplicates after merging. This can happen if the version of
  Trillian was 3.1 or older, or any beta version before 4.0 build 90
  (2008-11-21). Trillian 4.0 FINAL and later are unaffected. To ensure you
  don't get duplicates, in IMmerge, edit the Trillian Input Options to set
  the timezone where the OLD version of Trillian was used. If you used the
  OLD version of Trillian in more than one timezone, you are out of luck.
  Sorry, I am working on a better solution for the future.
- Hash algorithm does not work for Windows Live Messenger logs for Yahoo!
  contacts
- Cannot really cancel merge (GUI)
- IMmerge remembers your responses to questions until exit (GUI)
- Files are held open when merge canceled. Workaround: Close IMmerge.
- Channel directory is created every time, even when there are no files
  inside.
- Log files in the Channel directory may get moved to the Query directory if
  the filename does not correspond to "Group Conversation" or "#channel"
  naming convention.

Things that are being worked on:
--------------------------------
* Pidgin (GAIM) support

Requested Features:
-------------------
- Support for other IM client log formats:
  - AIM+ (?)
  - Digsby
  - Miranda
  - mIRC
  - MySpace IM (?)
  - Pidgin (GAIM)
  - Sametime
  - Skype
  - Yahoo! Messenger
- checksum (CRC or MD5) to skip merging unchanged files
- IMsplit - split logs by session (or by date)
- find ICQ names and MSN IDs from Trillian's Buddies.xml
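
Sketch: ICQ session grouping (illustration only)
------------------------------------------------
The Technical Notes say that ICQ single messages are turned into sessions:
all messages within 24 hours of the first message go into the same session,
then a new session is started. The Python sketch below illustrates that rule
under stated assumptions. It is not IMmerge's actual Java code. The function
name group_into_sessions is made up for this example, and treating a message
at exactly 24 hours as the start of a new session is an assumption, since the
Readme does not say how the boundary is handled.

```python
from datetime import datetime, timedelta


def group_into_sessions(timestamps, window=timedelta(hours=24)):
    """Group message timestamps into sessions (hypothetical helper).

    A message joins the current session if it is less than `window` after
    that session's FIRST message; otherwise it opens a new session.
    """
    sessions = []
    for ts in sorted(timestamps):
        # Compare against the first message of the current session,
        # not the previous message. Boundary (< vs <=) is an assumption.
        if sessions and ts - sessions[-1][0] < window:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


if __name__ == "__main__":
    start = datetime(2005, 10, 4, 9, 0)
    messages = [start + timedelta(hours=h) for h in (0, 5, 23, 24, 30)]
    for number, session in enumerate(group_into_sessions(messages), 1):
        print(number, [m.isoformat() for m in session])
```

Because the window is measured from the first message of each session, a long
run of messages spaced a few hours apart still gets cut once it passes the
24-hour mark.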
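
Sketch: display name similarity (illustration only)
---------------------------------------------------
The Technical Notes describe the display name metric as the longest common
substring with a penalty for matches further from the beginning. The Python
sketch below shows one way such a score could look. It is not IMmerge's
actual Java code: the function names, the case-insensitive comparison, the
linear penalty and its weight of 0.5 are all assumptions chosen for
illustration.

```python
def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common substring."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def name_similarity(a, b, penalty=0.5):
    """Score two display names (hypothetical metric, higher is more similar).

    Assumption: the score is the match length minus `penalty` for every
    character the match is offset from the start of either name.
    """
    length, start_a, start_b = longest_common_substring(a.lower(), b.lower())
    return max(0.0, length - penalty * (start_a + start_b))


if __name__ == "__main__":
    print(name_similarity("zAlbee", "zalbee [away]"))
    print(name_similarity("zAlbee", "xx zalbee"))
```

With these assumptions, a match at the very start of both names keeps its
full length as the score, while the same match found a few characters in
scores lower.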