zAlbee's IMmerge
Trillian/ICQ/MSN Instant Messaging Log Merger

by: zAlbee
email: zalbee -AT- gmail -DOT- com
site: http://zalbee.intricus.net/immerge
version 0.66 (Sep 22, 2009)
=============================================================

What it does:
-------------
Do you use Trillian on multiple computers to chat? IMmerge can synchronize
your IM logs together, in chronological order. It can even merge in certain
logs from the original clients (ICQ Pro/2003b, ICQ Lite/5.1, MSN 7.5/Windows
Live Messenger). All merged logs come out in Trillian format.

New in 0.6: Now merges from MSN/Windows Live!
New in 0.65: Trillian Astra (4.0) support

IMmerge supports:
- merging classic text logs (.LOG) from all versions of Trillian 0.74 - 3.1,
  Basic and Pro
- merging new XML logs (.XML) from Trillian 3.0+ Basic and Pro. This means
  you can continue to view your history with the Trillian 3.0+ Pro history
  viewer.
- converting ICQ Pro (98-2003b) logs into Trillian LOG and XML,
  automatically during merging
- converting ICQ Lite (4.0-5.1) logs into Trillian LOG and XML,
  automatically during merging
- converting MSN (7.5) and WLM (8.0+) logs into Trillian LOG and XML,
  automatically during merging

What it doesn't do:
-------------------
It doesn't compare the message text itself. It only compares session
timestamps.

It doesn't yet produce XML logs from Trillian text logs, or vice-versa. It
also does not do conversion to anything except Trillian logs. These will all
be addressed in a future version.

It doesn't re-sort the messages. If for some reason your messages are out of
order, they will stay in that order. (It is fine if two logs overlap, as
long as each individual log is in correct chronological order.)

It does not support logs from any version of MSN Messenger for Mac or
Microsoft Messenger for Mac.

It probably does not support ICQ 6.0 logs, but you can try.

Requirements:
-------------
Java Runtime Environment (JRE).
  http://www.java.com - version 1.4 or greater

If you plan to merge ICQ Pro logs:
- ICQ History Reader
  - converts ICQ 98/99/2000/2001/2002 databases to HTML format
  - this is not necessary for ICQ Lite
  - http://hitu.host.sk/icqhr.php or
    http://www.headstrong.de/cgi-bin/download.cgi?icqhr

How to install:
---------------
Simply extract the contents of the ZIP file into a folder of your choice.
For example, "C:\immerge". Make sure to extract with original paths.

How to run:
-------------
Right now, this program can only be run from the command line.

* Short version:

1. Collect the log folders that you want to merge. Put the folders where
   IMmerge is installed, and make sure each one is named with a type prefix
   followed by a number (for example, "logs1"), where the prefix is one of
   "logs", "icql", "icqh", or "msn" for Trillian, ICQ Lite, ICQ History
   Reader, or MSN respectively, and the number is at least "1".
2. Run "java immerge" from a command prompt.
3. The program will ask you for a few things, such as how many folders you
   collected in step 1.
4. The merged logs will be created in a folder called "output". Stats and
   error messages are displayed on screen.
5. If all was successful, you can copy the output folder back to your
   Trillian folder.

* Long version, with explanations of options:

0. Run ICQ History Reader (only if you have logs from the original ICQ Pro
   client). Click "Open Database", select your ICQ DB, then "Export to
   HTML". Under settings, use "UIN per file".
1. Collect the log folders that you want to merge.
   a) Go to your Trillian logs from one computer. By default, they are in
      "C:\Program Files\Trillian\users\\logs". Copy this entire folder, and
      name it "logs1". Put this folder in the same place where IMmerge is
      installed. Repeat for the other computers, renaming them to logs2,
      logs3, logs4, etc.
   b) Put your ICQ Pro HTML logs (from step 0: ICQ History Reader) from one
      computer into one folder, named "icqh1". Put this folder in the same
      place where IMmerge is installed. Repeat for the other computers,
      renaming them to icqh2, icqh3, etc.
   c) Go to your ICQ Lite logs from one computer. By default, they are in:
        "C:\Documents and Settings\\Application Data\ICQLite\HistoryDB\"
          for Win XP/2003, or
        "C:\WINDOWS\Profiles\\Application Data\ICQLite\HistoryDB\"
          for Win 98/ME.
      Take a copy of the entire folder, and rename the copy to "icql1". Put
      this folder in the same place where IMmerge is installed. Repeat for
      the other computers, renaming them to icql2, icql3, etc.
   d) Go to your MSN/WLM logs from one computer. By default, they are in:
        "C:\Documents and Settings\\My Documents\My Received Files\\History"
          for Win XP/2003, or
        "C:\My Documents\My Received Files\\History"
          for Win 98/ME.
      Take a copy of this folder, and rename it from "History" to "msn1".
      Put this folder in the same place where IMmerge is installed. Repeat
      for the other computers, renaming them to msn2, msn3, etc.
2. Run "java immerge" from a command prompt.
   Open a command prompt by clicking Start > Run, then type:
     "cmd" for Win XP/2003, or
     "command" for Win 98/ME.
   and press Enter.
   Navigate to wherever you installed IMmerge (e.g. C:\immerge) by typing:
     "cd C:\immerge"
   and then
     "java immerge"
   -> If you get a NoClassDefFoundError, make sure you are in the directory
      with immerge.class. If that doesn't work, try
      "java -classpath . immerge"
3. The program will ask you for a few things:
   1) How many folders ... : enter the number of "logs" folders
      (from step 1.a)
   2) How many folders ... : enter the number of "icqh" folders
      (from step 1.b)
   3) How many folders ... : enter the number of "icql" folders
      (from step 1.c)
   4) How many folders ... : enter the number of "msn" folders
      (from step 1.d)
   5) Should the logs be output to a "Query" or "Channel" folder?
      - enter "Yes" to write both LOG and XML files into the Query (or
        Channel) subfolder (this is what fresh installs of Trillian 3.0+
        expect)
      - enter "No" to write LOG files into the parent folder (this is what
        pre-3.0 versions OR pre-3.0 upgraded to 3.0+ installs expect)
      - XML files are always written to the Query (or Channel) folder
      - This only affects the output folder; IMmerge will still read all
        your input logs successfully

   If you are merging non-Trillian logs, it may also ask you:
   a) Your ICQ UIN:
      - this is your ICQ number (required)
      - affects logs that are merged from ICQ Pro, ICQ Lite
      - required for making functional XML files
   b) Your ICQ Nickname:
      - affects logs that are merged from ICQ Pro, ICQ Lite
   c) Display time in 24-hr format?
      - Enter Yes to write log times like [18:45]
      - Enter No to write log times like [06:45 PM]
   d) Correct ICQHR timezone by how many hours?:
      - Check the times that ICQ History Reader produces in your logs; if
        they are wrong, enter a number of hours to correct it.
      - For example, I found that -16 hours is the correct offset for mine.
   e) Enter nickname for :
      - Nickname of this ICQ user (it will be saved in immerge.ini)
      - If you don't know, leave blank and the ICQ number will be used in
        the converted logs instead.
   f) Your MSN login (e.g. abc@hotmail.com):
      - Required.
   g) Your MSN display name (most common or earliest):
      - helps resolve display names
      - the program will use this name initially to compare unknown display
        names
   h) How should unknown MSN display names be resolved?
      - Since MSN/WLM logs only store display names, not user IDs, IMmerge
        cannot be certain whether some display names belong to the owner
        (you) or the user, but will make an educated guess. This option
        determines how much control you have over that decision.
        1. Automatically
           - Always uses the program's guess. May be useful if you are
             running the program unattended.
        2. Prompt on vague names
           - Will ask you to confirm a display name if it is not
             sufficiently similar to a known name.
        3. Prompt on any name that is new/uncertain
           - Will ask you to confirm a display name if it is not 100% equal
             to a known name.
      - Resolving the display name correctly means that the message will be
        correctly formatted as incoming or outgoing in Trillian XML logs.
        There is no other effect.
   i) Enter the MSN ID (full email address) for :
      - The user is using a non-standard email address to log in to MSN
        (not hotmail.com, msn.com, or gmail.com)
      - Enter the email they use. IMmerge will check for its correctness.
      - If you don't know it, type 'skip'.
      - This will be saved in immerge.ini.
4. The merged logs will be created in a folder called "output". Stats and
   error messages are displayed on screen.
5. If all was successful, you can copy the output folder back to your
   Trillian folder.

Technical Notes:
----------------
- Currently, the program only merges Trillian logs found in the following
  folders:
  - AIM
  - ICQ
  - IRC
  - MSN
  - YAHOO
  - ASTRA
  - GOOGLE
  - JABBER
  - FACEBOOK
  - MYSPACE
  - XFIRE
  However, you could trick it into merging logs of other mediums by renaming
  the folder. There is no actual difference in the way it treats one medium
  vs. another.
- Note that ICQ Pro/98-2003b and ICQ Lite/4.0+ single messages are
  transformed into Trillian-format sessions by IMmerge. The way it works is
  that all messages within 24 hours of the first message are put in the same
  session. Then a new session is started, and the process is repeated.
- It is unknown how ICQ History Reader calculates timezones, as it does not
  handle them as I expected. When using ICQHR to convert your history, you
  should leave the ICQHR TimeZone setting on default, otherwise it may omit
  large portions of your history. Then specify an offset in the IMmerge
  program. A value of -16 works for my logs; you may have to experiment.
  Please e-mail me (zalbee@gmail.com) if you have any additional insight
  into this.
- The conversion for MSN to Trillian LOG/XML converts all chat messages, but
  does not convert information-type messages (such as User sends wink, File
  Transfer Requested/Received/Sent, User Joins/Leaves Conversation). If this
  is important to you, you can wait for a newer version of IMmerge that
  supports this, and then re-merge the original MSN logs (they will have
  priority).
- The display name detection algorithm scores the similarity of two display
  names by the longest common substring with penalty for matches further
  from the beginning. If you are interested in helping me refine this
  metric, contact me.

Changelog:
----------
- v0.66 (Sep 22, 2009):
  - fixed Trillian XML log size growing exponentially, in the rare case
    where text occurs before the first session, creating duplicates (if you
    experience this, manually delete duplicates from the top of the affected
    files).
  - improved merge speed of Trillian LOG and XML by using StringBuffer.
- v0.65 (Dec 19, 2008):
  - read new timezone introduced in Trillian Astra 4.0 Build 90 (Nov 21).
    Previous IMmerge versions are unable to read the date correctly from
    *.LOG files after Build 90. Update strongly suggested.
  - write new timezone in output.
  - better detection of missing session close marker in Trillian *.LOG
    files. Fixes incorrect ordering in some cases. Also add blank lines here
    for visual clarity.
- v0.64 (Oct 1, 2008):
  - fixed unable to detect UTF-8 header in Trillian *.LOG files on some
    systems.
  - added commandline to choose character set for *.LOG files that do not
    contain UTF-8 BOM. Default is based on Java's default for your
    locale/region. It is unlikely you need to use this unless you have very
    old (Trillian 0.74) logs.
    Use: java immerge -defaultcharset CHARSETNAME
- v0.63 (Feb 25, 2008):
  - now uses case-insensitive comparison of filenames (user IDs), while
    retaining the original case on output.
  - added option to skip an MSN Messenger log if you don't know the email
    address.
- v0.62 (Dec 8, 2007):
  - fixed bug where sessions might not be in the correct order in the
    following situation: if merging 3 or more logs, reading the last session
    in one log (A), 2 other logs (B, C) have the same session as A, but C's
    next session comes before B's next session.
  - added version string
  - added merging of Trillian (Pro/Astra) mediums: ASTRA, FACEBOOK, GOOGLE,
    JABBER, MYSPACE, XFIRE
- v0.61 (May 21, 2007):
  - decreased memory use, fixed out-of-memory error with large number
    (1000's) of log files. (Avoid allocating read buffers before needed.)
  - fixed unable to read any sessions from ICQ Lite logs created from ICQ
    2003b history using ICQ Conversion Tool. Linebreaks in XML files no
    longer matter.
  - fixed potential early termination if empty events found in ICQ Lite
    logs.
- v0.6 (Apr 6, 2007):
  - added MSN/WLM log conversion/merging into Trillian LOG + XML (messages
    only).
    -> detects most user IDs (hotmail.com, msn.com, gmail.com) from
       hashcode.
    -> supports correct incoming/outgoing message formatting, by detecting
       the owner of a display name with a custom similarity metric.
       User-correctable.
  - now encodes display names and user IDs into entities when converting to
    Trillian XML (only encoded message text before).
- v0.5 (Feb 24, 2007):
  - added ICQ Lite log conversion/merging into Trillian LOG + XML.
  - added immerge.ini file to save ICQ nicknames.
    -> Thanks to Prasad P. Khandekar for the freely available INIFile class.
  - added support for Channel subfolder in Trillian 3.0+.
    -> on input, will find logs in Channel subfolder.
    -> if using "Output to Query/Channel" option, auto-detects MSN Group
       Conversations and IRC channels (with # in name), and places them in
       Channel subfolder.
- v0.45 (Jan 24, 2007):
  - greatly improved the conversion from ICQ Pro to Trillian XML logs.
    -> fixed the way non-ASCII characters were converted from ICQ Pro to
       character entities in Trillian XML logs, including UTF-8 entity
       sequences. Also fixes certain combinations of HTML entities.
    -> fixed a bug where merging ICQ Pro with Trillian XML logs would leave
       all Trillian sessions at the beginning and ICQ Pro at the end (date
       was in seconds, should be ms)
  - now adds a space in front of subsequent lines of a multi-line message,
    when converting from ICQ Pro. This is more like the newer Trillian
    behaviour (in both LOG and XML files)
  - now requires JRE 1.4 (for regex support)
- v0.4 (Jul 13, 2006):
  - added XML log merging
  - also produces XML logs from ICQ history files, and then merges
  - fixed the read/write methods for UTF-8 files when detected. It was using
    the system default character encoding before. IMmerge is now completely
    safe to use on logs with foreign Unicode characters.
- v0.3 (Nov 30, 2005):
  - fixed Trillian log timestamp-parsing in other locales/language regions.
  - added option to use 24hr display
  - added option to change timezone correction of ICQHR
- v0.22 (Nov 27, 2005):
  - important fix! did not fix corrupt UTF-8 header before, now it does.
- v0.21 (Nov 26, 2005):
  - no longer needs JRE 1.5. now runs on 1.2 and up.
- v0.2 (Nov 23, 2005):
  - now looks in Query folder + regular folder
  - added option to output to Query folder
  - now keeps true to all the text in the original logs, even if it's
    formatted incorrectly, not in a session, etc.
  - fixed nested session handling
  - now outputs own UTF-8 header
  - (--fixes logs with corrupt header-- note: was not actually implemented)
  - now works with ICQ History Reader logs that don't have user info
- v0.1 (Oct 4, 2005):
  - first version, little buggy

Known Issues:
-------------
- Windows Live Messenger "archive" logs are ignored
- Hash algorithm does not work for Windows Live Messenger logs for Yahoo!
  contacts

Things being worked on:
-----------------------
- public API for use in other programs/front-ends
- extendable LogReader / LogWriter class
- commandline arguments
- more complete MSN conversion (information messages)
- let user specify timestamp format when converting
- produce XML logs from Trillian logs if they do not have a corresponding
  XML file already
- output formats other than Trillian (two-way conversion)
- GUI / front-end

Requested Features:
-------------------
- AIM+ (?) support
- Miranda support
- mIRC support
- MySpace IM (?) support
- Pidgin (GAIM) support
- Sametime support
- Yahoo Messenger support
- checksum (CRC or MD5) to skip merging unchanged files
- IMsplit - split logs by session (or by date)
- find ICQ names and MSN IDs from Buddies.xml
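
Illustration: 24-hour session grouping
--------------------------------------
The Technical Notes say that single ICQ messages are grouped into sessions:
every message within 24 hours of a session's first message joins that
session, then a new session starts. The sketch below shows one way that rule
could look in Java. It is not taken from the IMmerge source. The class name
SessionGrouper, the group method, and the use of millisecond timestamps are
assumptions made for illustration. Treating a message that arrives exactly
24 hours after the first one as the start of a new session is also an
assumption, since the notes do not say how that boundary is handled. The
sketch uses generics, so it needs a newer Java than the JRE 1.4 listed under
Requirements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of IMmerge.
final class SessionGrouper {
    // Length of one session window in milliseconds (24 hours).
    static final long WINDOW_MS = 24L * 60 * 60 * 1000;

    // Groups message timestamps (milliseconds, already in chronological
    // order) into sessions. A session holds every message that arrives
    // less than 24 hours after the session's first message.
    static List<List<Long>> group(long[] timestamps) {
        List<List<Long>> sessions = new ArrayList<List<Long>>();
        List<Long> current = null;
        long sessionStart = 0;
        for (long t : timestamps) {
            // Start a new session for the first message, or once the
            // 24-hour window of the current session has passed.
            if (current == null || t - sessionStart >= WINDOW_MS) {
                current = new ArrayList<Long>();
                sessions.add(current);
                sessionStart = t;
            }
            current.add(Long.valueOf(t));
        }
        return sessions;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        // Messages at 0h, 2h, 23h, 25h, 30h and 50h.
        long[] times = {0, 2 * hour, 23 * hour, 25 * hour, 30 * hour, 50 * hour};
        List<List<Long>> sessions = group(times);
        // Sessions: {0h, 2h, 23h}, {25h, 30h}, {50h}
        System.out.println(sessions.size() + " sessions"); // prints "3 sessions"
    }
}
```

Note that the window is measured from the first message of each session, not
from the previous message, so a long run of closely spaced messages is still
cut into a new session once 24 hours have passed since the session began.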