For Mac/Windows/Linux/etc — diglloydTools IntegrityChecker Java version 1.3b5 Now Available — Includes Automated Install Script for macOS users
“Bit rot” is rare, but lots of other Bad Stuff can happen to your data, as I learned recently. Whatever the cause, making sure your original data and your backups are 100% exactly right bit-for-bit is something all professionals ought to be doing.
...
The diglloydTools IntegrityChecker Java version runs on any computer with Java: Mac, Windows, Linux, etc. More about IntegrityChecker and why every professional should be using it.
IntegrityChecker Java version (icj) 1.3b5
This version is the culmination of three weeks of coding for 14 hours a day even over the holidays. A tremendous amount of effort has gone into this version of icj, including researching how to deliver performance that very few native-code programs can match. icj makes use of CPU cores and disk I/O speed like very few programs can.
Just posted is diglloydTools IntegrityChecker Java version 1.3b5 (if you downloaded it before 16:00 Pacific time on Jan 19, please download it again; there was a minor error in the JDK download script):
This is a beta version but well tested (and IntegrityChecker only reads your files anyway!). It will likely become the official release version soon.
Installation
Version 1.3b5 includes an install script for macOS that installs IntegrityChecker Java (icj) and optionally installs Java itself. All you have to do is answer "yes" or "no" when prompted.
Documentation/help
Documentation is currently inadequate; improved documentation is coming soon. For now please see the overview page.
If you need support, please copy/paste the text from the Terminal window—don’t send screen shots as they are very large and frequently contain too little information.
Improvements/additions
- Automated installation script for macOS.
- Even higher performance, particularly on hard drives. On a 16-core Mac Pro with suitably fast drives, total hashing throughput can exceed 7.5 gigabytes per second. The “fast” 3.2 GB/sec internal Apple SSDs are too slow for full icj performance on 8-core machines and reasonably recent 6-core machines.
- Auto-detects whether the drive is an SSD or hard drive, and optimizes accordingly. For technical reasons this cannot always be known; use --optimize HDD if icj incorrectly decides the drive is an SSD.
- Improved handling of special files, permissions issues, etc.
- Greatly reduced per-file memory usage (however, due to the extreme performance, memory usage may still rise quickly when the Java virtual machine has no time to garbage collect).
- icj saved me 700 gigabytes of wasted space via its "dupes" command, which finds duplicate files (I had inadvertently downloaded several of the same large photo shoots this year). It determines which files are most likely the original/best ones to keep by dates, by the existence of sidecar files, etc. Super-cool is that it emits commands that can either clone the duplicates (in effect leaving them there as before but eliminating 100% of the duplicate space usage) or remove the duplicates. Cloning does require APFS on macOS, however.
- icj can compare two folder trees, so for example it is possible to see if a folder on one drive is the same as on another.
Quick intro
Most people are not comfortable with the command line, but icj is simple to use:
- Open a Terminal window (/Applications/Utilities/Terminal).
- Type "icj" followed by a space followed by the command (e.g. verify or update).
- Type a space after the command, then drag anything into the Terminal window from the Finder (a folder or multiple folders, or a volume or even multiple volumes).
- Press the RETURN or ENTER key.
In step #3, you can also type in the desired item(s) instead.
Usage tips
- To stop (forcibly kill) icj in a Terminal window, type control-C (control, not cmd).
- To temporarily suspend icj in a Terminal window, type control-Z. To resume, type "fg" (foreground).
- The macOS file cache can steadily degrade performance as it caches absolutely everything that icj reads. There is currently no technical way to tell macOS to not cache when reading files from a Java program. To enable icj to flush this cache, run icj using "sudo" as in:
sudo icj verify MasterData
- Spaces matter in file, folder and volume names. So if you want to verify the volume "My Stuff", the volume name must be quoted, like this (include the straight quotes):
icj verify "My Stuff"
This is a nuisance, so it is better just to not use spaces in volume or folder names that you use a lot, e.g., “MyStuff”, not “My Stuff”.
- You can run icj in more than one Terminal window. So if you want to simultaneously operate on three different backup drives (such as to verify), just open three Terminal windows and start icj in each window, something like:
icj verify Volume1
icj verify Volume2
icj verify Volume3
- You can "detune" icj if it is desirable for it to use less CPU time by specifying fewer threads and buffers. For example, adding these options will limit icj to just two hashing threads and four I/O buffers:
icj verify --threads 2 --large-buffers 4 MyData
- If you want to verify things one after another to restrict CPU and/or memory usage, the best way is separate invocations. This command works fine, but it does the three items as one job:
icj verify MyStuff Work MasterData
Instead, use a ";" to separate the commands, like this:
icj verify MyStuff; icj verify Work; icj verify MasterData
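If you verify the same set of items often, the separate-invocation approach can be wrapped in a small shell function. This is a minimal sketch and not part of icj: the function name verify_each and the ICJ variable are made up for this example, and it uses only the verify command shown above.

```shell
#!/bin/sh
# Hypothetical helper, not part of icj: verify each item as its own job,
# one after another, the same as "icj verify A; icj verify B; ...".
# ICJ names the program to run; set ICJ=echo to preview the commands
# without actually running icj.
ICJ="${ICJ:-icj}"

verify_each() {
    for item in "$@"; do
        # Quoting "$item" keeps names with spaces (e.g. "My Stuff") intact.
        "$ICJ" verify "$item"
    done
}

# Example (commented out so that pasting this block runs nothing):
# verify_each MyStuff Work MasterData
```

Because each item is passed as a quoted argument, verify_each "My Stuff" Work runs two separate jobs and handles the space correctly.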
Overview of commands
Typing "icj" or "icj help" in Terminal will show this summary.
diglloyd-MacPro:MPG lloyd$ icj
# icj version 1.3b5 @ 2020-01-19 13:00
# Copyright 2018-2020 DIGLLOYD INC. All Rights Reserved
# Use of this software requires a license. https://macperformanceguide.com/Software-License.html
# Sun Jan 19 13:23:05 PST 2020
Available commands:
verify       verify hash values
status       summarize files that are new, or of changed size or date
compare      compare: compares two folders for equality
update       update new and date/size changed files, forget missing items
update-all   update hash values for all files, whether or not they already have hashes
update-new   update only files lacking hash values
clean        remove all hash data files
dupes        show duplicate files: options --size--types type[,type]* --emit <rm|clone|nop>
empty        show empty files
sha          test hashing speed: options --size 1M --sha SHA-512
version      display the version and other information
help         show help, specify which command such as 'help verify'
Manual at https://diglloydtools.com/manual/integritychecker-icj.html
Example commands
Lines that start with "#" are comments. These examples assume a folder called MyStuff and a volume (entire drive) called Work.
# ensure that hashes exist for all files on volume Work
icj update Work
# ensure that hashes exist for all files in folder MyStuff
icj update MyStuff
# ensure that hashes exist for all files in folder /Volumes/Work/Photos
icj update /Volumes/Work/Photos