Rsync - Backup All Your Data With a Single Terminal Command
A few years ago, I was faced with the daunting task of manually migrating information from one SAN to another SAN, without the luxury of having an automated migration system. Anyone who is dealing with terabytes of data has, at one time or another, faced having to archive, transfer, or back up thousands of files at once.
I was introduced to a command line program called rsync to accomplish this task. Little did I know this one utility would be a permanent part of my toolset when managing large amounts of data. If you can't copy all of your data in one session, and you need to break it up into several days (or weeks) of data copying, rsync's usefulness becomes all the more apparent.
Rsync is a command line program. There are GUI variants built on the CLI base, but for the most part, the execution of rsync is so simple you shouldn't need a GUI to use it. We are going to set up a simple local rsync run to show how it works.
1. Go into your Documents folder. Create a test folder called "Data" and another folder called "DataBackup". For the purposes of this test, put a few files in the "Data" folder that aren't very big. A few image files or documents should be fine.
For this basic exercise, we are going to assume your hard drive is named "Macintosh HD" and your user name is "Joe". Please treat these as placeholders when reviewing the following examples, and put your own information in as required.
Note: Spaces in the command line are handled with a \ preceding the space. For example: Macintosh HD = Macintosh\ HD
2. Navigate to Applications>Utilities>Terminal.app.
3. Type the following command:
rsync -avx --progress /Volumes/Macintosh\ HD/Users/Joe/Documents/Data/ /Volumes/Macintosh\ HD/Users/Joe/Documents/DataBackup/
You should see something similar to this as output:
building file list ...
3 files to consider
./
Picture 1.png
409559 100% 179.67MB/s 0:00:00 (xfer#1, to-check=1/3)
self_portrait_two_sides.jpg
4721398 100% 15.74MB/s 0:00:00 (xfer#2, to-check=0/3)
sent 5131825 bytes received 70 bytes 10263790.00 bytes/sec
total size is 5130957 speedup is 1.00
--
Now, what exactly did rsync do? It would appear, by virtue of the output, that two files were simply copied. Actually, rsync compared the source and the target folders, and copied over only the files that didn't exist on the target.
The real power of rsync is evident in my next execution. For this example, I added two more files to my "Data" folder, and ran the exact same rsync command:
building file list ...
5 files to consider
./
Army.jpg
29993 100% 0.00kB/s 0:00:00 (xfer#1, to-check=3/5)
Manny375.jpg
56776 100% 54.15MB/s 0:00:00 (xfer#2, to-check=2/5)
sent 87057 bytes received 70 bytes 174254.00 bytes/sec
total size is 5217726 speedup is 59.89
Notice how the number of files "to consider" increased, but only the two additional files were copied? By default, rsync compares each file's size and modification time against the copy on the target. It saw that the other files matched the originals in my "Data" directory, so it did not take the time to copy them again. Now that you have an idea of rsync's mechanisms, here is a breakdown of the command and how it was used in this example:
rsync [options] [source] [target]
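The options I used in my example are the defaults I use for most of my rsyncing. Here is a breakdown of the options I used and how they affect the outcome of the rsync:
-a - "archive" mode: recurses into directories and preserves ownership, permissions, timestamps, and symbolic links, which is extremely useful for moving large volumes of data and keeping AD/OD/POSIX permissions intact. Extended attributes and resource forks are not covered by -a; on Apple's build of rsync they need the separate -E option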
-v - "verbose" gives the user more information on the rsync display
-x - prevents crossing filesystem boundaries
--progress - combined with the "-v" option, gives you the best in-terminal display of rsync's progress
---
Now that you've exposed yourself to a very basic rsync, here are a few tips to make this software easier to use. Not only will these tips help you speed up your rsync use, some of them will help you feel more comfortable on the command line overall.
First, rsync has an option that allows a "dry run", so that you can test an rsync execution without actually moving any data. By default I always include this option in my first run of an archive to make sure my directories are in order. This option is:
-n - "test run", "dry run", shows output but doesn't actually copy anything
Very often this just translates to adding an "n" to your option string, so instead of typing
-avx
you will type
-avxn
The second tip for efficient command line execution of rsync is how your "Tab" key operates when typing out directory names. If you think you need to type out "/Volumes/Macintosh\ HD/Documents/blahbalhbalbhalbahaba" every single time, you are wrong! Here are a few CLI shortcuts to help you avoid excessive typing:
up arrow / down arrow - cycle through commands previously typed
TAB - autofill known directory names. For example, if I type "/Volu" and hit TAB, the CLI will auto-fill the rest and display "/Volumes".
Lastly, spaces in command line directory names can be a little frustrating. Make use of the backslash to indicate where a space exists:
directory name: /Volumes/MacHD/Users/Joe/Stuff I Like/
is actually
directory name: /Volumes/MacHD/Users/Joe/Stuff\ I\ Like/
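The escaping rule can be tried out safely in a scratch directory. This sketch assumes a POSIX shell and uses a temporary directory instead of a real volume; it also shows double quotes, which are another standard way to keep a path with spaces together as a single argument.

```shell
#!/bin/sh
# Scratch directory standing in for /Volumes/MacHD/Users/Joe (hypothetical).
base=$(mktemp -d)

# Quoted form: everything inside the double quotes is one argument.
mkdir "$base/Stuff I Like"
touch "$base/Stuff I Like/note.txt"

# Backslash form: each space is escaped individually.
escaped_listing=$(ls "$base"/Stuff\ I\ Like/)

# Quoted form again: same directory, same result.
quoted_listing=$(ls "$base/Stuff I Like/")

echo "$escaped_listing"
echo "$quoted_listing"
```

Both listings show note.txt, because the two spellings name the same directory.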
Jason Schroeder
September 16, 2009 at 7:54pm
Hello!
First, cron is a great utility, but you have to be careful when using it. It will faithfully run scripts containing things like rsync on schedule. However, the one major danger with a volume-to-volume rsync is that a scheduled run may fire while the target volume is offline.
In that event, OS X will copy your data to /localHD/Volumes/target/. This means that if your source data is larger than your internal hard drive, you will fill the drive up and your system will most likely crash. I suggest you write, or have someone help you write, a script that checks whether the volume is actually there before running the rsync.
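One way such a check could look is sketched below. It is an illustration under stated assumptions, not a tested backup script: the target volume name /Volumes/Backup and the helper is_mounted are hypothetical, the source path is the article's placeholder, and the check relies on df -P printing the mount point in the last column.

```shell
#!/bin/sh
# Placeholder paths: the source is the article's example, the target is hypothetical.
source_dir="/Volumes/Macintosh HD/Users/Joe/Documents/Data"
target_volume="/Volumes/Backup"

# Succeeds only if the given path (no trailing slash) is itself a mount point.
# df -P prints the mount point of the filesystem holding the path in the last
# column. If the volume is offline, df either fails or reports a different
# mount point, such as "/".
is_mounted() {
    mount_point=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/^.*% +/, ""); print }')
    [ "$mount_point" = "$1" ]
}

if is_mounted "$target_volume"; then
    rsync -ax "$source_dir/" "$target_volume/Data/"
else
    echo "Skipping backup: $target_volume is not mounted" >&2
fi
```

Testing for a mount point rather than for the directory alone matters here, because a stray /Volumes/Backup folder left on the internal disk would pass a plain existence check.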
Second, regarding the "-E" error: permissions with filesystems are reaching a synergy where ACLs and POSIX can work in a strange random harmony in both Active Directory and Open Directory environments. It isn't perfect, and until it is, you will *always* see changes in the way variables behave in the application. As OS X has been tuned and improved over the years, certain functions have changed. Some may be added. If filesystems like ZFS become popular, you will probably even see added variables for stuff like snapshots.
Apple's biggest leaps in the way they stored resource forks came in 10.4, when special utilities or commands were phased out and a regular copy brought everything with it. It is always advised to run tests and use the "-n" flag as much as possible. It will give you a clear idea of what to expect without actually committing.
Hope this helps!
fotmasta
August 27, 2009 at 7:03am
rsync version 2.6.0 protocol version 27
The -E option is in the man page, but throws an error when invoked.
'rsync: -E: unknown option'
JBracy
August 12, 2009 at 8:48am
The -E option includes Mac-specific file data, including resource forks. Not so important for jpeg files and such, but vital for other types of files. Do a "man rsync" in the terminal for more info.
allenwatson
May 27, 2009 at 9:07am
Another Unix utility, cron, schedules things to run at given intervals. And a free GUI utility, cronnix, will help you schedule cron runs for any command line or script. Just type the rsync command into cronnix and set the intervals for running it.
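For illustration, a crontab entry for a weekly run might look like the sketch below. The schedule (Sundays at 02:00) and the /Volumes/Backup target are assumptions chosen for the example; the source path reuses the article's placeholder. The -v and --progress options are left out because no one is watching a terminal.

```text
# minute  hour  day-of-month  month  day-of-week  command
0 2 * * 0 rsync -ax /Volumes/Macintosh\ HD/Users/Joe/Documents/Data/ /Volumes/Backup/Data/
```

An entry like this can be added with "crontab -e" or typed into a GUI front end.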
rickpdx
May 26, 2009 at 9:00am
This is great! How do I create something automated so I can run this once a week to back up my entire hard drive, or better yet, create this to run automatically once a week?
















