Transferring files to and from the clusters¶
This section is about transferring data between your computer and a cluster. To transfer data between clusters, see the relevant section in Using the common filesystem.
These examples apply only to Linux and macOS computers, or to Windows computers running WSL. The commands are executed on your computer.
Your SSH client configuration file must be set up as explained in the Connecting from a UNIX/Linux or MacOS computer section. Replace cecicluster with the Host name of the cluster defined in your .ssh/config file.
Copying a file or directory¶
The simplest way to copy a file to or from a cluster is to use the scp command.
scp ./file.txt cecicluster:destination/path/
Copying it back is done with
scp cecicluster:path/to/file.txt .
If you want to copy a directory and its contents, use the -r option, just like with cp.
scp -r cecicluster:path/to/folder .
Transferring a large number of small files¶
Transferring a lot of small files takes a very long time with scp because of the overhead of copying every file individually. In such cases, using the tar command reduces the transfer time significantly. You can first create a tar archive, then scp it as a single file, and then 'untar' the file on the other side. The most efficient way, however, is to do all three operations in one go, without creating an intermediate file, like this:
tar cz ./source_dir | ssh cecicluster 'tar xvz -C destination/path'
This packs the small files into a single compressed stream that is sent over the SSH connection and unpacked on the cluster, which removes the overhead of handling many small files individually.
Copying the directory back from the cluster to your computer is done with:
ssh cecicluster 'tar -C source/path -cz source_dir' | tar -xz
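The three-step variant mentioned above (archive, copy, unpack) can still be convenient when you want to keep the archive around or retry a single step. The following is a minimal sketch under stated assumptions: the helper name push_as_archive is hypothetical and not part of any tool, and it relies only on standard tar, scp and ssh options.

```shell
# Hypothetical helper: pack, copy and unpack in three explicit steps.
# Usage: push_as_archive SOURCE_DIR HOST DEST_PATH
push_as_archive() {
    src_dir=$1 host=$2 dest=$3
    name=$(basename "$src_dir")
    archive="$name.tar.gz"
    # 1. Create a compressed archive in the current directory.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$name" || return 1
    # 2. Copy the archive to the cluster as a single file.
    scp "$archive" "$host:$dest/" || return 1
    # 3. Unpack it on the cluster, then remove the remote archive.
    ssh "$host" "tar -xzf '$dest/$archive' -C '$dest' && rm '$dest/$archive'" || return 1
    # Clean up the local archive once everything succeeded.
    rm "$archive"
}
```

It would be called as push_as_archive ./source_dir cecicluster destination/path, where destination/path is assumed to exist on the cluster already.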
Transferring large files¶
When transferring large files, it is often worthwhile to use the -C option of scp, which compresses the data while it is sent over the connection and decompresses it on arrival. Use it like this:
scp -C ./large_file.txt cecicluster:destination/path/
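Compression only pays off when the data is compressible; files in an already compressed format gain little from it. A rough local check is sketched below, assuming that gzip is a reasonable stand-in for the compression applied by the SSH connection. The function name compressed_percent is hypothetical.

```shell
# Print the gzip-compressed size of FILE as a percentage of its original size.
# A low value suggests that scp -C is likely to help; a value near 100 suggests it is not.
compressed_percent() {
    # $(( ... )) strips the padding some wc implementations add.
    original=$(( $(wc -c < "$1") ))
    compressed=$(( $(gzip -c "$1" | wc -c) ))
    [ "$original" -gt 0 ] || return 1
    echo $(( compressed * 100 / original ))
}
```

For instance, compressed_percent ./large_file.txt prints a single integer that can be inspected before choosing whether to add -C.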
Resuming interrupted transfers¶
If, for any reason, a transfer is interrupted, you might end up with only part of the files transferred. Rather than restarting the transfer from scratch, you should then use the rsync command.
The rsync command compares the source and destination directories and only transfers what needs to be transferred: missing files, modified files, etc.
Use it this way (assuming again that your SSH client is properly configured):
rsync -va ./source_dir cecicluster:destination/path
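Pay attention to trailing slashes in your path names. With rsync, a source written as ./source_dir/ means "the contents of source_dir" rather than the directory itself, so a mismatch between the way the source and the destination are written can leave you with a second full copy, for instance inside the existing, partial, one. The simplest rule is to use the same paths as in the original transfer, without trailing slashes. Use the -n (dry-run) option of rsync to check what will happen before you run the actual command.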
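The effect of a trailing slash and of the dry-run option can be explored safely on local directories before touching the cluster. In this sketch all paths live in a temporary directory created for the purpose; the directory names are placeholders.

```shell
# Local sandbox: source and destinations are all directories on this machine.
demo=$(mktemp -d)
mkdir -p "$demo/source_dir" "$demo/with_name" "$demo/contents_only"
echo data > "$demo/source_dir/file.txt"

# Dry run first: -n lists what would be transferred without copying anything.
rsync -van "$demo/source_dir" "$demo/with_name"

# No trailing slash on the source: the directory itself is copied,
# giving with_name/source_dir/file.txt.
rsync -a "$demo/source_dir" "$demo/with_name"

# Trailing slash on the source: only the contents are copied,
# giving contents_only/file.txt.
rsync -a "$demo/source_dir/" "$demo/contents_only"
```

The same behaviour applies when the destination is written as cecicluster:destination/path.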
If one large file is left half-transferred, you can resume it using the --partial option of rsync, which keeps partially transferred files so that a later run can continue from them.
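On an unreliable connection, the resume step can be repeated automatically. The wrapper below is only a sketch: the name resume_transfer, the default of five attempts and the RETRY_DELAY variable are assumptions made for illustration, while -v, -a and --partial are standard rsync options.

```shell
# Hypothetical wrapper: rerun rsync until it succeeds or the attempts run out.
# Usage: resume_transfer SOURCE DESTINATION [MAX_ATTEMPTS]
resume_transfer() {
    src=$1 dest=$2 tries=${3:-5}
    n=1
    while [ "$n" -le "$tries" ]; do
        # --partial keeps half-transferred files so the next attempt can reuse them.
        rsync -va --partial "$src" "$dest" && return 0
        echo "attempt $n failed, retrying" >&2
        n=$(( n + 1 ))
        # Pause between attempts; RETRY_DELAY is in seconds (default 5).
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}
```

A call such as resume_transfer ./large_file.txt cecicluster:destination/path would then retry up to five times.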
Transferring code¶
Source code is a specific type of data and should be treated as such. The best way to transfer code from one computer to another is to host the code in a source code repository using a versioning system such as git (more common) or mercurial (easier to use) and clone the repository from your laptop to the cluster.
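The workflow can be tried out locally before involving a hosting service. In the sketch below, a bare repository in a temporary directory stands in for the hosted repository, and the laptop and cluster directories stand in for the two machines. All paths, the file name and the identity used for the commit are placeholders.

```shell
work=$(mktemp -d)

# Stand-in for the hosted repository (normally on a Git hosting service).
git init -q --bare "$work/project.git"

# "Laptop": clone the repository, add a file, commit and push.
git clone -q "$work/project.git" "$work/laptop"
echo 'echo hello' > "$work/laptop/run.sh"
git -C "$work/laptop" add run.sh
git -C "$work/laptop" -c user.name="Example" -c user.email="example@example.org" \
    commit -q -m "Add run script"
git -C "$work/laptop" push -q origin HEAD

# "Cluster": clone the same repository to obtain the code.
git clone -q "$work/project.git" "$work/cluster"
```

On a real setup, the last command would be run on the cluster with the address of the hosted repository, and later updates would be fetched there with git pull.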
Synchronising with a local directory¶
If you want to keep two directories (one on your laptop, and one on the cluster) in sync, you can do that with rsync using its --delete option, which removes files from the destination that no longer exist in the source. But that is only one-way, so you need to think carefully about the direction in which you run it, and it does not scale beyond two synchronised directories.
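Because --delete removes files, it is worth rehearsing on local directories and previewing with -n first. In this sketch the directory names are placeholders in a temporary sandbox, and the trailing slash on the source is deliberate: the contents of local_dir are mirrored into remote_dir.

```shell
sync_demo=$(mktemp -d)
mkdir -p "$sync_demo/local_dir" "$sync_demo/remote_dir"
echo keep  > "$sync_demo/local_dir/keep.txt"
echo stale > "$sync_demo/remote_dir/stale.txt"

# Preview: with -n nothing is changed; files that would be removed are listed.
rsync -van --delete "$sync_demo/local_dir/" "$sync_demo/remote_dir/"

# Actual run: remote_dir becomes a mirror of local_dir, so stale.txt is removed.
rsync -a --delete "$sync_demo/local_dir/" "$sync_demo/remote_dir/"
```

Swapping the two arguments reverses the direction, in which case keep.txt would be the file deleted from local_dir.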
A more robust option is to use Unison, a piece of software that can detect and handle conflicts (incompatible changes made to the same file in the two directories that must be kept in sync).