Monday, March 25, 2013

QT007: FTP Polling Considerations

Quick Tip #007: FTP Polling Considerations

FTP dates back to the 1970s, its TCP-based form has been around since 1980, and it remains a popular protocol for integration.  The protocol is simple and widely adopted.  There are free open source implementations of the client and server for many platforms, and the protocol is supported by major integration platforms such as IBM Cast Iron, Dell Boomi, Informatica, and many others.

Active vs Passive FTP

There are two modes in which connections are established in FTP: Active and Passive.  In the original protocol, which is now called Active mode, the client establishes a control connection to the server and uses the PORT command to tell the server which client address and port to connect to when opening a data connection to transfer files.  This requires the client to be directly addressable by the server and therefore causes problems if the client is behind a firewall.  Active FTP can still be used from behind a firewall, with certain considerations.  If the client has a public IP address, or one that the server can otherwise reach, you simply need to open a port on the firewall for the data connection and tell the client to pass that port when issuing the PORT command.  If the client has a private IP address and your firewall uses Network Address Translation (NAT), the firewall may have a feature that enables active FTP by rewriting the PORT command and forwarding the data connection.  If it does not, then the client and server need to support Passive mode.  In passive FTP, instead of issuing a PORT command, the client issues a PASV command.  The server replies with an address and port, and the client initiates the data connection to it, so both connections are opened from the client side rather than the server side.
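To make the difference concrete, here is a minimal Python sketch of the two pieces of data that are exchanged: the argument a client sends with PORT in active mode, and the address a server returns in reply to PASV in passive mode.  Both use six comma-separated numbers, four for the IP address and two for the port (high byte, then low byte).  The function names and the sample address are illustrative assumptions, not part of any particular client.

```python
import re


def port_argument(ip, port):
    """Build the argument of an active-mode PORT command: h1,h2,h3,h4,p1,p2."""
    octets = ip.split(".")
    # The 16-bit port is sent as two decimal bytes: high byte, then low byte.
    return ",".join(octets + [str(port // 256), str(port % 256)])


def parse_pasv_reply(reply):
    """Extract (ip, port) from a passive-mode reply such as
    '227 Entering Passive Mode (192,168,1,10,19,137)'."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError("not a PASV reply: " + reply)
    numbers = [int(n) for n in match.groups()]
    ip = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return ip, port


# Active mode: the client asks the server to connect back to 192.168.1.10:5001.
print("PORT " + port_argument("192.168.1.10", 5001))  # PORT 192,168,1,10,19,137

# Passive mode: the server tells the client where to connect.
print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,19,137)"))
```

In practice a client library does this for you.  In Python's standard ftplib, for instance, `FTP.set_pasv(True)` or `FTP.set_pasv(False)` selects the mode.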

Security Concerns

Basic FTP does not use any form of encryption, even for FTP passwords, and therefore is not suitable when sensitive information is being transferred over public networks.  There are a couple of protocols that deal with this problem, and both are in wide use today.

FTPS

FTPS is the secure implementation of the File Transfer Protocol.  It carries the entire set of FTP commands over a connection secured with SSL/TLS.  Again there are two modes, Implicit and Explicit.  Implicit FTPS is now deprecated.  As the name implies, security is assumed from the start: the client and server negotiate an SSL/TLS connection before any commands can be executed, so all traffic is encrypted.  Explicit FTPS is the currently supported standard and allows one server to provide both FTP and FTPS.  The client requests a secure connection by sending the AUTH SSL or AUTH TLS command, or skips AUTH and logs in as usual over an unencrypted connection.  Users with access to sensitive data should be required to issue AUTH SSL or AUTH TLS and should be rejected if they do not.
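As an illustration of Explicit FTPS from the client side, the following is a minimal sketch using `FTP_TLS` from Python's standard ftplib.  The function name, its parameters, and the `client_factory` argument (there only so the function can be exercised without a real server) are assumptions made for this example.

```python
from ftplib import FTP_TLS


def download_over_explicit_ftps(host, user, password, remote_name, local_path,
                                client_factory=FTP_TLS):
    """Download one file over Explicit FTPS (a sketch, not a hardened client)."""
    # Opens an ordinary, unencrypted control connection first.
    ftps = client_factory(host)
    try:
        # ftplib's FTP_TLS.login() secures the control connection with
        # AUTH TLS before it sends the user name and password.
        ftps.login(user, password)
        # Sends PBSZ 0 and PROT P so the data connection is encrypted too.
        ftps.prot_p()
        with open(local_path, "wb") as out:
            ftps.retrbinary("RETR " + remote_name, out.write)
    finally:
        ftps.close()
```

A call might look like `download_over_explicit_ftps("ftp.example.com", "user", "secret", "my-file.csv", "my-file.csv")`, where the host and credentials are placeholders.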

sFTP

sFTP is the SSH File Transfer Protocol.  It is not actually based on FTP, but it offers a very similar set of commands, and for most purposes it works much the same way from the user's perspective.  The protocol uses the same security standards as SSH and is widely available on Unix platforms because most SSH servers implement it.

Avoiding Partial File Transfers

In FTP there is no standard way to lock a file to indicate that a transfer is in progress.  Many clients have unobtrusive ways of avoiding transferring a partial file.  IBM Cast Iron, for example, will check the file size before and after the transfer to see if it has changed.  If the file size changes during the transfer, the client knows that the file was still being uploaded while it was being downloaded and that it may not have received the entire file.  In that case it will restart the transfer and repeat the process until it receives the entire file without the file size changing.  This system works, but it is not foolproof: there really is no implicit way to know that the uploader is done with the file before it is downloaded.

There are, however, several easy ways to avoid this problem by having the uploader take specific action to indicate that the file is ready for download.

The first way is to rename the file after transfer.  If you are uploading a file called my-file.csv, you can upload it as my-file.tmp and then rename it to my-file.csv once it has been uploaded completely.  As long as the downloader ignores .tmp files, this ensures that the entire file is in place before you try to download it.

Another solution is to use a control file.  A control file is a separate file, uploaded after the data, that tells the client which files are ready to be downloaded.  It may also include processing instructions, such as what encoding was used for the file.

A third option is to use a checksum file.  By uploading a cryptographic checksum file along with the file to be transferred, you can verify not only that the file was transferred completely but also that it has not been corrupted.
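The rename and checksum approaches can be sketched in Python roughly as follows.  This is only an illustration under stated assumptions: `ftp` is any connected client object with ftplib-style `storbinary` and `rename` methods, the checksum is SHA-256, and the checksum file holds the hex digest as its first whitespace-separated field (the layout written by tools such as sha256sum).  The function names are hypothetical.

```python
import hashlib
import posixpath


def upload_then_rename(ftp, local_path, final_name):
    """Uploader side: store under a .tmp name, then rename when complete."""
    stem, _ext = posixpath.splitext(final_name)
    temp_name = stem + ".tmp"  # my-file.csv is uploaded as my-file.tmp
    with open(local_path, "rb") as source:
        ftp.storbinary("STOR " + temp_name, source)
    # The final name only appears once the whole file is on the server.
    ftp.rename(temp_name, final_name)
    return temp_name


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(data_path, checksum_path):
    """Downloader side: compare a downloaded file with its checksum file."""
    with open(checksum_path, "r", encoding="ascii") as handle:
        fields = handle.read().split()
    if not fields:
        return False  # an empty checksum file cannot confirm anything
    return sha256_of(data_path) == fields[0].lower()
```

A downloader that polls the directory would then skip any name ending in .tmp and, if a checksum file is used, process a file only when `matches_checksum` returns True for the downloaded copy.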
