
AlBlue’s Blog

Macs, Modularity and More

Why do people still use FTP?

2009

Every now and again, I stumble upon a website that uses FTP to power its downloads. I really can't understand why, in this day and age, people are still using FTP to make data available.

Firstly, FTP is a particularly unfriendly protocol when it comes to accessing data. It opens up parallel connections: a 'command' channel and a 'data' channel, both over TCP. However, whilst the command channel lives on the well-known port 21 (with port 20 reserved for active-mode data), in the passive mode that most clients use today the data connection comes in on a randomly-assigned port that's negotiated over the command channel. That means firewalls (which are a necessity in this day and age) either end up blocking FTP or have to do some serious deep packet inspection to try and open up the data connection coming back in again. In fact, the only reason that FTP works via NAT is that the NAT device 'knows' about the way FTP works, and performs a juggling act to hook it together.
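To make the problem concrete, here's a rough Python sketch of what a passive-mode client has to do, against a hypothetical ftp.example.com (a real client would handle multi-line replies and error codes properly): talk to port 21, then open a second connection to whatever port the server happens to announce.

```python
import re
import socket

HOST = 'ftp.example.com'                 # hypothetical server, for illustration

# The 'command' channel is a TCP connection to the well-known port 21.
ctrl = socket.create_connection((HOST, 21))
rfile = ctrl.makefile('r', newline='\r\n')

def send(cmd):
    ctrl.sendall((cmd + '\r\n').encode('ascii'))
    return rfile.readline().strip()

print(rfile.readline().strip())          # 220 greeting
send('USER anonymous')
send('PASS mozilla@example.com')

# PASV: the server picks an arbitrary high port and announces it, e.g.
#   227 Entering Passive Mode (192,0,2,10,195,149)
pasv = send('PASV')
h1, h2, h3, h4, p1, p2 = map(int, re.search(r'\((.+)\)', pasv).group(1).split(','))
data_port = p1 * 256 + p2                # 195*256+149 = 50069: not a fixed port

# The 'data' channel is a *second* TCP connection, to whatever port was
# negotiated above - the connection a firewall can't predict without
# inspecting the command traffic.
data = socket.create_connection((f'{h1}.{h2}.{h3}.{h4}', data_port))
send('RETR somefile.zip')                # hypothetical file name
payload = data.makefile('rb').read()
```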

FTP is an anachronistic protocol that has long since been superseded by HTTP. For a start, the concept of password-protected FTP sites (where the password is transmitted in clear text) is pointless in this day and age; and even now, most sites/browsers use an anonymous login for downloading files. The anonymous login - where you transmit your e-mail address as the password - has also become less worthwhile, especially given the volume of spam these days, to the extent that browsers fill in a dummy value (mozilla@example.com) instead of providing a real e-mail address.

Although extensions exist (like SFTP*, FTP over SSH, FTP over Socks over SSH), all of these are trying to work around the obvious hole that occurs when the data isn't protected.

* SFTP isn't actually based on FTP at all, but is rather the SSH file transfer protocol - thanks to Brett for highlighting this in the comments.

When HTTP was first released, it was able to compete with FTP on almost every level. The only thing that HTTP/1.0 didn't support was the concept of resumable downloads. A resumable download lets a client that is disconnected partway through the process carry on from where it left off, rather than starting again from scratch. Although optional, most FTP servers and clients support this feature. It wasn't until HTTP/1.1, with its Range request header (and the associated Accept-Ranges and Content-Range response headers), which allow ranges of bytes to be requested, that resumable HTTP transfers were possible. Although technically more flexible than FTP (allowing arbitrary ranges of data to be acquired rather than just a starting point through to the end), the practical effect is that it's possible to resume the download of files if the client loses the connection to the server. (Some 'download accelerator' clients work by initiating multiple requests for non-overlapping data ranges instead of one request to handle it all.) It would even be possible to support a BitTorrent-like 'chunking' of data over HTTP ranges with the HTTP servers that exist today (and you could even use HTTP redirects to point at other servers that might hold a different chunk ...)
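Here's a rough Python sketch of how that resumption looks from the client's side (against a hypothetical URL): ask only for the bytes you don't already have, and check whether the server answered with 206 Partial Content.

```python
import os
import urllib.request

URL = 'http://example.com/big-download.iso'    # hypothetical URL
LOCAL = 'big-download.iso'

# How much did we manage to fetch before the connection dropped?
have = os.path.getsize(LOCAL) if os.path.exists(LOCAL) else 0

req = urllib.request.Request(URL)
if have:
    # Ask only for the remainder; a server that honours ranges replies with
    # "206 Partial Content" and a Content-Range header.
    req.add_header('Range', f'bytes={have}-')

with urllib.request.urlopen(req) as resp:
    mode = 'ab' if resp.status == 206 else 'wb'    # append on partial, restart on 200
    with open(LOCAL, mode) as out:
        while chunk := resp.read(64 * 1024):
            out.write(chunk)
```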

HTTP also has many other advantages over FTP for pure downloads. FTP has always been a bit bipolar in that it sees the world as either 'binary' or 'ASCII' - not least because the world has since moved on from ASCII to Unicode. But realistically, an FTP client and server know nothing of the type of the file being sent back, and it's up to the client to guess the type based on the extension. HTTP, on the other hand, can positively identify the file type as part of the response (and what's sent back doesn't have to be a file on the hard disk, either).
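To sketch the difference in Python (with a hypothetical URL): guessing from the extension is essentially all an FTP client can do, whereas the HTTP response states the type explicitly.

```python
import mimetypes
import urllib.request

URL = 'http://example.com/releases/app-1.0.zip'    # hypothetical URL

# What an FTP client is reduced to: guessing from the file extension.
print(mimetypes.guess_type(URL)[0])                # 'application/zip' - a guess

# What HTTP gives you: the server states the type in the response itself,
# whether or not the resource is actually a file on disk.
with urllib.request.urlopen(URL) as resp:
    print(resp.headers.get('Content-Type'))
```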

Further, HTTP provides a number of additional benefits - automatic compression of data, finding out about the type of a file without downloading it, proxy support - not to mention the ability to run over SSL with HTTPS - which aren't possible in vanilla FTP at all.
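For instance, here's a rough sketch (again Python, with a hypothetical host and paths) of a HEAD request and of negotiating gzip compression over SSL - none of which has an equivalent in vanilla FTP.

```python
import http.client

conn = http.client.HTTPSConnection('example.com')    # hypothetical host, over SSL

# HEAD: find out about the file without downloading it.
conn.request('HEAD', '/releases/app-1.0.zip')
head = conn.getresponse()
head.read()                                           # no body; frees the connection
print(head.getheader('Content-Type'), head.getheader('Content-Length'))

# GET with compression negotiated; a willing server answers with
# "Content-Encoding: gzip" and a smaller body (decompress before use).
conn.request('GET', '/releases/notes.txt', headers={'Accept-Encoding': 'gzip'})
resp = conn.getresponse()
print(resp.getheader('Content-Encoding'))
body = resp.read()
conn.close()
```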

So, why do companies still insist on setting up and making data available via FTP? Well, sometimes the company has been around for a while (like before Windows 3.1) and they may just have an antiquated system that they can't (or won't) update. But more often than not, it's ignorance; the assumption that HTTP is used for the web and FTP is used for downloads. Many of these servers in fact support HTTP as well - just changing the protocol from ftp: to http: is enough to convince the server to use this century's mechanisms.

FTP is of course used for more than just downloads. One aspect is the ability to navigate through a repository, and of course to be able to upload files as well. But both of these uses are far less common than plain data acquisition across the web. There are also standardisation problems - the format of the output generated by 'dir' or 'ls' commands can vary between servers, so clients have to screen-scrape the text to present a list of files.

All of this is possible with HTTP too - WebDAV is a set of additional HTTP methods, and so works within the same framework: proxying, encryption, type identification, partial-range requests and so on. Although WebDAV can do a lot more (versioning, in particular), it's perfectly possible to use WebDAV just to connect to a remote server, list the contents of its directories and then download the files. And the nice thing about HTTP is that should another location be more appropriate (or the data be hosted on a different server), the protocol has built-in redirection to allow the server to tell the client to go somewhere else.
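As a rough sketch (Python standard library only, against a hypothetical dav.example.com), here's what listing a collection looks like - PROPFIND is just another HTTP method, and the reply is structured XML rather than text to screen-scrape.

```python
import http.client
import xml.etree.ElementTree as ET

conn = http.client.HTTPSConnection('dav.example.com')    # hypothetical server

# PROPFIND with Depth: 1 asks for the immediate children of a collection;
# being plain HTTP, it still gets proxying, TLS and redirects for free.
body = '<?xml version="1.0"?><propfind xmlns="DAV:"><allprop/></propfind>'
conn.request('PROPFIND', '/releases/', body=body,
             headers={'Depth': '1', 'Content-Type': 'application/xml'})
resp = conn.getresponse()                                 # expect "207 Multi-Status"

# The reply is structured XML, not a screen-scraped 'ls' listing.
tree = ET.fromstring(resp.read())
for href in tree.iter('{DAV:}href'):
    print(href.text)
conn.close()
```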

I suspect one of the reasons WebDAV isn't as widely used as FTP is historical inertia. That, and the fact that command-line programs for WebDAV aren't that common (when in fact most OSs can mount WebDAV drives far more efficiently than FTP clients can navigate a remote hierarchy). It's also harder to sell a CuteWebDAV or FastWebDAV that brings a GUI to the experience, the way FTP clients do, since you can just mount a WebDAV drive and traverse it with your favourite file manager.

Still, it won't be long before there are command line tools and WebDAV GUI clients, and when that happens perhaps FTP can finally be laid to rest.