You've come to this page because you've asserted something similar to the following on Wikipedia:
Most common web browsers can retrieve files hosted on FTP servers
This is the Frequently Given Answer to that claim, which turns out to be based solely upon a one-sentence off-hand and unspecific remark in a book.
In fact, many common WWW browsers either have no support for FTP at all or have one or more fairly basic and egregious problems relating to FTP that mean that they cannot retrieve files hosted on FTP servers.
Thinking that SIZE is mandatory.
Google Chrome thinks that the SIZE verb is mandatory, and performs a SIZE / (or whatever path it wants in place of /) immediately upon login.
If an FTP server responds 502 ("command not implemented") to that verb, Google Chrome quits the entire FTP session and fails to retrieve the URL.
In fact SIZE is optional (per the IANA FTP command registry), and 502 is a perfectly legitimate response.
Not only is SIZE defined by an RFC (3659) whose very title is "Extensions to FTP", but that same RFC explains (in § 4.3) how an FTP client uses the FEAT command to determine that the SIZE extension is a supported feature in the first place.
Google Chrome does not even issue the FEAT command.
(The RFC pre-dates the existence of Google Chrome by a year and a half.)
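What treating SIZE as the optional extension that it is might look like can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reply strings are handed in by the caller rather than read from a real control connection, and the policy of "carry on without a size" is one reasonable choice, not code from any browser.

```python
from typing import Optional


def feat_advertises(feat_reply: str, feature: str) -> bool:
    """Return True if a multi-line 211 FEAT reply lists `feature`.

    Each feature sits on its own line between the first and last
    lines of the reply, indented by a space.
    """
    for line in feat_reply.splitlines()[1:-1]:
        words = line.strip().split()
        if words and words[0].upper() == feature.upper():
            return True
    return False


def size_from_reply(reply: str) -> Optional[int]:
    """Interpret the reply to SIZE without treating failure as fatal.

    A 213 reply carries the size. Anything else, 502 included, only
    means that the size is unknown; the session can carry on.
    """
    code, _, text = reply.strip().partition(" ")
    if code == "213" and text.strip().isdigit():
        return int(text.strip())
    return None
```

A client built this way would ask FEAT first, skip SIZE where it is not advertised, and in either case go on to retrieve the file when no size is available.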
As Daniel J. Bernstein observed, and as noted in RFC 1123, the FTP specification failed to adequately describe the 227 response to the PASV verb, even though it was supposed to be machine-readable.
Worse, it gave one example response in § 4.2.1 in one form and another example response in § 5.2 in another form.
Bernstein's original FTP server from his publicfile package follows Bernstein's suggestion of a simplified § 5.2 form that contains only the IP address and port numbers.
Bernstein's suggestion incorporates one bodge, an extra = character, to work around a bug in one of Mozilla Firefox's predecessors.
Mozilla Firefox adds another bug, unfortunately.
In practice, Mozilla Firefox accepts only the § 4.2.1 form. In particular, it looks for an initial comma followed by 6 comma-separated numbers, then looks for brackets surrounding all of the numbers, and thus fails when talking to a Bernstein FTP server. This violates RFC 1123 § 4.1.2.6, which states that
an FTP client cannot assume that the parentheses […] will be present [… and] must scan the reply for the first digit of the host and port numbers
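The scanning approach that RFC 1123 asks for is short to write. The following Python sketch is illustrative only: the function name and the error handling are assumptions, and it works on a reply string supplied by the caller.

```python
import re
from typing import Tuple

# Six comma-separated decimal numbers: h1,h2,h3,h4,p1,p2.
_SIX_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from a 227 reply, with or without brackets."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply")
    # Skip the reply code itself, then scan forward for the numbers,
    # ignoring whatever text or punctuation surrounds them.
    match = _SIX_NUMBERS.search(reply, 3)
    if match is None:
        raise ValueError("no host and port numbers in reply")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    if max(h1, h2, h3, h4, p1, p2) > 255:
        raise ValueError("number out of range")
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2
```

Because it never looks for brackets, the same function accepts both the bracketed form and the simplified form with the extra = character.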
Firefox's failure mode, moreover, is to present the response from the preceding command verb to the user in an error dialogue box.
Usually, because the last command that Firefox will have issued in these circumstances is a TYPE I command, this results in the user seeing a very confusing "200 Okay, using binary." error message dialogue box when attempting to access FTP sites.
And of course, Firefox aborts the entire FTP session and fails to retrieve the URL.
Relying upon LIST being vulnerable to a command-injection attack.
RFC 959 designates the optional argument to the LIST command to be a pathname of either a file or a directory.
Google Chrome sends a LIST -l command to retrieve the contents of a directory.
What Google Chrome is relying upon is the FTP server being vulnerable to a command-injection attack.
It is assuming that the FTP server runs a POSIX-compatible ls command to implement the LIST FTP verb, and it is assuming that everything in the FTP verb is passed as-is to the ls command, without any sanitization to prevent the FTP client from injecting things of its own devising into the ls command being executed.
In other words, it is using a command-injection vulnerability to manipulate how the FTP server runs an external command on the server machine.
Several FTP servers do not rely upon an external ls command in the first place; and even of those that do, inserting a -- option before any attacker-supplied arguments read from the network is a best common practice.
In such cases, Google Chrome ends up asking for a listing of a file or directory with the pathname "-l", rather than tricking an ls command via a command-injection vulnerability to produce a different kind of listing.
And of course, when it receives a quite proper 550 response saying that there is no such pathname as "-l" in the current directory, Google Chrome quits the entire FTP session, fails to retrieve the URL, and announces to the end-user that the FTP server is down.
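The -- practice described above can be illustrated from the server's side. This Python sketch is hypothetical: the function name is invented, and the fixed -l option stands for whatever listing style the server itself has chosen. It only builds the argument vector; it does not run anything.

```python
from typing import List


def ls_argv(list_argument: str) -> List[str]:
    """Build the argument vector for an external ls serving LIST.

    Everything after "--" is an operand, so a client-supplied "-l"
    is looked up as a pathname rather than parsed as an option.
    """
    argv = ["ls", "-l", "--"]  # "-l" here is the server's own choice
    if list_argument:
        argv.append(list_argument)
    return argv
```

With this in place, LIST -l asks ls for a file named "-l", which normally does not exist, and the server quite properly reports that.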
Mozilla Firefox understands the Easily Parsed LIST Format, which has been around since 1996, and is in fairly widespread use nowadays.
RFC 3659's MLST and MLSD mechanisms are close to non-existent in the real world outwith the RFC, in contrast.
Google Chrome does not understand either one, in any case.
Google Chrome instead tries to work out what FTP server program it is talking to, in order to make guesses as to the format of the results of the LIST command.
(Ironically, it doesn't guess at the FTP server program when it blithely assumes that all FTP server programs have a command-injection vulnerability.)
It wrongly guesses that all of the FTP server programs for an operating system will employ exactly the same format, and thus bases its guesses on the result of the SYST command.
This is patently not the case, as there is a wide range of FTP server programs for many operating systems, including, not least, some that are ports from one operating system to another and that endeavour to take their LIST response semantics with them.
Worse still, it wrongly guesses what the results of the SYST command mean, despite the fact that there is an IANA registry listing them, one that is even explicitly referenced, in its prior incarnation as RFC 943, by RFC 959 § 4.1.3.
The FEAT command would have been the right way to query this feature of an FTP server program, had that been around in 1996.
Nonetheless, it is a good idea to advertise EPLF as an "EPLF" feature, just in case the world decides finally to eliminate the guessing and make EPLF a named mechanism that servers can tell clients that they speak.
Ironically, a better guessing system, with the world still valiantly clinging on to the idea of having guessing systems for this (more than 20 years after EPLF was invented), has no need to know the operating system (and thus implicitly copes with FTP server programs that have been ported across operating systems) or to use the FEAT command.
A fairly reasonable guess as to the format can be made from nearly just the first character of each response line.
Anything beginning with + is an EPLF line;
anything beginning with -, d, l, b, c, p, or s is something attempting to resemble the output of the UNIX ls -l command;
anything beginning with a digit from 0 to 9 is something attempting to resemble the output of the Microsoft/IBM command interpreter's dir command; and
anything with a ; in the first "word" is something attempting to resemble the output of MultiNet for VMS.
This guessing system was invented almost a decade before the existence of Google Chrome.
Daniel J. Bernstein's ftpparse library is an implementation of it, for starters.
Instead, Google Chrome displays blank directory listings.
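The first-character rules listed above fit in a dozen lines. The following Python sketch is only an illustration of those rules, not a port of ftpparse; the function name and the returned labels are made up for the example, and the rules are applied in the order in which they are given above.

```python
def guess_list_format(line: str) -> str:
    """Guess the format of one line of LIST output."""
    if not line:
        return "unknown"
    first = line[0]
    if first == "+":
        return "eplf"
    if first in "-dlbcps":
        return "unix-ls"
    if first in "0123456789":
        return "dos-dir"
    # MultiNet for VMS: the first "word" carries a ";version" suffix.
    words = line.split(None, 1)
    if words and ";" in words[0]:
        return "multinet-vms"
    return "unknown"
```

Note that nothing here consults SYST or FEAT: the guess is made per line, so it works the same whichever operating system the server happens to run on.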
RFC 1738 explains Common Internet Scheme (CIS) URLs, and in § 3.2.2 explains how the path part of an ftp: scheme URL is translated.
This translates from the CIS URL worldview, where the idea of a hierarchical path exists, to the Network Virtual File System concept of FTP, where that is not necessarily the case.
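As a rough illustration of that translation, here is a Python sketch of the RFC 1738 § 3.2.2 rules: one CWD per path component, each component decoded only after the path has been split, then either NLST (for type=d) or TYPE followed by RETR. The function name is hypothetical, and defaulting to type i when the URL gives no type code is an assumption of this sketch; the RFC leaves that guess to the client.

```python
from typing import List
from urllib.parse import unquote


def rfc1738_commands(url_path: str) -> List[str]:
    """Translate the path part of an ftp: URL (the text after host/)."""
    path, _, typecode = url_path.partition(";type=")
    # Split on "/" BEFORE decoding, so that an encoded %2F ends up
    # inside a component instead of acting as a separator.
    *cwd_parts, name = path.split("/")
    commands = [f"CWD {unquote(part)}" for part in cwd_parts]
    name = unquote(name)
    if typecode == "d":
        commands.append(f"NLST {name}" if name else "NLST")
    else:
        # Assumption: fall back to binary when no type code is given.
        commands.append(f"TYPE {(typecode or 'i').upper()}")
        commands.append(f"RETR {name}")
    return commands
```

For the path a/b/c/d/e;type=i this yields four separate CWD commands followed by TYPE I and RETR e, and no pathname containing a / separator is ever sent unless the URL explicitly encoded one.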
Many people over the years, Jukka Korpela for example, have repeated the RFC's explanation in their explanations of how WWW browsers do things.
However, none of these people, or the RFC that they are parroting, are actually right.
What Google Chrome, Mozilla Firefox, and Internet Explorer actually do is blithely treat every FTP server (apart from a special dispensation from Google Chrome for ones that run on VMS) as if it has a POSIX-style hierarchical file system, irrespective of whether that server states support for the TVFS feature (per RFC 3659).
Indeed Google Chrome and Internet Explorer do not even issue the FEAT command to find out about the TVFS feature.
(Mozilla Firefox at least issues the command, although it just bulldozes over any FTP server that does not say that it is TVFS.)
FTP URLs are actually handled by WWW browsers like this:
ftp://example.com/a/b/c/d/e
Google Chrome issues one giant CWD /a/b/c/d/e command followed by LIST -l.
ftp://example.com/a/b/c/d/e;type=a
Google Chrome issues one giant CWD /a/b/c/d command followed by TYPE A and RETR e.
ftp://example.com/a/b/c/d/e;type=i
Google Chrome issues one giant CWD /a/b/c/d command followed by TYPE I and RETR e.
In all cases, it assumes that the CWD command will treat / as a pathname separator in a hierarchical directory naming scheme.
Additionally, in the directory case it assumes that FTP server programs have a command-injection vulnerability.
The RFC also explains that WWW browsers decode filename characters, which cannot be used as-is in the path parts of URLs, from an encoded form before passing them to FTP.
Google Chrome does not do this, and an FTP URL ftp://example.com/%2Fetc/motd erroneously becomes CWD /%2Fetc/motd.
To add insult to injury, if the RETR command fails, even if only because of a mis-spelling, Google Chrome quits the FTP session and reports back to the end-user that the FTP server is down.
ftp://example.com/a/b/c/d/e
Internet Explorer issues one giant CWD /a/b/c/d/e command followed by LIST.
ftp://example.com/a/b/c/d/e;type=a
Internet Explorer issues TYPE A followed by one giant RETR /a/b/c/d/e command, and if that fails it issues one giant CWD /a/b/c/d/e command followed by LIST.
ftp://example.com/a/b/c/d/e;type=i
Internet Explorer issues TYPE I followed by one giant RETR /a/b/c/d/e command, and if that fails it issues one giant CWD /a/b/c/d/e command followed by LIST.
In all cases, it assumes that the CWD and RETR commands will treat / as a pathname separator in a hierarchical directory naming scheme.
Internet Explorer does, at least, perform character decoding of path components correctly.
ftp://example.com/a/b/c/d/e, ftp://example.com/a/b/c/d/e;type=d, ftp://example.com/a/b/c/d/e;type=a, and ftp://example.com/a/b/c/d/e;type=i
Mozilla Firefox issues one giant RETR /a/b/c/d/e command, and if that fails it issues one giant CWD /a/b/c/d/e command followed by LIST.
It assumes that the RETR and CWD commands will treat / as a pathname separator in a hierarchical directory naming scheme.
It simply ignores the type code entirely and, contrary to the RFC, does not issue a TYPE command to match it.
Indeed, it always sets the type to "binary" as part of logging in to the FTP server.
Mozilla Firefox does, at least, perform character decoding of path components correctly.
ftp://example.com/a/b/c/d/e
Opera issues one giant CWD /a/b/c/d/e command followed by LIST -l.
ftp://example.com/a/b/c/d/e;type=a
Opera issues TYPE A followed by one giant RETR /a/b/c/d/e command.
ftp://example.com/a/b/c/d/e;type=i
Opera issues TYPE I followed by one giant RETR /a/b/c/d/e command.
In all cases, it assumes that the CWD and RETR commands will treat / as a pathname separator in a hierarchical directory naming scheme.
Additionally, in the directory case it assumes that FTP server programs have a command-injection vulnerability.
The RFC also explains that WWW browsers decode filename characters, which cannot be used as-is in the path parts of URLs, from an encoded form before passing them to FTP.
Opera does not do this, and an FTP URL ftp://example.com/%2Fetc/motd erroneously becomes CWD /%2Fetc/motd.
To add insult to injury, if the RETR command fails, even if only because of a mis-spelling, Opera quits the FTP session and reports back to the end-user that the FTP server is down.
In FTP, the way to move up a directory is the CDUP command.
Mozilla Firefox never uses this, since it always issues CWD commands with absolute pathnames starting from the root.
This is of course the root of the FTP server's public file area.
A number of FTP servers run in a "changed root" environment such that the server process can see no part of the server machine's filesystem that lives above that root.
Some do not, however, and rely upon a simple string prefixing scheme to turn CWD / into a directory change request for (say) /var/ftproot/ under the covers.
Mozilla Firefox tries to escape a server's root directory restrictions. Every directory listing that it presents has a "Up to higher level directory" hyperlink. This includes the directory listing of the root directory. In subordinate directory listings, the "Up to higher level" hyperlink simply denotes a URL with the last path component removed. In the root directory listing, however, the "Up to higher level" hyperlink denotes a URL with a component added.
This component is ../ .
It causes Mozilla Firefox to attempt a CWD /../ command.
This is, firstly, not the way to traverse to a parent directory in FTP (as aforementioned).
Secondly, it is an attempt to access stuff outwith the FTP server's public file area.
The FTP servers that do not change root end up changing directory to (continuing the previous example) /var/ftproot/../ .
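For the string-prefixing servers, the hazard is easiest to see next to a version that avoids it. The Python sketch below is a hypothetical server-side helper (the name and the /var/ftproot root are taken from the running example, not from any real server): it collapses the client's pathname against a virtual root before adding the real prefix. It is purely lexical and assumes absolute pathnames; it does nothing about symbolic links, which is one reason a changed root is the sturdier defence.

```python
import posixpath


def resolve_under_root(root: str, requested: str) -> str:
    """Map a client-supplied pathname to a path beneath `root`."""
    # Normalising against "/" first collapses every ".." so that it
    # cannot climb above the virtual root.
    virtual = posixpath.normpath(posixpath.join("/", requested))
    # Only then add the real prefix.
    return posixpath.join(root, virtual.lstrip("/"))
```

With this, CWD /../ resolves to /var/ftproot/ itself, whereas naive prefixing produces /var/ftproot/../ and lands in /var.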