On November 16, an informal meeting was held at UCLA to discuss prospects for a network standard Remote Job Service (RJS) protocol. In attendance were representatives of UCLA-CCN and UCSB, the network's only current RJS sites, as well as UCLA-NMC and the BBN network project. A report on that discussion will be published as an RFCshortly and will not be discussed here. In thinking about the use of the proposed File Transfer Protocol (FTP) (RFC#265) for RJS, however, we came to the conclusion that a "restart" procedure might be an extremely useful addition to the FTP.
Many, perhaps most, of the individuals involved in protocol design thus far are oriented toward the use of short date transmissions over the network the transmission lengths that have been considered "typical" are a few characters, a print line, or perhaps as much as a page of text. The eXPerience of the current RJS sites, however, is that single files are commonly much longer, for example a line-printer output file of 400 pages would not seem unusual to these sites. Further, one might reasonably predict that network use of Remote Job Services will be preselected with a tendency toward large jobs (although large jobs do not necessarily imply large I/O files) and that the addition of other batch service sites (ILLIAC, UCSD) will increase the number of long-file transfers. In light of this kind of experience/prediction, it would seem that the FTP should include (perhaps as an option which interactive-user oriented systems could ignore) a method of "restarting" a long file transfer if some element in the transmission path fails after a large volume of data has been transferred.
The critical element in a "restart" procedure is the ability to arrange agreement between both ends of the transmission path as to where, exactly, the retransmission should begin. There are two potential candidates for marking possible restart locations already built into the proposed Data Transfer Protocol (RFC#264) which underlies the FTP; these are:
a) The "information separators" (transaction type B4) which are available in both "transparent block" transfers and "descriptor and counts" transfers, and
b) The "sequence numbers" which can be used with the "descriptor and counts" transfer mode.
After some discussion, we seemed to agree that the "information separators" (as they would be used in "transparent block" transfer mode, i.e., without "sequence numbers") were unlikely to serve as UNAMBIGUOUS restart location marker, and therefore we suggest the use of "sequence numbers" as markers. We were aware of the fact that this choice might exclude TIPs and other Hosts which do not use sequence numbering from the type of recovery under discussion; we believe, however, that our suggestion eliminates at least some of this problem.
Imagine that some site, which we will call the "user site" or "user", has initiated a connection from its own file transfer process to a file transfer process at some other site, which we will call the "server site" or "server". After the appropriate exchanges of information, a file transfer (using the File Transfer Protocol) begins over the path between these two sites. After some information is transferred, the path between the user and server is broken. At some later time the user initiates a new connection between the file transfer processes at user and server, establishes relevant access privileges, and wishes to resume the transmission which was in progress when the path was broken. First we describe four new op- codes for the File Transfer Protocol:
Hex Operation --- --------- 10 Append at sequence number
This command is essentially the same as any of the "Store" or "Append..." commands except that a 16-bit sequence number immediately follows the op-code (before the pathname).
11 Retrieve at sequence number
This is the same as the "Retrieve"command except that a 16-bit sequence number immediately follows the op-code (before the pathname).
12 Resume Retrieve
To be used when the user wishes the server to choose the sequence number; in other respects this is identical to the "Retrieve" command.
13 Use the sequence number
This command contains only the op-code and a 16-bit sequence number. It is intended as a denial of the ability to locate the sequence number given in an "Append at sequence number" or "Retrieve at sequence number" command and, simultaneously, to suggest another number which can be located.
There are several possible cases which are shown below. The user site is always presumed to be the site at the left of the page.
A. User site sending at time path is broken: / / Append at sequence number -------------------------------------------> Acknowledge <------------------------------------------ Data ------------------------------------------->
/ The server site agrees to resume at the user-chosen point. / The first data transaction is numbered with the chosen / sequence number.
/ Append at sequence number / -------------------------------------------> Unsuccessful Terminate <-------------------------------------------
/ The server site will never permit restart for some reason / (seq. #s were ignored or not used initially, seq #s were not / stored with the file, the file was lost when the path was /broken, etc.)
/ Append at sequence number / -------------------------------------------> Use this sequence number <------------------------------------------- / Data / ---------------------------------> / The user site agrees to use the server-chosen number / and the first data transaction is numbered with the / chosen number.
or
/ Unsuccessful Terminate / / -------------------------------> / / The user site cannot restart at this number for / / some reason.
B. User site receiving at time path is broken, and the user site does not particularly care about the exact sequence number (for example, if the user site is sending the file to a printer, some duplicate pages are probably acceptable and the user site would probably not want to remember sequence numbers).
/ Resume Retrieve / ------------------------------------> Data <------------------------------------ The server picks some point and begins transmission at that point. If sequence numbers were used during the original transmission, then the first transaction of this transmission must exactly match (including sequence number) some transaction of the original / transmission.
/ Resume Retrieve / ------------------------------------> Unsuccessful Terminate / <------------------------------------ / The server site is unable or unwilling to restart the / transmission.
C. User site receiving at time path is broken, and does care about the value of the sequence number.
/ Retrieve at sequence number / ----------------------------------------> Data <---------------------------------------- / The server agrees to resume at the user-chosen / point. The first data transaction is numbered / with the chosen sequence number.
/ Retrieve at sequence number / ----------------------------------------> Unsuccessful Terminate <----------------------------------------
The server site will never permit restart for some reason. Retrieve at sequence number ----------------------------------------> Use this sequence number <----------------------------------------- / Acknowledge / ----------------------------> Data <---------------------------- / The user site agrees to use the / server-chosen number. The first data / transaction is numbered with the chosen / number.
/ / The server cannot use the user-chosen / / number and the user cannot use the / / server-chosen number. Therefore the attempt / / to restart must be abandoned.
Some sites (e.g., UCLA-CCN) have agreed (in principle, at least) to implement these commands and, more important, to store sequence numbers (with files being transferred) on a non-volatile storage medium so that restarts may be effected. This will be done, of course, only if this, or some similar, proposal is accepted by the NWG as part of the File Transfer Protocol. We hope interested parties will communicate comments or counter-proposals to the FTP committee and/or publish their ideas in the RFCseries.
AAM/jm
[This RFCwas put into machine readable form for entry] [into the online RFCarchives by Kelly Tardif, Viaginie 10/99]