I'm trying to automate downloading some text files from a z/OS PDS, using Python and ftplib.

Since the host files are EBCDIC, I can't simply use FTP.retrbinary().

FTP.retrlines(), when used with open(file, 'w').writelines as its callback, doesn't, of course, provide EOLs.

So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.

Many thanks.

#!python.exe
from ftplib import FTP

class xfile (file):
    def writelineswitheol(self, sequence):
        for s in sequence:
            self.write(s+"\r\n")

sess = FTP("zos.server.to.be", "myid", "mypassword")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
sess.cwd("'FOO.BAR.PDS'")
a = sess.nlst("RTB*")
for i in a:
    sess.retrlines("RETR "+i, xfile(i, 'w').writelineswitheol)
sess.quit()

Update: Python 3.0, platform is MingW under Windows XP.

z/OS PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/OS FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.

Closing update:

Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):

import ftplib
import os

sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
# Ask the server to translate IBM-1047 (EBCDIC) to ISO 8859-1 in text mode.
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
for dir in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
    sess.cwd("'ZLTALM.PREP.%s'" % dir)
    try:
        filelist = sess.nlst()
    except ftplib.error_perm as x:
        # Ignore a 550 reply (nothing to list); re-raise anything else.
        if x.args[0][:3] != '550':
            raise
    else:
        try:
            os.mkdir(dir)
        except OSError:
            # Skip this library if the local directory cannot be created.
            continue
        for hostfile in filelist:
            lines = []
            # retrlines() calls lines.append once per record, without EOLs.
            sess.retrlines("RETR " + hostfile, lines.append)
            pcfile = open("%s/%s" % (dir, hostfile), 'w')
            for line in lines:
                pcfile.write(line + "\n")
            pcfile.close()
        print("Done: " + dir)
sess.quit()

My thanks to both John and Vinay

+2  A: 

You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):

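# Read the raw EBCDIC bytes; the with block closes the file afterwards.
with open(ebcdic_filename, "rb") as infile:
    data = infile.read()
# Decode from the EBCDIC code page, then re-encode as UTF-8.
converted = data.decode("cp500").encode("utf8")
with open(utf8_filename, "wb") as outfile:
    outfile.write(converted)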
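
A binary transfer of a record-based data set carries no line endings, so the decoded text still has to be cut into records. Here is a minimal sketch of one way to do that. It assumes fixed-length 80-byte records padded with blanks (the record length is a guess, not something stated in the question) and reuses cp500 from the example above purely for illustration; the host in the question uses IBM-1047, which differs in a few code points. The function name is hypothetical.

```python
def ebcdic_records_to_text(data, record_length=80, codepage="cp500"):
    """Decode fixed-length EBCDIC records and join them with newlines.

    record_length and codepage are assumptions; adjust them to match
    the data set's LRECL and the host's actual code page.
    """
    if len(data) % record_length:
        raise ValueError("data length is not a multiple of the record length")
    lines = []
    for start in range(0, len(data), record_length):
        record = data[start:start + record_length]
        # Decode one record and drop its trailing blank padding.
        lines.append(record.decode(codepage).rstrip(" "))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


# Two fake 80-byte records standing in for bytes fetched with retrbinary().
sample = "HELLO".ljust(80).encode("cp500") + "WORLD".ljust(80).encode("cp500")
text = ebcdic_records_to_text(sample)
```

This only fits fixed-length records; variable-length records would need a different approach.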

Update: If you need to use retrlines to get the lines, and the lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. In the callback, sequence will therefore be a single line, and your for loop will write each character of that line to the output, each on its own line. You probably want self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though; it probably belongs in a different class in your bells-and-whistles version.
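
As a rough sketch of a per-line callback that avoids subclassing file: the helper name make_line_writer is hypothetical, and the loop over a list of strings only stands in for a real sess.retrlines("RETR " + name, callback) call. It appends "\n" and leaves the platform line ending to text mode.

```python
import os
import tempfile


def make_line_writer(outfile):
    """Return a callback that writes one line plus a newline per call."""
    def write_line(line):
        # retrlines() passes one whole line, without its line ending.
        outfile.write(line + "\n")
    return write_line


# Stand-in for the FTP transfer: feed two lines to the callback by hand.
path = os.path.join(tempfile.mkdtemp(), "member.txt")
with open(path, "w") as outfile:
    callback = make_line_writer(outfile)
    for line in ["FIRST RECORD", "SECOND RECORD"]:
        callback(line)

with open(path) as infile:
    result = infile.read()
```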

Vinay Sajip
Thanks, Vinay, that's an interesting idea, but how do I insert the newlines? (These are conventional z/OS PDSs, not OpenEdition files)
Brent.Longborough
How are the lines terminated on the host system, then, if not with EBCDIC line feeds?
Vinay Sajip
The host file system is record-based. It's either fixed-length, in which case all the records have the same length, or variable-length, where the length is stored in a descriptor field at the start of each record. FTP.retrlines() extracts the records correctly, but (correctly, I think) doesn't provide the newlines.
Brent.Longborough
@Vinay.Update: Oops, yes, I understand. When I get back to the mainframe, later this week, I'll give some ideas a try, and post back.
Brent.Longborough
+1  A: 

Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.
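
The effect can be reproduced on any platform by forcing Windows-style translation with the newline argument of open(). This is only a sketch; the helper name written_bytes is made up for the demonstration.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode, then return the raw bytes that hit the disk."""
    path = os.path.join(tempfile.mkdtemp(), "eol.txt")
    with open(path, "w", newline=newline) as handle:
        handle.write(text)
    with open(path, "rb") as handle:
        return handle.read()


# newline="\r\n" mimics what text mode does by default on Windows.
good = written_bytes("RECORD\n", "\r\n")    # append "\n" only
bad = written_bytes("RECORD\r\n", "\r\n")   # append "\r\n" yourself
```

Here good ends in a single "\r\n", while bad ends in "\r\r\n": the "\n" is translated, and the hand-written "\r" is left in front of it.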

Proper error handling should not be relegated to a "bells and whistles" version. Set up your callback object so that the file open() is in a try/except and the object keeps a reference to the output file handle, the write call is in a try/except, and there is a callback_obj.close() method that you call when retrlines() returns, to explicitly file_handle.close() (also in a try/except). That way you get explicit error handling, e.g. messages like "can't (open|write to|close) file X because Y", AND you no longer have to think about when your files will be implicitly closed or whether you risk running out of file handles.
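
One possible shape for such a callback object, as a sketch only: the class name RecordWriter and the choice to wrap failures in RuntimeError are illustrative assumptions, and the loop over a list stands in for sess.retrlines("RETR " + member, writer).

```python
import os
import tempfile


class RecordWriter:
    """Hypothetical retrlines() callback that reports file errors explicitly."""

    def __init__(self, path):
        self.path = path
        try:
            self.handle = open(path, "w")
        except OSError as err:
            raise RuntimeError(
                "can't open file %s because %s" % (path, err)) from err

    def __call__(self, line):
        # Called once per line; the line arrives without its line ending.
        try:
            self.handle.write(line + "\n")
        except OSError as err:
            raise RuntimeError(
                "can't write to file %s because %s" % (self.path, err)) from err

    def close(self):
        try:
            self.handle.close()
        except OSError as err:
            raise RuntimeError(
                "can't close file %s because %s" % (self.path, err)) from err


# Stand-in for the FTP transfer: call the writer by hand, then close it.
target = os.path.join(tempfile.mkdtemp(), "MEMBER1")
writer = RecordWriter(target)
try:
    for line in ["LINE ONE", "LINE TWO"]:
        writer(line)
finally:
    writer.close()
```

Closing in a finally block means the handle is released even if the transfer fails partway through.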

Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
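
In Python 3, one way to control the output encoding is to pass encoding= to open(), so that it does not depend on the platform default. The sketch below also builds the text expected from a test file containing all 256 byte values. It uses cp500 as a stand-in code page, which is an assumption; substitute whichever code page the host really uses.

```python
import os
import tempfile

CODEPAGE = "cp500"  # assumption: stand-in for the host's EBCDIC code page

# Text expected from a host test file that holds every possible byte value.
all_bytes = bytes(range(256))
expected_text = all_bytes.decode(CODEPAGE)
round_trip = expected_text.encode(CODEPAGE)

# Write with an explicit encoding instead of relying on the default;
# newline="" stops text mode from altering the CR and LF characters.
path = os.path.join(tempfile.mkdtemp(), "allbytes.txt")
with open(path, "w", encoding="latin-1", newline="") as handle:
    handle.write(expected_text)

with open(path, "rb") as handle:
    written = handle.read()
```

If the chosen output encoding cannot represent a character, write() raises UnicodeEncodeError, which is the kind of problem such a test file is meant to flush out.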

[a few "sanitation" remarks]

  1. You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.

  2. To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)

  3. Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.

John Machin
John, thank you. Please be assured that I have taken your just criticisms on board.
Brent.Longborough