Lingfo – Python “delimited” extension


· This is preliminary documentation, provided for participants in the discussion on a standard CSV module on news:comp.lang.python.

· Acknowledgement: I have had the benefit of perusing the source code of Dave Cole’s “csv” module (see http://www.object-craft.com.au). In particular, the manner of handling newlines embedded in quoted fields is “borrowed” from Dave’s work.

· No downloads yet; it needs some tidying before hitting the presses.

 

John Machin

2002-02-11

mailto:sjmachin@lexicon.net

 

 

NAME

    delimited - Provides functions for packing & unpacking fields of "delimited" or "CSV" or "TSV" files.

 

CLASSES

    exceptions.Exception

        AuthorError [Assertion failure: I stuffed up]

        CallerError [You stuffed up]

        DataError   [Somebody stuffed up]

 

FUNCTIONS

 

    unpacker(...)

        Return a function which unpacks delimited fields from a string into a list.

        The data may contain "embedded newlines", i.e. newlines inside a quoted field.
        Otherwise a newline may occur only at the end of the input string, where
        it is optional. The contract is that you supply a line at a time, so a
        newline anywhere else is reported as a DataError.

 

        unpacking_func = delimited.unpacker(options...)

        Options (keyword arguments) are:

           delimiter=','

           front_quote='"'

           back_quote=... (defaults to the value of front_quote)

           quote='"' (synonym for front_quote)

           quote_mode=2 (quote back_quotes by doubling them)

           -- or 3 (quote quotes with alt_quote char)

           -- or 1 (only quote delimiters)

           -- or 0 (no quoting)

           alt_quote="'"

         [All of the above are common options]

           ignore_leading_space=0 (if 1, unpacking_func will skip over

              leading spaces at start of each field)

        Using the result (example):

           for buff in open("somefile"):
               alist = unpacking_func(buff)
               if alist is None:
                   # newline inside quoted field, need more input
                   continue
               do_something_with(alist)
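
The line-at-a-time contract can be illustrated with a small pure-Python stand-in. This is only a sketch and not the extension's code: make_unpacker is a hypothetical name, it covers just the default quote_mode=2 (doubled quotes) with a single quote character, and it raises ValueError where the real module would presumably raise DataError.

```python
def make_unpacker(delimiter=",", quote='"'):
    """Hypothetical stand-in for delimited.unpacker(); quote_mode=2 only."""
    pending = []  # lines held back while a quoted field is still open

    def unpack(line):
        pending.append(line)
        text = "".join(pending)
        fields, field, in_quotes = [], [], False
        i, n = 0, len(text)
        while i < n:
            ch = text[i]
            if in_quotes:
                if ch == quote:
                    if i + 1 < n and text[i + 1] == quote:
                        field.append(quote)  # doubled quote -> one literal quote
                        i += 1
                    else:
                        in_quotes = False    # closing quote
                else:
                    field.append(ch)         # may be an embedded newline
            elif ch == quote:
                in_quotes = True
            elif ch == delimiter:
                fields.append("".join(field))
                field = []
            elif ch == "\n":
                if i != n - 1:
                    del pending[:]
                    raise ValueError("newline outside quotes before end of input")
            else:
                field.append(ch)
            i += 1
        if in_quotes:
            return None  # newline inside quoted field, need more input
        del pending[:]
        fields.append("".join(field))
        return fields

    return unpack


unpack = make_unpacker()
first = unpack('x,"line one\n')    # quoted field still open
second = unpack('line two",y\n')   # completes the row
```

Here first is None and second holds the three fields of the completed row, with the newline kept inside the middle field. The sketch re-scans the held-back text on every call, which is simple but wasteful; a real implementation would carry its parsing state between calls.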

 

    importer(...)

        Return an iterator which imports delimited fields from a sequence of

        strings into lists.

 

        output_iterator = delimited.importer(input_iterable, options...)

        The input_iterable must be a suitable argument for the iter() builtin
        and must deliver strings. Newlines may appear anywhere in the input.
        The input parcels may be single bytes, lines, blocks (from
        .read(BUFSIZ)), or a whole file (from .read()).

        Options (keyword arguments) are:

           Common options, plus:

           ignore_leading_space=0 (if 1, will skip over

              leading spaces at start of each field)

           allow_embedded_newlines=1 (if 0, will raise

              an exception if newline found inside quotes)

        Using the result (example):

           myiter = delimited.importer(open("foo.csv"), ignore_leading_space=1)
           for field_list in myiter:
               do_something_with(field_list)
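
Why the parcels can be any size is easiest to see in a character-at-a-time state machine that carries its state across parcel boundaries. The generator below is a sketch under stated assumptions, not the extension's implementation: import_rows is a hypothetical name, only quote_mode=2 (doubled quotes) is handled, and none of the other options are modelled.

```python
def import_rows(chunks, delimiter=",", quote='"'):
    """Hypothetical stand-in for delimited.importer(); quote_mode=2 only."""
    UNQUOTED, QUOTED, QUOTE_SEEN = 0, 1, 2
    state = UNQUOTED
    row, field = [], []
    started = False  # True once the current row has consumed a character
    for chunk in chunks:
        for ch in chunk:
            started = True
            if state == QUOTED:
                if ch == quote:
                    state = QUOTE_SEEN   # closing quote, or first of a pair
                else:
                    field.append(ch)     # embedded newlines land here
                continue
            if state == QUOTE_SEEN:
                if ch == quote:
                    field.append(quote)  # doubled quote -> one literal quote
                    state = QUOTED
                    continue
                state = UNQUOTED         # it was a closing quote
            if ch == quote:
                state = QUOTED
            elif ch == delimiter:
                row.append("".join(field))
                field = []
            elif ch == "\n":
                row.append("".join(field))
                yield row
                row, field = [], []
                started = False
            else:
                field.append(ch)
    if started:                          # last row had no trailing newline
        row.append("".join(field))
        yield row


data = 'a,"b\nc",d\n"e ""f""",g\nlast,row'
whole = list(import_rows([data]))        # one parcel: the whole "file"
by_char = list(import_rows(data))        # parcels of one character each
blocks = [data[i:i + 4] for i in range(0, len(data), 4)]
by_block = list(import_rows(blocks))     # fixed-size blocks
```

All three calls yield the same rows, because nothing in the loop depends on where one parcel ends and the next begins.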

 

    packer(...)

        Return a function which packs data from a sequence into delimited
        fields in a string.

 

        packing_func = delimited.packer(options...)

        Options (keyword arguments) are:

           Common options, plus:

           strict_pack=1 (if 0, packing_func will not raise an exception
              when packing data that cannot be unpacked again)

           forced_quoting=0 (if 1, wrap quotes even around fields that

              don't contain delimiters or quotes)

           append_newline=0 (if 1, packing_func will append

              a newline character to its result)

        Using the result:

           string = packing_func(sequence)
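
The packing options can be sketched in pure Python as follows. This is an illustration only: make_packer is a hypothetical name, only quote modes 0 and 2 are handled, ValueError stands in for the module's own exceptions, and the reading of strict_pack (refuse a field containing a delimiter or newline when quoting is off) is my assumption about one case of "not unpackable", not documented behaviour.

```python
QUOTE_NONE, QUOTE_DOUBLE = 0, 2  # values as listed in the DATA section


def make_packer(delimiter=",", quote='"', quote_mode=QUOTE_DOUBLE,
                strict_pack=1, forced_quoting=0, append_newline=0):
    """Hypothetical stand-in for delimited.packer(); modes 0 and 2 only."""
    if quote_mode not in (QUOTE_NONE, QUOTE_DOUBLE):
        raise NotImplementedError("sketch covers quote_mode 0 and 2 only")

    def pack(sequence):
        out = []
        for item in sequence:
            text = str(item)
            special = delimiter in text or quote in text or "\n" in text
            if quote_mode == QUOTE_NONE:
                # Assumption: without quoting, such a field cannot round-trip.
                if strict_pack and (delimiter in text or "\n" in text):
                    raise ValueError("field %r needs quoting" % text)
            elif special or forced_quoting:
                # Double any quote characters, then wrap the field in quotes.
                text = quote + text.replace(quote, quote + quote) + quote
            out.append(text)
        line = delimiter.join(out)
        return line + "\n" if append_newline else line

    return pack


packing_func = make_packer()
packed = packing_func(["a", "b,c", 'say "hi"', 5])
```

With the defaults only the fields that need it are quoted; forced_quoting=1 wraps every field, and append_newline=1 adds the trailing newline.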

 

    restricted_unpacker(...)

        Return a function which unpacks data from delimited fields into a list.

        The data may not contain embedded newlines.

 

        unpacking_func = delimited.restricted_unpacker(options...)

        Options (keyword arguments) are:

           Common options, plus:

           ignore_leading_space=0 (if 1, unpacking_func will skip over

              leading spaces at start of each field)

        Using the result:

           alist = unpacking_func(string)

 

All of the above "functions/iterators" are really callable objects which maintain a

considerable amount of state. This can be inspected by accessing the following read-only

attributes:

 

(a) Your input options: delimiter, front_quote, back_quote, quote_mode, alt_quote,

    ignore_leading_space, allow_embedded_newlines,

    strict_pack, forced_quoting, append_newline

 

(b) Other items of varying utility:

 

stopping_status: 0 = not stopping, 1 = must raise StopIteration on next call, 2 = stopped

error_occurred: 0 or non-zero

input_iter_count: number of times we've called your .next() method

input_newline_count: number of newlines seen in the input; input_row_number (next) is usually more useful.

input_row_number: Obtained by counting newlines. First row is 0. Negative (-1) means no input yet.

input_char_column: 0 is first char, -1 means the newline.

[The above two are intended for use in detailed error reporting by the caller]

output_count

embedded_newline_count

error_count

__name__: "pack", "unpack" etc (to identify the object in testing & timing scripts)
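
As an illustration of the error-reporting use mentioned above, a caller might format the two position attributes like this. It is a sketch only: describe_position is a hypothetical helper, and a SimpleNamespace stands in for a real unpacker or importer object, since all the helper needs is the two attributes documented here.

```python
from types import SimpleNamespace


def describe_position(obj):
    """Format input_row_number and input_char_column for an error message."""
    row = obj.input_row_number
    col = obj.input_char_column
    if row < 0:
        return "no input read yet"       # -1 means no input yet
    if col == -1:
        where = "at the newline"         # -1 means the newline itself
    else:
        where = "at column %d" % col
    return "row %d, %s (both counted from 0)" % (row, where)


# Stand-in objects carrying only the documented attributes.
fresh = SimpleNamespace(input_row_number=-1, input_char_column=0)
midrow = SimpleNamespace(input_row_number=41, input_char_column=7)
message = describe_position(midrow)
```

Since both counts start at 0, a report aimed at end users would probably add 1 to each.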

 

DATA

    QUOTE_ALT = 3

    QUOTE_DOUBLE = 2

    QUOTE_NONE = 0

    QUOTE_SINGLE = 1

 

 


Lingfo Pty Ltd - ABN 97 084 236 199