Lingfo – Python “delimited” extension


         This is preliminary documentation, provided for participants in the discussion on a standard CSV module on news:comp.lang.python.

         Acknowledgement: I have had the benefit of perusing the source code of Dave Cole’s “csv” module (see http://www.object-craft.com.au). In particular, the manner of handling newlines embedded in quoted fields is “borrowed” from Dave’s work.

         No downloads yet; it needs some tidying before hitting the presses.

 

John Machin

2002-02-11

mailto:sjmachin@lexicon.net

 

 

NAME

delimited - Provides functions for packing & unpacking fields of "delimited" or "CSV" or "TSV" files.

 

CLASSES

exceptions.Exception

AuthorError [Assertion failure: I stuffed up]

CallerError [You stuffed up]

DataError [Somebody stuffed up]

 

FUNCTIONS

 

unpacker(...)

Return a function which unpacks delimited fields from a string into a list.

The data may contain "embedded newlines", i.e. newlines inside a quoted field. Apart from those, a newline may appear only at the end of the input string, where it is optional. The contract is that you supply one line at a time, so a newline anywhere else is a DataError.

 

unpacking_func = delimited.unpacker(options...)

Options (keyword arguments) are:

delimiter=','

front_quote='"'

back_quote=(defaults to the value of front_quote)

quote='"' (synonym for front_quote)

quote_mode=2 (QUOTE_DOUBLE: quote back_quotes by doubling them)

-- or 3 (QUOTE_ALT: quote quotes with the alt_quote char)

-- or 1 (QUOTE_SINGLE: only quote delimiters)

-- or 0 (QUOTE_NONE: no quoting)

alt_quote="'"

[All of the above are common options]

ignore_leading_space=0 (if 1, unpacking_func will skip over leading spaces at the start of each field)

Using the result (example):

    for buff in open("somefile"):
        alist = unpacking_func(buff)
        if alist is None:
            # newline inside quoted field, need more input
            continue
        do_something_with(alist)
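Since there are no downloads yet, here is a rough stand-in for the calling pattern above that runs without the extension. It is a sketch only: SketchUnpacker is a made-up name, the splitting is delegated to the standard library's csv.reader rather than to this module, and it assumes quote_mode=2 with front_quote equal to back_quote, so that an odd number of quote characters means a quoted field is still open. It does not raise DataError for a stray newline.

```python
import csv


class SketchUnpacker:
    """Illustrative stand-in for the object returned by delimited.unpacker()."""

    def __init__(self, delimiter=",", quote='"'):
        self.delimiter = delimiter
        self.quote = quote
        self._pending = ""  # text held back while a quoted field is still open

    def __call__(self, line):
        text = self._pending + line
        # Assumption: quotes are escaped by doubling, so an odd count of
        # quote characters means a quoted field has not been closed yet.
        if text.count(self.quote) % 2:
            self._pending = text
            return None  # caller must supply the next line
        self._pending = ""
        reader = csv.reader([text], delimiter=self.delimiter,
                            quotechar=self.quote, doublequote=True)
        return next(reader)


unpacking_func = SketchUnpacker()
rows = []
for buff in ['a,"b\n', 'c",d\n', 'x,y\n']:
    alist = unpacking_func(buff)
    if alist is None:
        continue  # newline inside quoted field, need more input
    rows.append(alist)
```

The first call returns None because the quoted field is still open; the second call completes the record, embedded newline included.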

 

importer(...)

Return an iterator which imports delimited fields from a sequence of strings into lists.

 

output_iterator = delimited.importer(input_iterable, options...)

The input_iterable must be a suitable argument for the iter() builtin and must deliver strings. Newlines may appear anywhere in the input; they need not fall at the ends of the strings delivered. The input parcels may be single bytes, lines, blocks (from .read(BUFSIZ)), or a whole file (from .read()).

Options (keyword arguments) are:

Common options, plus:

ignore_leading_space=0 (if 1, will skip over leading spaces at the start of each field)

allow_embedded_newlines=1 (if 0, will raise an exception if a newline is found inside quotes)

Using the result (example):

    myiter = delimited.importer(open("foo.csv"), ignore_leading_space=1)
    for field_list in myiter:
        do_something_with(field_list)
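To illustrate why the parcel boundaries do not matter, here is a minimal pure-Python sketch of an importer-like generator. The name sketch_importer is hypothetical, only quote doubling (quote_mode=2) with a single quote character is handled, and unterminated quotes are not diagnosed; the real extension may well work differently inside.

```python
def sketch_importer(input_iterable, delimiter=",", quote='"'):
    """Yield lists of fields; parser state survives across input parcels."""
    fields, buf = [], []
    in_quotes = False       # currently inside a quoted field
    quote_pending = False   # saw a quote inside quotes; meaning not yet known
    at_field_start = True   # no character of the current field consumed yet
    seen_any = False        # something read since the last record ended
    for parcel in input_iterable:
        for ch in parcel:
            if quote_pending:
                quote_pending = False
                if ch == quote:
                    buf.append(quote)  # doubled quote stands for one quote
                    continue
                in_quotes = False      # it was the closing quote after all
            if in_quotes:
                if ch == quote:
                    quote_pending = True
                else:
                    buf.append(ch)     # includes embedded newlines
            elif ch == delimiter:
                fields.append("".join(buf))
                buf = []
                at_field_start = True
                seen_any = True
            elif ch == "\n":
                fields.append("".join(buf))
                buf = []
                yield fields
                fields = []
                at_field_start = True
                seen_any = False
            elif ch == quote and at_field_start:
                in_quotes = True
                at_field_start = False
                seen_any = True
            else:
                buf.append(ch)
                at_field_start = False
                seen_any = True
    if seen_any:  # last record had no trailing newline
        fields.append("".join(buf))
        yield fields


data = 'a,"b\nc",d\nx,"say ""hi"""\n'
by_char = list(sketch_importer(list(data)))                # single characters
by_line = list(sketch_importer(data.splitlines(True)))     # lines
by_block = list(sketch_importer([data[:7], data[7:]]))     # arbitrary blocks
whole = list(sketch_importer([data]))                      # whole file at once
```

All four ways of slicing the input give the same two records, because the quoting state is carried from one parcel to the next.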

 

packer(...)

Return a function which packs data from a sequence into delimited fields in a string.

 

packing_func = delimited.packer(options...)

Options (keyword arguments) are:

Common options, plus:

strict_pack=1 (if 0, packing_func will not raise an exception when packing data that could not be unpacked again)

forced_quoting=0 (if 1, wrap quotes even around fields that don't contain delimiters or quotes)

append_newline=0 (if 1, packing_func will append a newline character to its result)

Using the result:

string = packing_func(sequence)
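As a rough picture of what a packing function does under quote_mode=2, here is a small pure-Python sketch. The name sketch_packer is hypothetical, values are converted with str() as an assumption, and strict_pack is not modelled at all (a field containing a newline is passed through unchecked).

```python
def sketch_packer(delimiter=",", quote='"', forced_quoting=0, append_newline=0):
    """Return a packing function (illustration, not the real delimited.packer)."""

    def pack(sequence):
        packed = []
        for value in sequence:
            text = str(value)  # assumption: non-string values are str()-converted
            if forced_quoting or delimiter in text or quote in text:
                # quote_mode=2: double every quote inside the field, then wrap it
                text = quote + text.replace(quote, quote + quote) + quote
            packed.append(text)
        line = delimiter.join(packed)
        if append_newline:
            line += "\n"
        return line

    return pack


packing_func = sketch_packer(append_newline=1)
line = packing_func(["a", "b,c", 'say "hi"', 42])
quoted = sketch_packer(forced_quoting=1)(["x", "y"])
```

Here line comes out as a,"b,c","say ""hi""",42 followed by a newline, and quoted as "x","y".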

 

restricted_unpacker(...)

Return a function which unpacks data from delimited fields into a list.

The data must not contain embedded newlines.

 

unpacking_func = delimited.restricted_unpacker(options...)

Options (keyword arguments) are:

Common options, plus:

ignore_leading_space=0 (if 1, unpacking_func will skip over leading spaces at the start of each field)

Using the result:

alist = unpacking_func(string)
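The one-line case can be sketched in pure Python as follows. This is an illustration under assumptions, not the extension's code: sketch_restricted_unpacker and the local DataError class are stand-ins, only quote_mode=2 with a single quote character is handled, and the exact conditions under which the real function raises DataError are guessed from the description above.

```python
class DataError(Exception):
    """Stand-in for the module's DataError."""


def sketch_restricted_unpacker(delimiter=",", quote='"', ignore_leading_space=0):
    """Return a one-line unpacking function (illustration only)."""

    def unpack(line):
        if line.endswith("\n"):
            line = line[:-1]  # a single trailing newline is allowed
        if "\n" in line:
            raise DataError("newline found inside the line")
        fields, buf = [], []
        in_quotes = False
        at_start = True  # no character of the current field consumed yet
        i = 0
        while i < len(line):
            ch = line[i]
            if in_quotes:
                if ch != quote:
                    buf.append(ch)
                elif line[i + 1:i + 2] == quote:
                    buf.append(quote)  # doubled quote stands for one quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            elif ch == delimiter:
                fields.append("".join(buf))
                buf = []
                at_start = True
            elif at_start and ignore_leading_space and ch == " ":
                pass  # skip leading spaces
            elif at_start and ch == quote:
                in_quotes = True
                at_start = False
            else:
                buf.append(ch)
                at_start = False
            i += 1
        if in_quotes:
            raise DataError("quoted field was never closed")
        fields.append("".join(buf))
        return fields

    return unpack


unpack = sketch_restricted_unpacker(ignore_leading_space=1)
fields = unpack('a, "b,c","say ""hi"""\n')
```

With ignore_leading_space=1 the space before the second field is dropped, so the quote that follows it is still recognised as opening a quoted field.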

 

All of the above "functions/iterators" are really callable objects which maintain a considerable amount of state. This state can be inspected through the following read-only attributes:

 

(a) Your input options: delimiter, front_quote, back_quote, quote_mode, alt_quote, ignore_leading_space, allow_embedded_newlines, strict_pack, forced_quoting, append_newline

 

(b) Other items of varying utility:

 

stopping_status: 0 = not stopping, 1 = must raise StopIteration on next call, 2 = stopped

error_occurred: 0 or non-zero

input_iter_count: number of times we've called your .next() method

input_newline_count: number of newlines seen in the input; input_row_number (next) is usually more useful.

input_row_number: Obtained by counting newlines. First row is 0. Negative (-1) means no input yet.

input_char_column: 0 is first char, -1 means the newline.

[The above two are intended for use in detailed error reporting by the caller]

output_count

embedded_newline_count

error_count

__name__: "pack", "unpack" etc (to identify the object in testing & timing scripts)
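As an example of the error reporting that input_row_number and input_char_column are meant for, a caller might turn the zero-based values into a message for humans. The helper describe_position is hypothetical, and a SimpleNamespace stands in for a real importer object so that the sketch runs without the extension.

```python
from types import SimpleNamespace


def describe_position(obj):
    """Format the input position of an unpacker/importer-like object."""
    row = obj.input_row_number
    col = obj.input_char_column
    if row < 0:
        return "no input read yet"  # -1 means no input yet
    if col == -1:
        where = "at the newline"    # -1 means the newline itself
    else:
        where = "column %d" % (col + 1)  # 0 is the first char
    return "row %d, %s" % (row + 1, where)


# Stand-in object carrying only the two attributes used above.
fake_importer = SimpleNamespace(input_row_number=4, input_char_column=0)
message = describe_position(fake_importer)
```

Here message is "row 5, column 1", i.e. the first character of the fifth row, counted from one.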

 

DATA

QUOTE_ALT = 3

QUOTE_DOUBLE = 2

QUOTE_NONE = 0

QUOTE_SINGLE = 1

 

 


Lingfo Pty Ltd - ABN 97 084 236 199