· This is preliminary documentation, provided for participants in the discussion on a standard CSV module on news:comp.lang.python.
· Acknowledgement: I have had the benefit of perusing the source code of Dave Cole’s “csv” module; see http://www.object-craft.com.au; in particular the manner of handling newlines embedded in quoted fields is “borrowed” from Dave’s work.
· No downloads yet; it needs some tidying before hitting the presses.
John Machin
2002-02-11
mailto:sjmachin@lexicon.net
NAME
    delimited - Provides functions for packing & unpacking fields of
    "delimited" or "CSV" or "TSV" files.
CLASSES
    exceptions.Exception
        AuthorError [Assertion failure: I stuffed up]
        CallerError [You stuffed up]
        DataError [Somebody stuffed up]
FUNCTIONS
unpacker(...)
    Return a function which unpacks delimited fields from a string into a list.
    The data may contain "embedded newlines", i.e. newlines inside a quoted
    field. Otherwise a newline may occur optionally at the end of the input
    string. The contract is that you supply a line at a time, so a newline
    elsewhere is a DataError.
        unpacking_func = delimited.unpacker(options...)
    Options (keyword arguments) are:
        delimiter=','
        front_quote='"'
        back_quote= ... value of front_quote
        quote='"' (synonym for front_quote)
        quote_mode=2 (quote back_quotes by doubling them)
            -- or 3 (quote quotes with alt_quote char)
            -- or 1 (only quote delimiters)
            -- or 0 (no quoting)
        alt_quote="'"
        [All of the above are common options]
        ignore_leading_space=0 (if 1, unpacking_func will skip over
            leading spaces at start of each field)
    Using the result (example):
        for buff in open("somefile"):
            alist = unpacking_func(buff)
            if alist is None:
                # newline inside quoted field, need more input
                continue
            do_something_with(alist)
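To make the line-at-a-time contract concrete, here is a minimal sketch of how such an unpacking function might behave. It is not the module's code: make_unpacker is a hypothetical stand-in written in modern Python, it handles only a single quote character with quote doubling (quote_mode=2), and it raises ValueError where the module is documented to raise DataError.

```python
def make_unpacker(delimiter=",", quote='"'):
    """Hypothetical stand-in for delimited.unpacker(); illustration only."""
    fields = []        # completed fields of the row in progress
    current = []       # characters of the field in progress
    in_quotes = False  # True while inside a quoted field

    def unpack(line):
        nonlocal fields, current, in_quotes
        i, n = 0, len(line)
        while i < n:
            ch = line[i]
            if in_quotes:
                if ch != quote:
                    current.append(ch)        # may be an embedded newline
                elif i + 1 < n and line[i + 1] == quote:
                    current.append(quote)     # doubled quote: literal quote
                    i += 1
                else:
                    in_quotes = False         # closing quote
            elif ch == quote and not current:
                in_quotes = True              # opening quote at field start
            elif ch == delimiter:
                fields.append("".join(current))
                current = []
            elif ch == "\n":
                if i != n - 1:
                    # Assumption: the real module raises DataError here.
                    raise ValueError("newline before end of input string")
            else:
                current.append(ch)
            i += 1
        if in_quotes:
            return None                       # row continues on next line
        fields.append("".join(current))
        row, fields, current = fields, [], []
        return row

    return unpack


unpack = make_unpacker()
rows = []
for buff in ['a,"b\n', 'c",d\n', 'x,y\n']:
    alist = unpack(buff)
    if alist is None:
        continue  # newline inside quoted field, need more input
    rows.append(alist)
print(rows)  # [['a', 'b\nc', 'd'], ['x', 'y']]
```

The closure keeps the partial row between calls, which is why a None result simply means "call me again with the next line".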
importer(...)
    Return an iterator which imports delimited fields from a sequence of
    strings into lists.
        output_iterator = delimited.importer(input_iterable, options...)
    The input_iterable must be a suitable arg for the iter() builtin and
    must deliver strings. The newlines may appear wherever you like. The
    input parcels may be single bytes, lines, blocks (from .read(BUFSIZ)),
    or a whole file (from .read()).
    Options (keyword arguments) are:
        Common options, plus:
        ignore_leading_space=0 (if 1, will skip over
            leading spaces at start of each field)
        allow_embedded_newlines=1 (if 0, will raise
            an exception if newline found inside quotes)
    Using the result (example):
        myiter = delimited.importer(open("foo.csv"),
                                    ignore_leading_space=1)
        for field_list in myiter:
            do_something_with(field_list)
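Accepting newlines "wherever you like" amounts to carrying the parser state across parcel boundaries. A rough sketch of that idea follows, assuming the default comma and double-quote settings with quote doubling. import_rows is a hypothetical name, not the module's importer, and the ValueError at end of input is likewise an assumption.

```python
def import_rows(parcels, delimiter=",", quote='"'):
    """Hypothetical generator: yield one list of fields per logical row."""
    fields, current = [], []
    in_quotes = False
    quote_pending = False  # saw a quote inside a quoted field; next char decides
    seen_any = False       # True once the row in progress has any content
    for parcel in parcels:
        for ch in parcel:
            if quote_pending:
                quote_pending = False
                if ch == quote:
                    current.append(quote)  # doubled quote: literal quote
                    continue
                in_quotes = False          # previous quote closed the field
            if in_quotes:
                if ch == quote:
                    quote_pending = True
                else:
                    current.append(ch)     # may be an embedded newline
            elif ch == quote and not current:
                in_quotes = True
                seen_any = True
            elif ch == delimiter:
                fields.append("".join(current))
                current = []
                seen_any = True
            elif ch == "\n":
                fields.append("".join(current))
                yield fields
                fields, current, seen_any = [], [], False
            else:
                current.append(ch)
                seen_any = True
    if in_quotes and not quote_pending:
        raise ValueError("input ended inside a quoted field")
    if seen_any:                           # last row had no trailing newline
        fields.append("".join(current))
        yield fields


data = 'a,"b\nc",d\n1,"say ""hi""",3\n'
whole = list(import_rows([data]))                          # one big parcel
by_char = list(import_rows(data))                          # single characters
by_line = list(import_rows(data.splitlines(keepends=True)))
print(whole == by_char == by_line)  # True
```

The quote_pending flag is what lets a doubled quote be split across two parcels without being misread as a closing quote.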
packer(...)
    Return a function packing data from a sequence into delimited
    fields in a string.
        packing_func = delimited.packer(options...)
    Options (keyword arguments) are:
        Common options, plus:
        strict_pack=1 (if 0, packing_func will not raise exception
            when packing data that will not be unpackable)
        forced_quoting=0 (if 1, wrap quotes even around fields that
            don't contain delimiters or quotes)
        append_newline=0 (if 1, packing_func will append
            a newline character to its result)
    Using the result:
        string = packing_func(sequence)
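The packing options can be illustrated with a small sketch. make_packer is a hypothetical stand-in, not the module's packer: it covers only quote modes 0 and 2, and the reading of strict_pack used here (refuse to emit a field that could not be unpacked again when quoting is off) is an assumption about the intended behaviour.

```python
QUOTE_NONE, QUOTE_DOUBLE = 0, 2  # values taken from the DATA section


def make_packer(delimiter=",", quote='"', quote_mode=QUOTE_DOUBLE,
                strict_pack=True, forced_quoting=False, append_newline=False):
    """Hypothetical stand-in for delimited.packer(); illustration only."""
    if quote_mode not in (QUOTE_NONE, QUOTE_DOUBLE):
        raise ValueError("this sketch only covers quote modes 0 and 2")

    def pack(sequence):
        packed = []
        for item in sequence:
            text = str(item)
            if quote_mode == QUOTE_NONE:
                # Without quoting, a delimiter or newline cannot round-trip.
                if strict_pack and (delimiter in text or "\n" in text):
                    raise ValueError("field would not be unpackable: %r" % text)
            elif (forced_quoting or delimiter in text
                  or quote in text or "\n" in text):
                # Wrap in quotes, doubling any quote characters inside.
                text = quote + text.replace(quote, quote + quote) + quote
            packed.append(text)
        line = delimiter.join(packed)
        return line + "\n" if append_newline else line

    return pack


pack = make_packer()
print(pack(["a", "b,c", 'say "hi"']))  # a,"b,c","say ""hi"""
```

With forced_quoting=True every field is wrapped, which some consumers prefer because it makes numeric-looking text fields unambiguous.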
restricted_unpacker(...)
    Return a function which unpacks data from delimited fields into a list.
    The data may not contain embedded newlines.
        unpacking_func = delimited.restricted_unpacker(options...)
    Options (keyword arguments) are:
        Common options, plus:
        ignore_leading_space=0 (if 1, unpacking_func will skip over
            leading spaces at start of each field)
    Using the result:
        list = unpacking_func(string)
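Because no row can span two calls, a restricted unpacker needs no state between calls. The sketch below shows that shape together with the ignore_leading_space option. Both make_restricted_unpacker and the local DataError class are hypothetical stand-ins for illustration; the module's own error conditions may differ.

```python
class DataError(Exception):
    """Local stand-in for the module's DataError."""


def make_restricted_unpacker(delimiter=",", quote='"', ignore_leading_space=False):
    """Hypothetical stand-in for delimited.restricted_unpacker()."""

    def unpack(line):
        if line.endswith("\n"):
            line = line[:-1]                  # optional trailing newline
        if "\n" in line:
            raise DataError("embedded newline not allowed")
        fields, current = [], []
        in_quotes = False
        at_start = True                       # no field content seen yet
        i, n = 0, len(line)
        while i < n:
            ch = line[i]
            if in_quotes:
                if ch != quote:
                    current.append(ch)
                elif i + 1 < n and line[i + 1] == quote:
                    current.append(quote)     # doubled quote: literal quote
                    i += 1
                else:
                    in_quotes = False
            elif at_start and ignore_leading_space and ch == " ":
                pass                          # skip leading space
            elif at_start and ch == quote:
                in_quotes, at_start = True, False
            elif ch == delimiter:
                fields.append("".join(current))
                current, at_start = [], True
            else:
                current.append(ch)
                at_start = False
            i += 1
        if in_quotes:
            raise DataError("quoted field not terminated")
        fields.append("".join(current))
        return fields

    return unpack


unpack = make_restricted_unpacker(ignore_leading_space=True)
print(unpack('a, b,  "c,d"\n'))  # ['a', 'b', 'c,d']
```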

All of the above "functions/iterators" are really callable objects which
maintain a considerable amount of state. This can be inspected by accessing
the following read-only attributes:
    (a) Your input options: delimiter, front_quote, back_quote, quote_mode,
        alt_quote, ignore_leading_space, allow_embedded_newlines,
        strict_pack, forced_quoting, append_newline
    (b) Other items of varying utility:
        stopping_status: 0 = not stopping, 1 = must raise StopIteration on
            next call, 2 = stopped
        error_occurred: 0 or non-zero
        input_iter_count: number of times we've called your .next() method
        input_newline_count: like it says, but see the next item, which is
            more useful.
        input_row_number: Obtained by counting newlines. First row is 0.
            Negative (-1) means no input yet.
        input_char_column: 0 is first char, -1 means the newline.
        [The above two are intended for use in detailed error reporting
        by the caller]
        output_count
        embedded_newline_count
        error_count
        __name__: "pack", "unpack" etc (to identify the object in testing
            & timing scripts)
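The "callable object with read-only state" idea can be sketched with a class that defines __call__ and exposes its counters through properties. LineSplitter is a hypothetical toy, not one of the module's objects: it does a naive split with no quoting, and it bumps input_row_number once per call rather than by counting newlines.

```python
class LineSplitter:
    """Hypothetical callable with read-only counters; illustration only."""

    def __init__(self, delimiter=","):
        self._delimiter = delimiter
        self._input_row_number = -1  # -1 means no input yet
        self._output_count = 0

    # Properties without setters make the attributes read-only.
    @property
    def delimiter(self):
        return self._delimiter

    @property
    def input_row_number(self):
        return self._input_row_number

    @property
    def output_count(self):
        return self._output_count

    def __call__(self, line):
        self._input_row_number += 1
        row = line.rstrip("\n").split(self._delimiter)  # naive: no quoting
        self._output_count += 1
        return row


split = LineSplitter()
for buff in ["a,b\n", "c,d\n"]:
    row = split(buff)
    # A caller can use the counters for detailed error reporting.
    print("row %d: %r" % (split.input_row_number, row))
```

Assigning to one of these attributes, for example split.output_count = 0, raises AttributeError, which is what keeps the state read-only from the caller's side.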
DATA
    QUOTE_ALT = 3
    QUOTE_DOUBLE = 2
    QUOTE_NONE = 0
    QUOTE_SINGLE = 1