Lingfo – Python “delimited” extension

         This is preliminary documentation, provided for participants in the discussion on a standard CSV module on news:comp.lang.python.

         Acknowledgement: I have had the benefit of perusing the source code of Dave Cole’s “csv” module (see …). In particular, the manner of handling newlines embedded in quoted fields is “borrowed” from Dave’s work.

         No downloads yet; it needs some tidying before hitting the presses.


John Machin





delimited - Provides functions for packing & unpacking fields of "delimited" or "CSV" or "TSV" files.




Exceptions:

AuthorError [Assertion failure: I stuffed up]

CallerError [You stuffed up]

DataError [Somebody stuffed up]





Return a function which unpacks delimited fields from a string into a list.

The data may contain "embedded newlines", i.e. newlines inside a quoted field. Otherwise a newline may occur only at the end of the input string, where it is optional. The contract is that you supply one line at a time, so a newline anywhere else is a DataError.


unpacking_func = delimited.unpacker(options...)

Options (keyword arguments) are:



back_quote=(defaults to the value of front_quote)

quote='"' (synonym for front_quote)

quote_mode=2 (quote back_quotes by doubling them)

-- or 3 (quote quotes with alt_quote char)

-- or 1 (only quote delimiters)

-- or 0 (no quoting)


[All of the above are common options]
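The quote_mode=2 convention (an embedded quote is written twice) is the same one Python's standard-library csv module uses when its dialect has doublequote=True, which is the default. The sketch below uses csv, not the delimited extension, purely to illustrate that convention; treating the two as equivalent is an assumption.

```python
import csv
import io

# Analogue of quote_mode=2: an embedded quote character is written twice.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"', doublequote=True,
                    lineterminator="\n")
writer.writerow(['say "hi", then leave', "plain"])
packed = buf.getvalue()
print(packed, end="")   # "say ""hi"", then leave",plain

# Reading the packed text back undoes the doubling.
fields = next(csv.reader(io.StringIO(packed)))
print(fields)           # ['say "hi", then leave', 'plain']
```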

ignore_leading_space=0 (if 1, unpacking_func will skip over leading spaces at start of each field)
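Python's standard-library csv module has a comparable switch, skipinitialspace. The sketch below uses it to show why such an option matters: without it, a space before an opening quote turns the quote into an ordinary character. It assumes ignore_leading_space=1 behaves like skipinitialspace=True, which this page does not confirm.

```python
import csv

line = 'alpha,   beta, "gamma, delta"\n'

# Leading spaces kept: the quote after the space is not treated as quoting.
plain = next(csv.reader([line]))
print(plain)     # ['alpha', '   beta', ' "gamma', ' delta"']

# Leading spaces skipped: the quoted field is recognised.
skipped = next(csv.reader([line], skipinitialspace=True))
print(skipped)   # ['alpha', 'beta', 'gamma, delta']
```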

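Using the result (example):

    for buff in open("somefile"):
        alist = unpacking_func(buff)
        if alist is None:
            # newline inside quoted field; need more input
            continue
        # alist now holds the fields of one complete record
        process(alist)  # hypothetical: your own per-record handling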
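To make the "None means more input" contract concrete, here is a minimal pure-Python sketch of a stateful, line-at-a-time unpacker. The class name LineUnpacker is hypothetical; it handles only quote_mode=2 (doubled quotes) with '\n' line endings, and it raises ValueError where the real extension raises DataError. It is not the extension's implementation.

```python
from typing import List, Optional


class LineUnpacker:
    """Hypothetical sketch: call once per line; None means 'send the next line'."""

    def __init__(self, delimiter: str = ",", quote: str = '"') -> None:
        self.delimiter = delimiter
        self.quote = quote
        self._fields: List[str] = []
        self._field: List[str] = []
        self._in_quotes = False  # carried over between calls

    def __call__(self, line: str) -> Optional[List[str]]:
        body = line[:-1] if line.endswith("\n") else line
        if "\n" in body:
            raise ValueError("newline before end of line")  # stands in for DataError
        i = 0
        while i < len(body):
            ch = body[i]
            if self._in_quotes:
                if ch == self.quote:
                    if body[i + 1:i + 2] == self.quote:  # doubled quote -> literal
                        self._field.append(ch)
                        i += 1
                    else:
                        self._in_quotes = False  # closing quote
                else:
                    self._field.append(ch)
            elif ch == self.quote and not self._field:
                self._in_quotes = True  # opening quote at start of a field
            elif ch == self.delimiter:
                self._fields.append("".join(self._field))
                self._field = []
            else:
                self._field.append(ch)
            i += 1
        if self._in_quotes:
            # The record continues; the line's newline belongs to the field.
            if line.endswith("\n"):
                self._field.append("\n")
            return None
        self._fields.append("".join(self._field))
        record, self._fields, self._field = self._fields, [], []
        return record
```

Because the state lives on the instance, one LineUnpacker would be used per input file, fed lines in order, as in the loop above.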





Return an iterator which imports delimited fields from a sequence of strings into lists.


output_iterator = delimited.importer(input_iterable, options...)

The input_iterable must be a suitable argument for the iter() builtin and must deliver strings. Newlines may fall anywhere: the parcels need not correspond to lines. The input parcels may be single bytes, lines, blocks (from .read(BUFSIZ)), or a whole file (from .read()).

Options (keyword arguments) are:

Common options, plus:

ignore_leading_space=0 (if 1, will skip over leading spaces at start of each field)

allow_embedded_newlines=1 (if 0, will raise an exception if newline found inside quotes)
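The importer differs from the unpacker in that parcel boundaries carry no meaning, so the parsing state has to survive from one parcel to the next. A minimal sketch as a Python generator follows; the name import_records is hypothetical, only quote_mode=2 is handled, ValueError stands in for DataError, and it is not the extension's own code.

```python
from typing import Iterable, Iterator, List


def import_records(parcels: Iterable[str], delimiter: str = ",",
                   quote: str = '"',
                   allow_embedded_newlines: bool = True) -> Iterator[List[str]]:
    """Hypothetical importer sketch: parcels may be split at any character."""
    fields: List[str] = []
    field: List[str] = []
    in_quotes = False
    maybe_closing = False  # saw a quote inside quotes: closing, or first of ""?
    row_started = False
    for parcel in parcels:
        for ch in parcel:  # state persists across parcel boundaries
            if maybe_closing:
                maybe_closing = False
                if ch == quote:  # doubled quote -> literal quote
                    field.append(quote)
                    continue
                in_quotes = False  # previous quote was the closing one
            if in_quotes:
                if ch == quote:
                    maybe_closing = True
                elif ch == "\n" and not allow_embedded_newlines:
                    raise ValueError("newline inside quoted field")
                else:
                    field.append(ch)
            elif ch == quote and not field:
                in_quotes = row_started = True
            elif ch == delimiter:
                fields.append("".join(field))
                field = []
                row_started = True
            elif ch == "\n":
                fields.append("".join(field))
                yield fields
                fields, field, row_started = [], [], False
            else:
                field.append(ch)
                row_started = True
    if in_quotes and not maybe_closing:
        raise ValueError("input ended inside a quoted field")
    if row_started:  # last record had no trailing newline
        fields.append("".join(field))
        yield fields
```

Feeding the same text as one string, as single characters, or as three-character blocks should yield identical records, which is the parcel independence described above.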

Using the result (example):

    myiter = delimited.importer(open("foo.csv"), ignore_leading_space=1)
    for field_list in myiter:
        process(field_list)  # hypothetical: do something with field_list



Return a function packing data from a sequence into delimited fields in a string.


packing_func = delimited.packer(options...)

Options (keyword arguments) are:

Common options, plus:

strict_pack=1 (if 0, packing_func will not raise an exception when packing data that will not be unpackable)

forced_quoting=0 (if 1, wrap quotes even around fields that don't contain delimiters or quotes)

append_newline=0 (if 1, packing_func will append a newline character to its result)
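A minimal plain-Python sketch of how the packer options could interact. The function name pack is hypothetical and only quote_mode 2 and 0 are covered; the rule used for strict_pack (reject a field containing the delimiter or a newline when quote_mode=0) is an assumption about what "not unpackable" means.

```python
from typing import Iterable


def pack(fields: Iterable[str], delimiter: str = ",", quote: str = '"',
         quote_mode: int = 2, forced_quoting: bool = False,
         append_newline: bool = False, strict_pack: bool = True) -> str:
    """Hypothetical packer sketch covering quote_mode 2 and 0 only."""
    out = []
    for text in fields:
        if quote_mode == 2:
            special = delimiter in text or quote in text or "\n" in text
            if forced_quoting or special:
                # Wrap in quotes and double any embedded quote characters.
                text = quote + text.replace(quote, quote * 2) + quote
        elif quote_mode == 0:
            # No quoting: an embedded delimiter or newline cannot be told
            # apart from a real one when unpacking (assumed strict_pack rule).
            if strict_pack and (delimiter in text or "\n" in text):
                raise ValueError("field %r would not be unpackable" % text)
        else:
            raise NotImplementedError("sketch covers quote_mode 2 and 0 only")
        out.append(text)
    result = delimiter.join(out)
    return result + "\n" if append_newline else result
```

For example, pack(["a", "b"], forced_quoting=True, append_newline=True) gives '"a","b"\n'.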

Using the result:

string = packing_func(sequence)



Return a function which unpacks data from delimited fields into a list.

The data must not contain embedded newlines.


unpacking_func = delimited.restricted_unpacker(options...)

Options (keyword arguments) are:

Common options, plus:

ignore_leading_space=0 (if 1, unpacking_func will skip over leading spaces at start of each field)

Using the result:

list = unpacking_func(string)
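Because a restricted unpacker never sees a record split across calls, its behaviour can be approximated with Python's standard-library csv module reading a single line. The sketch below does that; the function name restricted_unpack is hypothetical, ValueError stands in for DataError, and the correspondence between ignore_leading_space and skipinitialspace is assumed.

```python
import csv
from typing import List


def restricted_unpack(line: str, delimiter: str = ",",
                      ignore_leading_space: bool = False) -> List[str]:
    """Hypothetical sketch: unpack one line; embedded newlines are an error."""
    body = line[:-1] if line.endswith("\n") else line  # optional trailing newline
    if "\n" in body:
        raise ValueError("embedded newline not allowed")
    # strict=True makes csv raise csv.Error on badly formed quoting.
    reader = csv.reader([body], delimiter=delimiter,
                        skipinitialspace=ignore_leading_space, strict=True)
    return next(reader, [])
```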


All of the above "functions/iterators" are really callable objects which maintain a considerable amount of state. This can be inspected by accessing the following read-only attributes:



(a) Your input options: delimiter, front_quote, back_quote, quote_mode, alt_quote, ignore_leading_space, allow_embedded_newlines, strict_pack, forced_quoting, append_newline


(b) Other items of varying utility:


stopping_status: 0 = not stopping, 1 = must raise StopIteration on next call, 2 = stopped

error_occurred: 0 or non-zero

input_iter_count: number of times we've called your .next() method

input_newline_count: the number of newlines seen so far; but see the next item, which is more useful.

input_row_number: Obtained by counting newlines. First row is 0. Negative (-1) means no input yet.

input_char_column: 0 is first char, -1 means the newline.

[The above two are intended for use in detailed error reporting by the caller]




__name__: "pack", "unpack" etc (to identify the object in testing & timing scripts)
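The row and column conventions described above for input_row_number and input_char_column (rows counted from 0, -1 before any input, column -1 for the newline itself) can be sketched as a small tracker. The class PositionTracker and the helper describe are hypothetical illustrations of how a caller might turn such attributes into an error message; they are not part of the extension.

```python
class PositionTracker:
    """Hypothetical sketch of row/column bookkeeping while reading input."""

    def __init__(self) -> None:
        self.input_newline_count = 0
        self.input_row_number = -1   # -1: no input yet
        self.input_char_column = -1
        self._next_column = 0
        self._start_new_row = True

    def advance(self, ch: str) -> None:
        """Record that one input character has been consumed."""
        if self._start_new_row:
            self.input_row_number += 1
            self._next_column = 0
            self._start_new_row = False
        if ch == "\n":
            self.input_newline_count += 1
            self.input_char_column = -1  # -1 means "the newline itself"
            self._start_new_row = True
        else:
            self.input_char_column = self._next_column
            self._next_column += 1


def describe(tracker: PositionTracker, message: str) -> str:
    """Caller-side error report built from the tracker's position."""
    col = tracker.input_char_column
    where = "end of line" if col == -1 else "column %d" % col
    return "row %d, %s: %s" % (tracker.input_row_number, where, message)
```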









Lingfo Pty Ltd - ABN 97 084 236 199