Tk Library Source Code

Ticket UUID: 565051
Title: csv::split parsing bug
Type: Bug
Version: None
Submitter: todolson
Created on: 2002-06-05 21:41:47
Subsystem: None
Assigned To: andreas_kupries
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2002-06-25 06:17:38
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2002-06-24 23:17:38
Description:
csv::split handles a quoted, null field incorrectly if
the field is neither the first nor the last field in a
line/record:

% info tclversion
8.3
% info patch
8.3.3
% package require csv
0.2
% ::csv::split {1 2 "" ""} { }
1 2 {"} {}
% ::csv::split {"" ""} { }
{} {}
% ::csv::split {"" "" ""} { }
{} {"} {}

A change to the character map seems to fix the problem:

% proc ::csv::split {line {sepChar ,}} {
    regsub -all -- {(^\"|\"$)} $line \0 line
    set line [string map [list \
        $sepChar\"\"\" $sepChar\0\" \
        \"\"\"$sepChar \"\0$sepChar \
        \"\"$sepChar $sepChar \
        \"\"           \" \
        \"             \0 \
    ] $line]
    set end 0
    while {[regexp -indices -start $end -- {(\0)[^\0]*(\0)} $line \
            -> start end]} {
        set start [lindex $start 0]
        set end   [lindex $end 0]
        set range [string range $line $start $end]
        if {[string first $sepChar $range] >= 0} {
            set line [string replace $line $start $end \
                [string map [list $sepChar \1] $range]]
        }
        incr end
    }
    set line [string map [list $sepChar \0 \1 $sepChar \0 {} ] $line]
    return [::split $line \0]
}
% ::csv::split {"" "" "" "" "" ""} { }
{} {} {} {} {} {}
User Comments:
andreas_kupries added on 2002-06-25 06:17:38:
Patch applied to head and committed.

andreas_kupries added on 2002-06-25 06:16:20:

File Added - 25748: 565051.diff

Here is a patch for the true problem. It contains an extended
test suite.

andreas_kupries added on 2002-06-25 05:38:22:
There is a bug in split parsing, but it is not in the handling of
inner fields. It is the outer fields, i.e. the first and last, that
are handled wrongly.

The string {"" "" ""}, parsed with <space> as the separator
character, does not contain three empty fields. It contains three
fields, each of which holds a single ". See the definition of the
CSV format. This may be clearer when parsing {"","",""} with the
comma as the separator character. An empty field is specified by
two adjacent separator characters with nothing between them. With
<space> as the separator, this means a space following a space, or
a space at the beginning and/or end of the string.
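
The rule described above can be illustrated with a minimal sketch built on the core split command. The naiveFields proc is a hypothetical helper for illustration only, not the tcllib implementation: it assumes a single separator character, ignores quoted fields that contain the separator, and treats a doubled quote as one literal quote, following the reading given in this comment.

```tcl
# Hypothetical helper, not part of tcllib. It only demonstrates the
# empty-field rule and does not handle quoted fields that contain
# the separator character.
proc naiveFields {line {sepChar ,}} {
    set fields {}
    foreach field [split $line $sepChar] {
        # Assumption: a doubled quote stands for one literal quote.
        lappend fields [string map [list \"\" \"] $field]
    }
    return $fields
}

# Fields are joined with | so that empty fields remain visible.
puts [join [naiveFields {1,,2}] |]        ;# prints 1||2
puts [join [naiveFields { a } { }] |]     ;# prints |a|
puts [join [naiveFields {"","",""}] |]    ;# prints "|"|"
```

Two adjacent separators, or a separator at the start or end of the line, yield an empty field, while each "" yields a field holding a single quote character.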

todolson added on 2002-06-06 04:47:32:

File Added - 24471: csv.tcl

A revised version of csv.tcl is attached. The csv package
version number is still 0.2.

Attachments: