Ticket UUID: | 565051 |
Title: | csv::split parsing bug |
Type: | Bug | Version: | None |
Submitter: | todolson | Created on: | 2002-06-05 21:41:47 |
Subsystem: | None | Assigned To: | andreas_kupries |
Priority: | 5 Medium | Severity: | |
Status: | Closed | Last Modified: | 2002-06-25 06:17:38 |
Resolution: | Fixed | Closed By: | andreas_kupries |
Closed on: | 2002-06-24 23:17:38 |
Description: |
csv::split handles a quoted, null field incorrectly if the field is neither the first nor the last field in a line/record:

% info tclversion
8.3
% info patch
8.3.3
% package require csv
0.2
% ::csv::split {1 2 "" ""} { }
1 2 {"} {}
% ::csv::split {"" ""} { }
{} {}
% ::csv::split {"" "" ""} { }
{} {"} {}

A change to the character map seems to fix the problem:

% proc ::csv::split {line {sepChar ,}} {
    regsub -all -- {(^\"|\"$)} $line \0 line
    set line [string map [list \
            $sepChar\"\"\" $sepChar\0\" \
            \"\"\"$sepChar \"\0$sepChar \
            \"\"$sepChar   $sepChar \
            \"\"           \" \
            \"             \0 \
            ] $line]
    set end 0
    while {[regexp -indices -start $end -- {(\0)[^\0]*(\0)} $line \
            -> start end]} {
        set start [lindex $start 0]
        set end   [lindex $end 0]
        set range [string range $line $start $end]
        if {[string first $sepChar $range] >= 0} {
            set line [string replace $line $start $end \
                    [string map [list $sepChar \1] $range]]
        }
        incr end
    }
    set line [string map [list $sepChar \0 \1 $sepChar \0 {} ] $line]
    return [::split $line \0]
}
% ::csv::split {"" "" "" "" "" ""} { }
{} {} {} {} {} {}
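
The proposed change adds one more entry to the list passed to `string map`, and where that entry sits in the list matters. The sketch below is not a trace of the patch; it only illustrates, with made-up strings (`Q` and `N` are arbitrary placeholder replacements), that `string map` tries its keys in list order at each position and uses the first one that matches.

```tcl
# Sketch of `string map` key ordering (illustrative strings, not the csv code).
# Keys are tried in list order at each position; the first match is used.
set input {a""b"c}

# Two-character key listed first: the doubled quote is consumed as a unit.
set longFirst  [string map [list {""} Q {"} N] $input]

# One-character key listed first: it always wins, so {""} never matches.
set shortFirst [string map [list {"} N {""} Q] $input]

puts $longFirst   ;# aQbNc
puts $shortFirst  ;# aNNbNc
```

By the same rule, the added `\"\"$sepChar` entry only has an effect because it is listed ahead of the shorter `\"\"` entry; placed after it, the shorter key would match first at the same position.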
User Comments: |
andreas_kupries added on 2002-06-25 06:17:38:
Logged In: YES user_id=75003

Patch applied to head and committed.

andreas_kupries added on 2002-06-25 06:16:20:
File Added - 25748: 565051.diff
Logged In: YES user_id=75003

Here is a patch for the true problem. It contains an extended testsuite.

andreas_kupries added on 2002-06-25 05:38:22:
Logged In: YES user_id=75003

There is a bug in split parsing, but it is not in the handling of inner fields. It is the outer fields, i.e. the first and the last, that are handled wrongly.

The string {"" "" ""} parsed with separator character <space> does not contain three empty fields. It contains three fields, each of which contains a single ". See the definition of the CSV format. This might be even clearer when parsing {"","",""} using the comma as separator character.

An empty field is specified by two adjacent separator characters without any other characters in between. In the case of <space> this means a space following a space, or a space at the beginning and/or end of the string.

todolson added on 2002-06-06 04:47:32:
File Added - 24471: csv.tcl
Logged In: YES user_id=450877

A revised version of csv.tcl is attached. Still csv package version number 0.2.
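
The empty-field rule stated in the 2002-06-25 05:38:22 comment (adjacent separators, or a separator at the start or end of the string) can be seen without any quoting involved. The following is a minimal sketch that uses only Tcl's built-in `split`, not `csv::split`, so it says nothing about how quoted fields are handled; the input strings are invented for illustration.

```tcl
# Built-in split only; this does no CSV quote handling.
set inner  [split "1,,3" ,]     ;# adjacent separators in the middle
set outer  [split ",a," ,]      ;# separators at start and end
set spaced [split "1  2" " "]   ;# two spaces, with space as the separator

puts $inner   ;# 1 {} 3
puts $outer   ;# {} a {}
puts $spaced  ;# 1 {} 2
```

Each `{}` in the output is an empty field produced purely by separator placement, which is the reading of the format the comment argues for.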