Tcl Source Code

Ticket UUID: 0e92c404f19ede5b2eb06e6db27647d3138cc56
Title: Side effect on string range command (from encoding and format commands)
Type: Bug
Version: tcl 8.5
Submitter: anonymous
Created on: 2014-04-30 16:54:49
Subsystem: 16. Commands A-H
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2014-04-30 20:38:01
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2014-04-30 20:38:01
Hi all,

I found some very confusing behavior in Tcl:

set x "\u5317\u4eac"             ;# Beijing in Chinese
set y $x
puts "x: '$x' -> '[string range $x 0 100]''"
puts "y: '$y'"
encoding convertfrom "iso8859-1" $y
puts "y: '$y'"
# format "%s" $y
puts "x: '$x' -> '[string range $x 0 100]'"

x: '北京' -> '北京''
y: '北京'
y: '北京'
x: '北京' -> '¬'

If the second-to-last line of the script (the commented-out format command) is uncommented, the output is correct:

x: '北京' -> '北京''
y: '北京'
y: '北京'
x: '北京' -> '北京'

The encoding and format commands seem to have side effects on the string range command. In the second run their effects cancel each other out, so the result is correct. Note that neither x nor y was modified in any way, yet the result of string range is still affected. I am using Tcl 8.5.
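Below is a self-contained regression check modeled on the reproduction script. It is only a sketch. The proc name stringRangeSurvivesShimmer is hypothetical. The catch is also an assumption added for portability: newer Tcl releases may refuse to run encoding convertfrom on characters above U+00FF, and the check only cares about the side effect on the shared value. The expected value is built with format instead of reusing the literal, so the comparison cannot fall into the same byte-array shortcut. A Tcl with the fix should print 1. An unfixed 8.5 build would be expected to print 0.

```tcl
# Hypothetical regression check (name and structure are illustrative only).
# Returns 1 if [string range] still sees the original characters after a
# shared copy of the value went through [encoding convertfrom], else 0.
proc stringRangeSurvivesShimmer {} {
    set x "\u5317\u4eac"   ;# Beijing in Chinese
    set y $x               ;# y and x now refer to the same value object
    # Newer Tcl releases may reject characters above U+00FF here;
    # catch keeps the check portable, since only the side effect matters.
    catch {encoding convertfrom iso8859-1 $y}
    # Build the expected value independently of the (possibly converted)
    # literal, so the comparison does not depend on its internal rep.
    set expected [format %c%c 0x5317 0x4eac]
    return [string equal [string range $x 0 100] $expected]
}

puts [stringRangeSurvivesShimmer]
```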

User Comments: jan.nijtmans added on 2014-04-30 20:38:01:

Fixed in [77a7d8d123] (core-8-5-branch). In trunk, the fix was already applied more than 5 years ago ([ee4709ceaf], thanks to dkf), so I just backported dkf's fix.

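Explanation: After "set y $x", x and y share the same value object. The "encoding convertfrom" command gives that object an internal byte array representation. "string range" tried to make smart use of the byte array representation when it found one, but that shortcut was simply wrong. The byte array keeps only the low 8 bits of each character, so for "\u5317\u4eac" it holds the bytes 0x17 0xAC instead of the two Chinese characters.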
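The following sketch shows why the broken output contains '¬'. It works by hand, at script level, assuming that Tcl 8.x keeps only the low 8 bits of each character when it builds a byte array from a string. The proc name lowBytes is hypothetical. It only mimics that conversion and does not call any Tcl internals.

```tcl
# Hypothetical helper: mimics, at script level, how Tcl 8.x fills a byte
# array from a string by keeping only the low 8 bits of each character.
proc lowBytes {s} {
    set codes {}
    foreach ch [split $s ""] {
        scan $ch %c code                    ;# Unicode code point of ch
        lappend codes [expr {$code & 0xFF}] ;# keep only the low byte
    }
    return $codes
}

set x "\u5317\u4eac"                        ;# Beijing in Chinese
puts [lowBytes $x]                          ;# prints: 23 172
```

In ISO 8859-1, 0x17 (23) is a non-printing control character and 0xAC (172) is the NOT SIGN, '¬'. This matches the corrupted output in the report.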