Description: |
Hi all,
I found some very confusing behavior in Tcl.
set x "\u5317\u4eac" ;# Beijing in Chinese
set y $x
puts "x: '$x' -> '[string range $x 0 100]''"
puts "y: '$y'"
encoding convertfrom "iso8859-1" $y
puts "y: '$y'"
# format "%s" $y
puts "x: '$x' -> '[string range $x 0 100]'"
Output:
x: '北京' -> '北京''
y: '北京'
y: '北京'
x: '北京' -> '¬'
If we uncomment the second-to-last line of the script above, the output is correct:
x: '北京' -> '北京''
y: '北京'
y: '北京'
x: '北京' -> '北京'
The encoding and format commands appear to have side effects on the string range command (in the second example they cancel each other out, producing the correct result). Note that neither x nor y was modified in any way, yet the result of string range on x changes. I am using Tcl 8.5.
Thanks,
Nikola
|
User Comments: |
jan.nijtmans added on 2014-04-30 20:38:01:
(text/x-fossil-wiki)
Fixed in [77a7d8d123] (core-8-5-branch). In trunk, the fix was already applied more than 5 years ago ([ee4709ceaf], thanks to dkf), so I just backported dkf's fix.
Explanation: The "encoding convertfrom" command creates an internal byte array representation from the string, and "string range" tries to make smart use of the byte array representation. But that shortcut was simply wrong.
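The scenario from the report can be reduced to a small regression check. This is only a sketch, assuming a Tcl 8.5 or 8.6 interpreter; the variable names before and after are illustrative and do not come from the original report or from the actual fix. The catch is an added assumption so that the script keeps running on interpreters that reject non-byte input to encoding convertfrom.

```tcl
# Regression sketch: [string range] on x must give the same result
# before and after [encoding convertfrom] is applied to y, which
# shares its underlying value with x.
set x "\u5317\u4eac"          ;# Beijing in Chinese
set y $x                      ;# y refers to the same value as x

set before [string range $x 0 100]

# The result is discarded on purpose; only the side effect on the
# shared value matters. catch is an assumption: it guards against
# interpreters that raise an error for non-byte input here.
catch {encoding convertfrom iso8859-1 $y}

set after [string range $x 0 100]

if {$before eq $after} {
    puts "ok: string range unaffected"
} else {
    puts "BUG: string range changed after encoding convertfrom"
}
```

On a build that includes the fix, the two results should compare equal; on an affected 8.5 build, the second call is expected to differ, as in the output shown in the report.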
|