Tcl Source Code

Ticket UUID: 42d14c495a096159841d2601a16850273713b31b
Title: Parsing long floating point strings
Type: Bug
Version: >= 8.6
Submitter: chw
Created on: 2023-09-22 09:16:17
Subsystem: 48. Number Handling
Assigned To: kbk
Priority: 9 Immediate
Severity: Critical
Status: Closed
Last Modified: 2025-05-05 14:54:55
Resolution: Fixed
Closed By: oehhar
Closed on: 2025-05-05 14:54:55
Description:
The following snippet gives strange results when the mantissa gets large enough:

for {set i 1} {$i < 1000} {incr i} {
    set s 1.[string repeat 1 $i]e-321
    set d ""
    scan $s %g d
    if {$d eq ""} break
    puts [format "%3d: %.20g" $i $d]
    if {$d == inf || $d == -inf} break
}

Output on x86_64 Linux for 8.6.11, core-8-branch, trunk:

  1: 1.1017663902259797935e-321
  2: 1.1116477031428047244e-321
...
190: 1.1116477031428047244e-321
191: 8.2870452568891190738e+36
...
701: -1.6598062275523971834e+181
702: 8.2870452568891190738e+36
703: inf

Expected: the same result as the C library's sscanf(str, "%lg", &d), i.e.

...
703: 1.1116477031428047244e-321
...
999: 1.1116477031428047244e-321
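
As a cross-check of the expected values, the same strings can be fed to any correctly rounding decimal-to-double converter. The sketch below uses Python's float() purely as a stand-in for sscanf (an assumption: CPython's default build rounds decimal strings correctly). The names reference_scan and TINY are illustrative and not part of Tcl.

```python
import math

# Smallest positive subnormal double, 2**-1074 (about 4.94e-324).
TINY = math.ldexp(1.0, -1074)


def reference_scan(ones: int) -> float:
    """Parse 1.<ones times '1'>e-321 with a correctly rounding converter."""
    return float("1." + "1" * ones + "e-321")


if __name__ == "__main__":
    for i in (1, 2, 190, 191, 702, 703, 999):
        d = reference_scan(i)
        # Show the value and which integer multiple of 2**-1074 it is.
        print("%3d: %.20g  (%d * 2**-1074)" % (i, d, round(d / TINY)))
```

In this range doubles are spaced 2**-1074 apart, so every length from 2 to 999 lands on the same subnormal, 225 * 2**-1074, and length 1 lands on 223 * 2**-1074. Extra mantissa digits cannot move the result any further.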
User Comments:

oehhar added on 2025-05-05 14:54:55:

This is probably a bug in libtommath, but we are adding the workaround here until it is fixed there.

Commits:

  • core-8-6-branch: [3cc7e2aa7d]
  • core-9-0-branch: [42d14c495a]
  • main: [013063b5fc]

I hope we will not get fireworks on CI, as this was not tested. But two TCT members said "merge" and it happened ;-).

Thanks to all. Long live the magic Mr. Androwish!

Bug closed, Harald


oehhar added on 2025-05-05 06:24:51:

A very elaborate test was provided by Christian and added in this commit: [c9b91b8c9f0dcc0b]. In addition, I added a source code comment describing this bug.

IMHO, this is now ready for review and merge.

Take care, Harald


oehhar added on 2025-05-03 13:19:03:

Christian Werner explained to me that this issue shows up especially when an external library delivers a double as its string representation. The bug was introduced with libtommath 20 years ago.

The problem is that scanning the string "1.[string repeat 1 191]e-321" as a double results in a very large number:

scan "1.[string repeat 1 191]e-321" %g
8.2870452568891191e+036

With one "1" fewer, the result is an acceptable number close to the true value:

scan "1.[string repeat 1 190]e-321" %g
1.11e-321

Adding more ones to the mantissa lets the scanned value grow. With 703 ones, Inf is reached.

I have added those two cases as tests in this commit: [4351b324]. I am not sure whether this is very stable, but I get the same results on Windows with 32-bit 8.6 and 64-bit main.

I suppose it would be better to have a test case with a range check...

I hope that with this test we can merge the branch.

Thanks for all, Harald


chw added on 2025-04-03 15:10:10:
Howdy Jan,

see the initial ticket text for a test. The problem occurs when the number of
mantissa digits in the string becomes larger than 512 plus/minus the exponent.
This is somehow magically related to the small number of bits an IEEE double
has for its exponent. Hence the brute-force bignum division or multiplication
in my POC change, which brings the exponent into an IEEE-double-friendly
range.
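
The idea of scaling with exact big-integer multiplication or division can be sketched outside Tcl. The Python below is not the POC change and not Tcl's algorithm. It only illustrates, under the assumption that float(Fraction) is correctly rounded (as in CPython), how a conversion done entirely in exact integer arithmetic cannot overflow on a long mantissa with a negative exponent. split_decimal and exact_to_double are hypothetical helper names.

```python
import re
from fractions import Fraction

# Optional sign, digits, optional fraction, optional exponent.
_DECIMAL = re.compile(r"([+-]?)(\d+)(?:\.(\d*))?(?:[eE]([+-]?\d+))?$")


def split_decimal(text):
    """Return (significand, exp10) with value == significand * 10**exp10."""
    m = _DECIMAL.match(text)
    if m is None:
        raise ValueError("not a decimal literal: %r" % text)
    sign, whole, frac, exp = m.groups()
    frac = frac or ""
    significand = int(whole + frac)
    if sign == "-":
        significand = -significand
    # Each fraction digit shifts the decimal exponent down by one.
    return significand, int(exp or 0) - len(frac)


def exact_to_double(text):
    """Convert via an exact rational; rounding happens once, at the end."""
    significand, exp10 = split_decimal(text)
    if exp10 >= 0:
        exact = Fraction(significand * 10 ** exp10)
    else:
        exact = Fraction(significand, 10 ** -exp10)
    # Raises OverflowError for values beyond the double range.
    return float(exact)


if __name__ == "__main__":
    s = "1." + "1" * 191 + "e-321"
    print(split_decimal(s)[1])   # decimal exponent of the integer significand
    print(exact_to_double(s))
```

For the first failing input from the ticket (191 ones), the integer significand has 192 digits and the decimal exponent works out to -512, so the exact divisor is 10**512.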

jan.nijtmans added on 2025-04-03 14:45:10:

Merged to a branch: [ca62367d61824d32]

Anyone up for further review? I'll try to add some test cases.


chw added on 2025-03-16 17:54:39:
I believe that I found a possible solution for this problem,
see this check-in on androwish.org:

  https://www.androwish.org/home/info/7f96e5a3c6548852

Could an expert please review it thoroughly?
I think the problem itself has existed since the earliest
8.5 version. And since it can cause data-driven
malfunctions, it is crucial to fix it.

sebres added on 2023-10-13 15:16:58:

Tcl doesn't use long double for internal conversions before the value gets stored in a double. Historically it uses multi-precision math instead: a wide integer plus a bignum for the significand and an int counting the significant digits. It then computes the double with something like AccumulateDecimalDigit and MakeHighPrecisionDouble, or MakeLowPrecisionDouble if no significand bignum is used. This may overflow differently from a long double conversion, which C libraries may use internally in sscanf for %lg.

However, I would prefer that nothing overflows at all in TclParseNumber during such an FP conversion (at least with a negative exponent), neither to 8.28e+36 nor, even less, to Inf. Rather, I'd expect the number to be near 1.11e-321 in that case, and 0.0 for values below the smallest positive double (2.0**-1074, so about 5e-324 in IEEE 754 double), no matter how long the mantissa becomes.

Something like what this diff illustrates:

  % for {set i 189} {$i < 193} {incr i} { puts [expr 0.[string repeat 1 $i]e-320] }
  1.11e-321
  1.11e-321
  1.11e-321
- 8.287045256889119e+36
+ 1.11e-321

  % for {set i 189} {$i < 193} {incr i} { puts [expr 1.[string repeat 1 $i]e-321] }
  1.11e-321
  1.11e-321
- 8.287045256889119e+36
- 8.287045256889119e+36
+ 1.11e-321
+ 1.11e-321

  % for {set i 189} {$i < 193} {incr i} { puts [expr 10.[string repeat 1 $i]e-322] }
  1.013e-321
- 7.541211183769098e+36
- 7.541211183769098e+36
- 7.541211183769098e+36
+ 1.11e-321
+ 1.11e-321
+ 1.11e-321

For real overflow, too, this could be more "tclish", since Tcl's numeric values are more dynamic and need not overflow on implicit conversion even where C would.

This can be achieved by shifting the significant bits (moving the floating point), increasing the exponent, and/or simply trimming the needless right-hand part of the mantissa once some threshold is reached.
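
The trimming option can be sketched as follows. This is an illustration in Python, not Tcl's implementation. MAX_DIGITS = 800 is a hypothetical threshold, chosen on the assumption that a midpoint between two adjacent doubles has fewer than 770 significant decimal digits, so digits beyond that can only matter as "zero or nonzero". trim_significand and parse_trimmed are illustrative names.

```python
MAX_DIGITS = 800  # hypothetical threshold, see the assumption above


def trim_significand(digits, exp10, limit=MAX_DIGITS):
    """Shorten a decimal significand (value == int(digits) * 10**exp10).

    Digits past `limit` are dropped and replaced by one sticky digit:
    '1' if any dropped digit was nonzero, else '0'. The sticky digit keeps
    the trimmed value on the same side of every rounding boundary.
    """
    digits = digits.lstrip("0") or "0"
    if len(digits) <= limit:
        return digits, exp10
    kept, dropped = digits[:limit], digits[limit:]
    sticky = "1" if dropped.strip("0") else "0"
    # One digit (the sticky one) stands in for len(dropped) digits.
    return kept + sticky, exp10 + len(dropped) - 1


def parse_trimmed(ones):
    """Parse 1.<ones times '1'>e-321 after trimming the mantissa."""
    digits = "1" + "1" * ones
    trimmed, exp10 = trim_significand(digits, -321 - ones)
    # float() stands in for the final, correctly rounded conversion.
    return float("%se%d" % (trimmed, exp10))


if __name__ == "__main__":
    full = float("1." + "1" * 999 + "e-321")
    print(parse_trimmed(999) == full)
```

With this scheme the work done by the converter is bounded by the threshold rather than by the length of the input string, while the rounded result stays the same as for the untrimmed string.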


jan.nijtmans added on 2023-09-27 20:15:05:

Assigning to Kevin


chw added on 2023-09-22 15:12:31:
For additional discussion see

https://www.exploringbinary.com/maximum-number-of-decimal-digits-in-binary-floating-point-numbers/