Tcl Source Code: View Ticket

Ticket UUID:	42d14c495a096159841d2601a16850273713b31b
Title:	Parsing long floating point strings
Type:	Bug	Version:	>= 8.6
Submitter:	chw	Created on:	2023-09-22 09:16:17
Subsystem:	48. Number Handling	Assigned To:	kbk
Priority:	9 Immediate	Severity:	Critical
Status:	Closed	Last Modified:	2025-05-05 14:54:55
Resolution:	Fixed	Closed By:	oehhar
		Closed on:	2025-05-05 14:54:55
Description:	The following snippet gives strange results when the mantissa gets large enough:

    for {set i 1} {$i < 1000} {incr i} {
        set s 1.[string repeat 1 $i]e-321
        set d ""
        scan $s %g d
        if {$d eq ""} break
        puts [format "%3d: %.20g" $i $d]
        if {$d == inf || $d == -inf} break
    }

Output on x86_64 Linux for 8.6.11, core-8-branch, trunk:

      1: 1.1017663902259797935e-321
      2: 1.1116477031428047244e-321
    ...
    190: 1.1116477031428047244e-321
    191: 8.2870452568891190738e+36
    ...
    701: -1.6598062275523971834e+181
    702: 8.2870452568891190738e+36
    703: inf

Expectation: like C library sscanf(str, "%lg", &d), i.e.

    ...
    703: 1.1116477031428047244e-321
    ...
    999: 1.1116477031428047244e-321
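For comparison, the expected behaviour can be sketched outside Tcl. The following is a minimal Python sketch, not part of the ticket; it assumes a typical CPython build, where float() converts decimal strings with correct rounding. The function name parse_series is hypothetical. It mirrors the loop above and shows that the result should stay at the same subnormal value however long the mantissa gets.

```python
def parse_series(max_ones=999, exponent=-321):
    """Parse 1.<i ones>e<exponent> for i = 1..max_ones and return the doubles.

    Assumption: float() rounds correctly, as it does on typical CPython builds.
    """
    return [float("1." + "1" * i + "e%d" % exponent)
            for i in range(1, max_ones + 1)]


if __name__ == "__main__":
    values = parse_series()
    # Print the same rows the ticket singles out, in the ticket's format.
    for i in (1, 2, 190, 191, 703, 999):
        print("%3d: %.20g" % (i, values[i - 1]))
    # Number of distinct results over the whole loop.
    print("distinct values:", len(set(values)))
```

With correct rounding, only the first row (a single mantissa digit) differs from the rest; none of the rows jumps to a large magnitude or to inf.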
User Comments:	oehhar added on 2025-05-05 14:54:55:
This is probably a bug in libtommath, but we add the workaround here until it is fixed there.

Commits:
core-8-6-branch: [3cc7e2aa7d]
core-9-0-branch: [42d14c495a]
main: [013063b5fc]

I hope we will not get fireworks on CI, as this was not tested. But two TCT members said "merge" and it happened ;-).

Thanks for all. Long live magic Mr.Androwish !
Bug closed,
Harald

oehhar added on 2025-05-05 06:24:51:
A very elaborate test was provided by Christian and added by this commit: [c9b91b8c9f0dcc0b]
In addition, I added some source code comments to describe this bug.
IMHO, this is now ready for review and merge.
Take care,
Harald

oehhar added on 2025-05-03 13:19:03:
Christian Werner explained to me that this issue occurs in particular when an external lib delivers a double as a string representation. This bug was introduced with libtommath 20 years ago.

The problem is that a double scan of the string "1.[string repeat 1 191]e-321" results in a very large number:

    scan "1.[string repeat 1 191]e-321" %g
    8.2870452568891191e+036

While one "1" less results in an acceptable number close to the value:

    scan "1.[string repeat 1 190]e-321" %g
    1.11e-321

More ones in the mantissa let the scanned value grow. With 703 1's, Inf is reached.

I have put those two lines in tests by this commit: [4351b324]. I am not sure if this is very stable, but I get the same results on Windows with 8.6 32bit and main 64bit. I suppose it would be better to have a test case with a range check...

I hope that with this test we may merge the branch.
Thanks for all,
Harald

chw added on 2025-04-03 15:10:10:
Howdy Jan, see the initial ticket text for a test. The problem occurs when the string's number of mantissa digits becomes larger than 512 plus/minus the exponent.
And this is somehow magically related to the not so many bits an IEEE double has for its exponent. Hence the brute-force bignum division or multiplication in my POC change, in order to get the exponent into an IEEE double friendly range.

jan.nijtmans added on 2025-04-03 14:45:10:
Merged to a branch [ca62367d61824d32]. Anyone in for further reviewing? I'll try to add some testcases.

chw added on 2025-03-16 17:54:39:
I believe that I found a possible solution for this problem, see this check-in on androwish.org: https://www.androwish.org/home/info/7f96e5a3c6548852
Could an expert please give it a thorough review? I think the problem itself has existed since the earliest 8.5 version, and since it has the potential for data-driven malfunction, it is crucial to fix it.

sebres added on 2023-10-13 15:16:58:
Tcl doesn't use `long double` for internal conversions before the value gets stored to `double`... Historically it instead uses mp-math, e.g. wide+bignum for the significand and an int for significant digits, and calculates the `double` with something like `AccumulateDecimalDigit` and `MakeHighPrecisionDouble` or `MakeLowPrecisionDouble` (if no significand bignum is used), which may overflow differently from the `long double` conversion that C libs can use internally in `sscanf` for `%lg`.

However, I would prefer that nothing overflows here at all in TclParseNumber during such an FP conversion (at least with a negative exponent), neither to `8.28e+36` nor, even less, to `Inf`. Rather, I'd expect the number to be near `1.11e-321` in that case, and `0.0` for values below the smallest positive double (`2.0**-1074`, so ca. `5e-324` in IEEE 754's double), no matter how long the mantissa becomes.
Something like this diff illustrates:

    % for {set i 189} {$i < 193} {incr i} { puts [expr 0.[string repeat 1 $i]e-320] }
      1.11e-321
      1.11e-321
      1.11e-321
    - 8.287045256889119e+36
    + 1.11e-321
    % for {set i 189} {$i < 193} {incr i} { puts [expr 1.[string repeat 1 $i]e-321] }
      1.11e-321
      1.11e-321
    - 8.287045256889119e+36
    - 8.287045256889119e+36
    + 1.11e-321
    + 1.11e-321
    % for {set i 189} {$i < 193} {incr i} { puts [expr 10.[string repeat 1 $i]e-322] }
      1.013e-321
    - 7.541211183769098e+36
    - 7.541211183769098e+36
    - 7.541211183769098e+36
    + 1.11e-321
    + 1.11e-321
    + 1.11e-321

Also, for real overflow this could be more "tclish", since Tcl's numeric values are more dynamic and need not overflow on implicit conversion even where C would. This can be achieved by shifting the significant bits (moving the floating point) and/or increasing the exponent and/or simply trimming the (needless) right-hand part of the mantissa once some threshold is reached.

jan.nijtmans added on 2023-09-27 20:15:05:
Assigning to Kevin

chw added on 2023-09-22 15:12:31:
For additional discussion see https://www.exploringbinary.com/maximum-number-of-decimal-digits-in-binary-floating-point-numbers/
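The mantissa-trimming idea in sebres's comment can be illustrated with exact arithmetic. This is a minimal Python sketch, chosen only for its built-in big integers and exact fractions; it is not how TclParseNumber or the committed workaround is implemented. The helper names split_decimal, trim and to_double and the 40-digit threshold are hypothetical, and only unsigned literals are handled.

```python
from fractions import Fraction


def split_decimal(text):
    """Split an unsigned literal like '1.11e-321' into
    (integer significand, decimal exponent)."""
    mantissa, _, exp = text.lower().partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    digits = (int_part + frac_part).lstrip("0") or "0"
    # Each fractional digit lowers the decimal exponent by one.
    exponent = (int(exp) if exp else 0) - len(frac_part)
    return int(digits), exponent


def trim(significand, exponent, max_digits):
    """Drop low-order digits beyond max_digits and compensate in the exponent."""
    digits = str(significand)
    extra = len(digits) - max_digits
    if extra <= 0:
        return significand, exponent
    return int(digits[:max_digits]), exponent + extra


def to_double(significand, exponent):
    """Build the exact rational value, then round once to the nearest double."""
    return float(Fraction(significand) * Fraction(10) ** exponent)


if __name__ == "__main__":
    literal = "1." + "1" * 191 + "e-321"      # the first failing input
    sig, exp = split_decimal(literal)
    print("full   :", to_double(sig, exp))
    print("trimmed:", to_double(*trim(sig, exp, 40)))  # hypothetical threshold
    print("tiny   :", to_double(1, -400))     # below the smallest double
```

For this input the trimmed and untrimmed conversions agree. Plain truncation is only an illustration, though: for inputs lying extremely close to the midpoint between two doubles, dropped non-zero digits can change the rounding direction, so a real implementation would need to take them into account.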