Tcl Source Code: View Ticket

Ticket UUID:	b5bd08df8d61b4ea9bf0cc885f790b367c430a9
Title:	The "int" type registration should be restored.
Type:	Bug
Version:	8.7, 9.0a4
Submitter:	griffin
Created on:	2022-11-17 18:48:44
Subsystem:	10. Objects
Assigned To:	nobody
Priority:	5 Medium
Severity:	Severe
Status:	Open
Last Modified:	2024-07-04 09:27:51
Resolution:	None
Closed By:	nobody
Closed on:
Description:	TIP #484 removed the registration for Tcl_ObjType "int". The TIP proposes merging the "int" and "wideint" types into a single 64-bit int value. The change is fundamentally a change in the internal rep, collapsing the 2 integer types into 1. The justification is basically a simplification of the system when dealing with integer values.

The TIP then goes on to describe the removal of the "int" and "wideint" type registrations, without really explaining why. I can only assume that the reasoning was based on the internal rep changes, and on fears that external code could fail when blindly assuming the "int" internal rep. If that is the case, then the offered workarounds don't solve the issue, other than forcing developers to look at their code. If an internal rep change is all that is required to negate the registration of a type, then why wasn't the "list" type registration removed when its internal rep was changed with TIP #625?

Taking one step back, what is the point of the registration mechanism? Why does it exist? What are the rules for it?

Looking at 2 of the cited extensions that use Tcl_GetObjType("int"), both employ it for value classification only; the internal rep is never accessed directly. Other uses of Tcl_GetObjType() access the type's SetFromAny or UpdateString functions. To generalize, the Tcl_ObjType is a "class", and the 2 prominent use cases are "is a" and "method" calls. I'm not aware of any case where internal rep inspection or manipulation is desired or necessary.

My request is that the registration of "int" be put back. If, for some reason, there is no consensus on this, then I seriously question why *any* type should be registered at all.
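To make the "class" analogy concrete, here is a self-contained C model (no Tcl headers) of a name-based type registry with the two uses described above: an "is a" test by pointer comparison and a "method" call through a function pointer. ObjType, Obj, GetObjType, IsA and IntSetFromAny are hypothetical stand-ins that only mirror the roles of Tcl_ObjType, Tcl_Obj, Tcl_GetObjType() and a type's SetFromAny function; they are not the Tcl API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for Tcl_ObjType / Tcl_Obj; not the real structs. */
typedef struct Obj Obj;

typedef struct ObjType {
    const char *name;
    int (*setFromAny)(Obj *objPtr);   /* "method": convert to this type */
} ObjType;

struct Obj {
    const char *bytes;                /* string representation */
    const ObjType *typePtr;           /* current internal rep, or NULL */
    long long intValue;
};

static int IntSetFromAny(Obj *objPtr);

static const ObjType intType = { "int", IntSetFromAny };
static const ObjType stringType = { "string", NULL };

/* The registry: a name -> type table, the role Tcl_GetObjType() plays. */
static const ObjType *const registry[] = { &intType, &stringType };

const ObjType *GetObjType(const char *name) {
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i]->name, name) == 0) {
            return registry[i];
        }
    }
    return NULL;   /* unregistered types cannot be looked up by name */
}

/* "is a": compare the object's current type with the looked-up class. */
int IsA(const Obj *objPtr, const ObjType *typePtr) {
    return typePtr != NULL && objPtr->typePtr == typePtr;
}

/* Parse a decimal string rep and install an integer internal rep. */
static int IntSetFromAny(Obj *objPtr) {
    char *end;
    errno = 0;
    long long v = strtoll(objPtr->bytes, &end, 10);
    if (end == objPtr->bytes || *end != '\0' || errno == ERANGE) {
        return -1;   /* not an integer: leave the object unchanged */
    }
    objPtr->intValue = v;
    objPtr->typePtr = &intType;
    return 0;
}
```

In this model, if "int" were dropped from the table, GetObjType("int") would return NULL and neither the IsA check nor the setFromAny call could be reached by name, which is the situation the ticket describes for extensions.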
User Comments:

jan.nijtmans added on 2024-07-04 09:27:51:

> If there was a Tcl_GetValueFromObj then as far as I can tell tkinter
> would have no need to use typePtrs, and no need for type registration.

Sounds like a new TIP .....

marc_culler (claiming to be Marc Culler) added on 2024-07-04 01:10:19:

@jan: I rewrote part of _tkinter.c using Tcl_GetNumberFromObj() and it worked great. I agree that you could call this the "correct" way, except for one thing, which I don't understand: why did you only do part of the job? Why aren't there constants TCL_STRING, TCL_LIST, TCL_DICT, etc., and why isn't there a Tcl_GetValueFromObj() which uses those constants? It works fine to return a void* pointer to the internal data of the object and set an integer variable to indicate what the object type is. A switch on the value of that integer is efficient enough (even if it means testing twice) and makes for clean code. The addresses of the core Tcl_ObjType structs can be private to the TclObj.c file, which is appropriate. If there was a Tcl_GetValueFromObj then as far as I can tell tkinter would have no need to use typePtrs, and no need for type registration.

marc_culler added on 2024-07-03 16:44:00:

> Please have a look at this commit. It's a demonstration (in Tk) how
> Tcl_GetNumberFromObj() can be used instead of the registered types

I am trying to follow that example to write a conversion function for _tkinter.c which can be called for any numeric type. I will call it first for every Tcl_Obj. If the call fails, I think I will still need to test the typePtr to decide which conversion function to call.

marc_culler (claiming to be Marc Culler) added on 2024-07-03 15:40:24:

In Python3 there is only one int class, which represents an arbitrary precision integer (with internal optimizations for small values). So the Python int is equivalent to the Tcl bignum object. Python2, which is long gone, had separate classes for fixed size binary integers and arbitrary precision integers.
According to the Python documentation:

> There are three distinct numeric types: integers, floating point numbers, and complex numbers. In addition, Booleans are a subtype of integers. Integers have unlimited precision. Floating point numbers are usually implemented using double in C;

griffin added on 2024-07-03 14:45:37:

My understanding of Tcl's "int" type is of a specific size as defined by the compiler. Your example illustrates a class of values, "Integer", which is not a specific type. Is there a subtle difference between the definitions in the two languages?

marc_culler (claiming to be Marc Culler) added on 2024-07-03 13:05:30:

I am repeating a comment I made on the core list, for the record.

The Python tkinter module can create an embedded Tcl interpreter (with the Tk package loaded) and provides the ability to run arbitrary Tcl commands in the interpreter. Therefore _tkinter.c needs to be able to convert any Tcl_Obj to a Python object. In some cases that Python object may be a mostly useless object which just records the type name of the Tcl_Obj, but most of the core Tcl object types have meaningful representations as Python objects.

Below is a snippet from a Python interpreter session in which a Tcl bignum is converted to a Python int. An instance of the Python class tkinter.Tk is an encapsulation of a Tcl interpreter, although many tkinter users don't know this and think it only encapsulates the root window. It has a call method which can be used to run an arbitrary Tcl command.

```
$ python3
Python 3.11.8 (v3.11.8:db85d51d3e, Feb 6 2024, 18:02:37) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> interp = tkinter.Tk()
>>> x = interp.call('expr', '11111111111111111111', '*', '22222222222222222222')
>>> print(x)
246913580246913580241975308641975308642
>>> type(x)
<class 'int'>
```

While I don't know of any use for, say, tclExprCodeType in Python, I think we are providing a general purpose tool and we should make it as useful and as transparent as possible without second-guessing how it will be used in the future. And, obviously, Python is not the only client for such a tool.

jan.nijtmans added on 2024-07-01 21:41:55:

Please have a look at this commit. It's a demonstration (in Tk) how Tcl_GetNumberFromObj() can be used instead of the registered types. Please explain why this isn't a 'correct' solution: if the objType is already set to one of the elementary types, Tcl_GetNumberFromObj() just returns immediately without parsing the string. Would it help if the "boolean" or "index" types were added to Tcl_GetNumberFromObj() too, as additional enums?

Honestly, what use case do you have for tclExprCodeType and tclEnsembleCmdType in tkinter? I cannot think of anything.

marc_culler (claiming to be Marc Culler) added on 2024-07-01 16:26:01:

I would like to record here that removing the registrations of core types breaks the Python _tkinter module. I would estimate that 90% of the things that break are specifically due to not registering "int". There are already special hacks in that Python module to deal with the possibility that the boolean and bignum types are not registered. Those hacks could be made systematic in order to deal with not registering any types, but this does not really seem like an improvement to me. On the other hand, if the core types were registered, the code in _tkinter would work without change.

sebres added on 2024-06-24 18:02:42:

The checks for int-type mostly have nothing to do with shimmering; on the contrary, sometimes they help to avoid shimmering, e.g.
if the object is not an int, try an index or option (compare strings) or whatever other type... It may also help to avoid generating a string representation, etc. For everyone who has used types it is clear that a value may be an integer but have a different type. That doesn't rule out simple optimizations in case it already has the int type.

The suggestion in the TIP to create an int object to obtain its type (and discard it afterwards) is a dirty workaround and cannot really be considered a normal solution, at least for well-known types (especially ones that were already registered previously).

Also, `Tcl_GetNumberFromObj` as suggested by Jan is not really an alternative: why, if one needs to consider the int type only (for whatever reason), should one have to use a function which also checks doubles, bignums, etc.? Let alone the branch mispredictions, even in ideal cases, not to mention up to 9 jumps in the worst case, for instance if it is not an integer and the check was previously simple (not an int, then try an index or compare strings).

I'll try to summarize the arguments for type registration:

- performance: the simplest possible way to check whether the argument is *already* an int, which may be used for:
  - shimmer avoidance (if objv[i] may be an int or some other type), or
  - preventing the generation of string representations (it'd remain a pure int);
- checking whether an object is plain;
- etc.

The Tcl source code itself is full of `TclHasInternalRep(..., &tclIntType)`, so why shouldn't extensions need that too (or why should they have to use weird workarounds)?

Please stop thinking for the devs about how it is better for them to do something. Many people know it already, and for others one can improve the documentation or explain the shimmering or other consequences instead.

apnadkarni added on 2022-11-18 16:58:09:

Using Tcl_GetIntFromObj etc. does not achieve the purpose because (for example) the value 1 will be successfully retrieved via the call by shimmering the string "1".
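The shimmering problem can be shown with a small self-contained C model: a parse-based getter that caches its result by switching the internal rep erases the difference between a value created as the string "1" and one created as an integer. Obj, Rep, GetIntFromObj, ArgKind and GuessArgKind are hypothetical names (GuessArgKind stands in for a COM bridge choosing between a string and VT_I4); this illustrates the idea and is not Tcl's or twapi's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model: an object remembers which internal rep it holds,
 * the way a Tcl_Obj does through its typePtr. */
typedef enum { REP_STRING, REP_INT } Rep;

typedef struct {
    const char *bytes;
    Rep rep;
    long intValue;
} Obj;

/* Stand-in for a bridge that picks the argument type from the current rep. */
typedef enum { ARG_AS_STRING, ARG_AS_I4 } ArgKind;

ArgKind GuessArgKind(const Obj *o) {
    return o->rep == REP_INT ? ARG_AS_I4 : ARG_AS_STRING;
}

/* Getter in the spirit of Tcl_GetIntFromObj: it succeeds on any decimal
 * string that parses as an integer and caches the result by switching the
 * internal rep ("shimmering"), so the string origin is forgotten. */
int GetIntFromObj(Obj *o, long *out) {
    if (o->rep == REP_INT) {
        *out = o->intValue;
        return 0;
    }
    char *end;
    errno = 0;
    long v = strtol(o->bytes, &end, 10);
    if (end == o->bytes || *end != '\0' || errno == ERANGE) {
        return -1;   /* not an integer: rep left unchanged */
    }
    o->intValue = v;
    o->rep = REP_INT;
    *out = v;
    return 0;
}
```

After a single successful GetIntFromObj call on the string "1", GuessArgKind reports ARG_AS_I4 for it, exactly as for a value created as an integer.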
The desired intent is that a Tcl_Obj created via Tcl_NewStringObj("1", -1) be passed to Excel as a string, while Tcl_NewIntObj(1) be passed as an int (VT_I4) and Tcl_NewWideIntObj(1) be passed as VT_I8.

Having said that, I fully understand (and always did) that this is simply something Tcl cannot do reliably. It happened to work well in Tcl 8.6 specifically because of how Tcl was implemented, not because of any guarantees in the language or C API. It will not work well with 8.7. End users can either stick with 8.6 or move on to some other scripting language for COM, particularly if the component does not provide a type library.

/Ashok

jan.nijtmans added on 2022-11-18 10:36:10:

> So extensions have made use of Tcl_GetObjType to guess the type and offer a "casting" mechanism that only needs to be used in special cases. This has worked surprisingly well in practice.

The 'correct' way to check for a specific type in Tcl is to use functions like Tcl_GetIntFromObj(), Tcl_GetWideIntFromObj() and Tcl_GetBignumFromObj(), and see whether they return TCL_OK or not. Since Tcl uses duck typing (anything that looks like an integer really is an integer), the concept is different from other languages. E.g. in Java, '5' is a character, "5" is a string and 5 is an integer. In Tcl there is no distinction: 5 can be a string, an integer, a wideInteger, a bignum, even a list (with just a single element). You cannot be sure until you call a function to check; the objType just says something about how the value is stored internally at the moment.

The extensions I have seen check the objType just as an optimization: if an object already has some type, you don't need to call Tcl_GetXXXFromObj() to know the real type. In Tcl 8.6, asking for the type of 5, it might be NULL, or tclListType or tclIntType or tclWideIntType. In Tcl 8.7, it will never return tclIntType (since it uses tclWideIntType internally everywhere). In Tcl 9.0, tclIntType is gone, but tclWideIntType is named "int" for maximum compatibility.
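The probing order described here (int, then wide int, then bignum) can be sketched without Tcl as a classification of the value itself rather than of its stored type. IntClass and ClassifyInteger are hypothetical names; the sketch assumes a 32-bit int and a 64-bit long long and uses plain signed ranges, which is a simplification (Tcl's own getters have their own range rules, which this does not reproduce).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical classification, modelled on probing int -> wide -> bignum. */
typedef enum { NOT_INTEGER, FITS_INT, FITS_WIDE, NEEDS_BIGNUM } IntClass;

/* 1 if s is an optional sign followed by one or more decimal digits. */
static int IsDecimalInteger(const char *s) {
    if (*s == '+' || *s == '-') s++;
    if (!isdigit((unsigned char)*s)) return 0;
    while (isdigit((unsigned char)*s)) s++;
    return *s == '\0';
}

IntClass ClassifyInteger(const char *s) {
    if (!IsDecimalInteger(s)) return NOT_INTEGER;
    errno = 0;
    long long v = strtoll(s, NULL, 10);
    if (errno == ERANGE) return NEEDS_BIGNUM;   /* outside 64-bit range */
    if (v >= INT_MIN && v <= INT_MAX) return FITS_INT;
    return FITS_WIDE;
}
```

The answer for "5" is FITS_INT whatever internal rep an object holding it happens to have, which is the duck-typing point made above.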
Yes, it means that extensions need to be adapted; hopefully they will start using Tcl_GetIntFromObj()/Tcl_GetWideIntFromObj()/Tcl_GetBignumFromObj() to check for the real type instead of depending on the objType. That's my 2c.

anonymous added on 2022-11-18 07:29:45:

> That does not mean they all break with this change

Speaking for xotcl/nsf (the old xotcl extension is dead, it was just for Tcl 8.6; XOTcl 2 is part of nsf, same binary): it uses the following simple idiom to obtain the type record of "int".

```c
/* Create a throw-away integer object just to learn which Tcl_ObjType
   the core uses for integers, then release it again. */
tmpObj = Tcl_NewIntObj(0);
Nsf_OT_intType = tmpObj->typePtr;
Tcl_DecrRefCount(tmpObj);   /* refCount was 0, so this frees the object */
assert(Nsf_OT_intType != NULL);
```

Probably Brian was referring to such a construct above... The "int" registration was removed somewhere in the 8.7a* versions. This is not something I would take to the streets over. I never cared to understand why some types are registered and others are not.

OTOH, the type registry is important and should not go away. It is used e.g. by nsf to use type converters of NaviServer (e.g. for memory and time units) when used inside NaviServer. This makes usage more seamless.

apnadkarni added on 2022-11-18 07:05:59:

I'd like to point out one real world impact of this change. There is one category of extensions that are going to break: those interfacing to other languages that are typed (meaning almost all). This includes twapi, tcom, rl_json and probably jacl, though I have not checked the last.

I'll describe the issue in terms of Windows COM. When invoking a COM method via automation (IDispatch), the arguments are constructed in one of two ways: using the called component's type library, or, in the case where the component does not provide a type library, based on the data type at the call site. For example, in VB `foo(1)` and `foo("1")` will pass the argument as integer and string respectively. The issue with invoking methods on components in the second category from Tcl should be obvious, as `[foo 1]` provides no information about the type of the argument.
However, even components in the first category, which provide type libraries, suffer from the same issue for overloaded methods that simply describe the type of the argument as `VARIANT` and then take different actions based on whether the argument is provided as an integer or (for example) a string. Excel has several such methods.

The "correct" way to deal with these cases in Tcl is to supply arguments as typed values, e.g. `foo [list int $val]` or `foo [list string $val]`. However, this is not only inconvenient for interactive use, it also makes things difficult for applications and libraries that now have to deal with the same value represented both in native Tcl form and in a typed version. This is particularly painful in nested structures. So extensions have made use of `Tcl_GetObjType` to guess the type and offer a "casting" mechanism that only needs to be used in special cases. This has worked surprisingly well in practice.

A grep on my source base shows nsf, xotcl, jblend, twapi, tcom, rl_json and tcllib using `Tcl_GetObjType`. That does not mean they all break with this change; I've only verified that twapi, tcom and rl_json are affected. Note this means layered packages like CAWT (on twapi) or pandoc, tcljupyter (for rl_json) are also potentially impacted.

I understand that perhaps `Tcl_GetObjType` was not intended to be used in this fashion (which then also raises the question of why it was there in the first place), but there was no other alternative short of passing "type/value" pairs everywhere.

/Ashok

jan.nijtmans added on 2022-11-17 20:16:46:

Indeed, I also question why any type should be registered at all. For ints and bignums there's a better solution now: Tcl_GetNumberFromObj() (TIP #638). This function can determine whether a Tcl_Obj is an integer or a bignum without shimmering. In my opinion, the best way for Tcl 9.0 is, indeed, to remove the registration, so extensions will start using Tcl_GetNumberFromObj(). Problem solved.
For Tcl 8.7, the registration for "int" is only a compatibility Tcl_ObjType. Any operation on it will shimmer the "int" to a "wideInt", so any code using Tcl_GetObjType("int") will basically not work as expected anyway. Does it hurt? Not really: because this functionality is only used as an optimization, the optimization will simply do nothing in Tcl 8.7 any more. If extensions start using Tcl_GetNumberFromObj(), the optimization will start working again.

The reason why other Tcl_ObjTypes are still registered is simply that there is no good alternative yet. Lists can easily shimmer to dicts, so eventually there could be a Tcl_GetListOrDictFromObj() to help with that.

Hope this clarifies the situation a little.
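The 8.7 situation described in this last comment (a registered "int" type that the core no longer produces) can be modelled in a few lines of self-contained C: a fast path keyed on a type pointer looked up by name simply stops firing once values are cached as wideInt, while results stay correct through the parsing fallback. All names here (ObjType, Obj, GetValue, legacyIntType, wideIntType, fastPathHits) are hypothetical and only mirror the roles discussed in the ticket.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a legacy "int" type that is registered but never
 * produced, and the wide type the core actually caches values as. */
typedef struct { const char *name; } ObjType;
static const ObjType legacyIntType = { "int" };
static const ObjType wideIntType = { "wideInt" };

typedef struct {
    const char *bytes;
    const ObjType *typePtr;
    long long wideValue;
} Obj;

static int fastPathHits = 0;   /* counts how often the shortcut applied */

/* Extension-style helper: trust the internal rep if it is the type obtained
 * by name; otherwise parse the string rep, caching it as wideInt. */
int GetValue(Obj *o, const ObjType *registeredInt, long long *out) {
    if (registeredInt != NULL && o->typePtr == registeredInt) {
        fastPathHits++;
        *out = o->wideValue;
        return 0;
    }
    char *end;
    errno = 0;
    long long v = strtoll(o->bytes, &end, 10);
    if (end == o->bytes || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    o->typePtr = &wideIntType;   /* cached as wideInt, never as "int" */
    o->wideValue = v;
    *out = v;
    return 0;
}
```

Checking against legacyIntType always takes the slow path but still returns the right value ("the optimization will just do nothing"); checking against the type the core actually uses makes the shortcut fire again.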