TIP:            270
Title:          Utility C Routines for String Formatting
Version:        $Revision: 1.10 $
Author:         Don Porter <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        19-Jun-2006
Post-History:   
Tcl-Version:    8.5

~ Abstract

This TIP proposes new public C utility routines for the convenience of C-coded
extensions and embedded uses of Tcl.

~ Background

During development of Tcl 8.5, several internal routines have been created
that provide useful string formatting functions. These routines are most
commonly used in the construction of error messages, but are useful more
generally. The Tcl source code itself makes significant use of them.

Making some of these routines public also addresses
Feature Request 1184069.

~ Proposed Changes

Add the following routines to Tcl's public interface:

~~ Tcl_AppendObjToErrorInfo

 > void '''Tcl_AppendObjToErrorInfo'''(Tcl_Interp *''interp'',
   Tcl_Obj *''objPtr'')

This routine is analogous to the existing routine '''Tcl_AddErrorInfo''',
but permits appending a Tcl_Obj value rather than requiring
a '''(const char *)'''.

~~ Tcl_AppendLimitedToObj

 > void '''Tcl_AppendLimitedToObj'''(Tcl_Obj *''objPtr'', const char *''bytes'',
   int ''length'', int ''limit'', const char *''ellipsis'')

This routine appends a string, but imposes a limit on how many bytes are
appended. This is handy when the string to be appended might be very large,
but the value being constructed should not be allowed to grow without bound.
A common use is constructing an error message, where the end result should
be kept short enough to remain readable.

Bytes from ''bytes'' are appended to ''objPtr'', but no more than ''limit''
bytes in total are appended. If the limit prevents all ''length'' available
bytes from being appended, the appending is done so that the last bytes
appended are those of the string ''ellipsis''. This leaves an indication in
the result that truncation occurred.

When ''length'' is -1, all bytes up to the first zero byte are appended,
subject to the limit. When ''ellipsis'' is NULL, the default string '''...'''
is used. When ''ellipsis'' is non-NULL, it must point to a zero-byte-terminated
string in Tcl's internal UTF encoding.  The number of bytes appended can be
smaller than both ''length'' and ''limit'' when that is necessary to append
only whole multi-byte characters.

The ''objPtr'' must be unshared, or the attempt to append to it will panic.
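The truncation rules above can be modeled in plain C, independent of Tcl. The sketch below is a hypothetical stand-in, not Tcl's implementation: the function name ''append_limited'' is invented, it appends to a caller-supplied zero-byte-terminated buffer instead of a Tcl_Obj, and it assumes that ''bytes'' is valid UTF-8 and that ''limit'' is at least the length of ''ellipsis''. It keeps the longest run of whole characters that leaves room for the ellipsis, which is why fewer than ''limit'' bytes may be appended.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_trail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical model of the rules specified for Tcl_AppendLimitedToObj;
 * this is not Tcl's code.  Appends to the zero-terminated string in dst,
 * which must have room for limit + 1 more bytes.  Assumes bytes is valid
 * UTF-8.  Returns the number of bytes appended.
 */
size_t append_limited(char *dst, const char *bytes, long length,
                      size_t limit, const char *ellipsis)
{
    char *end = dst + strlen(dst);
    size_t len = (length < 0) ? strlen(bytes) : (size_t) length;
    size_t elen, toCopy;

    if (len <= limit) {                 /* Everything fits: no truncation. */
        memcpy(end, bytes, len);
        end[len] = '\0';
        return len;
    }

    if (ellipsis == NULL) {             /* Default truncation marker. */
        ellipsis = "...";
    }
    elen = strlen(ellipsis);
    assert(elen <= limit);              /* Assumption of this sketch. */

    /*
     * Reserve room for the ellipsis, then back up until bytes[toCopy]
     * begins a character, so no multi-byte character is split.
     */
    toCopy = limit - elen;
    while (toCopy > 0 && is_utf8_trail((unsigned char) bytes[toCopy])) {
        toCopy--;
    }

    memcpy(end, bytes, toCopy);
    memcpy(end + toCopy, ellipsis, elen + 1);   /* Copies the zero byte too. */
    return toCopy + elen;
}
```

Backing up from the cut point, instead of counting characters from the start, keeps the extra work bounded by the length of a single UTF-8 character.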

~~ Tcl_Format

 > Tcl_Obj * '''Tcl_Format'''(Tcl_Interp *''interp'',
   const char *''format'', int ''objc'', Tcl_Obj *const ''objv''[])

This routine is the C-level interface to the engine of Tcl's '''format'''
command.  The actual command procedure for '''format''' is little more
than

| Tcl_Format(interp, Tcl_GetString(objv[1]), objc-2, objv+2);

The ''objc'' Tcl_Obj values in ''objv'' are formatted into a string
according to the conversion specifications in the ''format'' argument,
following the documentation for the '''format''' command.  The resulting
formatted string is converted to a new Tcl_Obj with a reference count of
zero, which is returned.  If an error occurs while producing the formatted
string, NULL is returned, and an error message is recorded in ''interp''
if ''interp'' is non-NULL.

~~ Tcl_AppendFormatToObj

 > int '''Tcl_AppendFormatToObj'''(Tcl_Interp *''interp'', Tcl_Obj *''objPtr'',
   const char *''format'', int ''objc'', Tcl_Obj *const ''objv''[])

This routine is an appending alternative form of '''Tcl_Format'''.  Its
function is equivalent to:

| Tcl_Obj *newPtr = Tcl_Format(interp, format, objc, objv);
| if (newPtr == NULL) return TCL_ERROR;
| Tcl_IncrRefCount(newPtr);
| Tcl_AppendObjToObj(objPtr, newPtr);
| Tcl_DecrRefCount(newPtr);
| return TCL_OK;

It is, however, more convenient and more efficient when the formatted result
is to be appended to an existing value.

The ''objPtr'' must be unshared, or the attempt to append to it will panic.

~~ Tcl_ObjPrintf

 > Tcl_Obj * '''Tcl_ObjPrintf'''(const char *''format'', ...)

This routine serves as a replacement for the common sequence:

| char buf[SOME_SUITABLE_LENGTH];
| sprintf(buf, format, ...);
| Tcl_NewStringObj(buf, -1);

Use of the proposed routine is shorter and doesn't require the programmer to
determine '''SOME_SUITABLE_LENGTH'''. The formatting is done with the same
core formatting engine used by '''Tcl_Format'''.  This means the set of
supported conversion specifiers is that of Tcl's '''format''' command and
not that of ''sprintf()'' where the two sets differ. When a conversion
specifier passed to '''Tcl_ObjPrintf''' includes a precision, the value is
taken as a number of bytes, as ''sprintf()'' does, and not as a number of
characters, as '''format''' does.  This is done on the assumption that C
code is more likely to know how many bytes it is passing around than the
number of encoded characters those bytes happen to represent.
The variable number of arguments passed in should be of the types that would
be suitable for passing to ''sprintf()'' with the same format.  Note that in
this example usage, ''x'' is of type '''int''', matching the '''%d'''
conversion.

|  int x = 5;
|  Tcl_Obj *objPtr = Tcl_ObjPrintf("Value is %d", x);
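The distinction between a precision counted in bytes and one counted in characters only matters for multi-byte characters. The standalone sketch below uses nothing but standard C ''snprintf()'' (it does not call Tcl) to contrast the two readings on the UTF-8 string "héllo". The helper names ''utf8_prefix_bytes'', ''by_bytes'' and ''by_chars'' are hypothetical, and the input is assumed to be valid UTF-8. With a precision of 2, byte counting keeps only "h" plus the first byte of the two-byte "é", while character counting keeps "hé", which is three bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_trail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: the number of bytes occupied by the first nchars
 * characters of the valid UTF-8 string s (all of s if it is shorter).
 */
size_t utf8_prefix_bytes(const char *s, size_t nchars)
{
    size_t i = 0;

    while (s[i] != '\0' && nchars > 0) {
        i++;                                    /* The lead byte... */
        while (is_utf8_trail((unsigned char) s[i])) {
            i++;                                /* ...and its continuations. */
        }
        nchars--;
    }
    return i;
}

/* Precision counted in bytes, as sprintf() counts it. */
void by_bytes(char *out, size_t size, const char *s, int precision)
{
    assert(precision >= 0);     /* A negative precision would mean "no limit". */
    snprintf(out, size, "%.*s", precision, s);
}

/* Precision counted in characters, as the format command counts it. */
void by_chars(char *out, size_t size, const char *s, int precision)
{
    assert(precision >= 0);
    snprintf(out, size, "%.*s",
             (int) utf8_prefix_bytes(s, (size_t) precision), s);
}
```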

If the value of ''format'' contains internal inconsistencies or invalid
conversion specifiers, the string produced by '''Tcl_ObjPrintf''' will be
an error message rather than an attempt to Do What Is Meant.

~~ Tcl_AppendPrintfToObj

 > void '''Tcl_AppendPrintfToObj'''(Tcl_Obj *''objPtr'',
   const char *''format'', ...)

This routine is an appending alternative form of '''Tcl_ObjPrintf'''.  Its
function is equivalent to:

| Tcl_AppendObjToObj(objPtr, Tcl_ObjPrintf(format, ...));

It is, however, more convenient and more efficient when the formatted result
is to be appended to an existing value.

The ''objPtr'' must be unshared, or the attempt to append to it will panic.
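For comparison, the appending pattern can be sketched in plain C99 without Tcl. The hypothetical ''struct strbuf'' type and its ''strbuf_init'' and ''strbuf_appendf'' functions below format directly into the unused tail of a growable buffer with ''vsnprintf()'', and grow the buffer and format a second time only when the output does not fit. This illustrates why an appending form can avoid building and then copying a separate temporary value; it is not Tcl's implementation, and it accepts the C library's conversion specifiers rather than those of Tcl's '''format''' command.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: data[0..len) holds text, data[len] is 0. */
struct strbuf {
    char *data;
    size_t len;
    size_t cap;     /* Bytes allocated for data, including the zero byte. */
};

/* Start with an empty string and cap bytes of storage (at least 1). */
int strbuf_init(struct strbuf *sb, size_t cap)
{
    sb->cap = (cap > 0) ? cap : 1;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL) {
        return -1;
    }
    sb->data[0] = '\0';
    return 0;
}

/* Append printf-style output to sb.  Returns 0 on success, -1 on failure. */
int strbuf_appendf(struct strbuf *sb, const char *format, ...)
{
    va_list ap;
    int n;

    assert(sb->data != NULL && sb->len < sb->cap);

    /* First attempt: format straight into the unused tail of the buffer. */
    va_start(ap, format);
    n = vsnprintf(sb->data + sb->len, sb->cap - sb->len, format, ap);
    va_end(ap);
    if (n < 0) {
        sb->data[sb->len] = '\0';       /* Discard any partial output. */
        return -1;
    }

    if ((size_t) n >= sb->cap - sb->len) {
        /* Output was truncated: grow to the exact size, then format again. */
        size_t need = sb->len + (size_t) n + 1;
        char *p = realloc(sb->data, need);

        if (p == NULL) {
            sb->data[sb->len] = '\0';   /* Discard the truncated output. */
            return -1;
        }
        sb->data = p;
        sb->cap = need;
        va_start(ap, format);
        vsnprintf(sb->data + sb->len, sb->cap - sb->len, format, ap);
        va_end(ap);
    }
    sb->len += (size_t) n;
    return 0;
}
```

Sizing the buffer exactly from the ''vsnprintf()'' return value keeps the sketch short; a real string builder would usually grow geometrically so that many small appends stay cheap.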

~ Compatibility

This proposal includes only new features. It is believed that existing scripts
and C code that operate without errors will continue to do so.

~ Reference Implementation

The code is already complete in the form of internal routines corresponding
to the proposed public routines. Implementation is just a matter of renaming
them, adding them to the stub tables, writing documentation, and so on.

~ Copyright

This document has been placed in the public domain.