Bounty program for improvements to Tcl and certain Tcl packages.
Ticket Hash: 839d20d7c3c080f1d36aa558b873e5f7acccbfe1
Title: performance through parallelization / threading
Status: Open Type: Feature_Request
Severity: Important Priority: Medium
Subsystem: General Resolution: Open
Last Modified: 2010-12-24 11:52:53
Version Found In:
Look into building C-level thread pool support, then using it to parallelize the various image operations by tiling or striping the images, with each parcel handled by a separate thread.
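A minimal sketch of the striping idea, assuming a hypothetical 8-bit greyscale `image_t` (not CRIMP's actual image type) and POSIX threads. For simplicity it creates one thread per stripe on every call instead of reusing a persistent pool; a real pool would keep its workers alive between operations.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical greyscale image: row-major, one byte per pixel. */
typedef struct {
    int w, h;
    unsigned char *pixels;
} image_t;

/* One parcel of work: the half-open row range [y0, y1). */
typedef struct {
    image_t *img;
    int y0, y1;
} stripe_t;

/* Example per-pixel operator (invert). Each call touches only its own rows. */
static void *invert_stripe(void *arg)
{
    stripe_t *s = arg;
    for (int y = s->y0; y < s->y1; y++) {
        unsigned char *row = s->img->pixels + (size_t)y * (size_t)s->img->w;
        for (int x = 0; x < s->img->w; x++)
            row[x] = (unsigned char)(255 - row[x]);
    }
    return NULL;
}

/* Split the image into horizontal stripes and run one thread per stripe. */
void invert_striped(image_t *img, int nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    stripe_t part[MAX_THREADS];
    int started = 0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    if (img->h > 0 && nthreads > img->h) nthreads = img->h;

    for (int i = 0; i < nthreads; i++) {
        part[i].img = img;
        /* Integer split that covers every row exactly once. */
        part[i].y0 = (int)((long)img->h * i / nthreads);
        part[i].y1 = (int)((long)img->h * (i + 1) / nthreads);
        if (pthread_create(&tid[started], NULL, invert_stripe, &part[i]) == 0)
            started++;
        else
            invert_stripe(&part[i]); /* no thread available: do it inline */
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
}
```

Called as `invert_striped(&img, 4)`, it splits the rows into four nearly equal stripes. The stripes never overlap, so the workers need no locking; the only synchronization is the final `pthread_join`.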

This likely requires refactoring the operators into a core function and an API, so that the thread management can slide in between them.
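One hypothetical shape for that refactoring: each operator becomes a core kernel that processes a row range, the public API function only names its kernel, and a replaceable scheduler sits between the two. All names here (`row_kernel_t`, `scheduler_t`, `current_scheduler`, `image_add`) are illustrative assumptions, not CRIMP's API.

```c
#include <stddef.h>

/* Hypothetical greyscale image: row-major, one byte per pixel. */
typedef struct { int w, h; unsigned char *pixels; } image_t;

/* Core layer: a pure pixel kernel over rows [y0, y1); knows nothing about threads. */
typedef void (*row_kernel_t)(image_t *img, int y0, int y1, void *param);

/* Scheduling layer: decides how the full row range is split and run. */
typedef void (*scheduler_t)(row_kernel_t kernel, image_t *img, void *param);

/* Serial scheduler: one call covering all rows. */
static void run_serial(row_kernel_t kernel, image_t *img, void *param)
{
    kernel(img, 0, img->h, param);
}

/* The thread management plugs in here; every API function goes through it. */
static scheduler_t current_scheduler = run_serial;

/* Example kernel: add a constant to each pixel, saturating at 0 and 255. */
static void add_rows(image_t *img, int y0, int y1, void *param)
{
    int delta = *(int *)param;
    for (int y = y0; y < y1; y++) {
        unsigned char *row = img->pixels + (size_t)y * (size_t)img->w;
        for (int x = 0; x < img->w; x++) {
            int v = row[x] + delta;
            row[x] = (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
}

/* API layer: what a Tcl command implementation would call. */
void image_add(image_t *img, int delta)
{
    current_scheduler(add_rows, img, &delta);
}
```

Replacing `current_scheduler` with a function that splits `[0, h)` into stripes and runs the kernel on each in a worker thread would parallelize every operator at once, without touching the kernels or the API functions.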

Question: What C-level API do Tcl (and the Thread package) provide that would be useful here?

Note that this assumes the threads live at the C level and are invisible to the user. It is not about moving image processing into a wholly separate thread to keep it out of the GUI's way; that is a separate matter CRIMP doesn't have to think about, except perhaps in providing ways to transfer images between Tcl threads without converting them between representations, which uses lots of memory.

andreask added on 2010-09-13 18:48:10:
One concern: Is there a portable way to determine the number of CPUs on a system? That seems to me the most suitable size for the thread pool used by the CRIMP internals.

andreask added on 2010-09-13 18:58:29:
Google: sysconf, _SC_NPROCESSORS_{CONF,ONLN}
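A sketch of sizing the pool from `sysconf`, assuming a POSIX-like system. `_SC_NPROCESSORS_ONLN` counts the CPUs currently online and `_SC_NPROCESSORS_CONF` those configured (some of which may be offline); neither is mandated by POSIX, hence the `#ifdef` guards. The function name `default_pool_size` is hypothetical.

```c
#include <unistd.h>

/* Thread-pool size: prefer online CPUs, fall back to configured CPUs, then 1. */
int default_pool_size(void)
{
    long n = -1;
#ifdef _SC_NPROCESSORS_ONLN
    n = sysconf(_SC_NPROCESSORS_ONLN);     /* CPUs currently online */
#endif
#ifdef _SC_NPROCESSORS_CONF
    if (n < 1)
        n = sysconf(_SC_NPROCESSORS_CONF); /* CPUs configured, possibly offline */
#endif
    return n < 1 ? 1 : (int)n;
}
```

Preferring the online count avoids starting workers for CPUs that cannot currently run them.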

andreask added on 2010-09-13 19:10:18:

Snarfing relevant parts of the discussion ...

... SYSTEM_INFO.dwNumberOfProcessors on Windows.


... See also: the QThread::idealThreadCount() function.

On Windows, that's: SYSTEM_INFO sysinfo; GetSystemInfo(&sysinfo); return sysinfo.dwNumberOfProcessors;

MacOS X: MPProcessorsScheduled();

HP-UX: struct pst_dynamic psd; if (pstat_getdynamic(&psd, sizeof(psd), 1, 0) == -1) { perror("pstat_getdynamic"); cores = -1; } else { cores = (int)psd.psd_proc_cnt; }

{Free,Net,Open}BSD: size_t len = sizeof(cores); int mib[2]; mib[0] = CTL_HW; mib[1] = HW_NCPU; if (sysctl(mib, 2, &cores, &len, NULL, 0) != 0) { perror("sysctl"); cores = -1; }

INTEGRITY OS, Symbian: hard-coded to one core.

VxWorks: a loop that checks whether CPU #n exists until the check fails (see link)

IRIX: cores = (int)sysconf(_SC_NPROC_ONLN);

all other Unix (including Linux): cores = (int)sysconf(_SC_NPROCESSORS_ONLN);
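The snippets above can be folded into a single hypothetical `cpu_count()` function. This sketch covers only the Windows, BSD, IRIX, and generic `sysconf` cases; HP-UX, Mac OS X's `MPProcessorsScheduled()`, and VxWorks are left out (Mac OS X falls through to the `sysconf` branch), and the preprocessor tests used to pick each branch are assumptions about how those platforms identify themselves.

```c
#if defined(_WIN32)
#include <windows.h>
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#else
#include <unistd.h>
#endif
#include <stdio.h>

/* Number of usable CPUs, never less than 1. */
int cpu_count(void)
{
    int cores = -1;
#if defined(_WIN32)
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    cores = (int)sysinfo.dwNumberOfProcessors;
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
    int mib[2] = { CTL_HW, HW_NCPU };
    size_t len = sizeof(cores);
    if (sysctl(mib, 2, &cores, &len, NULL, 0) != 0) {
        perror("sysctl");
        cores = -1;
    }
#elif defined(_SC_NPROC_ONLN)       /* IRIX */
    cores = (int)sysconf(_SC_NPROC_ONLN);
#elif defined(_SC_NPROCESSORS_ONLN) /* Linux and most other Unix systems */
    cores = (int)sysconf(_SC_NPROCESSORS_ONLN);
#endif
    /* Unknown platform or failed query: assume a single core. */
    return cores < 1 ? 1 : cores;
}
```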


andreask added on 2010-09-13 19:12:03:

andreask added on 2010-09-20 17:55:49:
Thanks to Joe English for the link to the portable 'hardware locality' project.

BSD licensed! This looks to be all I want and much more.

anonymous claiming to be Arjen Markus added on 2010-12-24 11:52:53:
Consider GPU programming: it is well suited to this type of computation (almost embarrassingly parallel, with only small amounts of data per node).

Another thing that comes to mind is OpenMP: a high-level set of directives plus some platform-independent functions that make life much easier, if it fits the bill.
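A sketch of the OpenMP route applied to a per-pixel operator, again on a hypothetical 8-bit `image_t`. A single `#pragma omp parallel for` replaces the explicit striping and thread management. Compiled without OpenMP support (for example, without `-fopenmp` on GCC), the pragma is ignored and the loop runs serially; the `_OPENMP` guard gives `omp_get_max_threads()` the same fallback.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical greyscale image: row-major, one byte per pixel. */
typedef struct { int w, h; unsigned char *pixels; } image_t;

/* Binary threshold; OpenMP divides the row loop among its worker threads. */
void threshold(image_t *img, unsigned char t)
{
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < img->h; y++) {
        unsigned char *row = img->pixels + (size_t)y * (size_t)img->w;
        for (int x = 0; x < img->w; x++)
            row[x] = row[x] >= t ? 255 : 0;
    }
}

/* Threads OpenMP would use for a parallel region (1 without OpenMP). */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

`schedule(static)` hands each thread a contiguous block of rows, which is essentially the striping described in the ticket, with the OpenMP runtime managing the thread pool.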