Tcl package Thread source code

Ticket UUID: 90de9ddf94ecfb6bacd5e754919a21d31fe3f25a
Title: Checking data consistency for unlocking of mutexes
Type: Bug
Version: 2.8.4
Submitter: elfring
Created on: 2019-05-06 13:53:49
Subsystem: 80. Thread Package
Assigned To: nobody
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2019-05-10 16:56:08
Resolution: Invalid
Closed By: elfring
Closed on: 2019-05-10 16:56:08
Description:
I am trying again to get a multi-threaded data processing approach working with the programming language “Tcl 8.6.9-1.5” on an openSUSE system.
I observed questionable software behaviour during my tests.

Test output example:
…
3. run
1 threads are working.
mutex is not locked
    while executing
"thread::mutex unlock [tsv::get [tsv::get shared context] pcm]"
    (procedure "main_thread_unlock_start" line 1)
    invoked from within
"main_thread_unlock_start"
    (procedure "perform_command" line 11)
    invoked from within
"perform_command"
    (procedure "repeat_command" line 5)
    invoked from within
"repeat_command"
    (procedure "run_analysis" line 2)
    invoked from within
"run_analysis"
    (file "test-synchronisation1-20190506.tcl" line 133)


What do you think about checking whether any more data processing surprises can be found with an example script like the following?


package require Thread
set results {}

proc perform_command {} \
{
tsv::set [tsv::get shared context] pcm [thread::mutex create]
thread::mutex lock [tsv::get [tsv::get shared context] pcm]
tsv::set [tsv::get shared context] pcc [thread::cond create]

try \
   {
   tsv::set [tsv::get shared context] working 0
   tsv::set [tsv::get shared context] finish_notification 0
   } \
finally \
   {
   proc main_thread_unlock_start {} \
   {thread::mutex unlock [tsv::get [tsv::get shared context] pcm]}
   main_thread_unlock_start
   }

for {set x 1} {$x <= 6} {incr x} \
   {
   set t($x) [thread::create {
                             proc handle_input {} \
                             {
                             thread::mutex lock [tsv::get [tsv::get shared context] pcm]

                             try \
                                {
                                puts stderr "[tsv::incr [tsv::get shared context] working] threads are working."
                                } \
                             finally \
                                {
                                proc handle_input_unlock_begin {} \
                                {thread::mutex unlock [tsv::get [tsv::get shared context] pcm]}
                                handle_input_unlock_begin
                                }

                             thread::mutex lock [tsv::get [tsv::get shared context] pcm]

                             try \
                                {
                                tsv::incr [tsv::get shared context] working -1

                                if [tsv::get [tsv::get shared context] finish_notification] \
                                   {
                                   puts stderr {thread::cond notify finish}
                                   thread::cond notify [tsv::get [tsv::get shared context] pcc]
                                   }
                                } \
                             finally \
                                {
                                proc handle_input_unlock_end {} \
                                {thread::mutex unlock [tsv::get [tsv::get shared context] pcm]}
                                handle_input_unlock_end
                                }
                             }

                             handle_input
                             }]
   }

puts stderr "Worker threads were started."
thread::mutex lock [tsv::get [tsv::get shared context] pcm]

try \
   {
   tsv::set [tsv::get shared context] finish_notification 1

   while {[set w [tsv::get [tsv::get shared context] working]] > 0} \
      {
      puts stderr "cond wait finish for $w threads"
      thread::cond wait [tsv::get [tsv::get shared context] pcc] [tsv::get [tsv::get shared context] pcm]
      }
   } \
finally \
   {
   proc main_thread_unlock_finish {} \
   {thread::mutex unlock [tsv::get [tsv::get shared context] pcm]}
   main_thread_unlock_finish
   }

global results
lappend results [llength [thread::names]]
#thread::cond destroy [tsv::get [tsv::get shared context] pcc]
#thread::mutex destroy [tsv::get [tsv::get shared context] pcm]
}

proc repeat_command {} \
{
for {set x 1} {$x <= 12} {incr x} \
   {
   puts stderr "$x. run"
   tsv::set shared context $x
   perform_command
   }
}

proc run_analysis {} \
{
repeat_command
global results

if [llength ${results}] \
   {
   proc report {} \
   {
   global results

   foreach k ${results} \
      {
      dict incr counts $k
      }

   set delimiter |
   puts [join {{"still running threads"} incidence} ${delimiter}]

   dict for {k v} ${counts} \
      {
       puts "$k${delimiter}$v"
      }
   }

   report
   } \
else \
   {
   puts stderr {No result for this analysis!}
   }
}

run_analysis


See also:
https://core.tcl-lang.org/tcl/tktview/0f439992eff158ca991ee191f89a6bd4c4eb51e9
User Comments:

elfring added on 2019-05-10 16:56:08:
> so if the main thread switches it within the lock,

I now imagine that this undesired action can be prevented by referring to the mentioned context information through a local Tcl variable instead of using the command “[tsv::get shared context]” in the worker threads.
This can be achieved by concatenating two code parts like the following.

   thread::create [string cat "set context $x\n" $suffix]
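
A minimal, self-contained sketch of this idea might look as follows; the worker body here is only a placeholder, since the real $suffix script from the ticket is not shown:

    package require Thread

    # Placeholder worker body; the ticket's actual $suffix script is not reproduced here.
    set suffix {
        puts stderr "worker [thread::id] uses context $context"
    }

    set x 7
    # The current value of $x is baked into the worker script, so a later change
    # of [tsv::get shared context] in the main thread can no longer redirect this
    # worker to another context.
    thread::create [string cat "set context $x\n" $suffix]
    after 1000 ;# crude pause (sketch only) so the worker can print before the process exits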


> … your worker pool (context) is protected by the "working" counter, which does not guarantee at any given moment that all workers have really been counted …

I have changed the software design to an approach which works with termination notifications from the Tcl worker threads.
Unfortunately, it still contains open issues for further development.

sebres added on 2019-05-09 09:14:53:

> I would expect that a counter variable like “working” would be kept non-negative here because of the system properties of the command “tsv::incr”.

Yes, but you increment and decrement values in different storages! In my example it is first 11, then 12.
So [tsv::incr 11 working +1] and afterwards [tsv::incr 12 working -1].

> This is a possible data processing concern.

No. Once again: this is a result of your wrong protection algorithm.

> I had other expectations. The intention is that no worker threads are left running at the end of the procedure “perform_command”.

But this expectation is wrong by design: your worker pool (context) is protected by the "working" counter, which does not guarantee at any given moment that all workers have really been counted (because of the RC).

Either you increment it in the main thread before starting each thread, and each worker decrements it. Or something similar.
But right now, as it is built, it is rather a sporadic value, and the main thread gets "working" == 0 because some workers have not yet made the increment (they are still suspended in the lock).
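
A minimal sketch of that counter discipline, assuming a tsv array simply named "shared" and a crude polling loop instead of the ticket's condition variable:

    package require Thread

    tsv::set shared working 0

    for {set x 1} {$x <= 6} {incr x} {
        # Increment in the main thread *before* the worker exists, so the
        # counter can never be observed as 0 while a worker is still pending.
        tsv::incr shared working
        thread::create {
            # ... the worker's real work would go here ...
            tsv::incr shared working -1
        }
    }

    # Crude wait for illustration only; a condition variable or a
    # thread::send back to the main thread would be used in real code.
    while {[tsv::get shared working] > 0} {
        after 10
    }
    puts stderr "all workers have finished"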

> Interesting, isn't it? - Thanks for your test demonstration.

Not really, because as already said - it is wrong by design.

> The logging approach can trigger significant differences for the data flow comparison.

Sure (and with each additional log message it becomes harder to reproduce). But try to follow my explanation - it is clearly a wrong design. The situation of steps 1..6 is imaginable but should be avoided; that is why step 7 goes wrong.

> The shown approach may be incomplete. It corresponds to a development stage attempting the safe application of a (single) condition variable.

No. This is wrong. If you cannot guarantee that the pool works properly and does not overwrite some values, this results in the bugs/RCs you see.
Describing this as an issue of the tcl-thread package is, in my opinion, at least weird.
You would get this issue in ANY language with multi-threading facilities if the approach remained the same.

> I imagine that our different views can contribute to a better solution finally.

No. There are basic differences in our views. I have been developing in multi-threaded environments for 20 years with several languages (C, tcl, python, ruby, java/scala, etc.). The protection of the pool (and of the primitives used within it) is always a primary task and should be organized properly. Your protection in the script is wrong by design.
Nobody would expect that the code:

tsv::set shared context 11
tsv::incr [tsv::get shared context] working +1
tsv::set shared context 12
tsv::incr [tsv::get shared context] working -1
puts [tsv::get [tsv::get shared context] working]
would result in anything other than -1.

So if you cannot implement basic protection to guarantee that the working counter always corresponds to the real number of workers, and you try to implement a wait for completion on the basis of this wrong protection, it is doomed to fail.


elfring added on 2019-05-09 06:27:06:
>… threads should supposedly set the value to negative,

I would expect that a counter variable like “working” would be kept non-negative
here because of the system properties of the command “tsv::incr”.


>…, but "old" still alive workers would simply write/read all the values in/from "new" context.

This is a possible data processing concern.


> The main thread (due to several RCs) does not really wait until all workers have gone down.

I had other expectations. The intention is that no worker threads are left running at the end of the procedure “perform_command”.


> The context itself is not identified in the workers.

Not for this test variant so far.


> so if the main thread switches it within the lock,

This should not happen here.


> an old, still-alive thread uses the new context as if it were its own, which is wrong.

Your conclusion would be appropriate for the unexpected case.


> (but if one does not see this RC,

One race condition might not be appropriately taken into account for this test case version.


> one would say it is impossible). 

The software situation might be still unclear for discussed variations in implementation details.


> This is pretty well reproducible

Interesting, isn't it? - Thanks for your test demonstration.


> and can be easily comprehended if one adds logging messages

The logging approach can trigger significant differences for the data flow comparison.


> So I'm pretty sure your script is wrong

The shown approach may be incomplete. It corresponds to a development stage attempting the safe application of a (single) condition variable.


> Neither the conditional/mutex nor the tsv-storage has any synchronization issue in the direction you're trying to claim.

I imagine that our different views can contribute to a better solution finally.
Are there any more design options to reconsider?

sebres added on 2019-05-08 22:35:50:

As discussed today by telephone (that tsv::incr +1 followed by -1 in all 6 threads should supposedly set the value to negative), I analysed it and can confirm that it is possible, due to one of your RCs which I already mentioned in my script examples and by telephone.

This occurs because of the mistaken switch of the context in the main thread (next iteration in repeat_command) while one or more active workers are still in the second lock area of handle_input.

Here are the problem places (some of them are already marked with **** in the second attachment):

  1. Your thread pool (the workers) operates within a context (holding the mutex and variables like "working") identified by [tsv::get shared context]; it therefore does not matter which $x is currently used, because if the main thread switched the context, the value of [tsv::get shared context] would surely be incremented together with $x, but "old" still alive workers would simply write/read all the values in/from the "new" context.
  2. The main thread (due to several RCs) does not really wait until all workers have gone down.
  3. The context itself is not identified in the workers (so if the main thread switches it within the lock, an old, still-alive thread uses the new context as if it were its own, which is wrong).

Here is one example of a possible scenario where this can take place:

  1. Let the iteration in repeat_command be 11 (x = 11): the main thread sets "shared context" to 11 and starts 6 threads; 4 of them increment and decrement "working" and go down. But 2 are still trying to obtain the first lock (waiting for unlock while the main thread holds the mutex), so they have not incremented "working".
  2. The value of "working" is now 0 (4*+1 and 4*-1).
  3. Now the main thread wants to enter the while loop, but the condition [tsv::get [tsv::get shared context] working] > 0 is false, because it is 0.
  4. The main thread unlocks the mutex, ends perform_command, returns to repeat_command and starts the next iteration (so x will be 12).
  5. Before it begins to switch the context, because the previous mutex was unlocked, the 2 "old" alive threads enter the locking area one after the other and increment the value of "working" (now it is 2).
  6. Now the main thread (in the next iteration) enters perform_command, sets the new "shared context" to 12, then overwrites the global tsv context with a new mutex and cond-var and sets "working" and "finish_notification" to 0.
    So the value of [tsv::get shared context] is 12 (but that does not matter, because the workers never check it).
  7. Before the main thread begins to start the new 6 threads, the 2 "old" alive threads now enter the second locking area one after the other and decrement the value of "working". But now it is a new context, and its "working" was 0.
    So after the tsv::incr [tsv::get shared context] working -1 of thread-5 it will be -1.
    And after the tsv::incr [tsv::get shared context] working -1 of thread-6 it will be -2.
This way the mutex (together with the missing context identifier and the missing wait for the end of the thread pool) is to blame for the race condition, and you get a negative value in the tsv-value "working" (but if one does not see this RC, one would say it is impossible).

This is pretty well reproducible and can be easily comprehended if one adds logging messages (the value of $x as well as the value of [tsv::get shared context] in the threads before decrementing).

One will see something like this:

MAIN:  ctx: 11 | Start of perform_command.
TH01:  ctx: 11 | working +1 = 1 threads are working.
TH01:  ctx: 11 | working -1 = 0 threads are working.
TH02:  ctx: 11 | working +1 = 1 threads are working.
TH03:  ctx: 11 | working +1 = 2 threads are working.
TH04:  ctx: 11 | working +1 = 3 threads are working.
TH02:  ctx: 11 | working -1 = 2 threads are working.
TH04:  ctx: 11 | working -1 = 1 threads are working.
TH03:  ctx: 11 | working -1 = 0 threads are working.
MAIN:  ctx: 11 | Worker threads were started.
MAIN:  ctx: 11 | End of perform_command.
TH05:  ctx: 11 | working +1 = 1 threads are working.
TH06:  ctx: 11 | working +1 = 2 threads are working.
*********** here is the switch of the context (next iteration) ***********
MAIN:  ctx: 12 | Start of perform_command.
TH06:  ctx: 12 | working -1 = -1 threads are working.
TH05:  ctx: 12 | working -1 = -2 threads are working.

So I'm pretty sure your script is wrong (due to many RCs) and therefore the issue is invalid here. Neither the conditional/mutex nor the tsv-storage has any synchronization issue in the direction you're trying to claim.

Regards, Serg.


elfring added on 2019-05-08 09:00:19:
> So forget the first script, and take a look at the second attachment.

I also responded to information in both Tcl scripts a moment ago.


> Your script creates your own thread pool 

Some threads are just repeatedly created. - But they are not kept in a “pool”.


> I sent you a mail yesterday in German which may explain your RCs better.

I am curious how the clarification will evolve further (possibly also with the help of a telephone call).

sebres added on 2019-05-08 07:35:09:

> It can demonstrate also specific ideas.

No, there is still one attached script, demonstrating the four (!) RCs you introduced.

So forget the first script, and take a look at the second attachment.

> There are no thread pools involved in the test scripts for this issue

What? Your script creates your own thread pool (6 threads).

And due to the mentioned race conditions (BUGS IN YOUR SCRIPT, marked by me with ****) it even does that wrongly (a second pool is started before the first "releases" the context).

> We disagree still about specific implementation details.

No, I told you that your script is totally and basically wrong, because it has at least 4 bugs. You are trying to tell me something about a different approach.

BTW, take a look in your mailbox (possibly the spam folder); I sent you a mail yesterday in German which may explain your RCs better.


elfring added on 2019-05-08 06:10:54:
> I'll attach a "proper" asynchronous pool implementation similar to yours,

Thanks for your attached example script.

It can demonstrate also specific ideas.


> but without conditional wait and extra mutex

This difference is relevant for the understanding of the really wanted results while noticing undesirable software behaviour.


I suggest reconsidering the data processing properties of the following piece of code from your variant of the command “handle_input”.


…
>        while {$::status ni {"endoftask" "shutdown"}} {
>          vwait ::status; # wait for end of work
>        }
…


These “worker threads” should not wait for an end; they should perform only their own work and then inform the main control thread about their termination.




>You use a mutex reference saved in the shared tsv,

Yes, exactly.


> and then another thread pool overwrites this tsv-value, 

There are no thread pools involved in the test scripts for this issue.
Each mutex handle is stored in a different thread-shared variable (which is selected by the shared variable “context” or the command parameter “ctx”), isn't it?


> so a conditional notify and/or unlock mutex later would access another mutex initialized for another run

I would disagree with your conclusion in this concrete case, although I can follow this information insofar as inappropriate data accesses can happen after software failures.


> - you would see the context is incremented in-between and the mutex is not the same

I did not get such an impression from my test scripts for this issue so far.


> Yes I'm familiar

This is nice.


> (but it looks like you are not familiar with multi-threaded environments completely,

I have been working with them for several years (as you can also determine from public archives).


> because you try to use it totally wrong way).

We disagree still about specific implementation details.


> The main point (which you fail to understand) is the usage of a unique identifier ($ctx in my case) to avoid overwriting of the context by the next run.

I am using the thread shared variable “context” for this purpose.


> Or even to wait until all threads have gone down before you continue with the next run.

This should usually happen.


> As already said, your approach is wrong by definition, so it cannot work properly at all.

I disagree with this view.


> There is nothing to discuss or to repair, no matter which primitives or facilities you would use.

We present different perspectives for this software (and communication) situation.


> See the new attachment where you can see how it may be simply repaired - either by resolving ALL race conditions you have introduced there,

The proper handling of race conditions is a general challenge for multi-threaded programming.


…
>        ## **** HERE IS A RACE CONDITION: YOU CAN OBTAIN MUTEX AFTER MAIN THREAD LEAVE perform_command

This information is mostly appropriate.

But all mutexes (which were created for the variable “pcm”) are kept usable for this test (because I omitted the resource clean-up for them at the moment).


>        log "[tsv::incr [tsv::get shared context] working] threads are working."
…

I find that the counter for the working extra threads is properly managed.


…
>        ## **** HERE IS A RACE CONDITION: YOU CAN NOTIFY A CALLER THAT THE LAST THREAD IS GOING, BUT SOME MAY BE STILL THERE
>        tsv::incr [tsv::get shared context] working -1
…

If this counter is consistent, then no additional threads will be left over.


…
>    ## **** HERE IS A RACE CONDITION: IF working IS 0 (YOU DON'T ENTER WAIT AT ALL)
>    while {[set w [tsv::get [tsv::get shared context] working]] > 0} {
…

Your information is appropriate.

If it was determined that no worker threads were left over, the next loop iteration can be executed finally.



> or with a simple trick I used to fix it (marked as the IMPORTANT part).

I find the call of the command “thread::release -wait $t($x)” unnecessary because the affected threads should have been immediately finished after a possible termination notification.

sebres added on 2019-05-07 20:30:04:

> We might interpret this information in different ways.

There is nothing to interpret:

You use a mutex reference saved in the shared tsv, and then another thread pool overwrites this tsv-value, so a conditional notify and/or unlock mutex later would access another mutex initialized for another run (so for example you're trying to unlock a mutex which was never locked in this worker).

Just try to log the mutex name (or rather the value of [tsv::get shared context]) in case of error (catch around the unlock when the "mutex is not locked" error is thrown) - you would see the context is incremented in-between and the mutex is not the same. As already said multiple times, you switch the context after the unlock in the main thread too fast, so some of the 6 threads run into the RC (doing their unlock after the switch of the context tsv).

> Are you also familiar with the application of the synchronisation primitive “condition variable”?

Yes, I'm familiar with it (but it looks like you are not familiar with multi-threaded environments completely, because you try to use it in a totally wrong way). And yes, you can rewrite my attached script to use condition variables and a mutex instead of asynchronous handling.

The main point (which you fail to understand) is the usage of a unique identifier ($ctx in my case) to avoid overwriting of the context by the next run.
Or even to wait until all threads have gone down before you continue with the next run.

As already said, your approach is wrong by definition, so it cannot work properly at all.

There is nothing to discuss or to repair, no matter which primitives or facilities you would use.

See the new attachment where you can see how it may be simply repaired - either by resolving ALL race conditions you have introduced there, or with a simple trick I used to fix it (marked as the IMPORTANT part).
Note all places marked with "****", which explain the possible RCs.
Especially note the IMPORTANT part at the end, waiting for the threads to exit (if you switch `if 1 {...}` to `if 0 {...}` you will see your errors again).
But this is only one possible solution (or even a workaround); another would be to change your entirely wrong approach (or resolve all RCs, or even use a unique context id to avoid replacements, etc. pp).
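
A minimal, self-contained sketch of that "wait for the threads to exit" idea (this is not the attachment itself; the placeholder workers here merely sit in thread::wait until they are released):

    package require Thread

    for {set x 1} {$x <= 6} {incr x} {
        set t($x) [thread::create {
            # ... the worker's real body would go here ...
            thread::wait   ;# keep the thread alive until it is released
        }]
    }

    for {set x 1} {$x <= 6} {incr x} {
        # -wait blocks until the target thread has really terminated, so the
        # next run cannot start while "old" workers still use the shared context.
        thread::release -wait $t($x)
    }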

One (last) try from my side: protecting the context with a mutex does not help if your context (which contains the mutex reference) is switched in between (because your main thread does not really know how many threads have been started, have obtained the mutex and context, or have already ended).

So I'll close it again due to the basic misconception and too many errors in your example script.
Please read up on what a race condition is before you post here again!


elfring added on 2019-05-07 18:51:14:
> I don't need to "decrypt" your entire script

It seems to be very challenging to achieve a better common understanding for the shown use case.


> to see that several things are basically wrong here:

I disagree with specific details of the observable software behaviour.


>your shared context could be overwritten in repeat_command

We might interpret this information in different ways.

1. The command “tsv::set shared context $x” is executed in a for loop as expected.

2. It is demonstrated that the iteration number is used by the called functions in the intended way for data access with thread shared variables.


> if it is intended to wait (in perform_command) until all tasks are done,

This is one of the main goals of this test case.


> it does not really succeed

I observed that the success depends on some configuration parameters here.


> (due to the wrong implementation of the conditional wait for all 6 workers, and the wait for the end is also completely wrong)

How should the wait be correctly performed (together with condition variables) according to your software development experience?
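
For reference, the usual predicate-guarded wait pattern with the Thread package might look roughly like this minimal sketch (using a hypothetical tsv array named "demo"; it is not taken from the ticket's scripts):

    package require Thread

    # Shared state lives in a tsv so the worker thread can reach it.
    tsv::set demo mutex [thread::mutex create]
    tsv::set demo cond  [thread::cond create]
    tsv::set demo ready 0

    thread::create {
        # Worker: change the predicate under the mutex, then notify.
        thread::mutex lock [tsv::get demo mutex]
        tsv::set demo ready 1
        thread::cond notify [tsv::get demo cond]
        thread::mutex unlock [tsv::get demo mutex]
    }

    # Main thread: lock the mutex, then re-check the predicate after every
    # wakeup; thread::cond wait atomically releases the mutex while waiting.
    thread::mutex lock [tsv::get demo mutex]
    while {![tsv::get demo ready]} {
        thread::cond wait [tsv::get demo cond] [tsv::get demo mutex]
    }
    thread::mutex unlock [tsv::get demo mutex]
    puts stderr "the worker has signalled readiness"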


> why you're trying to implement your own thread-pool if it is already there (see tpool)?

I am not trying to develop a test for the Tcl programming language in this direction.


> or why you won't use "thread::send" with answer via variable?

This detail is an important difference for my test variant. Each started thread should eventually notify the main control thread about its usual termination.


> tsv::* are atomic operations

This information is generally helpful.


> (you don't need to protect them with a mutex

A single mutex is created at the beginning of the procedure “perform_command” so that it can be used for the command “thread::cond wait [tsv::get [tsv::get shared context] pcc] ${pcm}” there.
Would you like to suggest reusing any other mutex for this data synchronisation interface?


> if you don't need to change a lot of members consistently,

The number of shared variables is reasonably small for such a test approach, isn't it?


> but also in this case you can use tsv::lock)

This command can be better used when more statements should be unconditionally combined.


> you don't need a conditional wait/notify here (I don't recognize any reason why it may be expected in your code)

Are you also familiar with the application of the synchronisation primitive “condition variable”?


> or simply set your finish_notification in tsv context

Does the command “tsv::set ${context} finish_notification 1” (line 63) fit already to this suggestion?


> Although it would be simpler with tpool and other constructs

These programming tools will become more interesting for other data processing approaches than the small test case which we try to discuss (and clarify) here.

sebres added on 2019-05-07 17:25:01:

> Do you find the following test script variant easier to understand for the clarification of desirable software behaviour?

I don't need to "decrypt" your entire script to see that several things are basically wrong here:

  • your shared context could be overwritten in repeat_command... better use some identity value in order to target a thread-pool or the context like [tsv::get shared context $ctxid] if you cannot guarantee safe work (wait for completion in repeat_command)
  • if it is intended to wait (in perform_command) until all tasks are done, it does not really succeed (due to the wrong implementation of the conditional wait for all 6 workers, and the wait for the end is also completely wrong)
  • why you're trying to implement your own thread-pool if it is already there (see tpool)?
  • or why you won't use "thread::send" with answer via variable? etc pp

Also note:

  • thread creation (and initialization) as well as destruction are expensive operations, so normally one tries to avoid that
  • tsv::* are atomic operations (you don't need to protect them with a mutex if you don't need to change a lot of members consistently, but also in this case you can use tsv::lock)
  • you don't need a conditional wait/notify here (I don't recognize any reason why it may be expected in your code)... to signal a thread to stop, you can send it something, call thread::release (if the thread is waiting), or simply set your finish_notification in the tsv context... to signal the main thread that all workers are done, just increment some tsv-value or wait for a variable (and send from the threads any command to the main thread signalling that they are going down); see the sketch right after this list
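
A minimal sketch of that last signalling idea, assuming a plain ::done counter in the main interpreter (this is not the attached script):

    package require Thread

    set main [thread::id]
    set ::done 0

    for {set x 1} {$x <= 6} {incr x} {
        thread::create [list apply {{main} {
            # ... the worker's own work would go here ...
            # Report termination back to the main thread.
            thread::send -async $main {incr ::done}
        }} $main]
    }

    # Process events until all 6 workers have reported back.
    while {$::done < 6} {vwait ::done}
    puts stderr "all workers have finished"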

Anyway, because of the basic misconception and too many errors in your example script, I'll close this ticket as invalid.

Although it would be simpler with tpool and other constructs (but you want this solution and I don't know what exactly is needed), I'll attach a "proper" asynchronous pool implementation similar to yours, but without the conditional wait and extra mutex (the mutex is created there and you can use it, but it is not used internally).


elfring added on 2019-05-07 05:08:11:

Do you find the following test script variant easier to understand for the clarification of desirable software behaviour?

puts stderr "Thread package version: [package require Thread]"
set results {}

proc perform_command {} \
{
thread::mutex lock [set pcm [tsv::set [set context [tsv::get shared context]] pcm [thread::mutex create]]]
puts stderr "Initial pcm: ${pcm}"
tsv::set ${context} pcc [thread::cond create]

try \
   {
   tsv::set ${context} working 0
   tsv::set ${context} finish_notification 0
   } \
finally \
   {
   proc main_thread_unlock_start {} \
   {
   puts stderr "main_thread_unlock_start pcm: [set pcm [tsv::get [tsv::get shared context] pcm]]"
   thread::mutex unlock ${pcm}
   }
   main_thread_unlock_start
   }

for {set x 1} {$x <= 3} {incr x} \
   {
   thread::create {
                  proc handle_input {} \
                  {
                  puts stderr "[tsv::incr [set context [tsv::get shared context]] working] threads are working."
                  thread::mutex lock [tsv::get [tsv::get shared context] pcm]

                  try \
                     {
                     tsv::incr ${context} working -1

                     if [tsv::get ${context} finish_notification] \
                        {
                        puts stderr {thread::cond notify finish}
                        thread::cond notify [tsv::get ${context} pcc]
                        }
                     } \
                  finally \
                     {
                     proc handle_input_unlock_end {} \
                     {
                     puts stderr "handle_input_unlock_end pcm: [set pcm [tsv::get [tsv::get shared context] pcm]]"
                     thread::mutex unlock ${pcm}
                     }
                     handle_input_unlock_end
                     }
                  }

                  handle_input
                  }
   }

puts stderr "Worker threads were started."
thread::mutex lock ${pcm}

try \
   {
   tsv::set ${context} finish_notification 1

   while {[set w [tsv::get ${context} working]] > 0} \
      {
      puts stderr "cond wait finish for $w threads"
      thread::cond wait [tsv::get [tsv::get shared context] pcc] ${pcm}
      }
   } \
finally \
   {
   proc main_thread_unlock_finish {} \
   {
   puts stderr "main_thread_unlock_finish pcm: [set pcm [tsv::get [tsv::get shared context] pcm]]"
   thread::mutex unlock ${pcm}
   }
   main_thread_unlock_finish
   }

global results
lappend results [llength [thread::names]]
#thread::cond destroy [tsv::get [tsv::get shared context] pcc]
#thread::mutex destroy [tsv::get [tsv::get shared context] pcm]
}

proc repeat_command {} \
{
for {set x 1} {$x <= 2} {incr x} \
   {
   puts stderr "$x. run"
   tsv::set shared context $x
   perform_command
   }
}

proc run_analysis {} \
{
repeat_command
global results

if [llength ${results}] \
   {
   proc report {} \
   {
   global results

   foreach k ${results} \
      {
      dict incr counts $k
      }

   set delimiter |
   puts [join {{"still running threads"} incidence} ${delimiter}]

   dict for {k v} ${counts} \
      {
      puts "$k${delimiter}$v"
      }
   }

   report
   } \
else \
   {
   puts stderr {No result for this analysis!}
   }
}

run_analysis

Example of a test result:
"still running threads"|incidence
1|2


elfring added on 2019-05-06 18:31:02:
> What do you mean here as "different"?

I suggest taking another look at the code “tsv::set [tsv::get shared context] pcm [thread::mutex create]” from my procedure “perform_command” (which picks up an iteration number that was set by my procedure “repeat_command” in the published example).


> Why it should be?

I tried out this programming option for maximum separation between these data synchronisation objects.


> The value of [tsv::get shared context] is the same across ALL THREADS INSIDE THE PROCESS

It seems that you will need another moment to understand my test case and the corresponding data processing goals better.
The value should stay the same for a while (during execution of the procedure “perform_command”).


> and would change,

Yes, this should happen a few times for the shown test.


> if you'd set it somewhere.

Is the intended modification triggered by the code “tsv::set shared context $x” in the for loop of the procedure “repeat_command”?


> This way all your threads would change the "context" to the last one.

I have a different expectation of how this design approach should finally work.

sebres added on 2019-05-06 16:18:25:
> The identifiers will be stored in different thread shared variables, won't they?

What do you mean here as "different"? Why it should be?

The value of [tsv::get shared context] is the same across ALL THREADS INSIDE THE PROCESS and would change if you'd set it somewhere.
This way all your threads would change the "context" to the last one.

elfring added on 2019-05-06 15:57:47:
> Why do you not understand that you are trying to create several mutexes and would overwrite a tsv-value containing a mutex handle with other values within your on-demand mutex initialization?

I get the impression from this question that more effort will be needed to achieve a better common understanding (also of the shown test script).
The created mutex handle should not be overwritten in this test variant.


> it creates a new mutex

This is intended here (for each loop iteration).


> and overwrites the value in the tsv-storage with this new mutex handle

The identifiers will be stored in different thread shared variables, won't they?


> later calls of lock/unlock may read another (overwritten) value from the tsv-storage

I suggest taking another look at the desired implementation of the mutual exclusion.


> 1. EXAMPLE illustrates CORRECT dynamic (on demand) initialization of mutex:

Your procedure “test1” refers to the constant identifier “shared” for a thread shared variable.
(I have constructed also a test variant before which works in this way.)


> 2. EXAMPLE illustrates WRONG dynamic (on demand) initialization of mutex:

Why do you find the command from the procedure “test2” wrong when it is identical to the previous one?


> There are 4 threads which simply execute body …

Our software design approaches differ in that you would like to show an approach for handling the readiness of these threads, while I want to check their proper termination by means of corresponding condition variables.


> Anyway, in all cases it should be exactly the same mutex handle that is supplied later to lock/unlock/whatever in order to protect some resources.

I suggest increasing the precision of the parameter passing.
A single combination of a mutex and a condition variable should be sufficient for our simple test examples, shouldn't it?


Did you dare to try my test script out?

sebres added on 2019-05-06 14:23:07:

As already said, you are wrong here.

Why do you not understand that you are trying to create several mutexes and would overwrite a tsv-value containing a mutex handle with other values within your on-demand mutex initialization?

Once again: each time your thread enters perform_command and uses tsv::set [tsv::get shared context] pcm [thread::mutex create]:

  • it creates a new mutex
  • and overwrites the value in the tsv-storage with this new mutex handle
  • later calls of lock/unlock may read another (overwritten) value from the tsv-storage (this way you introduce an RC here)

Just to explain what you are trying to do (and what exactly you still misunderstand):

package require Thread

set i 0; time {
    set ::t([incr i]) [thread::create]
    thread::send $::t($i) [list set ::MTH [thread::id]]
} 4

proc test {body} {
    catch {tsv::unset shared pcm}
    set ::ready 0
    # eval body within 4 threads 1000 times:
    set i 0; time {thread::send -async $::t([incr i]) [list time $body 1000]} 4
    # output last known mutex handle in each thread and signal ready:
    set i 0; time {
        thread::send -async $::t([incr i]) {
            puts "[thread::id] uses a mutex [tsv::get shared pcm]"
            thread::send -async $MTH {incr ::ready}
        }
    } 4
    # wait for ready from all 4 threads:
    while {$::ready < 4} {vwait ::ready}
    puts "MAIN: created mutex [tsv::get shared pcm]"
}

# 1. EXAMPLE illustrates CORRECT dynamic (on demand) initialization of mutex:

proc test1 {} {
    test {
        if {![tsv::exists shared pcm]} {
            tsv::lock shared {
                if {![tsv::exists shared pcm]} {
                    tsv::set shared pcm [thread::mutex create]
                    puts "[thread::id] created a mutex [tsv::get shared pcm] and set it to tsv"
                }
            }
        }
    }
}
test1

# 2. EXAMPLE illustrates WRONG dynamic (on demand) initialization of mutex:

proc test2 {} {
    test {
        tsv::set shared pcm [thread::mutex create]
    }
}
test2

There are 4 threads which simply execute the body 1000 times each (e.g. a mutex initialization) in parallel... The only difference in test1 compared to test2 is the protection within the lock and the check for the existence of the tsv-value which contains the mutex.

As you can see from the result of test1 (1st EXAMPLE), exactly 1 mutex is created and finally stored in the tsv:

% test1
tid0000263C created a mutex mid0 and set it to tsv
tid00002B4C uses a mutex mid0
tid0000263C uses a mutex mid0
tid00002AD8 uses a mutex mid0
tid00000774 uses a mutex mid0
MAIN: created mutex mid0

And with the "wrong" initialization (similar to what you are trying to do), the result of test2 (2nd EXAMPLE) looks like this (4 * 1000 mutexes are created, each thread assumes it has a different mutex, and the last thread wins with the last created mutex):
% test2
tid00002AD8 uses a mutex mid3670
tid00000774 uses a mutex mid3950
tid0000263C uses a mutex mid3960
tid00002B4C uses a mutex mid4000
MAIN: created mutex mid4000

As already said:

  • either you should create the mutex in the main thread (before any worker uses it);
  • or, if on-demand mutex creation is expected, you should check whether the mutex has already been created by another worker (and this check should occur in a locked area, to avoid overwriting the tsv-value, as in example 1).

Anyway, in all cases it should be exactly the same mutex handle that is supplied later to lock/unlock/whatever in order to protect some resources.


Attachments: