I’ve been run in to this kind of problem recently: one day I found out a solution written in python multi-thread approach take more running time than single thread. This is really confusing. I decided to dive into that…and some interesting stuff were found out.
Note that my computer is multi-core processor cpu. i7 2.3Ghz QuardCore CPU, 16G DDR3 Memory. With SSD Hard Drive.
Essentially every thing starts from this little piece of code recursively solving a permutation problem
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Below are how it was wroted in single thread, multi-thread and multi-process. Along with the running time.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
|
Also, I simply increased the task from 10 digit array to 12 digit array, this takes roughly 1 min for the node to solve the problem. so the new result is this: single thread: 50 sec; multi-thread: 72 sec. I also tried implement it using multi-process. surprisingly multi-process cut the time in half. so the basically result is:
single thread RUNTIME T multi-thread RUNTIME T multi-process RUNTIME T/2
This does not explain why there is no improvement from single thread to multi-threads. therefore I took a snapshot of CPU workload.
Note that even single thread it actually involves all my cpu cores. I’m now suspecting maybe CPython interpreter/optimizer or Intel CPU instruction set is pretty smart to handle even single thread solution to force it run on multiple CPUs.