A ping from threaders' prison
Just sending out a ping that I am here... but just that...
I'm being held captive in threaders' prison.
You may know what that means. If you don't, here's an example:
Earlier this week, quite by chance while coding a port handler, I noticed that this single, simple line of C code, which pushes a value onto the stack:
DS_PUSH(val);
generated this machine code:
004057B5 8B 55 FC mov edx,dword ptr [ebp-4]
004057B8 A1 C4 24 46 00 mov eax,[__tls_index (004624c4)]
004057BD 64 8B 0D 2C 00 00 00 mov ecx,dword ptr fs:[2Ch]
004057C4 8B 04 81 mov eax,dword ptr [ecx+eax*4]
004057C7 8B 0D C4 24 46 00 mov ecx,dword ptr [__tls_index (004624c4)]
004057CD 64 8B 35 2C 00 00 00 mov esi,dword ptr fs:[2Ch]
004057D4 8B 0C 8E mov ecx,dword ptr [esi+ecx*4]
004057D7 8B 89 34 00 00 00 mov ecx,dword ptr [ecx+34h]
004057DD 83 C1 01 add ecx,1
004057E0 8B 35 C4 24 46 00 mov esi,dword ptr [__tls_index (004624c4)]
004057E6 64 8B 3D 2C 00 00 00 mov edi,dword ptr fs:[2Ch]
004057ED 8B 34 B7 mov esi,dword ptr [edi+esi*4]
004057F0 89 8E 34 00 00 00 mov dword ptr [esi+34h],ecx
004057F6 8B 0D C4 24 46 00 mov ecx,dword ptr [__tls_index (004624c4)]
004057FC 64 8B 35 2C 00 00 00 mov esi,dword ptr fs:[2Ch]
00405803 8B 0C 8E mov ecx,dword ptr [esi+ecx*4]
00405806 8B 89 34 00 00 00 mov ecx,dword ptr [ecx+34h]
0040580C C1 E1 04 shl ecx,4
0040580F 8B 80 30 00 00 00 mov eax,dword ptr [eax+30h]
00405815 03 C1 add eax,ecx
00405817 8B 0A mov ecx,dword ptr [edx]
00405819 89 08 mov dword ptr [eax],ecx
0040581B 8B 4A 04 mov ecx,dword ptr [edx+4]
0040581E 89 48 04 mov dword ptr [eax+4],ecx
00405821 8B 4A 08 mov ecx,dword ptr [edx+8]
00405824 89 48 08 mov dword ptr [eax+8],ecx
00405827 8B 52 0C mov edx,dword ptr [edx+0Ch]
0040582A 89 50 0C mov dword ptr [eax+0Ch],edx
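Even though this is non-optimized, in a perfect world on a perfect CPU, that should be about 4 or 5 instructions, not 28. Notice that the same three-instruction TLS lookup (load __tls_index, load the TLS array pointer from fs:[2Ch], index into it) is repeated four times.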
It sure got me rethinking the usage of TLS variables, at least on x86 Win32 implementations. I decided not to be held captive by the compiler to any degree (on any OS model) and to recode large parts of the VM and natives to avoid TLS references (caching them SP-relative, in locals, instead).
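Here is a rough sketch of the kind of change I mean, not the actual VM code. Every name in it (VMThread, vm_tls, vm_current, DS_PUSH_TLS, DS_PUSH_CACHED, DS_DEPTH) is made up for illustration, and the 16-byte cell is only inferred from the shl ecx,4 in the listing. It uses C11 _Thread_local so it builds on current compilers; a 2007 Win32 build would have spelled that __declspec(thread).

```c
#include <assert.h>
#include <stdint.h>

#define DS_DEPTH 64                       /* hypothetical stack depth */

typedef struct { uint32_t w[4]; } Cell;   /* 16-byte cell (assumed from shl ecx,4) */

typedef struct {
    Cell *ds_base;                        /* data stack base */
    int   ds_index;                       /* index of the top cell, -1 when empty */
    Cell  ds_store[DS_DEPTH];             /* backing storage for this thread */
} VMThread;

/* One state block per thread. Each direct mention of vm_tls is a TLS
   reference, which an unoptimized build may look up again every time. */
static _Thread_local VMThread vm_tls;

/* TLS-heavy form: three separate TLS references in one push. */
#define DS_PUSH_TLS(val) \
    (vm_tls.ds_index += 1, vm_tls.ds_base[vm_tls.ds_index] = (val))

/* Cached form: the caller passes a pointer it already holds in a local
   (stack-relative) variable, so the push itself touches no TLS at all. */
#define DS_PUSH_CACHED(vm, val) \
    ((vm)->ds_index += 1, (vm)->ds_base[(vm)->ds_index] = (val))

/* Do the TLS lookup once and hand back a plain pointer. */
static VMThread *vm_current(void)
{
    VMThread *vm = &vm_tls;
    if (vm->ds_base == 0) {               /* first use on this thread */
        vm->ds_base  = vm->ds_store;
        vm->ds_index = -1;
    }
    return vm;
}

/* Push n cells through the cached pointer; returns the new top index. */
static int push_many(VMThread *vm, int n)
{
    for (int i = 0; i < n; i++) {
        Cell c = { { (uint32_t)i, 0, 0, 0 } };
        assert(vm->ds_index + 1 < DS_DEPTH);
        DS_PUSH_CACHED(vm, c);
    }
    return vm->ds_index;
}
```

A native would call vm_current() once on entry (or receive the pointer as an argument) and then use DS_PUSH_CACHED(vm, c) everywhere. The price is that the pointer has to be threaded through every function by hand, which is exactly the human-based flow analysis I'm grumbling about.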
I really didn't think I'd need to be doing this in the year 2007.
A human-based global flow analysis!? Makes me homesick for the old A5 CPU register, you know what I mean? Or a CPU with a thread base register, or I'd even take a thread-local remap on a VM base page for TLS globals. (Or just maybe... cool stuff like that happens when -O2 is enabled? Please say "yes".)