ulysse31 (joined 19 Mar 2015, Paris)
Posted: Sat Jan 07, 2017 11:12 am    Post subject: Tips to make my scanner more efficient
Language is C++.
This is how my scanner is designed:
It uses VirtualQueryEx to get all the requested memory-region information, and it allocates heap buffers that mirror the target process memory (so for a 1.5 GB game it allocates around 1.5 GB).
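A rough, portable sketch of the allocation step described above, for anyone who wants to reproduce it. `Region`, `RegionCopy`, `allocateCopies` and `totalBytes` are hypothetical names. On Windows the `Region` fields would be filled from the `MEMORY_BASIC_INFORMATION` returned by `VirtualQueryEx` (its `State` and `Protect` members); that call is left out so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the MEMORY_BASIC_INFORMATION fields the scanner
// needs; on Windows these would come from VirtualQueryEx.
struct Region {
    std::uintptr_t base;
    std::size_t size;
    bool committed;
    bool writable;
    bool executable;
};

// One heap buffer per accepted region, sized to match that region.
struct RegionCopy {
    Region region;
    std::vector<std::uint8_t> bytes;
};

// Keeps committed regions that satisfy the protection filter and allocates
// a same-sized heap buffer for each of them.
std::vector<RegionCopy> allocateCopies(const std::vector<Region>& regions,
                                       bool requireWritable,
                                       bool requireExecutable) {
    std::vector<RegionCopy> copies;
    for (const Region& r : regions) {
        if (!r.committed) continue;
        if (requireWritable && !r.writable) continue;
        if (requireExecutable && !r.executable) continue;
        copies.push_back({r, std::vector<std::uint8_t>(r.size)});
    }
    return copies;
}

// Total heap bytes allocated: roughly the size of the scanned part of the game.
std::size_t totalBytes(const std::vector<RegionCopy>& copies) {
    std::size_t total = 0;
    for (const RegionCopy& c : copies) total += c.bytes.size();
    return total;
}
```

Note that `std::vector<std::uint8_t>(n)` zero-initializes its contents, so this version writes every page of the buffers before the first scan starts.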
The scanning function allocates a 130,000-byte buffer on the stack, reads memory from the target process with the ReadProcessMemory API, and compares it against the requested value.
After the comparison, the stack block holding the values read from the game is copied to the heap buffers with memmove.
This is repeated for every memory region returned by VirtualQueryEx.
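The per-region loop from the three steps above might look roughly like this. `ReadFn` is a hypothetical callback standing in for `ReadProcessMemory`, so the sketch can run against an in-memory buffer, and `scanRegion` is a hypothetical name. It also assumes 4-byte-aligned values (like CE's fast scan), which keeps chunk boundaries simple; the unaligned scan mentioned in the post would need consecutive chunks to overlap by 3 bytes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical reader standing in for ReadProcessMemory: copies `size` bytes
// from `address` in the target into `out`; returns false on failure.
using ReadFn = std::function<bool(std::uintptr_t address, void* out, std::size_t size)>;

constexpr std::size_t kChunkSize = 130000;  // stack buffer size from the post

// Scans one region for an exact 32-bit value at 4-byte-aligned addresses.
// Each chunk is read into the stack buffer, compared, then moved to the heap copy.
std::vector<std::uintptr_t> scanRegion(const ReadFn& read, std::uintptr_t base,
                                       std::vector<std::uint8_t>& heapCopy,
                                       std::int32_t wanted) {
    static_assert(kChunkSize % sizeof(std::int32_t) == 0, "chunk must hold whole ints");
    assert(base % sizeof(std::int32_t) == 0);  // alignment assumption of this sketch
    std::vector<std::uintptr_t> hits;
    std::array<std::uint8_t, kChunkSize> chunk;  // lives on the stack
    for (std::size_t offset = 0; offset < heapCopy.size(); offset += kChunkSize) {
        std::size_t len = std::min(kChunkSize, heapCopy.size() - offset);
        if (!read(base + offset, chunk.data(), len)) continue;  // unreadable: skip chunk
        for (std::size_t i = 0; i + sizeof(std::int32_t) <= len; i += sizeof(std::int32_t)) {
            std::int32_t v;
            std::memcpy(&v, chunk.data() + i, sizeof v);  // avoids aliasing/alignment UB
            if (v == wanted) hits.push_back(base + offset + i);
        }
        std::memmove(heapCopy.data() + offset, chunk.data(), len);
    }
    return hits;
}
```

Since the stack chunk and the heap copy never overlap, `std::memcpy` would do the same job as `std::memmove` here.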
I've already made several performance improvements, but my current bottleneck seems to be that the heap buffers are not cached during the first scan (temporal locality), and possibly false sharing (threads on different cores writing to different data that happens to sit in the same cache line, which forces that line to bounce between cores). I use as many scan threads as cores; I have 8 cores.
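If false sharing is a concern, one common mitigation is to accumulate results in a thread-local variable and give each thread its own cache-line-sized slot for anything it publishes. A sketch, assuming 64-byte cache lines (true on current x86 CPUs) and C++17 (needed for over-aligned types in `std::vector`); `PaddedCount` and `parallelCount` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size (64 bytes on current x86 CPUs).
constexpr std::size_t kCacheLine = 64;

// Each counter occupies a whole cache line, so writes from different cores
// never touch the same line.
struct alignas(kCacheLine) PaddedCount {
    std::size_t value = 0;
};

// Splits `data` into one contiguous slice per thread and counts matches of `wanted`.
std::size_t parallelCount(const std::vector<std::int32_t>& data, std::int32_t wanted,
                          unsigned threadCount) {
    assert(threadCount > 0);
    std::vector<PaddedCount> counts(threadCount);
    std::vector<std::thread> workers;
    std::size_t slice = (data.size() + threadCount - 1) / threadCount;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = std::min(data.size(), t * slice);
            std::size_t end = std::min(data.size(), begin + slice);
            std::size_t local = 0;  // hot counter stays in a register/local variable
            for (std::size_t i = begin; i < end; ++i)
                if (data[i] == wanted) ++local;
            counts[t].value = local;  // published once, into this thread's own line
        });
    }
    for (std::thread& w : workers) w.join();
    std::size_t total = 0;
    for (const PaddedCount& c : counts) total += c.value;
    return total;
}
```

False sharing only costs anything when threads write to a shared line; threads that merely read the same memory do not cause it.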
My current performance for a first scan for an exact int value, without fast scan, over the writable+executable memory regions of a 1.5 GB game ranges from 18 seconds down to 1.5 seconds.
If the game is partly paged to disk (it has been in the background for some time) and the scanner was just launched, the first scan takes around 15 seconds. If I choose New Scan and then First Scan right after, the scan times decrease like this, even though the work is the same:
15 s -> 10 s -> 4 s -> 1.8 s, and they stay around 1.8 s as long as the memory is frequently accessed (at that point it is roughly as fast as CE (Cheat Engine)).
So I assume (maybe wrongly) that the 130 KB stack buffer is cached and not slowing my scan, unlike the 1.5 GB heap allocation.
I am considering using memmove to copy the heap bytes onto the stack (in reasonable chunks) before doing the comparisons, so that they are "cached", and then moving them back to the heap. Would that actually work?
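Before writing the heap-to-stack version, it may be worth measuring it. The sketch below times two variants on the same data: comparing directly in the heap buffer, and `memmove`-ing each chunk onto the stack first. The function names and the 128 KB chunk size are assumptions. As far as I know, the CPU cache works on addresses and cache lines and does not treat stack memory differently from heap memory, so variant B still has to pull every heap byte through the cache once; it mainly adds a copy.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Variant A: compare aligned 32-bit values directly in the heap buffer.
std::size_t countInPlace(const std::vector<std::uint8_t>& heap, std::int32_t wanted) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i + 4 <= heap.size(); i += 4) {
        std::int32_t v;
        std::memcpy(&v, heap.data() + i, 4);
        if (v == wanted) ++hits;
    }
    return hits;
}

// Variant B: memmove each chunk into a stack buffer first, then compare there.
std::size_t countViaStack(const std::vector<std::uint8_t>& heap, std::int32_t wanted) {
    constexpr std::size_t kChunk = 128 * 1024;  // multiple of 4, so ints never straddle chunks
    std::uint8_t chunk[kChunk];
    std::size_t hits = 0;
    for (std::size_t off = 0; off < heap.size(); off += kChunk) {
        std::size_t len = std::min(kChunk, heap.size() - off);
        std::memmove(chunk, heap.data() + off, len);
        for (std::size_t i = 0; i + 4 <= len; i += 4) {
            std::int32_t v;
            std::memcpy(&v, chunk + i, 4);
            if (v == wanted) ++hits;
        }
    }
    return hits;
}

// Returns the wall-clock time of one call to `f`, in milliseconds.
template <typename F>
double timeMs(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing both variants with `timeMs`, once on a freshly allocated buffer and once on a buffer that was just scanned, would show whether the copy helps at all and how much of the gap is cold-versus-warm memory.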
I am also considering allocating much less memory (say 300 MB instead of 1.5 GB) and reusing those same 300 MB five times (hoping they stay cached thanks to temporal locality), saving them to disk between each refill.
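The second idea (one smaller buffer, refilled several times, with each batch saved to disk) could be structured around a class like the one below. `SpillingBuffer` is a hypothetical name, the spill file comes from `std::tmpfile`, and error handling is kept minimal. `long` file offsets are 32-bit on Windows, so spilling a real 1.5 GB there would need 64-bit seek functions instead of `fseek`/`ftell`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// One reusable buffer that is refilled per batch and appended to a temp file.
class SpillingBuffer {
public:
    explicit SpillingBuffer(std::size_t capacity)
        : buffer_(capacity), file_(std::tmpfile()) {}
    ~SpillingBuffer() { if (file_) std::fclose(file_); }
    SpillingBuffer(const SpillingBuffer&) = delete;
    SpillingBuffer& operator=(const SpillingBuffer&) = delete;

    bool ok() const { return file_ != nullptr; }
    std::uint8_t* data() { return buffer_.data(); }
    std::size_t capacity() const { return buffer_.size(); }

    // Appends the first `used` bytes to the spill file; returns the batch's
    // file offset, or -1 on failure.
    long spill(std::size_t used) {
        if (!file_ || used > buffer_.size()) return -1;
        if (std::fseek(file_, 0, SEEK_END) != 0) return -1;
        long offset = std::ftell(file_);
        if (std::fwrite(buffer_.data(), 1, used, file_) != used) return -1;
        return offset;
    }

    // Reads a previously spilled batch back into the buffer.
    bool reload(long offset, std::size_t size) {
        if (!file_ || size > buffer_.size()) return false;
        if (std::fseek(file_, offset, SEEK_SET) != 0) return false;
        return std::fread(buffer_.data(), 1, size, file_) == size;
    }

private:
    std::vector<std::uint8_t> buffer_;
    std::FILE* file_;
};
```

Each batch's offset returned by `spill` is what a later "next scan" would pass to `reload` to get that batch back.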
Each direction means a lot of coding, so I'd like a tip on which one might be better (or on another design I haven't thought of), especially since I am not sure that moving chunks of data from the heap to the stack will get them cached.