Cheat Engine Forum Index Cheat Engine
The Official Site of Cheat Engine
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 


What array comparison algorithm does Cheat Engine use?

 
Post new topic   Reply to topic    Cheat Engine Forum Index -> Cheat Engine
View previous topic :: View next topic  
Author Message
Paprikaskrumpli
Cheater
Reputation: 0

Joined: 19 Dec 2020
Posts: 29

PostPosted: Sun Aug 01, 2021 1:53 pm    Post subject: What array comparison algorithm does Cheat Engine use? Reply with quote

Hi!

What is the most efficient algorithm to perform an AOB scan on a memory page?
Is it possible to go below O(n) complexity?
What algorithm does CE use?

What modifications can I make to improve the speed of the scan?
This is my c++ implementation, I'm doing this internally:

Code:

uint GetPatternInRegion(uint regionBase, std::vector<BYTE>& patternToFind, uint pageSize) {

    BYTE* opPtr = (BYTE*)regionBase;

    size_t nextFirst = 0;

    for (size_t i = 0; i < pageSize-patternToFind.size(); i++)
    {
        for (size_t j = 0; j < patternToFind.size(); j++)
        {
           
            if (nextFirst == 0 && opPtr[i + j] == patternToFind[0]) {
                nextFirst = j;
            }
           

            if (opPtr[i + j] != patternToFind[j]) {
                i += nextFirst;
                nextFirst = 0;
                break;
            }
           
            if (j == patternToFind.size() - 1)
                return regionBase + (uint)i;

        }
    }
    return 0;
}


Current state of the program:
This function and the one that calls it (the one selecting the proper memory pages to scan) are both in a multi-threaded implementation. What I mean by that tldr:
I divide all memory into n (a parameter) equal batches.
Divide all batches to be scanned into the number of processors.
Perform memory validation and aob scan on valid pages.
Repeat on leftover batches. -> (if n = 1 then the first step is discarded and the whole range of memory is divided into the number of processors)

I only scan memory pages with access rights equal to a parameter provided in a wrapper function (for example: only scan PAGE_EXECUTE_READWRITE memory, and of course only committed memory)

What is your expreience with thread creation and destruction overhead?
Where can I find the code for AOB scan in ce's source on github?
Would manually unrolling the comparison steps help?

Benchmark example:
Find this array: 8D 04 40 8D 44 87 08 D9 44 24 0C D9 18 D9 44 24 10 D9 58 04 D9 44 24 14 D9 58 08
Between: 0x0 - 0xffffffff
Where it is located: 0x3d6c9860
Average runtime out of 20 tries, debug: Depending on the number of batxhes: 100-800ms
On: AMD Ryzen 7 4800H


Please share your thoughts and ideas on the topic, and on how i could improve the performance.
I'm not looking to do this with gpu processing.


Last edited by Paprikaskrumpli on Sun Aug 01, 2021 4:50 pm; edited 1 time in total
Back to top
View user's profile Send private message
pursuited357
Cheater
Reputation: 0

Joined: 23 Sep 2020
Posts: 26

PostPosted: Sun Aug 01, 2021 3:42 pm    Post subject: Reply with quote

First thing I would suggest is something harder on your machine.

A scan is ok..

But CE can get more "aggressive" in terms of its impact on the performance of your machine.



Try benchmarking something like - pointer scanner.
Just remain consistent each time you time it.

THEN Id try things in your OS, like eliminating running process's you dont need running as well as useless services...

Thats about when Id start to look at the source to gain more in speed....

Just a thought...
Back to top
View user's profile Send private message
Paprikaskrumpli
Cheater
Reputation: 0

Joined: 19 Dec 2020
Posts: 29

PostPosted: Sun Aug 01, 2021 4:44 pm    Post subject: Reply with quote

pursuited357 wrote:

THEN Id try things in your OS, like eliminating running process's you dont need running as well as useless services...


That...Indeed would speed up the process, but not in a way im looking to do it Laughing
Back to top
View user's profile Send private message
pursuited357
Cheater
Reputation: 0

Joined: 23 Sep 2020
Posts: 26

PostPosted: Sun Aug 01, 2021 6:28 pm    Post subject: Reply with quote

Quote:
That...Indeed would speed up the process, but not in a way im looking to do it Laughing


Ok - I have to ask now....

Two questions...

What is your end goal your trying to get to...

and

If that was "not in a way your looking" -
-- ok I dont even know how to ask on that...

What are you trying to achieve? Obviously if you clean the OS side of things or Hardware - it will speed things up. Which if you compare that to modifying a source code : I think the OS side is a bit easier at the end of the day...
Back to top
View user's profile Send private message
atom0s
Moderator
Reputation: 198

Joined: 25 Jan 2006
Posts: 8517
Location: 127.0.0.1

PostPosted: Sun Aug 01, 2021 11:13 pm    Post subject: Reply with quote

Paprikaskrumpli wrote:

What is the most efficient algorithm to perform an AOB scan on a memory page?


There isn't really a single 'most efficient means of scanning for an AOB. There are multiple factors that will play a role in deciding which can be the best/fastest for a given situation.

For one, depending on the AOB size, it may be more optimal to implement your scans using SIMD ASM instructions. (However, that is pretty complex if you aren't an ASM coder.)

Another example would be if you require wildcards or not. Some things that factor into then would be if not, and the size is divisible by 4 or 8, you could then compare in 4 byte/8 byte chunks for faster searching instead of doing byte by byte etc. If you also require nibble-level scanning that'll play into the speed and algo best used.

In regards to some algo's used, there is:
- Basic byte per byte matching.
- Regular expressions, if you want that kind of overhead.
- Boyer-Moore Horspool.
- Rabin–Karp
- Aho-Corasick

In most cases, any algo is fast enough on modern hardware, but things that also help would be to do some extra prep work before running your AOB scan. Things such as:

- Optimize your AOB as much as possible to best fit into a scanning method you find works best. If you can fit the pattern into 4/8 byte chunks then you can greatly improve the speed of scanning multiple ways. (Including SIMD instructions etc.)

- Avoid scanning memory pages you know will never hold the pattern. If you know that the pattern will never be in a MEM_RESERVE page, skip them when trying to scan within them. If you know the pattern never exists in a guarded page (PAGE_GUARD protection flag) then ignore them as well.

- Make use of async/threading to scan multiple pages at once if you know that it will require walking the pages of the process to find the desired pattern.


Paprikaskrumpli wrote:

What is your expreience with thread creation and destruction overhead?


On modern hardware, you won't feel/notice it at all, to be honest. If you do find that it is a problem, then don't keep killing and recreating threads. Let them spin (with a valid sleep/lock) and wait for being assigned work to do and only kill them off when your work is fully done.


Paprikaskrumpli wrote:

Where can I find the code for AOB scan in ce's source on github?


https://github.com/cheat-engine/cheat-engine/blob/master/Cheat%20Engine/memscan.pas holds the main memory scanning code. The TScanner.FirstScanmem method will probably be what you want to look at to get started in following the code to where its doing its AOB stuff.

_________________
- Retired.
Back to top
View user's profile Send private message Visit poster's website
ParkourPenguin
I post too much
Reputation: 140

Joined: 06 Jul 2014
Posts: 4289

PostPosted: Mon Aug 02, 2021 12:06 am    Post subject: Reply with quote

I would highly recommend you read and understand chapter 2 of these lecture notes if you really care about performance:
https://ppc.cs.aalto.fi/ch2/

My thoughts: the speed of any algorithm that solves this problem would seem to be obviously memory-bound in the best case. Once you reach that, improving the algorithm itself might not have any significant impact on performance except with certain patterns: e.g. a bunch of 0 bytes at the beginning of the pattern might make your naive algorithm cpu-bound (if it isn't already cpu-bound w/o SIMD). This is speculation; I haven't benchmarked any of this; don't take my word for it.

Your bruteforce approach is naive, but if the pattern is short enough, not maintaining any other state (e.g. for lookup tables) might result in better performance. Something like the Knuth–Morris–Pratt algorithm or the Boyer–Moore algorithm would be better in general. I'm sure there are tons of other algorithms too.
SIMD will greatly help. Don't trust the compiler to automatically do it for you.
Preprocessing the bytes being searched in might improve it further: e.g. gather statistics on frequency of single and/or double byte patterns in the game and find rare ones first.

_________________
I don't know where I'm going, but I'll figure it out when I get there.
Back to top
View user's profile Send private message
Dark Byte
Site Admin
Reputation: 458

Joined: 09 May 2003
Posts: 25281
Location: The netherlands

PostPosted: Mon Aug 02, 2021 4:14 am    Post subject: Reply with quote

CE's aobscan just scans for the first byte, and if that one matches, checks the second byte, and it it matches, go to the next else continue searching for the first

You could perhaps use value masks to speed it up but my experience is that it tends to be negligible as often the bottleneck isn't the cpu speed but the memory access time, and using masks might actually slow things down as ALL bits in a mask get compared instead of just the first 8 bits

_________________
Do not ask me about online cheats. I don't know any and wont help finding them.

Like my help? Join me on Patreon so i can keep helping
Back to top
View user's profile Send private message MSN Messenger
Paprikaskrumpli
Cheater
Reputation: 0

Joined: 19 Dec 2020
Posts: 29

PostPosted: Tue Aug 03, 2021 3:05 pm    Post subject: Reply with quote

Thanks everybody, this is very useful information, definetly will study the recommended topics more. ParkourPenguin's recommendation of that lecture is especially mind blowing.

Two things that stood out from your replies that i can straight up ask about:

1. ParkourPenguin mentioned that the above algorithm is naive. I agree, it probably is, but not in a crazy way right? What I mean by this is, Its still linear complexity right? -> Looping trough the page is linear, takes n = page size (mine 4092) steps. -> In the inner loop first i compare the first element, and if it is a mismatch i break. The only time the inner loop runs fully is when i find the pattern. -> Most of the time it takes <4 iterations for the inner loop to break because of a mismatch, and then go to the next byte in the outher loop. -> I am checking for the lead byte of the pattern when in the inner loop and if i encounter it, the outer loop skips to that byte, when the inner breaks.

1.1: Would putting an if statement around the inner loop, checking the first ( or first few) bytes of the pattern if they match help? - i could save some loop operations

2. The other is mentioned a few times, its SIMD operations and compare more bytes at a time. Is there actually a way to parallelise comparisons this way? -> If the pattern's memory location i was scanning for was always divisible by lets say 8 from the module base ( or from some other address, like other module or from 0x0) that would speed up the process 8 folds, but it usually dont seem be that way. -> Therefore i cannot use simd operations to search for the FIRST byte in the pattern.
I now can see how simd operations would apply in a problem that parkourpenguin mentioned, like vector addition or matrix multiplixation, but can't how they apply here. -> You have to check every (?) byte in the target memory whether the pattern starts there.

2.1: Where i can see a big improvement using simd operations is when the first byte of the pattern is found. From there I could compare bytes of the pattern and the memory more at a time, which begs the question:

---If i find the leading byte of the pattern, and set the byte* i use, to for example an unsigned int* or some bigger datatype in size, and do the same to the pattern and compared them as uints or something bigger, would that be faster? Is compareing two ints a single operation, therefore faster then comparing 4 bytes one by one in a loop. If yes, what is the optimal size of the datatype? The process is 32 bit. Can i go above 32 bits with custom datatypes? Would that resoult in a performance drop?

---Is there a way to find a byte in an array of bytes in less steps than its actual index in the array, IN THIS CONTEXT. What i was wondering is a "skip" table: If i encounter any other byte than the first of the pattern, a skip table would tell me what is the minimum number of bytes i can safely skip because of that operation. Example:

During the scan i encounter a JMP operation. I can safely skip 4 bytes foward, because the next 4 bytes will be the relative offset to jump to. Or a MOV, then i can skip 1 byte at least. (Not sure about the numbers but u probably get the concept) - is that plausable?

Edit: This wouldnt work right? What if i skipped on a literal or on a hardcoded address having one if its bytes same as a jmp instruction for example, and my actual pattern was right after that.
Back to top
View user's profile Send private message
ParkourPenguin
I post too much
Reputation: 140

Joined: 06 Jul 2014
Posts: 4289

PostPosted: Tue Aug 03, 2021 8:37 pm    Post subject: Reply with quote

1:
Naive doesn't necessarily mean bad. See "premature optimization is the root of all evil".

That aobscan algorithm is likely going to be one of the hottest spots in your program, which makes it worth optimizing. However, there's only so much you can do before it gets limited by the speed memory can be read from ram. Beyond that, again, there's not much you can do to optimize it further.

In your code, "nextFirst" doesn't seem like it does much. It helps in a specific case where the first byte repeats itself later on in the pattern and every byte up to that point matches the pattern. IMO this would rarely happen in practice.

My advice is to keep it as simple as possible until you need it to go faster.

2:
Using SIMD operations is hard. Take the C library's memchr function for example. It simply finds the location of a byte with a certain value in a region of memory (like an aobscan with pattern size of 1).

I've been interested in Rust recently. Here's an implementation of memchr in rust using SIMD:
https://github.com/BurntSushi/memchr/blob/1233467fa645b8536834801a24d101401b848f29/src/memchr/x86/sse2.rs#L16

That same project has an implementation of memmem, which is pretty much what you're trying to do. If you want an example this seems like a good one.

_________________
I don't know where I'm going, but I'll figure it out when I get there.
Back to top
View user's profile Send private message
atom0s
Moderator
Reputation: 198

Joined: 25 Jan 2006
Posts: 8517
Location: 127.0.0.1

PostPosted: Wed Aug 04, 2021 2:25 am    Post subject: Reply with quote

Here's a few benchmark projects on GitHub that aimed at finding the 'fastest' pattern scanning for game hacking:
https://github.com/learn-more/findpattern-bench
https://github.com/0x1F9F1/pattern-bench

Keep in mind, 'fastest' here is fairly subjective as the setup for this test suite is easily target-coded for to get the best speed based on what the test does. Again, speed will greatly depend on a lot of factors that lead up to what you are scanning for, and what parameters are involved as mentioned before.

There are a couple of SIMD examples included as well.

_________________
- Retired.
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    Cheat Engine Forum Index -> Cheat Engine All times are GMT - 6 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Powered by phpBB © 2001, 2005 phpBB Group

CE Wiki   IRC (#CEF)   Twitter
Third party websites