Cheat Engine

Paprikaskrumpli · Cheater Reputation: 0 Joined: 19 Dec 2020 Posts: 29

Hi!

What is the most efficient algorithm to perform an AOB scan on a memory page?
Is it possible to go below O(n) complexity?
What algorithm does CE use?

What modifications can I make to improve the speed of the scan?
This is my C++ implementation; I'm doing this internally:

pursuited357 · Cheater Reputation: 0 Joined: 23 Sep 2020 Posts: 26

The first thing I would suggest is testing with something harder on your machine.

A scan is OK...

But CE can get more "aggressive" in terms of its impact on the performance of your machine.

Try benchmarking something like the pointer scanner.
Just remain consistent each time you time it.

THEN I'd try things in your OS, like eliminating running processes you don't need, as well as useless services...

That's about when I'd start to look at the source to gain more speed...

Just a thought...

Paprikaskrumpli · Cheater Reputation: 0 Joined: 19 Dec 2020 Posts: 29

pursuited357 · Cheater Reputation: 0 Joined: 23 Sep 2020 Posts: 26

atom0s · Posted: Sun Aug 01, 2021 11:13 pm Post subject:

ParkourPenguin · Posted: Mon Aug 02, 2021 12:06 am Post subject:

I would highly recommend you read and understand chapter 2 of these lecture notes if you really care about performance:
https://ppc.cs.aalto.fi/ch2/

My thoughts: the speed of any algorithm that solves this problem would seem to be obviously memory-bound in the best case. Once you reach that, improving the algorithm itself might not have any significant impact on performance except with certain patterns: e.g. a bunch of 0 bytes at the beginning of the pattern might make your naive algorithm cpu-bound (if it isn't already cpu-bound w/o SIMD). This is speculation; I haven't benchmarked any of this; don't take my word for it.

Your brute-force approach is naive, but if the pattern is short enough, not maintaining any other state (e.g. for lookup tables) might result in better performance. Something like the Knuth–Morris–Pratt algorithm or the Boyer–Moore algorithm would be better in general. I'm sure there are tons of other algorithms too.
SIMD will greatly help. Don't trust the compiler to automatically do it for you.
Preprocessing the bytes being searched in might improve it further: e.g. gather statistics on frequency of single and/or double byte patterns in the game and find rare ones first.
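
For reference, here is a minimal sketch of the Boyer–Moore–Horspool variant, a simplified relative of the Boyer–Moore algorithm mentioned above. It is not code from anyone in this thread; the function name `horspool_find` is hypothetical, and the sketch assumes an exact pattern with no wildcard bytes. The shift table is built from the pattern only, so a skip is always safe regardless of what the scanned bytes mean.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: Boyer-Moore-Horspool search for an exact byte
// pattern (no wildcards). Returns the offset of the first match, or
// hay_len if the pattern does not occur (or is empty).
std::size_t horspool_find(const std::uint8_t* hay, std::size_t hay_len,
                          const std::vector<std::uint8_t>& pat) {
    assert(hay != nullptr || hay_len == 0);
    const std::size_t m = pat.size();
    if (m == 0 || m > hay_len) return hay_len;

    // shift[b] = how far the window may slide when its last byte is b.
    // Bytes that do not occur in pat[0..m-2] allow a full slide of m.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) shift[pat[i]] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= hay_len) {
        // Compare the window against the pattern from right to left.
        std::size_t i = m;
        while (i > 0 && hay[pos + i - 1] == pat[i - 1]) --i;
        if (i == 0) return pos;
        // Slide by the amount recorded for the window's last byte.
        pos += shift[hay[pos + m - 1]];
    }
    return hay_len;
}
```

With longer patterns the window often slides by several bytes per step, so fewer than n positions are examined on typical input, although the worst case is still not sublinear. Whether this beats a plain first-byte scan on real data is something to benchmark rather than assume.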

Dark Byte · Posted: Mon Aug 02, 2021 4:14 am Post subject:

CE's aobscan just scans for the first byte, and if that one matches, checks the second byte, and if it matches, goes on to the next; otherwise it continues searching for the first byte.

You could perhaps use value masks to speed it up, but my experience is that the gain tends to be negligible, as often the bottleneck isn't the CPU speed but the memory access time. Using masks might actually slow things down, as ALL bits in a mask get compared instead of just the first 8 bits.
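
The first-byte strategy described in this post can be sketched roughly as follows. This is an illustration only and not CE's actual source: the name `first_byte_scan`, the use of `std::memchr` for the lead-byte search, and the convention of `-1` for a wildcard byte are all assumptions made for the example. It also assumes the first pattern entry is a literal byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch. Pattern entries are 0..255 for a literal byte or
// -1 for a wildcard; pat[0] must be a literal. Returns the offset of the
// first match, or len if there is none.
std::size_t first_byte_scan(const std::uint8_t* mem, std::size_t len,
                            const std::vector<int>& pat) {
    assert(!pat.empty() && pat[0] >= 0 && pat[0] <= 255);
    const std::size_t m = pat.size();
    if (m > len) return len;
    const std::size_t last_start = len - m;  // last offset where a match fits

    std::size_t pos = 0;
    while (pos <= last_start) {
        // Search for the lead byte among the remaining valid start offsets.
        const void* hit = std::memchr(mem + pos, pat[0], last_start - pos + 1);
        if (hit == nullptr) return len;
        pos = static_cast<std::size_t>(static_cast<const std::uint8_t*>(hit) - mem);

        // Lead byte matched: verify the rest, skipping wildcard entries.
        std::size_t i = 1;
        while (i < m && (pat[i] < 0 || mem[pos + i] == pat[i])) ++i;
        if (i == m) return pos;
        ++pos;  // mismatch: resume the lead-byte search one byte later
    }
    return len;
}
```

Delegating the lead-byte search to `std::memchr` keeps the inner loop trivial, and many C library implementations of `memchr` are already heavily optimised, which fits the point that the scan tends to be limited by memory access rather than by the comparison logic.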

Paprikaskrumpli · Cheater Reputation: 0 Joined: 19 Dec 2020 Posts: 29

Thanks everybody, this is very useful information; I will definitely study the recommended topics more. ParkourPenguin's recommendation of that lecture is especially mind-blowing.

Two things that stood out from your replies that I can straight up ask about:

1. ParkourPenguin mentioned that the above algorithm is naive. I agree, it probably is, but not in a crazy way, right? What I mean is, it's still linear complexity, right?
-> Looping through the page is linear; it takes n = page size (mine 4092) steps.
-> In the inner loop I first compare the first element, and if it is a mismatch I break. The only time the inner loop runs fully is when I find the pattern.
-> Most of the time it takes <4 iterations for the inner loop to break because of a mismatch and then go to the next byte in the outer loop.
-> While in the inner loop I check for the lead byte of the pattern, and if I encounter it, the outer loop skips to that byte when the inner loop breaks.

1.1: Would putting an if statement around the inner loop, checking whether the first (or first few) bytes of the pattern match, help? I could save some loop operations.

2. The other thing, mentioned a few times, is using SIMD operations to compare more bytes at a time. Is there actually a way to parallelise comparisons this way?
-> If the memory location of the pattern I was scanning for was always divisible by, let's say, 8 from the module base (or from some other address, like another module or 0x0), that would speed up the process eightfold, but it usually doesn't seem to be that way.
-> Therefore I cannot use SIMD operations to search for the FIRST byte in the pattern.
I can now see how SIMD operations would apply to a problem like the ones ParkourPenguin's link covers, such as vector addition or matrix multiplication, but I can't see how they apply here.
-> You have to check every (?) byte in the target memory to see whether the pattern starts there.

2.1: Where I can see a big improvement from SIMD operations is once the first byte of the pattern is found. From there I could compare more bytes of the pattern and the memory at a time, which raises the question:

---If I find the leading byte of the pattern and cast the byte* I use to, for example, an unsigned int* or some bigger datatype, do the same to the pattern, and compare them as uints or something bigger, would that be faster? Is comparing two ints a single operation, and therefore faster than comparing 4 bytes one by one in a loop? If yes, what is the optimal size of the datatype? The process is 32-bit. Can I go above 32 bits with custom datatypes? Would that result in a performance drop?

---Is there a way to find a byte in an array of bytes in fewer steps than its actual index in the array, IN THIS CONTEXT? What I was wondering about is a "skip" table: if I encounter any byte other than the first of the pattern, a skip table would tell me the minimum number of bytes I can safely skip because of that byte. Example:

During the scan I encounter a JMP operation. I can safely skip 4 bytes forward, because the next 4 bytes will be the relative offset to jump to. Or a MOV, then I can skip 1 byte at least. (Not sure about the numbers, but you probably get the concept.) Is that plausible?

Edit: This wouldn't work, right? What if I skipped on a literal or on a hardcoded address that has one of its bytes the same as a jmp instruction, for example, and my actual pattern was right after that?

ParkourPenguin · Posted: Tue Aug 03, 2021 8:37 pm Post subject:

1:
Naive doesn't necessarily mean bad. See "premature optimization is the root of all evil".

That aobscan algorithm is likely going to be one of the hottest spots in your program, which makes it worth optimizing. However, there's only so much you can do before it gets limited by the speed at which memory can be read from RAM. Beyond that, again, there's not much you can do to optimize it further.

In your code, "nextFirst" doesn't seem like it does much. It helps in a specific case where the first byte repeats itself later on in the pattern and every byte up to that point matches the pattern. IMO this would rarely happen in practice.

My advice is to keep it as simple as possible until you need it to go faster.

2:
Using SIMD operations is hard. Take the C library's memchr function for example. It simply finds the location of a byte with a certain value in a region of memory (like an aobscan with pattern size of 1).

I've been interested in Rust recently. Here's an implementation of memchr in Rust using SIMD:
https://github.com/BurntSushi/memchr/blob/1233467fa645b8536834801a24d101401b848f29/src/memchr/x86/sse2.rs#L16

That same project has an implementation of memmem, which is pretty much what you're trying to do. If you want an example this seems like a good one.
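
To illustrate the idea in C++ rather than Rust, here is a minimal sketch of a memchr-style lead-byte search using SSE2 intrinsics. It is far simpler than the linked implementation and is not derived from it; the name `find_byte_sse2` and the `HAVE_SSE2` macro are hypothetical. It shows that the target does not need to be aligned: an unaligned 16-byte load tests 16 candidate start positions with one compare. When SSE2 is not available at compile time, the sketch falls back to a plain byte loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: offset of the first byte equal to needle, or len.
std::size_t find_byte_sse2(const std::uint8_t* mem, std::size_t len,
                           std::uint8_t needle) {
    assert(mem != nullptr || len == 0);
    std::size_t pos = 0;
#ifdef HAVE_SSE2
    // Broadcast the needle into all 16 byte lanes.
    const __m128i wanted = _mm_set1_epi8(static_cast<char>(needle));
    for (; pos + 16 <= len; pos += 16) {
        // Unaligned load: a match may start at any address.
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(mem + pos));
        // Compare all 16 lanes at once, then pack the results into 16 bits.
        unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, wanted)));
        if (mask != 0) {
            // The lowest set bit marks the first matching lane.
            std::size_t bit = 0;
            while ((mask & 1u) == 0) { mask >>= 1; ++bit; }
            return pos + bit;
        }
    }
#endif
    // Scalar tail (or the whole buffer when SSE2 is unavailable).
    for (; pos < len; ++pos)
        if (mem[pos] == needle) return pos;
    return len;
}
```

Once a candidate offset is found, the remaining pattern bytes still have to be verified, for example with a byte loop or `std::memcmp`. Whether this outperforms the C library's own `memchr` is not something to assume; it is worth measuring on the actual target.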

atom0s · Posted: Wed Aug 04, 2021 2:25 am Post subject:

Here are a few benchmark projects on GitHub that aim to find the 'fastest' pattern scanner for game hacking:
https://github.com/learn-more/findpattern-bench
https://github.com/0x1F9F1/pattern-bench

Keep in mind, 'fastest' here is fairly subjective, as an implementation can easily be coded specifically against the setup of this test suite to get the best speed on what the test does. Again, speed will depend greatly on many factors, including what you are scanning for and what parameters are involved, as mentioned before.

There are a couple of SIMD examples included as well.