|
Cheat Engine The Official Site of Cheat Engine
KryziK Expert Cheater Reputation: 3
Joined: 16 Aug 2009 Posts: 199
|
Posted: Sat Sep 12, 2015 9:06 pm Post subject: Assembler Help |
|
|
Hey all, I'm working on writing a small assembler for a subset of the Intel instructions, and I had a few questions:
1. In the CE source, in assemblerunit.pas, I noticed the following line:
Code: | (mnemonic:'ADD';opcode1:eo_iw;paramtype1:par_AX;paramtype2:par_imm16;bytes:2;bt1:$66;bt2:$05) |
I've read through some of the manual and found that this seemingly misplaced 0x66 byte might correspond to VEX. Is this true? I've read that VEX extends the instruction set and allows for more operands. How important are these instructions right now? Could I probably get by without including them for the time being? I just want to start with the most used instructions, like an add, jump, mov, etc..
2. How does the assembler know which instruction to choose? For example, if I had a jmp instruction, how does the assembler know which jump rule to pick? Does it calculate the size of the immediate operand and see if it's larger than an 8-bit immediate (and if so, use the 32-bit immediate)? Or perhaps it depends on how large the offset between the jump instruction and the jump target?
These are just two instruction examples, but the concepts apply for many more instructions, so any information would be helpful. I've written a simplified MIPS assembler, but that one didn't have any duplicate mnemonics with different operands, so I am a bit lost in the Cheat Engine source code trying to find how it takes a set of tokens and grabs the correct opcodes array in order to build the output bytes.
3. Some instructions in the Intel manual are not in the Cheat Engine source. I was wondering what the reason for only including specific ones was. For example, Cheat Engine has 3 definitions for jmp, but the Intel manual has 11. I'm trying not to copy from Cheat Engine, but how would I know which ones I need or not?
Thank you!
Update: This link is very helpful for understanding the Mod R/M and SIB bytes: http://www.c-jump.com/CIS77/CPU/x86/index.html
Last edited by KryziK on Wed Oct 21, 2015 10:00 pm; edited 1 time in total |
Dark Byte Site Admin Reputation: 458
Joined: 09 May 2003 Posts: 25296 Location: The netherlands
|
Posted: Sun Sep 13, 2015 2:28 am Post subject: |
|
|
1 the 0x66 byte is the operand-size prefix: it switches between 16 and 32 bit operation (0x67 does the same but for addressing, which CE does not support)
2 the preferred instruction is at the top, but if the instruction can't be encoded that way it'll go for an alternate one (e.g. when the jump offset doesn't fit in a signed byte, -128 to 127, it can't use the 2 byte jmp)
3 a lot of those entries are duplicates with a minor difference (e.g. the manual shows a different entry for 32 and 64 bit mode; CE just deals with that afterwards by setting a REX prefix)
besides that, some newer instructions, like VEX, aren't handled yet
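To illustrate the 0x66 prefix from point 1, here is a minimal Python sketch of the two accumulator forms of ADD. It is not CE's code: the function name `encode_add_acc_imm` is hypothetical, and it assumes the code runs with a 32-bit default operand size, so the 16-bit form is the one that needs the prefix.

```python
import struct

def encode_add_acc_imm(imm: int, operand_bits: int) -> bytes:
    """Encode ADD AX,imm16 or ADD EAX,imm32 (hypothetical helper).

    Assumes a 32-bit default operand size, so the 16-bit form is
    selected by putting the 0x66 prefix in front of opcode 05.
    """
    if operand_bits == 16:
        # 66 05 iw : prefix, opcode, little-endian 16-bit immediate
        return b"\x66\x05" + struct.pack("<H", imm & 0xFFFF)
    if operand_bits == 32:
        # 05 id : opcode, little-endian 32-bit immediate
        return b"\x05" + struct.pack("<I", imm & 0xFFFFFFFF)
    raise ValueError("operand_bits must be 16 or 32")

print(encode_add_acc_imm(0x1234, 16).hex(" "))  # 66 05 34 12
print(encode_add_acc_imm(0x1234, 32).hex(" "))  # 05 34 12 00 00
```

The opcode byte is the same in both cases; only the prefix and the immediate width change, which is why the CE table entry lists two bytes ($66, $05) for the AX form.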
_________________
Do not ask me about online cheats. I don't know any and wont help finding them.
Like my help? Join me on Patreon so i can keep helping |
KryziK Expert Cheater Reputation: 3
Joined: 16 Aug 2009 Posts: 199
|
Posted: Sun Sep 13, 2015 11:38 am Post subject: |
|
|
2. So, for my second question, I see that CE finds the range of instructions in the array opcodes with the matching mnemonic, but what I really wanted to know was how CE determines whether or not the entry in opcodes is the right one. So, it starts at the first one, which is preferred, how does it determine whether the operands match what the user has typed? My guess was in the original post: It checks the size of the immediate that the user has typed, and then just makes sure that it fits into the size specified in opcodes.
Hopefully that was clear enough to understand. I'm just trying to figure out how CE "weeds out" the opcodes that don't match what the user typed. It's hard to read through so many if statements and such.
3. I see, in the Intel manual, some instructions that are 64 bit, such as:
Code: | REX.W + FF /5 JMP m16:64 |
I understand that these can be omitted because the REX prefix can be added later on. However, for jmp, there are a total of 10 entries in the Intel manual that do not have this REX prefix in front of them. So, I am just trying to figure out why there are only 3 entries total in CE's opcodes array. Without copying CE's opcodes array, how would I determine which entries from the Intel manual I really needed?
Thanks for your reply!
Dark Byte Site Admin Reputation: 458
Joined: 09 May 2003 Posts: 25296 Location: The netherlands
|
Posted: Sun Sep 13, 2015 1:12 pm Post subject: |
|
|
2 yes, you're right
it checks the size of the immediate and then goes through the list
first the 2 byte jmp gets checked to see if it matches, and then the 5 byte one
if the offset doesn't fit in a signed byte (-128 to 127) the 2 byte jmp fails, so it goes for the 5 byte jmp
the assembler array contains the parameters each entry expects
3 take that example you posted there. CE has no real use for that one (with or without REX prefix) as changing the code segment is generally a bad idea. (sure, you could do tricks like executing 64 bit code in a 32 bit process, but come on... for CE?)
as for jmp rel16, for some reason it zeroes the upper 16 bits of EIP, so for all practical purposes it's useless
the jmp r/m# instructions are just 1 instruction. it may be shown 3 times, but they are all the same instruction. how it gets handled depends on the cpu state (e.g. in 64 bit execution mode the modrm can be encoded as rip relative)
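The "try the short form first, then fall back" rule from point 2 can be sketched like this in Python. This is only an illustration under stated assumptions, not how CE's assembler is written: `encode_jmp` is a hypothetical name, and it only handles the two relative forms (EB rel8 and E9 rel32). The offset is measured from the end of the instruction, so it differs between the 2 byte and 5 byte forms.

```python
import struct

def encode_jmp(instr_addr: int, target: int) -> bytes:
    """Pick the shortest relative JMP that reaches target (hypothetical helper)."""
    # Preferred entry: EB rel8, 2 bytes long.
    rel8 = target - (instr_addr + 2)
    if -128 <= rel8 <= 127:
        return b"\xEB" + struct.pack("<b", rel8)

    # Fallback entry: E9 rel32, 5 bytes long.
    rel32 = target - (instr_addr + 5)
    if -2**31 <= rel32 <= 2**31 - 1:
        return b"\xE9" + struct.pack("<i", rel32)

    raise ValueError("target is out of range for a rel32 jump")

print(encode_jmp(0x1000, 0x1010).hex(" "))  # eb 0e
print(encode_jmp(0x1000, 0x2000).hex(" "))  # e9 fb 0f 00 00
```

A table-driven assembler would do the same thing generically: walk the entries for the mnemonic in order and take the first one whose parameter types accept the operands.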
KryziK Expert Cheater Reputation: 3
Joined: 16 Aug 2009 Posts: 199
|
Posted: Sun Sep 13, 2015 5:59 pm Post subject: |
|
|
Thanks for your replies! I'll take what you have said into account when coding and reading the Intel manual.
Unfortunately, with classes and a job, who knows if I'll have enough time to work on this for very long. But, for now, I'll try!
Thanks again, DB. You're the best. <3
KryziK Expert Cheater Reputation: 3
Joined: 16 Aug 2009 Posts: 199
|
Posted: Wed Sep 23, 2015 10:51 pm Post subject: |
|
|
DB,
Could you explain modR/M and how the /digit opcode works?
For example, the PUSH instruction (on 4-271 of Intel Manual):
I was under the impression that /0 through /7 directly related to a register. So are these 3 "FF /6" instructions referring to DH/SI/ESI? If not, what do they refer to?
I am trying to create a function to build the modR/M value based on an input. It looks like you check if there are any "[]"s to determine whether the input is a register or a memory operand. Is this true?
Going back to the PUSH instruction, if I wanted to say the following:
Then the bytes turn out to be "FF 36". So, that means that I first look for ESI in the top section (for a value of 110) and then find where it matches on the left side ([ESI] for a value of mod 00 and r/m 110). Is this sort of how it works? What is the difference between all of the following (2-operand examples):
Code: |
RM ModRM:reg (r, w) ModRM:r/m (r)
MR ModRM:r/m (r, w) ModRM:reg (r)
|
Does the "reg" or "r/m" part say whether to look up that operand as a column or row in the mentioned table? What is the "r" and "r, w" part for?
Also, what is the difference between an opcode type of /digit (/0 - /7) and /r? Both are registers, are they not? Does the numbered type just force it to be a specific register rather than leaving it open?
Thank you for your time. I was making progress until I realized that modR/M was a special thing that had to be "built".
Dark Byte Site Admin Reputation: 458
Joined: 09 May 2003 Posts: 25296 Location: The netherlands
|
Posted: Thu Sep 24, 2015 4:00 am Post subject: |
|
|
for instructions that don't need 2 parameters, the reg field of the modr/m byte can be part of the instruction
in case of push that's 110, so the real instruction is 11111111 110
modR/M is too complex to explain here, but the intel guide with all the instructions has a chapter on how the modr/m and sib bytes are built (and the offsets, and several other special cases)
as for the r,w stuff, no idea, i never looked at those
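As a rough illustration of the reg field acting as part of the instruction, here is a minimal Python sketch that packs a ModR/M byte. It is not CE's code: `modrm` and the `ESI` constant are hypothetical names, and it assumes 32-bit addressing with no SIB byte or displacement, where mod=00 with r/m=110 means [ESI].

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModR/M fields: mod in bits 7-6, reg in 5-3, r/m in 2-0."""
    for name, value, limit in (("mod", mod, 3), ("reg", reg, 7), ("rm", rm, 7)):
        if not 0 <= value <= limit:
            raise ValueError(f"{name} out of range: {value}")
    return (mod << 6) | (reg << 3) | rm

ESI = 6  # register number 110

# PUSH r/m32 is listed as FF /6: the reg field is fixed to 6 by the
# opcode, and only the r/m side comes from the operand.
push_mem_esi = bytes([0xFF, modrm(0b00, 6, ESI)])
print(push_mem_esi.hex(" "))  # ff 36
```

Here both 3-bit fields happen to hold 110, but for different reasons: the reg field because the opcode says /6, the r/m field because the operand is [ESI].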
KryziK Expert Cheater Reputation: 3
Joined: 16 Aug 2009 Posts: 199
|
Posted: Thu Sep 24, 2015 12:14 pm Post subject: |
|
|
Could you explain that first part again? What do you mean the reg field can be part of the instruction? Why is push "11111111 110"? That comes out to "FF 06", which is an inc instruction according to CE. I'm sure that I'm misunderstanding something here.
Also, could you answer my last question about /digit vs /r (eo_reg vs eo_reg0/1/2/3/4/5/6/7)? I can tell that this difference is important in understanding what you just explained, because:
Code: | • /digit — A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register
or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode. |
is basically what it sounds like you are saying. From the table, I can't tell how you would get 0x36 as the ModR/M byte from just "[ESI]" as one parameter unless you look up the row with "[ESI]" and the column with "ESI".
Idea: Is it the eo_reg6 part that tells you to go to the column of value 110 (DH/SI/ESI/MM6/XMM6)? If so, does that mean eo_reg would be any register, not a specific one, and so you'd have to parse that register for the register (column) to go to? So something like "add edx,ebx" would be eo_reg with params par_r32 and par_rm32, and so you'd go to the column of EDX (010) and the row of EBX (011) to get the ModR/M byte of 11 010 011 aka 0xD3 (for a total of 03 D3)?
Let me know. I really am not asking for spoon-fed answers; I am trying to grasp these concepts on my own (as you can tell by my thought process in this post, assuming it's correct).
Thank you so much!
Dark Byte Site Admin Reputation: 458
Joined: 09 May 2003 Posts: 25296 Location: The netherlands
|
Posted: Sat Sep 26, 2015 3:11 am Post subject: |
|
|
bits 3, 4 and 5 of the modr/m byte make up the reg field, so if those are 110 (6) it'd be a push
well, i guess an alternate way of looking at it is to assume FF is a single instruction (e.g. WRT) where the register specifies what kind of operation is done
e.g.:
WRT [1234], EAX = INC [1234]
WRT [1234], ESI = PUSH [1234]
eo_reg6 means that bits 3-5 should be set to 6
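Going the other way makes the same point. The Python sketch below splits a ModR/M byte back into its fields and looks up which operation an FF opcode performs for each reg value. The names `split_modrm`, `FF_GROUP` and `ff_group_op` are hypothetical; the /0 to /6 assignments follow the Intel opcode listing for FF, and /7 is left out because it is not a defined instruction.

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Return (mod, reg, rm) from a ModR/M byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

# Operations selected by the reg field when the opcode byte is FF.
FF_GROUP = {
    0: "INC", 1: "DEC", 2: "CALL", 3: "CALL (far)",
    4: "JMP", 5: "JMP (far)", 6: "PUSH",
}

def ff_group_op(modrm_byte: int) -> str:
    """Name the FF-opcode operation chosen by bits 3-5 of the ModR/M byte."""
    _, reg, _ = split_modrm(modrm_byte)
    return FF_GROUP.get(reg, "(undefined)")

print(ff_group_op(0x06))  # INC  -> FF 06 has reg = 000
print(ff_group_op(0x36))  # PUSH -> FF 36 has reg = 110
print(split_modrm(0xD3))  # (3, 2, 3)
```

For an eo_reg style entry such as ADD r32, r/m32 (03 /r), the reg field is not fixed by the opcode: in 03 D3 it holds 010 (EDX) and r/m holds 011 (EBX), with mod=11 meaning both operands are registers.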
KryziK Expert Cheater Reputation: 3
Joined: 16 Aug 2009 Posts: 199
|
Posted: Wed Oct 21, 2015 9:57 pm Post subject: |
|
|
Dark Byte,
Thanks for the reply. I have been working on my code ever since your reply. I found this resource which ended up being very helpful:
http://www.c-jump.com/CIS77/CPU/x86/index.html
It provides an easy to understand look at the MOD R/M and SIB bytes, with examples and everything. I'll add the link to the OP, too.
I'm getting closer to finishing! Thanks again.