Post

Basics of Assembly for Makers and Hackers

Basics of Assembly for Makers and Hackers

Basics of Assembly for Makers and Hackers

Assembly language is a low-level programming language that serves as a direct link between high-level languages and machine code, which provides instructions for the CPU. It is a crucial tool for tasks that require low level hardware control. Understanding the basics of ASM is extremely useful for reverse engineering and debugging/developing low level software.

x86/64 and ARM architecture will be discussed.

CPU Registers:

In order to understand how assembly code works it is a good idea to have a background of how computer memory works.

Registers are memory spaces located internally on the processor. Some main registers are as follows:

  • General-purpose:
    • These registers are used to store temporary data for various data operations.
  • Instruction pointer(program counter):
    • Contains the address to the next instruction to be executed. It also serves as a counter for the number of instructions.
  • Instruction Register:
    • Holds the address of the instruction that is being executed. Once the address from the IP/PC is stored in IR the CPU executes it.
  • Stack and frame pointers:
    • The stack pointer points to the top of the stack. Frame pointer always points to a fixed position in the stack.

Memory:

Memory is a crucial aspect of computers and programs. Memory provides storage space for various data and allows CPU’s to process more information and perform more complex operations.

Virtual Memory:

Virtual memory is an isolated memory space which maps to a physical memory space. A program is allocated a section of virtual memory and is supposed to only access memory within that virtual memory space that is allocated. Virtual memory is often larger than the physical memory on a system and can be used to allow systems with a low amount of physical RAM to run more complex programs, this is accomplished by utilizing disk space. Within these virtual memory spaces there are permission flags which determine what operations the application is allowed to preform. These include READ,WRITE and EXECUTE, these can be combined as well.

In order to access values in memory a program needs its virtual memory address which is mapped to its location in physical hardware. These addresses in these virtual spaces are stored in a stack which is a block of addresses.

Heres an additional article: https://www.geeksforgeeks.org/virtual-memory-in-operating-system/

Memory Stack

The memory stack follows the same principle of a traditional computer science stack, last in first out. The top of this stack is pointed to by the stack pointer which was briefly discussed above. The stack essentially builds up a stack of building blocks which are all executed in a program in a certain order and can only be pushed onto the stack or popped of the stack.

Heres a good blog post about the stack (with some debugger action as well): https://www.varonis.com/blog/stack-memory-3

Architectures

Two main CPU architectures exist ARM and x86_64 these two architectures run off of different instruction sets. These instruction sets are as follows:

  • Complex Instruction Set Computer (CISC)
    • CISC uses complex instructions this set of instructions provides a way of writing small code files in a minimal number of lines. Instructions in CISC can execute multiple operations in one. This can simplify the writing process but can be slower.
  • Reduced Instruction Set Computer (RISC)
    • RISC uses simple instruction that usually preform one operation per instruction. This leads to longer code files but can be faster due to not executing any unneeded operations.

x86/64 ASM Crash Course:

Registers and Flags:

Below is a table of common GPR’s (General Purpose Registers) as well as some special registers and their sizes in x86(32bit) and x64(64bit)

Register Namex86 Size (Bytes)x64 Size (Bytes)
General-Purpose Registers  
AL / AX / EAX / RAX1 / 2 / 48
AH / AX / EAX / RAX1 / 2 / 48
BL / BX / EBX / RBX1 / 2 / 48
BH / BX / EBX / RBX1 / 2 / 48
CL / CX / ECX / RCX1 / 2 / 48
CH / CX / ECX / RCX1 / 2 / 48
DL / DX / EDX / RDX1 / 2 / 48
DH / DX / EDX / RDX1 / 2 / 48
SI / ESI / RSI2 / 48
DI / EDI / RDI2 / 48
BP / EBP / RBP2 / 48
SP / ESP / RSP2 / 48
Instruction Pointer  
IP / EIP / RIP2 / 48
Flags / Status Register  
FLAGS / EFLAGS / RFLAGS2 / 48

Here are some common uses for these registers:

  • eax/rax:
    • Commonly used to store results of operations and return values of functions.
  • ecx/rcx:
    • Used in repeating actions as a counter.
  • edx/rdx:
    • Used to extent mul/div operations to extend the result.
  • edi/rdi, esi/rsi:
    • These are used when copying bytes in memory, rsi/esi is the source and rdi/edi is the destination.
    • rdi is also primarily used to pass the first arg to a function call.
  • esp/rsp:
    • These are stack pointers which alway point to the top of the stack.
  • ebp/rbp:
    • These are base pointers that point to a fixed place in the stack. This can be used with function variables and return values. ebp/rbp is typically left unchanged by a functions operations unlike the stack pointer.
  • eip/rip:
    • These are the special purpose registers that hold the memory address of the next instruction to be executed.
  • EFLAGS/RFLAGS:
    • These are special registers that hold the status of the CPU. The values in these registers are affected by logic and mathematical operations and indicate various outcomes such as “equal”, “not equal”, “overflow”, etc…
    • Here is a list of some flags:
FlagNameDefinitionValue Returned
CFCarry FlagSet if there was a carry out of the most significant bit1 (Set), 0 (Clear)
PFParity FlagSet if the number of set bits in the result is even1 (Even), 0 (Odd)
AFAuxiliary Carry FlagSet if there was a carry from bit 3 to bit 41 (Set), 0 (Clear)
ZFZero FlagSet if the result of an operation is zero1 (Zero), 0 (Non-zero)
SFSign FlagSet if the result of an operation is negative1 (Negative), 0 (Positive)
TFTrap FlagEnables single-step debugging1 (Enabled), 0 (Disabled)
IFInterrupt Enable FlagWhen set, interrupts are enabled1 (Enabled), 0 (Disabled)
DFDirection FlagControls direction for string operations1 (Decrement), 0 (Increment)
OFOverflow FlagSet if the result of an operation is too large to fit1 (Overflow), 0 (No Overflow)
IOPLI/O Privilege LevelIndicates privilege level for I/O operations0-3 (Privilege Levels)
NTNested TaskIndicates if the processor is in a nested task1 (Nested), 0 (Not Nested)
RFResume FlagAllows resuming execution after a debugging exception1 (Enabled), 0 (Disabled)
VMVirtual 8086 ModeIndicates if Virtual 8086 mode is enabled1 (Enabled), 0 (Disabled)
ACAlignment CheckEnables alignment checking for memory accesses1 (Enabled), 0 (Disabled)
VIFVirtual Interrupt FlagIndicates if interrupts are enabled in Virtual 8086 mode1 (Enabled), 0 (Disabled)
VIPVirtual Interrupt PendingIndicates if an interrupt is pending in Virtual 8086 mode1 (Pending), 0 (Not Pending)
IDID FlagAllows the CPUID instruction to be executed1 (Enabled), 0 (Disabled)

Instructions/Operations:

To add to the confusion different syntax versions exist for ASM the main ones are Intel and AT&T. Most disassemblers and assemblers use Intel syntax which is what will be used for the following instructions (unless specified). Some assemblers use AT&T syntax the main difference between the two is that the src and dest are swapped.

Example:

  • Intel:
    • mov dest, src
    • mov eax, 7
  • AT&T
    • mov src, dest
    • movl $7, %eax
    • In at&t syntax registers are denoted with a ‘%’ and immediate values are denoted with a ‘$’

Opcode,src and dest:

  • Opcode refers to the operation/instruction name such as mov,div,mul, etc…
  • Opcode is typically followed by a ‘src’ and ‘dest’
  • Dest is the destination of the result produced, sometimes it is also used in the arithmetic itself.
    • add eax, ebx ; eax = (eax+ebx)
  • Src is the source of the value in the operation.
  • Some operations only have one value used. In these cases it represents either the source and destination or simply just the destination or source.
    • inc ecx ; ecx = (ecx + 1)
    • push "src" ; push source on the stack

Here is a list of some common instructions:

OperationAT&T SyntaxIntel SyntaxDescription
Move immediatemovl $7, %eaxmov eax, 7Move the immediate value 7 into eax
Move registermovl %ebx, %ecxmov ecx, ebxMove the value from ebx to ecx
Move memory to regmovl 8(%ebp), %eaxmov eax, [ebp+8]Move value from memory at 8 bytes offset from ebp to eax
Move reg to memorymovl %eax, 16(%ebp)mov [ebp+16], eaxMove value from eax to memory at 16 bytes offset from ebp
Addaddl $5, %ebxadd ebx, 5Add value 5 to ebx
Subtractsubl $4, %ecxsub ecx, 4Subtract value 4 from ecx
Multiplyimull %ebx, %eaximul eax, ebxMultiply eax by ebx, result in eax
Dividedivl %ebxdiv ebxDivide eax by ebx, result in eax and edx (for quotient and remainder)
Comparecmpl $10, %ecxcmp ecx, 10Compare ecx with value 10
Jump if zerojz labelje labelJump to label if zero flag is set
Jump if not zerojnz labeljne labelJump to label if zero flag is not set
Jumpjmp destjmp destJump to dest
Call procedurecall functioncall functionCall the procedure at function address
ReturnretretReturn from the procedure
Pushpushl %eaxpush eaxPush eax onto the stack
Poppopl %eaxpop eaxPop value from stack into eax
Testtestl %ebx, %eaxtest eax, ebxPerform a bitwise AND between ebx and eax, setting flags
Multiply (unsigned)mull %ebxmul ebxMultiply eax by ebx, result in eax and edx (for result)
Divide (unsigned)divl %ebxdiv ebxDivide edx:eax by ebx, result in eax (quotient) and edx (remainder)
Incrementincl %eaxinc eaxIncrement eax by 1
Decrementdecl %eaxdec eaxDecrement eax by 1
ORorl %ebx, %eaxor eax, ebxPerform a bitwise OR between ebx and eax
ANDandl %ebx, %eaxand eax, ebxPerform a bitwise AND between ebx and eax
XORxorl %ebx, %eaxxor eax, ebxPerform a bitwise XOR between ebx and eax
NOTnotl %eaxnot eaxPerform a bitwise NOT on eax
Load Effective Addressleal 8(%ebp), %eaxlea eax, [ebp+8]Load the effective address of 8(%ebp) into eax
Move with Zero-Extendmovzbl 8(%ebp), %eaxmovzx eax, byte ptr [ebp+8]Move value from memory at 8 bytes offset from ebp into eax, zero-extended

Calling Conventions:

Passing arguments to functions in ASM can be done in different ways using different calling conventions. These calling conventions vary by use case and or compiler their is a list at the bottom of this section.

stdcall is the convention this will be focused on which is used in the WIN32 api.

When using stdcall arguments are pushed onto the stack right to left.

based on a helloworld message box program we can examine how the function args are treated in assembly instructions:

1
2
3
4
5
6
7
#include <windows.h>

int APIENTRY WinMain(HINSTANCE hInst, HINSTANCE hInstPrev, PSTR cmdline, int cmdshow) {
    MessageBoxA(NULL, "hello world", "Window Title", MB_OK);
    return 0;
}

First the WinMain function is created which is what will get called when the program executes (this is just like a normal main func) the APIENTRY resolves to __stdcall which specifys the calling convention. Within the main function we call the messagebox function with 4 parameters in the assembely it will be shown that these arguments get pushed first from right to left due to the calling convention.

WinMain function docs:

https://learn.microsoft.com/en-us/windows/win32/learnwin32/winmain--the-application-entry-point

MessageBox function docs:

https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-messagebox

This is the program output: alt text

Disassembling the program it looks like this. alt text

It is shown that the arguments are pushed to the stack right to left starting with 0 then Window Title then Hello World then 0. Then the function is called and then the program returns. The return value is stored in eax and the xor eax eax is setting that value to zero. stdcall puts the task of stack cleanup on the callee which in this case is the messagebox function.

Here is a list and some info on common calling conventions:

  • cdecl (C Declaration)
    • Stack Cleanup: Caller cleans up the stack.
    • Argument Passing: Arguments pushed onto the stack from right to left.
    • Return Value: Placed in EAX (x86) or RAX (x64).
    • Usage: Default for C/C++ on x86 systems.
  • stdcall (Standard Call)
    • Stack Cleanup: Callee cleans up the stack.
    • Argument Passing: Arguments pushed onto the stack from right to left.
    • Return Value: Placed in EAX (x86) or RAX (x64).
    • Usage: Commonly used in WinAPI functions.
  • fastcall
    • Stack Cleanup: Callee cleans up the stack.
    • Argument Passing: First two arguments in registers (ECX, EDX on x86; RCX, RDX, R8, R9 on x64); others on the stack.
    • Return Value: Placed in EAX (x86) or RAX (x64).
    • Usage: Optimized for frequently called functions.
  • thiscall
    • Stack Cleanup: Callee cleans up the stack.
    • Argument Passing: this pointer in ECX (x86); other arguments on the stack.
    • Return Value: Placed in EAX (x86) or RAX (x64).
    • Usage: For calling member functions of C++ classes.
  • vectorcall
    • Stack Cleanup: Callee cleans up the stack.
    • Argument Passing: Integer arguments in RCX, RDX, R8, R9; vector arguments in XMM0, XMM1, etc.
    • Return Value: Placed in RAX or XMM0.
    • Usage: Optimized for functions using vector types like float or double.
  • syscall
    • Stack Cleanup: Not directly involved; parameters passed in registers.
    • Argument Passing: Parameters in registers (RDI, RSI, RDX, R10, R8, R9 on x64).
    • Return Value: Placed in RAX.
    • Usage: Used for making system calls in operating systems like Linux.
  • ms_abi / sysv_abi (Microsoft vs. System V x64 Calling Conventions)
    • Stack Cleanup: Callee cleans up the stack.
    • Argument Passing:
      • Microsoft ABI: Integer/pointer arguments in RCX, RDX, R8, R9; floating-point arguments in XMM0-XMM3.
      • System V ABI: Integer/pointer arguments in RDI, RSI, RDX, RCX, R8, R9; floating-point arguments in XMM0-XMM7.
    • Return Value: Placed in RAX or XMM0.
    • Usage: ms_abi for Windows x64, sysv_abi for Unix-like systems on x64.

ARM Assembly Crash Course:

This will be somewhat similar to the above but ARM uses RISC. The benefit of this is efficiency and energy/heat management. Processors with less transistors can run using RISC making them ideal for IoT, mobile and embedded devices.

ARM has three main architecture profiles A,R and M they are as follows:

  • A (Application profiles, Cortex A Family of Chips):
    • “Application profiles implement a traditional ARM architecture with multiple modes and support a virtual memory system architecture based on an MMU. These profiles support both ARM and Thumb instruction sets.”
  • R (Real-time profiles, Cortex-R Family of Chips):
    • “Real-time profiles implement a traditional ARM architecture with multiple modes and support a protected memory system architecture based on an MPU.”
  • M (Microcontroller profiles, Cortex-M Family of Chips):
    • “Microcontroller profiles implement a programmers’ model designed for fast interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages. The processor is designed for integration into an FPGA and is ideal for use in very low power applications.”

    https://developer.arm.com/documentation/dui0471/m/key-features-of-arm-architecture-versions/arm-architecture-profiles

ARM just like x86_64 architecture uses registers to store data and perform computations.

The registers in ARM are as follows:

ARM 32-bit Registers (ARMv7)

RegisterDescription
R0-R12General-purpose registers
R13Stack Pointer (SP)
R14Link Register (LR)
R15Program Counter (PC)
CPSRCurrent Program Status Register
SPSRSaved Program Status Register
  • R0-R12: Used for general-purpose storage.
  • R13 (SP): Points to the current stack location.
  • R14 (LR): Stores the return address for function calls.
  • R15 (PC): Holds the address of the next instruction to be executed.
  • CPSR: Contains flags and control bits that affect processor operation, including APSR and interrupt enable bits.
  • SPSR: Used to save the CPSR value when handling exceptions, allowing the processor to return to the previous state after an exception.
  • APSR: Used to store condition code flags.

ARM 64-bit Registers (ARMv8)

RegisterDescription
X0-X30General-purpose registers
X31Zero Register (ZR)
SPStack Pointer
LRLink Register
PCProgram Counter
PSTATEProcessor State Register
  • X0-X30: Used for general-purpose storage. X30 is also the Link Register (LR) in the ARMv8 architecture.
  • X31 (ZR): Used as a representation of the zero register or the sp stack pointer (x31 doesn’t really exist it is a representation).
  • SP: Points to the current stack location.
  • LR: Holds the return address for function calls (same role as X30).
  • PC: Holds the address of the next instruction to be executed.
  • PSTATE: Contains flags and control bits, including condition codes and system status information.

Instruction set ARM:

ARM instruction sets can be broken into the following groups: instruction setarm

https://developer.arm.com/documentation/100076/0100/Instruction-Set-Overview/Overview-of-AArch64-state/A64-instruction-set-overview?lang=en

Here is a list of all the A64 instructions for ARM(there is a lot):

https://developer.arm.com/documentation/100076/0100/A64-Instruction-Set-Reference/A64-General-Instructions/A64-instructions-in-alphabetical-order?lang=en

Here are some basic/common ones:

InstructionDescriptionExample
MOVMove dataMOV R0, #1 (Move the value 1 into register R0)
ADDAdd two registersADD R1, R2, R3 (Add the values in R2 and R3, store the result in R1)
SUBSubtract one register from anotherSUB R4, R5, #10 (Subtract 10 from the value in R5, store the result in R4)
MULMultiply two registersMUL R6, R7, R8 (Multiply the values in R7 and R8, store the result in R6)
DIVDivide one register by anotherDIV R9, R10, R11 (Divide the value in R10 by the value in R11, store the result in R9)
ANDLogical AND between two registersAND R12, R13, R14 (Perform a bitwise AND between R13 and R14, store the result in R12)
ORRLogical OR between two registersORR R15, R16, R17 (Perform a bitwise OR between R16 and R17, store the result in R15)
EORLogical XOR between two registersEOR R18, R19, R20 (Perform a bitwise XOR between R19 and R20, store the result in R18)
CMPCompare two registersCMP R21, R22 (Compare the value in R21 with the value in R22, affecting flags)
BBranchB label (Branch to ‘label’ and dont return)
BLBranch with link (call function)BL function (Branch to ‘function’ and link the next instruction to LR expecting to return)
LDRLoad register from memoryLDR R23, [R24] (Load the value from the address in R24 into R23)
STRStore register to memorySTR R25, [R26] (Store the value in R25 into the address in R26)

syscalls can also be used in ARM and are accessed using the SVC (Supervisor Call). The syntax looks like svc #0

an example of calling a write syscall in arm would look like this:

1
2
3
4
5
6
7
8
9
10
11
mov r7, #4 @ 4 is the write syscall


svc #0 @ hand control to kernel


mov r0, #0 @ status code to 0
mov r7, #1 @ syscall 1 is exit
svc #0 @ invoke the call


A good way to find how to invoke syscalls is by using the man syscall command on linux.

This is a table from the result of that command:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
  Architecture calling conventions
      Every architecture has its own way of invoking and passing arguments to the kernel.  The details for various architec‐
      tures are listed in the two tables below.


      The first table lists the instruction used to transition to kernel mode (which might not be the fastest or best way to
      transition  to  the kernel, so you might have to refer to vdso(7)), the register used to indicate the system call num‐
      ber, the register(s) used to return the system call result, and the register used to signal an error.


      Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                        call #  val  val2
      ───────────────────────────────────────────────────────────────────
      alpha       callsys               v0      v0   a4   a3       1, 6
      arc         trap0                 r8      r0   -    -
      arm/OABI    swi NR                -       r0   -    -        2
      arm/EABI    swi 0x0               r7      r0   r1   -
      arm64       svc #0                w8      x0   x1   -
      blackfin    excpt 0x0             P0      R0   -    -
      i386        int $0x80             eax     eax  edx  -
      ia64        break 0x100000        r15     r8   r9   r10      1, 6
      m68k        trap #0               d0      d0   -    -
      microblaze  brki r14,8            r12     r3   -    -
      mips        syscall               v0      v0   v1   a3       1, 6
      nios2       trap                  r2      r2   -    r7
      parisc      ble 0x100(%sr2, %r0)  r20     r28  -    -
      powerpc     sc                    r0      r3   -    r0       1
      powerpc64   sc                    r0      r3   -    cr0.SO   1
      riscv       ecall                 a7      a0   a1   -
      s390        svc 0                 r1      r2   r3   -        3
      s390x       svc 0                 r1      r2   r3   -        3
      superh      trap #0x17            r3      r0   r1   -        4, 6
      sparc/32    t 0x10                g1      o0   o1   psr/csr  1, 6
      sparc/64    t 0x6d                g1      o0   o1   psr/csr  1, 6
      tile        swint1                R10     R00  -    R01      1
      x86-64      syscall               rax     rax  rdx  -        5
      x32         syscall               rax     rax  rdx  -        5
      xtensa      syscall               a2      a2   -    -






A helpful site for finding syscalls is:

https://syscall.sh/

ARM CPU modes:

A final thing to note is how ARM processors can run in various modes. They are defined as follows: cpu modes

https://developer.arm.com/documentation/ddi0406/b/System-Level-Architecture/The-System-Level-Programmers--Model/ARM-processor-modes-and-core-registers/ARM-processor-modes?lang=en

Putting all that information together here is an example of a disassembled 32-bit hello world program compiled for ARM. armexample

Conclusion:

Being “ARM”ed with this knowledge of low level systems and assembly will create a solid base of understanding which will be crucial for reverse engineering, hacking or creating.

References/Resources:

This post is licensed under CC BY 4.0 by the author.