I was playing around in assembly and noticed that it's possible to overwrite a string in a variable as long as the new string does not exceed the size of the original string:

MESSAGE DB 'Hello World', 0Dh, 0Ah

[…]

MOV [MESSAGE], 'Test'

However, when I try to move a longer string into an uninitialized variable, I receive an assembly error saying that the quoted string is too large for the size of the data. This happens even though the reserved space is 16 bytes, which should be more than enough for that word:

BUFFER DD 4 DUP ?

[…]

MOV [BUFFER], 'Tablespoons'

The only explanation I can find for this is that each letter must be copied one by one, and as soon as the first letter has been copied, the size of the variable "locks" at 1 byte. In other words, the uninitialized variable is now initialized with 1 byte.

Have I understood the problem correctly? Is there some other way one can move text into a variable?

  • Your first example does not work either. Immediates are 32 bits at most (except when loading a register), so you can only write 4 characters at a time. Note that the error message does not refer to the variable's size. Commented Sep 2 at 13:48
  • Each source line of assembly must assemble to at most 1 machine instruction, so you need to look at what x86 mov can do (felixcloutier.com/x86/mov): with a memory destination, at most a 32-bit immediate (sign-extended to 64 if you use 64-bit operand-size). With a register destination, up to a 64-bit immediate is possible, so mov rcx, 'Tablespo' / mov [rel BUFFER], rcx is possible, then mov dword [rel BUFFER+7], 'oons' (overlapping by 1 byte) can store the last 4 characters. Or without overlap, you could make the 12th byte a 0 terminator so it's a C string. Commented Sep 2 at 13:58
  • Jester: It has now been fixed. Commented Sep 2 at 14:32
  • @Lavonen It is this way because that's how the architecture has been designed. It could also have been designed another way, and then this might have been supported. Note that there is an instruction (rep movsb) for copying an arbitrary number of bytes, but it requires some more setup to use. Commented Sep 2 at 14:37
  • "Variable" is barely even a useful concept in assembly programming; it's an abstraction that exists at a higher level. There's really just memory, and labels that refer to specific addresses in memory. Commented Sep 2 at 19:06

2 Answers

Each source line of assembly must assemble to at most 1 machine instruction (except with macros and directives like times), so you need to look at what x86 mov can do (https://www.felixcloutier.com/x86/mov):

  • With a memory destination, at most a 32-bit (4-byte) immediate
    (sign-extended to 64 if you use 64-bit operand-size).
  • With a register destination, up to a 64-bit immediate (8 bytes) is possible.
    See also "why we can't move a 64-bit immediate value to memory?" for a discussion of the CPU-architecture design reasons.
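The immediate-size rule can be made concrete with a rough C model of what a 64-bit mov with a memory destination can encode. The 4-byte immediate is sign-extended to 8 bytes, so a 64-bit value can only be stored this way if truncating it to 32 bits and then sign-extending it back gives the same value. The helper names sign_extend_imm32 and fits_simm32 are hypothetical, made up for this sketch. They are not part of NASM or any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the CPU actually stores for "mov qword [mem], imm32": the 32-bit
   immediate sign-extended to 64 bits. Written with masks so it does not
   rely on implementation-defined signed conversions. */
static uint64_t sign_extend_imm32(uint32_t imm)
{
    return (imm & 0x80000000u) ? ((uint64_t)imm | UINT64_C(0xFFFFFFFF00000000))
                               : (uint64_t)imm;
}

/* True if a 64-bit value can be stored to memory with a single
   "mov qword [mem], imm32", i.e. it survives truncation to 32 bits
   followed by sign-extension back to 64 bits. */
static bool fits_simm32(uint64_t v)
{
    return sign_extend_imm32((uint32_t)v) == v;
}
```

With this check, the 'Tablespo' constant (0x6f7073656c626154) fails, which is why it has to go through a register such as RCX. The 'ons' constant (0x736e6f) passes.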

(On some RISC ISAs, pseudo-instructions are common. For example, MIPS li $t0, 0x12345678 assembles to two 32-bit instructions (lui and ori) that materialize the two halves of the 32-bit constant into a register. Mainstream x86 assembly syntaxes don't have pseudo-instructions for storing long strings or anything else.)

As Jester commented, this immediate-size limit is what NASM is complaining about, not the size you reserved at BUFFER. You're using MASM dup ? syntax to reserve space in BUFFER (NASM recently added compatibility support for it), but it's still NASM. Declarations elsewhere don't magically imply an operand-size for an instruction the way they do in MASM. (In MASM, BUFFER DD 4 DUP ? would imply a 32-bit dword operand-size for mov BUFFER, 1234. But if you're using MASM, see "When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?". MASM reverses multi-character literals used as integer constants, unlike NASM. That seems like a total mess best avoided in MASM.)


So with a register as the source, mov rcx, 'Tablespo' / mov [rel BUFFER], rcx is possible. Then mov dword [rel BUFFER+7], 'oons' (overlapping by 1 byte) can store the last 4 ASCII characters.

Or without overlap, you could make the 12th byte a 0 terminator so it's a C string: mov dword [rel BUFFER+8], 'ons'. Note that I need the dword size specifier because neither operand is a register to imply an operand-size, unlike with mov r64, imm64 / mov r/m64, r64.

Without dword, i.e. with just mov [BUFFER+8], 'ons', I get foo.asm:6: error: operation size not specified. That's because the assembler doesn't know whether I wanted mov byte [], imm8, mov word [], imm16, or mov dword [], imm32. And it's a good thing that it doesn't infer the operand-size from the constant not fitting in 16 bits. That would get weird: changing a constant could make your code fail to build or, worse, silently change the operand-size.
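The byte layout of the two store sequences can be sanity-checked with a small C model. In it, each fixed-size memcpy stands in for one fixed-width store. This sketch assumes BUFFER starts zeroed, like .bss. The names store_overlapping and store_terminated are hypothetical, and a C compiler is free to emit different instructions for them.

```c
#include <assert.h>
#include <string.h>

/* C model of:
     mov  rcx, 'Tablespo'
     mov  [BUFFER], rcx
     mov  dword [BUFFER+7], 'oons'   ; overlaps byte 7
   Each memcpy with a constant size plays the role of one store. */
static void store_overlapping(unsigned char buf[16])
{
    memcpy(buf, "Tablespo", 8);   /* qword store from a register */
    memcpy(buf + 7, "oons", 4);   /* dword store; rewrites byte 7 with the same 'o' */
}

/* C model of the non-overlapping variant:
     mov  dword [BUFFER+8], 'ons'    ; 4th byte is 0, a terminator */
static void store_terminated(unsigned char buf[16])
{
    memcpy(buf, "Tablespo", 8);
    memcpy(buf + 8, "ons", 4);    /* 'o','n','s' plus the literal's '\0' = 4 bytes */
}
```

Both variants leave "Tablespoons" in bytes 0 to 10. Only the second one writes the terminating zero itself instead of relying on the buffer already being zeroed.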

NASM treats multi-character literals as integers, with a byte order such that storing one to memory on x86 (which is little-endian) puts the string bytes into memory in source order. The examples I wrote above are exactly equivalent to writing the constants as numbers, e.g.

default rel    ; makes [symbol] assume [rel symbol] instead of [abs symbol]

section .text
  mov   rcx, 'Tablespo'           ; 8 bytes
  ; mov rcx, 0x6f7073656c626154   ; same instruction, written numerically
  mov   [BUFFER], rcx
  mov   dword [BUFFER+8], 'ons'   ; 4th byte of the immediate = 0, as a terminator for the string
  mov   dword [BUFFER+8], `ons\0` ; we can make that explicit, using backquotes to process C string escapes

section .bss
BUFFER: resb 16      ; reserve 16 bytes using standard NASM syntax, not MASM

section .data
MESSAGE: DB 'Hello World', 0

section .rodata      ; read-only data.  On Windows, the equivalent section is usually named .rdata
const_string: DB 'Hello World', 0Ah, 0

This is of course 64-bit code, since it's 2025 and you didn't specify legacy 32-bit mode. In 32-bit and 16-bit mode, the widest integer mov you can do is 4 bytes, and a memory destination is supported at that size. See @fuz's answer for examples that work in 32-bit mode. That answer also shows how to use 2-byte and 1-byte stores to handle the tail when the length isn't a multiple of 4, instead of overlapping dwords.

Assembling + disassembling this (on Linux with NASM to assemble, GNU Binutils ld to link, objdump -drwC -Mintel to disassemble into MASM-like syntax), I get:

$ nasm -felf64 foo.asm
$ ld -o foo foo.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
$ objdump -drwC -Mintel foo
...
0000000000401000 <.text>:
  401000:       48 b9 54 61 62 6c 65 73 70 6f   movabs rcx,0x6f7073656c626154
  40100a:       48 89 0d 0b 20 00 00    mov    QWORD PTR [rip+0x200b],rcx        # 40301c <__bss_start>
  401011:       c7 05 09 20 00 00 6f 6e 73 00   mov    DWORD PTR [rip+0x2009],0x736e6f        # 403024 <__bss_start+0x8>
  40101b:       c7 05 ff 1f 00 00 6f 6e 73 00   mov    DWORD PTR [rip+0x1fff],0x736e6f        # 403024 <__bss_start+0x8>

As you can see, the multi-character string literals are just bytes, and the disassembler didn't know they were meant to be strings. 0x736e6f is exactly equivalent to 'ons' as a dword value; I could have used that in my NASM source code.
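The byte-order rule fits in a short C sketch: character i of the literal ends up in bits 8*i to 8*i+7 of the integer, so a little-endian store writes the characters in source order. The sketch covers only NASM's plain quoted constants. It does not handle backquote escapes, and it does not model MASM, which reverses the order according to the question linked above. The name nasm_char_const is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of how NASM turns a quoted multi-character constant into an
   integer: the first character becomes the least-significant byte, and
   any bytes beyond the string's length are zero. Limited to 8 characters
   (one qword). */
static uint64_t nasm_char_const(const char *s, size_t len)
{
    assert(len <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}
```

For example, nasm_char_const("ons", 3) gives 0x736e6f and nasm_char_const("Tablespo", 8) gives 0x6f7073656c626154, matching the immediates in the disassembly above.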

In GDB, after starti (to stop before the first user-space instruction) and stepi until just after the qword store, we can see those 8 bytes in memory. Single-stepping once more to execute the dword store shows the rest of the string there as well.

(gdb) x /16c &BUFFER
0x40301c:       84 'T'  97 'a'  98 'b'  108 'l' 101 'e' 115 's' 112 'p' 111 'o'
0x403024:       0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'

(gdb) si
0x000000000040101b in ?? ()
(gdb) x /16c &BUFFER
0x40301c:       84 'T'  97 'a'  98 'b'  108 'l' 101 'e' 115 's' 112 'p' 111 'o'
0x403024:       111 'o' 110 'n' 115 's' 0 '\000'        0 '\000'        0 '\000'        0 '\000'        0 '\000'

(gdb) p (char*)&BUFFER
$2 = 0x40301c "Tablespoons"

(In this hand-written asm, there are no type-info directives so GDB doesn't know BUFFER should be like char BUFFER[], so I just take its address and tell GDB how to interpret it.)


If you want to "assign" a whole long string with one instruction, you need a level of indirection so you can just store a pointer. Put the string bytes in .rodata / .rdata or similar, or in .data if you want to modify them later. Then use, for example, mov ecx, mystring2 (only if the address fits in 32 bits, e.g. in a non-PIE Linux executable), lea rcx, [rel mystring2], or even mov dword [bufptr], mystring2 if bufptr is a 32-bit pointer in memory. (See "How to load address of function or label into register".)
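For comparison, here is a C sketch of the pointer approach. "Assigning" the string is a single pointer-sized store, whatever the string's length. The names mystring2 and bufptr mirror the text, and assign_string is hypothetical. Where a compiler actually places mystring2 (typically .rodata on ELF targets) is up to the toolchain.

```c
#include <assert.h>
#include <string.h>

/* The string bytes themselves: a const array that compilers typically
   place in a read-only data section. */
static const char mystring2[] = "Tablespoons";

/* A pointer variable in memory, playing the role of bufptr. */
static const char *bufptr;

/* Hypothetical helper: one pointer-sized store; no string bytes are copied. */
static void assign_string(void)
{
    bufptr = mystring2;
}
```

Code that later reads through bufptr sees the whole string, which is why this pattern is usually what "assigning a string" means at the machine level.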

The misconception is that you think of mov as being able to move a whole string at once. What this instruction actually does is move a fixed number of bytes from one place (immediate, register, or memory) to another (register or memory; memory-to-memory moves are not supported). On the i386 architecture, this amount is 1, 2, or 4 bytes. With x86-64, it can also be 8 bytes, though moving an 8-byte immediate to memory is not supported.

Your first example happens to work as 'Test' is 4 characters (i.e. 4 bytes), so the assembler can assemble the instruction. If you want to move an amount of data that is not 1, 2, or 4 bytes, you'll have to split the data move into multiple instructions. For example, you could do

mov     dword [buffer], 'Tabl'
mov     dword [buffer+4], 'espo'
mov     dword [buffer+8], 'ons'

Here the keyword dword specifies that you wish to move 4 bytes at a time. Note that the last move uses a character constant with only three characters. It will still store four bytes, with a NUL byte as the fourth. If you want to avoid this, use e.g.

mov     dword [buffer], 'Tabl'
mov     dword [buffer+4], 'espo'
mov     word [buffer+8], 'on'
mov     byte [buffer+10], 's'

Here the keywords word and byte indicate that you wish to move only 2 or 1 bytes, respectively.
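The chunking in the second example follows a simple rule: repeatedly use the widest store (4, then 2, then 1 bytes) that doesn't run past the end of the string. Below is a hypothetical C helper that generates those NASM lines. It is only a sketch, assuming the string contains printable characters and no single quotes. emit_stores is not a real tool, and it writes [buffer+0] where the hand-written version just uses [buffer].

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write NASM source into `out` that stores the bytes
   of `s` (without a terminator) at `label`, always choosing the widest
   store (dword, word, byte) that does not run past the end of `s`.
   Returns the number of mov instructions generated.
   Assumes `s` contains no single quotes. */
static int emit_stores(char *out, size_t outsz, const char *label,
                       const char *s)
{
    static const struct {
        size_t width;
        const char *keyword;
    } sizes[] = {{4, "dword"}, {2, "word"}, {1, "byte"}};

    size_t len = strlen(s), off = 0, used = 0;
    int count = 0;

    assert(outsz > 0);
    out[0] = '\0';
    while (off < len) {
        size_t rem = len - off, k = 0;
        while (sizes[k].width > rem)   /* terminates: width 1 always fits */
            k++;
        int n = snprintf(out + used, outsz - used, "mov %s [%s+%zu], '%.*s'\n",
                         sizes[k].keyword, label, off,
                         (int)sizes[k].width, s + off);
        assert(n > 0 && (size_t)n < outsz - used);  /* output was not truncated */
        used += (size_t)n;
        off += sizes[k].width;
        count++;
    }
    return count;
}
```

Called as emit_stores(out, sizeof out, "buffer", "Tablespoons") with a char array out, it produces the same four instructions as the second example above, apart from the +0.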

1 Comment

The 32-bit operations ought to also specify an operand size on the memory operands.
