The inline assembly is not idiomatic. Today you should be using register asm. Here is a RISC-V example:
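A minimal sketch of the idea (assuming a Linux-style RISC-V convention: a0 carries the argument in and the result out, a7 holds the syscall number):

    static long sys1(long nr, long arg) {
        register long a0 asm("a0") = arg;  /* in/out: argument, then return value */
        register long a7 asm("a7") = nr;   /* syscall number */
        asm volatile("ecall"
                     : "+r"(a0)
                     : "r"(a7)
                     : "memory");
        return a0;
    }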
This is an example where a0 is an in/out integer. For memory, change long a0 to a pointer to some struct and add an "m" input or "+m" in/out. It's even easier in other languages like Rust and Zig.
> The inline assembly is not idiomatic. Today you should be using register asm.
Who's writing these idioms? I've always seen register variables as an alternative, not the default option, except for those registers which have no constraint code.
Register variables are idiomatic when you're locking down every single register anyway, which is always the case for system calls. If you're writing assembly where you want the register allocator's help, by all means use regular inline assembly.
Doesn't GCC have long-standing bugs where it a) doesn't support assigning some select registers, and b) when you try to use them anyway, it instead completely ignores your annotations with no diagnostics, and c) when it gets reported on GCC's bug tracker, a maintainer would reply with "well, it's always been like that and everyone knows you can't use this register like that anyway" and close it as WONTFIX? I distinctly remember reading about something like that...
Probably. The current state is that GCC for AArch64 and RISC-V simply doesn't offer individual register constraints the way x86 does.
you can also put register preferences directly into the asm volatile line.
asm ( assembler template : output operands (optional) : input operands (optional) : clobbered registers list (optional) );
the clobbered register list will give the same hints to the compiler... the register keyword is also a hint, not a fixed thing, so i can't imagine it's really handled differently?
might be wrong, but it seems easier to me to use all the features of the asm call itself rather than pre-specifying these preferences outside of it.
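like this on x86-64 linux for example, where the letter constraints exist (hypothetical exit(0); "a" pins rax, "D" pins rdi):

    long ret;
    asm volatile("syscall"
                 : "=a"(ret)               /* rax: return value */
                 : "a"(60L), "D"(0L)       /* rax = SYS_exit, rdi = status */
                 : "rcx", "r11", "memory"  /* syscall clobbers rcx/r11 */);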
Actually, nowadays ideally we would have already caught up with ESPOL/NEWP from 1961 and use only intrinsics.
Apparently that is a VC++ only thing.
MCP and the rest of the Burroughs Large Systems are an interesting road not taken.
I’d really prefer using inline assembly tbh
This is inline assembly.
Normal inline assembly not the register pinning you’re doing
Idiomatic to you perhaps, to me this looks ridiculous and unnecessary, bordering on masochistic.
Seems like an interesting if maybe not practical protection to implement in eBPF for programs that never make a naked syscall.
Step one would be to ensure that every syscall has a wrapper. Place a uprobe at the start of that wrapper which, when hit, sets a per-thread permission bit and a per-thread-per-syscall permission bit in an eBPF map. Place a corresponding uretprobe that clears the per-thread-per-syscall bit. For each syscall place a kprobe which checks the per-thread table to make sure the thread is one which has enabled the feature, and which then checks to make sure the per-thread-per-syscall bit is set for that syscall. If not, sigkill.
Performance would probably suck but it seems like it would protect the syscall entrypoints enough to do some potentially interesting attack surface reduction. The question is really why you would do that there instead of by attaching to, say, the LSM hooks where you have stronger guarantees vis a vis userspace.
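A rough sketch of the kprobe/uprobe side (libbpf-style C; the map layout, section names, and syscall number are all made up, and only the per-thread-per-syscall bit is shown):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __type(key, __u64);   /* (tid << 32) | syscall nr */
        __type(value, __u8);  /* per-thread-per-syscall permission bit */
        __uint(max_entries, 1 << 16);
    } perm SEC(".maps");

    static __always_inline __u64 key_for(__u32 nr) {
        /* low 32 bits of pid_tgid are the thread id */
        return (bpf_get_current_pid_tgid() << 32) | nr;
    }

    SEC("uprobe/libc_write_wrapper")    /* hypothetical wrapper entry */
    int wrapper_entry(struct pt_regs *ctx) {
        __u64 k = key_for(1);
        __u8 one = 1;
        bpf_map_update_elem(&perm, &k, &one, BPF_ANY);
        return 0;
    }

    SEC("uretprobe/libc_write_wrapper") /* clear the bit on return */
    int wrapper_exit(struct pt_regs *ctx) {
        __u64 k = key_for(1);
        bpf_map_delete_elem(&perm, &k);
        return 0;
    }

    SEC("kprobe/__x64_sys_write")       /* kill if the bit isn't set */
    int syscall_entry(struct pt_regs *ctx) {
        __u64 k = key_for(1);
        if (!bpf_map_lookup_elem(&perm, &k))
            bpf_send_signal(9 /* SIGKILL */);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";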
What’s the threat model this protects against?
"Attacker has ROP and shouldn't be able to make arbitrary syscalls".
Seems mildly useful if you have a really flexible syscall you can't forbid (ioctl, say) but which you only use for a specific narrow purpose.
If they can ROP they can jump to a syscall instruction with controlled arguments
The point of pinsyscall is that they have to jump to the single entry point for that syscall, rather than any of the ~200 syscall instructions littering the address space. ASLR makes finding an entry point difficult, but the attacker's job is easier if any syscall instruction will do, rather than the specific one for the syscall being invoked. The rationale is explained here: https://undeadly.org/cgi?action=article;sid=20230222064027
The point of what I spelled out above is that they can jump to the instruction but the kernel will kill the program if they don't go through the function up to that point. That allows you to restrict the arguments to the syscall at the point of call.
Ah, so it’s like a poor man’s BTI
Why involve C at all? This is much cleaner in assembly
Isn't "dsb nsh" or "isb" redundant for simple syscalls like write and exit?
Without them it'll segfault on QEMU; IIRC OpenBSD libc uses them invariably as well. I don't know how it fares on real hardware.
The "dsb nsh; isb" sequence after "svc 0" is part of OpenBSD's mitigations for Spectre.
https://github.com/openbsd/src/commit/bbeaada4689520859307d5...
https://github.com/openbsd/src/commit/0c401ffc2a2550c32105ce...
https://github.com/openbsd/src/commit/5ecc9681133f1894e81c38...
If I'm reading the commits correctly, the OpenBSD kernel will skip two instructions after a "svc 0" when returning to userspace, on the assumption that any syscall comes from libc and therefore has "dsb nsh; isb" after it.
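So a raw-syscall exit on the userspace side would presumably need to look something like this (a sketch; x8 for the syscall number and 1 for SYS_exit are assumptions about the OpenBSD arm64 convention):

    register long x0 asm("x0") = 0;  /* exit status */
    register long x8 asm("x8") = 1;  /* assumed: OpenBSD SYS_exit */
    asm volatile("svc 0\n\t"
                 "dsb nsh\n\t"  /* the kernel resumes userspace after... */
                 "isb"          /* ...these two barrier instructions */
                 : "+r"(x0)
                 : "r"(x8)
                 : "memory");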
Yeah, probably (although I am not sure) because QEMU doesn't fully emulate hardware behavior, but on real hardware some CPUs may internally handle these state transitions better. I suppose its inclusion is recommended to ensure correctness across all CPUs.
It may happen because without "dsb nsh" (Data Synchronization Barrier) and "isb" (Instruction Synchronization Barrier), QEMU may continue execution before the syscall fully completes, which causes a segfault or UB, but I am not entirely sure.
"dsb nsh" and "isb" ensures correct execution order which prevents speculative execution issues.
That doesn't sound right -- QEMU always works one instruction at a time, so we never start executing a following insn until the previous one is completed.
So what causes QEMU to segfault without the synchronization? I have not tested it myself, however, so I cannot guarantee that it indeed does segfault.
Yeah, the segfault doesn't happen on other OSes; I believe it's an OpenBSD thing.
Probably due to the audience.
I can grasp the example, but back in my day, if we wanted something faster than interpreted BASIC, Assembly was the way.
Folks nowadays start with Python and JavaScript.
especially when confronted with a soup of C flags and declarations, and still not being entirely sure what the C compiler is going to emit
How is that an issue? You inspect the generated assembler. Godbolt is the industry standard tool for that, but of course just use objdump on the command line if a browser UI isn't your fancy.
Mixing C is helpful because most code doesn't need to be written in raw assembly.
You also need to re-inspect it at every toolchain change/upgrade. In my experience, most programmers don't, and then we end up spending two days chasing some impossible-to-happen bug which has happened nonetheless because a new Clang version had a regression in some obscure and rarely exercised part of its codegen.
No. You have a unit test inspect it, if your other unit tests cannot cover this codegen bug. Asking programmers to re-inspect by hand is like asking programmers to run all tests manually after each commit. Of course most programmers don't.
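E.g. a crude one (the object name and expected mnemonics here are made up):

    /* fail CI if the expected instruction sequence disappears from the
       compiled object after a toolchain change */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *p = popen("objdump -d syscall.o", "r");
        if (!p) return 1;
        char line[512];
        int svc = 0, dsb = 0;
        while (fgets(line, sizeof line, p)) {
            svc |= strstr(line, "svc") != NULL;
            dsb |= strstr(line, "dsb") != NULL;
        }
        pclose(p);
        return (svc && dsb) ? 0 : 1;  /* nonzero = test failure */
    }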
keeping assembly in asm files, c in c files, compiled by their respective compilers, then linked, has fewer footguns than inline asm
Yeah but then you have to maintain function interfaces between them in order to link them. The case in this article is for inserting one single asm instruction in an otherwise C codebase.
platform ABI is usually better and more clearly defined than whatever GCC & Clang feel like doing to inline ASM across versions and optimization levels
and inspecting the output is obviously easy. the hard part is figuring out which flags, pragmas & directives are required to get gcc to emit what you wanted it to emit when it doesn’t.
I like how Go provides the "syscall" package in the standard library. It's OS/ARCH specific, and takes every precaution to call the raw thing safely - notably syscall.ForkExec on UNIX platforms (digging thru the source made me really appreciate the scope of their work).
It "feels" very low-level in an otherwise very high-level language, but provides enough power to implement an entire userland from initrd up, libc- and asm-free (check out Gokrazy and u-root).
OpenBSD and Go have always been at odds here. Go really wants to produce static executables whenever possible, and do syscalls directly; OpenBSD really wants to prevent user programs from accidentally creating gadgets. I guess they've settled on dynamically linking ld.so?
And the macOS API is similar.
It's a bit more complicated. Ideally you'd only need libdyld.dylib and libSystem.B.dylib or so, but...
OpenBSD & macOS philosophies are often surprisingly aligned in certain ways, but OpenBSD is simple, macOS is comprehensive (and - TBF quite bloated).That's a very fair assessment of Apple – IIRC a lot of early idiosyncratic hardening work in XNU world was done for the launch of the iPhone OS App Store, but it's only been (relatively) recently that they've started long-term initiatives systematically harden macOS on top.
I think it's also because a desktop is broadly considered a "power tool", you absolutely need it for systems development, and that is forever at odds with "iPhone-level" lockdown/security. Even if you'd start from scratch, you'd need a solid theoretical foundation (e.g. capabilities), and probably build up new metaphors on top.
I like what OpenBSD did with pledge&unveil. It gets "the first 80%" of the work done, it's easy for the developer, and invisible to the user.
maybe to make the binary more concise you can do something like nostdinc, nostdlib, or freestanding, and add your own linker script to be more explicit about how the binary is built and what to /DISCARD/. also max page size and other linker flags can have an impact. it's sometimes a bit of trial and error to get it to spit out a small binary with only the code you really wrote.
your own linker script and a manual linking step imo is key (i use gcc/ld). if you let gcc spit out a linker script for a modern platform, you can see it's full of clutter... most of it unneeded. you can also strip that one down, but be sure you know which elf sections to put in and omit, and that you've found all the bsd-specific ones.
in the linker step you can also add symbols / calculate offsets.
in gcc you can also use attribute section("bla") in the c code. not sure if it's handy in this case, but maybe it'll ease these things somewhat or bring it back more into C :).
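e.g. (section name made up):

    /* put the function in its own section so the linker script can
       keep it or /DISCARD/ it explicitly */
    __attribute__((section(".tinytext"), used))
    void tiny_entry(void) {
        /* ... */
    }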
cool example :) remembering the struggle of tirelessly trying to find out how to run a raw syscall on openbsd. a lot of man pages, readelfs and headaches. i was so happy to get my exit code hahah
I'm sorry but I got stuck on the first sentence "Ted Unangst published dude, where are your syscalls? on flak yesterday" and as a long time fediverse operator I got insanely curious about "flak".
So I ended up on the flak tag of this blog[1], but I still can't figure out what it is. I can find no links to any source code or any service description, even though the blogger mentions flak being their "signature service".
I'm guessing it's a blogging platform, with ActivityPub support, but I can't find any info about how it's used.
1. https://flak.tedunangst.com/t/flak
The post you're looking for is about how they also NIH'd a source code browser: https://flak.tedunangst.com/post/humungus
Flak is a blogging platform... I think. He also has Honk, which is an ActivityPub server?