From 28760cdd9787be441197035e8fbeeae724cb824c Mon Sep 17 00:00:00 2001 From: zaaarf Date: Sat, 16 Sep 2023 18:31:37 +0200 Subject: [PATCH] feat: progress, fixed readme thing --- README.md | 5 +++++ book.toml | 3 +-- src/1_introduction/why_lillero.md | 4 +++- src/1_introduction/why_mixin.md | 9 +++++++-- src/2_patching/bytecode.md | 17 +++++++++++++++++ src/2_patching/jump_nodes.md | 1 + src/2_patching/nodes.md | 4 +++- src/2_patching/patching.md | 4 ++-- src/2_patching/patterns.md | 1 + src/SUMMARY.md | 4 ++-- 10 files changed, 42 insertions(+), 10 deletions(-) create mode 100644 src/2_patching/bytecode.md create mode 100644 src/2_patching/jump_nodes.md create mode 100644 src/2_patching/patterns.md diff --git a/README.md b/README.md index 26e3e2f..ca4a0b5 100644 --- a/README.md +++ b/README.md @@ -4,3 +4,8 @@ This is meant as an in-depth guide to ASM manipulation with the [Lillero](https: The [ASM](https://asm.ow2.io/) library that Lillero is built with is capable of so much more than what is described in this book. In fact, the [official ASM guide](https://asm.ow2.io/asm4-guide.pdf) is a must-read for anyone wishing to understand its inner workings more in depth. While very well written, the official ASM guide reads more like a scientific paper than a practical manual, and this is part of my reasons for creating the Lillero Book. Much like Lillero is an intermediary layer meant to simplify working with ASM, this book is an intermediary step meant to give you a passable understanding of patching before plunging deep into the obscure inner workings of Java bytecode. In short, this is no replacement for the ASM manual: think of the Lillero Book as a (relatively) simple introduction to its topics. + +## Building +This is built with [mdbook](https://github.com/rust-lang/mdBook): simply install `mdbook`, clone this, and run `mdbook build` in the root folder. You'll find the compiled and static html in the `book` subfolder. 
+ +You can also find a live version [here](https://lll.fantabos.co/book/), if you prefer. diff --git a/book.toml b/book.toml index 7d3a4a1..cbec0f8 100644 --- a/book.toml +++ b/book.toml @@ -1,5 +1,4 @@ [book] authors = ["zaaarf"] language = "en" -multilingual = false -src = "src" +multilingual = false \ No newline at end of file diff --git a/src/1_introduction/why_lillero.md b/src/1_introduction/why_lillero.md index 5e46c29..5095664 100644 --- a/src/1_introduction/why_lillero.md +++ b/src/1_introduction/why_lillero.md @@ -1,10 +1,12 @@ # Why Lillero? -As you may have gleamed from the previous chapter, I am not a fan of Mixin. I respect its engineering, which is very clever, and acknowledge the problems it attempts to solve. My problem with it is that most of those problems are symptoms of a bigger one that Mixin fails to acknowledge. +As you may have gleaned from the previous chapter, I am not a fan of Mixin. I respect its engineering, which is very clever, and acknowledge the problems it attempts to solve. My issue with it is that most of those problems are symptoms of a bigger one that Mixin fails to acknowledge. +## The problem, the solution Why do people fail at making patches? The answer is lack of checks mixed with general incompetence. Mixin thus set out to make it easy. My belief, though, is that the underlying issue is a general lack of readily available information on ASM patching. The Minecraft Forge forums soon banned discussion of the topic altogether, in a misguided attempt to discourage it. Should we be surprised that people are doing it wrong, if you can't talk about the topic in one of the biggest communities that may be interested in it? [Lillero](https://github.com/zaaarf/lillero) was my alternative answer to those problems. I wrote Lillero with a clear goal in mind: it should allow you to do everything, while keeping it as comfortable as it can get this close to bare metal. 
When used to its full potential, Lillero is lightweight and flexible, but also easy to write. Coupled with this book, it should empower anyone to write good patches following the best possible practices. +## Design At the heart of Lillero lies a Java interface, which any aspiring patch should implement: it will contain various methods, providing any metadata that may be needed by the loader as well as the one where the patching will happen. As we'll see, you won't have to write most of this boilerplate by hand: the [Lillero-processor](https://github.com/zaaarf/lillero-processor/) will take care of generating it. *Generating* is the keyword here: repetitive tasks aren't abstracted out, they are just made to write by the machine. One can open the generated files and easily see what each annotation does. By design, Lillero's inner workings should be clear and easy to follow for anyone wishing to learn. Should one want to dig deeper, they'll find that all code in the Lillero project is heavily documented, with a Javadoc for every last method and field, so that everything is perfectly clear to anyone wishing to learn from it. \ No newline at end of file diff --git a/src/1_introduction/why_mixin.md b/src/1_introduction/why_mixin.md index 5c694ba..8e8af4e 100644 --- a/src/1_introduction/why_mixin.md +++ b/src/1_introduction/why_mixin.md @@ -1,14 +1,19 @@ # Why (not) Mixin? [Mixin](https://github.com/SpongePowered/Mixin/) is a bytecode manipulation framework that has become very popular in recent years. Though it also relies on the [ASM library](https://asm.ow2.io/), Mixin is not an "ASM patching" framework in the true meaning of the word. Self-described as a "bytecode-weaving" framework, it allows the user to manipulate the bytecode without having to manually write a single instruction. 
-The user of Mixin will be writing in Java (or any other JVM language), rather than raw bytecode instructions, using annotations to provide any metadata (such as location) your bytecode might need. Working with Mixin is undeniably easier: you're trading the surgical precision of ASM patching for safety and comfort. Mixin tries to provide ways to achieve most things patching can do: as a result, it has become huge - some would say bloated - and in spite of that its replacements are clunky and impractical due to the high amount of abstraction needed. Suppose, for example, that you wish to modify the conditions of an `if()` statement in some way: with raw patching, since `if`s are compiled down to conditional jump instructions, this is a trivial task, possibly one of the easiest you can face; with Mixin, you'll likely be duplicating and overwriting half the method: all the fancy crutches Mixin has given you now are just getting in your way. +The user of Mixin will be writing in Java (or any other JVM language), rather than raw bytecode instructions, using annotations to provide any metadata (such as location) your bytecode might need. Working with Mixin is undeniably easier: you're trading the surgical precision of ASM patching for safety and comfort. Mixin tries to provide ways to achieve most things patching can do: as a result, it has become huge - some would say bloated - and in spite of that its replacements are clunky and impractical due to the high amount of abstraction needed. +Suppose, for example, that you wish to modify the conditions of an `if()` statement in some way: with raw patching, since `if`s are compiled down to conditional jump instructions, this is a trivial task, arguably one of the easiest you can face. With Mixin, you'll likely be duplicating and overwriting half the method: all the fancy crutches Mixin has given you now are just getting in your way. 
+ +## Myths A widespread myth is that Mixin "allows for greater compatibility" with other mods that work to modify the same part of the code. This is is a half-truth at best. Poorly written Mixins can break compatibility as much as any bad ASM patch; conversely, properly made Mixins will work just as well as properly written ASM patches. -The main reason people say this is that the worst Mixin (one that `@Overwrite`s methods when it really isn't needed) is better than the worst ASM patch (one that injects its bytecode in the wrong spot): the former will simply erase any changes made by others, while the latter will crash your program in the best case, and cause weird behaviour in the worst. What I just said is an undeniable truth; it's also true that the best ASM patch is, depending on the task, equal to or better than the best Mixin, due to its superior precision and overall lower impact on the resulting code. Now, knowing this, ask yourself: are you aiming to write the best, or the worst? +The main reason people say this is that the worst Mixin (one that `@Overwrite`s methods when it really isn't needed) is better than the worst ASM patch (one that injects its bytecode in the wrong spot): the former will "simply" erase any changes made by others, while the latter will crash your program in the best case, and cause weird undetectable behaviour in the worst. What I just said is an undeniable truth; it's also true that the best ASM patch is, depending on the task, equal to or better than the best Mixin, due to its superior precision and overall lower impact on the resulting code. Now, knowing this, ask yourself: are you aiming to write the best, or the worst? +## Upsides The one upside Mixin *truly* has is that it's stricter: it performs a number of checks to ensure the validity of what you wrote, and since you're writing plain Java (or whatever other language), the compiler will also check the validity of your code. You have no such safety net in raw ASM. 
Finally, as I mentioned, Mixin is a rather big library; while most Minecraft mod loaders nowadays bundle it (which is a questionable design choice, but that's a topic for another time), this is not the case in other environments. In many cases, I've seen Mixin binaries bigger than the programs they were supposed to be backing. +## Conclusion Ultimately, whether to use Mixin or ASM patching is a matter of personal preference. Lots of great programmers choose not to bother with the complexities of bytecode and instead entrust that part to Mixin, and lots of incompetent programmers try and fail to do it manually, creating the botched patches that sparked this whole debate. Unfortunately, the latter category has given a terrible reputation to ASM patching. The purpose of this chapter is to disprove such myths, and show that ASM patching can be an effective alternative to high-level frameworks. diff --git a/src/2_patching/bytecode.md b/src/2_patching/bytecode.md new file mode 100644 index 0000000..43ef58c --- /dev/null +++ b/src/2_patching/bytecode.md @@ -0,0 +1,17 @@ +# Bytecode +Before we get into the specifics of bytecode manipulation, you should understand what exactly you will be dealing with. Patching essentially consists of modifying the *bytecode* of a class. If you're familiar with any flavour of assembly language, this will all look very familiar. + +Essentially, any programming language *targeting* the JVM (short for Java Virtual Machine) will be converted by its compiler into machine code. Except that the machine code isn't going to be that of *your* computer, as happens with other programming languages: it will be the machine code of the JVM, since that is what will be running your program anyway. + +*Java bytecode* is a human-readable representation of the machine code that the JVM is meant to interpret. With the right tools, it can be manipulated to change the behaviour of a program - which brings us here. 
Java bytecode is relatively high-level when compared to its native counterpart, including support for more abstract concepts like classes and inheritance, but still requires a way of thinking much closer to the functioning of a machine than what is needed for regular programming. + +Bytecode instructions are made up of various parts: first comes the *opcode*, a numerical ID (though you work with human-readable aliases for these numbers); then come a number of arguments, which vary depending on the opcode. + +## Stack-oriented programming +If you've ever attended any formal programming course, you'll certainly be familiar with the concepts of *stack* and *heap*. While in Java they'll at most be an occasional passing thought, when dealing with bytecode they become central. + +The stack is a quickly-accessible memory region that follows the rule *first in, last out*. It's often compared to a stack of plates: you can only ever add (*push*) new plates on the top, and can only ever take (*pop*) the one on the very top. It's highly efficient, but anything that gets put on the stack must *have* a known memory size at compile time. This makes it suitable for working with primitives, but not quite as much for objects. Those follow different rules. + +Objects are stored on the heap, and only a reference to their memory region - a map of sorts to find where their data is located - is pushed onto the stack. The heap is a messier, but bigger place: it's slower, but it allows retrieval of values from any point and doesn't need to know in advance the size of everything. + +Most bytecode instructions affect the stack in some way, either by taking their arguments from it or by pushing the result of the operation onto it. 
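
To make the push/pop discipline described above concrete, here is a minimal sketch of a toy stack machine in plain Java. It is not the JVM and not part of Lillero or ASM: the `StackSketch` class, the `Op` enum and the `Insn` type are hypothetical names made up for illustration, loosely modelled on how arithmetic instructions such as `IADD` and `IMUL` take their operands from the stack and push their result back.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackSketch {
    // Hypothetical mini instruction set; real bytecode has many more opcodes.
    enum Op { PUSH, ADD, MUL }

    // An instruction is an opcode plus an argument (only PUSH uses the argument).
    static final class Insn {
        final Op op;
        final int arg;

        Insn(Op op, int arg) {
            this.op = op;
            this.arg = arg;
        }

        static Insn push(int value) { return new Insn(Op.PUSH, value); }
        static Insn of(Op op) { return new Insn(op, 0); }
    }

    // Executes the instructions in order and returns whatever is left on top of the stack.
    static int run(Insn... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn insn : program) {
            switch (insn.op) {
                case PUSH:
                    stack.push(insn.arg);
                    break;
                case ADD: {
                    // Operands come off the stack in reverse order of pushing.
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case MUL: {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a * b);
                    break;
                }
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Equivalent of (2 + 3) * 4: the operands are pushed first, then consumed.
        int result = run(
            Insn.push(2), Insn.push(3), Insn.of(Op.ADD),
            Insn.push(4), Insn.of(Op.MUL));
        System.out.println(result);
    }
}
```

Note how no instruction names its operands explicitly: `ADD` and `MUL` simply consume the two topmost values, which is why the order in which values are pushed matters so much when writing a patch.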
\ No newline at end of file diff --git a/src/2_patching/jump_nodes.md b/src/2_patching/jump_nodes.md new file mode 100644 index 0000000..37165a3 --- /dev/null +++ b/src/2_patching/jump_nodes.md @@ -0,0 +1 @@ +# Labels and Jump Nodes \ No newline at end of file diff --git a/src/2_patching/nodes.md b/src/2_patching/nodes.md index b644388..13fc6d3 100644 --- a/src/2_patching/nodes.md +++ b/src/2_patching/nodes.md @@ -3,4 +3,6 @@ The [ASM](https://asm.ow2.io/) library represents sequences of bytecode as [doub Each instruction is a node, represented by [various subclasses](https://asm.ow2.io/javadoc/org/objectweb/asm/tree/package-summary.html) of [`AbstractInsnNode`](https://asm.ow2.io/javadoc/org/objectweb/asm/tree/AbstractInsnNode.html); each node contains an opcode, a number of parameters depending on the opcode type, and references to the preceding and following nodes. -You can access the nodes within the the [`MethodNode`](https://asm.ow2.io/javadoc/org/objectweb/asm/tree/MethodNode.html) by accessing the `instructions` field. \ No newline at end of file +The `InsnList` representing the method's nodes is `MethodNode`'s `instructions` field. You can perform all operations you'd expect: append, insert, remove, etcetera. You should aim to leave the smallest possible footprint on the method, so *removing* nodes is almost always a bad idea. You can achieve the same result by *jumping over* the part you wish to remove. + +We'll now check out the various types of instruction nodes; you can find a detailed list of opcodes, with explanations, both on [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_Java_bytecode_instructions) and on the [Java SE Specifications](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html). 
\ No newline at end of file diff --git a/src/2_patching/patching.md b/src/2_patching/patching.md index 731f073..a18522e 100644 --- a/src/2_patching/patching.md +++ b/src/2_patching/patching.md @@ -1,6 +1,6 @@ # Patching Since you are applying changes to the bytecode of a class, this must necessarily happen before said class is loaded in memory. The component that applies said changes is called a *loader*; don't concern yourself on the inner workings of loaders for now, just know that they are in charge of the initial step: we'll cover them in detail in their own chapter. -Suppose that you already have a working loader in place. This loader calls your *injector method*, and passes it a `ClassNode` and a `MethodNode` as arguments, representing respectively the container class and the target method. This is the most common type of ASM patching, and it's probably why you're here; more advanced subjects may be covered in additional chapters later on. +Suppose that you already have a working loader in place. This loader calls your *injector method*, and passes it a [`ClassNode`](https://asm.ow2.io/javadoc/org/objectweb/asm/tree/ClassNode.html) and a [`MethodNode`](https://asm.ow2.io/javadoc/org/objectweb/asm/tree/MethodNode.html) as arguments, representing respectively the container class and the method you're targeting. This is the most common type of ASM patching, and it's probably why you're here; more advanced subjects may be covered in additional chapters later on. -At a glance, this might seem restrictive. However, do keep in mind that even code outside of methods - in field declarations, in loose blocks, or in `static` blocks - is actually considered to be part of a method by the compiler. Specifically, `<init>` for instance fields and loose blocks, and `<clinit>` for static fields and `static` blocks. \ No newline at end of file +At a glance, this might seem restrictive. 
However, do keep in mind that even code outside of methods - in field declarations, in loose blocks, or in `static` blocks - is actually considered to be part of a method by the compiler. Specifically, the constructor (`<init>`) for instance fields and loose blocks, and the static constructor (`<clinit>`) for static fields and `static` blocks. \ No newline at end of file diff --git a/src/2_patching/patterns.md b/src/2_patching/patterns.md new file mode 100644 index 0000000..5bf9ce7 --- /dev/null +++ b/src/2_patching/patterns.md @@ -0,0 +1 @@ +# Pattern Matching diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 3188bf2..8a0fd7b 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -1,8 +1,8 @@ # Summary -[The Lillero Book](../README.md) [What is Lillero?](./what_is_lillero.md) - [An introduction to ASM Patching](./1_introduction/asm_patching.md) - [Why (not) Mixin?](./1_introduction/why_mixin.md) - [Why Lillero?](./1_introduction/why_lillero.md) - [Patching Methods](./2_patching/patching.md) - - [Nodes](./2_patching/nodes.md) + - [Bytecode](./2_patching/bytecode.md) + - [Nodes](./2_patching/nodes.md) \ No newline at end of file