From 2741edaa115033ffc78273657d1a6e7c49e77b0e Mon Sep 17 00:00:00 2001 From: zaaarf Date: Tue, 17 Dec 2024 16:02:43 +0100 Subject: [PATCH] feat: revamped intro --- README.md | 6 ++-- src/1_introduction/toolbox.md | 9 ++++++ src/1_introduction/why_lillero.md | 39 +++++++++++++++++++++---- src/1_introduction/why_mixin.md | 8 +++-- src/2_patching/bytecode.md | 7 ----- src/2_patching/bytecode/examples.md | 2 ++ src/2_patching/bytecode/introduction.md | 8 +++++ src/2_patching/bytecode/stack.md | 8 +++++ src/2_patching/bytecode_examples.md | 3 +- src/2_patching/stack.md | 9 +----- src/SUMMARY.md | 9 +++--- src/what_is_lillero.md | 4 +-- 12 files changed, 79 insertions(+), 33 deletions(-) create mode 100644 src/1_introduction/toolbox.md create mode 100644 src/2_patching/bytecode/examples.md create mode 100644 src/2_patching/bytecode/introduction.md create mode 100644 src/2_patching/bytecode/stack.md diff --git a/README.md b/README.md index 322f37e..0ab06b7 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,11 @@ # The Lillero Book This is meant as an in-depth guide to ASM manipulation with the [Lillero](https://github.com/zaaarf/lillero) framework. While that's the main focus, it's written to also be useful to anyone wanting to learn ASM patching in general. Some parts (which will be flagged appropriately) may be applicable to Minecraft alone, but most of the book is meant to be generic. -The [ASM](https://asm.ow2.io/) library that Lillero is built with is capable of so much more than what is described in this book. In fact, the [official ASM guide](https://asm.ow2.io/asm4-guide.pdf) is a must-read for anyone wishing to understand its inner workings more in depth. While very well written, the official ASM guide reads more like a scientific paper than a practical manual, and this is part of my reasons for creating the Lillero Book. 
Much like Lillero is an intermediary layer meant to simplify working with ASM, this book is an intermediary step meant to give you a passable understanding of patching before plunging deep into the obscure inner workings of Java bytecode. +The [ASM](https://asm.ow2.io/) library that Lillero is built with is capable of so much more than what is described in this book. In fact, the [official ASM guide](https://asm.ow2.io/asm4-guide.pdf) is a must-read for anyone wishing to understand its inner workings in more depth, particularly the part about the Tree API. Nonetheless, the official ASM guide, while very well written, reads more like a scientific paper than a practical manual. -In short, this is no replacement for the ASM manual: think of the Lillero Book as a (relatively) simple introduction to its topics. +Much like Lillero is an intermediary layer meant to simplify working with ASM, this book is an intermediary step meant to give you a passable understanding of patching before plunging deep into the obscure inner workings of Java bytecode. Or, if you'd rather not go that deep, it aims to teach you the bare minimum you need to know to avoid doing damage. + +In short, this is no replacement for the ASM manual: think of the Lillero Book as a relatively simple, opinionated and hopefully amusing introduction to its topics. ## Building This is built with [mdbook](https://github.com/rust-lang/mdBook): simply install `mdbook`, clone this, and run `mdbook build` in the root folder. You'll find the compiled and static html in the `book` subfolder. diff --git a/src/1_introduction/toolbox.md b/src/1_introduction/toolbox.md new file mode 100644 index 0000000..1c5e6fa --- /dev/null +++ b/src/1_introduction/toolbox.md @@ -0,0 +1,9 @@ +# Your Toolbox +There are any number of tools out there that can aid you in this. 
The most important thing you'll need is a decompiler/disassembler: something that can take compiled code and show you the bytecode, and ideally a Java approximation of it as well. + +If you use [IntelliJ IDEA](https://www.jetbrains.com/idea/), which I recommend for this task, you have everything you need built into the editor. To access the bytecode viewer, open any decompiled file, click "View" and you should find "Show Bytecode" somewhere in there. That's really all you should need for this job. However, if you dislike IntelliJ's UI or have one of many possible reasonable concerns about it, there are other (more complicated) ways to go about this. + +There are a few options for those who want to use more minimal IDEs that don't have their own integrated tooling for this. I'm not going to go into detail about them, but these are other options I know to be valid: +- [Recaf](https://github.com/Col-E/Recaf). It's an all-in-one decompiler and disassembler, also capable of debugging bytecode, which is a rather neat feature to have. +- [Bytecode-viewer](https://github.com/Konloch/bytecode-viewer/), whose name says it all. This has the interesting twist of running multiple decompilers and allowing you to compare the outputs, which is kind of useless for this task but may have uses in other contexts. + diff --git a/src/1_introduction/why_lillero.md b/src/1_introduction/why_lillero.md index 5095664..65e0ea7 100644 --- a/src/1_introduction/why_lillero.md +++ b/src/1_introduction/why_lillero.md @@ -1,12 +1,41 @@ # Why Lillero? -As you may have gleamed from the previous chapter, I am not a fan of Mixin. I respect its engineering, which is very clever, and acknowledge the problems it attempts to solve. My issue with it is that most of those problems are symptoms of a bigger one that Mixin fails to acknowledge. +As you may have gleaned from the previous chapter, I am not a fan of Mixin. 
I respect its rather clever engineering, and acknowledge the problems it attempts to solve. My issue with it is that most of those are symptoms of a bigger one that Mixin appears completely blind to. ## The problem, the solution -Why do people fail at making patches? The answer is lack of checks mixed with general incompetence. Mixin thus set out to make it easy. My belief, though, is that the underlying issue is a general lack of readily available information on ASM patching. The Minecraft Forge forums soon banned discussion of the topic altogether, in a misguided attempt to discourage it. Should we be surprised that people are doing it wrong, if you can't talk about the topic in one of the biggest communities that may be interested in it? +Why do people fail at making patches? The answer, I think, is the lack of checks intrinsic to low-level programming, combined with widespread incompetence. Yes, incompetence: it's no secret that modders, and especially Minecraft modders, are often people who are just starting out. It's okay, everyone sucks at first. The truly dreadful thing is the absence of information online on ASM patching. There is a host of poorly realised YouTube tutorials that teach you to imitate rather than to think, a handful of decade-old guides written by newbies for newbies and then... nothing. This may or may not have to do with the choice by many of the major modding forums to ban discussion of ASM patching altogether, in a misguided attempt to discourage the practice. -[Lillero](https://github.com/zaaarf/lillero) was my alternative answer to those problems. I wrote Lillero with a clear goal in mind: it should allow you to do everything, while keeping it as comfortable as it can get this close to bare metal. When used to its full potential, Lillero is lightweight and flexible, but also easy to write. Coupled with this book, it should empower anyone to write good patches following the best possible practices. 
+Mixin took notice of the difficulties people had, and tried to make modifying Minecraft easy, by *hiding all the complexities* behind a *seemingly* safe-to-use API. This has led to many of the unfortunate myths that surround it, such as the ones discussed in the previous chapter. + +[Lillero](https://github.com/zaaarf/lillero) was my alternative answer to those problems. I wrote Lillero with a clear goal in mind: it should allow its users to use ASM's power to its full extent, while keeping it as comfortable as it can get this close to bare metal. At the end of the day, it's still ASM, minus the repetitive, boilerplate-y parts (for instance, writing descriptors to match existing methods and classes). When used to its full potential, Lillero is lightweight and flexible, but also easy to write. Coupled with this book, it should empower anyone to write good patches following the best possible practices. And - this is the key to it - to actually *learn* about the topic. ## Design -At the heart of Lillero lies a Java interface, which any aspiring patch should implement: it will contain various methods, providing any metadata that may be needed by the loader as well as the one where the patching will happen. As we'll see, you won't have to write most of this boilerplate by hand: the [Lillero-processor](https://github.com/zaaarf/lillero-processor/) will take care of generating it. +At the heart of Lillero lies an interface, [`IInjector`](https://docs.zaaarf.foo/lillero/ftbsc/lll/IInjector.html) which any aspiring patch should implement: it will contain various methods, providing any metadata that may be needed by the loader as well as the one where the patching will happen. As we'll see, you won't have to write most of this boilerplate by hand: the [Lillero-processor](https://github.com/zaaarf/lillero-processor/) will take care of generating it. -*Generating* is the keyword here: repetitive tasks aren't abstracted out, they are just made to write by the machine. 
One can open the generated files and easily see what each annotation does. By design, Lillero's inner workings should be clear and easy to follow for anyone wishing to learn. Should one want to dig deeper, they'll find that all code in the Lillero project is heavily documented, with a Javadoc for every last method and field, so that everything is perfectly clear to anyone wishing to learn from it. \ No newline at end of file +### Fast +Unlike virtually all similar programs, Lillero's intended flow is based on *code generation*. Repetitive tasks aren't abstracted out, they are simply written by the machine: one can easily open the generated files folder and see for themselves what's behind the magic. + +I'd also like to mention that Lillero itself makes no direct use of reflection (a JDK implementation might use it internally in the code Lillero calls, but let's hope not). Java developers in general, and Minecraft developers in particular, have an obsession with reflection. It's a useful language feature, but it has a considerable performance overhead compared to normal operations, and this fact seems to elude many (see my passive-aggressive remark about disk space in the previous chapter). Needless to say, Mixin is pretty much entirely built on reflection. + +### Modular +Lillero is, by design, extremely modular. Any of its individual components can be unplugged if the feature it provides is not needed. Do you not need obfuscation? Then don't use it. Not that it matters, mind you, since none of that gets bundled, but it's important to note that you can write Lillero without the processor, if you so wish. There's just no real reason not to use it, since it does not come with significant overhead. + +Perhaps more interestingly, anybody can implement a custom loader to suit their environment, and there is no need to depend on the [reference implementation](https://github.com/zaaarf/lillero-loader) which is specific to Minecraft Forge (modern versions of it). 
+ +### Tiny +Lillero is *tiny*. All of it is, really, but the parts that actually matter (the ones you need at runtime) are *especially* tiny. Here are some sizes: +- The [core library](https://github.com/zaaarf/lillero), as of version `0.5.1`, is *28 KBs*. +- The [reference loader](https://github.com/zaaarf/lillero-loader), as of version `0.1.3`, is *8 KBs*. + +Technically, those two are the only ones you'll want at runtime. But, in case you were curious, these are the sizes of the *compile-time* dependencies: +- The [processor](https://github.com/zaaarf/lillero-processor), as of version `0.7.0`, is *40 KBs*. + - Okay, that's a half-truth. The processor depends on [JavaPoet](https://github.com/square/javapoet/), which as of version `1.13.0` is *103 KBs*. Let me reiterate that these are needed exclusively at compile time. +- The [mapper](https://github.com/zaaarf/lillero-mapper), as of version `0.4.1`, is *24 KBs*. + +For disclosure, I'm excluding [Lillero-mapping-writer](https://github.com/zaaarf/Lillero-mapping-writer), which bundles Apache CLI and all of its transitive dependencies in order to be executable. I really only made it for debugging, anyway; unless you go out of your way to get it, this is unlikely to end up on your computer. It's not even on Maven. + +Incidentally, its file size makes Lillero far more portable than Mixin (technically, this just isn't true for those modloaders where Mixin is bundled, but that just isn't playing fair). For instance, if you were to bundle it in your mod, you'd only need to bundle the core library; if you were to use the recommended flow for modern Minecraft Forge, you'd need just that plus the reference loader. On top of this, there are just the classes generated by the processor, which get compiled normally into your mod. You may be tempted to assume that those make up for the huge difference in space from Mixin... but no, not really. Mixin is just that bloated. 
+ +### Simple +A glance at one of the generated classes should be plenty for anyone experienced enough to figure out how the thing works. + +Anyone wishing to read up on how it works (not that I think it's a masterpiece or anything like that) can do so by looking into the repo. I've tried to keep the codebase clean and easy to follow. Anyone wanting to dig deeper will find that all code in the Lillero project is heavily documented, perhaps more than necessary, with a Javadoc for every last method and field and plenty of comments explaining particularly long methods step by step. diff --git a/src/1_introduction/why_mixin.md b/src/1_introduction/why_mixin.md index 8e8af4e..f4a54e2 100644 --- a/src/1_introduction/why_mixin.md +++ b/src/1_introduction/why_mixin.md @@ -5,15 +5,17 @@ The user of Mixin will be writing in Java (or any other JVM language), rather th Suppose, for example, that you wish to modify the conditions of an `if()` statement in some way: with raw patching, since `if`s are compiled down to conditional jump instructions, this is a trivial task, arguably one of the easiest you can face. With Mixin, you'll likely be duplicating and overwriting half the method: all the fancy crutches Mixin has given you now are just getting in your way. +My second gripe is how massive it is. Mixin is *ridiculously* bloated. Any problem it may or may not solve pales in front of the simple fact that the average mod binary, excluding assets and bundled code, takes up a tenth of the space that Mixin does, *if not less*. Back in the days of 1.12.X modding, people would *bundle* the Mixin binary in their mod's JAR (using [shadow](https://gradleup.com/shadow/) or something similar), often with ridiculous results. This is less of a problem nowadays, as most modern Minecraft modloaders (unfortunately) bundle Mixin into their binary. 
+ +On occasion, you may hear people who are aware of this say that it's *no biggie* to waste a few megabytes like that, since modern computers have so much space. If you agree with them, I don't think I can say much to change your mind. Just, please, next time you wonder why your browser seems to use the disk space and resources of the latest Call of Duty, think back on this paragraph. + ## Myths A widespread myth is that Mixin "allows for greater compatibility" with other mods that work to modify the same part of the code. This is is a half-truth at best. Poorly written Mixins can break compatibility as much as any bad ASM patch; conversely, properly made Mixins will work just as well as properly written ASM patches. The main reason people say this is that the worst Mixin (one that `@Overwrite`s methods when it really isn't needed) is better than the worst ASM patch (one that injects its bytecode in the wrong spot): the former will "simply" erase any changes made by others, while the latter will crash your program in the best case, and cause weird undetectable behaviour in the worst. What I just said is an undeniable truth; it's also true that the best ASM patch is, depending on the task, equal to or better than the best Mixin, due to its superior precision and overall lower impact on the resulting code. Now, knowing this, ask yourself: are you aiming to write the best, or the worst? ## Upsides -The one upside Mixin *truly* has is that it's stricter: it performs a number of checks to ensure the validity of what you wrote, and since you're writing plain Java (or whatever other language), the compiler will also check the validity of your code. You have no such safety net in raw ASM. - -Finally, as I mentioned, Mixin is a rather big library; while most Minecraft mod loaders nowadays bundle it (which is a questionable design choice, but that's a topic for another time), this is not the case in other environments. 
In many cases, I've seen Mixin binaries bigger than the programs they were supposed to be backing. +Frankly, the one upside Mixin *truly* has is that it's stricter: it performs a number of checks to ensure the validity of what you wrote, and since you're writing plain Java (or whatever other language), the compiler will also check the validity of your code. You have no such safety net in raw ASM, and I'm not going to pretend otherwise. ## Conclusion Ultimately, whether to use Mixin or ASM patching is a matter of personal preference. Lots of great programmers choose not to bother with the complexities of bytecode and instead entrust that part to Mixin, and lots of incompetent programmers try and fail to do it manually, creating the botched patches that sparked this whole debate. Unfortunately, the latter category has given a terrible reputation to ASM patching. The purpose of this chapter is to disprove such myths, and show that ASM patching can be an effective alternative to high-level frameworks. diff --git a/src/2_patching/bytecode.md b/src/2_patching/bytecode.md index 5db9572..ed65e8e 100644 --- a/src/2_patching/bytecode.md +++ b/src/2_patching/bytecode.md @@ -1,8 +1 @@ # Bytecode -Before we get into the specifics of bytecode manipulation, you should understand what exactly you will be dealing with. Patching essentially consists in modifying the *bytecode* of a class. If you're familiar with any flavour of assembly language, this will all look very familiar. - -Essentially, any programming language *targeting* the JVM (short for Java Virtual Machine) will be converted by its compiler into machine code. Except that the machine code isn't going to be the one of *your* computer, as it happens with other programming languages: it will be the machine code of the JVM since it will be the one running your program anyway. - -*Java bytecode* is a human-readable representation of the machine code that the JVM is meant to interpret. 
With the right tools, it can be manipulated to change the behaviour of a program - which brings us here. Java bytecode is relatively high-level when compared to its native counterpart, including support for more abstract concepts like classes and inheritance, but still requires a way of thinking much closer to the functioning of a machine than what is needed for regular programming. - -Bytecode instructions are made up of various parts; first comes the *opcode*, a numerical ID (though you work with human-readable aliases for these numbers) then come a number of arguments which may vary depending on the opcode. diff --git a/src/2_patching/bytecode/examples.md b/src/2_patching/bytecode/examples.md new file mode 100644 index 0000000..cd067a5 --- /dev/null +++ b/src/2_patching/bytecode/examples.md @@ -0,0 +1,2 @@ +# Bytecode examples +TODO \ No newline at end of file diff --git a/src/2_patching/bytecode/introduction.md b/src/2_patching/bytecode/introduction.md new file mode 100644 index 0000000..793fd60 --- /dev/null +++ b/src/2_patching/bytecode/introduction.md @@ -0,0 +1,8 @@ +# An Introduction to Bytecode +Before we get into the specifics of bytecode manipulation, you should understand what exactly you will be dealing with. Patching essentially consists in modifying the *bytecode* of a class. If you're familiar with any flavour of assembly language, this will all look very familiar. + +Essentially, any programming language *targeting* the JVM (short for Java Virtual Machine) will be converted by its compiler into machine code. Except that the machine code isn't going to be the one of *your* computer, as it happens with other programming languages: it will be the machine code of the JVM since it will be the one running your program anyway. + +*Java bytecode* is a human-readable representation of the machine code that the JVM is meant to interpret. With the right tools, it can be manipulated to change the behaviour of a program - which brings us here. 
Java bytecode is relatively high-level when compared to its native counterpart, including support for more abstract concepts like classes and inheritance, but still requires a way of thinking much closer to the functioning of a machine than what is needed for regular programming. + +Bytecode instructions are made up of various parts; first comes the *opcode*, a numerical ID (though you work with human-readable aliases for these numbers) then come a number of arguments which may vary depending on the opcode. diff --git a/src/2_patching/bytecode/stack.md b/src/2_patching/bytecode/stack.md new file mode 100644 index 0000000..0847e90 --- /dev/null +++ b/src/2_patching/bytecode/stack.md @@ -0,0 +1,8 @@ +# Stack-oriented programming +If you've ever attended any formal programming course, you'll be certainly familiar with the concepts of *stack* and *heap*. While working on regular Java they'll at most be an occasional passing thought, but when dealing with bytecode they become central. In fact, like most assembly languages, Java bytecode is what you'd call a [*stack-oriented* programming language](https://en.wikipedia.org/wiki/Stack-oriented_programming). + +The stack is a quickly-accessible memory region that follows the rule *first in, last out*. It's often compared to a stack of plates: you can only ever add (*push*) new plates on the top, and can only ever take (*pop*) the one on the very top. It's highly efficient, but anything that gets put on the stack must *have* a known memory size at compile time. This makes it suitable for working with primitives, but not quite as much for objects. Those follow different rules. + +When you create a new object, memory is allocated on the heap, and a *reference* to the object is pushed onto the stack. A reference is a hexadecimal number, of known and fixed size, that represents the *memory address* of the location of a certain object. 
The heap is a messier, but bigger place: it's slower, but it allows retrieval of values from any point and doesn't need to know the size of everything in advance. + +Most bytecode instructions affect the stack in some way. Depending on the opcode, values may be *popped* from the stack and/or a return value may be *pushed* onto it. Understanding how the stack works and how to work with it are necessary steps to gaining a true understanding of bytecode. \ No newline at end of file diff --git a/src/2_patching/bytecode_examples.md b/src/2_patching/bytecode_examples.md index cd067a5..df635b4 100644 --- a/src/2_patching/bytecode_examples.md +++ b/src/2_patching/bytecode_examples.md @@ -1,2 +1 @@ -# Bytecode examples -TODO \ No newline at end of file +# Examples diff --git a/src/2_patching/stack.md b/src/2_patching/stack.md index 0847e90..f944039 100644 --- a/src/2_patching/stack.md +++ b/src/2_patching/stack.md @@ -1,8 +1 @@ -# Stack-oriented programming -If you've ever attended any formal programming course, you'll be certainly familiar with the concepts of *stack* and *heap*. While working on regular Java they'll at most be an occasional passing thought, but when dealing with bytecode they become central. In fact, like most assembly languages, Java bytecode is what you'd call a [*stack-oriented* programming language](https://en.wikipedia.org/wiki/Stack-oriented_programming). - -The stack is a quickly-accessible memory region that follows the rule *first in, last out*. It's often compared to a stack of plates: you can only ever add (*push*) new plates on the top, and can only ever take (*pop*) the one on the very top. It's highly efficient, but anything that gets put on the stack must *have* a known memory size at compile time. This makes it suitable for working with primitives, but not quite as much for objects. Those follow different rules. - -When you create a new object, memory is allocated on the heap, and a *reference* to the object is pushed onto the stack. 
A reference is a hexadecimal number, of known and fixed size, that represents the *memory address* of the location of a certain object. The heap is a messier, but bigger place: it's slower, but it allows retrieval of values from any point and doesn't need to know the size of everything in advance. - -Most bytecode instructions affect the stack in some way. Depending on the opcode, values may be *popped* from the stack and/or a return value may be *pushed* onto it. Understanding how the stack works and how to work with it are necessary steps to gaining a true understanding of bytecode. \ No newline at end of file +# Stack-oriented Programming diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 94102ff..448b2c7 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -3,10 +3,11 @@ - [An introduction to ASM Patching](./1_introduction/asm_patching.md) - [Why (not) Mixin?](./1_introduction/why_mixin.md) - [Why Lillero?](./1_introduction/why_lillero.md) + - [Your Toolbox](./1_introduction/toolbox.md) - [Patching Methods](./2_patching/patching.md) - - [Bytecode](./2_patching/bytecode.md) - - [Stack-oriented Programming](./2_patching/stack.md) - - [Examples](./2_patching/bytecode_examples.md) + - [Bytecode](./2_patching/bytecode/introduction.md) + - [Stack-oriented Programming](./2_patching/bytecode/stack.md) + - [Examples](./2_patching/bytecode/examples.md) - [Nodes](./2_patching/nodes.md) - [Jump Nodes](./2_patching/jump_nodes.md) - [Invoke Dynamic Nodes](./2_patching/jump_nodes.md) @@ -19,4 +20,4 @@ - [Table Switch Nodes](./2_patching/jump_nodes.md) - [Type Nodes](./2_patching/jump_nodes.md) - [Var Nodes](./2_patching/jump_nodes.md) - - [Pattern Matching](./2_patching/patterns.md) \ No newline at end of file + - [Pattern Matching](./2_patching/patterns.md) diff --git a/src/what_is_lillero.md b/src/what_is_lillero.md index 9de96cd..b86fca1 100644 --- a/src/what_is_lillero.md +++ b/src/what_is_lillero.md @@ -1,11 +1,11 @@ # What is Lillero? 
Lillero is a lightweight and simple Java ASM patching framework built on top of [ObjectWeb's ASM library](https://asm.ow2.io/). -It can be used in conjunction with any loader that supports the ASM library's `ClassVisitor` system. +It can be used in conjunction with any loader that supports the ASM library's Tree API (i.e. `ClassNode` and `MethodNode`). Lillero is made up of multiple components: - [Lillero](https://github.com/zaaarf/lillero), the core library. -- [Lillero-processor](https://github.com/zaaarf/Lillero-processor), the annotation processor. +- [Lillero-processor](https://github.com/zaaarf/Lillero-processor), an annotation processor that generates the boilerplate for you. - [Lillero-mapper](https://github.com/zaaarf/lillero-mapper), a library providing the ability to read multiple obfuscation mapping formats. - [Lillero-mapping-writer](https://github.com/zaaarf/Lillero-mapping-writer), a CLI tool for converting and inverting mapping formats.