
RUST Language Virtual Machine Learning digest
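Last updated 2024-10-17. This post was published more than 90 days ago, so parts of it may be out of date; defer to the official documentation for the latest information.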
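While working through this project I have also been translating the original English articles into Chinese, and I am sharing the result here: LanguageVM Docs
I hope it reaches, and helps, whoever needs it.
The project I built along the way is on GitHub: LRVM
It parses the language with the latest version of nom, which resolves the problem that the older nom syntax used in the original articles no longer works.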
00
Progress
- build_vm_part_00.md
- build_vm_part_01.md
- build_vm_part_02.md
- build_vm_part_03.md
- build_vm_part_04.md
- build_vm_part_05.md
Notes
Opcode table
#[derive(Debug, PartialEq)]
pub enum Opcode {
    LOAD,    // 0
    ADD,     // 1
    SUB,     // 2
    MUL,     // 3
    DIV,     // 4
    HLT,     // 5
    JMP,     // 6
    JMPF,    // 7
    JMPB,    // 8
    EQ,      // 9
    NEQ,     // 10
    GTE,     // 11
    LTE,     // 12
    LT,      // 13
    GT,      // 14
    JMPE,    // 15
    NOP,     // 16
    ALOC,    // 17
    INC,     // 18
    DEC,     // 19
    DJMPE,   // 20
    IGL,     // _ (any byte not listed here)
    PRTS,    // 21
    LOADF64, // 22
    ADDF64,  // 23
    SUBF64,  // 24
    MULF64,  // 25
    DIVF64,  // 26
    EQF64,   // 27
    NEQF64,  // 28
    GTF64,   // 29
    GTEF64,  // 30
    LTF64,   // 31
    LTEF64,  // 32
    SHL,     // 33
    SHR,     // 34
    AND,     // 35
    OR,      // 36
    XOR,     // 37
    NOT,     // 38
    LUI,     // 39
    CLOOP,   // 40
    LOOP,    // 41
    LOADM,   // 42
    SETM,    // 43
    PUSH,    // 44
    POP,     // 45
    CALL,    // 46
    RET,     // 47
}
impl From<u8> for Opcode {
    fn from(value: u8) -> Self {
        match value {
            0 => Opcode::LOAD,
            1 => Opcode::ADD,
            2 => Opcode::SUB,
            3 => Opcode::MUL,
            4 => Opcode::DIV,
            5 => Opcode::HLT,
            6 => Opcode::JMP,
            7 => Opcode::JMPF,
            8 => Opcode::JMPB,
            9 => Opcode::EQ,
            10 => Opcode::NEQ,
            11 => Opcode::GTE,
            12 => Opcode::LTE,
            13 => Opcode::LT,
            14 => Opcode::GT,
            15 => Opcode::JMPE,
            16 => Opcode::NOP,
            17 => Opcode::ALOC,
            18 => Opcode::INC,
            19 => Opcode::DEC,
            20 => Opcode::DJMPE,
            21 => Opcode::PRTS,
            22 => Opcode::LOADF64,
            23 => Opcode::ADDF64,
            24 => Opcode::SUBF64,
            25 => Opcode::MULF64,
            26 => Opcode::DIVF64,
            27 => Opcode::EQF64,
            28 => Opcode::NEQF64,
            29 => Opcode::GTF64,
            30 => Opcode::GTEF64,
            31 => Opcode::LTF64,
            32 => Opcode::LTEF64,
            33 => Opcode::SHL,
            34 => Opcode::SHR,
            35 => Opcode::AND,
            36 => Opcode::OR,
            37 => Opcode::XOR,
            38 => Opcode::NOT,
            39 => Opcode::LUI,
            40 => Opcode::CLOOP,
            41 => Opcode::LOOP,
            42 => Opcode::LOADM,
            43 => Opcode::SETM,
            44 => Opcode::PUSH,
            45 => Opcode::POP,
            46 => Opcode::CALL,
            47 => Opcode::RET,
            _ => Opcode::IGL,
        }
    }
}
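To show how the From<u8> conversion is used, here is a minimal sketch that decodes the opcode byte of each instruction in a byte slice. It uses a reduced copy of the table (LOAD, ADD, HLT, IGL only), and decode_opcodes is a hypothetical helper that assumes every instruction is exactly 4 bytes wide; neither is taken from LRVM itself.

```rust
// Reduced copy of the opcode table: three real opcodes plus IGL.
#[derive(Debug, PartialEq)]
pub enum Opcode {
    LOAD, // 0
    ADD,  // 1
    HLT,  // 5
    IGL,  // any byte without an assigned opcode
}

impl From<u8> for Opcode {
    fn from(value: u8) -> Self {
        match value {
            0 => Opcode::LOAD,
            1 => Opcode::ADD,
            5 => Opcode::HLT,
            _ => Opcode::IGL,
        }
    }
}

// Hypothetical helper: decode the first byte of every 4-byte instruction.
// The 4-byte width is an assumption of this sketch.
pub fn decode_opcodes(program: &[u8]) -> Vec<Opcode> {
    program
        .chunks(4)
        .map(|instruction| Opcode::from(instruction[0]))
        .collect()
}

fn main() {
    // LOAD, ADD, then an instruction whose opcode byte (200) is unassigned.
    let program = [0, 1, 1, 244, 1, 2, 0, 3, 200, 0, 0, 0];
    println!("{:?}", decode_opcodes(&program));
}
```

Because the match ends in a catch-all arm, the conversion can never fail: unknown bytes become IGL and the VM can decide how to react to them.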
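pub struct VM {
    /// Array that simulates the hardware registers
    registers: [i32; 32],
    /// Program counter that tracks which byte is being executed
    pc: usize,
    /// Bytecode of the program being run
    program: Vec<u8>,
    /// Remainder left by a division operation
    remainder: usize,
    /// Result of the most recent comparison operation
    equal_flag: bool,
}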
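- registers: [i32; 32]. The registers are an array that holds the actual values: 32 integers of 32 bits each.
  - To read or write a stored value, address the register by its index.
  - For example, in LOAD $1 #10 the 1 in $1 is the register index and the 10 in #10 is the value, so registers[1] = 10.
  - In ADD $1 $2 $3, ADD is the opcode and $1, $2, $3 are the operands, each a register index, so registers[3] = registers[1] + registers[2].
  - In JMP $1, JMP is the opcode and $1 is the operand, a register index, so pc = registers[1].
- pc: usize. The program counter, which records the index in the program array of the byte currently being executed.
- program: Vec<u8>. The program itself, stored as bytecode.
  - For example, looking up [0, 1, 1, 244, 1, 2, 0, 3] in the opcode table gives LOAD $1 #500; ADD $2 $0 $3; where 500 is the two bytes 1 and 244 read as one big-endian 16-bit number.
  - As code: registers[1] = ((1u8 as u16) << 8) + 244 = 500; registers[3] = registers[2] + registers[0];
- remainder: usize. Stores the remainder left by the division opcode.
  - For example, DIV $1 $2 $3 means registers[3] = registers[1] / registers[2]; VM.remainder = registers[1] % registers[2];
- equal_flag: bool. Records whether two values are equal.
  - For example, EQ $1 $2 means VM.equal_flag = registers[1] == registers[2];
  - To stay consistent with the three-operand MIPS form EQ $1 $2 $3, the EQ instruction in this design is implemented as VM.equal_flag = registers[1] == registers[2]; VM.next_8_bits(); where the next_8_bits() call skips one byte.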
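The field descriptions above can be tied together with a small execution loop. This is a minimal sketch, not LRVM's actual implementation: it handles only LOAD, ADD, DIV, EQ, and HLT, assumes 4-byte instructions with a big-endian 16-bit immediate for LOAD, and does no bounds or divide-by-zero checking. The helper names next_16_bits and next_register_value are assumptions made for this example.

```rust
pub struct VM {
    registers: [i32; 32],
    pc: usize,
    program: Vec<u8>,
    remainder: usize,
    equal_flag: bool,
}

impl VM {
    pub fn new(program: Vec<u8>) -> Self {
        VM { registers: [0; 32], pc: 0, program, remainder: 0, equal_flag: false }
    }

    // Read one byte and advance the program counter.
    fn next_8_bits(&mut self) -> u8 {
        let byte = self.program[self.pc];
        self.pc += 1;
        byte
    }

    // Read two bytes as one big-endian u16 and advance the program counter.
    fn next_16_bits(&mut self) -> u16 {
        let hi = self.next_8_bits() as u16;
        let lo = self.next_8_bits() as u16;
        (hi << 8) | lo
    }

    // Read a register index and return the value stored in that register.
    fn next_register_value(&mut self) -> i32 {
        let index = self.next_8_bits() as usize;
        self.registers[index]
    }

    // Run until HLT, an unhandled opcode, or the end of the program.
    pub fn run(&mut self) {
        while self.pc < self.program.len() {
            match self.next_8_bits() {
                0 => {
                    // LOAD $dst #imm
                    let dst = self.next_8_bits() as usize;
                    let value = self.next_16_bits() as i32;
                    self.registers[dst] = value;
                }
                1 => {
                    // ADD $a $b $dst
                    let a = self.next_register_value();
                    let b = self.next_register_value();
                    let dst = self.next_8_bits() as usize;
                    self.registers[dst] = a + b;
                }
                4 => {
                    // DIV $a $b $dst, remainder kept on the side
                    let a = self.next_register_value();
                    let b = self.next_register_value();
                    let dst = self.next_8_bits() as usize;
                    self.registers[dst] = a / b;
                    self.remainder = (a % b) as usize;
                }
                9 => {
                    // EQ $a $b, then skip the unused fourth byte
                    let a = self.next_register_value();
                    let b = self.next_register_value();
                    self.equal_flag = a == b;
                    self.next_8_bits();
                }
                _ => break, // HLT (5) or anything this sketch does not handle
            }
        }
    }
}

fn main() {
    // LOAD $1 #500; LOAD $2 #7; DIV $1 $2 $3; EQ $1 $1; HLT
    let mut vm = VM::new(vec![
        0, 1, 1, 244, 0, 2, 0, 7, 4, 1, 2, 3, 9, 1, 1, 0, 5, 0, 0, 0,
    ]);
    vm.run();
    println!(
        "r3 = {}, remainder = {}, equal_flag = {}, pc = {}",
        vm.registers[3], vm.remainder, vm.equal_flag, vm.pc
    );
}
```

Each opcode consumes exactly four bytes in total, which is why EQ has to discard one extra byte after reading its two operands.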
01
Progress
- build_vm_part_06.md
- build_vm_part_07.md
- build_vm_part_08.md
- build_vm_part_09.md
Notes
- Faster linking with LLVM + LLD, which shortens build times and tightens the edit-compile loop.
  - Install (on Linux):
    - Ubuntu: sudo apt-get install lld clang
    - Arch: sudo pacman -S lld clang
  - Usage: create a .cargo/config.toml file in the project directory with the following content:
    [target.x86_64-unknown-linux-gnu]
    rustflags = ["-C", "linker=clang", "-C", "link-arg=-fuse-ld=lld"]
- cargo clippy: static code analysis that points out improvements, so the code ends up better and more idiomatic.
  - Install: rustup component add clippy
  - Usage: cargo clippy, or cargo clippy -- -D warnings
- cargo fmt: formats the code so that its layout is consistent and easier to read.
  - Install: rustup component add rustfmt
  - Usage: cargo fmt, or cargo fmt -- --check
- cargo watch: recompiles and reruns the code automatically, for fast iteration and debugging.
  - Install: cargo install cargo-watch
  - Usage: cargo watch -x check, or cargo watch -x run
  - cargo watch -x check -x test -x bench -x run: checks, then tests, then benchmarks, then runs. The chain is triggered only when a file changes, and each step runs only if the previous one succeeded.
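What I learned from the project
- Big-endian and little-endian byte order: the word "endian" comes from Jonathan Swift's Gulliver's Travels. The book divides people into two camps by how they crack an egg: those who open it at the round (big) end are Big-Endians, and those who open it at the pointed (little) end are Little-Endians. The civil war in Lilliput started over whether an egg should be opened at the big end or the little end. In the computer industry, big-endian versus little-endian nearly started a war as well. There, endianness refers to the order in which the bytes of a value are stored in memory. Big-endian storage matches the order in which people write numbers, while little-endian storage is more convenient for the machine to process.
  - Big-endian: the high-order byte of a word is stored at the low address, and the low-order byte at the high address.
  - Little-endian: the high-order byte of a word is stored at the high address, and the low-order byte at the low address.
The table below shows the 32-bit value 0x1234abcd in both layouts: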
addr | big-endian | little-endian |
---|---|---|
0x0000 | 0x12 | 0xcd |
0x0001 | 0x34 | 0xab |
0x0002 | 0xab | 0x34 |
0x0003 | 0xcd | 0x12 |
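The table can be reproduced with the standard library's byte-order conversions. This is a small sketch; byte_layouts is a hypothetical helper name, while to_be_bytes, to_le_bytes, and from_be_bytes are standard integer methods.

```rust
// Return the big-endian and little-endian byte layouts of a 32-bit value.
fn byte_layouts(value: u32) -> ([u8; 4], [u8; 4]) {
    (value.to_be_bytes(), value.to_le_bytes())
}

fn main() {
    let (big, little) = byte_layouts(0x1234abcd);
    // Index 0 corresponds to the lowest address (0x0000) in the table.
    for addr in 0..4 {
        println!("0x{:04x} | 0x{:02x} | 0x{:02x}", addr, big[addr], little[addr]);
    }
    // The VM's LOAD immediate is read the big-endian way: bytes 1, 244 -> 500.
    println!("{}", u16::from_be_bytes([1, 244]));
}
```

The last line is the same calculation as (1 << 8) + 244 used for the LOAD operand in the bytecode example earlier.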
- What the various nom functions do and how to use them: alt, map, map_res, terminated, tuple, context, multispace?, line_ending, alpha?, digit?, eof, preceded, tag, IResult...
┌─► Ok(
│ what the parser didn't touch,
│ what matched the regex
│ )
┌─────────┐ │
my input───►│my parser├──►either──┤
└─────────┘ └─► Err(...)
IResult<I, O> type.
The Ok variant has a tuple of (remaining_input: I, output: O);
whereas the Err variant stores an error.
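To make the (remaining input, output) shape concrete without depending on the nom crate, here is a hand-written stand-in built only on the standard library. ParseResult and this tag function are hypothetical illustrations of the same contract; they are not nom's own types or signatures.

```rust
// Same shape as IResult<&str, &str>: Ok carries (remaining_input, output).
// A plain String stands in for nom's error type in this sketch.
type ParseResult<'a> = Result<(&'a str, &'a str), String>;

// Hand-written stand-in for a tag-style parser: match a fixed prefix.
fn tag<'a>(expected: &str, input: &'a str) -> ParseResult<'a> {
    match input.strip_prefix(expected) {
        // What the parser did not touch comes first, what it matched second.
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?} at {:?}", expected, input)),
    }
}

fn main() {
    println!("{:?}", tag("hello", "hello world"));
    println!("{:?}", tag("hello", "goodbye"));
}
```

Returning the untouched remainder is what lets parsers be chained: the next parser simply takes the first element of the previous Ok tuple as its input.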
List of parsers and combinators
Note: this list is meant to provide a nicer way to find a nom parser than reading through the documentation on docs.rs. Function combinators are organized in module so they are a bit easier to find.
Links present in this document will nearly always point to complete
version of the parser. Most of the parsers also have a streaming
version.
Basic elements
Those are used to recognize the lowest level elements of your grammar, like, "here is a dot", or "here is a big endian integer".
combinator | usage | input | output | comment |
---|---|---|---|---|
char | char('a') | "abc" | Ok(("bc", 'a')) | Matches one character (works with non ASCII chars too) |
is_a | is_a("ab") | "abbac" | Ok(("c", "abba")) | Matches a sequence of any of the characters passed as arguments |
is_not | is_not("cd") | "ababc" | Ok(("c", "abab")) | Matches a sequence of none of the characters passed as arguments |
one_of | one_of("abc") | "abc" | Ok(("bc", 'a')) | Matches one of the provided characters (works with non ASCII characters too) |
none_of | none_of("abc") | "xyab" | Ok(("yab", 'x')) | Matches anything but the provided characters |
tag | tag("hello") | "hello world" | Ok((" world", "hello")) | Recognizes a specific suite of characters or bytes |
tag_no_case | tag_no_case("hello") | "HeLLo World" | Ok((" World", "HeLLo")) | Case insensitive comparison. Note that case insensitive comparison is not well defined for unicode, and that you might have bad surprises |
take | take(4) | "hello" | Ok(("o", "hell")) | Takes a specific number of bytes or characters |
take_while | take_while(is_alphabetic) | "abc123" | Ok(("123", "abc")) | Returns the longest list of bytes for which the provided function returns true. take_while1 does the same, but must return at least one character, while take_while_m_n must return between m and n |
take_till | take_till(is_alphabetic) | "123abc" | Ok(("abc", "123")) | Returns the longest list of bytes or characters until the provided function returns true. take_till1 does the same, but must return at least one character. This is the reverse behaviour from take_while : take_till(f) is equivalent to take_while(|c| !f(c)) |
take_until | take_until("world") | "Hello world" | Ok(("world", "Hello ")) | Returns the longest list of bytes or characters until the provided tag is found. take_until1 does the same, but must return at least one character |
Choice combinators
combinator | usage | input | output | comment |
---|---|---|---|---|
alt | alt((tag("ab"), tag("cd"))) | "cdef" | Ok(("ef", "cd")) | Try a list of parsers and return the result of the first successful one |
permutation | permutation((tag("ab"), tag("cd"), tag("12"))) | "cd12abc" | Ok(("c", ("ab", "cd", "12"))) | Succeeds when all its child parsers have succeeded, whatever the order |
Sequence combinators
combinator | usage | input | output | comment |
---|---|---|---|---|
delimited | delimited(char('('), take(2), char(')')) | "(ab)cd" | Ok(("cd", "ab")) | Matches an object from the first parser and discards it, then gets an object from the second parser, and finally matches an object from the third parser and discards it. |
preceded | preceded(tag("ab"), tag("XY")) | "abXYZ" | Ok(("Z", "XY")) | Matches an object from the first parser and discards it, then gets an object from the second parser. |
terminated | terminated(tag("ab"), tag("XY")) | "abXYZ" | Ok(("Z", "ab")) | Gets an object from the first parser, then matches an object from the second parser and discards it. |
pair | pair(tag("ab"), tag("XY")) | "abXYZ" | Ok(("Z", ("ab", "XY"))) | Gets an object from the first parser, then gets another object from the second parser. |
separated_pair | separated_pair(tag("hello"), char(','), tag("world")) | "hello,world!" | Ok(("!", ("hello", "world"))) | Gets an object from the first parser, then matches an object from the sep_parser and discards it, then gets another object from the second parser. |
tuple | tuple((tag("ab"), tag("XY"), take(1))) | "abXYZ!" | Ok(("!", ("ab", "XY", "Z"))) | Chains parsers and assemble the sub results in a tuple. You can use as many child parsers as you can put elements in a tuple |
Applying a parser multiple times
combinator | usage | input | output | comment |
---|---|---|---|---|
count | count(take(2), 3) | "abcdefgh" | Ok(("gh", vec!["ab", "cd", "ef"])) | Applies the child parser a specified number of times |
many0 | many0(tag("ab")) | "abababc" | Ok(("c", vec!["ab", "ab", "ab"])) | Applies the parser 0 or more times and returns the list of results in a Vec. many1 does the same operation but must return at least one element |
many0_count | many0_count(tag("ab")) | "abababc" | Ok(("c", 3)) | Applies the parser 0 or more times and returns how often it was applicable. many1_count does the same operation but the parser must apply at least once |
many_m_n | many_m_n(1, 3, tag("ab")) | "ababc" | Ok(("c", vec!["ab", "ab"])) | Applies the parser between m and n times (n included) and returns the list of results in a Vec |
many_till | many_till(tag( "ab" ), tag( "ef" )) | "ababefg" | Ok(("g", (vec!["ab", "ab"], "ef"))) | Applies the first parser until the second applies. Returns a tuple containing the list of results from the first in a Vec and the result of the second |
separated_list0 | separated_list0(tag(","), tag("ab")) | "ab,ab,ab." | Ok((".", vec!["ab", "ab", "ab"])) | separated_list1 works like separated_list0 but must return at least one element |
fold_many0 | fold_many0(be_u8, || 0, |acc, item| acc + item) | [1, 2, 3] | Ok(([], 6)) | Applies the parser 0 or more times and folds the list of return values. The fold_many1 version must apply the child parser at least one time |
fold_many_m_n | fold_many_m_n(1, 2, be_u8, || 0, |acc, item| acc + item) | [1, 2, 3] | Ok(([3], 3)) | Applies the parser between m and n times (n included) and folds the list of return value |
length_count | length_count(number, tag("ab")) | "2ababab" | Ok(("ab", vec!["ab", "ab"])) | Gets a number from the first parser, then applies the second parser that many times |
Integers
Parsing integers from binary formats can be done in two ways: With parser functions, or combinators with configurable endianness.
The following parsers could be found on docs.rs number section.
- configurable endianness: i16, i32, i64, u16, u32, u64 are combinators that take as argument a nom::number::Endianness, like this: i16(endianness). If the parameter is nom::number::Endianness::Big, parse a big endian i16 integer, otherwise a little endian i16 integer.
- fixed endianness: The functions are prefixed by be_ for big endian numbers, and by le_ for little endian numbers, and the suffix is the type they parse to. As an example, be_u32 parses a big endian unsigned integer stored in 32 bits.
  - be_f32, be_f64: Big endian floating point numbers
  - le_f32, le_f64: Little endian floating point numbers
  - be_i8, be_i16, be_i24, be_i32, be_i64, be_i128: Big endian signed integers
  - be_u8, be_u16, be_u24, be_u32, be_u64, be_u128: Big endian unsigned integers
  - le_i8, le_i16, le_i24, le_i32, le_i64, le_i128: Little endian signed integers
  - le_u8, le_u16, le_u24, le_u32, le_u64, le_u128: Little endian unsigned integers
Streaming related
- eof: Returns its input if it is at the end of input data
- complete: Replaces an Incomplete returned by the child parser with an Error
Modifiers
- Parser::and: method to create a parser by applying the supplied parser to the rest of the input after applying self, returning their results as a tuple (like sequence::tuple but only takes one parser)
- Parser::and_then: method to create a parser from applying another parser to the output of self
- map_parser: function variant of Parser::and_then
- Parser::map: method to map a function on the output of self
- map: function variant of Parser::map
- Parser::flat_map: method to create a parser which will map a parser returning function (such as take or something which returns a parser) on the output of self, then apply that parser over the rest of the input. That is, this method accepts a parser-returning function which consumes the output of self, the resulting parser gets applied to the rest of the input
- flat_map: function variant of Parser::flat_map
- cond: Conditional combinator. Wraps another parser and calls it if the condition is met
- map_opt: Maps a function returning an Option on the output of a parser
- map_res: Maps a function returning a Result on the output of a parser
- into: Converts the child parser's result to another type
- not: Returns a result only if the embedded parser returns Error or Incomplete. Does not consume the input
- opt: Make the underlying parser optional
- cut: Transform recoverable error into unrecoverable failure (commitment to current branch)
- peek: Returns a result without consuming the input
- recognize: If the child parser was successful, return the consumed input as the produced value
- consumed: If the child parser was successful, return a tuple of the consumed input and the produced output.
- verify: Returns the result of the child parser if it satisfies a verification function
- value: Returns a provided value if the child parser was successful
- all_consuming: Returns the result of the child parser only if it consumed all the input
Error management and debugging
- dbg_dmp: Prints a message and the input if the parser fails
Text parsing
- escaped: Matches a byte string with escaped characters
- escaped_transform: Matches a byte string with escaped characters, and returns a new string with the escaped characters replaced
Binary format parsing
- length_data: Gets a number from the first parser, then takes a subslice of the input of that size, and returns that subslice
- length_value: Gets a number from the first parser, takes a subslice of the input of that size, then applies the second parser on that subslice. If the second parser returns Incomplete, length_value will return an error
Bit stream parsing
- bits: Transforms the current input type (byte slice &[u8]) to a bit stream on which bit specific parsers and more general combinators can be applied
- bytes: Transforms its bits stream input back into a byte slice for the underlying parser
Remaining combinators
- success: Returns a value without consuming any input, always succeeds
- fail: Inversion of success. Always fails.
Character test functions
Use these functions with a combinator like take_while:
- is_alphabetic: Tests if byte is ASCII alphabetic: [A-Za-z]
- is_alphanumeric: Tests if byte is ASCII alphanumeric: [A-Za-z0-9]
- is_digit: Tests if byte is ASCII digit: [0-9]
- is_hex_digit: Tests if byte is ASCII hex digit: [0-9A-Fa-f]
- is_oct_digit: Tests if byte is ASCII octal digit: [0-7]
- is_bin_digit: Tests if byte is ASCII binary digit: [0-1]
- is_space: Tests if byte is ASCII space or tab: [ \t]
- is_newline: Tests if byte is ASCII newline: [\n]
Alternatively there are ready to use functions:
- alpha0: Recognizes zero or more lowercase and uppercase alphabetic characters: [a-zA-Z]. alpha1 does the same but returns at least one character
- alphanumeric0: Recognizes zero or more numerical and alphabetic characters: [0-9a-zA-Z]. alphanumeric1 does the same but returns at least one character
- anychar: Matches one byte as a character
- crlf: Recognizes the string \r\n
- digit0: Recognizes zero or more numerical characters: [0-9]. digit1 does the same but returns at least one character
- double: Recognizes floating point number in a byte string and returns a f64
- float: Recognizes floating point number in a byte string and returns a f32
- hex_digit0: Recognizes zero or more hexadecimal numerical characters: [0-9A-Fa-f]. hex_digit1 does the same but returns at least one character
- hex_u32: Recognizes a hex-encoded integer
- line_ending: Recognizes an end of line (both \n and \r\n)
- multispace0: Recognizes zero or more spaces, tabs, carriage returns and line feeds. multispace1 does the same but returns at least one character
- newline: Matches a newline character \n
- not_line_ending: Recognizes a string of any char except \r or \n
- oct_digit0: Recognizes zero or more octal characters: [0-7]. oct_digit1 does the same but returns at least one character
- bin_digit0: Recognizes zero or more binary characters: [0-1]. bin_digit1 does the same but returns at least one character
- rest: Return the remaining input
- rest_len: Return the length of the remaining input
- space0: Recognizes zero or more spaces and tabs. space1 does the same but returns at least one character
- tab: Matches a tab character \t
02
Progress
- build_vm_part_10.md
- build_vm_part_11.md
- build_vm_part_12.md
- build_vm_part_13.md
- build_vm_part_14.md
- build_vm_part_15.md
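Notes
- How label jumps work: a two-pass assembler
Label resolution happens in two phases, so the assembler has to walk the source twice. That calls for a data structure like this (Assembler):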
// Enum: first phase / second phase
#[derive(Debug, PartialEq, Clone)]
pub enum AssemblerPhase {
    First,
    Second,
}
#[derive(Debug)]
pub struct Assembler {
    pub phase: AssemblerPhase, // current phase: first or second
    pub symbols: SymbolTable,  // symbol table: every label and its line number
}
impl Assembler {
    pub fn new() -> Assembler {
        Assembler {
            phase: AssemblerPhase::First,
            symbols: SymbolTable::new(),
        }
    }
}
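1. Phase one: parse the code
   1. Given the source text, parse it, find every label: declaration and record it, and also record the line number that each later @label usage refers to.
   2. All the label: declarations have to be stored, so a hash table is needed, plus an array for the line numbers. That leads to a SymbolTable, which stores every label: declaration and its line number.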
#[derive(Debug, PartialEq, Clone)]
pub struct SymbolTable {
    // Stores all the Symbols (labels)
    pub symbols: Vec<Symbol>,
}
   3. With a SymbolTable in place, the Symbol structure itself needs some thought: it needs a name, it needs a line number, and it also carries a symbol type so that it can be extended later.
#[derive(Debug, PartialEq, Clone)]
pub enum SymbolType {
    Label,
}
#[derive(Debug, PartialEq, Clone)]
pub struct Symbol {
    name: String,
    symbol_type: SymbolType,
    offset: Option<u32>,
}
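   4. With a name and a type in place, why use an offset rather than a line number? Because on the second pass a jump only needs the offset: the jump target can be computed as a position plus that offset.
   5. That way the SymbolTable is fully built during the first pass, and the second pass can resolve jumps by looking labels up in it.
2. Phase two: generate the VM bytecode
   1. It simply calls the to_bytes method on each AssemblerInstruction.
   2. All the bytes are appended to a Vec<u8>, which ends up holding the fully assembled bytecode.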
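The two phases can be sketched end to end as follows. This is a toy under stated assumptions, not LRVM's assembler: source lines have the form [label:] MNEMONIC [operand], every instruction assembles to 4 bytes laid out as [opcode, operand high byte, operand low byte, 0], and a HashMap replaces the Vec<Symbol> table for brevity. The function names collect_labels, opcode, and assemble are made up for this example.

```rust
use std::collections::HashMap;

// Pass 1: map each `label:` to the byte offset of the instruction it marks.
// Assumes every instruction assembles to exactly 4 bytes.
fn collect_labels(source: &str) -> HashMap<String, u32> {
    let mut symbols = HashMap::new();
    let mut offset = 0u32;
    for line in source.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if let Some((label, _rest)) = line.split_once(':') {
            symbols.insert(label.trim().to_string(), offset);
        }
        offset += 4;
    }
    symbols
}

// Toy opcode lookup covering only what the example needs.
fn opcode(mnemonic: &str) -> u8 {
    match mnemonic {
        "LOAD" => 0,
        "HLT" => 5,
        "JMP" => 6,
        _ => 255, // no opcode assigned in this sketch
    }
}

// Pass 2: emit 4 bytes per instruction, resolving `@label` via the table.
fn assemble(source: &str) -> Result<Vec<u8>, String> {
    let symbols = collect_labels(source);
    let mut bytes = Vec::new();
    for line in source.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // Drop the `label:` prefix if there is one.
        let instr = line.split_once(':').map_or(line, |(_, rest)| rest.trim());
        let mut parts = instr.split_whitespace();
        let mnemonic = parts.next().ok_or("label without an instruction")?;
        let operand: u32 = match parts.next() {
            Some(op) if op.starts_with('@') => *symbols
                .get(&op[1..])
                .ok_or_else(|| format!("unknown label {}", op))?,
            Some(op) => op
                .parse::<u32>()
                .map_err(|_| format!("bad operand {}", op))?,
            None => 0,
        };
        bytes.extend_from_slice(&[
            opcode(mnemonic),
            (operand >> 8) as u8,
            operand as u8,
            0,
        ]);
    }
    Ok(bytes)
}

fn main() {
    // `@end` is used before `end:` is declared, so one pass is not enough.
    let source = "JMP @end\nloop: LOAD 500\nJMP @loop\nend: HLT";
    println!("{:?}", assemble(source));
}
```

The forward reference to end is the reason for the first pass: when JMP @end is encoded, the offset of end (12) is already in the table.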
03
Progress
- build_vm_part_16.md
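Notes
Clap is a Rust library for parsing command-line arguments. It offers a simple way to parse the arguments and to generate help output.
- derive API: declare the arguments as a struct annotated with derive attributes, and let clap generate the parser from the type.
- builder API: construct the parser step by step in code, which suits more complex or dynamically assembled argument sets. The two APIs can be combined.
clap FAQ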
- Comparisons
- How many approaches are there to create a parser?
- When should I use the builder vs derive APIs?
- Why is there a default subcommand of help?
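Good clap tutorials:
- Derive/Builder: an excellent tutorial: 深入探索 Rust 的 clap 库:命令行解析的艺术 (A Deep Dive into Rust's clap Library: The Art of Command-Line Parsing)
- Derive/Builder: Rust 命令行库 Clap 快速入门教程 (A Quick-Start Tutorial for the Rust Command-Line Library Clap)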
- Derive/Builder: Writing a CLI Tool in Rust with Clap
- Builder: Rust: Take your CLI to the Next Level with Clap
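I was rather tired today, so I wrote little code and mostly read material to understand the Clap pieces I need to write. The clap part will be added to lrvm later.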
04
Progress
- build_vm_part_17.md
- build_vm_part_18.md
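Notes
Input and output streams in unit tests
While studying today I ran into a real question: in a test, how do I make sure the output is exactly the character or string I expect?
I remember that when I first started learning Rust, the Rust Bible (Rust 圣经) said that producing output through the Write trait makes it easier to test.
In a real project, though, println! and print! calls often sit in the middle of a function, so the printed text cannot serve as the function's return value.
For example: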
fn main() {
    // <snip>
    if let Some(name) = abc() {
        println!("{}", name);
    }
    // <snip>
}
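In that case there is no way to keep hold of the name string and assert on it directly.
Intercepting the stdin and stdout streams is probably the right approach. I do not know how to do that yet; I will look for a solution later and add an intermediate layer.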
+------------------+       +-----------+---------------+-----------+       +------------------+
|                  |       |           |               |           |       |                  |
| stdin user input | ----> |  input    |    program    |  output   | ----> |  stdout output   |
|                  |       | intercept |     logic     | intercept |       |                  |
+------------------+       +-----------+---------------+-----------+       +------------------+
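One common way to build that intermediate layer on the output side is to pass the destination in as a value implementing std::io::Write, so production code hands over stdout and a test hands over a Vec<u8>. The sketch below assumes this is acceptable for the project; abc is a hypothetical stand-in for the call in the snippet above, and run is a made-up name.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for the `abc()` call in the snippet above.
fn abc() -> Option<String> {
    Some(String::from("lrvm"))
}

// Same logic as the `main` above, but the output destination is a parameter.
fn run(out: &mut impl Write) -> io::Result<()> {
    if let Some(name) = abc() {
        writeln!(out, "{}", name)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // In production the destination is the real stdout.
    run(&mut io::stdout())
}
```

In a test, `let mut buf = Vec::new(); run(&mut buf)` captures the bytes that would have been printed, and they can then be compared with the expected string. Input can be handled the same way by accepting an `impl BufRead` instead of reading stdin directly.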
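VM architecture and reflections
- The difference between an Opcode and a Directive
  - Opcode: the operation code of a concrete machine instruction, for example ADD, SUB, MUL, DIV. It can be executed directly.
  - Directive: a pseudo-instruction that tells the assembler how to process the source code. It is not executed; it is handled during assembly.
Let us look at the difference between an Opcode (operation code) and a Directive (pseudo-instruction) in more detail. The two terms can overlap in some contexts, but in most cases they have clearly different meanings.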
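Opcode (operation code)
An opcode is the code that identifies a specific operation in machine language. It is part of the CPU's instruction set and denotes one concrete machine instruction. In assembly language an opcode is usually written as a mnemonic, such as MOV, ADD, or JMP.
Characteristics
- Maps directly to a hardware operation: an opcode corresponds to a physical operation on the processor, such as moving data, doing arithmetic, or jumping.
- Executable: an opcode is an instruction that can be executed directly.
- Mnemonic form: in assembly language an opcode is usually written as a mnemonic, which makes it easier for programmers to read and write.
- Operands: an opcode usually takes one or more operands, which specify the objects or addresses it operates on.
Example
Suppose a simple virtual machine has the following opcodes:
- LOAD: load a value from memory into a register.
- STORE: store the value in a register to memory.
- ADD: add the values of two registers and store the result in another register.
In assembly language, instructions built from opcodes might look like this: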
LOAD $1 #0
ADD $1 $2 $3
SUB $2 $3 $1
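Directive (pseudo-instruction)
A directive is not a real machine instruction. It is a command that controls the assembler's behavior during assembly. Directives mainly supply meta-information, such as defining variables, reserving memory, or including external files.
Characteristics
- Independent of the hardware: directives are processed at assembly time and do not map to any particular hardware operation.
- Not executable: a directive is never executed; it changes what the assembler does.
- Mnemonic form: directives usually have mnemonics too, but those mnemonics are not executed directly.
- Controls the assembly process: directives steer assembly, for example file inclusion and macro definition.
Example
Suppose a simple virtual machine has the following directives:
- EQU: define a constant.
- ORG: set the assembly origin address.
- DB: reserve memory and initialize it with bytes.
- DW: reserve memory and initialize it with words.
In assembly language, these directives might look like this:
EQU START_ADDR 0x1000 ; define the constant START_ADDR
ORG $START_ADDR ; set the assembly origin to START_ADDR
DB 'Hello, World!' ; reserve memory and initialize it with the string "Hello, World!"
DW 0x1234 ; reserve memory and initialize it with the 16-bit value 0x1234
Put another way, directives provide high-level control and description, while opcodes carry out the concrete work.
Directives are processed at assembly time into the corresponding data and symbol definitions, while opcodes are translated into the machine code that is finally executed.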
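The distinction can be sketched from the assembler's point of view: a directive turns into data placed in the output, while an opcode turns into an instruction the VM will execute. The Assembled enum and assemble_line below are hypothetical; only DW, HLT, and NOP are handled, and storing the DW word big-endian is an assumption of this example.

```rust
// What the assembler produces for one source line.
#[derive(Debug, PartialEq)]
enum Assembled {
    Instruction([u8; 4]), // machine code the VM will execute
    Data(Vec<u8>),        // raw bytes placed in the output, never executed
}

// Hypothetical single-line assembler covering one directive and two opcodes.
fn assemble_line(line: &str) -> Option<Assembled> {
    let mut parts = line.split_whitespace();
    match parts.next()? {
        // Directive: resolved entirely at assembly time into data bytes.
        "DW" => {
            let text = parts.next()?;
            let word = u16::from_str_radix(text.trim_start_matches("0x"), 16).ok()?;
            Some(Assembled::Data(word.to_be_bytes().to_vec()))
        }
        // Opcodes: translated into instructions (numbers from the opcode table).
        "HLT" => Some(Assembled::Instruction([5, 0, 0, 0])),
        "NOP" => Some(Assembled::Instruction([16, 0, 0, 0])),
        // Anything else is outside this sketch.
        _ => None,
    }
}

fn main() {
    for line in ["DW 0x1234", "NOP", "HLT"] {
        println!("{:<10} -> {:?}", line, assemble_line(line));
    }
}
```

Keeping the two cases as separate enum variants makes it explicit that the VM only ever steps through Instruction values, while Data is just bytes the program may read.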