VM Syscalls
| Number | Category | Status | Author | Organization | Created |
|---|---|---|---|---|---|
| 0009 | Standards Track | Proposal | Xuejie Xiao | Nervos Foundation | 2018-12-14 |
Abstract
This document describes all the RISC-V VM syscalls implemented in CKB so far.
Introduction
CKB VM syscalls implement communication between the RISC-V based CKB VM and the main CKB process, allowing scripts running in the VM to read current transaction information as well as general blockchain information from CKB. Using syscalls instead of custom instructions allows us to maintain a standard-compliant RISC-V implementation that can embrace the broadest industrial support.
Specification
In CKB we use RISC-V's standard syscall solution: each syscall accepts 6 arguments stored in registers A0 through A5. Each argument is of register word size, so it can store either a regular integer or a pointer. The syscall number is stored in A7. After all the arguments and the syscall number are set, the ecall instruction is used to trigger syscall execution; CKB VM then transfers control from the VM to the actual syscall implementation beneath. For example, the following RISC-V assembly would trigger the Exit syscall with a return code of 10:

```
li a0, 10
li a7, 93
ecall
```

As shown in the example, not all syscalls use all 6 arguments. In this case the caller only needs to fill in the arguments that are actually used.
Syscalls can respond to the VM in 2 ways:
- A return value is put in `A0`, if one exists.
- Syscalls can also write data to memory locations pointed to by certain syscall arguments, so that upon syscall completion, normal VM instructions can read the data prepared by the syscall.
For convenience, we could wrap the logic of calling a syscall in a C function:
```c
static inline long
__internal_syscall(long n, long _a0, long _a1, long _a2, long _a3, long _a4, long _a5)
{
  /* Pin each argument to the register the syscall convention expects. */
  register long a0 asm("a0") = _a0;
  register long a1 asm("a1") = _a1;
  register long a2 asm("a2") = _a2;
  register long a3 asm("a3") = _a3;
  register long a4 asm("a4") = _a4;
  register long a5 asm("a5") = _a5;

  /* The syscall number goes in A7. */
  register long syscall_id asm("a7") = n;

  /* "scall" is the older mnemonic for ecall; a0 doubles as the return value. */
  asm volatile ("scall"
                : "+r"(a0)
                : "r"(a1), "r"(a2), "r"(a3), "r"(a4), "r"(a5), "r"(syscall_id));

  return a0;
}

#define syscall(n, a, b, c, d, e, f) \
        __internal_syscall(n, (long)(a), (long)(b), (long)(c), (long)(d), (long)(e), (long)(f))
```

(NOTE: this is adapted from riscv-newlib)
Now we can trigger the same Exit syscall more easily in C code:

```c
syscall(93, 10, 0, 0, 0, 0, 0);
```

Note that even though the Exit syscall only needs one argument, our C wrapper requires us to fill in all 6 arguments. We can initialize the unused arguments to 0. Below we illustrate each syscall with a C function signature to show the arguments it accepts. For clarity, all the code shown in this RFC is assumed to be written in pure C.
- Exit
- Load Transaction Hash
- Load Script Hash
- Load Cell
- Load Cell By Field
- Load Input
- Load Input By Field
- Load Header
- Load Code
- Debug
Exit
As shown above, the Exit syscall has the following signature:

```c
void exit(int8_t code)
{
  syscall(93, code, 0, 0, 0, 0, 0);
}
```

The Exit syscall doesn't need a return value since CKB VM is not supposed to return from this function. Upon receiving this syscall, CKB VM terminates execution with the specified return code. This is the only way of correctly exiting a script in CKB VM.
Load Transaction Hash
The Load Transaction Hash syscall has the following signature:

```c
int ckb_load_tx_hash(void* addr, uint64_t* len, size_t offset)
{
  return syscall(2061, addr, len, offset, 0, 0, 0);
}
```

The arguments used here are:
- `addr`: a pointer to a buffer in VM memory space denoting where the loaded data is written.
- `len`: a pointer to a 64-bit unsigned integer in VM memory space. When calling the syscall, this memory location should store the length of the buffer specified by `addr`; when returning from the syscall, CKB VM fills in `len` with the length of the data available from `offset`. The exact logic is explained below.
- `offset`: the offset from which to start loading the data.

This syscall calculates the hash of the current transaction and copies it to VM memory space.
The result is fed into the VM via the steps below. For ease of reference, we refer to the result as `data`, and the length of `data` as `data_length`.
- A memory read operation is executed to read the value at the `len` pointer from VM memory space; we call the read result `size`.
- `full_size` is calculated as `data_length - offset`.
- `real_size` is calculated as the minimum of `size` and `full_size`.
- The serialized value from `&data[offset]` up to (but not including) `&data[offset + real_size]` is written into VM memory space starting at `addr`.
- `full_size` is written to the `len` pointer.
- `0` is returned from the syscall, denoting execution success.
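The steps above can be sketched in plain C. This is only an illustration of the described logic, not CKB's actual implementation: the helper name `store_data` is hypothetical, and the handling of an `offset` beyond `data_length` is an assumption, since the steps do not cover that case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies a window of `data` into `addr` following the
 * six steps above. `len` is an in/out parameter, as in the syscalls. */
static int store_data(const uint8_t *data, uint64_t data_length,
                      uint8_t *addr, uint64_t *len, uint64_t offset)
{
    /* Step 1: read the caller's buffer size from `len`. */
    uint64_t size = *len;
    /* Step 2: bytes available from `offset`. Assumption: an offset past the
     * end yields an empty window instead of an underflow. */
    uint64_t full_size = offset < data_length ? data_length - offset : 0;
    /* Step 3: never copy more than the caller's buffer can hold. */
    uint64_t real_size = size < full_size ? size : full_size;
    /* Step 4: copy data[offset .. offset + real_size) to addr. */
    if (real_size > 0) {
        memcpy(addr, data + offset, real_size);
    }
    /* Step 5: report how much data was available, not how much was copied. */
    *len = full_size;
    /* Step 6: success. */
    return 0;
}
```

Because `len` receives `full_size` rather than `real_size`, a caller can tell that its buffer was too small by checking whether the returned length exceeds the size it passed in.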
The whole point of this process is to give the VM side a way to do partial reads when the available memory is not enough to hold the whole data at once.
One trick here is that by providing `NULL` as `addr`, and a `uint64_t` pointer holding the value 0 as `len`, this syscall can be used to fetch the length of the serialized data without reading any actual data.
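As a rough illustration of both ideas, the sketch below first queries the length and then reads the data in fixed-size chunks. So that it can run outside CKB VM, it calls `mock_load_tx_hash`, a hypothetical stand-in that serves a fixed 32-byte pattern; in a real script the `ckb_load_tx_hash` wrapper would be called instead. `read_in_chunks` is also a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ckb_load_tx_hash. The 32 bytes it serves are an
 * illustrative pattern (0, 1, 2, ...), not a real hash. */
static int mock_load_tx_hash(void *addr, uint64_t *len, size_t offset)
{
    uint8_t hash[32];
    for (size_t i = 0; i < sizeof(hash); i++) hash[i] = (uint8_t)i;

    uint64_t size = *len;
    uint64_t full_size = offset < sizeof(hash) ? sizeof(hash) - offset : 0;
    uint64_t real_size = size < full_size ? size : full_size;
    if (real_size > 0) memcpy(addr, hash + offset, real_size);
    *len = full_size;
    return 0;
}

/* Reads the whole value into `out`, at most `chunk` bytes per call.
 * Returns the total length, or 0 if `out` is too small or `chunk` is 0. */
static uint64_t read_in_chunks(uint8_t *out, uint64_t out_cap, uint64_t chunk)
{
    /* Length query: NULL addr and *len == 0, so nothing is copied. */
    uint64_t total = 0;
    mock_load_tx_hash(NULL, &total, 0);
    if (chunk == 0 || total > out_cap) return 0;

    for (uint64_t off = 0; off < total; off += chunk) {
        uint64_t len = chunk;
        mock_load_tx_hash(out + off, &len, (size_t)off);
        /* len now holds the bytes remaining from `off`, not the bytes copied. */
    }
    return total;
}
```

In a memory-constrained script each chunk would typically be processed (for example, fed into a hasher) and then discarded, rather than accumulated in one large buffer as done here for simplicity.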
Load Script Hash
The Load Script Hash syscall has the following signature:

```c
int ckb_load_script_hash(void* addr, uint64_t* len, size_t offset)
{
  return syscall(2062, addr, len, offset, 0, 0, 0);
}
```

The arguments used here are:
- `addr`: the exact same `addr` pointer as used in the Load Transaction Hash syscall.
- `len`: the exact same `len` pointer as used in the Load Transaction Hash syscall.
- `offset`: the exact same `offset` value as used in the Load Transaction Hash syscall.

This syscall calculates the hash of the currently running script and copies it to VM memory space. This is the only part a script can load on its own cell.
Load Cell
The Load Cell syscall has the following signature:

```c
int ckb_load_cell(void* addr, uint64_t* len, size_t offset, size_t index, size_t source)
{
  return syscall(2071, addr, len, offset, index, source, 0);
}
```

The arguments used here are:
- `addr`: the exact same `addr` pointer as used in the Load Transaction Hash syscall.
- `len`: the exact same `len` pointer as used in the Load Transaction Hash syscall.
- `offset`: the exact same `offset` value as used in the Load Transaction Hash syscall.
- `index`: an index value denoting the index of cells to read.
- `source`: a flag denoting the source of cells to locate, possible values include:
  - 1: input cells.
  - 2: output cells.
  - 3: dep cells.
This syscall locates a single cell in the current transaction based on the `source` and `index` values, serializes the whole cell into the Molecule Encoding 1 format, then uses the same steps as documented in the Load Transaction Hash syscall to feed the serialized value into the VM.
Specifying an invalid `source` value here immediately triggers a VM error. Specifying an invalid `index` value, however, results in `1` as the return value, denoting that the index is out of bound. Otherwise the syscall returns `0`, denoting success.
Note this syscall is only provided for advanced usage that requires hashing the whole cell in a future-proof way. In practice this is a very expensive syscall since it requires serializing the whole cell; for a large cell with huge data, this means a lot of memory copying. Hence CKB should charge much higher cycles for this syscall and encourage using the Load Cell By Field syscall below.
Load Cell By Field
The Load Cell By Field syscall has the following signature:

```c
int ckb_load_cell_by_field(void* addr, uint64_t* len, size_t offset,
                           size_t index, size_t source, size_t field)
{
  return syscall(2081, addr, len, offset, index, source, field);
}
```

The arguments used here are:
- `addr`: the exact same `addr` pointer as used in the Load Transaction Hash syscall.
- `len`: the exact same `len` pointer as used in the Load Transaction Hash syscall.
- `offset`: the exact same `offset` value as used in the Load Transaction Hash syscall.
- `index`: an index value denoting the index of cells to read.
- `source`: a flag denoting the source of cells to locate, possible values include:
  - 1: input cells.
  - 2: output cells.
  - 3: dep cells.
- `field`: a flag denoting the field of the cell to read, possible values include:
  - 0: capacity.
  - 1: data.
  - 2: data hash.
  - 3: lock.
  - 4: lock hash.
  - 5: type.
  - 6: type hash.
This syscall locates a single cell in the current transaction just like the Load Cell syscall. The difference is that this syscall only extracts a single field of the specified cell based on `field`, then serializes the field into binary format with the following rules:
- `capacity`: capacity is serialized into 8 little endian bytes; this is also how Molecule Encoding 1 handles 64-bit unsigned integers.
- `data`: the data field is already in binary format and is used directly; there's no need for further serialization.
- `data hash`: 32 raw bytes are extracted from the `H256` structure by serializing data field.
- `lock`: the lock script is serialized into the Molecule Encoding 1 format.
- `lock hash`: 32 raw bytes are extracted from the `H256` structure and used directly.
- `type`: the type script is serialized into the Molecule Encoding 1 format.
- `type hash`: 32 raw bytes are extracted from the `H256` structure and used directly.
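On the script side, the 8 little endian bytes of a `capacity` field have to be turned back into an integer. A minimal sketch of that decoding follows; the helper name `read_u64_le` is made up for this example, and the code does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Decodes 8 little endian bytes into a 64-bit unsigned integer.
 * bytes[0] is the least significant byte. */
static uint64_t read_u64_le(const uint8_t bytes[8])
{
    uint64_t value = 0;
    /* Walk from the most significant byte down, shifting as we go. */
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

A script would typically pass an 8-byte buffer as `addr`, with `len` set to 8 and `field` set to 0, and then call a helper like this on the buffer once the syscall returns 0.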
With the binary result produced by these rules, the syscall then applies the same steps as documented in the Load Transaction Hash syscall to feed the data into CKB VM.
Specifying an invalid `source` value here immediately triggers a VM error. Specifying any invalid `field` also triggers a VM error immediately. Specifying an invalid `index` value results in `1` as the return value, denoting that the index is out of bound. Specifying a field that does not exist (such as `type` on a cell without a type script) results in `2` as the return value, denoting that the item is missing. Otherwise the syscall returns `0`, denoting success.
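To show how the three return values can drive script logic, the sketch below counts how many cells in a source carry a type script. All constant names are illustrative, since this RFC only defines the numbers. `mock_load_cell_by_field` is a hypothetical stand-in that pretends the transaction has 3 input cells, of which only the first has a type script; it only supports length queries and exists so the example can run outside CKB VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names for the numeric values listed in this section. */
enum { RET_SUCCESS = 0, RET_INDEX_OUT_OF_BOUND = 1, RET_ITEM_MISSING = 2 };
enum { SOURCE_INPUT = 1, SOURCE_OUTPUT = 2, SOURCE_DEP = 3 };
enum { FIELD_CAPACITY = 0, FIELD_DATA = 1, FIELD_DATA_HASH = 2, FIELD_LOCK = 3,
       FIELD_LOCK_HASH = 4, FIELD_TYPE = 5, FIELD_TYPE_HASH = 6 };

/* Same shape as the ckb_load_cell_by_field wrapper. */
typedef int (*load_by_field_fn)(void *addr, uint64_t *len, size_t offset,
                                size_t index, size_t source, size_t field);

/* Hypothetical stand-in: 3 input cells, only cell 0 has a type script.
 * It never copies data; the reported length is a placeholder. */
static int mock_load_cell_by_field(void *addr, uint64_t *len, size_t offset,
                                   size_t index, size_t source, size_t field)
{
    (void)addr;
    (void)offset;
    size_t count = (source == SOURCE_INPUT) ? 3 : 0;
    if (index >= count) return RET_INDEX_OUT_OF_BOUND;
    if (field == FIELD_TYPE && index != 0) return RET_ITEM_MISSING;
    *len = 32;
    return RET_SUCCESS;
}

/* Counts cells in `source` that have a type script by probing indices until
 * the loader reports "out of bound". Returns -1 on any unexpected code. */
static long count_typed_cells(load_by_field_fn load, size_t source)
{
    long typed = 0;
    for (size_t index = 0;; index++) {
        uint64_t len = 0;  /* length query only, nothing is copied */
        int ret = load(NULL, &len, 0, index, source, FIELD_TYPE);
        if (ret == RET_INDEX_OUT_OF_BOUND) return typed;
        if (ret == RET_SUCCESS) typed++;
        else if (ret != RET_ITEM_MISSING) return -1;
    }
}
```

Passing the loader as a function pointer is just a convenience for this example: the same loop would work with the real wrapper on chain and with the mock off chain.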
Load Input
The Load Input syscall has the following signature:

```c
int ckb_load_input(void* addr, uint64_t* len, size_t offset, size_t index, size_t source)
{
  return syscall(2073, addr, len, offset, index, source, 0);
}
```

The arguments used here are:
- `addr`: the exact same `addr` pointer as used in the Load Transaction Hash syscall.
- `len`: the exact same `len` pointer as used in the Load Transaction Hash syscall.
- `offset`: the exact same `offset` value as used in the Load Transaction Hash syscall.
- `index`: an index value denoting the index of inputs to read.
- `source`: a flag denoting the source of inputs to locate, possible values include:
  - 1: inputs.
  - 2: outputs. Note this is here to maintain compatibility of the `source` flag; when this value is used, the syscall always returns `2` since outputs don't have any input fields.
  - 3: deps. When this value is used, the syscall also always returns `2` since deps don't have input fields.
This syscall locates a single input in the current transaction based on the `source` and `index` values, serializes the whole input into the Molecule Encoding 1 format, then uses the same steps as documented in the Load Transaction Hash syscall to feed the serialized value into the VM.
Specifying an invalid `source` value here immediately triggers a VM error. Specifying a valid source that is not input, such as an output, results in `2` as the return value, denoting that the item is missing. Specifying an invalid `index` value results in `1` as the return value, denoting that the index is out of bound. Otherwise the syscall returns `0`, denoting success.
Note this syscall is only provided for advanced usage that requires hashing the whole input in a future-proof way. In practice this might be a very expensive syscall since it requires serializing the whole input; for a large input with huge arguments, this means a lot of memory copying. Hence CKB should charge much higher cycles for this syscall and encourage using the Load Input By Field syscall below.
Load Input By Field
The Load Input By Field syscall has the following signature:

```c
int ckb_load_input_by_field(void* addr, uint64_t* len, size_t offset,
                            size_t index, size_t source, size_t field)
{
  return syscall(2083, addr, len, offset, index, source, field);
}
```

The arguments used here are:
- `addr`: the exact same `addr` pointer as used in the Load Transaction Hash syscall.
- `len`: the exact same `len` pointer as used in the Load Transaction Hash syscall.
- `offset`: the exact same `offset` value as used in the Load Transaction Hash syscall.
- `index`: an index value denoting the index of inputs to read.
- `source`: a flag denoting the source of inputs to locate, possible values include:
  - 1: inputs.
  - 2: outputs. Note this is here to maintain compatibility of the `source` flag; when this value is used in the Load Input By Field syscall, the syscall always returns `2` since outputs don't have any input fields.
  - 3: deps. When this value is used, the syscall also always returns `2` since deps don't have input fields.
- `field`: a flag denoting the field of the input to read, possible values include:
  - 0: args.
  - 1: out_point.
  - 2: since.
This syscall first locates an input in the current transaction via the `source` and `index` values. It then extracts the field (serializing it into the Molecule format if it is `args` or `out_point`), and uses the same steps as documented in the Load Transaction Hash syscall to feed the data into the VM.
Specifying an invalid `source` value here immediately triggers a VM error. Specifying any invalid `field` also triggers a VM error immediately. Specifying an invalid `index` value, however, results in `1` as the return value, denoting that the index is out of bound. Otherwise the syscall returns `0`, denoting success.
Load Header
The Load Header syscall has the following signature:

```c
int ckb_load_header(void* addr, uint64_t* len, size_t offset, size_t index, size_t source)
{
  return syscall(2072, addr, len, offset, index, source, 0);
}
```

The arguments used here are:
- `addr`: the exact same `addr` pointer as used in the Load Transaction Hash syscall.
- `len`: the exact same `len` pointer as used in the Load Transaction Hash syscall.
- `offset`: the exact same `offset` value as used in the Load Transaction Hash syscall.
- `index`: an index value denoting the index of cells to read.
- `source`: a flag denoting the source of cells to locate, possible values include:
  - 1: input cells.
  - 2: output cells.
  - 3: dep cells.
This syscall locates the header associated with an input or a dep OutPoint based on the `source` and `index` values, serializes the whole header into the Molecule Encoding 1 format, then uses the same steps as documented in the Load Transaction Hash syscall to feed the serialized value into the VM.
Specifying an invalid `source` value here immediately triggers a VM error. Specifying output as the `source` results in `2` as the return value, denoting that the item is missing. Specifying an invalid `index` value, however, results in `1` as the return value, denoting that the index is out of bound. Otherwise the syscall returns `0`, denoting success.
Load Code
The Load Code syscall has the following signature:

```c
int ckb_code(void* addr, size_t memory_size, size_t content_offset,
             size_t content_size, size_t index, size_t source)
{
  return syscall(2091, addr, memory_size, content_offset, content_size, index, source);
}
```

The arguments used here are:
- `addr`: the starting memory address to load code into.
- `memory_size`: the size of the memory buffer used to load code.
- `content_offset`: the offset in the cell data from which to start loading code.
- `content_size`: the size of the cell data to load as code.
- `index`: an index value denoting the index of cells to read.
- `source`: a flag denoting the source of cells to locate, possible values include:
  - 0: current input. In this case the `index` value is ignored since there's only one current input.
  - 1: inputs.
  - 2: outputs. Note this is here to maintain compatibility of the `source` flag; when this value is used in the Load Input By Field syscall, the syscall always returns `2` since outputs don't have any input fields.
  - 3: deps. When this value is used, the syscall also always returns `2` since deps don't have input fields.
This syscall first locates a cell in the current transaction via `source` and `index`. It then loads the specified part of the cell data as code into the VM, and marks the specified memory location as executable. It can be used to implement dynamic linking in CKB VM.
Note the provided memory buffer might be larger than the specified content size; in this case, more memory pages can be marked as executable as requested.
If the requested memory buffer overlaps with an existing executable memory page, the syscall signals an error. Specifying an invalid range of memory buffer or an invalid range of cell data also generates errors.
Debug
The Debug syscall has the following signature:

```c
void ckb_debug(const char* s)
{
  syscall(2177, s, 0, 0, 0, 0, 0);
}
```

This syscall accepts a null-terminated string and prints it out as a debug log in CKB. It can be used as a handy way to debug scripts in CKB. This syscall has no return value.
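Since the Debug syscall only takes a null-terminated string, binary values such as a 32-byte hash need to be converted to text first. The sketch below shows one way to do that with a small hex encoder; `hex_encode` is a name made up for this example and is not part of CKB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes 2 * n lowercase hex digits plus a terminating NUL into `out`.
 * `out` must have room for at least 2 * n + 1 bytes. */
static void hex_encode(const uint8_t *bytes, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0f];  /* low nibble */
    }
    out[2 * n] = '\0';
}
```

In a script, a 32-byte hash loaded by one of the syscalls above could be encoded into a 65-byte `char` buffer this way and then passed to `ckb_debug`.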