Building a PHP extension in Rust
What are we building?
We are building a PHP extension that intercepts all non-internal function calls and records their types. This extension will provide insights into the types of function calls made by PHP applications, which can later be used to add type information to PHP applications.
We will build it in Rust using the ext-php-rs library, making use of the Observer API introduced by Datadog.
Setting up the environment
We will use rustup to install Rust and devenv.sh to set up the development environment.
You can take a look at [this]() commit to check the devenv.nix file.
Building a hello world extension
ext-php-rs makes it straightforward to create an extension.
As can be seen in the docs, we can define a module as shown below using the
get_module function.
#![cfg_attr(windows, feature(abi_vectorcall))]
use ext_php_rs::prelude::*;

#[php_function]
pub fn type_runner(name: &str) -> String {
    format!("Type runner: {}!", name)
}

#[php_module]
pub fn get_module(module: ModuleBuilder) -> ModuleBuilder {
    println!("Hello, world!");
    module.function(wrap_function!(type_runner))
}

The #[php_function] attribute enables this function to be called from PHP code. We also need to register it in the get_module function.
Let’s build the extension and run it.
cargo build

We can enable the extension by editing the php.ini file as shown below.
extension=target/debug/libkut_type_runner.so
Now let’s run the below code.
<?php
// test.php
var_dump(type_runner("hello"));

php -c php.ini test.php

Now we should see string(19) "Type runner: hello!" printed to the console.
Using the Observer API
Now that our basic extension is working, let’s use the Observer API
to capture types of arguments in all function calls. Using the Observer API, when a PHP process is started, an extension can register itself as a
function-call observer using the zend_observer_fcall_register function.
Let’s take a look at zend_observer.h:
typedef void (*zend_observer_fcall_begin_handler)(zend_execute_data *execute_data);
typedef void (*zend_observer_fcall_end_handler)(zend_execute_data *execute_data, zval *retval);
typedef struct _zend_observer_fcall_handlers {
    zend_observer_fcall_begin_handler begin;
    zend_observer_fcall_end_handler end;
} zend_observer_fcall_handlers;
/* If the fn should not be observed then return {NULL, NULL} */
typedef zend_observer_fcall_handlers (*zend_observer_fcall_init)(zend_execute_data *execute_data);
// Call during minit/startup ONLY
ZEND_API void zend_observer_fcall_register(zend_observer_fcall_init);

As you can see, zend_observer_fcall_register takes a pointer to a function of type zend_observer_fcall_init. That function receives a zend_execute_data struct (information about the called function) and returns a zend_observer_fcall_handlers struct. The zend_observer_fcall_handlers struct contains two function pointers, begin and end, of types zend_observer_fcall_begin_handler and zend_observer_fcall_end_handler, which are called at the beginning and end of a function call, respectively.
This API design lets us inspect a called function and decide whether to observe it. Returning {NULL, NULL} for functions we are not interested in avoids the handler overhead for them.
Finally, we can use the module startup step, also called MINIT, to register our observer.
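The init-then-handlers flow described above can be modeled without PHP at all. The following is a minimal sketch in plain Rust, not the engine's actual implementation: CallInfo, Handlers, init, run_call and demo are hypothetical stand-ins for zend_execute_data, zend_observer_fcall_handlers, the init callback and the engine's dispatch. It shows how returning empty handlers from init lets an observer opt out of a function.

```rust
/// Hypothetical stand-in for the information carried by zend_execute_data.
struct CallInfo {
    name: &'static str,
    is_user_function: bool,
}

/// Mirrors the shape of zend_observer_fcall_handlers: either handler may be absent.
struct Handlers {
    begin: Option<fn(&CallInfo, &mut Vec<String>)>,
    end: Option<fn(&CallInfo, &mut Vec<String>)>,
}

fn on_begin(call: &CallInfo, log: &mut Vec<String>) {
    log.push(format!("begin {}", call.name));
}

fn on_end(call: &CallInfo, log: &mut Vec<String>) {
    log.push(format!("end {}", call.name));
}

/// Plays the role of the init callback: decide per function whether to observe it.
fn init(call: &CallInfo) -> Handlers {
    if call.is_user_function {
        Handlers { begin: Some(on_begin), end: Some(on_end) }
    } else {
        // Equivalent of returning {NULL, NULL}: this function is not observed.
        Handlers { begin: None, end: None }
    }
}

/// Plays the role of the engine: ask init for handlers, then run them around the call.
fn run_call(call: &CallInfo, log: &mut Vec<String>) {
    let handlers = init(call);
    if let Some(begin) = handlers.begin {
        begin(call, log);
    }
    log.push(format!("body {}", call.name));
    if let Some(end) = handlers.end {
        end(call, log);
    }
}

/// Runs one user function and one internal function through the mock engine.
fn demo() -> Vec<String> {
    let mut log = Vec::new();
    run_call(&CallInfo { name: "test_function", is_user_function: true }, &mut log);
    run_call(&CallInfo { name: "ltrim", is_user_function: false }, &mut log);
    log
}
```

Calling demo() records begin and end entries only for test_function, while ltrim runs with no handlers attached.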
Now that we know what to do, let’s implement it in Rust by mimicking the C API.
// Define zend_observer_fcall_handlers, which holds the begin and end handlers
#[repr(C)]
pub struct zend_observer_fcall_handlers {
    pub begin: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data)>,
    pub end: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data, retval: *mut zval)>,
}

// Define the zend_observer_fcall_init function
unsafe extern "C" fn zend_observer_fcall_init(_execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers {
    zend_observer_fcall_handlers {
        begin: Some(observer_begin),
        // We simply skip the handler for function ending
        end: None,
    }
}

// Declare zend_observer_fcall_register, which is provided by PHP
unsafe extern "C" {
    fn zend_observer_fcall_register(init: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers>);
}

// Define the begin handler (zend_observer_fcall_begin_handler)
unsafe extern "C" fn observer_begin(execute_data: *mut zend_execute_data) {
    println!("Function called")
}
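
The handler fields above are declared as Option<unsafe extern "C" fn(...)> because the C struct allows NULL function pointers. Rust represents None for such an Option as a null pointer, so the struct keeps the same layout as its C counterpart. The sketch below checks this with simplified, hypothetical types: a raw *mut u8 stands in for zend_execute_data and zval, and Handlers, noop_begin, raw_bits and handlers_size_in_pointers are names made up for this example.

```rust
use std::mem::{size_of, transmute};

/// Simplified handler types: a raw byte pointer stands in for zend_execute_data and zval.
type BeginHandler = unsafe extern "C" fn(execute_data: *mut u8);
type EndHandler = unsafe extern "C" fn(execute_data: *mut u8, retval: *mut u8);

/// Same shape as zend_observer_fcall_handlers: two nullable function pointers.
#[repr(C)]
struct Handlers {
    begin: Option<BeginHandler>,
    end: Option<EndHandler>,
}

/// A handler that does nothing, used only to obtain a non-null function pointer.
unsafe extern "C" fn noop_begin(_execute_data: *mut u8) {}

/// Returns the raw bits of a handler slot, the way C code would see the field.
fn raw_bits(slot: Option<BeginHandler>) -> usize {
    // SAFETY: Option<extern "C" fn> is pointer-sized, with None stored as null.
    unsafe { transmute::<Option<BeginHandler>, usize>(slot) }
}

/// Size of the struct measured in pointers; two fields should give two pointers.
fn handlers_size_in_pointers() -> usize {
    size_of::<Handlers>() / size_of::<*const u8>()
}
```

Here raw_bits(None) is 0, which is what the C side reads as a NULL handler, and the struct occupies exactly two pointers.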

Now the only thing left is to call zend_observer_fcall_register during module startup. Here we call it from the get_module function.
#![cfg_attr(windows, feature(abi_vectorcall))]
use ext_php_rs::prelude::*;
use ext_php_rs::ffi::{zend_execute_data, zval};

#[php_function]
pub fn type_runner(name: &str) -> String {
    format!("Type runner: {}!", name)
}

unsafe extern "C" fn observer_begin(execute_data: *mut zend_execute_data) {
    println!("Function called")
}

#[repr(C)]
pub struct zend_observer_fcall_handlers {
    pub begin: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data)>,
    pub end: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data, retval: *mut zval)>,
}

unsafe extern "C" fn zend_observer_fcall_init(_execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers {
    zend_observer_fcall_handlers {
        begin: Some(observer_begin),
        end: None,
    }
}

unsafe extern "C" {
    fn zend_observer_fcall_register(init: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers>);
}

#[php_module]
pub fn get_module(module: ModuleBuilder) -> ModuleBuilder {
    unsafe {
        zend_observer_fcall_register(Some(zend_observer_fcall_init));
    }
    module.function(wrap_function!(type_runner))
}

Let’s also modify our test.php to add a few function calls for testing.
<?php
namespace Me;

class T {
    function __construct() {
    }

    function test_function($arg1, $arg2) {
        return "Inside test_function";
    }
}

$t = new T();
$t->test_function("hello", 123);
$t->test_function($t, new \stdClass());

// This shouldn't be captured
$n = ltrim(" hello");
var_dump($n);

// Function we defined
var_dump(type_runner("hello"));

After running cargo build and php -c php.ini test.php, we see that our handler is getting called :D
Function called
Function called
Function called
Function called
Function called
Function called
string(5) "hello"
Function called
Function called
string(19) "Type runner: hello!"
Note: If you are interested in the design decisions regarding the Observer API, please take a look at this page.
Capturing argument types
Now that our handler is getting called, let’s add logic to print the class name, the function name, and the types of the arguments. The code, with explanations as comments, can be found below.
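// Additional imports needed by the code below
use std::ffi::CStr;
use ext_php_rs::flags::DataType;
use ext_php_rs::types::Zval;

unsafe extern "C" fn observer_begin(execute_data: *mut zend_execute_data) {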
/* The structure of zend_function can be found at https://github.com/php/php-src/blob/PHP-8.0/Zend/zend_compile.h#L484
union _zend_function {
zend_uchar type; /* MUST be the first element of this struct! */
uint32_t quick_arg_flags;
struct {
zend_uchar type; /* never used */
zend_uchar arg_flags[3]; /* bitset of arg_info.pass_by_reference */
uint32_t fn_flags;
zend_string *function_name;
zend_class_entry *scope;
zend_function *prototype;
uint32_t num_args;
uint32_t required_num_args;
zend_arg_info *arg_info; /* index -1 represents the return value info, if any */
HashTable *attributes;
} common;
zend_op_array op_array;
zend_internal_function internal_function;
};
*/
    let func = unsafe { (*execute_data).func };
    if func.is_null() {
        return;
    }
    // Only capture user-defined functions (ZEND_USER_FUNCTION = 2)
    let type_ = unsafe { (*func).type_ };
    if type_ != 2 {
        return;
    }
/*
https://github.com/php/php-src/blob/e4098da58a9eaee759d728d98a27d809cde37671/Zend/zend.h#L147
struct _zend_class_entry {
char type;
zend_string *name;
/* class_entry or string depending on ZEND_ACC_LINKED */
union {
zend_class_entry *parent;
zend_string *parent_name;
};
.....
*/
    // Get the class name, if the function is a method
    let class_name = unsafe {
        let scope = (*func).common.scope;
        if !scope.is_null() {
            let class_name_ptr = (*scope).name;
            if !class_name_ptr.is_null() {
                Some(
                    CStr::from_ptr((*class_name_ptr).val.as_ptr() as *const _)
                        .to_string_lossy()
                        .into_owned(),
                )
            } else {
                None
            }
        } else {
            None
        }
    };
    let func_name_ptr = unsafe { (*func).common.function_name };
    if func_name_ptr.is_null() {
        return;
    }
    // Get function name
    let name = unsafe {
        CStr::from_ptr((*func_name_ptr).val.as_ptr() as *const _)
            .to_string_lossy()
            .into_owned()
    };
    // Don't trace our own function
    if name == "type_runner" {
        return;
    }
// Get the number of arguments
/*
https://github.com/php/php-src/blob/PHP-8.0/Zend/zend_compile.h#L505
struct _zend_execute_data {
const zend_op *opline; /* executed opline */
zend_execute_data *call; /* current call */
zval *return_value;
zend_function *func; /* executed function */
zval This; /* this + call_info + num_args */
zend_execute_data *prev_execute_data;
zend_array *symbol_table;
void **run_time_cache; /* cache op_array->run_time_cache */
zend_array *extra_named_params;
};
*/
    let num_args = unsafe { (*execute_data).This.u2.num_args };
    let mut args = Vec::new();
    // The arguments start right after the zend_execute_data structure on the stack.
    // Move the pointer forward by exactly 1 zend_execute_data unit,
    // then treat that memory location as a zval.
    let first_arg_ptr = unsafe { execute_data.add(1) as *mut zval };
    for i in 0..num_args {
        // Step to the i-th argument slot
        let arg_ptr = unsafe { first_arg_ptr.add(i as usize) };
        // https://github.com/php/php-src/blob/master/Zend/zend_types.h#L344
        let val = unsafe { &*(arg_ptr as *const Zval) };
        args.push(zval_to_string(val));
    }
    let msg = if let Some(class_name2) = class_name {
        format!("Intercepted call to {}::{}: args={:?}\n", class_name2, &name, args)
    } else {
        format!("Intercepted call to {}: args={:?}\n", &name, args)
    };
    print!("{}", msg);
}

// Since we are only interested in the types of the arguments, we check the type and return its name without looking at the contents.
pub fn zval_to_string(zv: &Zval) -> String {
    // https://github.com/php/php-src/blob/master/Zend/zend_types.h#L1069
    // https://github.com/php/php-src/blob/master/Zend/zend_types.h#L609
    match zv.get_type() {
        DataType::Undef => "undefined".to_string(),
        DataType::Null => "null".to_string(),
        DataType::False => "bool".to_string(),
        DataType::True => "bool".to_string(),
        DataType::Long => "long".to_string(),
        DataType::Double => "double".to_string(),
        DataType::String => "string".to_string(),
        DataType::Array => "array".to_string(),
        // For objects, report the class name when it is available
        DataType::Object(_) => zv
            .object()
            .and_then(|obj| obj.get_class_name().ok())
            .unwrap_or_else(|| "object".to_string()),
        DataType::Resource => "resource".to_string(),
        DataType::Reference => "reference".to_string(),
        DataType::Indirect => "indirect".to_string(),
        DataType::Callable => "callable".to_string(),
        DataType::ConstantExpression => "constant expression".to_string(),
        DataType::Void => "void".to_string(),
        DataType::Bool => "bool".to_string(),
        DataType::Ptr => "pointer".to_string(),
        DataType::Iterable => "iterable".to_string(),
        _ => "unknown".to_string(),
    }
}
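
The pointer arithmetic in observer_begin (stepping one zend_execute_data past the frame header and reading zvals from there) can be tried out on a mock frame. This is only a sketch under stated assumptions: MockZval, MockExecuteData and MockFrame are hypothetical types sized at 16 and 80 bytes to resemble a typical 64-bit build, and num_args is stored as a plain field here rather than inside This.u2.

```rust
/// Hypothetical 16-byte stand-in for zval: an 8-byte value plus a type tag.
#[repr(C)]
#[derive(Clone, Copy)]
struct MockZval {
    value: u64,
    type_tag: u32,
    extra: u32,
}

/// Hypothetical 80-byte stand-in for zend_execute_data.
#[repr(C)]
struct MockExecuteData {
    fields: [u64; 9],
    num_args: u32,
    _pad: u32,
}

/// A call frame: the header immediately followed by the argument slots.
#[repr(C)]
struct MockFrame {
    header: MockExecuteData,
    slots: [MockZval; 3],
}

/// Reads the type tag of every argument by stepping past the header,
/// the same way observer_begin does with `execute_data.add(1)`.
unsafe fn arg_type_tags(execute_data: *const MockExecuteData) -> Vec<u32> {
    let mut tags = Vec::new();
    unsafe {
        let num_args = (*execute_data).num_args as usize;
        // One header-sized step lands on the first argument slot.
        let first_arg = execute_data.add(1) as *const MockZval;
        for i in 0..num_args {
            tags.push((*first_arg.add(i)).type_tag);
        }
    }
    tags
}

/// Builds a frame with two arguments and reads their tags back.
fn demo_tags() -> Vec<u32> {
    let frame = MockFrame {
        header: MockExecuteData { fields: [0; 9], num_args: 2, _pad: 0 },
        slots: [
            MockZval { value: 0, type_tag: 6, extra: 0 },   // 6 = IS_STRING
            MockZval { value: 123, type_tag: 4, extra: 0 }, // 4 = IS_LONG
            MockZval { value: 0, type_tag: 0, extra: 0 },   // unused slot
        ],
    };
    // Derive the pointer from the whole frame so the reads stay in bounds.
    let header = &frame as *const MockFrame as *const MockExecuteData;
    unsafe { arg_type_tags(header) }
}
```

In php-src itself the header size is rounded up to a whole number of zval slots (ZEND_CALL_FRAME_SLOT). As far as we can tell, arguments passed beyond a user function's declared parameters are stored after its local variables, so this simple stepping is only expected to cover the declared parameters.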

After running cargo build and php -c php.ini test.php, we can see the argument types, as shown below.
Intercepted call to Me\T::__construct: args=[]
Intercepted call to Me\T::test_function: args=["string", "long"]
Intercepted call to Me\T::test_function: args=["Me\\T", "stdClass"]
string(5) "hello"
string(19) "Type runner: hello!"
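
The labels in the output above ultimately come from the type tag stored in each zval, which ext-php-rs exposes as DataType. As a rough illustration of what that abstraction hides, the sketch below maps the raw tag values from Zend/zend_types.h to the same labels. The function name type_label is made up for this example, and objects are reported simply as "object" because resolving the class name needs the object itself.

```rust
// Type tags as defined in Zend/zend_types.h.
const IS_UNDEF: u8 = 0;
const IS_NULL: u8 = 1;
const IS_FALSE: u8 = 2;
const IS_TRUE: u8 = 3;
const IS_LONG: u8 = 4;
const IS_DOUBLE: u8 = 5;
const IS_STRING: u8 = 6;
const IS_ARRAY: u8 = 7;
const IS_OBJECT: u8 = 8;
const IS_RESOURCE: u8 = 9;
const IS_REFERENCE: u8 = 10;

/// Maps a zval's type_info to the label used in the trace output.
/// Only the low byte holds the type tag; the higher bits carry flags.
fn type_label(type_info: u32) -> &'static str {
    match (type_info & 0xff) as u8 {
        IS_UNDEF => "undefined",
        IS_NULL => "null",
        // PHP stores true and false as two separate tags; both are "bool" to us.
        IS_FALSE | IS_TRUE => "bool",
        IS_LONG => "long",
        IS_DOUBLE => "double",
        IS_STRING => "string",
        IS_ARRAY => "array",
        IS_OBJECT => "object",
        IS_RESOURCE => "resource",
        IS_REFERENCE => "reference",
        _ => "unknown",
    }
}
```

For example, type_label(0x106) still yields "string", since the flag bits above the low byte are masked off before matching.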
Final Words
With this post, we used ext-php-rs to create a simple extension that uses the Observer API to capture types of arguments in all function calls.
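As a possible next step toward adding type information, the intercepted calls could be aggregated per function and argument position. The sketch below is not part of the extension: TypeLog, record, summary and demo_summary are hypothetical names, and the input mirrors the two test_function calls from the output above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collects, for each function, the set of type names seen at each argument position.
#[derive(Default)]
struct TypeLog {
    seen: BTreeMap<String, Vec<BTreeSet<String>>>,
}

impl TypeLog {
    /// Records the argument types of one intercepted call.
    fn record(&mut self, function: &str, arg_types: &[&str]) {
        let slots = self.seen.entry(function.to_string()).or_default();
        // Grow the per-position list if this call passed more arguments than seen so far.
        if slots.len() < arg_types.len() {
            slots.resize_with(arg_types.len(), BTreeSet::new);
        }
        for (slot, ty) in slots.iter_mut().zip(arg_types) {
            slot.insert(ty.to_string());
        }
    }

    /// Renders each argument position as a sorted union of the types seen there.
    fn summary(&self, function: &str) -> Vec<String> {
        self.seen
            .get(function)
            .map(|slots| {
                slots
                    .iter()
                    .map(|set| set.iter().cloned().collect::<Vec<String>>().join("|"))
                    .collect::<Vec<String>>()
            })
            .unwrap_or_default()
    }
}

/// Feeds the two test_function calls from the sample output into the log.
fn demo_summary() -> Vec<String> {
    let mut log = TypeLog::default();
    log.record("Me\\T::test_function", &["string", "long"]);
    log.record("Me\\T::test_function", &["Me\\T", "stdClass"]);
    log.summary("Me\\T::test_function")
}
```

With the sample calls, the first argument is summarized as Me\T|string and the second as long|stdClass, which is the kind of union a later tool could turn into type declarations.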
You can find the repo here.
Happy Hacking!