Building a PHP extension in Rust

What are we building?

We are building a PHP extension that intercepts all non-internal function calls and records their types. This extension will provide insights into the types of function calls made by PHP applications, which can later be used to add type information to PHP applications.

We will build it in Rust using the ext-php-rs crate, making use of the Observer API introduced by Datadog in PHP 8.0.

Setting up the environment

We will use rustup to install Rust and devenv.sh to set up the development environment. You can take a look at [this]() commit to check the devenv.nix file.

Building a hello world extension

ext-php-rs makes it straightforward to create an extension. As the docs show, we define a module with the get_module function, as shown below.

#![cfg_attr(windows, feature(abi_vectorcall))]
use ext_php_rs::prelude::*;

#[php_function]
pub fn type_runner(name: &str) -> String {
    format!("Type runner: {}!", name)
}

#[php_module]
pub fn get_module(module: ModuleBuilder) -> ModuleBuilder {
    println!("Hello, world!");
    module.function(wrap_function!(type_runner))
}

The #[php_function] attribute makes this function callable from PHP code. We also need to register it in the get_module function.

Let’s build the extension and run it.

cargo build

We can enable the extension by editing the php.ini file as shown below.

extension=target/debug/libkut_type_runner.so

Now let’s run the code below.

<?php
// test.php
var_dump(type_runner("hello"));

We run it with our php.ini:

php -c php.ini test.php

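We should now see the result of the call, string(19) "Type runner: hello!", printed to the console, along with the Hello, world! line from get_module.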

Using the Observer API

Now that our basic extension is working, let’s use the Observer API to capture the types of the arguments in all function calls. With the Observer API, an extension can register itself as a function-call observer when the PHP process starts, using the zend_observer_fcall_register function. Let’s take a look at zend_observer.h:

typedef void (*zend_observer_fcall_begin_handler)(zend_execute_data *execute_data);
typedef void (*zend_observer_fcall_end_handler)(zend_execute_data *execute_data, zval *retval);

typedef struct _zend_observer_fcall_handlers {
	zend_observer_fcall_begin_handler begin;
	zend_observer_fcall_end_handler end;
} zend_observer_fcall_handlers;

/* If the fn should not be observed then return {NULL, NULL} */
typedef zend_observer_fcall_handlers (*zend_observer_fcall_init)(zend_execute_data *execute_data);

// Call during minit/startup ONLY
ZEND_API void zend_observer_fcall_register(zend_observer_fcall_init);

As you can see, zend_observer_fcall_register takes a pointer to a function of type zend_observer_fcall_init, which receives a zend_execute_data struct (information about the called function) and returns a zend_observer_fcall_handlers struct. That struct holds two function pointers, begin and end, which are called at the beginning and end of a function call, respectively. Because the init function gets to inspect the called function before deciding whether to observe it, we can return {NULL, NULL} for functions we are not interested in and avoid the overhead of observing them. Finally, the observer has to be registered during the module startup step, also called MINIT.
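Before touching the real API, the flow described above can be sketched as a toy model in plain Rust. Nothing here talks to PHP: FunctionInfo, Handlers, Engine, and observe_user_functions are hypothetical stand-ins for zend_execute_data, zend_observer_fcall_handlers, the engine, and our init callback. The real engine also caches the handlers per function, which this sketch skips.

```rust
// Toy model of the observer registration flow. No PHP is involved:
// all names below are hypothetical stand-ins for the Zend structures.

/// What the engine knows about a function when it is called.
pub struct FunctionInfo {
    pub name: &'static str,
    pub is_internal: bool,
}

/// Mirrors the {begin, end} pair: `None` plays the role of NULL.
pub struct Handlers {
    pub begin: Option<fn(&FunctionInfo, &mut Vec<String>)>,
    pub end: Option<fn(&FunctionInfo, &mut Vec<String>)>,
}

pub type InitFn = fn(&FunctionInfo) -> Handlers;

#[derive(Default)]
pub struct Engine {
    observers: Vec<InitFn>,
    pub log: Vec<String>,
}

impl Engine {
    /// Analogue of zend_observer_fcall_register: meant for startup only.
    pub fn register(&mut self, init: InitFn) {
        self.observers.push(init);
    }

    /// Simulates one function call: ask each observer for its handlers,
    /// then run `begin`, the function body, and `end` when present.
    pub fn call(&mut self, func: &FunctionInfo) {
        let handlers: Vec<Handlers> = self.observers.iter().map(|init| init(func)).collect();
        for h in &handlers {
            if let Some(begin) = h.begin {
                begin(func, &mut self.log);
            }
        }
        // ... the function body would execute here ...
        for h in &handlers {
            if let Some(end) = h.end {
                end(func, &mut self.log);
            }
        }
    }
}

pub fn log_begin(func: &FunctionInfo, log: &mut Vec<String>) {
    log.push(format!("begin {}", func.name));
}

/// Init callback: skip internal functions by returning no handlers.
pub fn observe_user_functions(func: &FunctionInfo) -> Handlers {
    if func.is_internal {
        Handlers { begin: None, end: None }
    } else {
        Handlers { begin: Some(log_begin), end: None }
    }
}
```

Registering observe_user_functions and then calling a user function and an internal one leaves a single "begin ..." entry in the log, because the internal function never gets a handler in the first place.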

Now that we know what to do, let’s implement it in Rust by mimicking the C API.

// Define zend_observer_fcall_handlers, which holds the begin and end handlers
#[repr(C)]
pub struct zend_observer_fcall_handlers {
    pub begin: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data)>,
    pub end: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data, retval: *mut zval)>,
}

// Define the zend_observer_fcall_init function
unsafe extern "C" fn zend_observer_fcall_init(_execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers {
    zend_observer_fcall_handlers {
        begin: Some(observer_begin),
        // We simply skip the handler for function ending
        end: None,
    }
}

// Define zend_observer_fcall_register
unsafe extern "C" {
    fn zend_observer_fcall_register(init: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers>);
}

// Define zend_observer_fcall_begin_handler
unsafe extern "C" fn observer_begin(_execute_data: *mut zend_execute_data) {
    println!("Function called");
}
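A note on why the handlers are declared as Option<unsafe extern "C" fn(...)>: Rust function pointers can never be null, and Option around one is guaranteed to have the same size, with None represented as the null pointer. That makes it a natural match for C's nullable function pointers. The small check below illustrates this; BeginHandler, noop, and the helper functions are hypothetical names, and c_void stands in for zend_execute_data so the sketch compiles on its own.

```rust
use std::mem::size_of;
use std::os::raw::c_void;

// A nullable C function pointer, shaped like the `begin` handler.
// `c_void` stands in for zend_execute_data to keep this self-contained.
pub type BeginHandler = Option<unsafe extern "C" fn(*mut c_void)>;

/// A do-nothing handler used only to have something non-null to pass around.
pub unsafe extern "C" fn noop(_execute_data: *mut c_void) {}

/// True when `Option<fn>` is exactly one pointer wide, so `None` can cross
/// the FFI boundary where C expects NULL.
pub fn option_fn_is_pointer_sized() -> bool {
    size_of::<BeginHandler>() == size_of::<*mut c_void>()
}

/// Mirrors how C code would test a handler against NULL before calling it.
pub fn is_set(handler: BeginHandler) -> bool {
    handler.is_some()
}
```

This is why returning end: None from the init function is the Rust spelling of "no end handler" in the C struct.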

Now the only thing left is to call zend_observer_fcall_register during module startup. Here we do it from the get_module function.

#![cfg_attr(windows, feature(abi_vectorcall))]
use ext_php_rs::prelude::*;
use ext_php_rs::ffi::{zend_execute_data, zval};

#[php_function]
pub fn type_runner(name: &str) -> String {
    format!("Type runner: {}!", name)
}

unsafe extern "C" fn observer_begin(_execute_data: *mut zend_execute_data) {
    println!("Function called");
}

#[repr(C)]
pub struct zend_observer_fcall_handlers {
    pub begin: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data)>,
    pub end: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data, retval: *mut zval)>,
}

unsafe extern "C" fn zend_observer_fcall_init(_execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers {
    zend_observer_fcall_handlers {
        begin: Some(observer_begin),
        end: None,
    }
}

unsafe extern "C" {
    fn zend_observer_fcall_register(init: Option<unsafe extern "C" fn(execute_data: *mut zend_execute_data) -> zend_observer_fcall_handlers>);
}

#[php_module]
pub fn get_module(module: ModuleBuilder) -> ModuleBuilder {
    unsafe {
        zend_observer_fcall_register(Some(zend_observer_fcall_init));
    }
    module.function(wrap_function!(type_runner))
}

Let’s also modify our test.php to add a few function calls for testing.

<?php

namespace Me;

class T {
    function __construct() {
    }
    function test_function($arg1, $arg2) {
        return "Inside test_function";
    }
}

$t = new T();
$t->test_function("hello", 123);
$t->test_function($t, new \stdClass());

// This shouldn't be captured
$n = ltrim(" hello");
var_dump($n);
// Function we defined
var_dump(type_runner("hello"));

After running cargo build and php -c php.ini test.php, we can see that our handlers are being called :D

Function called
Function called
Function called
Function called
Function called
Function called
string(5) "hello"
Function called
Function called
string(19) "Type runner: hello!"

Note: If you are interested in the design decisions behind the Observer API, please take a look at this page.

Capturing argument types

Now that our handlers are getting called, let’s add logic to print the class name, the function name, and the types of the arguments. The code, with explanations as comments, is shown below.

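// Additional imports needed by this version of the handler
use std::ffi::CStr;
use ext_php_rs::flags::DataType;
use ext_php_rs::types::Zval;

unsafe extern "C" fn observer_begin(execute_data: *mut zend_execute_data) {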
    /* The structure of zend_function can be found at https://github.com/php/php-src/blob/PHP-8.0/Zend/zend_compile.h#L484
    union _zend_function {
	    zend_uchar type;	/* MUST be the first element of this struct! */
	    uint32_t   quick_arg_flags;

	    struct {
	    	zend_uchar type;  /* never used */
	    	zend_uchar arg_flags[3]; /* bitset of arg_info.pass_by_reference */
	    	uint32_t fn_flags;
	    	zend_string *function_name;
	    	zend_class_entry *scope;
	    	zend_function *prototype;
	    	uint32_t num_args;
	    	uint32_t required_num_args;
	    	zend_arg_info *arg_info;  /* index -1 represents the return value info, if any */
	    	HashTable   *attributes;
	    } common;

	    zend_op_array op_array;
	    zend_internal_function internal_function;
    }; 
    */
    let func = unsafe { (*execute_data).func };
    if func.is_null() {
        return;
    }

    // Only capture user-defined (non-internal) functions (ZEND_USER_FUNCTION = 2)
    let type_ = unsafe { (*func).type_ };
    if type_ != 2 {
        return;
    }

    /*
    https://github.com/php/php-src/blob/e4098da58a9eaee759d728d98a27d809cde37671/Zend/zend.h#L147
    struct _zend_class_entry {
	    char type;
	    zend_string *name;
	    /* class_entry or string depending on ZEND_ACC_LINKED */
	    union {
	    	zend_class_entry *parent;
	    	zend_string *parent_name;
	    };
	    .....
    */
    let class_name = unsafe {
        let scope = (*func).common.scope;
        if !scope.is_null() {
            let class_name_ptr = (*scope).name;
            if !class_name_ptr.is_null() {
                Some(
                    CStr::from_ptr((*class_name_ptr).val.as_ptr() as *const _)
                        .to_string_lossy()
                        .into_owned(),
                )
            } else {
                None
            }
        } else {
            None
        }
    };

    let func_name_ptr = unsafe { (*func).common.function_name };
    if func_name_ptr.is_null() {
        return;
    }

    // Get function name
    let name = unsafe {
        CStr::from_ptr((*func_name_ptr).val.as_ptr() as *const _)
            .to_string_lossy()
            .into_owned()
    };

    // Don't trace our own function
    if name == "type_runner" {
        return;
    }

    // Get the number of arguments
    /*
    https://github.com/php/php-src/blob/PHP-8.0/Zend/zend_compile.h#L505
    struct _zend_execute_data {
	    const zend_op       *opline;           /* executed opline                */
	    zend_execute_data   *call;             /* current call                   */
	    zval                *return_value;
	    zend_function       *func;             /* executed function              */
	    zval                 This;             /* this + call_info + num_args    */
	    zend_execute_data   *prev_execute_data;
	    zend_array          *symbol_table;
	    void               **run_time_cache;   /* cache op_array->run_time_cache */
	    zend_array          *extra_named_params;
    };
    */
    let num_args = unsafe { (*execute_data).This.u2.num_args };
    let mut args = Vec::new();

    // The argument slots start right after the zend_execute_data structure on the VM stack,
    // so advance the pointer by exactly one zend_execute_data and treat the result as a zval pointer.
    let first_arg_ptr = unsafe { execute_data.add(1) as *mut zval };
    for i in 0..num_args {
        // Step forward i zvals from the first argument slot.
        let arg_ptr = unsafe { first_arg_ptr.add(i as usize) };
        // https://github.com/php/php-src/blob/master/Zend/zend_types.h#L344
        let val = unsafe { &*(arg_ptr as *const Zval) };
        args.push(zval_to_string(val));
    }

    let msg = if let Some(class_name2) = class_name {
        format!("Intercepted call to {}::{}: args={:?}\n", class_name2, &name, args)
    } else {
        format!("Intercepted call to {}: args={:?}\n", &name, args)
    };

    print!("{}", msg);
}

// Since we are only interested in the types of the arguments, we only check the type and return its name, without looking at the contents.
pub fn zval_to_string(zv: &Zval) -> String {
    // https://github.com/php/php-src/blob/master/Zend/zend_types.h#L1069
    // https://github.com/php/php-src/blob/master/Zend/zend_types.h#L609
    match zv.get_type() {
        DataType::Undef => "undefined".to_string(),
        DataType::Null => "null".to_string(),
        DataType::False => "bool".to_string(),
        DataType::True => "bool".to_string(),
        DataType::Long => "long".to_string(),
        DataType::Double => "double".to_string(),
        DataType::String => "string".to_string(),
        DataType::Array => "array".to_string(),
        DataType::Object(_) => zv
            .object()
            .and_then(|obj| obj.get_class_name().ok())
            .unwrap_or_else(|| "object".to_string()),
        DataType::Resource => "resource".to_string(),
        DataType::Reference => "reference".to_string(),
        DataType::Indirect => "indirect".to_string(),
        DataType::Callable => "callable".to_string(),
        DataType::ConstantExpression => "constant expression".to_string(),
        DataType::Void => "void".to_string(),
        DataType::Bool => "bool".to_string(),
        DataType::Ptr => "pointer".to_string(),
        DataType::Iterable => "iterable".to_string(),
        _ => "unknown".to_string(),
    }
}
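The pointer arithmetic used to reach the arguments is the least obvious part of the handler, so here is a stand-alone model of it. This is only a sketch: FrameHeader, Slot, and Frame are hypothetical stand-ins for zend_execute_data, zval, and the call frame, and the layout (a header followed directly by 16-byte slots) is an assumption that mirrors what the handler above relies on.

```rust
// Stand-alone model of the call-frame layout: a fixed header followed
// directly by 16-byte argument slots. All types here are hypothetical.

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Slot {
    pub value: u64,
    pub type_tag: u32, // 4 = long, 6 = string, as in PHP's IS_LONG / IS_STRING
    pub extra: u32,
}

#[repr(C)]
pub struct FrameHeader {
    pub func_id: u64,
    pub num_args: u32,
    pub _pad: u32,
}

#[repr(C)]
pub struct Frame {
    pub header: FrameHeader,
    pub slots: [Slot; 2],
}

/// Reads the type tags of the arguments stored right after the header.
///
/// # Safety
/// `header` must point at the header of a live `Frame` whose `num_args`
/// does not exceed the number of slots that follow it.
pub unsafe fn arg_type_tags(header: *const FrameHeader) -> Vec<u32> {
    unsafe {
        // `add(1)` advances by size_of::<FrameHeader>() bytes, which lands
        // on the first byte after the header: the start of slot 0.
        let first = header.add(1) as *const Slot;
        let n = (*header).num_args as usize;
        // From here on, `add(i)` advances in units of one Slot.
        (0..n).map(|i| (*first.add(i)).type_tag).collect()
    }
}

/// Builds a frame for a call like test_function("hello", 123) and reads it back.
pub fn demo_tags() -> Vec<u32> {
    let frame = Frame {
        header: FrameHeader { func_id: 1, num_args: 2, _pad: 0 },
        slots: [
            Slot { value: 0, type_tag: 6, extra: 0 },
            Slot { value: 123, type_tag: 4, extra: 0 },
        ],
    };
    // Derive the pointer from the whole frame so it may legally reach the slots.
    let header = &frame as *const Frame as *const FrameHeader;
    unsafe { arg_type_tags(header) }
}
```

One caveat worth checking against php-src: for user functions, arguments passed beyond the declared parameters are, as far as we know, relocated after the function's local variables, so the contiguous layout modeled here should only be trusted for the declared parameters.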

After running cargo build and php -c php.ini test.php, we can see the argument types in the output below.

Intercepted call to Me\T::__construct: args=[]
Intercepted call to Me\T::test_function: args=["string", "long"]
Intercepted call to Me\T::test_function: args=["Me\\T", "stdClass"]
string(5) "hello"
string(19) "Type runner: hello!"
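Since the end goal is to turn these observations into type information, one possible next step is to merge the types seen at each parameter position into a union. The sketch below is not part of the extension; TypeLog and its methods are hypothetical helpers that take the function name and type names exactly as they appear in the output above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Per-function record: for each parameter position, the set of type
/// names seen so far. Hypothetical helper, not part of the extension.
#[derive(Default)]
pub struct TypeLog {
    seen: BTreeMap<String, Vec<BTreeSet<String>>>,
}

impl TypeLog {
    /// Records one intercepted call and the type names of its arguments.
    pub fn record(&mut self, function: &str, arg_types: &[&str]) {
        let params = self.seen.entry(function.to_string()).or_default();
        // Grow the per-parameter list if this call passed more arguments.
        if params.len() < arg_types.len() {
            params.resize_with(arg_types.len(), BTreeSet::new);
        }
        for (slot, ty) in params.iter_mut().zip(arg_types) {
            slot.insert(ty.to_string());
        }
    }

    /// Renders each parameter as a union of the observed type names,
    /// sorted alphabetically and joined with `|`.
    pub fn unions(&self, function: &str) -> Vec<String> {
        self.seen
            .get(function)
            .map(|params| {
                params
                    .iter()
                    .map(|set| set.iter().cloned().collect::<Vec<String>>().join("|"))
                    .collect::<Vec<String>>()
            })
            .unwrap_or_default()
    }
}
```

Feeding it the two test_function calls from the output gives Me\T|string for the first parameter and long|stdClass for the second. Note that "long" is the engine's internal name; in a PHP type declaration it would be written as int.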

Final Words

In this post, we used ext-php-rs to create a simple extension that uses the Observer API to capture the types of the arguments in all function calls. You can find the repo here.

Happy Hacking!