How to Create a Custom Generator Module

This guide walks through creating a custom parseLab generator module. Creating a custom generator module can be an exhaustive task depending on the target backend, so here we will work with only a subset of the features exposed by the parseLab framework. It is recommended that you read through the generator module for the Hammer library as a reference for your own generator module.

What is a Custom Generator Module?

The goal of parseLab is to provide a framework for developers to build a code generator that can generate source code for a target parsing library/language. For example, the Hammer-based generator module generates C source code that implements the logic for parsing byte sequences using the Hammer parsing library. Another example is the Daedalus-based generator module, which generates source code in the Daedalus Data Description Language, which is also used for parsing byte sequences.

The generator module has three main responsibilities:

Generating a parser - When supplied information about the protocol, a generator module should be able to output source code for the parsing rules for parsing the messages in the target protocol.
Generating a test - When supplied a series of test data, a generator module should be able to output source code for consuming test data and passing it through a parser in the target backend.
Running a test - When supplied a test, a generator module should know how to compile/execute a test file for observation by the user.
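The three responsibilities can be pictured as three methods on one class. The sketch below is only an illustration: generate_parser and generate_test are method names used later in this guide, while ToyGenerator, run_test, and the returned values are hypothetical stand-ins rather than parseLab's actual interface.

```python
class ToyGenerator:
    """Hypothetical stand-in for a generator module; not parseLab code."""

    def generate_parser(self):
        # Responsibility 1: emit parser source code and report the files created
        return ['udp_parser.py']

    def generate_test(self, testcase_dir):
        # Responsibility 2: emit a test that feeds test data to the parser
        return [testcase_dir + '/test_udp.py']

    def run_test(self, test_file):
        # Responsibility 3: compile/execute the test and report the outcome
        # (run_test is an assumed name; check ParselabGenerator for the real one)
        return 'would execute ' + test_file


gen = ToyGenerator()
parser_files = gen.generate_parser()
test_files = gen.generate_test('testcases/validTest')
report = gen.run_test(test_files[0])
print(parser_files, test_files, report)
```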

Generator Module Interface

Since every implementation of a parseLab generator module will be different, the definition of a module is fairly loose. Every generator module is a derived class of the ParselabGenerator class, which defines the interface that custom modules must adhere to. It is important that generator modules align with the parent class's interface because common driver scripts dynamically load the target module, run the expected functions, and parse the expected output.

Along with the interface requirement, there is also universal data that gets passed into each generator module. This universal data is the information stored in parseLab that defines the properties of the protocol targeted by the parseLab driver scripts.

Creating a Generator Module

Now that we have explored what a generator module does and the basics of how to implement it, we will create another generator for the Hammer library, but this time we will use the python bindings rather than the C library.

We will not go over how to install Hammer or the python bindings here, but please look at the install guide for help on Hammer, then look at the Hammer repo for instructions for setting up the python bindings.

Setup Logic

First we need to create a generator module. A driver script generates it from a boilerplate for new modules. I will call this new generator pyHammer.

# Go to bin directory of parseLab
cd ${PARSELAB_TOP}/bin

# Run the create_generator script and specify the name of the new module
./create_generator.py --name pyHammer

# Observe the new directory and its files
ls ../generators/pyHammer
> __init__.py  PyhammerGenerator.py  setup_data/

We can see here that this created a couple new files and a new sub directory.

The __init__.py can be ignored; the PyhammerGenerator.py is our main focus now. We will discuss the setup_data/ directory later on in the guide.

Take note of the name change, pyHammer -> PyhammerGenerator. When creating a generator, parseLab modifies the name so that the generated class follows a CamelCase style with Generator appended to it. All custom modules should follow this structure for best usage of parseLab.
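The renaming can be reproduced with a one-line helper. This is a guess at the convention based only on the pyHammer example above; to_generator_class_name is a hypothetical name, and create_generator.py may well derive the class name differently.

```python
def to_generator_class_name(module_name):
    # str.capitalize() upper-cases the first character and lower-cases the
    # rest, which matches the pyHammer -> Pyhammer example in this guide
    return module_name.capitalize() + 'Generator'


class_name = to_generator_class_name('pyHammer')
print(class_name)
```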

If we open the PyhammerGenerator.py, we can see that there is already a class and some logic defined in it. We'll first take a look at the __init__() function.

class PyhammerGenerator(ParselabGenerator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.backend_name = "pyHammer"
        self.protocol_name = os.path.basename(self.protocol_dir)

We immediately notice that this code is generated based on the name, pyHammer, which we supplied to the create_generator.py script. We can also see that our custom module is a child class of ParselabGenerator. In __init__(), we call the super class's __init__() function. Let's check that out now.

class ParselabGenerator:
    '''This is the Abstract class which all parseLab generator modules are to derive from.
    Adhering to the interface provided here will allow your generator module to leverage all of the capabilities found
    within parseLab.'''
    def __init__(self, protocol_dir=None, is_stateful=True, logger=None, debug_mode=False):
        self.log = logger
        if self.log is None:
            self.log = ParselabLogger(print_logs=False)
        self.spec_data = None
        self.debug_mode = debug_mode
        self.backend_name = ''
        self.is_stateful = is_stateful
        self.protocol_dir = ''
        self.set_protocol_directory(protocol_dir)

self.log - This is essentially a wrapper around python's logging library, you can find the code for this here
self.spec_data - spec_data is an object of ProtocolSpecData type, which contains all of the information for the message types defined in the protocol directory. If you are unsure about what a protocol directory is, please refer to the guide for making a protocol directory for UDP
self.debug_mode - This is a boolean flag to tell the generator module whether or not we want the generated parser/test to provide a more verbose output. Implementation of this is completely user-driven
self.backend_name - This is a string that can be used by your generator module to reference itself by name, should it need to do that
self.is_stateful - This is a flag that defines whether the target protocol is stateful. Its value should inform the generator module whether it needs to consider semantic constraints on sequences of messages. We will not get into that here.
self.set_protocol_directory() - This will inform the generator module what the target protocol directory is; this is useful because the protocol directory is essentially the working space for the generator modules to place all their generated information for the target protocol specification.
self.protocol_dir - This is a string for the path to the target protocol specification directory.

With this established, we understand the member variables that we have access to when we are working with our custom generator module, PyhammerGenerator.
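One detail worth seeing in isolation is how protocol_name falls out of protocol_dir in the generated __init__(). The sketch below uses only os.path; protocol_name_from_dir is a hypothetical helper, and whether parseLab normalizes the path before storing it is not shown in this guide, so the trailing-slash case is included as a caution rather than a known problem.

```python
import os


def protocol_name_from_dir(protocol_dir):
    # normpath strips a trailing slash; without it,
    # os.path.basename('../protocols/udp/') returns an empty string
    return os.path.basename(os.path.normpath(protocol_dir))


name_plain = protocol_name_from_dir('../protocols/udp')
name_slash = protocol_name_from_dir('../protocols/udp/')
raw_slash = os.path.basename('../protocols/udp/')
print(repr(name_plain), repr(name_slash), repr(raw_slash))
```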

The rest of the information in the ParselabGenerator module can be ignored since we are going to be overriding the functions anyway. In fact, we will now start that process with the get_setup_directory() method.

class PyhammerGenerator(ParselabGenerator):
    # ...

    @staticmethod
    def get_setup_directory():
        raise NotImplementedError()

This function returns a path to a directory that contains any files needed for generating the source code for our target backend. For example, the C-based Hammer generator has a generic Makefile that can be used to compile the code generated by that module. In this case we have no such files yet, but the function still needs to point to something, even if it is an empty directory.

class PyhammerGenerator(ParselabGenerator):
    # ...

    @staticmethod
    def get_setup_directory():
        this_file_filepath = os.path.realpath(__file__)
        this_file_dirpath = os.path.dirname(this_file_filepath)
        setup_data_dirpath = os.path.join(this_file_dirpath, 'setup_data')
        return setup_data_dirpath

With this method in place, we can use the driver scripts that rely on it. The first driver we will use is setup.py in the bin/ directory of parseLab. First, make sure that parseLab has identified your new generator module. Passing --list to almost any of the drivers lists all of the available generator modules along with the shorthand strings that can be used with the --module argument.

# Go to parselab/bin
cd ${PARSELAB_TOP}/bin

./setup.py --list

If you are following this guide with pyHammer, you should see the following line:

generators.pyHammer.PyhammerGenerator ['PyhammerGenerator', 'Pyhammer', 'P', 'pyhammer', 'p']

Now let's use this generator module with the setup.py driver script. To do this, we will use the --module argument with a value of pyhammer. As shown in the --list output, we could have used any of the following:

--module generators.pyHammer.PyhammerGenerator
--module PyhammerGenerator
--module Pyhammer
--module P
--module pyhammer
--module p

However, the --module argument is not the only required argument for this driver. We also need a --protocol argument. If you don't understand what gets passed in with the --protocol argument yet, please walk through the guide for generating a protocol specification for UDP. For this driver, and the rest of this guide, we will use a UDP specification as our target protocol.

./setup.py --module pyhammer --protocol ../protocols/udp
> Created log file: /tmp/parselab/pasrelab_1682448661.log
> Created directory structure for parseLab module (PyhammerGenerator) into (/home/parsival/parselab/protocols/udp)

Upon running our driver script, we can see that two lines were printed. The reference to the log file tells the user where to find a log of all the actions that occurred during the driver's execution. The reference to the new directory structure is the creation of a new protocol directory that contains the starting files for working with UDP, and specifically with pyHammer. If we were to ls this new directory, we would see a protocol.json file and a new directory, pyHammer.

NOTE: If you already ran through the guide on creating a UDP protocol specification, this directory will already exist, and this step should simply create the pyHammer/ directory inside of it.

ls ../protocols/udp
> protocol.json  pyHammer

For now, we will ignore what each of these things are. If you would like to learn more about the contents of the protocol directory, please look into the guide for generating a protocol specification for UDP.

If this successfully generated a new directory ../protocols/udp and a subdirectory pyHammer, we are off to a good start.

Since this is not a guide on how to create a protocol specification, I am not going to go through how to create or modify the protocol.json file. If you already completed the UDP protocol specification guide, you can ignore this next step because it is a shortcut for some of the steps in that guide.

We will now copy the JSON file from parselab/examples/udp/protocol.json into the protocol.json in our new udp/ directory.

# Go to parseLab/bin
cd ${PARSELAB_TOP}/bin

cp ../examples/udp/protocol.json ../protocols/udp/protocol.json

Doing this allows parseLab to process the protocol.json file with information about our UDP protocol. Now that we have a protocol specification for our generator to generate code for, we need to make our generator module capable of generating code.

Parser Generation Logic

To do any parser generation, we will go back into the generators/pyHammer/PyhammerGenerator.py script. Our new focus is on the generate_parser(self) method.

class PyhammerGenerator(ParselabGenerator):
    # ...

    def generate_parser(self):
        self.log.info("Generating a %s parser" % (self.backend_name))

        if self.spec_data is None:
            err_msg = "There is no spec data set! Cannot generate a parser without spec data."
            self.log.error(err_msg)
            raise AttributeError(err_msg)

        # TODO: Place necessary logic to generate a parser in your target Data Description Language
        # TODO: Make sure to return a list of the files which you want to verify the creation of during testing
        raise NotImplementedError()

    # ...

We can see that there is already a little bit of logic in here. This logic just guarantees that the driver successfully populated our generator with the specification data found in our target protocol directory. You can keep this logic.

The logic in the rest of this guide is NOT efficient and should be treated only as a simple walkthrough rather than an explanation of good practices for code generation. Similarly, we will not expose all of the functionality of parseLab, only a subset of capabilities, for simplicity.

Remember, this is implemented as an interface, which is essentially a contract between the driver and the module stating that the driver-facing functions have a particular set of inputs and a known format for outputs. Because of this, the interface expects our function to return specific values rather than raising NotImplementedError().

The output of this function is going to be a list of the filepaths to the files that we generated with the function. I'm going to first create a variable named ret_files and declare it as an empty list, then just return it to make the interface happy.

class PyhammerGenerator(ParselabGenerator):
    # ...

    def generate_parser(self):
        self.log.info("Generating a %s parser" % (self.backend_name))

        if self.spec_data is None:
            err_msg = "There is no spec data set! Cannot generate a parser without spec data."
            self.log.error(err_msg)
            raise AttributeError(err_msg)

        ret_files = list()


        return ret_files
    # ...

Since this is the only obligation of the generate_parser() function, technically, we can run any drivers that want to run this function. Obviously, nothing is happening, but we can run the generate_parser.py driver to verify that we have "held our end of the contract", so to speak.

# Go to parselab/bin
cd ${PARSELAB_TOP}/bin

# Run the generate_parser driver
./generate_parser.py --module pyhammer --protocol ../protocols/udp
> #Log line
> Generated a parser for supplied module (generators.pyHammer.PyhammerGenerator)
> Generated files: []

We can see here that the driver script ran successfully and, as expected, nothing actually happened: no files were created. Now that we know we are upholding the contract with the interface, let's move into writing the logic for our generate_parser function.
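The boilerplate TODO says the returned list holds the files whose creation you want verified during testing. A check of that kind could look like the sketch below; missing_files is a hypothetical helper, not a parseLab function, and the file names are placeholders.

```python
import os
import tempfile


def missing_files(ret_files):
    # Return the subset of reported paths that do not exist on disk
    return [path for path in ret_files if not os.path.isfile(path)]


with tempfile.TemporaryDirectory() as tmp_dir:
    real_file = os.path.join(tmp_dir, 'udp_parser.py')
    with open(real_file, 'w') as f:
        f.write('# generated parser placeholder\n')
    ghost_file = os.path.join(tmp_dir, 'never_written.py')

    # An empty list passes trivially, which is why the stub above "works"
    nothing_missing = missing_files([])
    one_missing = missing_files([real_file, ghost_file])
    one_missing_names = [os.path.basename(p) for p in one_missing]

print(nothing_missing, one_missing_names)
```

This is also why reporting a path that was never written is worse than returning an empty list: an honest empty list passes, while a stale path would be flagged.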

Since we are creating a Hammer parser written in Python, we need to start writing some Python code. There are many options for how to structure this parser file, but I am going to write it as a function without a call to it. This means that we won't be able to run this module directly, but we can import the function into an executable script later.

I'll start by declaring a string variable that we will iteratively add content to which contains all of the info for the parser file. Then I'll throw in some basic strings to start it off.

class PyhammerGenerator(ParselabGenerator):
    # ...

    def generate_parser(self):
        self.log.info("Generating a %s parser" % (self.backend_name))

        if self.spec_data is None:
            err_msg = "There is no spec data set! Cannot generate a parser without spec data."
            self.log.error(err_msg)
            raise AttributeError(err_msg)

        ret_files = list()

        parser_text = ''
        parser_text += 'import hammer\n\n'
        parser_text += 'def init_{protocol_name}_parser():\n'.format(protocol_name=self.protocol_name)

        print(parser_text)

        return ret_files
    # ...

Instead of generating a file with the contents of the parser_text variable, I print out the variable. This is a nice way of testing without having to deal with the overhead of generating a file just yet. Now, if we run the generate_parser.py driver, it will print out the contents of parser_text.

# Go to parselab/bin
cd ${PARSELAB_TOP}/bin

# Run the generate_parser.py driver
./generate_parser.py --protocol ../protocols/udp --module pyhammer
> # Log line print statement...
> import hammer
>
> def init_udp_parser():
>
> # Driver print statements...

See how we can observe the contents of our variable by running the driver script? It is much easier than having to write to a file, then cat out the file. We will use this method until we are comfortable enough that the file could actually be used.

We are now going to take a step back and reformat it to be a little easier to modify as we continue.

    def generate_parser(self):
        self.log.info("Generating a %s parser" % (self.backend_name))

        if self.spec_data is None:
            err_msg = "There is no spec data set! Cannot generate a parser without spec data."
            self.log.error(err_msg)
            raise AttributeError(err_msg)

        # List of filepaths that we have generated with this function
        ret_files = list()

        # String to hold all of the lines of code that we will write out to the source code file
        parser_text = ''

        # list of imports that can be imported with "import <x>"
        simple_imports = ['hammer']
        # list of imports that can be imported with "from <x> import <y>", where each element
        #   is a tuple(x, y)
        from_imports = list()


        # Append the import information to the parser text
        for simple_import in simple_imports:
            parser_text += 'import %s\n' % (simple_import)
        parser_text += '\n'
        for from_import in from_imports:
            parser_text += 'from %s import %s\n' % (from_import[0], from_import[1])
        parser_text += '\n'
        
        parser_func = ''
        parser_func += 'def init_%s_parser():\n' % (self.protocol_name)

        parser_text += parser_func
        parser_text += '\n'

        print(parser_text)

        return ret_files

This new structure is a little more complicated, but it should generate the same code. Run the generate_parser.py driver again to verify.
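The import-rendering step can be tried in isolation as a plain function. This is a sketch of the same string-building idea, not code from parseLab; render_imports is a hypothetical name, and the os and sys entries are used only to exercise the from-import branch.

```python
def render_imports(simple_imports, from_imports):
    text = ''
    for module in simple_imports:
        text += 'import %s\n' % module
    text += '\n'
    for module, name in from_imports:
        # The trailing newline keeps each "from" import on its own line
        text += 'from %s import %s\n' % (module, name)
    text += '\n'
    return text


only_hammer = render_imports(['hammer'], [])
with_from = render_imports(['hammer'], [('os', 'path'), ('sys', 'argv')])
print(with_from)
```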

Now that we have a decent structure going on, let's talk about the ParselabLogger. I'm going to add a few log lines that I think might be helpful during debugging in the future.

    def generate_parser(self):
        self.log.info("Generating a %s parser" % (self.backend_name))

        if self.spec_data is None:
            err_msg = "There is no spec data set! Cannot generate a parser without spec data."
            self.log.error(err_msg)
            raise AttributeError(err_msg)

        # List of filepaths that we have generated with this function
        ret_files = list()

        # String to hold all of the lines of code that we will write out to the source code file
        parser_text = ''

        # list of imports that can be imported with "import <x>"
        simple_imports = ['hammer']
        # list of imports that can be imported with "from <x> import <y>", where each element
        #   is a tuple(x, y)
        from_imports = list()

        # Append the import information to the parser text
        if len(simple_imports) == 0:
            self.log.err("There are no simple imports; Must at least import hammer!")
        for simple_import in simple_imports:
            parser_text += 'import %s\n' % (simple_import)
        parser_text += '\n'

        if len(from_imports) == 0:
            self.log.warn("There are no from imports")
        for from_import in from_imports:
            parser_text += 'from %s import %s\n' % (from_import[0], from_import[1])
        parser_text += '\n'
        
        parser_func = ''
        parser_func += 'def init_%s_parser():\n' % (self.protocol_name)

        parser_text += parser_func
        parser_text += '\n'

        self.log.info("Completed generation of parser code")
        print(parser_text)

        return ret_files

If we run the generate_parser.py driver again, we won't be able to see these new log lines. We need to pass the --print argument to the driver for our log to show up in the console.

If we do that, we can observe our new print statements are shown:

(WARN) [generators.pyHammer.PyhammerGenerator::generate_parser:39] There are no from imports
(INFO) [generators.pyHammer.PyhammerGenerator::generate_parser:50] Completed generation of parser code

The format is fairly straightforward. The first element is either (INFO), (WARN), or (ERRO). The second element defines exactly where the log line comes from, in the form [path.to.module::function_name:line_in_file]. Last is the string that was passed to the info(), warn(), or err() function call.
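If you ever need to filter these log lines, the format described here is regular enough to match with a regular expression. The pattern below is derived only from the two sample lines above and is an assumption about the general format, not part of ParselabLogger; parse_log_line is a hypothetical helper.

```python
import re

# (LEVEL) [path.to.module::function_name:line_in_file] message
LOG_LINE = re.compile(r'^\((INFO|WARN|ERRO)\) \[([\w.]+)::(\w+):(\d+)\] (.*)$')


def parse_log_line(line):
    match = LOG_LINE.match(line)
    if match is None:
        return None
    level, module, function, lineno, message = match.groups()
    return {'level': level, 'module': module, 'function': function,
            'line': int(lineno), 'message': message}


parsed = parse_log_line(
    '(WARN) [generators.pyHammer.PyhammerGenerator::generate_parser:39] '
    'There are no from imports')
print(parsed)
```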

What we're going to do now is define a new function in our generator module that is responsible for building up the parse rules for all of the message types found in the protocol specification. We will replace the hard-coded declaration of the parser_func string with this method, then append its result to parser_text.

    def generate_parser(self):
        self.log.info("Generating a %s parser" % (self.backend_name))

        if self.spec_data is None:
            err_msg = "There is no spec data set! Cannot generate a parser without spec data."
            self.log.error(err_msg)
            raise AttributeError(err_msg)

        # List of filepaths that we have generated with this function
        ret_files = list()

        # String to hold all of the lines of code that we will write out to the source code file
        parser_text = ''

        # list of imports that can be imported with "import <x>"
        simple_imports = ['hammer']
        # list of imports that can be imported with "from <x> import <y>", where each element
        #   is a tuple(x, y)
        from_imports = list()

        # Append the import information to the parser text
        if len(simple_imports) == 0:
            self.log.err("There are no simple imports; Must at least import hammer!")
        for simple_import in simple_imports:
            parser_text += 'import %s\n' % (simple_import)
        parser_text += '\n'

        if len(from_imports) == 0:
            self.log.warn("There are no from imports")
        for from_import in from_imports:
            parser_text += 'from %s import %s\n' % (from_import[0], from_import[1])
        parser_text += '\n'
        
        parser_text += self.__generate_parser_func()
        parser_text += '\n'

        self.log.info("Completed generation of parser code")
        print(parser_text)

        return ret_files

    # NEW PRIVATE FUNCTION (a leading __ triggers name mangling, Python's convention for private methods)
    def __generate_parser_func(self):
        parser_func = ''

        # Create function definition
        parser_func += 'def init_%s_parser():\n' % (self.protocol_name)

        return parser_func

If we run this again with the generate_parser.py driver, we should still get the same output.
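A short aside on the double underscore: Python does not enforce privacy, it renames the attribute. The toy class below (Demo is a made-up name, unrelated to parseLab) shows that the method is stored under a mangled name, which is why the call works from inside the class but the plain name is not visible from outside.

```python
class Demo:
    def public(self):
        # Inside the class body, self.__helper is rewritten to self._Demo__helper
        return self.__helper()

    def __helper(self):
        return 'called'


demo = Demo()
via_public = demo.public()
has_plain_name = hasattr(demo, '__helper')
has_mangled_name = hasattr(demo, '_Demo__helper')
print(via_public, has_plain_name, has_mangled_name)
```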

Now let's take a look at how Hammer's Python library handles the creation of parsers. It is important to have a good grasp of this so that we can understand how to take the protocol specification and generate the parsers for each message, and subsequently for each field in the messages.

For simplicity, we are only going to make this generator capable of the limited features required to make a UDP parser with Hammer's Python library. So let's investigate all of the things we will need to do.

{
    "protocol_types": [
        {
            "name": "UDP_MESSAGE",
            "fields" : [
                {
                    "name": "SRC_PORT",
                    "type": "U16"
                },
                {
                    "name": "DEST_PORT",
                    "type": "U16"
                },
                {
                    "name": "LENGTH",
                    "type": "U16",
                    "value": "(1,512)",
                    "dependee": true
                },
                {
                    "name": "CHECKSUM",
                    "type": "U16"
                },
                {
                    "name": "DATA",
                    "type": "U8[LENGTH]"
                }
            ]
        }
    ]
}
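The type and value strings in the specification above are compact encodings. As a rough illustration of what they carry, the sketch below decodes the two forms that appear in this UDP example. It is not parseLab's own parser (parseLab exposes this information through spec_data); parse_type and parse_range are hypothetical helpers that only cover the unsigned types and the range form shown here.

```python
import re

# Matches "U16" or "U8[LENGTH]": unsigned width plus an optional list count
TYPE_PATTERN = re.compile(r'^U(\d+)(?:\[(\w+)\])?$')


def parse_type(type_str):
    match = TYPE_PATTERN.match(type_str)
    if match is None:
        raise ValueError('unsupported type string: %s' % type_str)
    bits, count = match.groups()
    return {'bits': int(bits), 'count': count}


def parse_range(value_str):
    # "(1,512)" -> (1, 512)
    low, high = value_str.strip('()').split(',')
    return int(low), int(high)


u16 = parse_type('U16')
data = parse_type('U8[LENGTH]')
length_range = parse_range('(1,512)')
print(u16, data, length_range)
```

A count that names another field, as in U8[LENGTH], is what makes DATA depend on a value parsed earlier in the message.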

"type": "U16" - parse rule for value-ambiguous fields of type unsigned 16-bit
"type": "U8" - parse rule for value-ambiguous fields of type unsigned 8-bit integers
"type": "U8[LENGTH]" - parse rule for array parsing
"type": "U8[LENGTH]" - parse rule for distant-dependent fields
"value": "(1,512)" - parse rule for constraining a value between two integers

Let's go through each one and understand how the Hammer Python library resolves each parser requirement.

UDP Requirement	Hammer Rule
"type": "U16"	`hammer.uint16()`
"type": "U8"	`hammer.uint8()`
"type": "U8[LENGTH]"	NOT POSSIBLE
"value": "(1,512)"	`hammer.int_range(hammer.uint16(), 1, 512)`

Unfortunately, it seems as though the Python binding for Hammer does not support dependent array length definitions. However, that is not a show-stopper for parseLab. Generator modules are not required to fulfill every capability of parseLab. This just means that the generated parser will not be as robust as we would have liked. We are lucky that the data field is placed at the end of the format, because that lets us work around the issue reasonably well: we will replace this array with a repeating stream of unsigned 8-bit integers that only stops at the end of the input stream.

With that out of the way, we can begin building our parser generator which is capable of stringing together parse rules of these types. We will go back to working on the PyhammerGenerator::__generate_parser_func() function.

from src.utils.Value import ValueRange

# ...

def __generate_parser_func(self):
    parser_func = ''

    # Create function definition
    parser_func += 'def init_%s_parser():\n' % (self.protocol_name)

    # Make final parse rule 
    msg_parser_names = [msg_type.name + '_parser' for msg_type in self.spec_data.message_types]
    protocol_parser_rule_name = '%s_parser' % (self.protocol_name)
    protocol_parser_rule = '%s = hammer.sequence(%s)' % (protocol_parser_rule_name, \
                                                          ', '.join(msg_parser_names))
    message_parser_rules = list()
    field_parser_rules = list()
    ## Iterate through the message types specified in the spec_data
    for msg_type in self.spec_data.message_types:
        fld_rules = list()
        # Iterate over the fields in this message type
        for field in msg_type.fields:
            dtype = field.dtype
            size = dtype.get_size_in_bits()
            signed_prefix = 'u' if not dtype.signed else ''
            value = field.value_def.value
            h_type = ''
            # Hammer has library functions for parsing ints vs floats
            if dtype.is_int:
                # Hammer only has simple integer parsers for these values
                if size not in [8, 16, 32, 64]:
                    raise NotImplementedError("Generator does not support size=%d" % size)
                # Define the data type according to hammer's library
                h_type = 'hammer.%sint%d()' % (signed_prefix, size)
                # Hammer can support series of a data type, AKA a list
                if dtype.is_list:
                    # Dependencies are covered in the UDP_protocol_specification.md guide
                    if dtype.has_type_dependency:
                        # Not possible in pyHammer, replace with a many() combinator
                        ## Note: this will only work if field is at the end of the message frame
                        h_type = 'hammer.many(%s)' % (h_type)
                    else:
                        h_type = 'hammer.repeat_n(%s, %s)' % (h_type, dtype.list_count) 
                else:
                    # Add a value constraint if exists
                    if value is not None:
                        # parseLab has multiple Value types, but we will only do ValueRange right now
                        if isinstance(value, ValueRange):
                            h_type = 'hammer.int_range(%s, %s, %s)' % (h_type, value.min_bound, value.max_bound)
                        else:
                            raise NotImplementedError("Generator does not support value type=%s" % type(value))
            else:
                raise NotImplementedError("Generator does not support non-integer dtypes")
            # Build the parse rule for the field and put it in our array
            field_rule_name = '%s__%s_parser' % (msg_type.name, field.name)
            field_rule = '%s = ' % (field_rule_name)
            field_rule += h_type
            field_parser_rules.append(field_rule)
            fld_rules.append(field_rule_name)
        # Add an "end stream" parser to the end of the field parser list
        fld_rules.append('hammer.end_p()')
        field_rule_list_str = ', '.join(fld_rules)
        # Create the message rule as a sequence of all its field parsers
        msg_rule = '%s_parser = hammer.sequence(%s)' % (msg_type.name, field_rule_list_str)
        message_parser_rules.append(msg_rule)
    # Write out all of the parsers to the running text string
    tab = ' '*4
    parser_func += tab + '\n    '.join(field_parser_rules) + '\n\n'
    parser_func += tab + '\n    '.join(message_parser_rules) + '\n\n'
    parser_func += tab + protocol_parser_rule + '\n'
    # The function we are building must return the final parser for it to be used
    parser_func += tab + 'return %s' % (protocol_parser_rule_name)
    return parser_func

That was a lot all at once. I would recommend putting this into your generator module and running the generate_parser.py driver again. Observing its output while reading through the comments should be enough to get a grasp on the basic concepts shown here.
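To see the shape of the field rules without spec_data or Hammer installed, here is a stripped-down sketch of the same string-building logic. The field table is hand-written from the UDP specification, field_rule is a hypothetical helper, and the real generator's output depends on what spec_data actually provides, so treat the printed lines as the expected shape rather than a guaranteed result.

```python
def field_rule(msg_name, field_name, bits, value_range=None, dependent_list=False):
    # Same idea as __generate_parser_func(): build one rule per field as text
    h_type = 'hammer.uint%d()' % bits
    if dependent_list:
        # Workaround for the dependent-length array: consume until end of input
        h_type = 'hammer.many(%s)' % h_type
    elif value_range is not None:
        h_type = 'hammer.int_range(%s, %d, %d)' % (h_type, value_range[0], value_range[1])
    return '%s__%s_parser = %s' % (msg_name, field_name, h_type)


# (name, bits, value_range, dependent_list) for each UDP field
UDP_FIELDS = [
    ('SRC_PORT', 16, None, False),
    ('DEST_PORT', 16, None, False),
    ('LENGTH', 16, (1, 512), False),
    ('CHECKSUM', 16, None, False),
    ('DATA', 8, None, True),
]

rules = [field_rule('UDP_MESSAGE', *field) for field in UDP_FIELDS]
print('\n'.join(rules))
```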

It is safe to say that we now have a working parser function, so we can write it out to a file. Let's modify the __init__ function so that we can reference the output directory where we will place all of our generated files.

class PyhammerGenerator(ParselabGenerator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.backend_name = "pyHammer"
        self.protocol_name = os.path.basename(self.protocol_dir)
        self.output_directory = os.path.join(self.protocol_dir, self.backend_name)
        self.test_directory = os.path.join(self.output_directory, 'tests')

The output directory is going to be in the target protocol directory, and named after our backend. So for our example, this will put all of our generated files in parselab/protocols/udp/pyHammer. We are also going to add the test_directory, as that will be useful in later steps. We can now write our parser code to a file in this directory.

def generate_parser(self):
    
    # ...

    parser_text += self.__generate_parser_func()
    parser_text += '\n'

    self.log.info("Completed generation of parser code")
    
    # Add the output directory if it doesn't exist
    if not os.path.isdir(self.output_directory):
        os.makedirs(self.output_directory)

    # Establish filepath for parser
    parser_filename = '%s_parser.py' % (self.protocol_name)
    parser_filepath = os.path.join(self.output_directory, parser_filename)
    ret_files.append(parser_filepath)

    # Write it out to a file
    with open(parser_filepath, 'w') as f:
        f.write(parser_text)
    
    return ret_files

If we run the generate_parser.py driver again, the parser code should be written to a Python file.

# From parselab/bin
ls ../protocols/udp/pyHammer
> udp_parser.py
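The write-out step can be exercised on its own in a temporary directory. The sketch below is a standalone variant, not the method from the generator: write_parser is a hypothetical helper, and it uses os.makedirs(..., exist_ok=True), which makes the separate os.path.isdir() check unnecessary.

```python
import os
import tempfile


def write_parser(output_directory, protocol_name, parser_text):
    # Create the output directory (and any parents) if it is missing
    os.makedirs(output_directory, exist_ok=True)
    parser_filepath = os.path.join(output_directory, '%s_parser.py' % protocol_name)
    with open(parser_filepath, 'w') as f:
        f.write(parser_text)
    return parser_filepath


with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, 'udp', 'pyHammer')
    path = write_parser(out_dir, 'udp', 'import hammer\n')
    written_name = os.path.basename(path)
    with open(path) as f:
        written_text = f.read()

print(written_name)
```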

Test Generation Logic

Now that we have a parser that we generated... how do we go about pushing input through it? That is where the Test Generation part of the generator module comes into play.

For that, we are going to focus on the generate_test() method of our generator module. Much like the generate_parser() method, our only requirement for this function is that we return a list of paths to all of the files that we generated within the method. Aside from this requirement, whatever it takes to create something that is capable of pushing bytes through the generated parser is fair game.

For our needs, we will need something that imports our generated parser's Python module (udp_parser.py) and uses the binary data found in the testcase files. If the concept of testcase files is foreign to you, follow the guide for generating udp messages with the parseLab testcase generation system. Let's make two new testcases for the udp protocol that we are working with: one valid test and one invalid test.

# From parselab/bin
./generate_testcase.py --protocol ../protocols/udp --name validTest --one_per --valid
./generate_testcase.py --protocol ../protocols/udp --name invalidTest --one_per --invalid

With some test data ready for us, we will now modify the generate_test() method of our generator so that we can leverage these tests.

from src.utils import gen_util

# ...

def generate_test(self, testcase_dir, protocol_dir, print_results=False):
    self.log.info("Generating a test script that iterates over the messages in the directory: %s" % (testcase_dir))
    test_files = list()

    # Get the results file
    results_file = os.path.join(testcase_dir, gen_util.results_filename)
    test_messages = []
    with open(results_file, 'r') as f:
        pass

    return test_files

We start by creating the list of file paths that we produce while generating this test, and make sure to return it. Now that we abide by the "interface contract", we can add the logic for creating test assertions. Every testcase directory contains a results.txt file. Each line of the results file is a ' - ' separated list of elements describing one generated message in the testcase. At its core, each description states whether the message is valid or not. We will use this to form our assertion.

If our parser rejects a valid generated message, that is a failure.
If our parser accepts an invalid generated message, that is a failure.
If our parser accepts a valid generated message, that is a pass.
If our parser rejects an invalid generated message, that is a pass.
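
The four cases above reduce to a single comparison: a test passes exactly when the parser's decision matches the expected validity. A minimal sketch, where the function name is_pass is hypothetical and used only for illustration:

```python
def is_pass(parser_accepted, expected_valid):
    """Return True when the parser's accept/reject decision matches expectation.

    parser_accepted: True if the parser accepted the message
    expected_valid:  True if the message was generated as a valid message
    """
    return parser_accepted == expected_valid


# Enumerate the four combinations listed in the text
for accepted in (True, False):
    for valid in (True, False):
        verdict = 'pass' if is_pass(accepted, valid) else 'failure'
        print('accepted=%s valid=%s -> %s' % (accepted, valid, verdict))
```

The generated test script later in this guide uses the same comparison when it computes its test result.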

# From parselab/bin
cat ../protocols/udp/testcases/validTest/results.txt
> 0000_UDP_MESSAGE - valid

# NOTE: This will probably be different for everyone due to the random generators that
#        create this testcase message
cat ../protocols/udp/testcases/invalidTest/results.txt
> 0000_UDP_MESSAGE - invalid - LENGTH - LESS_THAN_BOUNDS
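
To make the line format concrete, here is a minimal sketch of splitting one results line into its parts. The function name parse_results_line is hypothetical, and treating every field after the validity as free-form detail is an assumption based only on the sample output above.

```python
def parse_results_line(line):
    """Split one results-file line into (message_name, is_valid, details).

    Hypothetical helper; the meaning of the trailing fields is assumed
    from the sample output, not taken from a parseLab specification.
    """
    fields = line.strip().split(' - ')
    if len(fields) < 2:
        raise ValueError('malformed results line: %r' % line)
    name, validity = fields[0], fields[1]
    # Anything after the validity field is kept as a list of detail strings
    return name, validity == 'valid', fields[2:]


print(parse_results_line('0000_UDP_MESSAGE - valid\n'))
print(parse_results_line('0000_UDP_MESSAGE - invalid - LENGTH - LESS_THAN_BOUNDS\n'))
```

Raising on a malformed line is one possible design choice; a generator could equally log the line and skip it.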

Let's pull this data out of the results file and build up the TestMessage object. The test message object is built by passing it a binary file and a boolean telling it whether the binary file is validly or invalidly formatted for the target message type.

from src.TestcaseGenerator import TestMessage
from src.utils import gen_util

# ...

def generate_test(self, testcase_dir, protocol_dir, print_results=False):
    self.log.info("Generating a test script that iterates over the messages in the directory: %s" % (testcase_dir))
    test_files = list()

    # Get the results file
    results_file = os.path.join(testcase_dir, gen_util.results_filename)
    test_messages = []
    with open(results_file, 'r') as f:
        for line in f.readlines():
            split_line = line.strip().split(' - ')
            filename = split_line[0]
            validity = split_line[1]
            valid = validity == 'valid'
            # Build a TestMessage with this information
            self.log.info("Creating a (%s) TestMessage instance from %s" % (valid, filename))
            tm = TestMessage(filename, valid, testcase_dir)
            test_messages.append(tm)


    # Setup imports 
    simple_imports = ['%s_parser' % (self.protocol_name)] 
    from_imports = list()

    # Build up test file
    test_text = ''
    tab = ' '*4

    # Add imports
    for simple_import in simple_imports:
        test_text += 'import %s\n' % (simple_import)
    for from_import in from_imports:
        test_text += 'from %s import %s\n' % (from_import[0], from_import[1])
    test_text += '\n'

    # Make a main
    test_text += 'def main():\n'

    # Define parser
    test_text += tab + 'parser = %s_parser.init_%s_parser()\n' % (self.protocol_name, self.protocol_name)
    test_text += '\n'

    # Add module check
    test_text += "if __name__ == '__main__':\n    main()"

    print(test_text)

    return test_files

Again, that was a lot of changes all at once, but if we run the generate_test.py driver and read through the code and comments, we will see that not much has happened just yet. All we did was set up the framework for the generated test module that we will be putting logic into next.

# From parselab/bin
./generate_test.py --protocol ../protocols/udp --module pyhammer --testcase ../protocols/udp/testcases/validTest
> import udp_parser
>
> def main():
>     parser = udp_parser.init_udp_parser()
>
> if __name__ == '__main__':
>     main()

Now that we have the general structure down, we can go and add the actual testing logic...

def generate_test(self, testcase_dir, protocol_dir, print_results=False):
    self.log.info("Generating a test script that iterates over the messages in the directory: %s" % (testcase_dir))
    test_files = list()

    # Get the results file
    results_file = os.path.join(testcase_dir, gen_util.results_filename)
    test_messages = []
    with open(results_file, 'r') as f:
        for line in f.readlines():
            split_line = line.strip().split(' - ')
            filename = split_line[0]
            validity = split_line[1]
            valid = validity == 'valid'
            # Build a TestMessage with this information
            self.log.info("Creating a (%s) TestMessage instance from %s" % (valid, filename))
            tm = TestMessage(filename, valid, testcase_dir)
            test_messages.append(tm)

    # Setup imports 
    first_imports = ['import os, sys', \
                     'sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), ".."))']
    simple_imports = ['%s_parser' % (self.protocol_name)] 
    from_imports = list()

    # Build up test file
    test_text = ''
    tab = ' '*4

    # Add imports
    for first_import in first_imports:
        test_text += first_import + '\n'
    for simple_import in simple_imports:
        test_text += 'import %s\n' % (simple_import)
    for from_import in from_imports:
        test_text += 'from %s import %s\n' % (from_import[0], from_import[1])
    test_text += '\n'

    # Make a main
    test_text += 'def main():\n'

    # Define parser
    test_text += tab + 'parser = %s_parser.init_%s_parser()\n' % (self.protocol_name, self.protocol_name)
    test_text += '\n'

    # Add a tuple array of all the binary filepaths and their validity
    tm_paths_str = tab + 'message_paths = ['
    for tm in test_messages:
        tm_path = os.path.abspath(os.path.join(testcase_dir, tm.filename))
        tm_valid = tm.result
        tm_paths_str += '("%s", %s), ' % (tm_path, tm_valid)
    tm_paths_str = tm_paths_str[:-2]
    tm_paths_str += ']\n\n'
    test_text += tm_paths_str

    # Add logic to iterate over the tuple array and run parser against its contents
    # NOTE: every line of this template is indented so that it lands inside main()
    file_reader_text = '''    # Iterate over binary files and run contents against parser
    for (fp, result) in message_paths:
        with open(fp, 'rb') as f:
            ba = bytearray(f.read())
        # Passing bytes through parser
        parse_result = parser.parse(bytes(ba))

        if parse_result is not None and {debug_mode}:
            print(parse_result)

        msg_name = os.path.basename(fp)
        test_result = (parse_result is not None) == result
        test_pass_str = 'PASS' if test_result else 'FAIL'
        correct_prefix = 'in' if not test_result else ''
        action_str = 'accepted' if parse_result is not None else 'rejected'
        status_str = 'was %scorrectly %s by the parser' % (correct_prefix, action_str)
        print('[%s] Test %s %s' % (test_pass_str, msg_name, status_str))\n\n'''.format(debug_mode=gen_util.debug_mode)
    test_text += file_reader_text

    # Add module check
    test_text += "if __name__ == '__main__':\n    main()\n"

    testcase_name = os.path.basename(os.path.dirname(testcase_dir))
    testfile_filepath = os.path.join(self.test_directory, testcase_name + '.py')
    testfile_filepath = os.path.abspath(testfile_filepath)
    test_files.append(testfile_filepath)

    if not os.path.isdir(self.test_directory):
        os.makedirs(self.test_directory)

    with open(testfile_filepath, 'w') as f:
        f.write(test_text)

    return test_files

To recap all that was added here:

Updated the imports so that our test can import a module that is one layer above our test module
Created a tuple array of all the test message files and their validity (True=valid, False=invalid)
Added logic to the test_text that iterates over the tuple array, reads the file, and passes the contents of each file into our generated parser
Added logic to print the results to the screen
Added logic to print the contents of the message on a valid parse, if the gen_util.debug_mode is active
Removed the printing of the test_text and replaced it with writing to testfile_filepath

This is a simplistic way of implementing test generation, and there are many other ways to do it. Every generator will also differ here, since each backend has its own requirements for running generated parser code.
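
As one example of an alternative, the tuple array could be rendered with repr() and str.join() instead of manual string concatenation. This is a minimal sketch under assumptions: the function name format_message_paths is hypothetical, and it takes plain (path, validity) pairs rather than TestMessage objects.

```python
def format_message_paths(pairs, indent='    '):
    """Render [(path, is_valid), ...] as one line of generated Python source.

    Hypothetical helper for illustration; not part of the parseLab API.
    """
    # %r emits a quoted, escaped string literal, so backslashes and quotes
    # in a path survive the round trip into the generated file
    entries = ', '.join('(%r, %r)' % (path, bool(valid)) for path, valid in pairs)
    return '%smessage_paths = [%s]\n' % (indent, entries)


print(format_message_paths([('/tmp/0000_UDP_MESSAGE.bin', True)]), end='')
print(format_message_paths([]), end='')
```

Joining the entries also produces a well-formed empty list when a testcase contains no messages, which slicing off a trailing separator does not.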

As a test, we can try to run this code from our terminal:

# From parselab/bin
./generate_test.py --protocol ../protocols/udp --module pyhammer --testcase ../protocols/udp/testcases/validTest/
./generate_test.py --protocol ../protocols/udp --module pyhammer --testcase ../protocols/udp/testcases/invalidTest/

# enable debug_mode in parselab/src/utils/gen_util.py
# debug_mode = False -> debug_mode = True
vim ../src/utils/gen_util.py

# Go to the created tests
cd ../protocols/udp/pyHammer/tests

# Run the tests
python3 validTest.py
> [PASS] Test 0000_UDP_MESSAGE.bin was correctly accepted by the parser
python3 invalidTest.py
> [PASS] Test 0000_UDP_MESSAGE.bin was correctly rejected by the parser

Running Generated Tests

Much like all of the other parseLab actions, we are going to run our generated tests with a driver script (run_test.py). Also like the other parseLab actions, we're going to need to add the logic for running our tests in our generator module. To do this, we are going to modify the run_test_from_testcase() function. The interface contract for this function expects it to return an integer that represents the return code of our executable test.

import subprocess

# ...

def run_test_from_testcase(self, testcase_dir, protocol_dir):
    testcase_name = os.path.basename(os.path.dirname(testcase_dir))
    testfile_filepath = os.path.join(self.test_directory, testcase_name + '.py')
    testfile_filepath = os.path.abspath(testfile_filepath)

    self.log.info("Running test %s" % (testfile_filepath))
    cmd = "python3 %s" % (testfile_filepath)
    rv = subprocess.call(cmd, shell=True)
    self.log.info("Test executed with return code=%d" % (rv))

    return rv

Since Python is interpreted, we don't have to worry about any sort of build system, and can just leverage subprocess to execute our test script.
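
A variation on the same idea is to skip the shell and pass the arguments as a list. The sketch below is an assumption-laden illustration, not the guide's method: the name run_python_script is hypothetical, and it uses sys.executable so the test runs under the same interpreter as the generator rather than whatever python3 resolves to on the PATH.

```python
import subprocess
import sys


def run_python_script(script_path):
    """Run script_path with the current interpreter and return its exit code.

    Hypothetical helper for illustration; not part of the parseLab API.
    """
    # A list of arguments with the default shell=False avoids shell quoting
    # problems when the path contains spaces or special characters
    return subprocess.call([sys.executable, script_path])
```

Note that the test script generated in this guide only prints PASS or FAIL and never calls sys.exit(), so its return code will be 0 unless the script raises an exception.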

If we run our run_test.py driver, we can see our output of the test results in the terminal:

# From parselab/bin

# Your output will have an array printing out if debug_mode still is set to True
./run_test.py --protocol ../protocols/udp --module pyhammer --testcase ../protocols/udp/testcases/validTest
> [PASS] Test 0000_UDP_MESSAGE.bin was correctly accepted by the parser

./run_test.py --protocol ../protocols/udp --module pyhammer --testcase ../protocols/udp/testcases/invalidTest
> [PASS] Test 0000_UDP_MESSAGE.bin was correctly rejected by the parser

Conclusion

Now with the completion of the three interface functions (generate_parser(), generate_test(), run_test_from_testcase()) which handle generation and execution of parsers, we are officially done with our parseLab Generator Module for the Hammer python-bindings.

There is still plenty that can be added to this; many features of parseLab were not used for this guide, some examples include:

ValueChoice and ValueList constraints
float constraints
integer data types that are not limited to 8, 16, 32, and 64-bit lengths
little vs big endian support
distant dependent lengths (unsupported by python Hammer, but often available in other backends)

For a more concrete measure of completion, running the unit_tests.py driver against our new module will give insight into how complete it is.

# From parselab/bin

# Run the unit tests and observe which tests pass versus fail
./unit_tests.py --module pyhammer
> [EXCEPT] unit_tests.ValueTypesTest.ValueTypesTest
> [EXCEPT] unit_tests.GenerateParser.GenerateParser
> [Passed] unit_tests.Setup.Setup
> [Passed] unit_tests.GenerateTest.GenerateTest
> [EXCEPT] unit_tests.DataTypesTest

ValueTypesTest - This checks if your module can generate parsers for all of the possible value types available in parseLab (we know it can't)
GenerateParser - This checks if your module can generate a parser file for a specific protocol.json (we know it most likely can't since our generator module is very limited in capabilities)
Setup - This checks if your module can generate the correct files for a valid protocol directory (we know it can)
GenerateTest - This checks if your module can generate a test script that consumes data from a testcase and pushes it into a parser function (we know it can)
DataTypesTest - This checks if your module can generate parsers for all of the possible data types available in parseLab (we know it can't)

If you would like to get an idea of all the potential of parseLab, please look at the parseLab generator module for Hammer for C.

Lastly, the final version of the PyhammerGenerator.py script is available in parselab/examples/pyHammer/PyhammerGenerator.py for reference.

To explore more about the inner workings of parseLab and all of the things that will need to be handled by a parseLab module, you'll want to read through the documentation for the different data types that are used by parseLab which are derived from the protocol specification files.
