Xsmith
(require xsmith) | package: xsmith |
Version xsmith 2.0.2 (1c277cb)
1 Introduction
Xsmith is a library for creating fuzz testers, also known as fuzzers, for programming language compilers and interpreters. In other words, Xsmith is a library for creating random program generators.
Xsmith implements a domain-specific language (DSL) for defining random program generators. The DSL is used to specify a programming language’s grammar, typing rules, and other information that guides generation choices. Xsmith also includes utilities for creating a command-line interface for generating a single program or starting a web server that generates one program per request.
There are example random program generators built with Xsmith. They are in the xsmith-examples package.
Reporting bugs. Please send bug reports and fixes to xsmith-bugs@flux.utah.edu.
Getting help. There is a mailing list, xsmith-dev@flux.utah.edu, for discussing Xsmith. Visit the xsmith-dev list management website to subscribe. To post messages to the list, you must first subscribe to the list.
2 How to Install Xsmith
First, install Racket. If your operating system’s package manager doesn’t have a package or you want a fresher version, download it.
Then run raco pkg install xsmith.
3 Xsmith Guide
Xsmith uses RACR, an attribute grammar library, in its implementation, and some knowledge of RACR is necessary when using Xsmith.
To create a fuzzer with Xsmith, users create a specification by combining specification components (more commonly referred to as spec components), defined with define-spec-component. A spec component can define productions of a grammar with add-to-grammar, or it can provide properties of grammar productions with add-property. The grammar and properties are used to generate a RACR grammar, attributes for the grammar, and choice objects, which guide AST generation.
Program generation starts by generating an AST hole for a given grammar production. Generation continues by filling holes with concrete AST nodes, which may introduce new holes as child nodes. The grammar specification is used to determine how to fill holes in the AST. For example, in a grammar with addition and subtraction expressions, a generic Expression hole may be replaced by an Addition or Subtraction node. A choice object is created for each valid possible replacement. Choice objects have methods (called choice-methods) which aid in choosing a concrete replacement. Some of these methods act as predicates to filter out choices that are not legal in a particular context, such as choices that introduce more holes when the maximum tree depth has been reached. The choice-weight property defines a method which determines the relative probability of each choice being chosen. The fresh property defines a method which determines how the choice is instantiated as a RACR node. Additional methods may be defined as helpers. Choice objects have access to the current-hole, so they may query RACR attributes in method bodies. Choice object classes follow the same hierarchy as the grammar, so method inheritance for choice objects is similar to attribute inheritance for RACR nodes.
RACR attributes and choice object methods may be added directly with add-attribute and add-choice-method, respectively, but many are defined indirectly by various Xsmith properties. Properties allow users to specify various attributes and choice methods in a more declarative fashion.
Xsmith was primarily designed for the implementation of language specifications for differential compiler/interpreter testing. Therefore Xsmith takes pains to avoid producing programs that rely on commonly unspecified behaviors, such as the order of evaluation of function or operator arguments. Because the language-agnostic analysis within Xsmith is quite conservative, there are many valid programs that will not be generated. However, Xsmith makes it easy to create a fuzzer that obeys such rules without much language-specific work.
3.1 RACR Overview
RACR is a library for Reference Attribute Grammars. Xsmith’s DSL defines a RACR grammar specification as well as various attributes. The attributes are queried to determine how to generate the AST.
RACR caches the results of attribute queries and keeps track of the nodes accessed for any attribute. When nodes used in an attribute computation are changed, future queries to that attribute are re-computed.
Users can specify new RACR attributes for Xsmith generators, but they should use add-attribute or add-property from Xsmith rather than using RACR functions directly. In expressions evaluated in the context of RACR attributes (attributes) or choice methods, RACR attributes may be queried.
The main RACR APIs of interest are:
Functions for querying the AST:
att-value
ast-child
ast-children
ast-parent
Xsmith provides a function which generates a complete AST, but users can also perform AST rewrites after initial program generation. Relevant RACR functions for performing AST rewrites include:
perform-rewrites
Full RACR documentation is here.
3.2 Holes and Choice Objects
Hole nodes are RACR AST nodes. For every node type in the grammar, a hole node is created as a subclass of that node, inheriting all of its RACR attributes. A hole can be recognized by the xsmith_is-hole? attribute.
my-spec-component |
(add-to-grammar my-spec-component [Expression #f ()] [LiteralInt Expression (v = (random 1000))] [AdditionExpression Expression ([left : Expression] [right : Expression])])
When a fresh AdditionExpression is created, it will include two Expression hole nodes. When the generator gets to those holes, a choice object is created for each subclass of Expression (including Expression itself unless it is disabled with the may-be-generated property). The choice objects have types corresponding to LiteralInt and AdditionExpression, and therefore may have different implementations for various choice methods. The choice objects all have access to the Expression hole (through current-hole), but while choice objects have access to their specialized choice method implementations, the hole is of type Expression, and so all RACR attributes that may be queried are specialized only as far as Expression, not to LiteralInt or AdditionExpression.
Although Xsmith can create holes for any type of production defined in the grammar, by default it will only generate more general holes. In the case of this example, the default behavior would be for Xsmith to generate Expression holes, but not LiteralInt or AdditionExpression holes. More specific holes are used either when a grammar production specifies that a child must be of a specific kind, or when a custom fresh implementation uses make-hole with a specific kind of production. For example, we could extend the above example:
(define-spec-component my-spec-component) (add-to-grammar my-spec-component [Expression #f ()] [LiteralInt Expression (v = (random 1000))] [AdditionExpression Expression ([left : Expression] [right : Expression])] [LiteralSubtractionExpression Expression ([left : LiteralInt] [right : LiteralInt])])
Now, an Expression hole could be filled with LiteralInt, AdditionExpression, or LiteralSubtractionExpression. However, where the AdditionExpression’s two Expression child holes could be filled with any kind of Expression, the LiteralSubtractionExpression’s children will only be LiteralInts.
3.3 Scope Graphs
Xsmith uses the language-independent resolution algorithm of scope graphs.
The theory of scope graphs is described in the paper “A Theory of Name Resolution with Extended Coverage and Proofs”, (available at https://researchr.org/publication/NeronTVW15).
3.4 Attributes, Choices, and Properties, Oh My!
Aside from the grammar productions themselves, Xsmith language specifications deal with attributes (eg. via add-attribute), choice methods (via add-choice-method), and properties (via add-property). The exact nature of these terms and how they relate to one another can be confusing, so let’s talk about them.
Attributes, written with add-attribute, are RACR attributes. They are evaluated dynamically by traversing the generated tree. RACR caches the results of attribute evaluation, flushing the cache when there are changes to parts of the tree that were previously used in computing an attribute. For a full understanding of attributes, you must also read the RACR documentation.
Attributes are evaluated using att-value. For example, if you have a nodemy-node
and wish to evaluate the attributexsmith_type
on that node, you would do: (att-value 'xsmith_type my-node).Some attributes are defined automatically by properties. You may or may not need to write custom attributes while creating a fuzzer, though we have tried to provide as many useful attributes as possible to avoid this.
Attributes generally can not directly access choice methods or properties.
Choice Methods, written with add-choice-method, are methods on choice objects. When Xsmith begins filling in a hole node, it creates a choice object for each grammar production that could fit in that hole. choice methods are used to filter and choose which of those productions to generate and guide generation.
choice methods are run using the send method-calling syntax on the choice object. For example, if you wish to run thexsmith_choice-weight
rule onsome-choice-object
, you would do: (send xsmith_choice-weight some-choice-object).choice methods are just Racket class methods. You may or may not need to write custom choice methods while creating a fuzzer, but custom choice methods are less likely to be needed than custom attributes. During evaluation of a choice method, the hole in question is available as current-hole, so attributes may still be queried in the context of choice methods, but choice methods can not directly access properties.
Properties, written with add-property, are macros that automatically generate attributes and choice methods using a syntax that is often more convenient than manually implementing the attributes and choice methods separately yourself. Properties are evaluated statically, but the attributes and choice methods they define are evaluated dynamically. Each property may require its arguments to be given in a different way, so it is important to read the documentation for each property careful to know how to implement it correctly.
Most of the core attributes and choice methods provided by Xsmith, such as fresh and type-info, are actually defined in terms of properties. These properties are provided by Xsmith to make it easy to define a language specification, and many of them are required to use Xsmith successfully. For the majority of programming languages in the mainstream, these supplied properties are sufficient for a full specification.
It is unlikely that you will need to implement custom properties. Generally, you will want a custom property when you have some common set of attributes or choice methods that you would like to use in multiple separate fuzzers, in which case it may be worth defining a custom property for those fuzzers to share. Such custom properties can be implemented with define-property. A property can read the static specifications of other properties, and generate any number of attributes and choice methods, but it cannot statically access attribute or choice methods values because those are only available during AST generation. However, a property can use the values of attributes and choice methods dynamically.
Remember: Attributes and choice methods are functions that can be used within specific contexts of Xsmith fuzzers. On ther other hand, properties are compile-time macros for generating attributes and choice methods and so are not themselves used during generation.
3.5 Lifting
The term “lift” is used throughout this document and the Xsmith library. In the context of Xsmith, "lifting" refers to creating a binding definition after it’s needed by a reference. In other words, when a reference node (i.e., a node which implements the reference-info property) is created and there is not an available definition that satisfies the type system and effect system, a new definition is “lifted” to a binding form that is visible from that reference location. Additionally, with some probability, new definitions may be lifted even when there is a suitable definition available. This probability can be controlled with the reference-choice-info property.
3.6 Getting Started
This section is a small walkthrough of creating a simple fuzzer using only basic Xsmith forms to show the fundamentals. However, we recommend using Canned Components for any serious fuzzer implementation.
(require xsmith racr racket/string)
(define-spec-component arith)
The spec component is where we store definitions of the grammar productions, properties, attributes, and choice methods. Let’s add some productions to the grammar defined by this spec component! When adding nodes, we must always specify three components: the node’s name, its supertype (also called a "parent type"), and a list of children nodes. We’ll define a top-level node that we will later use to specify where to start program generation, which we’ll call Program. The Program production will have a single child: an Expression. Therefore, we will also need to provide a base Expression production. Neither of these nodes has a supertype, so we will supply #f in the supertype field after the node name. Note that the names of node types should be capitalized camel case (punctuation characters like - and _ are disallowed).
(add-to-grammar arith [Program #f (Expression)] [Expression #f ()])
We want the Expression node to be abstract and not generated itself, so we’ll use the may-be-generated property to restrict it.
(add-property arith may-be-generated [Expression #f])
We can also put properties inline with the grammar definition if we prefer. We can replace the above definition using this style:
(add-to-grammar arith [Program #f (Expression)] [Expression #f () #:prop may-be-generated #f])
(Note that we cannot use both forms, as this will produce an error due to conflicting definitions of a property value for a given grammar production.)
If we add node types that are subtypes of Expression, they can be generated in Expression holes. Let’s add a node for literal integers.
(add-to-grammar arith [LiteralInt Expression ([v = (random 100)])])
Note that the literal node contains a child v that is a normal Racket value, not a grammar node type. It is initialized with the expression on the right-hand side of the = sign.
The literal integers generated in our language will only be values from 0 to 99. Note that we can add the initialization expression inline as above or with the fresh property. If we don’t add an initialization expression, then non-node fields will be initialized with #f, while node fields will be initialized with hole nodes of the appropriate type..
Let’s add addition expressions. Because we have multiple Expressions, we need to give them names. Note that we aren’t supplying initialization code. Because they are grammar node typed, they will be initialized with Expression Hole nodes (same with the Program node’s Expression child).
(add-to-grammar arith [Addition Expression ([l : Expression] [r : Expression])])
Our language only has one type: integers. This means we don’t need to add type rules, but we will anyway for the sake of a complete tutorial. We will call the type int and add an implementation of the type-info property for our language so far.
(define int (base-type 'int)) (add-property arith type-info [Program [int (λ (n t) (hash 'Expression int))]] [LiteralInt [int (λ (n t) (hash))]] [Addition [int (λ (n t) (hash 'l int 'r int))]])
Each rule supplied to the property is surrounded in square brackets, [].
The left-hand side of each of these rules is an expression that returns the type the node can inhabit. Most often, this will just be the name of the node type, such as Program and LiteralInt in this example. If a node can inhabit multiple types, fresh-type-variable can be used to specify an unconstrained or partially constrained type.
On the right-hand side is a function with two parameters: the node currently being evaluated, and the type that is currently being used to confine generation of that node. The body of the function returns a dictionary mapping children (by name or by node object) to a type. Note that type variables are unified during the type analysis, so you should take care with object/pointer equality (i.e., eq?) of type variables.
Now we need to specify how to print our programs, or “render” them. For this, we use the render-node-info property.
(add-property arith render-node-info [Program (λ (n) (att-value 'xsmith_render-node (ast-child 'Expression n)))] [LiteralInt (λ (n) (number->string (ast-child 'v n)))] [Addition (λ (n) (format "(~a + ~a)" (att-value 'xsmith_render-node (ast-child 'l n)) (att-value 'xsmith_render-node (ast-child 'r n))))])
In this case, we are rendering each node directly to a string, but that’s not usually the best approach. More often, we render programs with some intermediate data structure and convert that to a string only as a last step. Most Xsmith fuzzers either use s-expressions for rendering Lisp-like languages, or else the pprint Racket library’s document objects. You can, of course, use your own custom implementation if you prefer.
We put everything together with the define-xsmith-interface-functions macro, which compiles our language specification and defines the arith-command-line function.
(define-xsmith-interface-functions [arith] #:comment-wrap (λ (lines) (string-join (map (λ (x) (format "// ~a" x)) lines) "\n")))
Note that we give it a function for how it wraps comments. Specifically it is a function that takes a list of single-line strings and returns a single string that’s been appropriately formatted to be commented in your language. There are many optional arguments to the define-xsmith-interface-functions that changes the way it behaves.
To actually run our fuzzer, we need to run the arith-command-line function. Let’s put it in our main submodule. Then when we run our program from the command line it will generate a program. It automatically has –help support, and has various options, such as setting a seed, a max depth, etc.
(module+ main (arith-command-line))
3.7 Minimal Example
Below is a complete generator of arithmetic expressions, following the implementation we just outlined in Getting Started. Note that we use #lang clotho instead of #lang racket, which allows us to capture, replay, and modify random choices during generation.
"minimal-example.rkt"
#lang clotho (require xsmith racr racket/string) (define-spec-component arith) (add-to-grammar arith [Program #f (Expression)] [Expression #f () #:prop may-be-generated #f] [LiteralInt Expression ([v = (random 100)])] [Addition Expression ([l : Expression] [r : Expression]) ;; The default weight is 10. #:prop choice-weight 20]) (define int (base-type 'int)) (add-property arith type-info [Program [int (λ (n t) (hash 'Expression int))]] [LiteralInt [int (λ (n t) (hash))]] [Addition [int (λ (n t) (hash 'l int 'r int))]]) (add-property arith render-node-info [Program (λ (n) (att-value 'xsmith_render-node (ast-child 'Expression n)))] [LiteralInt (λ (n) (number->string (ast-child 'v n)))] [Addition (λ (n) (format "(~a + ~a)" (att-value 'xsmith_render-node (ast-child 'l n)) (att-value 'xsmith_render-node (ast-child 'r n))))]) ;; This line defines `arith-command-line`. (define-xsmith-interface-functions [arith] #:comment-wrap (λ (lines) (string-join (map (λ (x) (format "// ~a" x)) lines) "\n"))) (module+ main (arith-command-line))
3.8 Another Small Example with Variables
Here is a bigger example that contains variables and references to those variables.
Note that instead of a Program node, we use the LetStar node as the base node from which to begin generation. The purpose in having a specific top-level Program node is to be sure that the top-level node can have definitions lifted to it. The LetStar node in the following example satisfies this purpose.
This example also renders to s-expressions rather than directly to strings like the previous example. Because of this change, we have to give xsmith-command-line another optional argument, called format-render, specifying how we convert our rendered format into strings.
"minimal-example-with-variables.rkt"
#lang clotho (require xsmith xsmith/app racr racket/pretty racket/string racket/port) (define-spec-component arith) (add-to-grammar arith [Definition #f (name type Expression) #:prop binder-info ()] [Expression #f () #:prop may-be-generated #f] [LetStar Expression ([definitions : Definition *] [sideEs : Expression * = (random 2)] Expression) #:prop strict-child-order? #t] [VariableReference Expression (name) #:prop reference-info (read)] [SetBangRet Expression (name Expression) #:prop reference-info (write)] [LiteralInt Expression ([v = (random 100)])] [Addition Expression ([es : Expression * = (+ 1 (random 5))]) #:prop choice-weight 50]) (define int (base-type 'int)) (add-property arith type-info [Definition [(fresh-type-variable) (λ (n t) (hash 'Expression t))]] [LetStar [(fresh-type-variable) (λ (n t) (hash 'definitions (λ (cn) (fresh-type-variable)) 'sideEs (λ (cn) (fresh-type-variable)) 'Expression t))]] [LiteralInt [int (λ (n t) (hash))]] [VariableReference [(fresh-type-variable) (λ (n t) (hash))]] [SetBangRet [(fresh-type-variable) (λ (n t) (hash 'Expression t))]] [Addition [int (λ (n t) (hash 'es t))]]) (add-property arith render-node-info ;; Note that we've imported xsmith/app, so our #%app allows quoted ;; symbols to be used in function position as a shorthand for ;; using `att-value`. [LetStar (λ (n) `(let* (,@(map (λ (d) `[,(string->symbol (ast-child 'name d)) ,($xsmith_render-node (ast-child 'Expression d))]) (ast-children (ast-child 'definitions n)))) ,@(map (λ (c) ($xsmith_render-node c)) (ast-children (ast-child 'sideEs n))) ,($xsmith_render-node (ast-child 'Expression n))))] [LiteralInt (λ (n) (ast-child 'v n))] [VariableReference (λ (n) (string->symbol (ast-child 'name n)))] [SetBangRet (λ (n) `(begin (set! ,(string->symbol (ast-child 'name n)) ,($xsmith_render-node (ast-child 'Expression n))) ,(string->symbol (ast-child 'name n))))] [Addition (λ (n) `(+ ,@(map (λ (c) ($xsmith_render-node c)) (ast-children (ast-child 'es n)))))]) ;; This line defines `arith-command-line`. (define-xsmith-interface-functions [arith] #:program-node LetStar #:type-thunks (list (λ () int)) #:comment-wrap (λ (lines) (string-join (map (λ (x) (format ";; ~a" x)) lines) "\n")) #:format-render (λ (ast) (with-output-to-string (λ () (pretty-print ast (current-output-port) 1))))) (module+ main (arith-command-line))
3.9 An Upgrade: Using Canned Components
Many programming languages share various features in common, such as binding scope, arithmetic expressions, functions, etc. The xsmith/canned-components library in Xsmith provides tools for quickly building this common infrastructure for your language specification with minimal effort. It supplies the ability to enable various productions as needed, as well as default implementations of the type-info and fresh properties for those nodes. Here is an example of a fuzzer built using Canned Components.
"minimal-example-with-canned-components.rkt"
#lang clotho (require xsmith racr xsmith/racr-convenience xsmith/canned-components racket/string racket/list racket/pretty) ;; We first define a basic component and add a bunch of expressions. (define-basic-spec-component somelisp) (add-basic-expressions somelisp #:ProgramWithSequence #t #:VoidExpression #t #:AssignmentExpression #t #:VariableReference #t #:ProcedureApplication #t #:IfExpression #t #:ExpressionSequence #t #:LetSequential #t #:LambdaWithExpression #t #:Numbers #t #:Booleans #t #:Strings #t #:ImmutableList #t) (add-loop-over-container somelisp #:name Map #:collection-type-constructor (λ (inner) (immutable (list-type inner))) #:loop-type-constructor (λ (inner) (immutable (list-type inner)))) ;; Now we basically just add the render property unless we want to manually ;; define more. (define (render-child sym n) (att-value 'xsmith_render-node (ast-child sym n))) (define (render-children sym n) (map (λ (cn) (att-value 'xsmith_render-node cn)) (ast-children (ast-child sym n)))) (define ((binary-op-renderer op) n) `(,op ,(render-child 'l n) ,(render-child 'r n))) (add-property somelisp render-node-info [ProgramWithSequence (λ (n) ;; Unless our language has a well-defined exception interface that ;; we are trying to fuzz, we need to provide “safe” versions of ;; functions that aren't total. ;; Canned components uses safe functions and requires us to provide ;; definitions. ;; Here we provide some before the generated program. `((define (safe-divide numerator divisor) (if (equal? 0 divisor) 0 (/ numerator divisor))) (define (safe-car list fallback) (if (null? list) fallback (car list))) (define (safe-cdr list fallback) (if (null? list) fallback (cdr list))) ,@(render-children 'definitions n) (define program-result ,(render-child 'ExpressionSequence n)) ;; Let's add code to print the values of generated global variables. (begin ,(if (base-type? (att-value 'xsmith_type (ast-child 'ExpressionSequence n))) '(printf "Program body result: ~v\n" program-result) '(void)) ,@(for/list ([c (ast-children (ast-child 'definitions n))] #:when (base-type? (concretize-type (att-value 'xsmith_type c)))) `(printf "Variable ~a value: ~v\n" ',(string->symbol (ast-child 'name c)) ,(string->symbol (ast-child 'name c)))))))] [Definition (λ (n) `(define ,(string->symbol (ast-child 'name n)) ,(render-child 'Expression n)))] [AssignmentExpression (λ (n) `(set! ,(string->symbol (ast-child 'name n)) ,(render-child 'newvalue n)))] [ExpressionSequence (λ (n) `(begin ,@(render-children 'effectexpressions n) ,(render-child 'finalexpression n)))] [LetSequential (λ (n) `(let* (,@(map (λ (dn) `[,(string->symbol (ast-child 'name dn)) ,(render-child 'Expression dn)]) (ast-children (ast-child 'definitions n)))) ,(render-child 'body n)))] [IfExpression (λ (n) `(if ,(render-child 'test n) ,(render-child 'then n) ,(render-child 'else n)))] [VariableReference (λ (n) (string->symbol (ast-child 'name n)))] [ProcedureApplication (λ (n) `(,(render-child 'procedure n) ,@(render-children 'arguments n)))] [FormalParameter (λ (n) (string->symbol (ast-child 'name n)))] [LambdaWithExpression (λ (n) `(lambda (,@(render-children 'parameters n)) ,(render-child 'body n)))] [BoolLiteral (λ (n) (not (not (ast-child 'v n))))] [Not (λ (n) `(not ,(render-child 'Expression n)))] [And (binary-op-renderer 'and)] [Or (binary-op-renderer 'or)] [IntLiteral (λ (n) (ast-child 'v n))] [Plus (binary-op-renderer '+)] [Minus (binary-op-renderer '-)] [Times (binary-op-renderer '*)] [LessThan (binary-op-renderer '<)] [GreaterThan (binary-op-renderer '>)] [SafeDivide (binary-op-renderer 'safe-divide)] [StringLiteral (λ (n) (ast-child 'v n))] [StringAppend (binary-op-renderer 'string-append)] [StringLength (λ (n) `(string-length ,(render-child 'Expression n)))] [ImmutableListLiteral (λ (n) `(list ,@(render-children 'expressions n)))] [ImmutableListSafeCar (λ (n) `(safe-car ,(render-child 'list n) ,(render-child 'fallback n)))] [ImmutableListSafeCdr (λ (n) `(safe-cdr ,(render-child 'list n) ,(render-child 'fallback n)))] [ImmutableListCons (λ (n) `(cons ,(render-child 'newvalue n) ,(render-child 'list n)))] [VoidExpression (λ (n) '(void))] [Map (λ (n) `(map (λ (,(ast-child 'name (ast-child 'elemname n))) ,(render-child 'body n)) ,(render-child 'collection n)))]) ;; Sometimes we have a type variable that is not concrete but needs to be. ;; Here we provide a list of options we can pick for an unconstrained ;; type variable. (define (type-thunks-for-concretization) (list (λ()int-type) (λ()bool-type) (λ()string-type) (λ()(immutable (list-type (fresh-type-variable)))))) (define (somelisp-format-render s-exps) (define out (open-output-string)) (for ([symex s-exps]) (pretty-print symex out 1)) (get-output-string out)) (define-xsmith-interface-functions [somelisp] #:program-node ProgramWithSequence #:type-thunks type-thunks-for-concretization #:comment-wrap (λ (lines) (string-join (map (λ (l) (format ";; ~a" l)) lines) "\n")) #:format-render somelisp-format-render) (module+ main (somelisp-command-line))
4 Xsmith Reference
4.1 Stability
Xsmith does not currently promise API stability between versions.
4.2 Auto-Generated attributes and choice-methods
Many attributes and choice-methods are auto-generated by Xsmith:
attributes
'xsmith_is-hole?
Accepts no arguments other than the node to call it on. Returns #t when called on a hole node, otherwise returns #f.
Example:; Within the context of a choice method, this will return #t (att-value 'xsmith_is-hole? (current-hole)) 'xsmith_render-node
Accepts no arguments other than the node to call it on. Defined by the render-node-info property.
Example:(add-property my-spec-component render-node-info ; assuming we're outputting for a lispy language [AdditionExpression (λ (n) `(+ ,(att-value 'xsmith_render-node (ast-child 'l n)) ,(att-value 'xsmith_render-node (ast-child 'r n))))]) 'xsmith_find-descendants
Accepts the node to call it on, then a predicate. Returns a list of all descendant nodes that satisfy the predicate.
Example:'xsmith_find-a-descendant
Accepts the node to call it on, then a predicate. Like 'xsmith_find-descendants, but it only returns one. If no descendant matching the predicate is found, it returns #f. It is mostly useful if you just want to know if there is any descendant matching a type, such as to determine whether to do an analysis based on the presence or absence of some feature.
'xsmith_binding Accepts the node to call it on and optionally a boolean require-binder-or-reference flag (defaults to #t). If the given node is a reference, as determined by the reference-info property, then the reference is resolved and a binding object is returned. If the given node is a binder, as determined by the binder-info property, then its binding object is returned. For any other node, #f is returned when require-binder-or-reference is false, otherwise an error is raised.
Example:; This raises an exception if `n` is not a binder or reference. (att-value 'xsmith_binding n) ; If `n` is not a binder or a reference, return #f. (att-value 'xsmith_binding n #f) The main use of this is to do analyses where you need to look up the declaration node that a reference refers to. The binding? object returned by this function contains a reference to the binder node. When resolving references, the scope graph model is used.
'xsmith_ast-depth
Accepts the node it is called on, and no other arguments. Returns the tree depth at that node. Determined by depth-increase.
Example:(att-value 'xsmith_ast-depth n)
'xsmith_type This attribute takes no arguments (besides a RACR node) and returns the type (type?) of that node. Note that the type of the node may be a not-yet-concretized type variable.
(att-value 'xsmith_type n)
choice-methods
'xsmith_get-reference! This is a choice method that can be used when creating a reference node. It uses reference-choice-info to choose a reference. The default value of reference-choice-info will cause a fresh appropriate definition to be lifted if none exists. If no reference can be made (due to a non-default reference-choice-info), #f will be returned. Otherwise, it returns a binding? of the reference, which you can use binding-name on.
Example:(att-value 'xsmith_type (current-hole)) ; -> (fresh-type-variable int bool) ; This will return the name of a variable in scope ; that has type int or type bool. ; It may make a new definition node to do that. (define name (binding-name (send this xsmith_get-reference!))) Note that the reference is choosing one of the possible types (via concretize-type). Probably you are using this in a fresh method to initialize a node. So you will want to put that name in the node.
Generally you don’t need to do this manually, however. Nodes with the reference-info property marked will automatically have their name fields initialized, and nodes with the binder-info property marked will automatically have their name and type fields initialized.
- 'xsmith_get-reference-for-child! This is a choice method that can be used when creating reference nodes for children during the fresh rule. It takes two additional parameters:
type: settled-type? - the type you want the reference to be. Note that it must be a settled type. If you want this type to be based somehow on the type of the node in question, use force-type-exploration-for-node! so the node’s type will be maximally concretely computed before you concretize it.
write-reference?: boolean? - whether the reference will be used as a write reference.
It returns a binding? of the reference or #f if none can be made (due to a non-default reference-choice-info).
Example:; This will return the name of a variable in scope ; with type int for a write reference (define name (binding-name (send this xsmith_get-reference-for-child! int #t))) Like xsmith_get-reference!, it may cause a new definition to be lifted. Only use the result for children of the current-hole.
Node type names, attribute names, and choice-method names are just symbols, so they are not hygienic. The names of symbols used or defined by the xsmith library start with xsmith or _xsmith, so don’t tread on those namespaces.
4.3 Forms for Defining a Grammar and Its Attributes
syntax
(define-spec-component component-name)
(define-spec-component my-spec-component) ; Now use add-to-grammar, add-property, etc. ; Then use my-spec-component in assemble-part-specs.
syntax
(define-xsmith-interface-functions [spec-component ...] option ...)
option = #:fuzzer-name fuzzer-name:id | #:fuzzer-version fuzzer-version-string | #:program-node program-node:id | #:properties [property ...] | #:command-line-name command-line-name:id | #:function-name function-name:id | #:comment-wrap comment-wrap-function | #:format-render format-render-function | #:default-max-depth default-max-depth | #:default-type-max-depth default-type-max-depth | #:type-thunks type-thunks | #:features ([feature-name:id feature-default-value:boolean optional-feature-docstring:string] ...) | #:extra-parameters ([parameter-name:id parameter-docstring:string parameter-expression param-converter])
The fuzzer name defaults to the name of the first spec component provided. The command-line function is named the id passed to #:command-line-name, defaulting to <fuzzer-name>-command-line with <fuzzer-name> replaced appropriately. The command-line function takes no arguments and parses current-command-line-arguments.
#:comment-wrap takes a list of strings which contain info about the generated program, such as the command line used to generate it, the #:fuzzer-name, the #:fuzzer-version, and the random seed number. It should return a string representing those lines commented out. Such as the following, assuming the "#" character is the line-comment character in your language:
(λ (lines) (string-join (map (λ (x) (format "# ~a" x)) lines) "\n"))
fuzzer-version takes a string describing the version of the current fuzzer, used in automatically-generated output.
#:format-render is a function which takes the output of your render-node-info property as input and should return a string representing the program (perhaps by use of a pretty-printing function). If your render-node-info property produces a string already, you will not need to specify the #:format-render argument.
The #:default-max-depth sets the “max” tree depth generated. It’s a bit of a fib – there are situations where Xsmith will allow deeper generation, but they are limited. Similarly the #:default-type-max-depth sets the maximum depth that will be used for type concretization. In other words, when a type variable is concretized it will only choose base types past that depth. However, deeper types can still be made through other means.
The #:type-thunks argument should be a list or thunk that returns a list of thunks that return a type. IE (listof (-> type?)) or (-> (listof (-> type?))). However, #f may be returned instead of a thunk or instead of a type to filter out a given type in some situations.
The #:features argument takes a list of lists containing a feature name (as an identifier) and a default value (as a boolean), and optionally a list of documentation strings. Each feature will be included in the command-line options as --with-<feature-name>. Documentation strings will be displayed in the --help text, one per line. The values of these features is available via xsmith-feature-enabled?.
#:extra-parameters is a list of specifications for extra custom command line parameters. Each list contains an identifier to be used as the name of the parameter (eg. for use in the command-line interface), a documentation string, a parameter, and a normalization function (or #f). The normalization function should take the string given on the command line and convert it to the value that is actually parameterized during program generation. The following example defines a "--max-widgets" parameter with a default of 3:
(define widget-parameter (make-parameter 3)) ... (define-xsmith-interface-functions ... #:extra-parameters ([max-widgets "The maximum number of widgets" widget-parameter string->number]))
--help – See all command-line options. The --help list will automatically stay up to date, unlike this documentation.
--version – Prints the version info of Xsmith and exits.
--server <true-or-false> – Whether to run the web server. Defaults to false.
--server-port <port-number> – Port to use when running server. Defaults to 8080.
--server-ip <ip-address> – Listen for connections to <ip-address>. Use false to accept connections from all the machine’s addresses. Defaults to 127.0.0.1.
--server-path <url-path> – Run the server at localhost:8080/url-path. Defaults to /.
--seed – Random seed for program generation. Defaults to whatever Racket does to initialize randomness normally.
--output-file <filename> – Outputs to <filename> instead of standard output when not used with --server.
--max-depth <n> – Maximum depth of the generated program tree.
--tree-on-error <true-or-false> – Whether to show a partial program tree when an error occurs. Defaults to false.
--with-<feature> <bool> – Enables or disables generator-specific features, where <feature> is replaced with concrete feature names.
Various attributes are automatically defined within the spec, see Auto-Generated attributes and choice-methods.
Properties (defined with define-property) are used to derive more RACR attributes as well as Xsmith choice-methods, and extra properties can be used in your fuzzer by adding them with #:properties. Each property may have a transformer function that alters other properties, attributes, or choice-methods. All properties referenced within a spec-component are used to generate attributes and choice-methods, as well as any properties specified in the maybe-properties list. Unless values for that property have also been specified within a spec component, properties in the #:properties list will only be able to generate rules based on the default value for the property. Note that any property that has a value attached to a node in the grammar will be run, but if you have defined custom properties that don’t have values attached you may need to add the properties to this list. In other words, if you have a property that defines an attribute on the default node without attaching a value to any node, you need to add it here or the attribute won’t actually be added.
syntax
(add-to-grammar spec-component grammar-clause ...)
grammar-clause = (node-name parent-name (field ...) maybe-prop ..) parent-name = identifier | #f field = name/type-id | (name/type-id maybe-type-id maybe-kleene-star maybe-init-expr) maybe-type-id =
| : type-name maybe-kleene-star =
| * maybe-init-expr =
| = init-expr maybe-prop = #:prop prop-id prop-val
node-name will be the name of the grammar production in RACR. parent-name is either the name of the parent grammar production or #f.
Names for the node and fields are limited to alphabetic characters. You may want to use camelCase style names since kebab-style or snake_style names are invalid due to this limitation.
Fields are then specified. Each nonterminal inherits all fields of its parent nodes. A field has a name, a type, an optional kleene star, and an optional initialization expression. The type of each field is the name of the nonterminal that it must be or #f for fields that may contain arbitrary Racket values. A field name may be the same as the type, in which case the type does not need to be specified. If a type is not specified and the name does not match the name of a nonterminal, then the type #f is used. If the optional kleene star is supplied, the field will be a list field. If a kleene star is provided for a non-false type, the name and type must be specified separately.
The init-expr for each field specifies a default value for the field. When a node is generated, each init-expr is evaluated unless a non-default value is supplied to the generating function. If no init-expr is supplied, the following defaults are used:
For false types, #f.
For nonterminal types, a hole node of the appropriate type.
For fields with a kleene star, an empty list.
For nodes with a kleene star, init-expr may return a list or a number. If a number is provided, the default value is a list of that many of the non-kleene-star default value.
(add-to-grammar my-spec-component [Expression #f ()] [LiteralInt Expression (v = (random 1000))] [AdditionExpression Expression ([left : Expression] [right : Expression])] [SumExpression Expression ([addends : Expression * = (random 5)])])
The example defines a piece of a grammath that includes some kinds of expressions. When a LiteralInt expression is generated, it’s v field will be populated with a random number. Since the random expression is evaluated for each fresh LiteralInt, they will (probably) receive different values for v. When an AditionExpression node is generated, it will be populated with an Expression hole node for each of its left and right fields. When a fresh SumExpression is generated, its addends field will be populated with a list of zero to four Expression hole nodes.
syntax
(add-attribute spec-component rule-name rule-clause ...)
rule-clause = (nonterminal-name rule-function)
(add-attribute my-spec-component interp [LiteralInt (λ (n) (ast-child 'v n))] [AdditionExpression (λ (n) (+ (att-value 'interp (ast-child 'left n)) (att-value 'interp (ast-child 'right n))))] [SumExpression (λ (n) (apply + (map (λ (c) (att-value 'interp c)) (ast-children (ast-child 'addends n)))))])
syntax
(add-choice-method spec-component rule-name rule-clause ...)
rule-clause = (nonterminal-name rule-function)
Xsmith creates a choice object class for each node type in the specification grammar, following the same class hierarchy that AST nodes themselves do. Choice objects are created every time Xsmith fills in a hole node. One choice object is created for every node type that is legal to use in filling the hole. Choice objects are then filtered according to the choice-filters-to-apply property, and then the choice-weight property of the remaining choice objects is used to determine the probability of choosing each one. When a choice object is selected, its fresh property is used to generate a new AST node of its type. If all choices are eliminated, an exception is raised with a message stating which filter step invalidated each potential choice.
Choice-methods are methods on the choice objects. Some choice-methods are used by choice-filters-to-apply to filter choices. Other choice-methods may be used by those filters or in the body of the fresh property as helper methods. While most information about the AST and the current choice are probably computed using attributes, information about choosing a specific node type to fill in an abstract hole (such as an expression hole which may be filled with many different types of expressions) are computed using choice-methods.
Choice-methods are methods in Racket’s class system and therefore have the this macro available for use in their bodies to access other methods (eg. with the send macro). Choice-methods also have the current-hole macro available within their body so that they can query attributes of the RACR AST being elaborated (eg. with att-value to access attributes and ast-parent to inspect other nodes in the AST).
Since choice-methods are methods in Racket’s class system, they must be defined with a literal lambda (with no parameter for the implicit this argument). If a method needs to modify state (such as to cache the computation of available references of the appropriate type), I would normally recommend the “let-over-lambda” pattern, but that is not allowed in this case. To make up for this, I recommend using make-weak-hasheq to hold the state, using the this object as a key.
(add-choice-method my-spec-component my-weight-helper [#f 7] [AdditionExpression (λ () (if (att-value 'my-helper-attribute (current-hole)) 20 5))]) (add-property my-spec-component choice-weight [#f (send this my-weight-helper)])
syntax
(add-property spec-component prop-name prop-clause ...)
prop-clause = (nonterminal-name prop-value)
Since property transformers are macros that may accept arbitrary domain-specific syntax, the grammar of prop-value varies for each property.
syntax
Elsewhere it raises a syntax error.
syntax
(make-hole hole-type-expression)
(make-hole 'Expression)
This function is essentially used by add-to-grammar as the default value for grammar fields with nonterminal types that lack an init-expr.
Outside of a spec component context, it raises a syntax error.
syntax
(make-fresh-node node-type-expression optional-field-value-dict)
(make-fresh-node 'AdditionExpression (hash 'left (make-fresh-node 'LiteralInt (hash 'v 5))))
Note that the fresh node is initially created unattached to the rest of the program tree. This means that any nodes whose fresh implementation needs to inspect the tree may fail. In particular, reference nodes can only lift bindings when attached to the tree, and will probably fail if created with make-fresh-node.
4.4 Custom Properties
Properties are used to create domain-specific languages or terse declarative syntax for specifying attributes and choice-methods. Custom properties are probably only useful for shared infrastructure for multiple languages (perhaps with the exception of define-non-inheriting-rule-property).
syntax
(define-property name maybe-dups maybe-reads maybe-rewrites maybe-appends maybe-transformer)
maybe-dups =
| #:allow-duplicates? literal-bool maybe-reads =
| #:reads transformer-arg-spec ... maybe-rewrites =
| #:rewrites transformer-arg-spec ... maybe-appends =
| #:appends transformer-arg-spec ... maybe-transformer =
| #:transformer transformer-function transformer-arg-spec = (property prop-name) | (attribute rule-name) | (choice-method rule-name) | (grammar)
Properties can have a transformer function that produce attributes, choice-methods, or other property values. Property transformers can read the values set for the property by add-property, and optionally the values of other properties as well. Not all properties must have a transformer – some properties may just be a single place to store information that is used by (multiple) other properties.
The transformer function accepts a dictionary mapping grammar node names (as symbols) to the syntax objects from the right-hand-side of add-property uses for each property that it reads. The transformer function must return a list of dictionaries mapping grammar node names (as symbols) to syntax objects for the properties, attributes, and choice-methods that it writes.
Property transformers are run during define-xsmith-interface-functions in an order determined by the read/write dependencies among the properties. Appending goes before rewriting, which goes before reading.
If a property appends (using the #:appends keyword) to a property or rule, its return dictionary will be appended to the existing dictionary for that property/rule. This allows a property or rule to be specified in part by the property that appends and in part by another property or the user. If an appended rule (or property that disallows multiple values) ends up with two values for any node, an error is raised.
If a property rewrites (using the #:rewrites keyword) to a property or rule, its return dictionary replaces any previous dictionary for that property/rule. Rewriting a property also automatically implies reading that property. A property or rule may only be rewritten by one property transformer.
(define-property my-property #:reads (property a) (property b) #:rewrites (property c) #:appends (attribute d) (choice-method e) #:transformer (λ (this-prop-dict prop-a-dict prop-b-dict prop-c-dict) ; compute output dictionaries... (list dict-for-c dict-for-d dict-for-e)))
The syntax object value for a property can be anything, since property transformers define the grammar and semantics of properties.
The syntax object value for attributes and choice-methods should be a syntax object specifying a function (IE a lambda). attributes may be any syntax that evaluates to a function (so you may return an identifier that references a function or an expression that computes a function such as let-over-lambda), but choice-method syntax is provided to Racket’s class macro, which requires literal lambda forms.
Dictionaries may or may not contain an entry for each nonterminal in the grammar. A dictionary may even be empty.
In addition to nonterminals, each dictionary may include a mapping for the value #f, which will define a default value used for the (super secret) parent node that define-xsmith-interface-functions defines. If nothing is specified for #f, attributes and choice-methods will have a default which errors, providing a helpful error message.
If the #:allow-duplicates? argument is supplied and is #t, then add-property may be used more than once for the property for the same node, and the syntax object in the dictionary for the property will be a syntax list of the syntax objects specified in the various add-property calls. But by default only one use of add-property is allowed per property per node type, and the syntax object in the dict is the single syntax object from the one call.
Here is a simple example that basically desugars straight to an attribute with a default:
(define-property strict-child-order? #:appends (attribute xsmith_strict-child-order?) #:transformer (λ (this-prop-info) (define xsmith_strict-child-order?-info (hash-set (for/hash ([(n v) (in-dict this-prop-info)]) (values n (syntax-parse v [b:boolean #'(λ (n) b)]))) #f #'(λ (n) #f))) (list xsmith_strict-child-order?-info)))
Allow values to be specified by just #t or #f, rather than (λ (n) #t)
To implicitly provide a default value to the root node (#f).
For more realistic examples of properties, see the file private/core-properties.rkt in the Xsmith implementation. Generally they are big, hairy macros.
syntax
(define-simple-property property-name rule-type optionals)
rule-type = attribute | choice-method optionals =
| #:rule-name rule-name | #:default default | #:transformer transformer
Defines a property that generates an attribute or choice-method according to rule-type. The optional #:default value will be associated with the implicit #f node (and thus be inherited by other nodes unless overrided). The default is given as literal syntax (IE no hash-quote or syntax form needed). The optional transformer should be a syntax parser, or in other words, it must be a function from a syntax object to a syntax object. The default transformer is the identity function, meaning the rule is written verbatim. Note that using the default transformer means there is little reason to use a property, since you can just use add-attribute or add-choice-method directly instead.
By default the name of the generated attribute or choice-method is the same as the name of the property, but this may be overrided by providing #:rule-name.
syntax
(define-non-inheriting-rule-property property-name rule-type maybe-rule-name default-value maybe-transformer)
maybe-rule-name =
| #:rule-name rule-name default-value = #:default default-expr maybe-transformer =
| #:transformer transformer-func
rule-type must be either attribute or choice-method.
rule-name defaults to property-name, but you can make it give the rule a different name than the property.
default-expr is the default value of the property. Any nonterminal that does not have a different value specified gets this one. Note that the default is required, not optional, despite being a keyword argument.
transformer-func is an optional transformer to transform the value. It is not called with a dictionary like the transformers of define-property, but rather it receives each value individually. This allows a small amount of sugar. Note that the value supplied as the default-expr is also transformed by the transformer-func when it is supplied. When no transformer-func is supplied, values are passed through literally.
(define-non-inheriting-rule-property some-bool-flag attribute #:default #f #:transformer (syntax-parser [#f #'(λ () #f)] [#t #'(λ () #t)])) (add-to-grammar a-spec-component [A #f ()] [B A ()] [C B ()]) (add-property a-spec-component some-bool-flag [A #t] [C #t])
Normally B would inherit a method from A when none was specified for it. But in this case it inherits the default (#f). When a user tries (att-value 'some-bool-flag <node-of-type-B>) it will return #f, not #t.
4.5 Refiners
Xsmith supports iterative refinement, which is a technique that allows for the modification of an existing AST. There are no default refiners, and their use is entirely optional. You might use a refiner to handle simplifications due to a post-generation analysis, such as using native mathematical operations instead of a specialized library when it has been proven that such operations will not produce unsafe results.
syntax
(define-refiner spec-name refiner-name maybe-follows maybe-refiner-predicate maybe-global-predicate refiner-clause ...)
maybe-follows =
| #:follows refiner-name ... maybe-refiner-predicate =
| #:refiner-predicate pred maybe-global-predicate =
| #:global-predicate pred
Refiners can have an order imposed on them if so desired. You can specify the #:follows parameter, which accepts either a refiner name or a list of refiner names. All refiners will be sorted based on these #:follows declarations, and executed in the order determined.
If you want to predicate the execution of a refiner itself, you can use the #:refiner-predicate parameter. This parameter accepts a single function that, if given, will be run prior to attempting to run the refiner at all. If it returns anything other than #f, the refiner will be run. The function should be a thunk (i.e., it should accept zero arguments). This is useful if you wish to have a refiner toggled on or off by a command-line argument.
Additionally, you may wish to have some predicate shared among all clauses of a refiner. The #:global-predicate parameter allows for this. Similar to refiner-predicate, #:global-predicate accepts a single predicate function as argument. This function will be prepended to the list of functions for each clause.
The refiner-clauses are similar to those used by the add-property (and similar) functions. One key difference with refiners is that there is always a #f clause given. (The default implementation of this clause if omitted by the user is [(λ (n) #f)], which means the refiner has no effect for any node which does not have a matching clause defined.) The predicate defined by #:global-predicate (if present) will not ever be applied to the #f clause. This means that if you want its functionality to be applied to nodes which do not have matching clauses, you will need to use it in a custom #f clause.
(define-spec-component arith) (add-to-grammar arith ; ... [ArithOrVal #f ()] ; This simplifies choices for generation. [Val ArithOrVal ([v = (random -100 100)])] ; Integer values on the specified ; interval (-100, 100). [ArithOp ArithOrVal ([lhs : ArithOrVal] ; Arithmetic operations have a [rhs : ArithOrVal])] ; left-hand side and a right-hand ; side, which can be either a value ; or another arithmetic operation. [AddOp ArithOp ()] ; + [SubOp ArithOp ()] ; - [MulOp ArithOp ()] ; * [DivOp ArithOp ()] ; / [ModOp ArithOp ()] ; % ; ...) ; ... (more code here) (define-refiner arith make-even ; This refiner makes all literal integer values into even values by ; incrementing them by 1. The `#:refiner-predicate` parameter says that it ; will only be run when the `make-even` feature is enabled. This does ; require the make-even feature to be defined in the `define-xsmith-interface-functions` ; macro. #:refiner-predicate (λ () (xsmith-feature-enabled? 'make-even)) [Val [(λ (n) (odd? (ast-child 'v n))) (λ (n) (make-replacement-node 'Val n (hash 'v (+ 1 (ast-child 'v n)))))]]) (define-refiner arith replace-rhs-val-with-zero ; The `replace-rhs-val-with-zero` refiner looks at the right-hand side of ; every ArithOp to see if it's a Val. If it is, its internal value will ; be replaced with 0. #:global-predicate (λ (n) (node-subtype? (ast-child 'rhs n) 'Val)) [ArithOp [(make-replacement-node (node-type n) n (hash 'rhs (make-fresh-node 'Val (hash 'v 0))))]]) (define-refiner arith prevent-divide-by-zero ; To prevent divide-by-zero errors, this refiner looks for generated ; DivOps and checks whether their right-hand side is a 0. If it is, it ; is replaced with a 13. #:follows:replace-rhs-val-with-zero #:refiner-predicate (λ () (xsmith-feature-enabled? 'prevent-divide-by-zero)) [DivOp [(λ (n) (node-subtype? (ast-child 'rhs n) 'Val)) (λ (n) (eq? 0 (ast-child 'v (ast-child 'rhs n)))) (λ (n) (make-replacement-node 'DivOp n (hash 'rhs (make-fresh-node 'Val (hash 'v 13)))))]]) ; ... (more code here) (define-xsmith-interface-functions [arith] ; ... (other options specified here as needed) #:features ([make-even #f] [prevent-divide-by-zero #t]))
syntax
(make-replacement-node new-node-type original-node [children])
children = (hash child-name child-value ...)
A new-node-type must be given, which is a symbol corresponding to a node type in the grammar. Then, original-node is the node which is being replaced. The new node will completely take the place of the old node; there will no longer be any references to original-node in the AST. Additionally, all children of the original-node will be copied to the new node except those replaced by the children hash. This is simply a hash of child names to values, just like in make-fresh-node.
(define-refiner my-spec-component replace-plus-with-minus [PlusOp [(λ (n) (make-replacement-node 'MinusOp n))]])
Note that make-replacement-node should only be used in the bodies of refiner functions! If a replacement is made but the refiner fails for some other reason (i.e., it returns #f), the replacement will be undone.
4.6 Scope Graph Functions
This section describes the API to the Scope Graph model of binding.
struct
name : string? ast-node : ast-node? type : type? def-or-param : (or/c 'definition 'parameter)
Notably this is returned by the attribute xsmith_binding.
The ast-node field is the grammar node containing the definition of name. The def-or-param field is there to distinguish names that are bound as function parameters vs names that are bound as (local or global) definitions.
Probably all you need to do with this, though, is get the name field and stuff it in the name field of your definition nodes in the fresh method.
parameter
(current-well-formedness-regexp r) → void? r : regexp?
= #px"rp*i?d"
When the xsmith_binding attribute is used or when Xsmith searches for a valid reference with xsmith_get-reference, this regexp determines valid scope-graph resolution paths. The path elements (reference, parent, import, definition) are turned into characters (r, p, i, and d respectively). If the path from reference to definition matches this regexp, it is valid. If two definitions have the same name and paths from a reference to both definitions are valid, the definition that is in scope for the reference is determined by current-path-greater-than.
Because Xsmith doesn’t currently support import elements at all, modifying this regexp is somewhat of a moot point.
parameter
→
(-> (listof/c (or/c 'reference 'parent 'import 'declaration)) (listof/c (or/c 'reference 'parent 'import 'declaration)) any/c) (current-path-greater-than comparator) → void?
comparator :
(-> (listof/c (or/c 'reference 'parent 'import 'declaration)) (listof/c (or/c 'reference 'parent 'import 'declaration)) any/c)
By default the comparator walks down the paths, comparing each element. A path is greater than another if its first differing element is greater. From least to greatest, path elements are 'reference, 'parent, 'import, 'declaration.
For most languages you probably don’t need to fuss with this.
4.7 Core Properties
These properties are available for specification on Xsmith grammars. Many of them define or modify the behavior of core functionality, such as fresh modifying how fresh program nodes are instantiated, or type-info defining a language’s type rules.
Many are optional, but for any language at all you probably need to use type-info and may-be-generated. Additionally, for any language with variables you need binder-info and reference-info. Next, the fresh property will likely become necessary for a few fields that Xsmith can’t infer. Then the most important of the truly optional properties is likely choice-weight.
spec-property
If may-be-generated is false, the node is not added to the list of possibile choices to replace an appropriate AST hole. It is useful to set it to false for abstract node types or for specialized node types that are meant to be swapped in only after a full tree is generated, such as by a later analysis to determine validity of an unsafe operation. This property is NOT inherited by subclasses.
(add-property my-spec-component may-be-generated ; Expression is abstract and should not be instantiated, ; only AdditionExpression, SubtractionExpression, etc. [Expression #f] ; Only safe addition expressions should be generated, ; but maybe a later pass after generation swaps some ; safe addition expressions for unsafe ones after analysis. [UnsafeAdditionExpression #f])
spec-property
(define number (base-type 'number)) (define int (base-type 'int number)) (define float (base-type 'float number)) (define bool (base-type 'bool)) (add-property my-spec-component type-info [AdditionExpression [(fresh-subtype-of number) (λ (n t) (hash 'l t 'r t))]] [SubtractionExpression [(λ (n) (fresh-subtype-of number)) (λ (n t) (hash 'l t 'r t))]] [EqualityExpression [bool (λ (n t) (define arg-type (fresh-type-variable)) (hash 'l arg-type ; You could compute more by using ; a function and computing the type ; given the actual child node. 'r (λ (r-node) arg-type)))]] [Lambda [(function-type (fresh-type-variable) (fresh-type-variable)) (λ (n t) (hash 'arg (function-type-arg-type t) 'Expression (function-type-return-type t)))]])
The property is two armed. The first arm specifies the type(s) that a node can inhabit, and its value is unify!-ed with the node’s type during type checking. The second arm specifies a relationship between a parent node and its children. Children nodes are allowed to be subtypes of the type specified by their parent, so a node’s type is subtype-unify!-ed with the type specified by its parent during type checking.
The first part is the type (or partially-constrained type variable) that the given node can inhabit. The expression given is evaluated fresh every time a node is type checked or considered for generation. When determining the type of a node, this value is unify!-ed with this value.
Additionally, the expression for a node’s type constraint may be a function that takes the node and returns a type instead of a type directly. However, the function may be passed a hole node that does not yet have properly populated fields and that may be a supertype of the node type you are defining the constraint for. So if you use a function here, you need to check the node with (att-value 'xsmith_is-hole? node)
The second part is a function that takes a node, its type, and must return a dictionary mapping its children nodes to types. The dictionary keys may be the node objects of the node’s children OR the symbol of the field name the child inhabits. The values in the hash may be types or functions from node to type. For kleene-star children, if they are all supposed to have the same type, it is convenient to use the field name as the key. However, if a kleene-star child needs a different type for each element of the list, you should use the nodes themselves as keys or use a function as dictionary value.
(add-to-grammar my-spec-component [Cons Expression ([newval : Expression] [list : Expression])]) (add-property my-spec-component type-info [Cons [(list-type (fresh-type-variable)) (λ (n t) (define lt (fresh-list-type)) (unify! lt t) (define inner (list-type-type lt)) (hash 'newval inner 'list t))]])
If the typing function makes any decisions about the type of the parent node when deciding the type of the child nodes, the decision needs to be reflected by either some unification with the child node types, construction/deconstruction using the parent type, or by saving the decision in the parent node and unifying with it in future passes. Here’s an example of a subtly broken type rule:
(add-to-grammar my-spec-component [Convert Expression (Expression)]) (add-property my-spec-component type-info [Convert [(fresh-type-variable int float) (λ (n t) (cond ; Don't do this. [(can-unify? t int) (hash 'Expression float)] [(can-unify? t float) (hash 'Expression int)]))]])
This example uses the input type to determine an output type, and in fact makes a decision about what the input type is. But this decision isn’t saved. The problem is that t could be a concrete int, a concrete float, or it could be a type variable that could be unified with either! In the first two cases this type rule is fine, but in the third case it will effectively decide that the type is an int, and generate a float child expression. Because it doesn’t unify and save this decision, generation in another branch of code could then unify the variable with float, which would then cause a type error when this type function is re-run.
- You could split Convert into two rules that accept one type each.
(add-to-grammar my-spec-component [IntToFloat Expression (Expression)] [FloatToInt Expression (Expression)]) (add-property my-spec-component type-info [IntToFloat [int (λ (n t) float)]] [FloatToInt [float (λ (n t) int)]]) This solves the problem by making the type choice explicit in the requirements to generate the conversion node in the first place.
You could save the choice in a field of the Convert node and re-unify in later passes.
(add-to-grammar my-spec-component [Convert Expression (Expression outputtype)]) (add-property my-spec-component type-info [Convert [(fresh-type-variable int float) (λ (n t) (define it (ast-child 'outputtype n)) (when it (unify! t it)) (define (save-choice chosen-type) (when (not it) (enqueue-inter-choice-transform (λ () (rewrite-terminal 'outputtype n chosen-type))))) (cond [(can-unify? t int) (save-choice int) (hash 'Expression float)] [(can-unify? t float) (save-choice float) (hash 'Expression int)]))]])
Note that the complexity of the Convert example arises because the input type has a relationship with the output type that can’t be expressed in terms of a construction, decomposition, or unification of types.
spec-property
Acceptable values for this property are expressions which produce a dict? object, or expressions which produce a function of type (-> dict? dict?). Keys of the dictionary must be field names of the node being generated. The values in the dictionary are used to fill node fields of the appropriate name. Any field whose name is not in the dictionary will be filled by evaluating the default init-expr defined in the grammar (via add-to-grammar).
(add-to-grammar my-spec-component [Expression #f ()] [LiteralInt Expression (v = (random 1000))] [AdditionExpression Expression ([left : Expression] [right : Expression])]) (add-property my-spec-component fresh ; Make AdditionExpressions always be generated with a literal 7 argument. [AdditionExpression (hash 'left (make-fresh-node LiteralInt (hash 'v 7)))])
This is useful for fields that must be determined together. For example, a function call needs the function name and the number of arguments to be chosen together rather than independently.
As with all choice-methods, this and current-hole are available for use in expressions, which you may want to do for eg. accessing available bindings or mutable information connected to the choice object.
If the result is a procedure instead of a dictionary, that procedure must accept and return a dictionary. It is called with a dictionary that is empty unless the node being created is the result of lifting a definition. In that case it will have the appropriate name and type fields with the name and type chosen by the lifting mechanism. In the case of lifting a definition, the name and type fields in the return dictionary are ignored. This procedure option is allowed because your fresh expression may need access to the name or type to determine the values of other fields. If a definition node only has a name and type field then a fresh property is unnecessary when lifting, and if lifting is the only way you generate definitions then fresh properties or initializers for definition nodes are unnecessary.
If the value for a field (IE values inside the result dictionary) is a procedure, it will be called with 0 arguments. This allows the fresh property to provide a default value that is not evaluated when make-fresh-node is called with an appropriate value.
spec-property
The expression provided as the choice weight will be evaluated in the context of a method call, so this and current-hole are available.
Choice weights should be positive integer values. The default weight is 10 unless set explicitly.
(add-property my-spec-component choice-weight "The default choice weight." [#f (λ () 10)] "Generate more AdditionExpressions" [AdditionExpression 20] [MultiplicationExpression 15] "Generate fewer SumExpressions" [SumExpression 5])
spec-property
The property accepts an expression which much evaluate to a function of one argument (the RACR AST node) which returns an integer for the depth increase. The default is (λ (n) 1). This property is NOT inherited by subclasses.
This is useful to allow node re-use. For example, the body of an if or for statement might be a block and have the same semantics, but you might want a block inside an if to only be considered a depth increase of 1, not 2.
(define no-depth-if-body-is-block (λ (n) (if (node-subtype? (ast-child 'body n) 'Block) 0 1))) (add-property my-spec-component depth-increase [IfStatement no-depth-if-body-is-block] [ForStatement no-depth-if-body-is-block])
spec-property
But if you want to set it:
The property accepts expressions which will evaluate to booleans (IE anything but only #f is false...), which are evaluated if the choice is made at the point where the AST is at it maximum depth. A true value means that the choice is acceptable, false otherwise. The default is computed by checking whether a node includes AST-node-typed fields. If it does not it is considered atomic and therefore acceptable to choose when the AST is already at its maximum depth.
spec-property
#:name-field specifies the name of a field in the node that contains the name of the definition, and defaults to name.
#:type-field specifies the name of a field in the node that contains a type annotation for the definition, and defaults to type.
#:binder-style must be either definition or parameter, reflecting whether the binding is a function parameter (default: definition). This is used by some Xsmith analyses about higher order values.
#:lift-target? defaults to #t. When #:lift-target? is true and the binding style is definition and not parameter then the binder is eligible to be lifted automatically.
Note that for a basic definition with default name fields, the property need only contain an empty list to mark that the node is in fact a binder.
(add-to-grammar my-spec-component [Definition #f (name type Expression) #:prop binder-info ()] [FormalParameter #f (name type) #:prop binder-info (#:binder-style parameter)] [Reference #f (name) #:prop reference-info (read)])
spec-property
The identifier read or the identifier write, indicating whether the reference reads or writes the variable
(Optional) #:name-field: The name of the field that stores the reference name (as an identifier). Defaults to name.
(Optionaly) #:unifies: Accepts the name of a field that the type checker unifies with respect to. The default value, #t, unifies the node itself instead of one of its fields. Use #f to disable automated unification for this node. (If you disable unification, you should implement your own manually!)
If you give a field for the unification target, that field’s type rule must NOT depend on the parent node’s type. Essentially, the #:unifies argument is meant for writes to any type in a node that itself has type void.
(add-property my-spec-component reference-info [Reference (read)] [Assignment (write name #:unifies Expression)])
spec-property
The node
A (potentially empty) list of binding? reference options
A boolean that tells whether lifting a new definition is an option
The function must return one of the options, the symbol 'lift to lift a fresh definition, or #f. If it returns #f, then you can’t have a reference there, and you have to deal with that in your fuzzer. Returning #f is probably a bad idea. The default shouldn’t ever return #f. The default value biases towards choosing parameters over definitions and lexically closer bindings over far ones.
You can give different values to different nodes, but a default on #f is probably good enough?
(add-property my-spec-component reference-choice-info [#f (λ (n options lift-available?) (if (null? options) (and lift-available? 'lift) (random-ref options)))])
spec-property
If the property is not specified, 'serial is assumed and used as a default.
(add-to-grammar my-spec-component [Let #f ([definitions : Definition *] Expression)] [Letstar #f ([definitions : Definition *] Expression)] [Letrec #f ([definitions : Definition *] Expression)]) (add-property my-spec-component binding-structure [Let 'parallel] ; Letstar we can leave blank if we want because serial is the default. [Letrec 'recursive])
spec-property
Setting this property is unnecessary, but using it allows more liberal use of references, mutation, and io effects.
(add-property my-spec-component strict-child-order? ; Most languages have some sort of sequential construct, like a block [Block #t])
spec-property
The property takes a list of either the identifier read or the identifier write, then an expression for the key for the kind of mutable container. The key can be anything, but it needs to be eq? for each access of the same container type and not eq? for accesses to different container types. Eg. you could use the type constructor, or just a symbol for the name of the type.
(add-property my-spec-component mutable-container-access [MutableRecordGetField (read 'MutableRecord)] [MutableRecordSetField (write 'MutableRecord)] [MutableArrayReference (read 'MutableArray)] [MutableArraySet (write 'MutableArray)])
spec-property
(add-property my-spec-component io [Print #t])
spec-property
(add-to-grammar my-spec-component [Let #f ([definitions : Definition *] Expression)] [Letstar #f ([definitions : Definition *] Expression)] [Letrec #f ([definitions : Definition *] Expression)]) (add-property my-spec-component lift-predicate ; Allow any definition type to be lifted into the top level of a program. [Program (λ (n type) #t)] ; Lifting a definition to Lambda's formal parameter list would require changing all calls. [Lambda (λ (n type) #f)] ; Allow local variables to be lifted, except make all functions top-level. [Let (λ (n type) (not (function-type? type)))])
spec-property
(add-to-grammar my-spec-component [VariableDefinition #f (name type Expression)] [FunctionDefinition #f (name type Body)]) (add-property my-spec-component lift-type->ast-binder-type [#f (λ (type) (if (function-type? type) 'FunctionDefinition 'VariableDefinition))])
spec-property
Uses for this include restricting where certain nodes can appear. For example, the easiest way to create a fuzzer for a language with functions is to have a Lambda expression node. However, some languages do not support first-class functions. But you can still encode the fuzzer as having a lambda node that is simply restricted to only be generated as children of definition nodes, or perhaps only global definition nodes.
(add-choice-method my-component no-lambda-except-global-def [#f (λ () #t)] [Lambda (λ () (and (parent-node current-hole) (equal? (ast-node-type (parent-node current-hole)) 'Definition) (parent-node (parent-node current-hole)) (equal? (ast-node-type (parent-node (parent-node current-hole))) 'Program)))]) (add-property my-component choice-filters-to-apply [#f (no-lambda-except-global-def)])
Some core methods are always applied in addition to this list, such as the method defined by the may-be-generated property. If you don’t make custom filtering rules you don’t need to specify this property.
spec-property
The edit property is specified as a procedure that takes the node in question and returns either #f to signify that there is no edit necessary or a thunk that performs the edit. The returned thunk must perform an edit in such a way that the edit property’s procedure will return #f in the future, or editing will loop indefinitely. The edit property may be specified multiple times, each with a different procedure that potentially performs an edit. If multiple edit procedures are specified, the ordering of the procedures is not guaranteed, so if an order between them is necessary, they must have their own method of signalling between them.
One way the edit property may be used to stage edits is to specify a fresh property that fills nodes as bud nodes (with create-ast-bud), then check for that node’s dependencies (eg. sibling nodes) in the edit property. When the dependencies are appropriately filled, the edit property can then replace the bud node with a hole node (using rewrite-subtree and make-hole) to be filled in normally. Note that if you follow this pattern, you need to take care in other properties (such as type-info) to check whether children are bud nodes before trying to query their attributes.
(add-to-grammar my-component ; be sure the left node is filled before the right node [Addition Expression ([l : Expression] [r : Expression = (create-ast-bud)]) #:prop edit (λ (n) (and (ast-bud-node? (ast-child 'r n)) (not (att-value 'xsmith_is-hole? (ast-child 'l n))) (λ () (rewrite-subtree (ast-child 'r n) (make-hole 'Expression)))))])
spec-property
(add-property my-spec-component render-node-info [#f (λ (node) (symbol->string (ast-node-type node)))])
spec-property
(add-property render-hole-info [#f (λ (hole) (format "<~a>" (symbol->string (ast-node-type hole))))])
4.8 Types
These type constructors and other functions are largely useful for specifying the type-info property.
While there are various predicates for different types, at any point in type checking you might actually have a type variable instead of a concrete type. So if you want to check if you have a particular type (and maybe deconstruct it), you should maybe create an instance of the type you are interested in, check if it can-unify?, then unify!-ing it if you want to deconstruct it. Note that if you do unify! a type variable, that unification needs to be consistent between multiple runs of type checking (since it runs multiple times as the tree is constructed). In other words, if you randomly choose a type at any point, you need to store that type in a grammar attribute and consistently unify against it.
procedure
(fresh-type-variable args ...) → type?
args : type?
; Unconstrained (define v1 (fresh-type-variable)) (define int (base-type 'int)) (define float (base-type 'float)) (define bool (base-type 'bool)) (define v2 (fresh-type-variable int bool)) (unify! v1 v2) (can-unify? v1 float) ; #f (can-unify? v1 int) ; #t (unify! v2 bool) (can-unify? v1 int) ; #f
procedure
(fresh-subtype-of t) → type?
t : type?
procedure
(type-variable? t) → bool?
t : any/c
procedure
(can-unify? t1 t2) → bool?
t1 : type? t2 : type?
If unification fails an exception is raised. Right now a failure to unify might mean that type variables are left in a bad state, so code generation should just give up at that point.
procedure
(can-subtype-unify? sub super) → bool?
sub : type? super : type?
procedure
(subtype-unify! sub super) → void?
sub : type? super : type?
(begin (subtype-unify! sub super) (subtype-unify! super sub))
(unify! sub super)
If unification fails an exception is raised. A failure in unification is basically catastrophic, so no code generation should be attempted after a unification failure.
The subtype-unify! function is used automatically during type checking to put a node’s type in the subtype relationship with the type its parent provides it, so you probably don’t need to use this function manually. The unify! function is more useful in user code.
procedure
name : symbol? supertype : (or/c #f base-type?) = #f leaf? : any/c = #t
If leaf? is true, no subtypes of this base type can be created.
procedure
(base-type? x) → any/c
x : any/c
procedure
(base-type-name bt) → symbol?
bt : base-type?
procedure
(function-type arg-type return-type) → type?
arg-type : type? return-type : type?
procedure
(function-type? t) → bool?
t : any/c
procedure
(function-type-arg-type t) → type?
t : function-type?
procedure
t : function-type?
procedure
(product-type types) → type?
types : (or/c (listof types?) #f)
(define any-length (product-type #f)) (define l2 (product-type (list int int))) (define l3 (product-type (list int int int))) (can-unify? any-length l2) ; #t (can-unify? any-length l3) ; #t (unify! any-length l2) (can-unify? any-length l2) ; #t (can-unify? any-length l3) ; #f
procedure
(product-type? t) → bool?
t : any/c
procedure
t : product-type?
syntax
(define-generic-type name (field-spec ...))
field-spec = [field-name variance] | field-name variance = invariant | covariant | contravariant
This form defines a constructor name, a predicate name?, and one accessor for each type-argument name-type-argument a la struct. If no variance is given for a field, it is invariant when subtyping.
Each instance of a generic type can be unified with other instances of the same generic.
(define number (base-type 'number)) (define int (base-type 'int number)) (define-generic-type list-type ([type covariant])) (define t1 (list-type number)) (generic-type? t1) ; #t (list-type? t1) ; #t (generic-type-type-arguments t1) ; (list number) (list-type-type t1) ; number (define t2 (list-type int)) (can-subtype-unify? t1 t2) ; #f (can-subtype-unify? t2 t1) ; #t (can-unify? t1 t2) ; #f
procedure
(generic-type? t) → bool?
t : any/c
procedure
(generic-type-name t) → symbol?
t : generic-type?
procedure
(generic-type-type-arguments t) → (listof type?)
t : generic-type?
procedure
(nominal-record-type? t) → bool?
t : any/c
Partially specified nominal-record-type?s are created with nominal-record-type-with. Fully specified nominal-record-type?s are created by using concretize-type on a partially specified one. Rather than making them manually, simply rely on Xsmith’s definition-lifting mechanism to create appropriate fully-specified nominal-record-type?s.
When a partially defined nominal-record-type is unify!-ed with a fully defined nominal-record-type, the partially defined one is mutated to become the same as the fully defined one. When two partially defined nominal-record-types are unified together, an error is raised.
Every node in a language grammar that stands for a nominal-record-type constructor, accessor, or mutator must include a reference to a nominal-record-definition-type containing the nominal-record-type being used.
The reason for this is that nominal types must be defined in the program. nominal-record-definition-types are necessary because the lookups of these type names are a different type than uses of the record type that the name is bound to.
(add-to-grammar my-grammar ... [VariableReference Expression (name)] ... [StructReference Expression (fieldname [structdefref : VariableReference] [structval : Expression])] ...) (add-property my-grammar type-info ... [StructReference [(fresh-type-variable) (λ (n t) (define type-with-field (nominal-record-type-with (ast-child 'fieldname n) t)) (hash 'structval type-with-field 'structdefref (nominal-record-definition-type type-with-field)))]] ...)
procedure
(nominal-record-type-with fieldname fieldtype) → type? fieldname : string? fieldtype : type?
procedure
procedure
t : nominal-record-type?
procedure
t : nominal-record-type?
procedure
t : nominal-record-type?
procedure
(nominal-record-definition-type? t) → bool?
t : any/c
procedure
(nominal-record-definition-type-type t) → nominal-record-type?
t : nominal-record-definition-type?
procedure
(structural-record-type? v) → bool/c
v : any/c
procedure
(fresh-structural-record-type [ field-dict #:finalized? finalized?]) → type? field-dict : (hash/c symbol? type?) = (hash) finalized? : any/c = #f
procedure
→ (hash/c symbol? type?) srt : structural-record-type?
Because this may be updated by unification, and type exploration is lazy where possible, you should use force-type-exploration-for-node! before using this on the type of any particular node.
procedure
(force-type-exploration-for-node! n) → void/c
n : ast-node?
If you need to manually inspect details of types inside type-info, you should use this function on the node whose type is in question to be sure the returned type reflects a maximally unified view.
TODO - explain better when you need to use this.
procedure
(settled-type? t) → bool?
t : type?
In other words, all type variables contained in this type (including product-types, nominal-record-types, and structural-record-types) have been unified such that they only have one option, which is itself settled. If you have a variable that is not settled and you need a settled one, you can use concretize-type on the type you have.
See warnings in concretize-type about its use!
Note that through subtype-unify!, variables may be effectively settled without passing the settled-type? predicate. Xsmith uses subtype-unify! internally, so this could be an issue even if you don’t manually use it. When a variable is subtype-unified to be the subtype of a base type, say string, the variable can unify with the range (#f, string). If string has no subtypes, then string is the only type it can be unified with. However, the settled-type? function doesn’t currently have access to the list of subtypes (because currently they can be created dynamically), so it doesn’t know that it’s effectively settled.
procedure
(type-has-no-variables? t) → bool?
t : type?
Mostly this is just useful to tell whether you can confidently use projection functions (eg. product-type-inner-type-list) without running into type variable wrappers.
procedure
(concretize-type t) → type?
t : type?
This function can be useful if you want to generate a random type or proactively make a decision about an unsettled type you have where you need a settled, concrete type. But beware! You should probably NOT generate random types or unify with concretized types unless you also store them in the grammar node that represents that type. The type-checking code defined in the type-info property can be flushed and re-run many times for each node, so a node that randomly chooses its type will not be stable. Because the type algorithm imperatively unifies types, this causes mayhem. Don’t do it.
Note that to use this function you must pass a value in to the #:type-thunks argument of define-xsmith-interface-functions.
4.9 Miscellaneous Utilities
procedure
(fresh-int!) → number?
procedure
(fresh-var-name template) → string?
template : string?
(fresh-var-name "variable_") ; returns the string "variable_123", or something like it.
procedure
(enqueue-inter-choice-transform thunk) → void?
thunk : procedure?
In other words, if your attribute needs to use rewrite-terminal, stick that computation in a thunk, use this on it, and it will be evaluated after the current round of attribute evaluation but before the next one.
4.10 Debug Logging
procedure
(datt-value method node arg ...) → any/c
method : symbol? node : ast-node? arg : any/c
4.11 Turning Your Grammar Specification Into a Program
Use the xsmith-define-interface-functions macro. This section used to have more before things were rearranged. Maybe things should be rearranged more.
procedure
(xsmith-feature-enabled? feature) → boolean?
feature : symbol?
procedure
4.12 RACR Convenience Functions
(require xsmith/racr-convenience) | package: xsmith |
These are a group of convenience functions around RACR. They are not necessary, and some may not be the best choices. But I have found them a little more convenient than using certain RACR APIs directly.
syntax
(expr->ast-list length-expression field-expression)
Wrapper for ast-node-type that returns false rather than erroring when it gets bud nodes or list nodes...
procedure
(parent-node n) → any/c
n : any/c
procedure
(top-ancestor-node n) → any/c
n : any/c
procedure
(node-subtype? n) → any/c
n : any/c
4.13 xsmith/app
(require xsmith/app) | package: xsmith |
The xsmith/app module provides a convenient #%app replacement for accessing attributes (AKA methods, or attributes on AST nodes) and methods (AKA choice-methods) on choice objects.
See Procedure Applications and #%app for more details on #%app.
Note that these bindings are not provided by the main xsmith module.
syntax
(#%app form ...)
($xsmith_type n)
(att-value 'xsmith_type n)
($choice-method-name n 1 2 3)
(send n choice-method-name 1 2 3)
In practice, whether n is an ast-node? or object? can’t be determined statically, so it is tested at runtime.
If the first form is not a quoted symbol, then the racket/base:#%app from is used.
syntax
(define-xsmith-app xsmith-app-name inner-app prefix)
Additionally, prefix is used instead of $. prefix is given as a symbol.
(define-xsmith-app my-xsmith-app #%plain-app $^!)
4.14 Canned Components
(require xsmith/canned-components) | package: xsmith |
The abstract grammars for many languages are very similar. We provide a library of canned components to get fuzzers up and running quickly for languages that include common patterns. All uses of canned components should probably start with define-basic-spec-component.
Note that add-basic-expressions and add-basic-statements aren’t good abstractions, they just facilitate some shared code of common components. You ultimately still have to know the structure of all components added.
The canned-components also export various types that the canned grammar nodes use. Of note, statements use two statement types: return-type and no-return-type. The expressions use all the other provided types.
For some examples that use these canned components, see the xsmith-examples/simple directory.
Note that these bindings are not provided by the main xsmith module.
syntax
(define-basic-spec-component grammar-component)
The Expression and Statement nodes have no fields and cannot be generated by the grammar, but serve as scaffolding for other productions in add-basic-expressions, add-basic-statements, and your own definitions.
The Definition, DefinitionNoRhs, and FormalParameter nodes all have both type and name fields. The Definition node also has an Expression field.
DefinitionNoRhs is meant to be used in forms that implicitly add a definition to subforms, but whose definition is not necessarily of the same type as a child the definition comes from. For example, a loop form where you bind a variable to each element of a given list.
syntax
(add-basic-expressions grammar-component optional ...)
optional = #:ProgramWithSequence boolean | #:VoidExpression boolean | #:AssignmentExpression boolean | #:IfExpression boolean | #:LambdaWithExpression boolean | #:LambdaWithBlock boolean | #:LetSequential boolean | #:ExpressionSequence boolean | #:Booleans boolean | #:Strings boolean | #:MutableArray boolean | #:MutableArraySafeAssignmentExpression boolean | #:ImmutableArray boolean | #:ImmutableList boolean | #:MutableStructuralRecord boolean | #:MutableStructuralRecordAssignmentExpression boolean | #:ImmutableStructuralRecord boolean
TODO - maybe just show the add-to-grammar segments in the lists of what is added. For now, you really just need to look at the source of canned-components to see what it is. It’s not a real abstraction, but I don’t want to tediously document all the internal names right now.
The following top-level node types are always added to the grammar:
An abstract Expression node (IE may-be-generated is false).
If #:ProgramWithSequence is true, (ProgramWithSequence #f ([definitions : Definition *] ExpressionSequence)) is added. Use this when your language is free from the nonsense of statements.
The following Expression node types are always added:
TODO - document all of the field names
VariableReference with name
ProcedureApplication
IntLiteral – uses int type
Plus
Minus
Times
SafeDivide
LessThan
GreaterThan
Other options mostly add a single node type with the same name as the option. The exceptions are:
TODO
Numeric operations provided operate on the number type.
Arrays use either (mutable (array-type (fresh-type-variable))) or (immutable (array-type (fresh-type-variable))).
Structural records use either (mutable (fresh-structural-record-type)) or (immutable (fresh-structural-record-type)).
Lists use (immutable (list-type (fresh-type-variable)))
syntax
(add-basic-statements grammar-component optional ...)
optional = #:ProgramWithBlock boolean | #:AssignmentStatement boolean | #:ExpressionStatement boolean | #:MutableArraySafeAssignmentStatement boolean | #:MutableStructuralRecordAssignmentStatement boolean
TODO - maybe just show the add-to-grammar segments in the lists of what is added. For now, you really just need to look at the source of canned-components to see what it is. It’s not a real abstraction, but I don’t want to tediously document all the internal names right now.
The following node types are always added to the grammar:
An abstract Statement node (IE may-be-generated is false).
ReturnStatement with Expression
Block (a Statement subtype) with definitions and statements
IfElseStatement with test then else
The optional arguments add a node with the same name as the keyword.
ReturnStatement is of type (return-type (fresh-type-variable))
Block and IfElseStatement are of type (fresh-maybe-return-type)
AssignmentStatement, ExpressionStatement, and others are of type no-return-type
spec-property
(add-basic-statements my-component ... #:Block #t ...) (add-to-grammar my-component [OneArmedIfStatement Statement ([test : Expression] [then : Block]) #:prop block-user? #t])
syntax
(add-loop-over-container grammar-component kw-arg ...)
kw-arg = #:name identifier | #:loop-ast-type identifier | #:body-ast-type identifier | #:bind-whole-collection? boolean | #:collection-type-constructor function | #:loop-type-constructor function | #:body-type-constructor function | #:loop-variable-type-constructor function
collection - if #:bind-whole-collection? is #t, it will be of AST node type Definition, otherwise it will be of AST node type Expression (the default). This represents the collection to be looped over, and the #:bind-whole-collection? is a convenience to make references to the whole collection available.
elemname - of AST node type DefinitionNoRhs which itself has name and type fields but no Expression field. This node represents the binding to an element of the list in each iteration of the body.
body - of AST node type body-ast-type, which defaults to Expression. Changing this is useful to make loops whose body is a Statement or Block.
The #:collection-type-constructor should be a function from the type inside the collection to the type of the collection.
The #:loop-type-constructor should be a function from the type inside the collection to the type of the whole loop. By default #:loop-type-constructor is the same as #:collection-type-constructor, which corresponds to a loop that forms a comprehension form with a result (similar to Racket’s for/list form). However, common values for #:loop-type-constructor include (λ (elem-type) void-type) for loop expressions that only side-effect, or (λ (elem-type) (fresh-maybe-return-type)) for loops in those silly statement languages.
The #:body-type-constructor should be a function from the loop type and the element type to the type of the loop body. By default #:body-type-constructor returns the element type, but for side-effectful loops and/or statement-bodied loops it should be something else. For example, a statement-bodied loop should have #:body-type-constructor (λ (loop-type elem-type) loop-type).
The #:loop-variable-type-constructor should be a function from the type inside the collection to the type of the loop variable. By default this is the identity function. However, you can override this to make a loop over a container where the loop variable is, say, an integer that indexes the collection rather than an element of the collection. The details of how you make the bound name match the type are essentially up to the render-node-info rules you write.
Most keyword arguments are optional, but #:name is required.
(add-loop-over-container python-comp ; Sure, Python calls them lists, but my type system calls them arrays. #:name ArrayComprehension #:collection-type-constructor (λ (elem-type) (mutable (array-type elem-type)))) (add-property python-comp render-node-info [ArrayComprehension (λ (n) (h-append (text "[") ($xsmith_render-node (ast-child 'body n)) (text " for ") (text (ast-child 'name (ast-child 'elemname n))) (text " in ") ($xsmith_render-node (ast-child 'collection n)) (text "]")))])
procedure
(return-type t) → type?
t : type?
This is used to encode where a return statement is needed and what type it must be. In other words, the block in a LambdaWithBlock has return-type, the last statement in the block is unified to have the same type.
For the curious, it is implemented as a generic-type? with a covariant inner type.
value
procedure
IE (fresh-type-variable (return-type (fresh-type-variable)) no-return-type).
value
value
value
value
value
value
procedure
(array-type t) → type?
t : type?
5 Acknowledgments
This material is based upon work supported by the National Science Foundation under Grant Number 1527638. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
6 Code and License
The code is available at the University of Utah Flux Research Group’s GitLab server.
Xsmith is Copyright (c) 2016–2020 The University of Utah.
Xsmith is distributed under the following license, which is often called the “BSD License.”
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.