Verifying a specification

Given a specification, Vehicle can be used to check whether a particular neural network satisfies it.

To do this, Vehicle relies on a type of tool called neural network verifiers. As input, these tools take in a query, a set of equality and inquality constraints relating the inputs and outputs of the neural network. They then attempt to find to an input that satisfies all the constraints at the same time. They can therefore be thought of as domain-specific SAT solvers.

However, these queries are written in a very low-level format and are often extremely large. Furthermore they often don’t support operations such as disjunction. Consequently they are very difficult to write or read, and a single high-level property often needs to be split up into many queries in a non-obvious manner.

However, Vehicle is capable of:

generating these queries automatically from a specification.

calling a verifier on the generated queries to obtain whether each query is satisfiable and using this information to reconstruct the truth status for the original property.

Generating verifier queries

Verifier queries can be generated by passing a suitable target to the vehicle compile command, e.g.

vehicle compile \
  --target MarabouQueries
  --specification my/project/mnist-robustness.vcl \
  --output my/project/queries
  --network classify:my/project/mnist.onnx \
  --dataset trainingImages:my/project/mnist-trainingImages.idx \
  --dataset trainingLabels:my/project/mnist-trainingLabels.idx \
  --parameter epsilon:0.1 \

This command will create a folder my/project/queries which contains all the necessary queries, as well as verification plan plan file .vcl-plan. The latter contains all the information about how the queries relate to the original property they were compiled from.

The full list of relevant command line options are:

--target, -t: The compilation target. There is currently one query format supported: MarabouQueries.

--specification, -s: The .vcl file containing the specification to compile.

--declaration, -y: The name of the declarations in the specification to compile. You may provide this option multiple times to compile multiple properties at once. If not provided, then by default all properties in the specification are compiled. When compiling to verify queries, the declaration provided via this option must be annotated with a @property annotation.

--network, -n

Provide the implementation of a network declared in the specification. Its value should consist of a colon-separated pair of the name of the network in the specification and a file path, i.e.

--network NAME:FILEPATH

Can be used multiple times if the specification involves more than one network.

--dataset, -d

Provide a dataset declared in the specification. Its value should consist of a colon-separated pair of the name of the dataset in the specification and a file path, i.e.

--dataset NAME:FILEPATH

Can be used multiple times if the specification involves more than one dataset.

--parameter, -p

Provide a parameter referenced in the specification. Its value should consist of a colon-separated pair of the name of the parameter in the specification and its value, i.e.

--parameter NAME:VALUE

Can be used multiple times to provide multiple parameters.

--output, -o: The output directory in which to store the compiled queries and the verification plan.

Calling the verifier

Given a folder my/project/compiled-queries containing queries and a verification plan generated by the vehicle compile command above, the specification can be verified by using the vehicle verify command, e.g.

vehicle verify \
  --specification my/project/compiled-queries
  --verifier Marabou

The full list of available command line arguments are as follows:

--specification, -p: The location of the folder containing the queries and verification plan previously generated by Vehicle.

--verifier, -v: Which verifier should be used to perform the verification. At the moment the only supported option is Marabou.

--verifierLocation, -l: Location of the executable for the verifier. If not provided, then Vehicle will search for the name of the executable in the PATH environment variable.

--proofCache, -c: The location to write out a Vehicle proof cache that provides a permanent record of the results of the verification. This can be be used to later re-check the result in an interactive theorem prover. If this option is not present then no proof cache will be generated.

Warning

The verify command is not atomic. Verification involves repeatedly loading the network(s) from disk and Vehicle will not detect changes to the networks that occur while the command is running.

Compiling and verifying

In practice, you often want to perform the two steps above in sequence and you don’t care about the queries generated in the middle. For convenience, this may also be performed using the verify mode by passing a .vcl file as the value of the specification argument. When run in this mode, vehicle will stores the queries in a temporary directory and immediately call the verifier.

vehicle verify \
  --specification my/project/mnist-robustness.vcl \
  --network classify:my/project/mnist.onnx \
  --dataset trainingImages:my/project/mnist-trainingImages.idx \
  --dataset trainingLabels:my/project/mnist-trainingLabels.idx \
  --parameter epsilon:0.1 \
  --verifier Marabou

The table below contains the full list of command line arguments available for the verify command when target is a .vcl file.

--specification, -s: See --specification in compile mode.

--property, -y: See --declaration in compile mode.

--network, -n: See --network in compile mode.

--dataset, -d: See --dataset in compile mode.

--parameter, -p: See --parameter in compile mode.

--verifier, -v: See --verifier in verify mode.

--verifierLocation, -l: See --verifierLocation in verify mode.

--proofCache, -c: See --proofCache in verify mode.

Re-checking a verification result

There are several reasons why one might want to check the status of a specification some time after having initially called verify:

1. The verification could be part of an automated test suite in a continuous integration framework.

2. The specification could have been exported to an interactive theorem prover whose workflow consists of regularly rechecking the validity of proofs.

Unfortunately, depending on the size of the network and the complexity of the specification, verification can be a very expensive procedure taking hours or days. Therefore it is important to avoid unnecessary re-verification.

To solve this problem, the vehicle verify command can produce a proof cache file, which contains:

The original text of the specification.
The verification status of the specification.
The values of the provided parameters.
The file paths of the networks and datasets provided to the original verify command along with a hash of the contents of each file.

The validate command can then be run to use the proof cache to check the status of the specification as follows:

vehicle validate \
 --proofCache /my/project/spec.vcl-cache

Vehicle will read the proof cache, and use its contents to find and rehash the networks and datasets that were used during the original verification of the specification. If the new hashes match those stored in the proof cache then the check passes, otherwise the validate command will exit with an error.

Note

For obvious reasons, moving or renaming any of the networks or datasets will result in the validate command failing.

Limitations of verification

As you might expect, verification is a very hard problem. Therefore there are several limitations that users should be aware of.

Linearity

Quantified variables in the specification must be used in a linear manner. For example, neither of the following is allowed:

@network
f : Vector Rat 2 -> Vector Rat 2

@property
p1 : Bool
p1 = forall x . x * x > 2 => f [ x , 2 ] >= 0.5

@property
p2 : Bool
p2 = forall x y . x * y > 2 => f [ x , y ] >= 0.5

In p1 the variable x is used to calculate a non-linear value x * x, and in p2 the variables x and y are used to create a non-linear value x * y.

In the case where you do try to verify a non-linear property, Vehicle will use its sophisticated auxiliary type-system to help you pinpoint the source of the non-linearity.

Quantifiers

While verifiers can be used to verify both universal properties (i.e. with forall) and existential properties (i.e. with exists) they cannot verify properties with alternating quantifiers where one type of quantifier is used within the scope of the other type of quantifier. Here are some examples.

@network
f : Vector Rat 2 -> Vector Rat 1

@property
good1 : Bool
good1 = forall x . f x ! 0 >= 0.5

@property
good2 : Bool
good2 = exists x . f x ! 0 >= 0.75

Property good1 and good2 can both be verified as they each only use a single type of quantifier.

@property
bad1 : Bool
bad1 = forall y . exists x . f x == y

In contrast property bad1 cannot be verified as it contains a alternating forall and exists.

@property
good3 : Bool
good3 = (forall x . f x ! 0 >= 0.5) and (exists y . f y ! 0 >= 0.75)

However, property good3 can be verified even though it contains both a forall and an exists as the quantifiers are not alternating (i.e. it can split into two to form good1 and good2.)

@property
bad2 : Bool
bad2 = forall x . not (forall x . f x != y)

Note, that as shown by property bad2 alternating quantifiers is not a syntactic property but a logical one. This property can also not be verified despite only containing forall quantifiers. This is because under the rules of classical first order logic, bad2 is logically equivalent to bad1.

In the case where you do try to verify a property with alternating quantifiers, Vehicle will use its sophisticated auxiliary type-system to help you pinpoint the source of the alternation.

Network architecture

Verifiers tend to only support certain layer types and activation functions. At the moment Vehicle doesn’t perform any compatability checking, so please consult the verifier’s own documentation.

Performance

Verification has been shown to be an NP-complete problem so in the worst-case all verification algorithms will take an infeasibly long time to run. However, as with many NP-complete algorithms, in the common case performance can be surprisingly good.

How long it takes to verify a property depends on several factors:

1. The complexity of the property. The more SAT queries that a property is compiled down to, the longer it will take to verify them all. Language features that are likely to increase the number of queries generated are

if statements

and statements underneath a forall quantifier

or statements underneath a exists quantifier

2. The complexity of the network. The larger the number of nodes in the network, the longer it will take the verifier to run the query. In general, networks with a small number of wide layers will be easier to verify than networks with a large number of narrow layers.

3. How “close” the network is to satisfying each query. If a query is easily satisfiable, or easily non-satisfiable then the verifier will return an answer quickly. The closer to the boundary the network lies with respect to the query, the longer it will take the verifier to make a decision. Unfortunately this is almost impossible to quantify to advance.