Verifying a specification

Given a suitable specification, Vehicle can be used to check whether a particular neural network satisfies it.

To do this, Vehicle relies on a type of tool called a neural network verifier. As input, such tools take in a query, a set of equality and inquality constraints relating the inputs and outputs of the neural network. They then attempt to find to an assignment to the inputs that satisfy all the constraints at the same time. They can therefore be thought of as domain-specific SMT solvers.

However, these queries are written in a very low-level format and are often extremely large. Furthermore they often don’t support operations such as disjunction. Consequently they are very difficult to write or read, and a single high-level property often needs to be split up into many queries in a non-obvious manner.

However, Vehicle is capable of generating queries automatically from a specification and verifying them using neural network verifiers. Using the satisfiability results from the verifiers, Vehicle can then reconstruct the truth status for the high-level original properties as well as witnesses and counter-examples.

Simple verification

The simplest way to verify a specification is to GET Vehicle to both generate the queries and perform the verification in a single step. This may be performed using the verify mode and passing a .vcl file as the value of the specification argument.

vehicle verify \
  --specification my/project/mnist-robustness.vcl \
  --network classify:my/project/mnist.onnx \
  --dataset trainingImages:my/project/mnist-trainingImages.idx \
  --dataset trainingLabels:my/project/mnist-trainingLabels.idx \
  --parameter epsilon:0.1 \
  --verifier Marabou

The table below contains the full list of command line arguments available for the verify command when specification is a .vcl file.

--specification, -s: The .vcl file containing the specification to compile.

--property, -y: The name of a property in the specification to verify. You may provide this option multiple times to verify multiple properties at once. If not provided, then by default all properties in the specification are compiled. All declarations provided via this option must be annotated with a @property annotation in the specification file.

--network, -n

Provide the implementation of a network declared in the specification. Its value should consist of a colon-separated pair of the name of the network in the specification and a file path, i.e.

--network NAME:FILEPATH

Can be used multiple times if the specification involves more than one network.

--dataset, -d

Provide a dataset declared in the specification. Its value should consist of a colon-separated pair of the name of the dataset in the specification and a file path, i.e.

--dataset NAME:FILEPATH

Can be used multiple times if the specification involves more than one dataset.

--parameter, -p

Provide a parameter referenced in the specification. Its value should consist of a colon-separated pair of the name of the parameter in the specification and its value, i.e.

--parameter NAME:VALUE

Can be used multiple times to provide multiple parameters.

--verifier, -v: Which verifier should be used to perform the verification. At the moment the only supported option is Marabou.

--verifierLocation, -l: Location of the executable for the verifier. If not provided, then Vehicle will search for the name of the executable in the PATH environment variable.

--cache, -c: The location to write out the verification cache that provides a permanent record of the results of the verification. See the sections below for more detail.

Warning

The verify command is not atomic. Verification involves repeatedly loading the network(s) from disk and Vehicle will not detect changes to the networks that occur while the command is running.

Advanced verification

There are several reasons why one might want to check the verification status of a specification some time after having initially called verify:

1. The verification could be part of an automated test suite in a continuous integration framework.

2. The specification could have been exported to an interactive theorem prover whose workflow consists of regularly rechecking the validity of proofs.

Unfortunately, depending on the size of the network and the complexity of the specification, verification can be a very expensive procedure taking hours or days. Therefore, simply calling verify every time you want to check the status of a specification may not be desirable.

Vehicle allows you to avoid unnecessary re-verification using the notion of a __verification cache__.

Structure of a verification cache

Firstly, every verification cache contains:

.vcl-cache-index - this file stores the critical information for the cache, including:
- the list of properties contained within the cache
- the file path and hash of any external resources that were used to create the cache. In particular the original specification, the networks and the datasets passed to the compiler.

Next, for each property named <property> in the original specification the cache initially contains the following files:

<property>-query1.txt, <property>-query2.txt, etc. - these files are the list of queries that need to passed to the verifier to ascertain whether the property holds or not.
<property>.vcl-plan - this file contains all the information necessary to reconstruct the status of the property as written in the original specification file, from the results of the individual queries.

After verification, the cache will also contain the following files:

<property>.vcl-result - this file is generated after verification and stores whether the property was found to be true or false.
<property>-assignments - this folder contains .idx files that store any assignments found by the verifier for the infinite quantified variables in the original specification. These assignments represent either counter-examples to forall statements or witnesses to exists statements.

Generate a verification cache

A verification cache can be generated by passing a suitable target to the vehicle compile command, e.g.

vehicle compile \
  --target MarabouQueries
  --output my/project/robustness-cache
  --specification my/project/mnist-robustness.vcl \
  --network classify:my/project/mnist.onnx \
  --dataset trainingImages:my/project/mnist-trainingImages.idx \
  --dataset trainingLabels:my/project/mnist-trainingLabels.idx \
  --parameter epsilon:0.1 \

The --output argument determines where the verification cache will be written to.

The full list of relevant command line options are:

--target, -t: The compilation target. There is currently one query format supported: MarabouQueries.

--output, -o: The output directory in which to store the compiled queries and the verification plan.

Other arguments are the same as those described in verify mode above.

Calling the verifier

It is possible to use the verify command to verify a specification via its pre-generated verification cache. Suppose the folder my/project/robustness-cache was generated by the vehicle compile command as described above, then specification can be verified by using the vehicle verify command and passing the folder my/project/robustness-cache to the --specification argument instead of the original .vcl file, e.g.

vehicle verify \
  --specification my/project/robustness-cache
  --verifier Marabou

The full list of available command line arguments are as follows:

--specification, -p: The location of the verification cache previously generated by Vehicle.

--verifier, -v: See description above for verify mode.

--verifierLocation, -l: See description above for verify mode.

Unlike, the previous invocation of the verify mode, you do not need to pass in the location of the network and datasets as the cache already contains their location.

Validating a verification cache

The validate command can then be used to check the status of a verification cache as follows:

vehicle validate \
 --cache my/project/robustness-cache

Vehicle will read the verification cache, and use its contents to find and rehash the networks and datasets that were used during the original verification of the specification. If the new hashes match those stored in the verification cache then the check passes, otherwise the validate command will exit with an error.

Note

For obvious reasons, moving or renaming any of the networks or datasets or the original specification will result in the validate command failing.

Inspecting the queries

If you would like to inspect the queries for the verifier generated by Vehicle, there are two options:

Call the compile command and omit the --output argument entirely to print the queries to the command line.

Generate the verification cache as described above, and look at the queries contained within it.

Limitations of verification

As you might expect, verification is a very hard problem. Therefore there are several limitations that users should be aware of.

Linearity

Quantified variables in the specification must be used in a linear manner. For example, neither of the following is allowed:

@network
f : Vector Rat 2 -> Vector Rat 2

@property
p1 : Bool
p1 = forall x . x * x > 2 => f [ x , 2 ] >= 0.5

@property
p2 : Bool
p2 = forall x y . x * y > 2 => f [ x , y ] >= 0.5

In p1 the variable x is used to calculate a non-linear value x * x, and in p2 the variables x and y are used to create a non-linear value x * y.

In the case where you do try to verify a non-linear property, Vehicle will use its sophisticated auxiliary type-system to help you pinpoint the source of the non-linearity.

Quantifiers

While verifiers can be used to verify both universal properties (i.e. with forall) and existential properties (i.e. with exists) they cannot verify properties with alternating quantifiers where one type of quantifier is used within the scope of the other type of quantifier. Here are some examples.

@network
f : Vector Rat 2 -> Vector Rat 1

@property
good1 : Bool
good1 = forall x . f x ! 0 >= 0.5

@property
good2 : Bool
good2 = exists x . f x ! 0 >= 0.75

Property good1 and good2 can both be verified as they each only use a single type of quantifier.

@property
bad1 : Bool
bad1 = forall y . exists x . f x == y

In contrast property bad1 cannot be verified as it contains a alternating forall and exists.

@property
good3 : Bool
good3 = (forall x . f x ! 0 >= 0.5) and (exists y . f y ! 0 >= 0.75)

However, property good3 can be verified even though it contains both a forall and an exists as the quantifiers are not alternating (i.e. it can split into two to form good1 and good2.)

@property
bad2 : Bool
bad2 = forall x . not (forall x . f x != y)

Note, that as shown by property bad2 alternating quantifiers is not a syntactic property but a logical one. This property can also not be verified despite only containing forall quantifiers. This is because under the rules of classical first order logic, bad2 is logically equivalent to bad1.

In the case where you do try to verify a property with alternating quantifiers, Vehicle will use its sophisticated auxiliary type-system to help you pinpoint the source of the alternation.

Network architecture

Verifiers tend to only support certain layer types and activation functions. At the moment Vehicle doesn’t perform any compatability checking, so please consult the verifier’s own documentation.

Performance

Verification has been shown to be an NP-complete problem so in the worst-case all verification algorithms will take an infeasibly long time to run. However, as with many NP-complete algorithms, in the common case performance can be surprisingly good.

How long it takes to verify a property depends on several factors:

1. The complexity of the property. The more SAT queries that a property is compiled down to, the longer it will take to verify them all. Language features that are likely to increase the number of queries generated are

if statements

and statements underneath a forall quantifier

or statements underneath a exists quantifier

2. The complexity of the network. The larger the number of nodes in the network, the longer it will take the verifier to run the query. In general, networks with a small number of wide layers will be easier to verify than networks with a large number of narrow layers.

3. How “close” the network is to satisfying each query. If a query is easily satisfiable, or easily non-satisfiable then the verifier will return an answer quickly. The closer to the boundary the network lies with respect to the query, the longer it will take the verifier to make a decision. Unfortunately this is almost impossible to quantify to advance.