ROAD-R: The Autonomous Driving Dataset with Logical Requirements
Eleonora Giunchiglia1, Mihaela Cătălina Stoian1, Salman Khan2,
Fabio Cuzzolin2 and Thomas Lukasiewicz3,1
1Department of Computer Science, University of Oxford, UK
2School of Engineering, Computing and Mathematics, Oxford Brookes University, UK
3Institute of Logic and Computation, TU Wien, Austria
eleonora.giunchiglia@cs.ox.ac.uk, mihaela.stoian@cs.ox.ac.uk, 19052999@brookes.ac.uk,
fabio.cuzzolin@brookes.ac.uk, thomas.lukasiewicz@cs.ox.ac.uk
Abstract
Neural networks have proven to be very power-
ful at computer vision tasks. However, they of-
ten exhibit unexpected behaviours, violating known
requirements expressing background knowledge.
This calls for models (i) able to learn from the
requirements, and (ii) guaranteed to be compliant
with the requirements themselves. Unfortunately,
the development of such models is hampered by
the lack of datasets equipped with formally speci-
fied requirements. In this paper, we introduce the
ROad event Awareness Dataset with logical Re-
quirements (ROAD-R), the first publicly available
dataset for autonomous driving with requirements
expressed as logical constraints. Given ROAD-R,
we show that current state-of-the-art models often
violate its logical constraints, and that it is possi-
ble to exploit them to create models that (i) have
a better performance, and (ii) are guaranteed to be
compliant with the requirements themselves.
1 Introduction
Neural networks have proven to be incredibly powerful at
processing low-level inputs, and for this reason they have
been extensively applied to computer vision tasks, such as
image classification, object detection, and action detection
(see e.g., [Krizhevsky et al., 2012; Redmon et al., 2016]).
However, they can exhibit unexpected behaviors, contradict-
ing known requirements expressing background knowledge.
This can have dramatic consequences, especially in safety-
critical scenarios such as autonomous driving. To address
the problem, models should (i) be able to learn from the re-
quirements, and (ii) be guaranteed to be compliant with the
requirements themselves. Unfortunately, the development of
such models is hampered by the lack of datasets equipped
with formally specified requirements. A notable exception is
given by hierarchical multi-label classification (HMC) problems (see, e.g., [Vens et al., 2008]), in which datasets are provided with binary constraints of the form (A → B), stating that label B must be predicted whenever label A is predicted.
Contact authors.
In this paper, we introduce multi-label classification prob-
lems with propositional logic requirements, in which datasets
are provided with requirements ruling out non-admissible
predictions and expressed in propositional logic. In this new
formulation, given a multi-label classification problem with
labels A, B, and C, we can, for example, write the requirement:

(¬A ∧ B) ∨ C,

stating that for each datapoint in the dataset either the label C is predicted, or B but not A is predicted. Obviously, any
constraint written for HMC problems can be represented in
our framework, and thus, our problem formulation represents
a generalisation of HMC problems.
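To make the generalisation concrete, here is a minimal sketch (with hypothetical label names) showing that one routine can check both an HMC constraint and a general propositional requirement, once each is put in clausal form: A → B is the single clause ¬A ∨ B, and (¬A ∧ B) ∨ C becomes (¬A ∨ C) ∧ (B ∨ C).

```python
# A prediction assigns True (positive) or False (negative) to every label.
# A requirement in CNF is a list of clauses; a clause is a list of
# (label, polarity) literals, and is satisfied if any literal matches.

def satisfies(prediction, cnf):
    return all(
        any(prediction[label] == polarity for label, polarity in clause)
        for clause in cnf
    )

# HMC-style constraint A -> B is the single clause (not A or B):
hmc = [[("A", False), ("B", True)]]

# The requirement (not A and B) or C in clausal form:
# (not A or C) and (B or C)
req = [[("A", False), ("C", True)], [("B", True), ("C", True)]]

print(satisfies({"A": True, "B": True, "C": False}, hmc))  # True
print(satisfies({"A": True, "B": True, "C": False}, req))  # False
```

The same `satisfies` call handles both cases, which is exactly the sense in which the new formulation subsumes HMC constraints.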
Then, we present the ROad event Awareness Dataset with
logical Requirements (ROAD-R), the first publicly available
dataset for autonomous driving with requirements expressed
as logical constraints. ROAD-R extends the ROAD dataset
[Singh et al., 2021], which consists of 22 relatively long (~8 minutes each) videos annotated with road events. A road
event corresponds to a tube, i.e., a sequence of frame-wise
bounding boxes linked in time. Each bounding box is labeled
with a subset of the 41 labels specified in Table 1. The goal is
to predict the set of labels associated to each bounding box.
We manually annotated ROAD-R with 243 constraints, each
verified to hold for each bounding box. A typical constraint is
thus “a traffic light cannot be red and green at the same time”,
while there are no constraints like “pedestrians should cross
at crossings”, which should always be satisfied in theory, but
which might not be in real-world scenarios.
Given ROAD-R, we considered 6 current state-of-the-art
(SOTA) models, and we showed that they are not able to learn the requirements just from the data points, as more than 90% of the time they produce predictions that violate the constraints. Then, we faced the problem of how to leverage the
additional knowledge provided by constraints with the goal of
(i) improving their performance, measured by the frame mean average precision (f-mAP) at intersection over union (IoU) thresholds 0.5 and 0.75 (see, e.g., [Kalogeiton et al., 2017; Li et al., 2018]), and (ii) guaranteeing that they are compliant with the constraints. To achieve the above two goals, we
propose the following new models:
1. CL models, i.e., models with a constrained loss allowing
them to learn from the requirements,
arXiv:2210.01597v2 [cs.LG] 5 Oct 2022
Agents Actions Locations
Pedestrian Move away AV lane
Car Move towards Outgoing lane
Cyclist Move Outgoing cycle lane
Motorbike Brake Incoming lane
Medium vehicle Stop Incoming cycle lane
Large vehicle Indicating left Pavement
Bus Indicating right Left pavement
Emergency vehicle Hazards lights on Right pavement
AV traffic light Turn left Junction
Other traffic light Turn right Crossing location
Overtake Bus stop
Wait to cross Parking
Cross from left
Cross from right
Crossing
Push object
Red traffic light
Amber traffic light
Green traffic light
Table 1: ROAD labels.
2. CO models, i.e., models with a constrained output enforcing the requirements on the output, and
3. CLCO models, i.e., models with both a constrained loss
and a constrained output.
In particular, we consider three different ways to build CL (resp., CO, CLCO) models. More specifically, we run the 9 × 6 models obtained by equipping the 6 current SOTA models with a constrained loss and/or a constrained output, and we show that it is always possible to
1. improve the performance of each SOTA model, and
2. be compliant with (i.e., strictly satisfy) the constraints.
Overall, the best performing model (for IoU = 0.5 and also
IoU = 0.75) is CLCO-RCGRU, i.e., the SOTA model RC-
GRU equipped with both constrained loss and constrained
output: CLCO-RCGRU (i) always satisfies the requirements
and (ii) has f-mAP = 31.81 for IoU = 0.5, and f-mAP = 17.27
for IoU = 0.75. The unconstrained RCGRU, by contrast, (i) produces predictions that violate the constraints at least 92% of the time, and (ii) has f-mAP = 30.78 for IoU = 0.5 and f-mAP = 15.98 for IoU = 0.75.
The main contributions of the paper thus are:
• we introduce multi-label classification problems with propositional logic requirements,
• we introduce ROAD-R, which is the first publicly available dataset whose requirements are expressed in full propositional logic,
• we consider 6 SOTA models and show that, on ROAD-R, they produce predictions violating the requirements more than 90% of the time,
• we propose new models with a constrained loss and/or constrained output, and show that in our new models it is always possible to improve the performance of the SOTA models and satisfy the requirements.
Figure 1: Example of violation of ¬RedTL ∨ ¬GreenTL.
The rest of this paper is organized as follows. After introducing the problem (Section 2), we present ROAD-R (Section 3), followed by the evaluation on ROAD-R of the SOTA models (Section 4) and of the SOTA models incorporating the requirements (Section 5). We end the paper with the related work (Section 6) and the summary and outlook (Section 7).
2 Learning with Requirements
In ROAD, the detection of road events requires the following
tasks: (i) identify the bounding boxes, (ii) associate with each
bounding box a set of labels, and (iii) form a tube from the
identified bounding boxes with the same labels. Here, we focus on the second task, and we formulate it as a multi-label classification problem with requirements.
A multi-label classification (MC) problem P = (C, X) consists of a finite set C of labels, denoted by A1, A2, . . ., and a finite set X of pairs (x, y), where x ∈ R^D (D ≥ 1) is a data point, and y ⊆ C is the ground truth of x. The ground truth y associated with a data point x characterizes both the positive and the negative labels associated with x, defined to be y and {¬A : A ∈ C \ y}, respectively. In ROAD-R, a data point corresponds to a bounding box, and each box is labeled with the positive labels representing (i) the agent performing the actions in the box, (ii) the actions being performed, and (iii) the locations where the actions take place. See Appendix A for a detailed description of each label.

Consider an MC problem P = (C, X). A prediction p is a set of positive and negative labels such that, for each label A ∈ C, either A ∈ p or ¬A ∈ p. A model m for P is a function m(·, ·) mapping every label A and every datapoint x to [0, 1]. A datapoint x is predicted by m to have label A if its output value m(A, x) is greater than a user-defined threshold θ ∈ [0, 1]. The prediction of m for x is the set {A : A ∈ C, m(A, x) > θ} ∪ {¬A : A ∈ C, m(A, x) ≤ θ} of positive and negative labels.

An MC problem with propositional logic requirements (P, Π) consists of an MC problem P and a finite set Π of constraints ruling out non-admissible predictions and expressed in propositional logic.
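The way a prediction is formed from a model's outputs can be sketched as follows; the labels, scores, and threshold below are invented for the example.

```python
# Sketch: a model's scores m(A, x) in [0, 1] become a prediction by
# thresholding. Labels above theta are positive, the rest negative.

def predict(scores, theta=0.5):
    positives = {a for a, s in scores.items() if s > theta}
    negatives = {"¬" + a for a, s in scores.items() if s <= theta}
    return positives | negatives

# Invented scores for one bounding box:
scores = {"Pedestrian": 0.91, "Car": 0.12, "Crossing": 0.64}
print(predict(scores, theta=0.5))
```

With θ = 0.5 this yields the prediction {Pedestrian, Crossing, ¬Car}, containing exactly one literal per label, as required by the definition above.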
Consider an MC problem with requirements (P, Π). Each requirement delimits the set of predictions that can be associated with each data point by ruling out those that violate it. A prediction p is admissible if each constraint r in Π is satisfied by p. A model m for P satisfies (resp., violates) the constraints on a data point x if the prediction of m for x is (resp., is not) admissible.

Statistics
|C|                                                  41
|Π|                                                 243
avg_{r ∈ Π}(|r|)                                   2.86
|{A ∈ C : ∃r ∈ Π. A ∈ r}|                            41
|{A ∈ C : ∃r ∈ Π. ¬A ∈ r}|                           38
min_{A ∈ C}(|{r ∈ Π : {A, ¬A} ∩ r ≠ ∅}|)              2
avg_{A ∈ C}(|{r ∈ Π : {A, ¬A} ∩ r ≠ ∅}|)          16.95
max_{A ∈ C}(|{r ∈ Π : {A, ¬A} ∩ r ≠ ∅}|)             31

Table 2: Constraint statistics. All the constraints have between 2 and 15 positive and negative labels, with an average of 2.86. All (resp., 38) of the labels appear positively (resp., negatively) in Π. Each label appears either positively or negatively between 2 and 31 times in Π, with an average of 16.95.
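Admissibility can be checked mechanically. The sketch below uses toy labels and clauses (not the actual ROAD-R constraints) to test predictions, given as sets of positive labels, against a small Π, and computes the kind of violation statistics used later to evaluate the SOTA models.

```python
# A clause is a list of (label, polarity) literals; a prediction is the
# set of its positive labels (everything else is implicitly negative).

def violated_clauses(positive_labels, clauses):
    """Return the clauses that the prediction fails to satisfy."""
    return [c for c in clauses
            if not any((lab in positive_labels) == pol for lab, pol in c)]

clauses = [
    [("RedTL", False), ("GreenTL", False)],                  # not both red and green
    [("Pedestrian", True), ("Cyclist", True), ("Crossing", False)],
]
preds = [{"RedTL", "GreenTL"}, {"Crossing"}, {"Pedestrian", "Crossing"}]

# Fraction of predictions violating at least one constraint, and the
# average number of violations per prediction:
n_violating = sum(1 for p in preds if violated_clauses(p, clauses))
avg_violations = sum(len(violated_clauses(p, clauses)) for p in preds) / len(preds)
print(n_violating, avg_violations)
```

Here two of the three toy predictions are non-admissible: the first shows both red and green, and the second shows something crossing that is neither a pedestrian nor a cyclist.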
Example 2.1 The requirement that a traffic light cannot be both red and green corresponds to the constraint {¬RedTL, ¬GreenTL}. Any prediction containing {RedTL, GreenTL} is non-admissible. An example of such a prediction is shown in Fig. 1.
Given an MC problem with requirements, it is possible to take advantage of the constraints in two different ways:
• they can be exploited during learning to teach the model the background knowledge that they express, and
• they can be used as post-processing to turn a non-admissible prediction into an admissible one.
Models in the first and second category are said to have a constrained loss (CL) and a constrained output (CO), respectively. Constrained loss models have the advantage that the
constraints are deployed during the training phase, and this should result in models (i) with a better understanding of the problem and a better performance, but still (ii) with no guarantee that no violations will be committed. On the other
hand, constrained output models (i) do not exploit the addi-
tional knowledge during training, but (ii) are guaranteed to
have no violations in the final outputs. These two options are
not mutually exclusive (i.e., can be used together), and which
one is to be deployed depends also on the extent to which a
system is available. For instance, there can be companies that
already have their own models (which can be black boxes)
and want to make them compliant with a set of requirements
without modifying the model itself. On the other hand, the
exploitation of the constraints in the learning phase can be an
attractive option for those who have a good knowledge of the
model and want to further improve it.
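As a purely illustrative sketch of the post-processing idea (this is not the constrained-output algorithm used in the paper), one naive repair strategy satisfies each violated clause by forcing the literal whose score is closest to the decision threshold, i.e., the cheapest one to flip.

```python
# Illustrative only: greedily repair a non-admissible prediction.
# Clauses are lists of (label, polarity) literals; scores are the
# model outputs m(A, x), invented here for the example.

def repair(scores, clauses, theta=0.5):
    pred = {a: s > theta for a, s in scores.items()}
    for clause in clauses:
        if not any(pred[a] == pol for a, pol in clause):
            # pick the literal whose score is nearest the threshold
            a, pol = min(clause, key=lambda lit: abs(scores[lit[0]] - theta))
            pred[a] = pol  # force it, making the clause true
    return pred

scores = {"RedTL": 0.8, "GreenTL": 0.6}
mutual_exclusion = [[("RedTL", False), ("GreenTL", False)]]
print(repair(scores, mutual_exclusion))  # GreenTL is flipped to False
```

Note that a single greedy pass like this can re-violate clauses fixed earlier; a sound constrained-output method must guarantee that the final prediction satisfies all of Π simultaneously.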
3 ROAD-R
ROAD-R extends the ROAD dataset¹ [Singh et al., 2021] by introducing a set Π of 243 constraints that specify the space of admissible outputs.

¹All the code will be released upon publication. ROAD is available at: https://github.com/gurkirt/road-dataset.

 n      |Πn|   avg_{r ∈ Πn}(|r ∩ C̄|)   avg_{r ∈ Πn}(|r ∩ C|)
 2      215    1.995                    0.005
 3        5    1                        2
 7        1    1                        6
 8        6    1                        7
 9        6    1                        8
10        1    0                        10
12        1    1                        11
14        1    0                        14
15        7    1                        14
Total   243    1.87                     0.96

Table 3: Constraint statistics. Πn is the set of constraints r in Π with |r| = n, i.e., with n positive and negative labels. C̄ = {¬A : A ∈ C}. Each row shows the number of rules r with |r| = n, and the average number of negative and positive labels in such rules.

In order to improve the usability of our
dataset, we write the constraints in a way that allows us to easily express Π as a single formula in conjunctive normal form (CNF). This can be done without any loss of generality, as any propositional formula can be expressed in CNF, and it is important because many solvers expect formulas in CNF as input. Thus, each requirement in Π has the form:

l1 ∨ l2 ∨ · · · ∨ ln,    (1)

where n ≥ 1, and each li is either a negative label ¬A or a positive label A. The requirements have been manually specified following three steps:
1. an initial set of constraints Π1 was manually created,
2. a subset Π2 ⊆ Π1 was retained by eliminating all those constraints that were entailed by the others,
3. the final subset Π ⊆ Π2 was retained by keeping only those requirements that were always satisfied by the ground-truth labels of the entire ROAD-R dataset.
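Step 3 above can be sketched as follows; the candidate constraints and ground-truth label sets are invented, echoing the "move towards"/"move away" example discussed later in this section.

```python
# Sketch of step 3: keep only the candidate constraints (clauses of
# (label, polarity) literals) that every ground-truth labeling in the
# dataset satisfies. Toy candidates and ground truths, not ROAD-R's.

def holds_everywhere(clause, ground_truths):
    return all(
        any((lab in gt) == pol for lab, pol in clause)
        for gt in ground_truths
    )

candidates = [
    [("RedTL", False), ("GreenTL", False)],     # never both red and green
    [("MoveTow", False), ("MoveAway", False)],  # violated by noisy labels
]
# One noisy ground truth contains both MoveTow and MoveAway:
ground_truths = [{"RedTL"}, {"GreenTL"}, {"MoveTow", "MoveAway"}]

final = [c for c in candidates if holds_everywhere(c, ground_truths)]
print(len(final))  # only the traffic-light constraint survives
```

This mirrors the dismissal, described below, of constraints such as "it is not possible to both move towards and move away", which were contradicted by errors in the ground-truth labels.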
Finally, redundancy in the constraints has been automatically checked with RELSAT². Note that our process of gathering and further selecting the logical requirements follows more closely the software engineering paradigm than the machine learning view. To this end, we ensured that the constraints were consistent with the labels provided in the ROAD dataset, in the sense that they act as strict conditions that must be satisfied by the ground-truth labels, as emphasized in the third step of the annotation pipeline above. Tables 2 and 3 give a high-level description of the properties of the set Π of constraints. Notice that, with a slight abuse of notation, in the tables we use a set-based notation for the requirements: each requirement of form (1) thus becomes

{l1, l2, . . . , ln}.
Such notation allows us to express the properties of the requirements more succinctly. In addition to the information in the tables, we report that, of the 243 constraints, there are two in which all the labels are positive (expressing that there must be at least one agent and that every agent but traffic lights has at least one location) and 214 in which all the labels are negative (expressing mutual exclusion between two labels). All the constraints with more than two labels have at most one negative label, as they express a one-to-many relation between actions and agents (like "if something is crossing, then it is a pedestrian or a cyclist"). Constraints like "pedestrians should cross at crossings", which might not be satisfied in practice, are not included. Embedding such constraints would additionally require, e.g., modal operators; while it would be an interesting study to see the impact on the model's predictions of adding more expressive layers to our logic, we opted for a simpler logic in this first instance. This also provides more transparency to the wider research community, as full propositional logic already covers a vast range of applications that do not require extra logical operators.

²https://github.com/roberto-bayardo/relsat/

Figure 2: ROAD-R and SOTA models, with the threshold θ ∈ [0.1, 0.9] (step 0.1) on the x-axis. (a) Percentage of predictions violating at least one constraint. (b) Average number of violations committed per prediction. (c) Percentage of constraints violated at least once.

The list with all the
243 requirements, with their natural language explanations,
is in Appendix B, Tables 8, 9, and 10. Notice that the 243 requirements restrict the number of admissible predictions to 4,985,868 ≈ 5 × 10^6, thus ruling out (2^41 - 4,985,868) ≈ 10^12 non-admissible predictions.³ In principle, the set of admissible predictions can be further reduced by adding other constraints. Indeed, the 243 requirements are not guaranteed to
be complete from every possible point of view: as standard
in the software development cycle, the requirement specifi-
cation process deeply involves the stakeholders of the system
(see, e.g., [Sommerville, 2011]). For example, we decided
not to include constraints like “it is not possible to both move
towards and move away”, which were not satisfied by all the
data points because of errors in the ground truth labels. In
these cases, we decided to dismiss the constraint in order to
maintain (i) consistency between the knowledge provided by
the constraints and by the data points, and (ii) backward com-
patibility.
As an additional point, we underline that, even though the annotation of the requirements introduces some overhead in the annotation process, the effort of manually writing 243 constraints (i) is negligible when compared to the effort of manually annotating the 22 videos, and (ii) can improve the latter process, e.g., by helping to prevent errors in the annotation of the data points.
³The number of admissible predictions has been computed with relsat: https://github.com/roberto-bayardo/relsat/.
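For intuition on what this footnote computes, the same model count can be done by brute force on a toy problem: enumeration is hopeless over 2^41 candidate predictions but trivial over 2^3. The three-way mutual exclusion below is an assumption for the example, not one of the 243 ROAD-R constraints.

```python
from itertools import product

# Count admissible predictions (truth assignments satisfying all
# clauses) by exhaustive enumeration over a tiny label set.

labels = ["RedTL", "AmberTL", "GreenTL"]
# pairwise mutual exclusion between the three traffic-light colours
clauses = [[("RedTL", False), ("AmberTL", False)],
           [("RedTL", False), ("GreenTL", False)],
           [("AmberTL", False), ("GreenTL", False)]]

def admissible(assignment):
    return all(any(assignment[lab] == pol for lab, pol in c) for c in clauses)

count = sum(
    admissible(dict(zip(labels, bits)))
    for bits in product([False, True], repeat=len(labels))
)
print(count)  # 4 of the 2^3 = 8 assignments are admissible
```

The count is 4 (all lights off, or exactly one on), illustrating on a small scale how Π carves the admissible space out of the full 2^|C| predictions; dedicated model counters such as relsat do this without enumerating.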
4 ROAD-R and SOTA Models
As a first step, we ran 6 SOTA temporal feature learning architectures as part of a 3D-RetinaNet model [Singh et al., 2021] (with a 2D-ConvNet backbone made of Resnet50 [He et al., 2016]) for event detection, and evaluated to what extent the constraints are violated. We considered:
1. 2D-ConvNet (C2D) [Wang et al., 2018]: a Resnet50-
based architecture with an additional temporal dimen-
sion for learning features from videos. The extension
from 2D to 3D is done by adding a pooling layer over
time to combine the spatial features.
2. Inflated 3D-ConvNet (I3D) [Carreira and Zisserman,
2017]: a sequential learning architecture extendable
to any SOTA image classification model (2D-ConvNet
based), able to learn continuous spatio-temporal features
from the sequence of frames.
3. Recurrent Convolutional Network (RCN) [Singh and
Cuzzolin, 2019]: a 3D-ConvNet model that relies on
recurrence for learning the spatio-temporal features at
each network level. During the feature extraction phase,
RCNs exploit both 2D convolutions across the spatial
domain and 1D convolutions across the temporal do-
main.
4. Random Connectivity Long Short-Term Memory
(RCLSTM) [Hua et al., 2018]: an updated version
of LSTM in which the neurons are connected in a
stochastic manner, rather than fully connected. In our
case, the LSTM cell is used as a bottleneck in Resnet50
for learning the features sequentially.
5. Random Connectivity Gated Recurrent Unit (RCGRU)
[Hua et al., 2018]: an alternative version of RCLSTM
where the GRU cell is used instead of the LSTM one.
GRU makes the process more efficient with fewer pa-
rameters than the LSTM.
6. SlowFast [Feichtenhofer et al., 2019]: a 3D-CNN ar-
chitecture that contains both slow and fast pathways for
extracting the sequential features. A Slow pathway computes the spatial semantics at a low frame rate, while a Fast pathway operates at a high frame rate to capture the motion features. The two pathways are fused into a single architecture by lateral connections.
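The temporal pooling step described in item 1 (the C2D extension from 2D to 3D) can be sketched with NumPy; the shapes and random features below are invented, and the real model pools Resnet50 feature maps inside 3D-RetinaNet.

```python
import numpy as np

# Sketch of the C2D idea: per-frame 2D feature maps are combined by
# pooling over the temporal axis, yielding one clip-level feature map.

rng = np.random.default_rng(0)
# (time, channels, height, width): 8 frames of 256-channel feature maps
frame_features = rng.standard_normal((8, 256, 14, 14))

# average pooling over time collapses the clip to a single feature map
clip_features = frame_features.mean(axis=0)
print(clip_features.shape)  # (256, 14, 14)
```

Pooling over the time axis is the cheapest way to add a temporal dimension to a 2D backbone; the other architectures in the list replace it with learned temporal operators (1D convolutions, recurrence, or dual pathways).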
We trained 3D-RetinaNet⁴ using the same hyperparameter

⁴https://github.com/gurkirt/3D-RetinaNets.