ROAD-R: The Autonomous Driving Dataset with Logical Requirements
Eleonora Giunchiglia1, Mihaela Cătălina Stoian1, Salman Khan2,
Fabio Cuzzolin2 and Thomas Lukasiewicz3,1
1Department of Computer Science, University of Oxford, UK
2School of Engineering, Computing and Mathematics, Oxford Brookes University, UK
3Institute of Logic and Computation, TU Wien, Austria
eleonora.giunchiglia@cs.ox.ac.uk, mihaela.stoian@cs.ox.ac.uk, 19052999@brookes.ac.uk,
fabio.cuzzolin@brookes.ac.uk, thomas.lukasiewicz@cs.ox.ac.uk
Abstract
Neural networks have proven to be very power-
ful at computer vision tasks. However, they of-
ten exhibit unexpected behaviours, violating known
requirements expressing background knowledge.
This calls for models (i) able to learn from the
requirements, and (ii) guaranteed to be compliant
with the requirements themselves. Unfortunately,
the development of such models is hampered by
the lack of datasets equipped with formally speci-
fied requirements. In this paper, we introduce the
ROad event Awareness Dataset with logical Re-
quirements (ROAD-R), the first publicly available
dataset for autonomous driving with requirements
expressed as logical constraints. Given ROAD-R,
we show that current state-of-the-art models often
violate its logical constraints, and that it is possi-
ble to exploit them to create models that (i) have
a better performance, and (ii) are guaranteed to be
compliant with the requirements themselves.
1 Introduction
Neural networks have proven to be incredibly powerful at
processing low-level inputs, and for this reason they have
been extensively applied to computer vision tasks, such as
image classification, object detection, and action detection
(see e.g., [Krizhevsky et al., 2012; Redmon et al., 2016]).
However, they can exhibit unexpected behaviors, contradict-
ing known requirements expressing background knowledge.
This can have dramatic consequences, especially in safety-
critical scenarios such as autonomous driving. To address
the problem, models should (i) be able to learn from the re-
quirements, and (ii) be guaranteed to be compliant with the
requirements themselves. Unfortunately, the development of
such models is hampered by the lack of datasets equipped
with formally specified requirements. A notable exception is
given by hierarchical multi-label classification (HMC) problems (see, e.g., [Vens et al., 2008]), in which datasets are provided with binary constraints of the form (A → B), stating that label B must be predicted whenever label A is predicted.
Contact authors.
In this paper, we introduce multi-label classification prob-
lems with propositional logic requirements, in which datasets
are provided with requirements ruling out non-admissible
predictions and expressed in propositional logic. In this new
formulation, given a multi-label classification problem with
labels A, B, and C, we can, for example, write the requirement:

(¬A ∧ B) ∨ C,

stating that for each datapoint in the dataset either the label C is predicted, or B but not A is predicted. Obviously, any
constraint written for HMC problems can be represented in
our framework, and thus, our problem formulation represents
a generalisation of HMC problems.
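To make the generalisation concrete, here is a minimal sketch (with hypothetical label names) showing that one routine can check both an HMC constraint and a general propositional requirement, once each is put in clausal form: A → B is the single clause ¬A ∨ B, and (¬A ∧ B) ∨ C becomes (¬A ∨ C) ∧ (B ∨ C).

```python
# A prediction assigns True (positive) or False (negative) to every label.
# A requirement in CNF is a list of clauses; a clause is a list of
# (label, polarity) literals, and is satisfied if any literal matches.

def satisfies(prediction, cnf):
    return all(
        any(prediction[label] == polarity for label, polarity in clause)
        for clause in cnf
    )

# HMC-style constraint A -> B is the single clause (not A or B):
hmc = [[("A", False), ("B", True)]]

# The requirement (not A and B) or C in clausal form:
# (not A or C) and (B or C)
req = [[("A", False), ("C", True)], [("B", True), ("C", True)]]

print(satisfies({"A": True, "B": True, "C": False}, hmc))  # True
print(satisfies({"A": True, "B": True, "C": False}, req))  # False
```

The same `satisfies` call handles both cases, which is exactly the sense in which the new formulation subsumes HMC constraints.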
Then, we present the ROad event Awareness Dataset with
logical Requirements (ROAD-R), the first publicly available
dataset for autonomous driving with requirements expressed
as logical constraints. ROAD-R extends the ROAD dataset
[Singh et al., 2021], which consists of 22 relatively long (~8 minutes each) videos annotated with road events. A road
event corresponds to a tube, i.e., a sequence of frame-wise
bounding boxes linked in time. Each bounding box is labeled
with a subset of the 41 labels specified in Table 1. The goal is
to predict the set of labels associated to each bounding box.
We manually annotated ROAD-R with 243 constraints, each
verified to hold for each bounding box. A typical constraint is
thus “a traffic light cannot be red and green at the same time”,
while there are no constraints like “pedestrians should cross
at crossings”, which should always be satisfied in theory, but
which might not be in real-world scenarios.
Given ROAD-R, we considered 6 current state-of-the-art
(SOTA) models, and we showed that they are not able to learn the requirements just from the data points, as more than 90% of the time they produce predictions that violate the constraints. Then, we faced the problem of how to leverage the
additional knowledge provided by constraints with the goal of
(i) improving their performance, measured by the frame mean average precision (f-mAP) at intersection over union (IoU) thresholds 0.5 and 0.75 (see, e.g., [Kalogeiton et al., 2017; Li et al., 2018]), and (ii) guaranteeing that they are compliant with the constraints. To achieve the above two goals, we
propose the following new models:
1. CL models, i.e., models with a constrained loss allowing
them to learn from the requirements,
arXiv:2210.01597v2 [cs.LG] 5 Oct 2022
Agents Actions Locations
Pedestrian Move away AV lane
Car Move towards Outgoing lane
Cyclist Move Outgoing cycle lane
Motorbike Brake Incoming lane
Medium vehicle Stop Incoming cycle lane
Large vehicle Indicating left Pavement
Bus Indicating right Left pavement
Emergency vehicle Hazards lights on Right pavement
AV traffic light Turn left Junction
Other traffic light Turn right Crossing location
Overtake Bus stop
Wait to cross Parking
Cross from left
Cross from right
Crossing
Push object
Red traffic light
Amber traffic light
Green traffic light
Table 1: ROAD labels.
2. CO models, i.e., models with a constrained output enforcing the requirements on the output, and
3. CLCO models, i.e., models with both a constrained loss
and a constrained output.
In particular, we consider three different ways to build CL (resp., CO, CLCO) models. More specifically, we run the 9 × 6 models obtained by equipping the 6 current SOTA models with a constrained loss and/or a constrained output, and we show that it is always possible to
1. improve the performance of each SOTA model, and
2. be compliant with (i.e., strictly satisfy) the constraints.
Overall, the best performing model (for IoU = 0.5 and also
IoU = 0.75) is CLCO-RCGRU, i.e., the SOTA model RC-
GRU equipped with both constrained loss and constrained
output: CLCO-RCGRU (i) always satisfies the requirements
and (ii) has f-mAP = 31.81 for IoU = 0.5, and f-mAP = 17.27
for IoU = 0.75. The unconstrained RCGRU, by contrast, (i) produces predictions that violate the constraints at least 92% of the time, and (ii) has f-mAP = 30.78 for IoU = 0.5 and f-mAP = 15.98 for IoU = 0.75.
The main contributions of the paper thus are:
• we introduce multi-label classification problems with propositional logic requirements,
• we introduce ROAD-R, which is the first publicly available dataset whose requirements are expressed in full propositional logic,
• we consider 6 SOTA models and show that, on ROAD-R, they produce predictions violating the requirements more than 90% of the time,
• we propose new models with a constrained loss and/or constrained output, and show that in our new models it is always possible to improve the performance of the SOTA models and satisfy the requirements.
Figure 1: Example of violation of ¬RedTL ∨ ¬GreenTL.
The rest of this paper is organized as follows. After introducing the problem (Section 2), we present ROAD-R (Section 3), followed by the evaluation on ROAD-R of the SOTA models (Section 4) and of the SOTA models incorporating the requirements (Section 5). We end the paper with the related work (Section 6) and the summary and outlook (Section 7).
2 Learning with Requirements
In ROAD, the detection of road events requires the following
tasks: (i) identify the bounding boxes, (ii) associate with each
bounding box a set of labels, and (iii) form a tube from the
identified bounding boxes with the same labels. Here, we focus on the second task, and we formulate it as a multi-label classification problem with requirements.
A multi-label classification (MC) problem P = (C, X) consists of a finite set C of labels, denoted by A1, A2, . . ., and a finite set X of pairs (x, y), where x ∈ R^D (D ≥ 1) is a data point, and y ⊆ C is the ground truth of x. The ground truth y associated with a data point x characterizes both the positive and the negative labels associated with x, defined to be y and {¬A : A ∈ C \ y}, respectively. In ROAD-R, a data point corresponds to a bounding box, and each box is labeled with the positive labels representing (i) the agent performing the actions in the box, (ii) the actions being performed, and (iii) the locations where the actions take place. See Appendix A for a detailed description of each label.

Consider an MC problem P = (C, X). A prediction p is a set of positive and negative labels such that, for each label A ∈ C, either A ∈ p or ¬A ∈ p. A model m for P is a function m(·, ·) mapping every label A and every datapoint x to [0, 1]. A datapoint x is predicted by m to have label A if its output value m(A, x) is greater than a user-defined threshold θ ∈ [0, 1]. The prediction of m for x is the set {A : A ∈ C, m(A, x) > θ} ∪ {¬A : A ∈ C, m(A, x) ≤ θ} of positive and negative labels.

An MC problem with propositional logic requirements (P, Π) consists of an MC problem P and a finite set Π of constraints ruling out non-admissible predictions and expressed in propositional logic.
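The way a prediction is formed from a model's outputs can be sketched as follows; the labels, scores, and threshold below are invented for the example.

```python
# Sketch: a model's scores m(A, x) in [0, 1] become a prediction by
# thresholding. Labels above theta are positive, the rest negative.

def predict(scores, theta=0.5):
    positives = {a for a, s in scores.items() if s > theta}
    negatives = {"¬" + a for a, s in scores.items() if s <= theta}
    return positives | negatives

# Invented scores for one bounding box:
scores = {"Pedestrian": 0.91, "Car": 0.12, "Crossing": 0.64}
print(predict(scores, theta=0.5))
```

With θ = 0.5 this yields the prediction {Pedestrian, Crossing, ¬Car}, containing exactly one literal per label, as required by the definition above.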
Consider an MC problem with requirements (P, Π). Each requirement delimits the set of predictions that can be associated with each data point by ruling out those that violate it. A prediction p is admissible if each constraint r in Π is satisfied by p. A model m for P satisfies (resp., violates) the constraints on a data point x if the prediction of m for x is (resp., is not) admissible.

Statistics
|C|                                                  41
|Π|                                                 243
avg_{r ∈ Π}(|r|)                                   2.86
|{A ∈ C : ∃r ∈ Π. A ∈ r}|                            41
|{A ∈ C : ∃r ∈ Π. ¬A ∈ r}|                           38
min_{A ∈ C}(|{r ∈ Π : {A, ¬A} ∩ r ≠ ∅}|)              2
avg_{A ∈ C}(|{r ∈ Π : {A, ¬A} ∩ r ≠ ∅}|)          16.95
max_{A ∈ C}(|{r ∈ Π : {A, ¬A} ∩ r ≠ ∅}|)             31

Table 2: Constraint statistics. All the constraints have between 2 and 15 positive and negative labels, with an average of 2.86. All (resp., 38) of the labels appear positively (resp., negatively) in Π. Each label appears either positively or negatively between 2 and 31 times in Π, with an average of 16.95.
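Admissibility can be checked mechanically. The sketch below uses toy labels and clauses (not the actual ROAD-R constraints) to test predictions, given as sets of positive labels, against a small Π, and computes the kind of violation statistics used later to evaluate the SOTA models.

```python
# A clause is a list of (label, polarity) literals; a prediction is the
# set of its positive labels (everything else is implicitly negative).

def violated_clauses(positive_labels, clauses):
    """Return the clauses that the prediction fails to satisfy."""
    return [c for c in clauses
            if not any((lab in positive_labels) == pol for lab, pol in c)]

clauses = [
    [("RedTL", False), ("GreenTL", False)],                  # not both red and green
    [("Pedestrian", True), ("Cyclist", True), ("Crossing", False)],
]
preds = [{"RedTL", "GreenTL"}, {"Crossing"}, {"Pedestrian", "Crossing"}]

# Fraction of predictions violating at least one constraint, and the
# average number of violations per prediction:
n_violating = sum(1 for p in preds if violated_clauses(p, clauses))
avg_violations = sum(len(violated_clauses(p, clauses)) for p in preds) / len(preds)
print(n_violating, avg_violations)
```

Here two of the three toy predictions are non-admissible: the first shows both red and green, and the second shows something crossing that is neither a pedestrian nor a cyclist.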
Example 2.1 The requirement that a traffic light cannot be both red and green corresponds to the constraint {¬RedTL, ¬GreenTL}. Any prediction containing {RedTL, GreenTL} is non-admissible. An example of such a prediction is shown in Fig. 1.
Given an MC problem with requirements, it is possible to take advantage of the constraints in two different ways:
• they can be exploited during learning to teach the model the background knowledge that they express, and
• they can be used as post-processing to turn a non-admissible prediction into an admissible one.
Models in the first and second category are said to have a constrained loss (CL) and a constrained output (CO), respectively. Constrained loss models have the advantage that the
constraints are deployed during the training phase, and this should result in models (i) with a better understanding of the problem and a better performance, but still (ii) with no guarantee that no violations will be committed. On the other
hand, constrained output models (i) do not exploit the addi-
tional knowledge during training, but (ii) are guaranteed to
have no violations in the final outputs. These two options are
not mutually exclusive (i.e., can be used together), and which
one is to be deployed depends also on the extent to which a
system is available. For instance, there can be companies that
already have their own models (which can be black boxes)
and want to make them compliant with a set of requirements
without modifying the model itself. On the other hand, the
exploitation of the constraints in the learning phase can be an
attractive option for those who have a good knowledge of the
model and want to further improve it.
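As a purely illustrative sketch of the post-processing idea (this is not the constrained-output algorithm used in the paper), one naive repair strategy satisfies each violated clause by forcing the literal whose score is closest to the decision threshold, i.e., the cheapest one to flip.

```python
# Illustrative only: greedily repair a non-admissible prediction.
# Clauses are lists of (label, polarity) literals; scores are the
# model outputs m(A, x), invented here for the example.

def repair(scores, clauses, theta=0.5):
    pred = {a: s > theta for a, s in scores.items()}
    for clause in clauses:
        if not any(pred[a] == pol for a, pol in clause):
            # pick the literal whose score is nearest the threshold
            a, pol = min(clause, key=lambda lit: abs(scores[lit[0]] - theta))
            pred[a] = pol  # force it, making the clause true
    return pred

scores = {"RedTL": 0.8, "GreenTL": 0.6}
mutual_exclusion = [[("RedTL", False), ("GreenTL", False)]]
print(repair(scores, mutual_exclusion))  # GreenTL is flipped to False
```

Note that a single greedy pass like this can re-violate clauses fixed earlier; a sound constrained-output method must guarantee that the final prediction satisfies all of Π simultaneously.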
3 ROAD-R
ROAD-R extends the ROAD dataset¹ [Singh et al., 2021] by introducing a set Π of 243 constraints that specify the space of admissible outputs.

¹All the code will be released upon publication. ROAD is available at: https://github.com/gurkirt/road-dataset.

 n      |Πn|   avg_{r ∈ Πn}(|r ∩ C̄|)   avg_{r ∈ Πn}(|r ∩ C|)
 2      215    1.995                    0.005
 3        5    1                        2
 7        1    1                        6
 8        6    1                        7
 9        6    1                        8
10        1    0                        10
12        1    1                        11
14        1    0                        14
15        7    1                        14
Total   243    1.87                     0.96

Table 3: Constraint statistics. Πn is the set of constraints r in Π with |r| = n, i.e., with n positive and negative labels. C̄ = {¬A : A ∈ C}. Each row shows the number of rules r with |r| = n, and the average number of negative and positive labels in such rules.

In order to improve the usability of our
dataset, we write the constraints in a way that allows us to easily express Π as a single formula in conjunctive normal form (CNF). This can be done without any loss of generality, as any propositional formula can be expressed in CNF, and it is important because many solvers expect formulas in CNF as input. Thus, each requirement in Π has the form:

l1 ∨ l2 ∨ · · · ∨ ln,    (1)

where n ≥ 1, and each li is either a negative label ¬A or a positive label A. The requirements have been manually specified following three steps:
1. an initial set of constraints Π1 was manually created,
2. a subset Π2 ⊆ Π1 was retained by eliminating all those constraints that were entailed by the others,
3. the final subset Π ⊆ Π2 was retained by keeping only those requirements that were always satisfied by the ground-truth labels of the entire ROAD-R dataset.
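Step 3 above can be sketched as follows; the candidate constraints and ground-truth label sets are invented, echoing the "move towards"/"move away" example discussed later in this section.

```python
# Sketch of step 3: keep only the candidate constraints (clauses of
# (label, polarity) literals) that every ground-truth labeling in the
# dataset satisfies. Toy candidates and ground truths, not ROAD-R's.

def holds_everywhere(clause, ground_truths):
    return all(
        any((lab in gt) == pol for lab, pol in clause)
        for gt in ground_truths
    )

candidates = [
    [("RedTL", False), ("GreenTL", False)],     # never both red and green
    [("MoveTow", False), ("MoveAway", False)],  # violated by noisy labels
]
# One noisy ground truth contains both MoveTow and MoveAway:
ground_truths = [{"RedTL"}, {"GreenTL"}, {"MoveTow", "MoveAway"}]

final = [c for c in candidates if holds_everywhere(c, ground_truths)]
print(len(final))  # only the traffic-light constraint survives
```

This mirrors the dismissal, described below, of constraints such as "it is not possible to both move towards and move away", which were contradicted by errors in the ground-truth labels.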
Finally, redundancy in the constraints has been automatically checked with RELSAT². Note that our process of gathering and further selecting the logical requirements follows more closely the software engineering paradigm than the machine learning view. To this end, we ensured that the constraints were consistent with the labels provided in the ROAD dataset, in the sense that they act as strict conditions that must be satisfied by the ground-truth labels, as emphasized in the third step of the annotation pipeline above. Tables 2 and 3 give a high-level description of the properties of the set Π of constraints. Notice that, with a slight abuse of notation, in the tables we use a set-based notation for the requirements: each requirement of form (1) thus becomes

{l1, l2, . . . , ln}.
Such notation allows us to express the properties of the requirements more succinctly. In addition to the information in the tables, we report that, of the 243 constraints, there are two in which all the labels are positive (expressing that there must be at least one agent and that every agent but traffic lights has at least one location) and 214 in which all the labels are negative (expressing mutual exclusion between two labels). All the constraints with more than two labels have at most one negative label, as they express a one-to-many relation between actions and agents (like "if something is crossing, then it is a pedestrian or a cyclist"). Constraints like "pedestrians should cross at crossings", which might not be satisfied in practice, are not included. Embedding such constraints would additionally require, e.g., modal operators; while it would be an interesting study to see the impact on the model's predictions of adding more expressive layers to our logic, we opted for a simpler logic in this first instance. This also provides more transparency to the wider research community, as full propositional logic already covers a vast range of applications that do not require extra logical operators.

²https://github.com/roberto-bayardo/relsat/

Figure 2: ROAD-R and SOTA models, with the threshold θ ∈ [0.1, 0.9] (step 0.1) on the x-axis. (a) Percentage of predictions violating at least one constraint. (b) Average number of violations committed per prediction. (c) Percentage of constraints violated at least once.

The list with all the
243 requirements, with their natural language explanations,
is in Appendix B, Tables 8, 9, and 10. Notice that the 243 requirements restrict the number of admissible predictions to 4,985,868 ≈ 5 × 10^6, thus ruling out (2^41 - 4,985,868) ≈ 10^12 non-admissible predictions.³ In principle, the set of admissible predictions can be further reduced by adding other constraints. Indeed, the 243 requirements are not guaranteed to
be complete from every possible point of view: as standard
in the software development cycle, the requirement specifi-
cation process deeply involves the stakeholders of the system
(see, e.g., [Sommerville, 2011]). For example, we decided
not to include constraints like “it is not possible to both move
towards and move away”, which were not satisfied by all the
data points because of errors in the ground truth labels. In
these cases, we decided to dismiss the constraint in order to
maintain (i) consistency between the knowledge provided by
the constraints and by the data points, and (ii) backward com-
patibility.
As an additional point, we underline that, even though the annotation of the requirements introduces some overhead in the annotation process, the effort of manually writing 243 constraints (i) is negligible when compared to the effort of manually annotating the 22 videos, and (ii) can improve the latter process, e.g., by helping to prevent errors in the annotation of the data points.
³The number of admissible predictions has been computed with relsat: https://github.com/roberto-bayardo/relsat/.
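For intuition on what this footnote computes, the same model count can be done by brute force on a toy problem: enumeration is hopeless over 2^41 candidate predictions but trivial over 2^3. The three-way mutual exclusion below is an assumption for the example, not one of the 243 ROAD-R constraints.

```python
from itertools import product

# Count admissible predictions (truth assignments satisfying all
# clauses) by exhaustive enumeration over a tiny label set.

labels = ["RedTL", "AmberTL", "GreenTL"]
# pairwise mutual exclusion between the three traffic-light colours
clauses = [[("RedTL", False), ("AmberTL", False)],
           [("RedTL", False), ("GreenTL", False)],
           [("AmberTL", False), ("GreenTL", False)]]

def admissible(assignment):
    return all(any(assignment[lab] == pol for lab, pol in c) for c in clauses)

count = sum(
    admissible(dict(zip(labels, bits)))
    for bits in product([False, True], repeat=len(labels))
)
print(count)  # 4 of the 2^3 = 8 assignments are admissible
```

The count is 4 (all lights off, or exactly one on), illustrating on a small scale how Π carves the admissible space out of the full 2^|C| predictions; dedicated model counters such as relsat do this without enumerating.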
4 ROAD-R and SOTA Models
As a first step, we ran 6 SOTA temporal feature learning architectures as part of a 3D-RetinaNet model [Singh et al., 2021] (with a 2D-ConvNet backbone made of Resnet50 [He et al., 2016]) for event detection, and evaluated to what extent the constraints are violated. We considered:
1. 2D-ConvNet (C2D) [Wang et al., 2018]: a Resnet50-
based architecture with an additional temporal dimen-
sion for learning features from videos. The extension
from 2D to 3D is done by adding a pooling layer over
time to combine the spatial features.
2. Inflated 3D-ConvNet (I3D) [Carreira and Zisserman,
2017]: a sequential learning architecture extendable
to any SOTA image classification model (2D-ConvNet
based), able to learn continuous spatio-temporal features
from the sequence of frames.
3. Recurrent Convolutional Network (RCN) [Singh and
Cuzzolin, 2019]: a 3D-ConvNet model that relies on
recurrence for learning the spatio-temporal features at
each network level. During the feature extraction phase,
RCNs exploit both 2D convolutions across the spatial
domain and 1D convolutions across the temporal do-
main.
4. Random Connectivity Long Short-Term Memory
(RCLSTM) [Hua et al., 2018]: an updated version
of LSTM in which the neurons are connected in a
stochastic manner, rather than fully connected. In our
case, the LSTM cell is used as a bottleneck in Resnet50
for learning the features sequentially.
5. Random Connectivity Gated Recurrent Unit (RCGRU)
[Hua et al., 2018]: an alternative version of RCLSTM
where the GRU cell is used instead of the LSTM one.
GRU makes the process more efficient with fewer pa-
rameters than the LSTM.
6. SlowFast [Feichtenhofer et al., 2019]: a 3D-CNN ar-
chitecture that contains both slow and fast pathways for
extracting the sequential features. A Slow pathway computes the spatial semantics at a low frame rate, while a Fast pathway operates at a high frame rate to capture the motion features. The two pathways are fused into a single architecture by lateral connections.
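The temporal pooling step described in item 1 (the C2D extension from 2D to 3D) can be sketched with NumPy; the shapes and random features below are invented, and the real model pools Resnet50 feature maps inside 3D-RetinaNet.

```python
import numpy as np

# Sketch of the C2D idea: per-frame 2D feature maps are combined by
# pooling over the temporal axis, yielding one clip-level feature map.

rng = np.random.default_rng(0)
# (time, channels, height, width): 8 frames of 256-channel feature maps
frame_features = rng.standard_normal((8, 256, 14, 14))

# average pooling over time collapses the clip to a single feature map
clip_features = frame_features.mean(axis=0)
print(clip_features.shape)  # (256, 14, 14)
```

Pooling over the time axis is the cheapest way to add a temporal dimension to a 2D backbone; the other architectures in the list replace it with learned temporal operators (1D convolutions, recurrence, or dual pathways).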
We trained 3D-RetinaNet⁴ using the same hyperparameter

⁴https://github.com/gurkirt/3D-RetinaNets.