PETS-ICVS Datasets

Warning: you are strongly advised to view the smart meeting specification file available here before downloading any data.  This will allow you to determine which part of the data is most appropriate for you.  The total size of the dataset is 5.9 GB.

The JPEG images for the PETS-ICVS may be obtained from Or alternatively access the FTP site anonymously in the usual way, i.e.:

 Name: ftp
 ftp> binary
 ftp> prompt
 ftp> cd PETS-ICVS
 ftp> cd DVD1/data
 ftp> ...

Note that it is crucial to employ "binary" transfer.
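The interactive session above can also be scripted. The following is a minimal sketch using Python's standard ftplib module, not an official download tool. The function names and the local directory name are illustrative assumptions, and the host name is deliberately left as a parameter because it is not given here. It assumes the remote directory holds plain files; entries the server refuses to send (typically sub-directories) are skipped.

```python
import ftplib
import os


def download_directory(ftp, remote_dir, local_dir):
    """Fetch every plain file in remote_dir into local_dir in binary mode."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    fetched = []
    for name in ftp.nlst():
        local_path = os.path.join(local_dir, os.path.basename(name))
        try:
            with open(local_path, "wb") as handle:
                # retrbinary switches to image (binary) type before the
                # transfer, which matches "ftp> binary" in the session above.
                ftp.retrbinary("RETR " + name, handle.write)
        except ftplib.error_perm:
            # The server refused the entry (for example a sub-directory):
            # remove the empty local file and move on.
            os.remove(local_path)
            continue
        fetched.append(local_path)
    return fetched


def fetch_dvd1_data(host, local_dir="DVD1_data"):
    """Anonymous login, then the same 'cd' steps as the session above.

    host is supplied by the caller; local_dir is a hypothetical default.
    """
    with ftplib.FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        ftp.cwd("PETS-ICVS")
        return download_directory(ftp, "DVD1/data", local_dir)
```

Calling fetch_dvd1_data with the FTP host name would mirror the "cd PETS-ICVS", "cd DVD1/data" steps and return the list of local files written.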

You can also download all files under one directory using wget, i.e.
wget -m
will download all people datasets to the current directory.  Please see  for more details.

Note: there appear to be some problems accessing the ftp site using Netscape. If you are having problems, please try using Internet Explorer instead, or access via direct ftp as shown above.

Important instructions are given at the bottom of this page on processing the datasets - please read these carefully.

Annotation of Datasets (Ground Truth)

The following annotations are available for the datasets:

1.  Eye positions of people in Scenarios A, B and D.  The format is described in the specification file link above, i.e.
image0001.jpg  3  left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
(where left and right are as seen by the camera, rather than the person's left/right eyes).
Image coordinates: the origin is in the top left.
Every 10th frame is annotated.
The annotation is available here.

2. Facial expression and gaze estimation for Scenarios A and D, Cameras 1-2.
The annotation is available here.

3. Gesture/action annotations for Scenarios B and D, Cameras 1-2.
The annotation is available here.
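As an illustration of the eye-position format in item 1, the sketch below parses one annotation line. It is based only on the example line shown above, not on the specification file, so several points are assumptions: the meaning of the integer after the file name is not interpreted (it is simply stored), fields are taken to be whitespace-separated, and any further coordinates on the same line are read in groups of four. The names EyeAnnotation and parse_eye_line are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class EyeAnnotation(NamedTuple):
    image: str          # file name, e.g. "image0001.jpg"
    second_field: int   # integer after the file name; meaning not assumed here
    # One (left_x, left_y, right_x, right_y) tuple per group of four values.
    # Left/right are as seen by the camera; the origin is the top-left corner.
    eyes: List[Tuple[float, float, float, float]]


def parse_eye_line(line: str) -> EyeAnnotation:
    """Parse one whitespace-separated eye-position annotation line."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected a file name and an integer field: %r" % line)
    image = parts[0]
    second_field = int(parts[1])
    values = [float(v) for v in parts[2:]]
    if len(values) % 4 != 0:
        raise ValueError("coordinates do not form groups of four: %r" % line)
    eyes = [
        (values[i], values[i + 1], values[i + 2], values[i + 3])
        for i in range(0, len(values), 4)
    ]
    return EyeAnnotation(image, second_field, eyes)
```

Since only every 10th frame is annotated, a caller would normally build a lookup from image name to annotation and treat frames without an entry as unannotated.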

You are strongly encouraged to evaluate your results against the appropriate data above and report them in your paper submission.

PETS-ICVS consists of datasets for a smart meeting.

Two views of the smart meeting room without participants.  The environment consists of three cameras: one mounted on each of two opposing walls, and an omnidirectional camera positioned at the centre of the room.

View from Camera 1.  Click on the background image for the full resolution version.

View from Camera 2.  Click on the background image for the full resolution version.

View from Camera 3.  Click on the background image for the full resolution version.

The measurements for the smart meeting room may be found in the following calibration file (Powerpoint).

The Task

The overall task is to automatically annotate the smart meeting.

The dataset consists of four scenarios A, B, C and D.

Each scenario consists of a number of separate sub-tasks.  For each frame, the requirement is to perform:

Note that it is not a requirement of your paper to address all of the tasks stated above.  You may address one or more of the tasks in any of the scenarios.  For example, if you specialise in action recognition, you may wish to submit a paper which addresses this aspect alone, i.e. annotation on a frame-by-frame basis of actions performed within one of the scenarios.  Your annotation may be based on one or more of the 3 camera views used.

A full specification of the dataset is available here including details of scenarios A-D, list of actions/gestures and facial expressions.

The results in your paper can be based on any of the data supplied in the dataset.
The images may be converted to any other format as appropriate, e.g. subsampled, or converted to monochrome. All results reported in the paper should clearly indicate which part of the test data is used, ideally with reference to frame numbers where appropriate, e.g. Scenario B, ...

There is no requirement to report results on all the test data, however you are encouraged to test your algorithms on as much of the test data as possible.

The results must be submitted along with the paper, with the results generated in XML format.

The paper that you submit may be based on previously published tracking methods/algorithms (including papers submitted to the main ICVS conference).  The important point is that your paper MUST report results using the datasets above.

You are strongly encouraged to evaluate your results against the ground truth given above and report them in your paper submission.
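No particular error measure is prescribed here. As one possible way to compare eye-position results with the ground truth, the sketch below computes the mean Euclidean pixel distance between predicted and annotated eye centres. The choice of metric, the function name mean_eye_error and the dictionary layout are all assumptions made for illustration.

```python
import math


def mean_eye_error(predicted, ground_truth):
    """Mean pixel distance between predicted and annotated eye centres.

    Both arguments map an image name to (left_x, left_y, right_x, right_y).
    Only frames present in both mappings are scored, since the ground truth
    covers a subset of the frames.
    """
    distances = []
    for image, (glx, gly, grx, gry) in ground_truth.items():
        if image not in predicted:
            continue
        plx, ply, prx, pry = predicted[image]
        distances.append(math.hypot(plx - glx, ply - gly))
        distances.append(math.hypot(prx - grx, pry - gry))
    if not distances:
        raise ValueError("no frame has both a prediction and ground truth")
    return sum(distances) / len(distances)
```

When reporting such a figure, it helps to state which scenario, camera and frame range were scored.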

The sequences have been provided by the consortium of Project FGnet (IST-2000-26434) with additional support provided by the Swiss National Centre of Competence in Research (NCCR) on Interactive MultiModal Information Management (IM)2.  The NCCR is managed by the Swiss National Science Foundation on behalf of the Federal Authorities.

If you have any queries please email