Large Scale Concept Ontology for Multimedia

From Wikipedia, the free encyclopedia

The Large-Scale Concept Ontology for Multimedia project was a series of workshops held from April 2004 to September 2006[1] for the purpose of defining a standard formal vocabulary for the annotation and retrieval of video.

Mandate

The Large-Scale Concept Ontology for Multimedia project was sponsored by the Disruptive Technology Office and brought together representatives from a variety of research communities, such as multimedia learning, information retrieval, computational linguistics, library science, and knowledge representation, as well as "user" communities such as intelligence agencies and broadcasters, to work collaboratively towards defining a set of 1,000 concepts.[2] Individually, each concept was to meet the following criteria:[3]

  • Utility: the concepts must support realistic video retrieval problems
  • Feasibility: the concepts are, or are expected to be, detectable given the near-term (5-year projected) state of technology
  • Observability: the concepts occur with relatively high frequency in actual video data sets

Jointly, these concepts were to meet the additional criterion of providing broad (domain-independent) coverage.[3] High-level target areas for coverage included physical objects, both animate (such as people, mobs, and animals) and inanimate, ranging from large-scale (such as buildings and highways) to small-scale (such as telephones and appliances); actions and events; locations and settings; and graphics. The effort was led by Dr. Milind Naphade, the principal investigator, along with researchers from Carnegie Mellon University, Columbia University, and IBM.[1]

Development tracks

The project had two main "tracks": the development and deployment of keyframe annotation tools (performed by CMU and Columbia), and the development of the Large-Scale Concept Ontology for Multimedia concept hierarchy itself. The second track was executed in two phases: the first, the manual construction of an 884-concept hierarchy, was performed collaboratively among the research and user community representatives.

The second phase, performed by knowledge representation experts at Cycorp, Inc., involved mapping the concepts into the Cyc knowledge base and using the Cyc inference engine to semi-automatically refine, correct, and expand the concept hierarchy. This mapping/expansion phase was motivated by a desire to increase breadth—the mapping grew the hierarchy from 884 concepts to well past the initial goal of 1,000—and to move Large-Scale Concept Ontology for Multimedia from a one-dimensional hierarchy of concepts to a full-blown ontology of rich semantic connections.[3]

Project results

The outputs of the effort included:[1]

  1. A "lite" version of the Large-Scale Concept Ontology for Multimedia concept hierarchy consisting of a subset of 449 concepts.
  2. A corpus of 61,901 video keyframes, taken from the 2006 TRECVID data set, annotated using Large-Scale Concept Ontology for Multimedia "lite."
  3. The full taxonomy of 2,638 concepts, built semi-automatically by mapping 884 concepts, manually identified by collaborators, into the Cyc knowledge base, and querying the Cyc inference engine for useful additions.
  4. The full ontology, in the form of a 2006 ResearchCyc release that contained the Large-Scale Concept Ontology for Multimedia mappings into the Cyc ontology.

Public detectors

Several sets of concept detectors were developed and released for public use:

  1. VIREO-374, 374 detectors developed by City University of Hong Kong.
  2. Columbia374, 374 detectors developed by Columbia University.
  3. Mediamill101, 101 detectors developed by The University of Amsterdam.

Use in the larger research community

Since its release, Large-Scale Concept Ontology for Multimedia has begun to be used successfully in visual recognition research: apart from research done by project participants, it has been used in independent research on concept extraction from images,[4][5] and has served as the basis for a video annotation tool.[6]

References
