Last year, we read in Technology Review that “a prototype computer vision system can generate a live text description of what's happening in a feed from a surveillance camera. Although not yet ready for commercial use, the system demonstrates how software could make it easier to skim or search through video or image collections.”
With this image-to-text system, one could easily search for specific content in a video without relying on the surrounding text, which, if it is even there to begin with, may or may not be relevant to the video. It has the potential to become a powerful forensic tool. For instance, instead of watching hours of footage of a relentlessly boring urban intersection to find out when a red car important to your criminal investigation might have sped through, you could simply type “red car” in the search field.
Perhaps a far more robust system equipped with a specialized database could be used to analyze urban spaces through all hours of the day and all seasons, conceivably in a multi-year project annotating and cataloguing every mundane happening. William H. Whyte meets Andy Warhol. Set it up overlooking a new neighborhood park, and what it spits out after two years of observation gets published in Landscape Architecture Magazine. Landscape criticism by CCTV.
Alternatively, it could be submitted to Poetry magazine. The final summary reports are written in natural language, if prosaic, but the preliminary descriptions are oddly poetic, as if written in some esoteric metric, which it might as well be to the non-computer scientist.
Land_vehicle_359 approaches intersection_0 along road_0 at 57:27. It stops at 57:29.
Land_vehicle_360 approaches intersection_0 along road_3 at 57:31.
Land_vehicle_360 moves at an above-than-normal average speed of 26.5 mph in zone_4 (approach of road_3 to intersection_0) at 57:32. It enters intersection_0 at 57:32. It leaves intersection_0 at 57:34.
There is a possible failure-to-yield violation between 57:27 to 57:36 by Land_vehicle_360.
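Searching descriptions like these would take very little code. What follows is only a minimal sketch in Python, assuming the output arrives as plain lines of text with mm:ss timestamps. The names `DESCRIPTIONS`, `TIMESTAMP`, and `search` are ours, invented for illustration, and have nothing to do with how the prototype itself works.

```python
import re

# A few of the lines quoted above, used as sample data.
DESCRIPTIONS = [
    "Land_vehicle_359 approaches intersection_0 along road_0 at 57:27.",
    "Land_vehicle_360 approaches intersection_0 along road_3 at 57:31.",
    "Land_vehicle_360 moves at an above-than-normal average speed of 26.5 mph "
    "in zone_4 (approach of road_3 to intersection_0) at 57:32.",
    "There is a possible failure-to-yield violation between 57:27 to 57:36 "
    "by Land_vehicle_360.",
]

# Assumption: timestamps always look like minutes:seconds, e.g. 57:27.
TIMESTAMP = re.compile(r"\b\d{1,2}:\d{2}\b")


def search(descriptions, query):
    """Return (timestamps, line) pairs for every line that mentions the query."""
    needle = query.lower()
    hits = []
    for line in descriptions:
        if needle in line.lower():
            # Collect every timestamp on the matching line, in order.
            hits.append((TIMESTAMP.findall(line), line))
    return hits


if __name__ == "__main__":
    for stamps, line in search(DESCRIPTIONS, "violation"):
        print(stamps, line)
```

Typing “violation” pulls up the one line that mentions it, along with the two timestamps it contains; typing “red car” returns nothing, since this particular feed never says what color anything is.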
One wonders how the lines might read for a major disaster. Or how the system might have summarized the events at an intersection in the Chicago suburb of Elmhurst when a man tossed a puppy from his car.