We structure and understand the world's video