We predict attention second by second from the scene's beats and flag the exact moments viewers drop off. AI-simulated, not measured from real viewers.