
Soccer is one of the world’s most-watched sports and is covered in several languages around the globe. IBM has developed an AI that can potentially watch live-streamed video and generate relevant commentary. While it doesn’t yet use an emotive text-to-speech system, it does draw on statistics for the players and teams in play to give a real-time, colorful analysis of what’s happening on the pitch.
Highlight Videos Use Similar Technology
IBM’s attempt at teaching an AI to commentate on soccer games builds on an automated highlight system that extracts exciting moments from a game and turns them into highlight clips. Unlike the highlight system, the commentary AI uses real-time tracking of players on the field to generate insightful commentary. IBM is currently the official tech partner for both the Masters PGA golf tournament and Grand Slam tennis competitions like Wimbledon.
Using Existing Commentary to Train the AI
The AI commentator was trained on a series of around fifty video clips, drawing on the expertise of the commentary in those clips. It is an end-to-end system: it takes live video as input and attempts to produce play-by-play commentary as output, based on what it predicts will happen next. Soccer fans know the game itself is unpredictable. Still, the individual actions within a single match (especially over the short periods covered by play-by-play commentary) are a lot easier to predict.
Not Perfect Yet
The system’s competency was tested recently in a demonstration at the Neural Information Processing Systems (NeurIPS) artificial intelligence conference. The AI performed relatively well, although it did repeat a few phrases. That particular shortcoming may be worked out by giving the AI more options to choose from for its output. The system also misjudged one regular pass as a cross-field kick, but with enough training, it may be able to anticipate plays better. Even then, it would still need proper emotional commentary and voice modulation to match the intensity of current human commentators.
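
To make the point about repeated phrases concrete, here is a minimal sketch of how a wider pool of output options could reduce repetition. It is not IBM's method, which the demonstration did not detail. The `PhrasePicker` class, the `TEMPLATES` table, and the `memory` parameter are all hypothetical names invented for this illustration.

```python
import random
from collections import deque


class PhrasePicker:
    """Hypothetical helper: choose a phrase for an event, avoiding recent repeats."""

    def __init__(self, templates, memory=2, seed=None):
        self.templates = templates          # maps an event type to candidate phrases
        self.recent = deque(maxlen=memory)  # the last few phrases that were used
        self.rng = random.Random(seed)

    def pick(self, event):
        options = self.templates[event]
        # Prefer phrases that were not used recently.
        fresh = [p for p in options if p not in self.recent]
        # If every option was used recently, fall back to the full pool.
        choice = self.rng.choice(fresh or options)
        self.recent.append(choice)
        return choice


# Illustrative phrases only; a real system would generate or learn these.
TEMPLATES = {
    "pass": [
        "A short pass forward.",
        "Keeps it moving with a simple pass.",
        "Plays it to a teammate.",
    ],
}

picker = PhrasePicker(TEMPLATES, memory=2, seed=0)
for _ in range(4):
    print(picker.pick("pass"))
```

With three candidate phrases and a memory of two, no phrase is repeated within any three consecutive calls. A pool with only one phrase would still repeat, which is why more output options matter.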