Amir Mazaheri
Computer Vision Research Scientist · PhD
Staff ML Engineer at Warner Bros. Discovery (HBO). Deep expertise in large-scale video understanding, Vision-Language Models, and multimodal AI.
About
I'm a Computer Vision Research Scientist with deep expertise in large-scale video understanding, Vision-Language Models (VLMs), and multimodal AI systems. I currently work as a Staff ML Engineer at Warner Bros. Discovery (HBO), where I lead video understanding and content moderation systems for the streaming platform.
I hold a PhD from UCF's Center for Research in Computer Vision (CRCV), advised by Prof. Mubarak Shah. My dissertation focused on Video Content Understanding Using Text. I have authored publications at CVPR, ICCV, ECCV, EMNLP, and AAAI, and hold multiple US patents.
Current focus
- Large-scale video understanding and temporal reasoning.
- Vision-Language Models (VLMs) for content moderation and in-video search.
- LLM-enhanced metadata generation for fine-grained video discovery.
- Production-scale multimodal AI, from research to real-world deployment.
Selected projects
MMFT-BERT: Multimodal Fusion for Video QA
2019 – 2020
A multimodal fusion transformer with BERT encodings that achieves SOTA on the TVQA dataset.
Visual Text Correction
2017 – 2018
A vision-and-language task for automatically detecting and correcting falsified words in video descriptions.