Responsibilities
- Conduct original research, develop novel methods, and advance the state of the art to enable AI systems to perceive, understand, and reason about human activities, behaviors, interactions, and intentions.
- Investigate multimodal representation learning, alignment, temporal reasoning, and long-horizon understanding of human activities and experiences.
- Develop methods for modeling latent human states, such as intent, goals, beliefs, attention, and other unobservable factors that influence behavior.
- Design and execute rigorous experiments, benchmarking studies, and ablation analyses to validate research hypotheses.
- Contribute to the design, collection, and annotation of new, impactful video datasets.
- Collaborate with interdisciplinary teams of scientists and engineers to translate research advances into impactful AI systems.
- Publish research findings at leading AI, machine learning, and computer vision conferences and journals.
- Contribute to research strategy and intellectual property development.
Basic qualifications
- Ph.D. in Computer Science, Electrical Engineering, Cognitive Science, Computational Neuroscience, or a related field.
- Strong publication record at leading AI, machine learning, or computer vision venues such as CVPR, ICCV, ECCV, NeurIPS, ICLR, AAAI, or equivalent.
- Expertise in generative models and latent-variable learning, including Variational Autoencoders (VAEs), Diffusion Models, Generative Adversarial Networks (GANs), or related approaches.
- Strong foundation in machine learning, deep learning, and modern AI methodologies.
- Research experience in video understanding, activity recognition, and temporal modeling.
- Expertise in multimodal representation learning and alignment, vision and text encoders, and large-scale video datasets.
- Excellent communication, presentation, and collaboration skills.
- 1 to 3 years of relevant work experience.
Preferred qualifications
- Research expertise in long-range action understanding, including action segmentation, temporal alignment, action anticipation, and other long-horizon activity modeling.
- Experience developing methods for inferring human activities, intentions, or future actions from incomplete, ambiguous, or partially observed data.
- Experience with human pose estimation, hand pose estimation, or their application to activity understanding.
- Research background in multimodal models, including Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), and related architectures.
- Understanding of multimodal training methodologies, architectures, adapters, objectives, and data curation strategies.
- Experience modeling latent human states, such as intentions, goals, beliefs, attention, or other unobservable factors that influence behavior.
- Demonstrated ability to initiate and lead impactful research projects.