Vision Language Talks
Want to keep up with the rapidly evolving world of AI research? Join our reading group, where we collaborate with authors to delve deep into their work, promote their research, and explore potential collaborations. Many sessions include primers on various topics. With over 50 past sessions, you can explore our content on our YouTube channel and join our professional community on LinkedIn. Stay updated by signing up for our mailing list here.

Graph-enhanced Large Language Models
Author: Fangru Lin, DPhil NLP @UniversityOfOxford
TL;DR: This talk presents "Plan Like a Graph" (PLaG), a novel technique that integrates graphs with natural language prompts to enhance large language models' (LLMs) performance in asynchronous plan reasoning tasks. Despite improvements, challenges persist as task complexity increases, highlighting the limitations of current LLMs in simulating digital devices.
Keywords: Graph-enhanced LLMs, asynchronous planning, Plan Like a Graph (PLaG)

Computationally Budgeted Continual Learning
Author: Ameya Prabhu, PhD @Torr Vision Group, University of Oxford
TL;DR: This presentation explores continual learning under computational constraints, emphasizing the need for models to adapt to new data streams efficiently. The study reveals that traditional continual learning methods struggle when computational budgets are limited, underscoring the importance of developing strategies that balance performance with resource constraints.
Keywords: Continual learning, computational budget, data streams, model adaptation, resource constraints

Learning Object Recognition with Rich Language Descriptions
Author: Liunian Li, PhD@UCLA
TL;DR: This talk discusses methods for enhancing object recognition systems by incorporating detailed language descriptions, aiming to improve model accuracy and robustness in understanding visual content.
Keywords: Object recognition, language descriptions, visual understanding, model accuracy

Memory-Economic Continual Test-Time Adaptation
Author: Junyuan Hong, PhD @MichiganStateUni, Intern@SonyAI
TL;DR: The presentation introduces approaches for continual test-time adaptation that are memory-efficient, enabling models to adapt to new data without extensive computational resources.
Keywords: Continual learning, test-time adaptation, memory efficiency, model adaptability

On the Impact of Estimating Example Difficulty
Author: Chirag Agarwal, Research Scientist@Adobe
TL;DR: This talk examines how assessing the difficulty of training examples can influence model training and performance, providing insights into curriculum learning strategies.
Keywords: Example difficulty estimation, curriculum learning, model training, performance optimization

SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Author: Omiros Pantazis, PhD @UCL
TL;DR: This work introduces SVL-Adapter, a method that enhances vision-language models like CLIP by integrating self-supervised learning. This approach improves classification accuracy, especially in low-shot settings, by combining the strengths of vision-language pretraining with self-supervised representation learning.
Keywords: Vision-language models, self-supervised learning, low-shot learning, model adaptation

Master of All : Simultaneous Generalization of Urban-Scene Segmentation
Author: Nikhil Reddy, PhD student @UQ-IIT Delhi Research Academy
TL;DR: The "Master of ALL" (MALL) technique is a test-time adaptation method designed to improve semantic segmentation of urban scenes across various adverse weather conditions. MALL updates pre-trained models during inference to enhance performance without requiring access to source data.
Keywords: Semantic segmentation, test-time adaptation, adverse weather conditions, domain generalization

Contrastive Test-Time Adaptation
Author: Dian Chen @ToyotaResearchInstitute
TL;DR: This work introduces AdaContrast, a novel approach to test-time adaptation using self-supervised contrastive learning combined with an online pseudo-labeling scheme. It enhances target feature learning and achieves state-of-the-art performance on major benchmarks, offering benefits like memory efficiency and better model calibration.
Keywords: Test-Time Adaptation, Contrastive Learning, Pseudo-Labeling, Domain Adaptation, Self-Supervised Learning

Spatio-temporal Relation Modeling for Few-shot Action Recognition
Author: Anirudh Thatipelli, PhD @CRCV, UCF
TL;DR: The authors propose STRM, a framework that enhances class-specific feature discriminability while learning higher-order temporal representations. It introduces a spatio-temporal enrichment module that aggregates spatial and temporal contexts and achieves state-of-the-art results on few-shot action recognition benchmarks.
Keywords: Few-Shot Learning, Action Recognition, Spatio-Temporal Modeling, Feature Enrichment, Temporal Relations

Margin-based Label Smoothing for Network Calibration, CVPR 2022
Author: Bingyuan Liu @Amazon
TL;DR: This work presents a margin-based label smoothing technique aimed at improving network calibration. By adjusting the label smoothing process based on class margins, the method enhances the reliability of model predictions.
Keywords: Label Smoothing, Network Calibration, Margin-Based Techniques, Model Reliability

Role Of Shannon Entropy As A Regularizer Of DeepNNs
Author: Prof. Jose Dolz, École de technologie supérieure (ETS) Montreal
TL;DR: This talk explores the application of Shannon Entropy as a regularization technique in deep neural networks, aiming to improve model robustness and generalization by managing uncertainty during training.
Keywords: Shannon Entropy, regularization, deep neural networks, model robustness

Self-Supervising Occlusions For Vision
Author: Dinesh Reddy, PhD @CMU Robotics
TL;DR: This talk addresses the challenges posed by occlusions in visual scenes and introduces self-supervised methodologies to predict and handle occluded regions using multi-view supervision and longitudinal data.
Keywords: Occlusions, Self-Supervised Learning, Multi-View Supervision, Longitudinal Data

Mathematical Models of Brain Connectivity and Behavior
Author: Niharika S. D’Souza @IBM Research, Almaden
TL;DR: This presentation explores the use of mathematical and machine learning models to link the structural and functional organization of the brain with behavioral patterns, aiming to predict clinical severity from neuroimaging data.
Keywords: Brain Connectivity, Behavior Prediction, Machine Learning, Neuroimaging, Clinical Severity

Mixture-Based Feature Space Learning for Few-Shot Classification
Author: Arman Afrasiyabi @MILA
TL;DR: This talk introduces a mixture-based approach to feature space learning, enhancing the performance of few-shot classification tasks by effectively modeling complex data distributions.
Keywords: Few-Shot Classification, Feature Space Learning, Mixture Models, Machine Learning

Generalized and Incremental Few-Shot Learning by Explicit Learning & Calibration without Forgetting
Author: Anna Kukleva, PhD @Computer Vision and Machine Learning department, Max Planck Institute for Informatics
TL;DR: This talk explains few-shot learning and generalized few-shot learning, outlines their difficulties, presents a framework that addresses them, and discusses its extension to incremental learning.
Keywords: Few-shot learning, incremental learning, model calibration, catastrophic forgetting

Discriminative Region-based Multi-Label Zero-Shot Learning, ICCV 2021
Author: Akshita Gupta @IIAI
TL;DR: The talk presents a discriminative approach to region-based multi-label zero-shot learning, enabling models to recognize and localize multiple unseen classes simultaneously by leveraging region-level features and semantic embeddings.
Keywords: Zero-shot learning, multi-label classification, region-based learning, semantic embeddings

PAWS : Semi-Supervised Learning of Visual Features
Author: Mido Assran @Facebook AI Research (FAIR) and Mila – Quebec AI Institute.
TL;DR: This talk proposes PAWS, a learning method that extends the distance-metric loss used in self-supervised methods such as BYOL and SwAV to a semi-supervised setting. It sets a new state of the art for ResNet-50 on ImageNet trained with either 10% or 1% of the labels, reaching 75% and 66% top-1 accuracy respectively, with 4x to 12x less training. It also matches the performance of fully supervised learning with bigger networks while using 10x fewer labels.
Keywords: Semi-Supervised Learning, Self-Supervised Learning, PAWS, Contrastive Learning, Metric Learning, ImageNet, ResNet-50, BYOL, SwAV, Few-Label Training

Using Progressive Context Encoders for Anomaly Detection
Author: Quincy Gu @Mayo Clinic
TL;DR: This talk presents a label-free anomaly detection pipeline built on progressive context encoders.
Keywords: Anomaly detection, context encoders, unsupervised learning

What Can We Learn From Subtitled Sign Language Data?
Author: Gül Varol, Assistant Professor at École des Ponts ParisTech
TL;DR: This talk explores the insights and opportunities presented by subtitled sign language data, focusing on how such data can be utilized to improve sign language recognition and translation systems.
Keywords: Sign language recognition, subtitled data, language translation, multimodal learning

ViTGAN : Training GANs with Vision Transformers
Author: Paper Discussion with the Author
TL;DR: The presentation introduces ViTGAN, a novel approach that integrates Vision Transformers into Generative Adversarial Networks to enhance image generation quality by capturing long-range dependencies in visual data.
Keywords: Vision Transformers, Generative Adversarial Networks, image generation

Federated Learning in Vision Tasks
Author: Umberto Michieli, PhD@Uni of Padova, Intern@Samsung Research
TL;DR: Federated Learning (FL) enables distributed model training across decentralized data sources, addressing privacy concerns and data heterogeneity. This talk explores FL's application to computer vision tasks, highlighting challenges like system and statistical heterogeneity, and introduces methods such as FedProto, which leverages prototypical representations to enhance federated optimization.
Keywords: Federated Learning, Distributed Training, Prototypical Representations, Data Privacy

PLOP : Learning continuously without forgetting for Continual SemSeg
Author: Arthur Douillard @DeepMind
TL;DR: Continual Semantic Segmentation (CSS) involves updating models to recognize new classes without forgetting previously learned ones. PLOP addresses challenges like catastrophic forgetting and background shift by introducing a multi-scale pooling distillation scheme and an entropy-based pseudo-labeling strategy, significantly improving performance in CSS scenarios.
Keywords: Continual Learning, Semantic Segmentation, Catastrophic Forgetting, Background Shift, Pseudo-Labeling

An Identifiability Perspective on Representation Learning
Author: Yash Sharma, PhD@(MPI-IS)
TL;DR: This talk delves into the identifiability aspects of representation learning, discussing conditions under which learned representations can be considered identifiable and the implications for model robustness and interpretability.
Keywords: Representation Learning, Identifiability, Model Robustness, Interpretability

Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams
Author: Matthias De Lange, PhD @KU Leuven
TL;DR: The talk addresses the challenge of learning from non-stationary data streams, proposing a method for continual prototype evolution to adapt models online without catastrophic forgetting.
Keywords: Continual Learning, Non-Stationary Data, Prototype Evolution, Online Learning

SeqNet: Learning Descriptors for Hierarchical Place Recognition
Author: Sourav Garg, PostDoc @QUT
TL;DR: SeqNet introduces a method for learning descriptors tailored for hierarchical place recognition, enhancing the accuracy and efficiency of localization systems in robotics and autonomous vehicles.
Keywords: Place Recognition, Descriptor Learning, Hierarchical Localization, Robotics

Scale Equivariant Siamese Tracking
Author: Ivan Sosnovik & Artem Moskalev
TL;DR: This talk presents a scale-equivariant Siamese network for object tracking, ensuring that the model's predictions are consistent across different scales, thereby improving tracking robustness.
Keywords: Object Tracking, Scale Equivariance, Siamese Networks

Conformal Inference of Counterfactuals and Individual Treatment Effects (Stanford)
Author: Lihua Lei, Post Doc at Stanford University
TL;DR: The presentation explores conformal inference methods for estimating counterfactuals and individual treatment effects, providing a framework for making reliable causal inferences in observational studies.
Keywords: Conformal Inference, Counterfactuals, Treatment Effects, Causal Inference

Self-Supervised Few-Shot Learning on Point Clouds
Author: Charu Sharma, PhD @IIT Hyderabad
TL;DR: The talk introduces a self-supervised approach to few-shot learning on point clouds, enabling models to learn representations without extensive labeled data, which is particularly beneficial for 3D vision tasks.
Keywords: Self-Supervised Learning, Few-Shot Learning, Point Clouds, 3D Vision

Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images
Author: Aayush Tyagi, PhD @IIT-Delhi
TL;DR: This presentation discusses techniques for unsupervised domain adaptation in the context of semantic segmentation of Near-Infrared (NIR) images, addressing the challenges posed by domain shifts between visible and NIR spectra.
Keywords: Unsupervised Domain Adaptation, Semantic Segmentation, NIR Imaging, Domain Shift

GOCor : Bringing Globally Optimized Correspondence Volumes into Your Neural Network
Author: Prune Truong, PhD Student at the Computer Vision Lab of ETH Zurich
TL;DR: GOCor introduces a novel approach to correlation-based methods in computer vision, enhancing the performance of tasks like object tracking and alignment by learning optimal correlation filters.
Keywords: Correlation Filters, Object Tracking, GOCor

Contrastive Learning of Global & Local Features
Author: Krishna Chaitanya, PhD @Computer Vision Lab, ETH Zurich
TL;DR: This talk presents strategies to extend the contrastive learning framework for segmentation of volumetric medical images in semi-supervised settings with limited annotations, leveraging domain-specific and problem-specific cues.
Keywords: Contrastive learning, medical image segmentation, semi-supervised learning, global and local features

Forecasting Characteristic 3D Poses of Human Actions
Author: Christian Diller, PhD @3D AI Lab at the Technical University of Munich (TUM)
TL;DR: Introduces a probabilistic approach to predict future characteristic 3D poses from short sequence observations, aiming for goal-oriented understanding of human actions.
Keywords: 3D pose forecasting, human motion prediction, probabilistic modeling

A Broad Overview of Social Robotics
Author: Chinmay Mishra, Marie Skłodowska-Curie Actions ITN Fellow
TL;DR: This talk provides a comprehensive overview of social robots, discussing their history, physical characteristics, principles to enhance positive public perception, potential threats arising with ubiquity, and their impacts in various industries.
Keywords: Social robots, human-robot interaction, ethical considerations, public perception, industry applications

Improving Autonomous Driving Pipeline using Graph Neural Networks
Author: Xinshuo Weng, PhD @Robotics Institute - Carnegie Mellon University
TL;DR: This talk explores the application of graph neural networks to enhance the autonomous driving pipeline, focusing on improving perception, prediction, and planning modules by effectively modeling the relationships between various entities in a driving scenario.
Keywords: Autonomous Driving, Graph Neural Networks (GNNs), Perception, Prediction, Planning, Scene Understanding, Object Interactions, Spatiotemporal Modeling, Motion Forecasting, Robotics, Deep Learning

End to end accelerated MRI acquisition and processing with deep learning
Author: Francesco Caliva @Amazon
TL;DR: This talk explores the application of deep learning techniques to accelerate MRI acquisition and processing, aiming to improve efficiency and accuracy in medical imaging workflows.
Keywords: MRI, Deep Learning, Medical Imaging, Accelerated Imaging, Image Processing

Introduction to Representation learning: Approaches, Challenges and Applications
Author: Shuyu Lin @University of Oxford
TL;DR: This talk introduces the fundamentals of representation learning, covering various approaches, addressing challenges such as interpretability and generalization, and highlighting applications across different domains.
Keywords: Representation learning, machine learning, interpretability, generalization, applications

Wasserstein Distances for Stereo Disparity Estimation
Author: PhD @Stanford University
TL;DR: Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. The fact that this distribution is usually learned indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. The authors address these issues using a new neural network architecture that is capable of outputting arbitrary depth values, and a new loss function that is derived from the Wasserstein distance between the true and the predicted distributions.
Keywords: Stereo disparity estimation, Wasserstein distances, depth perception
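
As a rough illustration of the distance that the loss above is derived from, the sketch below computes the 1-Wasserstein distance between two discrete distributions defined on the same sorted set of disparity values. It is not the authors' loss or architecture; the function name `wasserstein1` and the toy inputs are assumptions made for this example.

```python
from itertools import accumulate


def wasserstein1(p, q, support):
    """1-Wasserstein distance between two discrete distributions.

    p, q    : probability masses over the same support (each sums to 1)
    support : sorted values (e.g. candidate disparities) the masses sit on

    In one dimension the distance equals the area between the two CDFs.
    """
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    total = 0.0
    for i in range(len(support) - 1):
        gap = support[i + 1] - support[i]
        total += abs(cdf_p[i] - cdf_q[i]) * gap
    return total


# Hypothetical example: true disparity mass at 0, prediction split over 1 and 2.
true_dist = [1.0, 0.0, 0.0]
pred_dist = [0.0, 0.5, 0.5]
distance = wasserstein1(true_dist, pred_dist, [0, 1, 2])
```

Unlike a per-bin comparison, this distance grows with how far the predicted mass sits from the true value, which is why it suits ordered quantities such as disparity.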

Learning a Neural Solver for Multiple Object Tracking
Author: Guillem Brasó, PhD @Dynamic Vision and Learning group (DVL), Technical University of Munich (TUM)
TL;DR: This talk introduces a novel approach to Multiple Object Tracking (MOT) by leveraging Message Passing Networks (MPNs) within a differentiable framework. By directly operating on graph structures, the method enables global reasoning over detections, leading to improved data association and tracking performance.
Keywords: Multiple Object Tracking, Message Passing Networks, Graph Neural Networks, Data Association, Differentiable Framework

Black Magic in Deep Learning: How Human Skill Impacts Network Training
Author: Dr. Jan van Gemert, Head of the Computer Vision Lab, Delft University of Technology
TL;DR: This study investigates the subjective human factors in deep learning, focusing on how a user's prior experience impacts the accuracy of network training. Based on a study with 31 participants of varying experience levels, the results show a strong positive correlation between experience and performance, with experienced participants finding better solutions using fewer resources.
Keywords: Deep Learning, Human Factors, Network Training, Hyperparameter Optimization, Machine Learning

Sign Language Translation with Transformers
Author: Kayo Yin, Master's @Language Technologies Institute - Carnegie Mellon University
TL;DR: Sign Language Translation (SLT) first uses a Sign Language Recognition (SLR) system to extract sign language glosses from videos. Then, a translation system generates spoken language translations from the sign language glosses. This paper focuses on the translation system and introduces the STMC-Transformer.
Keywords: Sign Language Translation, Transformers, Sign Language Recognition (SLR), Natural Language Processing (NLP), Multimodal Learning, Gloss-Based Translation, Sequence-to-Sequence Models, Accessibility

C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis
Author: Joseph K J, PhD @IIT Hyderabad
TL;DR: This talk introduces C4Synth, a framework for text-to-image synthesis that ensures consistency between generated images and their corresponding captions through a cycle-consistent approach.
Keywords: Text-to-image synthesis, cycle-consistency, generative models

'AdvPC: Transferable Adversarial Perturbations', 'Towards Analyzing Semantic Robustness of DeepNN'
Author: PhD @KAUST Computer Vision Lab (IVUL)
TL;DR: These presentations delve into creating adversarial perturbations that transfer across models and analyzing the semantic robustness of deep neural networks, highlighting vulnerabilities and proposing mitigation strategies.
Keywords: Adversarial perturbations, semantic robustness, deep neural networks, security

Attributional Robustness Training using Input-Gradient Spatial Alignment
Author: Puneet Mangla, Undergrad @IIT Hyderabad
TL;DR: This talk addresses the vulnerability of neural network explanations to imperceptible input perturbations. The authors propose a training methodology that enhances attributional robustness by aligning input gradients spatially with the original image, utilizing a soft-margin triplet loss. Their approach not only improves the stability of attribution maps but also enhances performance in weakly supervised object localization tasks.
Keywords: Attributional Robustness, Interpretability, Neural Networks, Input-Gradient Alignment, Soft-Margin Triplet Loss, Weakly Supervised Object Localization

Siam R-CNN - Visual Tracking by Re-Detection
Author: Jonathon Luiten, PhD @RWTH Aachen + Carnegie Mellon + Uni Oxford, Research Scientist at Meta Reality Labs
TL;DR: This presentation introduces Siam R-CNN, a framework that combines Siamese networks with region-based convolutional neural networks for robust visual tracking through re-detection mechanisms.
Keywords: Visual tracking, Siamese networks, R-CNN, re-detection

Full-Body Awareness from Partial Observations
Author: Chris Rockwell, PhD @University of Michigan
TL;DR: This talk explores methods to achieve full-body awareness in systems from partial observations, enhancing the understanding and prediction of human poses in various applications.
Keywords: Full-body awareness, partial observations, human pose estimation

Real-Time Sign Language Detection for Video Conferencing Applications
Author: Amit Moryossef, PhD @Bar-Ilan University + Intern @Google Zurich
TL;DR: This presentation discusses the development of real-time sign language detection systems tailored for video conferencing platforms, aiming to improve accessibility and communication.
Keywords: Sign language detection, real-time systems, video conferencing, accessibility

Butterflies in Hyperbolic Space : Leveraging Label Hierarchy to Improve Image Classification
Author: Ankit Dhall, Graduate Student, Robotic Systems and Control @ETH Zurich
TL;DR: The paper proposes methods to enhance image classification by incorporating the semantic hierarchy of class labels. The authors introduce order-preserving embeddings, utilizing both Euclidean and hyperbolic geometries, to model label-label and label-image interactions. These approaches are validated on the ETHEC dataset, demonstrating improved performance over hierarchy-agnostic models.
Keywords: Hyperbolic Space, Label Hierarchy, Image Classification, Order-Preserving Embeddings, Euclidean Geometry, Hierarchical Learning, Semantic Relationships, ETHEC Dataset

GradSLAM: Differentiable Dense SLAM
Author: Krishna Murthy Jatavallabhula, Ganesh Iyer, and Liam Paull
TL;DR: gradSLAM is a fully differentiable dense simultaneous localization and mapping (SLAM) framework that integrates gradient-based learning with SLAM systems. By making SLAM components differentiable, it allows for end-to-end optimization, enabling gradients to flow from 3D maps back to 2D sensor inputs.
Keywords: Differentiable SLAM, Automatic Differentiation, Dense Mapping, Gradient-Based Learning, Computational Graphs

Mish: A Self Regularized Non-Monotonic Activation Function
Author: Diganta Misra
TL;DR: Mish is a novel neural activation function defined as f(x) = x * tanh(softplus(x)). It is smooth, continuous, and non-monotonic, offering advantages over traditional functions like ReLU and Swish. Empirical results demonstrate that networks utilizing Mish achieve higher accuracy and better generalization across various benchmarks.
Keywords: Activation Function, Neural Networks, Deep Learning, Non-Monotonic Function, Smooth Activation
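
A minimal sketch of the formula quoted above, f(x) = x * tanh(softplus(x)), in plain Python. The function names `softplus` and `mish` are chosen here for illustration, and the numerically stable form of softplus is an implementation choice rather than something taken from the talk.

```python
import math


def softplus(x: float) -> float:
    """softplus(x) = log(1 + e^x), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Sample a few points; the negative inputs show the non-monotonic dip:
# the value at -1 lies below the values at both -5 and 0.
samples = {x: mish(x) for x in (-5.0, -1.0, 0.0, 1.0, 5.0)}
```

For large positive inputs the function approaches the identity, and for large negative inputs it approaches zero from below, which is where the non-monotonic behavior mentioned in the summary comes from.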

Hand-Object Contact During Grasping: Capture, Analysis and Applications
Author: Samarth Brahmbhatt, Postdoctoral researcher at Intelligent Systems lab at Intel in Santa Clara
TL;DR: This talk focuses on the methodologies for capturing and analyzing hand-object contact during grasping tasks. It explores the applications of this analysis in improving robotic manipulation and understanding human grasping behaviors.
Keywords: Hand-Object Interaction, Grasping, Contact Analysis, Robotic Manipulation, Human-Computer Interaction

Improving Machine Vision using Human Perceptual Representations
Author: Pramod RT, Postdoctoral Researcher @MIT
TL;DR: This presentation discusses leveraging human perceptual representations to enhance machine vision systems. By integrating insights from human perception, the talk highlights strategies to improve the accuracy and robustness of computer vision models.
Keywords: Machine Vision, Human Perception, Perceptual Representations, Visual Processing

Learning Data Augmentation Using Online BiLevel Data Optimisation for Image Classification
Author: Issam Laradji @ElementAI/ServiceNow Research + Saypraseuth Mounsaveng, PhD @ETS Montreal
TL;DR: The talk introduces an online bi-level optimization approach to learn data augmentation policies for image classification tasks. This method aims to enhance model generalization by optimizing augmentation strategies during training.
Keywords: Data Augmentation, Bi-Level Optimization, Image Classification, Machine Learning, Model Generalization

Graduate Studies Series

Masters (MS) in CS in USA | MITACS Globalink Research Internship | MS in NYU
Author: Sidharth Purohit
TL;DR: This webinar provides insights into pursuing a Master's degree in Computer Science in the USA, covering application processes, program structures, and opportunities available to students.
Keywords: MS in Computer Science, USA, Graduate Studies, Application Process, Program Structure

Graduate School Applications | Fully Funded Masters + PhD | IELTS vs TOEFL | Statement of Purpose
Author: Raman Dutt
TL;DR: This session offers guidance on the graduate school application process, including tips on crafting compelling statements of purpose, securing strong recommendation letters, and avoiding common pitfalls.
Keywords: Graduate Applications, Statements of Purpose, Recommendation Letters, Application Tips

Research Scholarships for Undergrads, DAAD WISE (Uni of Hamburg), IAS SRFP, Job as SDE @Microsoft
Author: Shravan Nayak
TL;DR: This webinar discusses various research scholarships available for undergraduates, such as DAAD WISE and IAS SRFP, and shares insights into securing positions like Software Development Engineer at Microsoft.
Keywords: Undergraduate Research Scholarships, DAAD WISE, IAS SRFP, Microsoft SDE, Career Opportunities

PhD in USA | Fully Funded PhD @UCSC | Cold Mailing | Research Internship @GeorgiaTech, IISc
Author: Sai Siddartha Maram
TL;DR: This webinar provides an overview of pursuing a PhD in the USA, discussing application strategies, funding opportunities, and insights into the life of a doctoral student.
Keywords: PhD Programs, USA, Doctoral Studies, Application Strategies, Funding Opportunities