About
I grew up in Jiaganj on the Ganges. My mom became a…
Activity
-
I just crossed a milestone that's been months in the making. Defense contractors lose DoD contracts because compliance tools can't handle their…
Liked by Arjun Jain
-
NeurIPS 2025 starts December 2nd. In 2024, 🇨🇳 China silently crossed into every single domain—CV, NLP, ethics, GNNs, RL, and robotics. Let's see…
Shared by Arjun Jain
-
The Journey, The Lessons, The Becoming — A Designer’s Story PART 1: The First Spark I didn’t grow up dreaming of being an interior designer. There…
Liked by Arjun Jain
Publications
-
VRU Pose-SSD: Multiperson pose estimation for automated driving
Proceedings of the AAAI Conference on Artificial Intelligence
We present a fast and efficient approach for joint person detection and pose estimation optimized for automated driving (AD) in urban scenarios. We use a multitask weight-sharing architecture to jointly train detection and pose estimation. This modular architecture allows us to accommodate different downstream tasks in the future. Through systematic large-scale experiments on the Tsinghua-Daimler Urban Pose Dataset (TDUP), we obtain multiple models with varying accuracy-speed trade-offs. We then quantize and optimize our network for deployment and present a detailed analysis of the efficacy of the algorithm. We introduce a two-stage evaluation strategy that is better suited to AD, and we achieve a significant performance improvement over state-of-the-art approaches. Our optimized model runs at 52 fps on full HD images and still reaches a competitive log-average miss rate (LAMR) of 32.25. We are confident that our work serves as an enabler for higher-level tasks such as vulnerable road user (VRU) intention estimation and gesture recognition, which rely on stable pose estimates and will play a crucial role in future AD systems.
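The abstract reports detection quality as a log-average miss rate (LAMR). As a rough illustration of how that metric is commonly computed, here is a minimal Python sketch that assumes the Caltech pedestrian benchmark convention: miss rates are sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space between 10^-2 and 10^0 and combined with a geometric mean. The abstract does not say which variant the paper uses, and the function name log_average_miss_rate is hypothetical.

```python
import math
from bisect import bisect_right


def log_average_miss_rate(fppi, miss_rate, num_points=9, lo=1e-2, hi=1.0):
    """Hypothetical helper: LAMR from a miss-rate vs. FPPI curve.

    Assumes the Caltech-style convention (9 reference FPPI values,
    log-spaced in [lo, hi], geometric mean of the sampled miss rates).
    """
    if len(fppi) != len(miss_rate) or not fppi:
        raise ValueError("fppi and miss_rate must be non-empty and equal length")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    # Sort curve points by FPPI so we can binary-search them.
    pairs = sorted(zip(fppi, miss_rate))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    # Reference FPPI values evenly spaced in log10 space between lo and hi.
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (num_points - 1)
    refs = [10 ** (log_lo + i * step) for i in range(num_points)]

    log_sum = 0.0
    for ref in refs:
        # Take the miss rate at the largest FPPI that does not exceed ref.
        idx = bisect_right(xs, ref) - 1
        # Assumption: if the curve never reaches this FPPI, count a full miss.
        mr = ys[idx] if idx >= 0 else 1.0
        # Clamp to avoid log(0) for a perfect detector.
        log_sum += math.log(max(mr, 1e-10))

    # Geometric mean of the sampled miss rates.
    return math.exp(log_sum / num_points)


# Usage: a toy curve (made-up numbers, not results from the paper).
print(log_average_miss_rate([0.005, 0.05, 0.5], [0.8, 0.4, 0.2]))
```

The function returns a fraction; benchmark tables typically report it as a percentage, so a value of 0.3225 would be written as 32.25. The geometric mean is used so that improvements at low FPPI, where miss rates are high, are not swamped by the high-FPPI end of the curve.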
-
Multiview-consistent semi-supervised learning for 3D human pose estimation
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
The best-performing methods for 3D human pose estimation from monocular images require large amounts of in-the-wild 2D and controlled 3D pose-annotated datasets, which are costly and require sophisticated systems to acquire. To reduce this annotation dependency, we propose the Multiview-Consistent Semi-Supervised Learning (MCSS) framework, which uses the similarity in pose information from unannotated, uncalibrated but synchronized multi-view videos of human motion as an additional weak supervision signal to guide 3D human pose regression. Our framework applies hard-negative mining based on temporal relations in multi-view videos to arrive at a multi-view consistent pose embedding. When jointly trained with limited 3D pose annotations, our approach improves on the baseline by 25% and on the state of the art by 8.7%, while using substantially smaller networks. Lastly, but importantly, we demonstrate the advantages of the learned embedding and establish view-invariant pose retrieval benchmarks on two popular, publicly available multi-view human pose datasets, Human3.6M and MPI-INF-3DHP, to facilitate future research.
-
Theano: A Python framework for fast computation of mathematical expressions
arXiv e-prints
Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most widely used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models.
-
Joint training of a convolutional network and a graphical model for human pose estimation
Advances in Neural Information Processing Systems
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
Patents
-
Computer-implemented method and apparatus for tracking and reshaping a human shaped figure in a digital world video
US9191579 B2
The invention concerns a computer-implemented method for tracking and reshaping a human-shaped figure in a digital video, comprising the steps of: acquiring a body model of the figure from the digital video; adapting a shape of the body model; modifying frames of the digital video based on the adapted body model; and outputting the digital video.
-
Method and System for Triggering an Event in a Vehicle
EP3895064
The invention relates to a method for triggering an event in a vehicle using a hand gesture.
-
Method for Identifying a Hand Pose in a Vehicle
WO2020048814
Embodiments of the present disclosure relate to a method for identifying a hand pose in a vehicle and a system for performing an event in a vehicle. First, a hand image of a hand in the vehicle is extracted from a vehicle image. A plurality of contextual images of the hand image is obtained based on the single point. Each of the contextual images is then processed using one or more layers of a neural network to obtain a plurality of contextual features associated with the hand image. Finally, a hand pose associated with the hand is identified from these contextual features using a classifier model.
-
System and method for deployment of airbag based on head pose estimation
INA201911039220
An intelligent airbag deployment control system implemented in a vehicle is disclosed. An input unit receives input images of a vehicle occupant from an image sensor unit. A processing unit processes the images to determine and track head localization information based on the amplitude and depth parameters of the image. This head localization information is then used to predict the future position and orientation of the passenger's head; the prediction is made by processing the determined head localization information with a Long Short-Term Memory (LSTM) neural network architecture. The processing unit generates a control signal indicating the direction of removal of the airbag flap and the amount of pressure in the airbag during deployment.
Projects
-
Automatic recognition of advertising trademarks in sports videos
-
This project was conducted in collaboration with Sport System Europe s.r.l. (www.sportsystem.com). We developed a semi-automatic system for recognizing and annotating logos in sports videos. I was involved in the project in both scientific research and the development of the logo recognition prototype, which we implemented in C/C++ with the OpenCV library.
Recommendations received
9 people have recommended Arjun
More activity by Arjun
-
Thought does not require language. Language is an expression of thought. Intelligence requires thought more than it requires language…
Liked by Arjun Jain
-
I’m pleased to share that 6 of our papers were accepted at AAAI-26, distributed as: • 2 Bridge – to appear in the PMLR proceedings • 2 Workshop – to…
Liked by Arjun Jain