Capsule Networks for 3D Data


Stanford University        Technical University of Munich        Google


We introduce 3D Capsule Networks as a generic framework for capturing extrinsic and intrinsic object properties such as object orientation or part labels. Our goal is to learn robust, flexible and generalizable 3D object representations without heavy annotation effort or supervision. Capsule networks let us go beyond conventional 3D generative models by constructing a structured latent space in which certain factors of shape variation, such as object parts, are disentangled into independent sub-spaces. These embeddings can be explicit, e.g. quaternions representing part poses and enabling equivariant network architectures, or purely implicit. While the learned representations can be used directly in typical tasks like classification or pose estimation, they also support generation: our novel decoders act on the latent capsules to reconstruct 3D points in a self-supervised manner. We currently present two approaches: one for (un)supervised, part-based 3D shape processing across rigid / non-rigid shapes and cross-/within-category settings [CVPR'19, IJCV'22], and another for classification and pose estimation [ECCV'20]. Thanks to the use of capsule networks, our methods enjoy added structure on the latent space, which enables applications such as part pose estimation, part interpolation or replacement. Please refer to the papers and videos for further information.
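To make the idea of a structured latent space concrete, here is a minimal sketch (not the actual network, just an illustration with assumed shapes) of a "latent capsule" that pairs an explicit part pose, stored as a unit quaternion, with an implicit feature vector, and of how such a pose acts on a part's point cloud:

```python
import numpy as np

def quat_rotate(q, p):
    """Rotate 3D points p, shape (N, 3), by a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # Standard rotation matrix of a unit quaternion.
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    return p @ R.T

# A toy latent capsule for one object part: an explicit pose (unit quaternion)
# plus an implicit 64-d feature vector. The dimensions are illustrative only.
rng = np.random.default_rng(0)
capsule = {
    "pose": np.array([np.cos(np.pi / 8), 0.0, 0.0, np.sin(np.pi / 8)]),  # 45 deg about z
    "feature": rng.standard_normal(64),
}

part_points = rng.standard_normal((128, 3))
posed_part = quat_rotate(capsule["pose"], part_points)

# Sanity check: a rotation preserves distances to the origin.
assert np.allclose(np.linalg.norm(posed_part, axis=1),
                   np.linalg.norm(part_points, axis=1))
```

Because each capsule carries its own pose, rotating an object only transforms the explicit quaternion parts of the latent code while the implicit features can stay invariant, which is the structural property the equivariant architecture exploits.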

ECCV'20 Oral Presentation on Equivariant Networks


CVPR'19 Tutorial on 3D Point Capsule Networks


Citation

@inproceedings{zhao20193d,
author={Zhao, Yongheng and Birdal, Tolga and Deng, Haowen and Tombari, Federico},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
title={3D Point Capsule Networks},
organization={IEEE/CVF},
year={2019}
}

@inproceedings{zhao2020quaternion,
title={Quaternion Equivariant Capsule Networks for 3D Point Clouds},
author={Zhao, Yongheng and Birdal, Tolga and Lenssen, Jan Eric and Menegatti, Emanuele and Guibas, Leonidas and Tombari, Federico},
booktitle={European Conference on Computer Vision (ECCV)},
pages={1--19},
year={2020},
organization={Springer}
}

@article{zhao20223dpointcapspp,
title={3DPointCaps++: Learning 3D Representations with Capsule Networks},
author={Zhao, Yongheng and Fang, Guangchi and Guo, Yulan and Guibas, Leonidas and Tombari, Federico and Birdal, Tolga},
journal={International Journal of Computer Vision (IJCV)},
year={2022},
publisher={Springer}
}

Funding

This joint effort is supported by Stanford-Ford Alliance, NSF grant IIS-1763268, Vannevar Bush Faculty Fellowship, Samsung GRO program, the Stanford SAIL Toyota Research, and the PRIME programme of the German Academic Exchange Service (DAAD) with funds from the German Federal Ministry of Education and Research (BMBF).

Interested in Collaborating with Us?

We would like this project to evolve into a repository of methods addressing shape representations for 3D computer vision. We are therefore looking for contributors and collaborators with strong coding and mathematical skills and a good knowledge of 3D vision and machine (deep) learning. If you are interested, please send an e-mail to Tolga: tbirdal@imperial.ac.uk.