Chapter 14. Deep Computer Vision Using Convolutional Neural Networks

This is a short companion page for our internal reading group on the book “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition”. However, I have unashamedly used a lot of PyTorch examples.

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition

Chapter 14. Deep Computer Vision Using Convolutional Neural Networks: “Although IBM's Deep Blue supercomputer beat the chess world champion Garry Kasparov back in 1996, it wasn't until fairly recently that ...”

https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ch14.html

Intuition: visual cortex

Week 3 - Lecture: Convolutional neural networks

Course website: http://bit.ly/pDL-home
Playlist: http://bit.ly/pDL-YouTube
Speaker: Yann LeCun
Week 3: http://bit.ly/pDL-en-03
0:00:00 - Week 3 - Lecture
LECTURE ...

https://youtu.be/FW5gFiJb-ig?t=2616

Convolution

Convolution - Wikipedia

In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions (f and g) that produces a third function (f ∗ g) that expresses how the shape of one is modified by the other. The term convolution refers to both the result function and to the process of computing it. It is defined as the integral of the product of the two functions after one is reversed and shifted.

https://en.wikipedia.org/wiki/Convolution

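For discrete 2D images the integral becomes a sum, (I ∗ K)(i, j) = Σ_m Σ_n I(i − m, j − n) K(m, n), so the kernel is flipped before it slides over the image. Most deep learning libraries (PyTorch's Conv2d included) actually compute cross-correlation, i.e. the same sliding dot product without the flip; since the kernel is learned, the flip makes no practical difference. A minimal pure-Python sketch of both ("valid" mode, no padding); the function names are just illustrative:

```python
def cross_correlate2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n]
                for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def convolve2d(image, kernel):
    """'Valid' 2D convolution: cross-correlation with the kernel flipped along both axes."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return cross_correlate2d(image, flipped)


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
print(cross_correlate2d(image, kernel))  # [[-4, -4], [-4, -4]]
print(convolve2d(image, kernel))         # [[4, 4], [4, 4]]
```

For this antisymmetric kernel the flip only changes the sign of the result; for a symmetric kernel the two operations agree exactly.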

Filters, kernels, padding, strides, and pooling layers

An interactive visualization system designed to help non-experts learn about Convolutional Neural Networks (CNNs).

https://poloclub.github.io/cnn-explainer/

CNN Explainer
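Padding, stride and kernel size together decide the spatial size of a conv or pooling layer's output: out = floor((n + 2·padding − dilation·(kernel − 1) − 1) / stride) + 1, which is the formula documented for PyTorch's Conv2d and MaxPool2d (default floor mode). A small pure-Python sketch of that formula plus a plain max-pooling function; the helper names are my own:

```python
def conv_output_size(n, kernel_size, padding=0, stride=1, dilation=1):
    """Spatial output size of a conv or pooling layer along one axis (floor convention)."""
    effective_k = dilation * (kernel_size - 1) + 1
    return (n + 2 * padding - effective_k) // stride + 1


def max_pool2d(x, size=2, stride=2):
    """Max pooling on a 2D list, no padding: keep the largest value in each window."""
    out_h = conv_output_size(len(x), size, stride=stride)
    out_w = conv_output_size(len(x[0]), size, stride=stride)
    return [
        [
            max(x[i * stride + m][j * stride + n]
                for m in range(size) for n in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


print(conv_output_size(32, 5))                         # 28: 5x5 kernel, no padding
print(conv_output_size(28, 3, padding=1))              # 28: "same" padding keeps the size
print(conv_output_size(224, 7, padding=3, stride=2))   # 112: stride 2 halves the size
print(max_pool2d([[1, 3, 2, 4],
                  [5, 6, 7, 8],
                  [9, 2, 1, 0],
                  [3, 4, 5, 6]]))                      # [[6, 8], [9, 6]]
```

The last two cases are the classic ways to shrink a feature map: a strided convolution, or a 2x2 pooling window with stride 2.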

Putting it all together: LeNet-5

Week 3 - Lecture: Convolutional neural networks

https://youtu.be/FW5gFiJb-ig

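A LeNet-5-style network in PyTorch, following the classic layer sizes (C1, S2, C3, S4, C5, F6, output) for 32x32 grayscale input. This is a modernized sketch, not a faithful reproduction: it uses ReLU and max pooling instead of the original squashing function and trainable subsampling layers, connects C3 to all S2 maps instead of using the original partial connection table, and replaces the RBF output units with a plain linear layer.

```python
import torch
from torch import nn


class LeNet5(nn.Module):
    """LeNet-5-style CNN for 1x32x32 images (modernized: ReLU + max pooling)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),     # C1: 1x32x32 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                    # S2: -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),    # C3: -> 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                    # S4: -> 16x5x5
            nn.Conv2d(16, 120, kernel_size=5),  # C5: -> 120x1x1
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                       # -> 120
            nn.Linear(120, 84),                 # F6
            nn.ReLU(),
            nn.Linear(84, num_classes),         # class logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = LeNet5()
logits = model(torch.randn(2, 1, 32, 32))
print(logits.shape)                                 # torch.Size([2, 10])
print(sum(p.numel() for p in model.parameters()))   # 61706
```

For 28x28 MNIST images, a common tweak is padding=2 in C1 so that the rest of the shapes stay the same.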

CNN as a strong prior: locality, stationarity, compositionality

ConvNet Evolutions, Architectures, Implementation Details and Advantages.

🎙️ Yann LeCun: Inspired by Fukushima's work on visual cortex modelling, Prof. Yann LeCun combined the simple/complex cell hierarchy with supervised training and backpropagation, which led to the development of the first CNN at the University of Toronto in '88-'89. The experiments used a small dataset of 320 'mouse-written' digits.

https://atcold.github.io/pytorch-Deep-Learning/en/week03/03-2/
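The "strong prior" can be made concrete. Locality means each output only looks at a small neighbourhood; stationarity means the same weights are reused at every position, which makes the layer translation-equivariant (shift the input, and the output shifts the same way); compositionality comes from stacking such layers. A small sketch, assuming a 3x32x32 input mapped to a 16x32x32 feature map for the parameter count, and using circular padding so the equivariance check is exact at the borders:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Parameter counts for mapping a 3x32x32 input to a 16x32x32 output.
in_features = 3 * 32 * 32
out_features = 16 * 32 * 32
dense_params = in_features * out_features + out_features  # every output sees every input
conv_params = 16 * 3 * 3 * 3 + 16  # one shared 3x3 kernel per (out, in) channel pair, plus biases
print(dense_params, conv_params)   # 50348032 448

# Weight sharing => translation equivariance: conv(shift(x)) == shift(conv(x)).
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)
x = torch.randn(1, 1, 8, 8)
shift = (2, 3)
with torch.no_grad():
    out_of_shifted = conv(torch.roll(x, shifts=shift, dims=(2, 3)))
    shifted_out = torch.roll(conv(x), shifts=shift, dims=(2, 3))
print(torch.allclose(out_of_shifted, shifted_out, atol=1e-6))  # True
```

With ordinary zero padding the equality still holds away from the borders; only the outputs near the edges differ.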

Residual block and skip connections

An Overview of ResNet and its Variants

As ResNet gains more and more popularity in the research community, its architecture is getting studied heavily. In this section, I will first introduce several new architectures based on ResNet, then introduce a paper that provides an interpretation of treating ResNet as an ensemble of many smaller networks. Xie et al.

https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035

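A sketch of a ResNet-style "basic" residual block in PyTorch: the block learns a residual F(x) and outputs ReLU(F(x) + x). When the stride or channel count changes, the skip connection uses a 1x1 projection so the shapes match. Note that ResNet-50 and deeper variants use a "bottleneck" block (1x1, 3x3, 1x1 convolutions) instead; this is the simpler two-layer version.

```python
import torch
from torch import nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Residual block: out = ReLU(BN(conv(ReLU(BN(conv(x))))) + shortcut(x))."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Identity skip when shapes match, otherwise a strided 1x1 projection.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = F.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return F.relu(residual + self.shortcut(x))


x = torch.randn(2, 64, 56, 56)
same = BasicBlock(64, 64)(x)
down = BasicBlock(64, 128, stride=2)(x)
print(same.shape)  # torch.Size([2, 64, 56, 56])
print(down.shape)  # torch.Size([2, 128, 28, 28])
```

Because the identity path adds x unchanged, gradients can flow straight through the skip connection, which is what makes very deep stacks of these blocks trainable.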

Deep CNN model architectures

AlexNet

GoogLeNet

ResNet-50

Other models

torchvision.models - Torchvision 0.10.0 documentation

The models subpackage contains definitions of models for addressing different tasks, including image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection and video classification. It covers a range of image-classification architectures: you can construct a model with random weights by calling its constructor, and pre-trained models are also provided.

https://pytorch.org/vision/stable/models.html

Classification, detection, and semantic segmentation

Detection and Segmentation through ConvNets

There is a wide variety of applications of neural networks in the realm of computer vision, and with a bit of a twist, the same tools and techniques can be applied effectively across a wide range of tasks. In this article we'll walk through a few of those applications and ways to approach them.

https://towardsdatascience.com/detection-and-segmentation-through-convnets-47aa42de27ea

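One way to see the "same tools, different tasks" idea: the same convolutional features can feed a classification head (pool away the spatial grid, one score vector per image) or a semantic segmentation head (keep the grid with a 1x1 conv, one score vector per pixel). Detection heads additionally regress box coordinates. A toy sketch; the tiny backbone and `num_classes = 21` are hypothetical stand-ins:

```python
import torch
from torch import nn
import torch.nn.functional as F

num_classes = 21  # hypothetical, e.g. 20 object classes + background

# Tiny stand-in backbone: downsamples by 4 and outputs 64 channels.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)

# Classification: global average pool, then a linear layer.
cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))

# Semantic segmentation: a 1x1 conv scores every feature-map location.
seg_head = nn.Conv2d(64, num_classes, kernel_size=1)

images = torch.randn(2, 3, 64, 64)
features = backbone(images)                          # (2, 64, 16, 16)
class_logits = cls_head(features)                    # (2, 21): one prediction per image
pixel_logits = F.interpolate(seg_head(features), size=images.shape[-2:],
                             mode="bilinear", align_corners=False)  # (2, 21, 64, 64)
print(class_logits.shape, pixel_logits.shape)
```

Real segmentation models use smarter upsampling (skip connections, dilated convolutions) than a single bilinear resize, but the output layout is the same.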

IoU and NMS

Non-maximum Suppression (NMS)

A typical object detection pipeline has one component for generating proposals for classification. Proposals are simply candidate regions for the object of interest. Most approaches employ a sliding window over the feature map and assign foreground/background scores depending on the features computed in that window.

https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c

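IoU (intersection over union) measures how much two boxes overlap, and greedy NMS uses it to deduplicate detections: keep the highest-scoring box, drop every remaining box whose IoU with it exceeds a threshold, and repeat. A plain-Python sketch with boxes as (x1, y1, x2, y2); the 0.5 threshold is just a common default. In practice `torchvision.ops.nms` and `torchvision.ops.box_iou` do the same thing on tensors.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))  # 81 / 119, about 0.68
print(nms(boxes, scores))       # [0, 2]: box 1 is a near-duplicate of box 0
```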

R-CNN, Feature Pyramid Network (FPN), Region Proposal Networks (RPN)

R-CNN, Fast R-CNN, Faster R-CNN, YOLO - Object Detection Algorithms

Computer vision is an interdisciplinary field that has been gaining huge amounts of traction in recent years (since CNNs), and self-driving cars have taken centre stage. Another integral part of computer vision is object detection. Object detection aids in pose estimation, vehicle detection, surveillance, etc.

https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e

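The Region Proposal Network in Faster R-CNN slides over the backbone feature map and, at every location, scores a fixed set of reference boxes ("anchors") for objectness and predicts offsets to refine them. The Faster R-CNN paper uses 3 scales × 3 aspect ratios = 9 anchors per location; FPN instead assigns a single anchor scale to each pyramid level. A pure-Python sketch of anchor generation; the function name and default values are illustrative:

```python
import math


def generate_anchors(feature_h, feature_w, stride,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred on every feature-map cell.

    Each anchor has area scale**2 and aspect ratio h / w == ratio.
    """
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Centre of this cell in input-image pixel coordinates.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feature_h=2, feature_w=3, stride=16)
print(len(anchors))  # 2 * 3 cells * 9 anchors = 54
```

The RPN's job is then a per-anchor binary classification plus box regression; the surviving proposals go through NMS before the second stage.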

Mask R-CNN

GitHub - matterport/Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone. The repository includes: Source code of Mask R-CNN built on FPN and ResNet101.

https://github.com/matterport/Mask_RCNN


Towards real-time, embedded: NAS and MobileNets

MobileNets: Open-Source Models for Efficient On-Device Vision

Deep learning has fueled tremendous progress in the field of computer vision in recent years, with neural networks repeatedly pushing the frontier of visual recognition technology.

https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html

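MobileNets get their efficiency from depthwise separable convolutions: a 3x3 depthwise convolution (one filter per input channel, `groups=in_channels` in PyTorch) followed by a 1x1 pointwise convolution that mixes channels. A sketch comparing the weight count against a standard 3x3 convolution with the same input and output channels; the helper names are my own:

```python
import torch
from torch import nn


def depthwise_separable(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """MobileNet-style block: 3x3 depthwise conv, then 1x1 pointwise conv."""
    return nn.Sequential(
        # groups=in_ch: each input channel gets its own 3x3 filter (depthwise).
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # 1x1 conv combines information across channels (pointwise).
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


def count_conv_weights(module: nn.Module) -> int:
    """Number of convolution weights (ignores BatchNorm parameters)."""
    return sum(m.weight.numel() for m in module.modules() if isinstance(m, nn.Conv2d))


standard = nn.Conv2d(32, 64, 3, padding=1, bias=False)
separable = depthwise_separable(32, 64)
print(count_conv_weights(standard))   # 64 * 32 * 3 * 3 = 18432
print(count_conv_weights(separable))  # 32 * 3 * 3 + 64 * 32 = 2336
out = separable(torch.randn(1, 32, 56, 56))
print(out.shape)                      # torch.Size([1, 64, 56, 56])
```

The output shape is the same as the standard convolution's, at roughly an eighth of the weights (and a similar reduction in multiply-adds) for this channel configuration.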

Towards low prior: vision without CNN

Transformers for Image Recognition at Scale

While convolutional neural networks (CNNs) have been used in computer vision since the 1980s, they were not at the forefront until 2012 when AlexNet surpassed the performance of contemporary state-of-the-art image recognition methods by a large margin.

https://ai.googleblog.com/2020/12/transformers-for-image-recognition-at.html

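The Vision Transformer drops almost all of the convolutional prior: it cuts the image into fixed-size patches, linearly projects each patch to a token, and lets a standard Transformer encoder (after prepending a learnable class token and adding position embeddings) figure out the spatial structure from data. A sketch of just the patch-embedding step, assuming ViT-Base/16 sizes (16x16 patches, 768-dim tokens):

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to a token.

    A conv with kernel_size == stride == patch_size is equivalent to flattening
    each patch and applying one shared linear layer.
    """

    def __init__(self, patch_size: int = 16, in_channels: int = 3, dim: int = 768):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                     # (B, dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)


tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768]): a 14 x 14 grid of 16 x 16 patches
```

The only image-specific prior left is the patch grid itself, which is part of why ViT needs much more training data than a CNN to reach comparable accuracy.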

Towards low data: contrastive representation learning, zero-shot or few-shot

https://lilianweng.github.io/lil-log/2021/05/31/contrastive-representation-learning.html

https://github.com/facebookresearch/moco

https://lightning-bolts.readthedocs.io/en/latest/self_supervised_models.html
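The common core of these contrastive methods is an InfoNCE-style loss: embeddings of two views of the same image (a positive pair) should be more similar to each other than to the other images in the batch (negatives). The sketch below is a simplified, one-directional version with in-batch negatives; SimCLR's NT-Xent also treats the other view of each image as a negative, and MoCo draws negatives from a queue of past keys. The temperature of 0.1 is an assumption.

```python
import torch
import torch.nn.functional as F


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss for positive pairs (z1[i], z2[i]); other z2[j] act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (N, N) cosine similarities / temperature
    targets = torch.arange(z1.size(0))   # the positive for row i is column i
    return F.cross_entropy(logits, targets)


torch.manual_seed(0)
z = torch.randn(8, 128)
aligned = info_nce(z, z + 0.01 * torch.randn(8, 128))       # positives nearly identical
mismatched = info_nce(z, torch.roll(z, shifts=1, dims=0))   # every "positive" is wrong
print(aligned.item() < mismatched.item())  # True
```

Framed this way, the loss is just cross-entropy over "which item in the batch is my partner?", which is why large batches (more negatives) tend to help.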

CLIP: Connecting Text and Images

We're introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the "zero-shot" capabilities of GPT-2 and GPT-3.

https://openai.com/blog/clip/

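Zero-shot classification with CLIP boils down to a similarity lookup: embed the image, embed one text prompt per class (e.g. "a photo of a {label}."), and softmax the scaled cosine similarities. The sketch below assumes the embeddings are already computed; the random tensors stand in for real encoder outputs, and the scale of 100 follows the example in OpenAI's CLIP repository.

```python
import torch
import torch.nn.functional as F


def zero_shot_classify(image_emb: torch.Tensor, text_embs: torch.Tensor,
                       logit_scale: float = 100.0) -> torch.Tensor:
    """Class probabilities from CLIP-style embeddings.

    image_emb: (D,) embedding of one image.
    text_embs: (C, D) embeddings of one prompt per class.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    logits = logit_scale * text_embs @ image_emb  # (C,) scaled cosine similarities
    return logits.softmax(dim=-1)


torch.manual_seed(0)
labels = ["cat", "dog", "car"]
text_embs = torch.randn(3, 512)                         # hypothetical text-encoder outputs
image_emb = text_embs[1] + 0.1 * torch.randn(512)       # hypothetical image close to "dog"
probs = zero_shot_classify(image_emb, text_embs)
print(labels[probs.argmax().item()])  # dog
```

No classifier is trained: changing the label list (and therefore the prompts) changes the task.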

Towards explainability

Feature Visualization

There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability has formed in response to these concerns. As it matures, two major threads of research have begun to coalesce: feature visualization and attribution. This article focuses on feature visualization.

https://distill.pub/2017/feature-visualization/

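The basic feature-visualization recipe is optimization in input space: freeze the network, start from noise, and run gradient ascent on the image to maximize a chosen unit's activation. The sketch below uses a tiny untrained stand-in network and a simple L2 penalty as the regularizer; the Distill article discusses much stronger regularizers (such as transformation robustness and frequency penalization) that are needed to get interpretable images from real models. The channel index, learning rate and penalty weight are arbitrary assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in network; in practice this would be a trained model in eval mode.
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.Tanh(),
    nn.Conv2d(8, 4, 3, padding=1),
)
for p in net.parameters():
    p.requires_grad_(False)  # only the input image is optimized

channel = 2  # the unit (here a whole channel) to visualize
image = (0.01 * torch.randn(1, 3, 32, 32)).requires_grad_()
optimizer = torch.optim.Adam([image], lr=0.02)


def objective(x: torch.Tensor) -> torch.Tensor:
    # Mean activation of the chosen channel, minus a small L2 penalty on the image.
    return net(x)[0, channel].mean() - 1e-2 * x.pow(2).mean()


start = objective(image).item()
for _ in range(100):
    optimizer.zero_grad()
    (-objective(image)).backward()  # gradient ascent = descent on the negative
    optimizer.step()
end = objective(image).item()
print(start, end)  # the objective should have increased
```

The same loop visualizes a single neuron, a whole layer or a class logit; only the objective changes.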

Multimodal Neurons in Artificial Neural Networks

We are deeply grateful to Sandhini Agarwal, Daniela Amodei, Dario Amodei, Tom Brown, Jeff Clune, Steve Dowling, Gretchen Krueger, Brice Menard, Reiichiro Nakano, Aditya Ramesh, Pranav Shyam, Ilya Sutskever and Martin Wattenberg. Gabriel Goh: Research lead.

https://distill.pub/2021/multimodal-neurons/

📧

jiayu@hey.com

July 27, 2021