A detailed breakdown of a new paper from Google Research and Johns Hopkins University — DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.

Photo by Markus Spiske on Unsplash

Extensive research in Computer Vision is devoted to finding novel techniques, algorithms, and end-to-end trainable pipelines for Object Detection and Image Segmentation tasks.

Year after year, research institutes and organizations come up with new ideas to tackle the open problems of these tasks with robust, real-time solutions.

Given that premise, 2020 is shaping up to be one of the most exciting years for Computer Vision and Deep Learning.

In June 2020, the Google Research team, together with Johns Hopkins University, came up with an exciting new architectural design of Feature Pyramid Networks and modulation…

Understanding a new paradigm of the depthwise convolution operation developed by the Google Research team

Photo by Devon Janse van Rensburg on Unsplash

Convolutional Neural Networks are complex computational models: the deeper the model, the higher the complexity. Because of this unfortunate property, it is nontrivial to use these models for real-time purposes.

The concept of depthwise separable convolutional kernels, first introduced by Google in the paper Xception: Deep Learning with Depthwise Separable Convolutions, helped accelerate the convolution operation. It proved to be one of the crucial factors in making modern ConvNets efficient and deploying them onto low-compute edge devices in real time.
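To see why the factorization is faster, it helps to compare parameter counts of a standard convolution against a depthwise + pointwise pair. A small sketch (the function names are illustrative, not from any library):

```python
# Parameter-count comparison between a standard convolution and a
# depthwise separable convolution (depthwise + 1x1 pointwise),
# the factorization popularized by the Xception paper.

def standard_conv_params(c_in, c_out, k):
    # One k x k kernel per (input channel, output channel) pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k kernel per input channel.
    # Pointwise: a 1x1 convolution that mixes channels.
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For a 3x3 convolution from 128 to 256 channels, the separable version needs roughly 8–9x fewer parameters, and the multiply-add count shrinks by about the same factor.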

Recently, Google released a new paradigm of Depthwise Convolutional Kernels in the paper MixConv: Mixed…
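The core MixConv idea is to partition the channels into groups and run a depthwise convolution with a different kernel size on each group. A parameter-count sketch of that partitioning (the even splitting scheme and the kernel sizes below are illustrative assumptions, not the paper's exact configuration):

```python
# Sketch of the MixConv channel partitioning: split the channels into
# groups and assign each group its own depthwise kernel size.

def split_channels(c_total, n_groups):
    # Even split; the first groups absorb any remainder.
    base = c_total // n_groups
    sizes = [base] * n_groups
    for i in range(c_total - base * n_groups):
        sizes[i] += 1
    return sizes

def mixconv_params(c_total, kernel_sizes):
    # Depthwise parameters summed over groups, one kernel size per group.
    groups = split_channels(c_total, len(kernel_sizes))
    return sum(c * k * k for c, k in zip(groups, kernel_sizes))
```

For example, 12 channels split over kernel sizes {3, 5, 7} cost 4·9 + 4·25 + 4·49 = 332 depthwise parameters, letting one layer mix small and large receptive fields.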

Understanding the core architecture of RegNet from Facebook AI

Photo by Nadine Shaabana on Unsplash

Getting to know RegNet from Facebook AI, the new 2020 successor to ResNet/ResNeXt.

This article will mainly focus on the architectural design of RegNet described in the paper Designing Network Design Spaces¹.

After finishing this blog, you will know the core skeleton of the RegNet architecture and its two network families, RegNetX and RegNetY.


Step 1 — Generic ResNet Architecture

Step 2 — Making of AnyNet Population Models

Step 3 — Making of RegNet Models: RegNetX and RegNetY

Step 4 — PyTorch Implementation of RegNetX/RegNetY Models

Step 1 — Generic ResNet Architecture

Let’s quickly refresh the general structure of ResNet. This will help us…
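The defining ingredient of every ResNet block is the skip connection, y = relu(x + F(x)). A toy numpy sketch, with small dense layers standing in for the real conv-bn-relu transform (this is an illustration of the residual idea, not the actual ResNet implementation):

```python
import numpy as np

# Toy residual unit: y = relu(x + F(x)).
# F is a two-layer transform standing in for the convolutions.

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # F(x): two linear maps with a ReLU in between.
    f = relu(x @ w1) @ w2
    # The skip connection adds the input back before the final ReLU,
    # so the block only has to learn a residual correction.
    return relu(x + f)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    w1 = rng.standard_normal((8, 8)) * 0.1
    w2 = rng.standard_normal((8, 8)) * 0.1
    y = residual_block(x, w1, w2)
    print(y.shape)  # the skip connection preserves the shape
```

Note that if F collapses to zero, the block reduces to relu(x): this near-identity behavior is what makes very deep stacks of such blocks trainable.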

Understanding one of the interesting attention mechanisms in convolutional neural networks.

In this article, we will quickly go through two papers, viz. Bottleneck Attention Module (BAM)¹ and Convolutional Block Attention Module (CBAM)².

Recently, many SOTA networks have leveraged these attention mechanisms and significantly improved their real-time results.

Their lightweight design and straightforward implementation make it easy to incorporate these modules directly into the feature-extraction part of convolutional neural networks.

So, let’s get started by paying some ATTENTION to these topics.

PS: Stay tuned for exciting new articles like this in the future by following the VisionWizard page.

1. Attention Module: What is it?

  • Attention modules are used to make a CNN learn and focus more on the important information…
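As a concrete illustration, here is a channel-attention gate in the spirit of the channel branches of BAM and CBAM: global-average-pool each channel, pass the result through a small bottleneck MLP, squash to (0, 1), and rescale the feature map channel-wise. This is a squeeze-and-excitation-style sketch with placeholder weights, not the exact modules from the papers:

```python
import numpy as np

# Channel-attention sketch: learn a per-channel gate from the
# globally pooled feature map and use it to rescale the channels.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    # x: feature map of shape (C, H, W)
    pooled = x.mean(axis=(1, 2))           # (C,)   global average pool
    hidden = np.maximum(pooled @ w1, 0.0)  # (C/r,) bottleneck + ReLU
    gate = sigmoid(hidden @ w2)            # (C,)   per-channel weight in (0, 1)
    return x * gate[:, None, None]         # broadcast rescale over H, W
```

The bottleneck ratio r keeps the module lightweight, which is why such gates can be dropped into a backbone with almost no extra cost.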

A place to share quality research ideas in AI for everyone

Vision Wizard

“Life can only be understood backwards; but it must be lived forwards.”
Søren Kierkegaard

Hello from VisionWizard. We are a team of engineers and researchers working in AI. We started VisionWizard during one of the worst adversities in the history of mankind, the Covid-19 pandemic, with a basic need to bridge the gap between research and development by simplifying the explanations of research papers in the field of Computer Vision and Deep Learning.

We ourselves work every day on complex computer vision and deep learning problems as part of our jobs. We know firsthand how much time is wasted…

An Introductory Guide on the Fundamentals and Algorithmic Flows of YOLOv4 Object Detector

Source: Photo by Joanna Kosinska on Unsplash

Welcome to the final part of YOLOv4¹ mini-series.

YOLOv4 — Version 0: Introduction

YOLOv4 — Version 1: Bag of Freebies

YOLOv4 — Version 2: Bag of Specials

YOLOv4 — Version 3: Proposed Workflow

YOLOv4 — Version 4: Final Verdict

I hope we were able to give a thorough walkthrough of all the nuts and bolts of this amazing research.

This article's main focus is on analytical results rather than informative explanations. One last ride; let's begin the finale.

This article presents the analytical comparisons between YOLOv4 and other object detectors.

1. Finalizing Bag of Freebies attributes

  • As discussed in the introduction of this series…

An Introductory Guide on the Fundamentals and Algorithmic Flows of YOLOv4 Object Detector


Welcome to the mini-series on YOLOv4. This article will be addressing all the components authors have presented in the part Bag of Specials. So, breathe in, breathe out, and enjoy learning.


This article will specifically address all the questions about the different methods present in this Bag of Specials.

By the end, you will have a solid understanding of how these methods work and the advantages they bring.


An Introductory Guide on the Fundamentals and Algorithmic Flows of YOLOv4 Object Detector


First introduced in 2015, YOLO quickly rose to fame as one of the fastest dense object detectors, with surprisingly fast inference speed and decent results. Until last year, it remained the king of one-stage object detectors.

This year, it has established itself as the boss of one-shot object detectors. Yes, YOLOv4¹ has arrived, with the most interesting upgrades, proving to be among the best state-of-the-art detectors in terms of accuracy and inference speed.

Revised and novel algorithms/modules have been added in this version, and so we have decided to break down this research into five explanatory parts. …

A detailed technical breakdown of the interesting CenterNet: Objects as Points research

Original Source: Photo by Jacek Dylag on Unsplash

CenterNet: Objects as Points¹ is an interesting piece of research that shakes up one of the hot topics, anchor-free object detection. We will do an exhaustive study of this paper in a concise, modularized form, so that it will be easy for you to relate this article to the original research paper.

Also, I strongly urge you to go through the basics of Focal Loss before reading this blog, as it will help you clearly follow the algorithmic flow of the loss calculation used in the proposed method. You can get a detailed explanation of Focal Loss from this link.
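To give a flavor of what makes the method anchor-free: the paper reads detections directly off a per-class center heatmap by keeping local maxima (it does this with a 3x3 max-pooling trick). A pure-numpy sketch of that peak extraction; the threshold value here is an illustrative assumption:

```python
import numpy as np

# Extract local maxima ("center peaks") from a heatmap: keep only the
# locations whose value equals the maximum of their 3x3 neighborhood
# and exceeds a score threshold.

def local_peaks(heatmap, threshold=0.3):
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # Maximum over the 3x3 neighborhood at every location,
    # built from the nine shifted views of the padded map.
    neigh = np.stack([
        padded[di:di + h, dj:dj + w]
        for di in range(3) for dj in range(3)
    ]).max(axis=0)
    keep = (heatmap >= neigh) & (heatmap > threshold)
    ys, xs = np.nonzero(keep)
    return list(zip(ys.tolist(), xs.tolist()))
```

Because each peak directly is an object center, no anchor boxes and no NMS over thousands of candidates are needed; the paper then regresses box size and a sub-pixel offset at each kept location.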


Focal Loss implications on solving the class imbalance problem

Photo by Paul Skorupskas on Unsplash

Yes, you might already have an idea of what I will be discussing in this blog ;p. But before getting to the main topic, I want to cover a few prerequisite points.

  1. In the case of object detection
  • Positive examples: target-class or foreground information, such as anchors matched to ground truths.
  • Negative examples: non-target-class or background information, such as anchors whose IoU with every ground truth is below a given threshold.
  • Easy positives/negatives: samples correctly classified as positive/negative, which the model handles with high confidence.
  • Hard positives/negatives: samples misclassified as negative/positive, i.e. the examples the model struggles with.

2. Class Imbalance Problem

  • This is observed when information related to one class in a dataset or…
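Focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), targets exactly this imbalance: the modulating factor (1 - p_t)^gamma shrinks the loss of easy examples so the abundant easy negatives cannot dominate training. A minimal binary-case sketch in numpy:

```python
import numpy as np

# Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
# With gamma = 0 and alpha = 1 it reduces to plain cross-entropy.

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    # p: predicted probability of the positive class, y: 0/1 labels
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)           # prob of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return (-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)).mean()
```

With the paper's defaults (gamma = 2, alpha = 0.25), an easy positive at p = 0.9 contributes orders of magnitude less loss than a hard positive at p = 0.1, which is precisely the reweighting that tames the class imbalance.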

Shreejal Trivedi

Deep Learning || Computer Vision || AI || Editor — VisionWizard
