Detecting breathing rate of a person from video

A technique to estimate the respiratory rate of a person in real time using just the video and some OpenCV MathMagic! We apply the concepts of the Fourier transform, Laplacian pyramids and multithreading, and see how C++ beats Python when it comes to speed.


Breathing rate is a very important measure to monitor, and doctors usually evaluate it by counting the number of times a patient's chest inflates in a given period of time. This is not always accurate, and when a patient needs constant monitoring of their respiratory rate, counting it manually becomes a tedious task.

In this project, I present a technique to evaluate the respiratory rate of a person who is sleeping in a video. The person is assumed to be otherwise still, so all movement in the video is caused by their breathing. We find the respiratory rate by analysing these subtle, respiration-induced movements. The technique is heavily inspired by this research work on video motion magnification.


To measure the respiratory rate, a human observer would watch a particular part of the person's body and count the number of times it moves up and down in a given time interval. If we focus on just one pixel, say near the boundary of the subject's stomach, and plot its intensity against time, we see a regular oscillation in intensity. This oscillation is in sync with the breathing, so the respiratory rate is simply the frequency of the oscillation. We can find it by applying a Fourier transform to the time series of intensities and selecting the most dominant frequency.
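As a minimal sketch of this idea, assuming the pixel's intensities have already been collected into a 1-D array and the frame rate is known, the hypothetical helper dominant_frequency below uses NumPy's real-input FFT and returns the strongest non-zero frequency. The test signal is synthetic (a 0.5 Hz sine plus noise at an assumed 30 fps), not data from the video.

```python
import numpy as np

def dominant_frequency(intensity, fps):
    """Return the strongest non-zero frequency (Hz) in a 1-D intensity series."""
    x = np.asarray(intensity, dtype=float)
    x = x - x.mean()                         # drop the average brightness (DC term)
    spectrum = np.abs(np.fft.rfft(x))        # magnitudes for 0 .. fps/2 Hz
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    k = int(np.argmax(spectrum[1:])) + 1     # skip bin 0 (DC)
    return float(freqs[k])

# Synthetic pixel: a 0.5 Hz "breathing" oscillation plus noise, 10 s at 30 fps
fps = 30.0
t = np.arange(300) / fps
rng = np.random.default_rng(0)
pixel = 120 + 5 * np.sin(2 * np.pi * 0.5 * t) + rng.normal(0, 1, t.size)

f = dominant_frequency(pixel, fps)
print(f"{f:.2f} Hz = {f * 60:.0f} breaths per minute")
```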

Above is a frame taken from a video of a sleeping baby. If we carefully watch the baby's stomach in the video, we see very subtle movements as the baby inhales and exhales. The movements occur at a regular pace, causing uniform variations in pixel intensities. Let us plot the intensity of a pixel near the stomach against time.

We can clearly see how the intensity changes in a regular manner, much like a sine wave. This can be treated as a signal which, in general, is composed of many sine and cosine waves plus noise. Representing a periodic signal as a sum of such waves is called a Fourier series; for a sampled signal like ours, the (discrete) Fourier transform decomposes it into its sine-wave components and gives the strength of each frequency.
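The decomposition is easy to see on a synthetic signal. This sketch (not taken from the original implementation) adds two sine waves of known frequency and amplitude and recovers both from NumPy's FFT; the 30 fps rate, the 10 s duration and the component values are arbitrary assumptions.

```python
import numpy as np

fps, n = 30.0, 300                       # assumed: 10 s of samples at 30 fps
t = np.arange(n) / fps
# A signal built from two sine waves: 0.5 Hz (amplitude 5) and 1.5 Hz (amplitude 2)
signal = 5 * np.sin(2 * np.pi * 0.5 * t) + 2 * np.sin(2 * np.pi * 1.5 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=1.0 / fps)
# Scale so a pure sine of amplitude A reads as A (valid for bins other than 0 and n/2)
amplitudes = 2 * np.abs(spectrum) / n

# Report every component stronger than a small threshold
for f, a in zip(freqs, amplitudes):
    if a > 0.5:
        print(f"{f:.2f} Hz, amplitude {a:.2f}")
```

Both components complete a whole number of cycles in the 10 s window, so each lands exactly on one FFT bin; in real footage the energy spreads into neighbouring bins (spectral leakage), which is why the spectrum of a real pixel looks less clean.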

The above plot is the frequency spectrum obtained by applying the Fourier transform to the intensity signal of the pixel considered earlier. The x-axis represents frequency in Hz, while the y-axis represents the amplitude, or strength, of the corresponding frequency. The spectrum shows clearly that 0.5 Hz is the dominant frequency, which corresponds to 30 breaths per minute (0.5 × 60). As it turns out, this is almost equal to the breathing rate we obtain by manually counting breaths in the video!
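Two small calculations sit behind reading such a plot: converting the peak from Hz to breaths per minute, and the spacing between FFT bins, which equals the frame rate divided by the number of sampled frames. The helpers below are hypothetical, and the 30 fps rate is an assumption. With N between 100 and 200 consecutive frames at 30 fps, adjacent bins are 18 to 9 breaths per minute apart, which is worth keeping in mind when choosing N.

```python
def hz_to_bpm(freq_hz):
    """Convert a frequency in Hz to breaths (cycles) per minute."""
    return 60.0 * freq_hz

def bin_spacing_bpm(fps, n_frames):
    """Spacing between adjacent FFT bins, expressed in breaths per minute."""
    return hz_to_bpm(fps / n_frames)

print(hz_to_bpm(0.5))  # 30.0
for n in (100, 200, 300):
    print(n, "frames at 30 fps:", bin_spacing_bpm(30.0, n), "breaths/min between bins")
```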

Pipeline Stages

The whole system for calculating breathing rate consists of several processing stages, in which the output of one stage becomes the input of the next. This is called a pipeline, and representing the system this way helps modularise the implementation. Each stage is described below.

Sampling: Sample N frames from the video feed. N is chosen so that the sample spans enough oscillation cycles to deduce their frequency accurately. In our tests, N was between 100 and 200.

Region of Interest: Extract bounding rectangles around the regions where the oscillation is most prominent, most likely the stomach or chest, where movement is largest. This reduces both computation time and noise in the calculations.

Pre-processing: Convert each frame to grayscale, build its Laplacian pyramid and select the highest (coarsest) level image. This small, smoothed image contains less noise than the full frame and far fewer pixels to process.

Frequency Extraction: For each pixel, apply the Fast Fourier Transform (FFT) to its intensity over the sampled frames to extract the frequencies at which its intensity oscillates.

Band Pass: Filter out frequencies that cannot correspond to a human breathing rate. Only frequencies between 0.2 Hz and 0.8 Hz (12 to 48 breaths per minute) are treated as valid.

Evaluating Breathing Rate: For each pixel, find its most dominant frequency. Of these per-pixel frequencies, choose the one with the largest amplitude; this is taken as the respiratory frequency.
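The frequency extraction, band pass and evaluation stages can be sketched in a few lines of NumPy. This is a minimal sketch, not the author's implementation: it assumes the frames have already been sampled, cropped and pre-processed into an array of shape (N, H, W) and that the frame rate is known. The function name breathing_rate and its parameters are hypothetical; the 0.2 to 0.8 Hz band is the one given above.

```python
import numpy as np

def breathing_rate(frames, fps, low_hz=0.2, high_hz=0.8):
    """Estimate breathing rate (breaths/min) from a stack of grayscale frames.

    frames: array of shape (N, H, W), one pre-processed frame per time step.
    Returns (rate_bpm, (row, col)) for the pixel whose in-band peak is strongest.
    """
    x = np.asarray(frames, dtype=float)
    x = x - x.mean(axis=0)                       # remove each pixel's mean intensity

    # Frequency extraction: FFT of every pixel's time series at once (axis 0 = time)
    spectrum = np.abs(np.fft.rfft(x, axis=0))    # shape (N // 2 + 1, H, W)
    freqs = np.fft.rfftfreq(x.shape[0], d=1.0 / fps)

    # Band pass: keep only frequencies plausible for human breathing
    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any():
        raise ValueError("no FFT bin inside the breathing band; sample more frames")
    band_freqs, band_spec = freqs[band], spectrum[band]

    # Per pixel: dominant in-band frequency (as a bin index) and its amplitude
    peak_idx = band_spec.argmax(axis=0)          # shape (H, W)
    peak_amp = band_spec.max(axis=0)             # shape (H, W)

    # Across pixels: pick the pixel whose dominant frequency is strongest
    row, col = np.unravel_index(peak_amp.argmax(), peak_amp.shape)
    rate = 60.0 * float(band_freqs[peak_idx[row, col]])
    return rate, (int(row), int(col))

# Synthetic 8x8 clip: noise everywhere, one pixel "breathing" at 0.5 Hz
fps = 30.0
t = np.arange(300) / fps
rng = np.random.default_rng(1)
frames = 100 + rng.normal(0, 0.5, (300, 8, 8))
frames[:, 3, 4] += 4 * np.sin(2 * np.pi * 0.5 * t)

rate, pixel = breathing_rate(frames, fps)
print(f"{rate:.1f} breaths/min at pixel {pixel}")
```

Computing the FFT along axis 0 transforms every pixel's time series in one call, and the final argmax applies the selection rule described above: the pixel whose in-band peak is strongest wins.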

Further Improvements and Optimisations

Detection of Region of Interest: A rough estimate of the region (the area around the stomach or lungs) can be entered manually through a user interface. We could also use AI to detect the relevant regions of the human body automatically.
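However the rectangle is obtained, whether drawn by hand (OpenCV's cv2.selectROI, for example, returns it as (x, y, w, h)) or produced by a detector, applying it to the sampled frames is plain array slicing. crop_to_roi is a hypothetical helper and the frame size and rectangle below are made up for illustration.

```python
import numpy as np

def crop_to_roi(frames, roi):
    """Crop every frame in an (N, H, W) stack to roi = (x, y, w, h) in pixels."""
    x, y, w, h = roi
    if w <= 0 or h <= 0:
        raise ValueError("empty region of interest")
    # Rows are indexed by y, columns by x; slicing returns a view, not a copy
    return frames[:, y:y + h, x:x + w]

# Hypothetical example: a 40x30 box around the stomach in 160x120 frames
frames = np.zeros((150, 120, 160), dtype=np.uint8)
roi_frames = crop_to_roi(frames, (50, 40, 40, 30))
print(roi_frames.shape)  # (150, 30, 40)
```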

Selection of Oscillating Pixels: For each pixel in the frame, we find its most dominant frequency. The challenge is to decide which pixel's frequency corresponds to the actual breathing rate. Currently, I choose the pixel whose dominant frequency has the highest amplitude. We could shortlist candidates by considering only pixels that lie on edges or in high-contrast regions.
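One way to try the shortlisting idea, sketched here as an assumption rather than a tested improvement, is to score each pixel by the gradient magnitude of the time-averaged frame and keep only the strongest fraction. high_contrast_mask and keep_fraction are hypothetical names.

```python
import numpy as np

def high_contrast_mask(frames, keep_fraction=0.1):
    """Boolean (H, W) mask of the pixels with the strongest spatial gradient.

    Uses the time-averaged frame, so the mask marks edges present throughout
    the clip (for example the outline of the chest or stomach).
    """
    mean_frame = np.asarray(frames, dtype=float).mean(axis=0)
    gy, gx = np.gradient(mean_frame)          # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    threshold = np.quantile(magnitude, 1.0 - keep_fraction)
    return magnitude >= threshold

# A flat clip with one vertical edge between columns 3 and 4
frames = np.full((100, 8, 8), 50.0)
frames[:, :, 4:] = 150.0
mask = high_contrast_mask(frames, keep_fraction=0.25)
print(mask.sum())  # 16 (columns 3 and 4 in all 8 rows)
```

To use it, set the per-pixel peak amplitudes outside the mask to zero before choosing the maximum, so only edge pixels can win.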

Speed of Calculations: The per-pixel calculations in this implementation are independent of each other, so they can be parallelised easily and efficiently. Tools such as OpenMP and CUDA can be used to speed up the overall process.
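Because each pixel's FFT is independent, the frame can be split into strips that are processed separately and then reassembled. The sketch below shows this decomposition with Python's ThreadPoolExecutor; the function names are hypothetical. Whether threads give a real speedup in Python depends on whether the FFT routine releases the GIL, so treat this as an illustration of the data split rather than a benchmark; OpenMP or CUDA apply the same split in C++.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def dominant_freq_map(frames, fps):
    """Per-pixel dominant non-DC frequency (Hz) for an (N, H, W) stack."""
    x = frames - frames.mean(axis=0)
    spectrum = np.abs(np.fft.rfft(x, axis=0))
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fps)
    return freqs[1:][spectrum[1:].argmax(axis=0)]    # skip the DC bin

def dominant_freq_map_parallel(frames, fps, workers=4):
    """Same result, computed on horizontal strips of the image in parallel."""
    strips = np.array_split(frames, workers, axis=1)  # split along image rows
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: dominant_freq_map(s, fps), strips))
    return np.concatenate(parts, axis=0)              # reassemble the rows
```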


A very primitive proof-of-concept application can be implemented in Python using OpenCV and SciPy. NumPy provides efficient Fast Fourier Transform routines, and OpenCV provides efficient functions for computing image pyramids. I suggest that readers try implementing the whole system themselves; it's really interesting to see the challenges that come up along the way! In future posts, I will discuss implementations in Python and C++ and compare their performance.
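For the pre-processing stage, OpenCV's cv2.pyrDown blurs an image with a 5×5 Gaussian kernel and halves its size; applying it repeatedly gives the top pyramid level, which for a Laplacian pyramid is the same low-pass image as the top Gaussian level. To keep the sketch dependency-free, the hypothetical pyr_down below re-creates that step in NumPy with a 5-tap binomial kernel and reflected borders; treat it as an approximation of OpenCV's function, not a drop-in replacement.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # 5-tap binomial (Gaussian-like)

def pyr_down(img):
    """Blur with a separable 5x5 binomial kernel, then keep every other row/column."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 2, mode="reflect")  # mirror borders, edge not repeated
    # Separable filtering: along columns (axis 1) first, then along rows (axis 0)
    rows = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))    # shape (h + 4, w)
    blurred = sum(KERNEL[i] * rows[i:i + h, :] for i in range(5))   # shape (h, w)
    return blurred[::2, ::2]

def top_pyramid_level(img, levels=3):
    """Coarsest image after `levels` blur-and-halve steps."""
    for _ in range(levels):
        img = pyr_down(img)
    return img

gray = np.full((64, 48), 7.0)        # stand-in for a grayscale frame
print(top_pyramid_level(gray).shape)  # (8, 6)
```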


A very basic implementation in Python is available here. To run it, OpenCV must be installed with its Python bindings, along with SciPy. I would still suggest first trying to implement this system yourself, or seeing whether you can improve on the existing one. To run the app on a video file, run the command "python video.mp4" in the root directory. If no filename is provided, input is taken from the default webcam.
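The sampling stage and the file-or-webcam behaviour described here could look roughly like the following. This is a sketch, not the linked implementation: video_source and sample_frames are hypothetical names, while cv2.VideoCapture, read, cv2.cvtColor and cv2.CAP_PROP_FPS are standard OpenCV calls. Passing 0 to cv2.VideoCapture opens the default camera.

```python
import numpy as np

def video_source(argv):
    """Video file path if one was given on the command line, else webcam index 0."""
    return argv[1] if len(argv) > 1 else 0

def sample_frames(source, n_frames=150):
    """Read up to n_frames consecutive frames as grayscale; returns (frames, fps)."""
    import cv2  # imported here so video_source works without OpenCV installed
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise IOError(f"cannot open video source {source!r}")
    fps = cap.get(cv2.CAP_PROP_FPS)   # may be 0 for some webcams; check before use
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:                    # end of file or camera error
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    if not frames:
        raise IOError("no frames could be read")
    return np.stack(frames), fps      # shape (N, H, W)
```

A script would call frames, fps = sample_frames(video_source(sys.argv)). If the reported frame rate is 0, it has to be measured or set by hand before converting FFT bins to Hz.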

A much faster C++ implementation is available here. It uses multithreading to perform calculations in parallel, so it runs much more smoothly than the Python implementation. In the next post, we will discuss this implementation in detail.

Rohan Raja

A recent graduate in Mathematics and Computing from IIT Kharagpur, Rohan is a technology enthusiast and passionate programmer. He likes to apply Mathematics and Artificial Intelligence to devise creative solutions to common problems.