How I am Learning Computer Vision — Pt.2 — Mathematics and Colors
Previous post: https://igorcomune.medium.com/how-i-am-learning-computer-vision-pt-1-b1bf22804e0c
I had barely started studying Computer Vision when my math skills started killing me… I really do not remember whether I studied Linear Algebra in my school days, but I am sure that I did not during my Bachelor’s.
Eigenvalues, matrices, scalar values, span… what are those things?
And colors, what is a color? Well, I have some clues, like RGB (Red, Green and Blue), but I want to understand them better. Why do their values range from 0 to 255? Why RGB? Color scale, grayscale, channels… time to go further.
When I started this very post, I began writing about learning Linear Algebra, but along the way I ran into a few problems understanding vectors and matrices.
I probably wrote six versions of this very post.
The hardest part of being a self-taught student is planning your learning pathway. When it comes to Machine Learning, everything becomes a mess. So I decided to pause my progress and make some changes.
It’s time to clean up this mess.
As a self-taught learner, I always learn everything “on demand”.
For example, say I don’t know anything about running. First I would get an overview of running and work out my objective… 5k, a marathon, an Ironman? Each “class” would completely change the pathway. Had I chosen 5k, I would start running at the same time as I started reading everything about 5k, then listen to tips about posture, pain, heart rate… everything.
Why did I give this example?
Because I always learn with practice.
This time around, I chose to focus on mathematical foundations first… It was not a waste of time, and I learned a lot about Linear Algebra, but… it’s not my learning method, not the way I usually learn.
Back to my roots: today (24/07/2022), I decided that I’ll learn through practice again. I’ll pick a problem and understand it quickly; I will no longer spend hours and hours of my weekends studying mathematical foundations.
What will be done?
Well, I bought 2 e-books:
Now, whenever I have a question about mathematics, I’ll learn it with a Data Science focus, using these books and, of course, websites on the topic.
Trial and Error
This is the most important “truth” about the learning process: you’ll be wrong many times, and you’ll run into many problems, some of which you won’t understand at first, no matter how hard you try. And that is no problem at all.
The following video made me shift back to my roots:
To show not only my path so far but also my trials and errors, here is what I had written before I completely rewrote this post:
I chose a well-known YouTube channel to learn Linear Algebra: 3Blue1Brown.
- Site: https://www.3blue1brown.com
- YouTube channel: https://www.youtube.com/3blue1brown
- Linear Algebra playlist: https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
This channel has a playlist called “Essence of linear algebra”, with 16 videos about linear algebra.
I started learning Linear Algebra, but it was not a good idea: I had to step back to its foundations.
Vectors and Matrices
Well, I know it is a mess, but try to imagine how much time I lost before I finally figured out how to plan my learning.
That’s the point where I stopped and rethought my learning path.
(This part was not edited or rewritten)
Before we understand how we see, we should understand WHAT we see. The image below shows the electromagnetic spectrum.
What we see is an electromagnetic wave, and these waves are classified according to their wavelengths. Humans can only see “visible light”, which ranges from roughly 700 nm to 400 nm (red to violet). Other technologies make it possible to “see” other wavelengths, but, for simplicity, we’ll focus on visible light.
As with every human sense, the stimulus enters the body (through our eyes, in this case) and is converted into electrical signals for our brain to interpret.
Trichromacy, according to Wikipedia, “[…] or trichromatism is the possessing of three independent channels for conveying color information, derived from the three different types of cone cells in the eye”.
“Cone cells are photoreceptor cells in the retinas of vertebrate eyes including the human eye. Cones are normally one of three types, each with a different pigment, namely: S-cones, M-cones and L-cones.” — Wikipedia
L-cones are most sensitive to long (reddish) wavelengths, M-cones to medium (greenish) wavelengths and S-cones to short (bluish) wavelengths. So, to process an image, our brain receives information from these 3 types of cone cells: roughly, the intensity of red, green and blue. Or RGB.
Now everything is making sense. Our eyes see red, green and blue, and cameras, screens and computers are built to work the same way.
Answering one of my questions: pixel values range from 0 to 255 because each channel is usually stored in binary, using 8 bits (one byte).
2⁸ = 256, so there are 256 possible intensity levels per channel (in Computer Science we start counting at 0, which is why the range is 0 to 255 and not 1 to 256).
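To make the 8-bit arithmetic concrete, here is a small Python sketch. It assumes NumPy and the common case of 8 bits per channel (other bit depths, such as 16-bit images, also exist). It counts the levels and checks the limits of NumPy’s `uint8` type, which is exactly one byte:

```python
import numpy as np

BITS_PER_CHANNEL = 8                 # assumption: a common 8-bit-per-channel image
levels = 2 ** BITS_PER_CHANNEL       # 2^8 = 256 possible intensities

# Counting from 0, the 256 levels are 0, 1, ..., 255.
intensities = list(range(levels))
print(levels, intensities[0], intensities[-1])   # 256 0 255

# NumPy's uint8 type (one byte) has exactly these limits.
limits = np.iinfo(np.uint8)
print(limits.min, limits.max)                    # 0 255

# Array arithmetic on uint8 wraps around instead of going past 255.
pixel = np.array([255], dtype=np.uint8)
print(pixel + 1)                                 # [0]
```

The last line shows why the type matters: with `uint8`, adding 1 to 255 wraps around to 0 instead of producing 256.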
Returning… digital images are made up of “pixels”. Each pixel can only contain a single color, and its intensity ranges from 0 to 255.
The word “pixel” is made up of “pix” (pictures) and “el” (element): the smallest element of an image.
Wavelengths and Numbers
Humans see electromagnetic wavelengths; computers see numbers.
A 25x25 image has 625 pixels. The image below is 25x25 pixels.
Mathematically speaking, images are matrices: each pixel is located by its row and column, a bit like geographic coordinates.
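A minimal NumPy sketch of “an image is a matrix”. The all-black 25x25 grayscale image here is hypothetical, not the one pictured above, and the pixel values set on it are made up for illustration:

```python
import numpy as np

# A hypothetical 25x25 grayscale image: 25 rows x 25 columns, all black (0).
height, width = 25, 25
image = np.zeros((height, width), dtype=np.uint8)

print(image.shape)   # (25, 25)
print(image.size)    # 625 pixels in total (25 * 25)

# Pixels are addressed like matrix entries: image[row, column].
# Row comes first (the vertical "y" position), then column (the horizontal "x" position).
image[0, 0] = 255        # top-left pixel set to white
image[24, 24] = 128      # bottom-right pixel set to mid-gray
print(image[0, 0], image[24, 24], image[12, 12])   # 255 128 0
```

Note that NumPy indexes as `[row, column]`, so the vertical position comes first, which is the opposite of the usual (x, y) order for coordinates.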
If a pixel can only contain a single color, how can we represent any color other than red, green or blue (as in the example image above)?
Color images are processed as 3 layers (red, green and blue), each holding its own intensities, and these layers are overlaid to represent any color, like in the following example:
In digital images, these layers are called “channels”.
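Here is a small sketch of how three channel matrices become one color image. It assumes NumPy and RGB channel order (some libraries, such as OpenCV, load images in BGR order instead), and the 2x2 channel values are made up for illustration:

```python
import numpy as np

# Three hypothetical 2x2 channel matrices (rows x columns), one per color.
red   = np.array([[255,   0], [255,   0]], dtype=np.uint8)
green = np.array([[  0, 255], [255,   0]], dtype=np.uint8)
blue  = np.array([[  0,   0], [  0, 255]], dtype=np.uint8)

# Stacking them along a new last axis gives the usual (rows, columns, channels) layout.
image = np.stack([red, green, blue], axis=-1)
print(image.shape)              # (2, 2, 3)

# Each pixel is now a triple of intensities (R, G, B):
# top-left (255, 0, 0) red, top-right (0, 255, 0) green,
# bottom-left (255, 255, 0) red + green = yellow, bottom-right (0, 0, 255) blue.
print(image[1, 0].tolist())     # [255, 255, 0]
```

The bottom-left pixel has full red, full green and no blue, which our eyes perceive as yellow, even though no single channel stores “yellow”.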
Here is another example of how a computer sees an image:
Human eyes see electromagnetic wavelengths; computers see rows, columns and channels.
Each image is processed through mathematical operations on rows, columns and channels: every intersection of the X and Y axes (a pixel) receives one number per channel, according to that channel’s intensity.
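As one example of such a mathematical operation, the sketch below converts an RGB image to grayscale with a weighted sum of the three channels at every (row, column). The weights 0.299, 0.587 and 0.114 are the ITU-R BT.601 luma coefficients, one common convention (Pillow’s `convert("L")` uses them, for instance); the function name `to_grayscale` and the pixel values are hypothetical:

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert a (rows, columns, 3) uint8 RGB image to a (rows, columns) gray image.

    Uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common choice.
    """
    weights = np.array([0.299, 0.587, 0.114])
    # For every (row, column), multiply the three channel values by the weights and sum them.
    gray = rgb.astype(np.float64) @ weights
    # Round to the nearest integer and keep the result inside the 0..255 byte range.
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# A hypothetical 1x3 image: one red, one green and one white pixel.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
gray = to_grayscale(pixels)
print(gray.shape)      # (1, 3)
print(gray.tolist())   # [[76, 150, 255]]
```

Green gets the largest weight because it contributes most to how bright a color looks to us, which is why the pure green pixel ends up brighter than the pure red one.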
As my job is my top priority, this month I’ll start focusing on two other subjects that my job requires (dashboards using Python and Natural Language Processing), so I do not know how long I’ll stay away from Computer Vision and my blog posts.
The good part is that many things I’ll learn in this “sprint”, like NLP, will also be useful for Computer Vision.
I also have a lot to share about what I have been learning at my job; maybe soon I’ll be able to start writing about it.