Foundations of Computer Vision (2024) (visionbook.mit.edu)

224 points by tzury 1 day ago | 19 comments

pantulis 18 hours ago [-]

There is a very interesting section in the book, "On Research, Writing and Speaking", which includes gems like:

“This sounds like hard work.” Yes. It’s no longer about being smart. By now, everyone around you is smart. In graduate school, it’s the hard workers who pull ahead.

bonoboTP 18 hours ago [-]

That's definitely insightful. Everyone reaches a level where coasting on smarts is no longer sufficient.

Many reach this realization when starting university, but some can still coast okay in college since the material to learn is well defined and upper bounded. A PhD is not really upper bounded. There's no set number of papers to read per week like in a college course. There's no "this won't be part of the exam". Anything is fair game. The returns on being smarter never flatten out; there's simply no ceiling. You can always do more: read more to keep up with the literature firehose, improve your experiments, your method, etc.

You also need soft skills and a network. You need to keep your finger on the pulse of the community by going to conferences and getting to know people, grabbing coffee or going out to dinner with them. You also need to be self-driven instead of waiting for instructions as you did in college. You need to be just the right amount of skeptical and critical regarding existing methods to be able to come up with new things, while also being understood, accepted, and seen as relevant and exciting by the community.

You also need to manage your time, set your own deadlines, and maintain a routine without the external sync given by university lectures and exams. All this basically has no upper limit and even the expectations are vaguely defined. You face rejections, maybe for the first time, despite having done thorough work, because the reviewers don't see enough novelty or it doesn't slot neatly into what is in fashion at the moment.

My point is that a PhD can push everyone to meet their mental limits. It can be frustrating and it's a notoriously hard period of time for many PhD students. Of course if your only goal is to graduate to get the doctorate, there are possible strategies to "coast", but those who go for the academic path often expect to achieve more than the bare minimum, especially if they managed to coast with good results in college.

VladVladikoff 1 hour ago [-]

In third year of undergrad it felt like I couldn’t even keep up with the class despite my hard work. Granted, this was an engineering program which had an average high school entrance mark of 90%, and had 75% of the students drop out by 2nd year because it was so hard.

criddell 6 minutes ago [-]

I had a different experience. My high school wasn't very good so I was behind everybody else from the start. It took the first two years for me to catch up. My grades were terrible and I didn't find the coursework all that interesting. It was the first time in my life I had homework and I needed to learn how to study. If not for the 100% final exam option, I would have flunked out.

Then in my final two years I was able to make more and more choices about what I studied. I found the coursework interesting and my grades finally got back to where I wanted them to be. I don't think my overall GPA for all my years at university was an accurate reflection of the student I was at graduation.

AdieuToLogic 15 hours ago [-]

Another great book in this field is:

  Computer Vision, Fifth Edition
  E.R. Davies
  Academic Press
  ISBN-13  978-0128092842

bonoboTP 4 hours ago [-]

The other main one is Szeliski's Computer Vision 2nd Ed from 2022 https://szeliski.org/Book/

Forsyth & Ponce is also good but somewhat old by now. And for 3d, the classic is still Hartley & Zisserman's Multiple View Geometry.

oytis 4 hours ago [-]

Can someone working in the field comment on how relevant the content still is? A lot of ML including CV seems (from the outside at least) to be completely disrupted by the developments of the last two years.

Greamy 39 minutes ago [-]

It still is super relevant. Most computer vision done outside academia is still based on older stuff, or classical computer vision algorithms. You don't really get so many chances to use the latest models and techniques, as more often than not they are not that relevant, or are only for extremely specific cases, or you just don't need something that complex.

bonoboTP 4 hours ago [-]

Very relevant. None of the recent techniques are truly revolutionary. It's all based on these same foundations. I'd say it would do good to read even older ones. There are lots of real, profitable computer vision applications built on classic methods like Hough transforms, Canny edges, SIFT, Harris corners, etc. You should be familiar with these if you want to come across as a serious professional as opposed to a hype boy vibe coder who can just rattle off buzzwords and glue APIs without fundamental understanding.

walterlw 4 hours ago [-]

There are still a lot of problems to be solved using "classical" computer vision, especially in systems where you don't have easy access to GPU acceleration. I am a practitioner doing simultaneous localization and mapping on compute-restricted platforms, so I'm definitely going to read the Structure from Motion chapter.

hananova 7 hours ago [-]

The "Writing this book" section accidentally implies that LLMs were used for 2/3rds of the manuscript.

I think they probably mean that LLMs just gave them a lot more to write about, but I think it would be a good idea to clarify.

oytis 6 hours ago [-]

I am not reading it like this - in fact ChatGPT was the first thing out there that would be able to assist them in writing, and less than a third of this book was written after the release of ChatGPT. To me it just looks like marking important events in the ML/AI field on the graph.

la_fayette 18 hours ago [-]

Unbelievable that this book is freely available! Thanks to the authors, publishers or whoever.

bonoboTP 17 hours ago [-]

The machine learning, computer vision and robotics communities are really great at publishing their books online for free access. You can get the absolute top textbooks of these fields for free online. Quite a contrast to other fields where profs kinda require you to buy the latest edition for hundreds of dollars in the US. Not to mention that this gives everyone around the world access to the best resources, including in poorer countries. Many also share their course materials and videos online.

walterlw 4 hours ago [-]

Very true and joining in on the thanks. Did you find a way to download it as a pdf though? I believe it is essential to be able to add notes and references when reading any learning material.

vincenthwt 13 hours ago [-]

Can anyone recommend a good book on Machine Vision? I believe the foundation of effective machine vision, and even computer vision, lies in selecting the right camera, optics, and lighting. High-quality images are essential because poor input leads to poor output.

ack_inc 8 hours ago [-]

Hi, could you mention a use-case or two where these things made a real difference?

bonoboTP 3 hours ago [-]

The term "machine vision" is mainly used in highly controlled, narrow industrial applications: think factory assembly lines, steel inspection, monitoring for cracks in materials, shape or size classification of items, etc. The task is usually very well defined, and the same thing needs to be repeated under essentially the same conditions over and over again with high reliability.

But many other things exist outside the "glue some GPT4o vision api stuff together for a mobile app to pitch to VCs" space. Like inspecting and servicing airplanes (Airbus has vision engineers who make tools for internal use, you don't have datasets of a billion images for that). There are also things like 3D motion capture of animals, such as mice or even insects like flies, which requires very precise calibration and proper optical setups. Or estimating the meat yield of pigs and cows on farms from multi-view images combined with weight measurements. There are medical things, like cell counting, 3D reconstruction of facial geometry for plastic surgery, dentistry applications, and a million other things other than chatting with ChatGPT about images or classifying cats vs dogs or drawing bounding boxes of people in a smartphone video.

jeffreygoesto 7 hours ago [-]

Any serious production inspection.
