
Edited

Review 1

Average rating 1.0

Completed 80% of course

I didn't take Diffusion I and II. I work in the ML field and already know diffusion to some extent, but I took this course to save time over studying on my own. Honestly, the lecture quality is quite disappointing for the price.

Overall issues: There is a lot of stuttering, which makes it hard to concentrate. At 60,000 won per hour, that was quite disappointing. Easy parts are explained in too much detail, while difficult and important parts are glossed over.

Specifically lacking areas:

CLIP/T5: The course description promises "CLIP/T5 integration and token flow understanding," but the lecture just loads and uses them, and that's it. There is no explanation of how CLIP and T5 differ, why they are used together, or why the sequence length is set to 77.

RoPE: There is almost no explanation of RoPE itself. Some attention blocks use RoPE and some don't, but this difference is never explained, and while caching appears in the code, there is no explanation of when or why it is done.

AdaLN: SA and CA, which were already covered, are explained in detail again, but important concepts like AdaLN-single are only described as "same as before, using zero initialization in the cross-attention projection." I don't understand what this means or why it is done. When I looked it up separately, zero initialization refers to AdaLN-Zero, which seems to be a different concept from AdaLN-single... but the lecture made no such distinction at all.

Linear Attention (SANA): The preliminary explanation is okay, but when explaining the code, you don't explain how it differs from vanilla attention; you only point out the parts that are the same (qkv) before moving on.

Errors: When explaining the SANA scheduler, I think you said "0.5 to x" when it should have been "0.5 to t." It's a small mistake, but it's disappointing that a lecture priced at 60,000 won per hour wasn't even reviewed.

Conclusion: I can pick up a few keywords and then study the papers and code myself, but I wonder whether that is worth 60,000 won per hour. The satisfaction is lower than free YouTube lectures, which is very disappointing... Even the responses to course reviews seem to be automated with an LLM...
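
For reference on the 77-token question raised above: CLIP's text encoder was trained with learned absolute position embeddings of length 77, so its context length is fixed, while T5 uses relative position bias and its usable length is a pipeline choice (around 120 tokens is a common cap in PixArt-style configs). Below is a minimal sketch, not the course's code; the checkpoint names openai/clip-vit-large-patch14 and google/t5-v1_1-xxl are assumptions about which public variants a pipeline might use.

```python
# Requires: pip install transformers sentencepiece
from transformers import CLIPTokenizer, T5Tokenizer

prompt = "a photorealistic corgi astronaut riding a bicycle on the moon"

# CLIP's text encoder has learned absolute position embeddings of length 77,
# so every prompt is padded/truncated to exactly 77 tokens.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_ids = clip_tok(prompt, padding="max_length", truncation=True,
                    max_length=clip_tok.model_max_length, return_tensors="pt")
print("CLIP tokens:", clip_ids.input_ids.shape[-1])   # 77

# T5 uses relative position bias, so the cap is configurable;
# 120 here is an assumed PixArt-style setting, not a hard model limit.
t5_tok = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
t5_ids = t5_tok(prompt, padding="max_length", truncation=True,
                max_length=120, return_tensors="pt")
print("T5 tokens:", t5_ids.input_ids.shape[-1])       # 120
```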
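
On the RoPE caching point: the rotation angles depend only on token position and head dimension, not on the input values, so the cos/sin tables can be built once per sequence length and reused across forward passes. The sketch below is my own illustration of that idea, not the course's implementation; the class name and cache layout are assumptions.

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    """Minimal RoPE with a cos/sin cache keyed by sequence length, so the
    tables are built once and reused on later forward passes."""
    def __init__(self, dim, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        self._cache = {}

    def _tables(self, seq_len, device):
        key = (seq_len, device)
        if key not in self._cache:
            pos = torch.arange(seq_len, device=device, dtype=torch.float32)
            freqs = torch.outer(pos, self.inv_freq.to(device))   # (N, dim/2)
            self._cache[key] = (freqs.cos(), freqs.sin())
        return self._cache[key]

    def forward(self, x):                                  # x: (B, N, dim)
        cos, sin = self._tables(x.shape[1], x.device)
        x_even, x_odd = x[..., 0::2], x[..., 1::2]
        # rotate each (even, odd) pair by its position-dependent angle
        rot = torch.stack((x_even * cos - x_odd * sin,
                           x_even * sin + x_odd * cos), dim=-1)
        return rot.flatten(-2)

rope = RotaryEmbedding(dim=64)
q = torch.randn(2, 1024, 64)
print(rope(q).shape)   # (2, 1024, 64); a second call with N=1024 reuses the cache
```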
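
On the AdaLN-Zero vs. AdaLN-single confusion: in DiT's adaLN-Zero, every block owns a modulation MLP whose output layer is zero-initialized, so shift/scale/gate start at zero and each block begins as an identity mapping; in PixArt's adaLN-single, one modulation MLP is shared across all blocks (run once per timestep) and each block only learns a small per-block offset table. Zero-initializing a projection (for example a cross-attention output projection) is a separate trick for making a newly added branch start as a no-op. Below is a simplified sketch of the two block styles, attention branch only; the module names and shapes are my assumptions, not the course's code.

```python
import torch
import torch.nn as nn

class AdaLNZeroBlock(nn.Module):
    """DiT-style adaLN-Zero sketch: each block has its own modulation MLP
    with a zero-initialized output layer, so the block starts as identity."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mod = nn.Sequential(nn.SiLU(), nn.Linear(dim, 3 * dim))
        nn.init.zeros_(self.mod[-1].weight)
        nn.init.zeros_(self.mod[-1].bias)

    def forward(self, x, t_emb):                      # t_emb: (B, dim)
        shift, scale, gate = self.mod(t_emb).chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale[:, None]) + shift[:, None]
        h, _ = self.attn(h, h, h)
        return x + gate[:, None] * h                  # gate == 0 at init

class AdaLNSingleBlock(nn.Module):
    """PixArt-style adaLN-single sketch: the modulation MLP is shared across
    blocks and run once per timestep; each block only keeps a small
    learnable per-block offset table."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scale_shift_table = nn.Parameter(torch.randn(3, dim) / dim ** 0.5)

    def forward(self, x, shared_mod):                 # shared_mod: (B, 3, dim)
        shift, scale, gate = (self.scale_shift_table[None] + shared_mod).unbind(1)
        h = self.norm(x) * (1 + scale[:, None]) + shift[:, None]
        h, _ = self.attn(h, h, h)
        return x + gate[:, None] * h

dim, B, N = 256, 2, 64
x, t_emb = torch.randn(B, N, dim), torch.randn(B, dim)
shared_mlp = nn.Sequential(nn.SiLU(), nn.Linear(dim, 3 * dim))  # one per network
print(AdaLNZeroBlock(dim)(x, t_emb).shape)                      # (2, 64, 256)
print(AdaLNSingleBlock(dim)(x, shared_mlp(t_emb).view(B, 3, dim)).shape)
```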
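
On the linear attention point: SANA-style linear attention replaces the softmax with a ReLU feature map, so K^T V (a d x d matrix) can be computed first and the cost scales linearly with the number of tokens instead of quadratically. The sketch below shows the difference for a single head with no projections; it is an illustration under those assumptions, not the course's or SANA's exact implementation.

```python
import torch
import torch.nn.functional as F

def vanilla_attention(q, k, v):
    # Standard softmax attention: materializes an N x N score matrix,
    # so compute and memory grow quadratically with token count N.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

def relu_linear_attention(q, k, v, eps=1e-6):
    # ReLU linear attention: compute K^T V (d x d) first,
    # so the cost is O(N * d^2), linear in the token count N.
    q, k = F.relu(q), F.relu(k)
    kv = k.transpose(-2, -1) @ v                                  # (B, d, d)
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps   # (B, N, 1)
    return (q @ kv) / z

q = torch.randn(2, 1024, 64)   # (batch, tokens, head_dim), single head for brevity
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
print(vanilla_attention(q, k, v).shape)       # (2, 1024, 64)
print(relu_linear_attention(q, k, v).shape)   # (2, 1024, 64)
```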

sotaaz
Instructor

Hello. First, I apologize that the course did not meet the expectations you had when you enrolled, and I am grateful that you took the time to write this feedback.

Regarding the insufficient explanation of CLIP and T5, I think there may have been a misunderstanding about the structure of this course. Rather than focusing on the theory of the text encoders themselves, the course aims at the practical stage of directly implementing the latest architectures, PixArt and SANA. So I intended to cover how these models receive text information and how it connects to the image generation process through flow, that is, the integration and token flow.

Also, since you skipped Parts I and II, I suspect the omitted basic concepts made the experience more frustrating. This course is designed to build on the previous parts, so the explanations you consider important may have felt relatively brief. I will definitely refer to your points when supplementing the course in the future.

I also gratefully accept your feedback on delivery, and I will aim for clearer and more stable explanations in future courses. Thank you once again for taking your valuable time to share your opinion.

Pixart & SANA, Complete Mastery of Diffusion III: Learning Through Implementation
sotaaz · 5 lectures · 9 students
