Master crawling by following along with Data Workshop
Python Crawling Master: the one course you need, covering everything from installation to practical application. It includes only the essentials you will actually use.
140 learners
Level: Beginner
Course period: Unlimited
Notice: changes to the Netflix section
Because Netflix reorganized its site, the tag used for section titles has changed.
The corrected code is shown below.
section_title = section.select('h3')[0].text  # Before the change
section_title = section.select('h2')[0].text  # After the change: the section title tag is now h2
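Since the site may be reorganized again, one option is to try several heading tags in order instead of hard-coding a single one. The following is only a sketch: `get_section_title` and its `tags` parameter are hypothetical names that are not part of the course code, and `section` is assumed to be a BeautifulSoup element (anything with a `.select()` method that returns elements with a `.text` attribute).

```python
def get_section_title(section, tags=("h2", "h3")):
    """Return the text of the first heading tag found, or '' if none match.

    `section` is assumed to be a BeautifulSoup Tag. The tags are tried in
    order, so the current tag (h2) is checked before the old one (h3).
    """
    for tag in tags:
        matches = section.select(tag)
        if matches:
            # Unlike the course code, surrounding whitespace is stripped here.
            return matches[0].text.strip()
    return ""  # No heading found: return an empty string instead of raising.
```

With this helper, `section_title = get_section_title(section)` would keep working on either version of the page.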
-----------
2022.01.01 Additional modifications
When retrieving the image file and the program URL from Netflix, the information is sometimes missing or in an unexpected form, so code was added to handle those cases.
For the image file information, there are three cases:
1. The tag contains a proper image file URL.
2. The value is not a file URL but inline data (data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==), in which case the image is not displayed on the screen.
3. The image file information itself is missing.
The code checks these cases one by one and, whenever the value is not the information we want, moves on to the next case.
It was modified as follows using a try/except statement and an if statement.
------------------------------------------------------------------------
try:
    program_img = program.select('img')[0]['src']
    if 'https' not in program_img:
        # The src is not a real file URL (the image is not visible on screen), so store an empty string.
        program_img = ''
except (IndexError, KeyError):
    # There is no img tag or no src attribute at all, so store an empty string.
    program_img = ''
------------------------------------------------------------------------
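The three image cases above can also be written as a small standalone function, which makes them easy to check without fetching the page. This is a minimal sketch, not the course's code: `clean_image_src` is a hypothetical name, and it assumes a usable image URL always starts with `https://`, which is slightly stricter than the `'https' not in` substring check.

```python
def clean_image_src(src):
    """Return `src` if it looks like a real image URL, otherwise ''."""
    if not src:
        # Case 3: the image information is missing (None or empty string).
        return ""
    if not src.startswith("https://"):
        # Case 2: not a file URL, e.g. an inline "data:image/gif;base64,..." value.
        return ""
    # Case 1: a proper image file URL.
    return src
```

For example, `clean_image_src('data:image/gif;base64,...')` returns an empty string, while an `https://` URL is returned unchanged.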
The program link tag sometimes contains no information at all, so in that case an empty string is stored instead.
------------------------------------------------------------------------
try:
    program_link = program.select('a')[0]['href']
except (IndexError, KeyError):
    # There is no a tag or no href attribute, so store an empty string.
    program_link = ''
------------------------------------------------------------------------
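Because the same "take the first match, or fall back to an empty string" pattern appears for both the image and the link, it could be factored into one helper. The sketch below is an assumption for illustration: `first_attr` is a hypothetical name, and `parent` is assumed to be a BeautifulSoup element whose `.select()` returns elements that raise `KeyError` for a missing attribute.

```python
def first_attr(parent, selector, attr, default=""):
    """Return `attr` of the first element matching `selector`, or `default`.

    IndexError covers "no matching tag"; KeyError covers "tag found but the
    attribute is missing". Any other error is left to propagate.
    """
    try:
        return parent.select(selector)[0][attr]
    except (IndexError, KeyError):
        return default
```

It could then be used as `program_link = first_attr(program, 'a', 'href')`, and in the same way for `'img'` and `'src'`.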




