Master crawling by following along with Data Workshop

Python Crawling Master. This one is all you need for everything from installation to application. I've packed in only the essential content for real use.

(4.7) 6 reviews

140 learners

Level Beginner

Course period Unlimited

  • datago0ba0
Python
Web Crawling
Big Data

Notice of change in Netflix section information

Because Netflix reorganized its site, the tag used for section titles has changed.

The corrected code is shown below.

 

section_title = section.select('h3')[0].text  # Before the change

section_title = section.select('h2')[0].text  # After the change: the section title tag is now h2
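Since the title tag has already changed once, it may help to accept either tag. The following is only a minimal sketch, assuming `section` is a BeautifulSoup tag (as the `select` calls above suggest). The helper name `get_section_title` and the sample HTML are made up for illustration and are not part of the course code.

```python
from bs4 import BeautifulSoup


def get_section_title(section):
    """Return the section title text, or '' if no heading tag is found."""
    # Match either tag: h2 is the new layout, h3 is the old one.
    heading = section.select_one('h2, h3')
    return heading.get_text(strip=True) if heading else ''


# Made-up HTML snippets standing in for the old and new page layouts.
new_layout = BeautifulSoup('<section><h2>Popular</h2></section>', 'html.parser')
old_layout = BeautifulSoup('<section><h3>Popular</h3></section>', 'html.parser')
no_heading = BeautifulSoup('<section><p>Popular</p></section>', 'html.parser')

print(get_section_title(new_layout))
print(get_section_title(old_layout))
print(repr(get_section_title(no_heading)))
```

Returning an empty string when no heading is found follows the same convention the later fixes in this post use for missing images and links.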

 

 

-----------

2022.01.01 Additional modifications

 

When retrieving image files and program URLs from Netflix, the information is sometimes missing or in an unexpected form, so I added code to handle those cases.

For image file information, there are three cases:

1. The image file URL is present.

2. The value is not a file URL but inline data (data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==), so no image is displayed on the screen.

3. The image file information is missing entirely.

Check these cases in order; whenever the value is not the information you are looking for, fall through to the next case.

I handled this with a try/except statement and an if condition, as follows.

 

-------------------------------------------------- ----------------------

try:
    program_img = program.select('img')[0]['src']
    if 'https' not in program_img:
        program_img = ''  # The value is not an image URL (nothing is shown on screen), so store an empty string.
except (IndexError, KeyError):
    program_img = ''  # There is no img tag or no src attribute at all, so store an empty string.

-------------------------------------------------- ----------------------
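The three image cases can also be checked on the `src` string itself. The sketch below is a variation, not the course code: it tests the URL scheme with the standard library's `urlparse` instead of searching for the substring `'https'`, so it also accepts plain `http` URLs. The function name `clean_image_src` and the sample poster URL are hypothetical.

```python
from urllib.parse import urlparse


def clean_image_src(src):
    """Return src if it is an http(s) URL, otherwise an empty string."""
    if not src:
        # Case 3: the image information is missing entirely.
        return ''
    if urlparse(src).scheme not in ('http', 'https'):
        # Case 2: not a file URL, e.g. an inline data: placeholder.
        return ''
    # Case 1: a real image file URL.
    return src


placeholder = 'data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=='
poster = 'https://example.com/poster.jpg'  # made-up URL for illustration

print(repr(clean_image_src(poster)))
print(repr(clean_image_src(placeholder)))
print(repr(clean_image_src(None)))
```

Keeping the check in a small function means the same rule can be reused wherever an image URL is read.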

 

Some programs have no information in the link tag at all, so in those cases an empty string is stored.

-------------------------------------------------- ----------------------

try:
    program_link = program.select('a')[0]['href']
except (IndexError, KeyError):
    program_link = ''  # There is no link address, so store an empty string.

-------------------------------------------------- ----------------------
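The same missing-link case can be handled without try/except. This is a hedged alternative, assuming `program` is a BeautifulSoup tag: `select_one` returns `None` when nothing matches, and `Tag.get` returns a default when the attribute is absent. The helper name `get_program_link` and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def get_program_link(program):
    """Return the href of the first <a> tag, or '' if there is none."""
    anchor = program.select_one('a')
    if anchor is None:
        # No link tag at all.
        return ''
    # A link tag without an href attribute also yields ''.
    return anchor.get('href', '')


# Made-up HTML snippets for the three situations.
with_link = BeautifulSoup('<li><a href="/title/12345">Sample</a></li>', 'html.parser')
without_link = BeautifulSoup('<li><span>Sample</span></li>', 'html.parser')
without_href = BeautifulSoup('<li><a>Sample</a></li>', 'html.parser')

print(repr(get_program_link(with_link)))
print(repr(get_program_link(without_link)))
print(repr(get_program_link(without_href)))
```

Both styles give the same result for these inputs; the try/except version above is shorter, while this one makes each missing-data case explicit.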
