Truly the era of big data! 👨‍💻
Become an expert with Hadoop.
The center of data science,
and the mainstream is big data!
Many large IT companies and social media services are racing to adopt Hadoop (Apache Hadoop) for big data analysis and processing. Hadoop is a Java-based framework built to process large volumes of data at low cost: it stores large datasets in a distributed way and processes them in place. What if you could use Hadoop to reach the level of a big data expert?
Through data analysis, companies can open up new markets, create distinctive value, and deliver the information new consumers need in real time. For small and medium-sized businesses as well, big data has become something they must handle. That is surely good news for anyone who hopes to get a job, or change jobs, in a big-data-related role.
BigData with Hadoop
Build a distributed big data system infrastructure with Hadoop, the representative big data solution that Google, Yahoo, Facebook, IBM, Instagram, Twitter, and many other companies use for data analysis.
In this course, you will learn the terminology of big data and get a first, guided experience of the process of handling big data with the open-source software Hadoop. Through these lectures, you will experience both the world of Big Data Technology and the world of the Fourth Industrial Revolution.
- Hadoop is general-purpose open-source software that anyone can use for free.
- This course works with big data using Hadoop version 3.2.1.
From understanding big data
to using Hadoop,
all in one course.
An essential understanding of big data terminology
The concepts and uses of Hadoop
A hands-on tutorial on big data processing with Hadoop
Recommended for:
Of course, everyone else is welcome too. (Beginners are doubly welcome.)
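Aspiring data scientists preparing for a job or a career change into future-oriented IT
Those who want to work with big data through Java / Python
Those who are curious about big data and want to try it for themselves
Working professionals who want experience with Hadoop 3.x data environments
Please check the prerequisite knowledge before enrolling!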
- As prerequisites, you need the basics of the Java programming language, familiarity with the terminology of big data, virtual machines, and datasets, and a basic understanding of Linux (Ubuntu).
What you will learn
1. Understanding virtualization technology and guest operating systems
You will learn virtualization technology, which is well suited to server consolidation, starting from how OS-level virtualization isolates multiple servers on a single OS. With Ubuntu, an open-source Linux distribution, as the guest operating system, anyone can try building and operating multiple servers. Beyond knowledge of guest operating systems, you will see how a large number of servers makes distributed processing of big data possible, and you will build up broad technical experience along the way. With server virtualization, you can run virtual machines efficiently on a single physical server or host operating system and try out operating systems that would otherwise be hard to set up.
- Learn the definition of big data and real-world examples of its use.
- Understand the terminology of Hadoop, the data processing software that companies favor.
The Landscape: Big Data
2. Installing Hadoop on Ubuntu 20.04 LTS and working with its commands
Starting from the basics of Linux CLI (Command Line Interface) tools, which front-end developers naturally run into when building web applications, you will learn the Linux terminal used to work with Hadoop. You will also learn the first steps for using Ubuntu through a GUI much like the one on Windows, and move from understanding Linux system basics, such as shell configuration files, toward an intermediate level.
- Install and configure Linux (Ubuntu 20.04 LTS) as a virtual machine on a Windows 10 laptop.
- Install Hadoop version 3.2.1 on the Linux virtual machine.
3. A guide to Hadoop 3.2.1 and its core architecture
Working with big data for unstructured data processing starts with understanding three things: the Hadoop Distributed File System (HDFS), which is modeled on the Google File System; MapReduce; and YARN, which handles cluster scaling and resource management. We look at the architecture of Hadoop versions 1, 2, and 3 one by one and sketch how Hadoop technology has evolved.
- Understand the Hadoop Distributed File System (HDFS) and work with it.
- Understand how the MapReduce (Map/Reduce) framework works and use it to analyze data.
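As a rough illustration of the Map/Reduce principle mentioned above, the following Python sketch runs the three phases (map, shuffle, reduce) in memory on a handful of made-up records. It does not use Hadoop at all. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `run_job`, and the sample records, are assumptions made for illustration only.

```python
from collections import defaultdict

# Made-up sample records of the form "region,units" (illustrative only).
RECORDS = [
    "north,10",
    "south,7",
    "north,5",
    "east,20",
    "south,3",
]


def map_phase(record):
    """Map: turn one input record into a (key, value) pair."""
    region, units = record.split(",")
    return region, int(units)


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence and return totals per key."""
    pairs = [map_phase(record) for record in records]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(RECORDS))  # {'east': 20, 'north': 15, 'south': 10}
```

In a real cluster the map and reduce steps run in parallel on many machines, and the framework performs the shuffle between them; the sketch only shows the data flow.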
4. HDFS shell operations and writing MapReduce applications in Java / Python
Many technologies are used to manipulate data, but the foundation of big data analysis is building MapReduce applications. From a basic WordCount MapReduce application in Python to a COVID-19 application written in Java with Eclipse, you will build a range of big data MapReduce applications, a skill that is becoming essential rather than optional.
- Implement an application that works with Hadoop in Java.
- Implement an application that works with Hadoop in Python.
Python Map/Reduce WordCount Application
Java Map/Reduce WordCount Application
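To give a feel for what a Python WordCount can look like, here is a minimal sketch in the style of Hadoop Streaming, where the mapper emits tab-separated `word<TAB>1` lines and the reducer sums the counts for each word. This is not the course's own code. The names `mapper`, `reducer`, and `word_count` are assumptions, and the sort step is simulated locally with `sorted()` instead of being done by Hadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same word.

    Assumes the input is already sorted by key, which is what the
    sort step between map and reduce provides.
    """
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def word_count(lines):
    """Local stand-in for a job: map, sort by key, then reduce."""
    mapped = sorted(mapper(lines))
    return list(reducer(mapped))


if __name__ == "__main__":
    sample = ["big data with hadoop", "hadoop handles big data"]
    for row in word_count(sample):
        print(row)
```

With Hadoop Streaming, the same two functions would typically live in separate scripts that read from standard input and write to standard output.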
Frequently Asked Questions (Q&A)
Q. What is big data? Do I need to know its definition to use Hadoop?
Yes. Whenever we work with Hadoop, the course covers a simple definition and understanding of big data. You do not need deep or complete mastery; we cover just the level of understanding that working with Hadoop requires.
Big data means working with very large datasets with the help of tools such as Hadoop. These datasets are the raw material that many companies analyze to identify patterns and trends. They are closely tied to human social behavior and patterns, and to the value people create through their interactions.
Image source: TechTarget (link to original)
Q. What is Hadoop? What are its components, and what is the Hadoop stack?
Hadoop helps with the task of processing data from large social sites, data that has grown beyond terabytes into petabytes and even zettabytes. The Hadoop stack is the open-source framework approach to handling big data of this kind.
The "Hadoop stack" is often called simply "Hadoop". Clusters are built from inexpensive, everyday commodity hardware, and Hadoop and the Hadoop stack help process large workloads inside these clusters, which are large collections of servers. The Hadoop stack is a Java-based "distributed computing platform", sometimes described as a "simple batch processing" system. It processes as much data as you need in periodic batches, transforming and distributing the data into the desired form to compute the results.
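Q. Do I need programming knowledge?
It is fine if you have no programming knowledge or experience writing code. The lectures assume this is your first time with Java and Python and build your understanding step by step. The course documents are written in English, but the lectures are delivered in Korean so that you can follow along without trouble. I occasionally explain things in English, but high-school-level English should be enough to follow. (My own English is modest, after all.)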
Q. How closely is big data related to working with Hadoop?
Naturally, this course deals with big data. It goes beyond RDBMSs such as Oracle, MSSQL, and MySQL to address what companies need most: large-scale processing, data processing speed, and low cost. In particular, for companies that have to handle social data, big data covers not only structured data, meaning the relational, row-and-column data handled by an RDBMS, but also unstructured data such as images, audio, and word-processing files.
Semi-structured data refers to data such as email, CSV, XML, and JSON that is used for communication and data exchange with web servers. HTML, web sites, and NoSQL databases also fall into this category, as do the datasets accumulated through EDI (Electronic Data Interchange), the computer-to-computer exchange of business documents.
Image source: MonkeyLearn Blog (link to original)
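The difference between the data categories described above can be sketched in a few lines of Python. The example below is only an illustration: the table name `users`, the field names, the sample values, and the helper function names are all made up, and an in-memory SQLite database stands in for an RDBMS.

```python
import json
import sqlite3


def structured_rows():
    """Structured: a relational table where every row has the same columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "user_a"), (2, "user_b")])
    rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    conn.close()
    return rows


def semi_structured_records():
    """Semi-structured: self-describing JSON whose fields can differ per record."""
    text = '[{"id": 1, "name": "user_a", "tags": ["hadoop"]}, {"id": 2}]'
    return json.loads(text)


def unstructured_blob():
    """Unstructured: raw bytes with no field layout (stand-in for an image file)."""
    return bytes([0x00, 0x01, 0x02, 0x03])


if __name__ == "__main__":
    print(structured_rows())
    print([sorted(record) for record in semi_structured_records()])
    print(len(unstructured_blob()))
```

The structured rows all share one schema, the JSON records carry their own field names and may omit some, and the byte string has no fields at all until some program interprets it.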
Q. What level does the course content go up to?
This course helps you install Hadoop 3.2.1 yourself on Ubuntu 20.04 LTS. Even without Unix or Linux experience, you can follow along, and you will pick up Linux installation tips and familiarity with the Linux operating system along the way. You will also learn the CLI commands used to work with Hadoop and, beyond the basics, become familiar with the DFS and MapReduce techniques that originated at Google. YARN is covered only at the level of basic theory; a deeper treatment, including setting up a cluster, is planned for a later intermediate course on Hadoop 3.3.0.
Q. Is there a reason for using Ubuntu 20.04 LTS as the practice environment?
Ubuntu is free to use, and its LTS (Long-Term Support) releases suit companies that want long-term support. By installing Hadoop on Linux, you naturally build the kind of operating system and development environment that companies expect. Using Eclipse or IntelliJ in that same environment, you can spend your time working toward your goal of doing data science with big data.
Ubuntu also offers a graphical user interface (GUI) environment similar to the one you use when installing and running the Windows operating system.