hadoop - Queries about YARN (failure modes, container size, practical example)


I want to ask a few questions to understand how YARN works:

  1. Can anyone explain, or refer me to a document on, the failure modes in YARN (i.e. task failure, ApplicationMaster failure, NodeManager failure, ResourceManager failure)?
  2. What is the container size in YARN? Is it the same as a slot in MapReduce 1?
  3. Is there any practical/working example of YARN? Thank you.

Refer to the textbook Hadoop: The Definitive Guide. Apart from that, there is a lot of information on the Apache web site.

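The container size is not fixed. Unlike the fixed map and reduce slots of MapReduce 1, containers are allocated dynamically by the ResourceManager based on the resources the application requests.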
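To make "dynamically allocated" a bit more concrete, here is a minimal sketch of how a memory request might be rounded to an allocation. It assumes the scheduler rounds requests up to a multiple of `yarn.scheduler.minimum-allocation-mb` and refuses anything above `yarn.scheduler.maximum-allocation-mb`; the exact behaviour depends on the scheduler and Hadoop version. The function name `normalize_request_mb` is made up for illustration and is not part of any YARN API.

```python
import math

# Assumed values: they mirror the documented defaults of
# yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb.
MIN_ALLOCATION_MB = 1024
MAX_ALLOCATION_MB = 8192


def normalize_request_mb(requested_mb,
                         minimum_mb=MIN_ALLOCATION_MB,
                         maximum_mb=MAX_ALLOCATION_MB):
    """Round a memory request up to a multiple of the minimum allocation.

    Hypothetical helper: a simplified model, not the scheduler's real code.
    """
    if requested_mb > maximum_mb:
        # Assumption: requests above the maximum are rejected, not capped.
        raise ValueError(
            f"request of {requested_mb} MB exceeds maximum of {maximum_mb} MB")
    multiples = max(1, math.ceil(requested_mb / minimum_mb))
    return multiples * minimum_mb
```

Under these assumptions, a 1500 MB request would be granted as a 2048 MB container, so two applications on the same cluster can end up with containers of different sizes, which is the main difference from an MR1 slot.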

From a developer's perspective, the same old MapReduce programs work on YARN.

ResourceManager failures

In the initial versions of the YARN framework, a ResourceManager failure meant a total cluster failure, because the ResourceManager was a single point of failure. The ResourceManager stores the state of the cluster, such as the metadata of submitted applications, information on cluster resource containers, information on the cluster's general configuration, and so on. Therefore, if the ResourceManager went down because of a hardware failure, there was no way to avoid manually debugging the cluster and restarting the ResourceManager. While the ResourceManager was down, the cluster was unavailable, and once it was restarted, all jobs needed a restart, so half-completed jobs lost their data and had to be run again. In short, a restart of the ResourceManager used to restart all the running ApplicationMasters.

The latest versions of YARN address this problem in two ways. One way is an active-passive ResourceManager architecture, so that when one ResourceManager goes down, another becomes active and takes responsibility for the cluster. The other way is a ZooKeeper ResourceManager quorum, where the ResourceManager state is stored externally on ZooKeeper, one ResourceManager is in the active state, and one or more ResourceManagers are in passive mode, waiting for an event that brings them to the active state.
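The active-passive idea with externally stored state can be sketched roughly as follows. This is only a toy model: `StateStore` stands in for ZooKeeper, and every class and function name here is invented for illustration, not taken from YARN.

```python
class StateStore:
    """Stand-in for an external store such as ZooKeeper (hypothetical)."""

    def __init__(self):
        self._apps = {}

    def save_app(self, app_id, metadata):
        self._apps[app_id] = dict(metadata)

    def load_apps(self):
        return {app_id: dict(meta) for app_id, meta in self._apps.items()}


class ResourceManager:
    """Toy ResourceManager that keeps application metadata in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.active = False
        self.apps = {}

    def become_active(self):
        # Recover cluster state from the external store before serving.
        self.apps = self.store.load_apps()
        self.active = True

    def submit(self, app_id, metadata):
        if not self.active:
            raise RuntimeError(f"{self.name} is in standby mode")
        self.apps[app_id] = dict(metadata)
        self.store.save_app(app_id, metadata)  # persist outside the process


def fail_over(failed, standbys):
    """Mark the failed RM inactive and promote the first standby."""
    failed.active = False
    if not standbys:
        raise RuntimeError("no standby ResourceManager available")
    new_active = standbys[0]
    new_active.become_active()
    return new_active


# Usage: rm2 takes over and still knows about the submitted application.
store = StateStore()
rm1 = ResourceManager("rm1", store)
rm2 = ResourceManager("rm2", store)
rm1.become_active()
rm1.submit("app_1", {"user": "alice"})
new_rm = fail_over(rm1, [rm2])
```

The point of the sketch is that the state lives outside any single ResourceManager process, so the standby can pick it up without the applications being resubmitted.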

ApplicationMaster failures

When the ApplicationMaster fails, the ResourceManager starts a container for a new ApplicationMaster, running in another application attempt. It is the responsibility of the new ApplicationMaster to recover the state of the older ApplicationMaster, and this is possible only when ApplicationMasters persist their state in an external location where it can be used for future reference. An ApplicationMaster can store its state on persistent disk so that the status up to the point of failure can be recovered.
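As a rough illustration of "persist the state so a new attempt can recover it", the sketch below checkpoints the set of completed task IDs to a JSON file and lets a second attempt skip them. The file name, the task IDs and all function names are assumptions for the example; a real ApplicationMaster would normally write to HDFS or another shared location rather than a local temporary directory.

```python
import json
import os
import tempfile


def save_checkpoint(path, completed_tasks):
    """Write completed task IDs to a temp file, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(sorted(completed_tasks), f)
    os.replace(tmp_path, path)  # avoids leaving a half-written checkpoint


def load_checkpoint(path):
    """Return the set of completed task IDs, or an empty set if none saved."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def remaining_tasks(all_tasks, path):
    """Tasks a new application attempt still has to run."""
    done = load_checkpoint(path)
    return [task for task in all_tasks if task not in done]


# Usage: the second attempt skips what the first attempt finished.
with tempfile.TemporaryDirectory() as state_dir:
    state_path = os.path.join(state_dir, "am_state.json")
    save_checkpoint(state_path, {"task_1", "task_2"})  # attempt 1, before failing
    todo = remaining_tasks(["task_1", "task_2", "task_3"], state_path)  # attempt 2
```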

NodeManager failures

If a NodeManager fails, the ResourceManager detects the failure using a time-out (that is, it stops receiving heartbeats from the NodeManager). The ResourceManager then removes the NodeManager from its pool of available NodeManagers. It also kills all the containers running on that node and reports the failure to all running AMs. The AMs are then responsible for reacting to the node failure by redoing the work done by any containers that were running on that node during the fault.
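The heartbeat time-out can be sketched like this. The 600-second value mirrors the documented default of `yarn.nm.liveness-monitor.expiry-interval-ms` (600000 ms); the `NodeTracker` class and its methods are hypothetical and only model the bookkeeping described above.

```python
# Assumed expiry interval, mirroring the default of
# yarn.nm.liveness-monitor.expiry-interval-ms (600000 ms).
EXPIRY_INTERVAL_S = 600


class NodeTracker:
    """Toy model of the ResourceManager's view of NodeManager liveness."""

    def __init__(self, expiry_s=EXPIRY_INTERVAL_S):
        self.expiry_s = expiry_s
        self.last_heartbeat = {}  # node -> time of last heartbeat (seconds)
        self.containers = {}      # node -> set of (app_id, container_id)

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def launch(self, node, app_id, container_id):
        self.containers.setdefault(node, set()).add((app_id, container_id))

    def expire_nodes(self, now):
        """Drop timed-out nodes and return {app_id: [lost container IDs]}.

        The returned mapping is what would be reported to each AM.
        """
        lost = {}
        for node, seen in list(self.last_heartbeat.items()):
            if now - seen > self.expiry_s:
                del self.last_heartbeat[node]  # remove from the available pool
                for app_id, cid in sorted(self.containers.pop(node, set())):
                    lost.setdefault(app_id, []).append(cid)
        return lost
```

For example, with `expiry_s=10`, a node whose last heartbeat was at time 0 is dropped by `expire_nodes(now=12)`, and the containers it hosted are grouped per application so each AM can reschedule its own work.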

Container failures

Container failures are reported by the NodeManager to the ResourceManager, and the ResourceManager passes the same information on to the ApplicationMaster. The application can then restart the work in a new container.
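How an application reacts is up to the ApplicationMaster. One common policy is to retry the failed work a bounded number of times; the sketch below assumes a limit of 4 attempts, which matches the default of `mapreduce.map.maxattempts`. The function `run_with_retries` and the `flaky` task are invented for the example.

```python
def run_with_retries(task, max_attempts=4):
    """Run task(attempt) until it succeeds or the attempts are used up.

    Returns (result, attempts_used). Hypothetical helper, not a YARN API.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except Exception as exc:  # treat any exception as a container failure
            last_error = exc
    raise RuntimeError(
        f"task failed after {max_attempts} attempts") from last_error


def flaky(attempt):
    """Example task that fails in its first two containers."""
    if attempt < 3:
        raise RuntimeError("container exited with non-zero status")
    return "ok"
```

Calling `run_with_retries(flaky)` reruns the task until the third attempt succeeds. A task that keeps failing raises after the fourth attempt, at which point a real application would typically mark the whole job as failed.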

