
Could not parse Master URL:

I ran into the "Could not parse Master URL" problem. Most of the posts I found on Baidu said the master could not be resolved and the hosts file needed to be configured, but after checking for a long time I still could not find the cause. In the end I read the official documentation, tried its submit method, and it worked.


The error message:
17/05/22 11:23:12 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Could not parse Master URL: 'Master:8088'
org.apache.spark.SparkException: Could not parse Master URL: 'Master:8088'
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2550)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
	at app$.main(app.scala:6)
	at app.main(app.scala)

The fix suggested by what I found online:

Specifically: val conf = new SparkConf().setAppName("AppName").setMaster("local[3] or masterIP:PORT")
val sc = new SparkContext(conf)
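
That advice matches what SparkContext actually accepts: the master must be one of a handful of URL schemes, not a bare hostname, and not the YARN ResourceManager web UI address (port 8088). A minimal sketch of forms that do parse (the hostname node1 is just a placeholder):

import org.apache.spark.SparkConf

// Master URLs that SparkContext can parse ("node1" is a placeholder host):
val localConf      = new SparkConf().setAppName("demo").setMaster("local[3]")            // run locally with 3 threads
val standaloneConf = new SparkConf().setAppName("demo").setMaster("spark://node1:7077")  // standalone master, default port 7077
val yarnConf       = new SparkConf().setAppName("demo").setMaster("yarn")                // YARN; addresses come from the Hadoop config

Something like "Master" or "Master:8088" matches none of these schemes, which is exactly what the exception complains about.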

My code was as follows:

import org.apache.spark.{SparkConf, SparkContext}
object app{
  def main(args: Array[String]): Unit = {
    val logFile = "/user/root/README.md"
    val conf = new SparkConf().setAppName("Simple Application").setMaster("Master")  // "Master" is just a hostname, not a recognized master URL scheme
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

With or without a port it kept throwing the same Could not parse Master URL error, which was maddening; switching to several other YARN ports did not help either. So I went back to the official Spark documentation:

To launch a Spark application in cluster mode:

$ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

For example:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10

Seeing that --master was set to yarn in the docs, I wondered whether the master could simply be picked up automatically, so I removed the line

.setMaster("Master")

from the code and rebuilt the jar.
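
For reference, the change amounts to roughly this: SparkConf now sets only the application name, and the master is supplied by spark-submit at launch time.

val conf = new SparkConf().setAppName("Simple Application")  // no setMaster: --master yarn provides it
val sc = new SparkContext(conf)

Then I submitted it again: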

[root@python-vim spark]# ./bin/spark-submit --class app    --master yarn     --deploy-mode cluster     --driver-memory 2g     --executor-memory 1g     --executor-cores 1     /home/MySpark/myfirt.jar

It worked. Here is the output:

17/10/19 23:32:42 INFO yarn.Client: Application report for application_1508404051407_0009 (state: ACCEPTED)
17/10/19 23:32:43 INFO yarn.Client: Application report for application_1508404051407_0009 (state: ACCEPTED)
17/10/19 23:32:44 INFO yarn.Client: Application report for application_1508404051407_0009 (state: ACCEPTED)
17/10/19 23:32:45 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:45 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.134.100
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1508427160526
         final status: UNDEFINED
         tracking URL: http://Master:8088/proxy/application_1508404051407_0009/
         user: root
17/10/19 23:32:46 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:47 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:48 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:49 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:50 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:51 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:52 INFO yarn.Client: Application report for application_1508404051407_0009 (state: RUNNING)
17/10/19 23:32:53 INFO yarn.Client: Application report for application_1508404051407_0009 (state: FINISHED)
17/10/19 23:32:53 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.134.100
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1508427160526
         final status: SUCCEEDED
         tracking URL: http://Master:8088/proxy/application_1508404051407_0009/
         user: root
17/10/19 23:32:53 INFO yarn.Client: Deleted staging directory hdfs://Master:9000/user/root/.sparkStaging/application_1508404051407_0009
17/10/19 23:32:53 INFO util.ShutdownHookManager: Shutdown hook called
17/10/19 23:32:53 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-404737d2-618e-4881-b738-b3abbbb5a756

--------------------------------------------------------------------------------
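
(Below is the corresponding ApplicationMaster log. With --deploy-mode cluster the driver runs inside the YARN ApplicationMaster, so the program's own println output ends up in that container's log, which can also be fetched with yarn logs -applicationId <application id>, rather than on the client console above.)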

17/05/24 01:43:43 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
17/05/24 01:43:43 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/24 01:43:43 INFO memory.MemoryStore: MemoryStore cleared
17/05/24 01:43:43 INFO storage.BlockManager: BlockManager stopped
17/05/24 01:43:43 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/05/24 01:43:43 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/24 01:43:43 INFO spark.SparkContext: Successfully stopped SparkContext
17/05/24 01:43:43 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/05/24 01:43:43 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/05/24 01:43:43 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://Master:9000/user/root/.sparkStaging/application_1508404051407_0009
17/05/24 01:43:43 INFO util.ShutdownHookManager: Shutdown hook called
17/05/24 01:43:43 INFO util.ShutdownHookManager: Deleting directory /usr/local/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1508404051407_0009/spark-0d6d0c34-3e10-4139-aa22-a6875b0b0dd6
Lines with a: 62, Lines with b: 30

One thing worth pointing out: the

logFile = "/user/root/README.md"

here is a path in HDFS, not on the local machine; otherwise the run fails because the file cannot be found (see the sketch below). That's it for now, time to keep studying. It is probably more efficient to get Scala down properly first and then read the source code; you cannot get there in one bite.
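
To make this concrete, a small sketch (the file:// path below is only illustrative):

// A bare path is resolved against the cluster's default filesystem (HDFS here);
// upload the file first, e.g. with: hdfs dfs -put README.md /user/root/
val hdfsData = sc.textFile("/user/root/README.md")

// Reading from the local disk needs an explicit file:// URI, and in cluster mode
// the file would have to exist on every executor node
val localData = sc.textFile("file:///home/MySpark/README.md")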

The following piece of Python, by contrast, runs fine. A small dig at Java:

from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    conf = SparkConf().setAppName("MyFirstApp").setMaster("master")
    sc = SparkContext(conf=conf)
    rizhiRDD = sc.textFile("/user/root/README.md")
    filterRDD = rizhiRDD.filter(lambda x: "w" in x)
    print(filterRDD.count())