01. Create a Maven project
    Maven configuration: add the Spark dependency
        groupId    = org.apache.spark
        artifactId = spark-core_2.11
        version    = 2.2.0
    (a pom.xml sketch follows at the end of this section)

02. Configuration and connection

03. Build a Dataset/DataFrame, load the data, and compute

    Option 1: use a Dataset or a DataFrame

        import java.io.File
        import org.apache.spark.sql.Row
        import org.apache.spark.sql.SparkSession

        // getAbsolutePath(): returns the absolute pathname string of the abstract pathname
        val warehouseLocation = new File("spark-warehouse").getAbsolutePath

        val spark = SparkSession.builder()
          .master("local")
          .appName("Word Count")
          .config("spark.sql.warehouse.dir", warehouseLocation)
          .enableHiveSupport()
          .getOrCreate()
        import spark.sql

        // read data from Hive
        sql("use testdatabase")
        sql("SELECT * FROM src").show()

        // read a JSON file into a DataFrame
        val df = spark.read.json("examples/src/main/resources/people.json")
        import spark.implicits._
        df.show()
        df.printSchema()

        // register a temporary view and query it with SQL
        df.createOrReplaceTempView("people")
        val sqlDF = spark.sql("SELECT * FROM people")
        sqlDF.show()

        // a global temporary view is shared across sessions, under global_temp
        df.createGlobalTempView("people")
        spark.sql("SELECT * FROM global_temp.people").show()

    Option 2: use an RDD (a word-count sketch follows at the end of this section)

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf().setAppName("Word Count").setMaster("mr.master")
        val sc = new SparkContext(conf)

04. Package the program (manual mode in IntelliJ IDEA)
    Step 1> File ---> Project Structure ---> Artifacts ---> + <playDayCount>
    Step 2> Build ---> Build Artifacts
    The jar Spark_Order.jar appears under C:\Items\Spark_Odr\out\artifacts\Spark_Odr_jar

05. Upload the jar and run it
    001. Upload the jar: Spark_Order.jar
    002. Run on the cluster: submit the application with the bin/spark-submit script

        spark-submit \
          --num-executors 4 \
          --executor-cores 1 \
          --executor-memory 4G \
          --master yarn \
          --deploy-mode cluster \
          --queue qmc \
          --class com.text.Score \
          /opt/clrDir/Spark_test.jar
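For step 01, the dependency coordinates translate into a pom.xml block like the sketch below. The notes list only spark-core, but the step-03 code also uses SparkSession/DataFrames and enableHiveSupport(), so the spark-sql_2.11 and spark-hive_2.11 entries here are assumptions added to make that code compile:

        <!-- sketch: spark-sql and spark-hive are assumptions needed by the
             step-03 code; only spark-core is listed in the notes -->
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.2.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>
                <version>2.2.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_2.11</artifactId>
                <version>2.2.0</version>
            </dependency>
        </dependencies>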
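Option 2 in step 03 only creates the SparkContext. A minimal end-to-end RDD word count, assuming a local[*] master and a hypothetical input file input.txt (both placeholders, not from the notes), could look like this:

        import org.apache.spark.{SparkConf, SparkContext}

        object WordCount {
          def main(args: Array[String]): Unit = {
            // local[*] and input.txt are illustrative placeholders
            val conf = new SparkConf().setAppName("Word Count").setMaster("local[*]")
            val sc = new SparkContext(conf)

            val counts = sc.textFile("input.txt")   // RDD[String], one element per line
              .flatMap(_.split("\\s+"))             // split each line into words
              .map(word => (word, 1))               // pair each word with a count of 1
              .reduceByKey(_ + _)                   // sum the counts per word

            counts.take(20).foreach(println)        // print a small sample of results
            sc.stop()
          }
        }

When packaged as in step 04, an object like this would be the one passed to spark-submit via --class.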
References:
Fix for Hive failing to access spark-assembly-*.jar when starting Spark 2.0.0
http://blog.csdn.net/wjqwinn/article/details/52692308
Spark SQL, DataFrames and Datasets Guide
http://spark.apache.org/docs/2.2.0/sql-programming-guide.html