Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


0 votes
161 views
in Technique by (71.8m points)

apache spark sql - How do I select columns from the below schema?

I read a JSON file and registered a temporary table with the schema below (inferred from the JSON file by Spark SQL's native schema inference).

babynames = spark.read.json('/path/to/json', multiLine=True)
babynames.registerTempTable("babynames")

Now I would like to select the columns

"sid", "id", "position", "created_at", "created_meta", "updated_at", "updated_meta", "meta", "year", "first_name", "county", "sex", "count"

using Spark SQL select statement.

Here is the data source: https://data.cityofnewyork.us/api/views/25th-nujf/rows.json?accessType=DOWNLOAD



1 Answer

0 votes
by (71.8m points)

Once you have the JSON file at a specific location, you can read the columns as shown below, but you need to understand how the JSON elements are nested: in this dataset the requested fields are descriptors inside the meta.view.columns array, not top-level columns.

Using Spark SQL:

val df = spark.read.option("multiline",true).json("/path/to/json")
df.createOrReplaceTempView("TestTable")
val selectedColumnsDf = spark.sql("""SELECT meta.view.columns.id, meta.view.columns.position, meta.view.createdAt FROM TestTable""")

Using the DataFrame API, it can be done as below:

val df = spark.read.option("multiline",true).json("/path/to/json")
val selectedColumnsDf = df.select("meta.view.columns.id", "meta.view.columns.position", "meta.view.createdAt")

I am selecting just three columns to give you the idea; you can add the remaining columns as per your requirement.

