
PySpark: How To Convert Column With Ljava.lang.Object

I created a data frame in PySpark by reading data from HDFS like this: df = spark.read.parquet('path/to/parquet'). I expect the data frame to have two columns of strings: +-----------

Solution 1:

Jaroslav,

I tried the following code with a sample Parquet file (userdata1.parquet) and was able to get the desired output from the data frame. Could you please check your code against the snippet below, using the same sample file, to see whether there is any other issue?

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Read a Parquet file").getOrCreate()
df = spark.read.parquet('E:\\...\\..\\userdata1.parquet')
df.show(10)
df.printSchema()

Replace the path with your HDFS location.
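As a rough illustration of swapping the local path for an HDFS one, here is a minimal sketch. The helper `hdfs_parquet_uri`, the host `namenode.example.com`, and the port `8020` are assumptions made up for this example; use whatever your cluster actually exposes. Only `spark.read.parquet`, `show`, and `printSchema` come from the snippet above.

```python
def hdfs_parquet_uri(namenode_host, port, path):
    """Build an hdfs:// URI (hypothetical helper, not part of PySpark)."""
    return f"hdfs://{namenode_host}:{port}/{path.lstrip('/')}"


def read_parquet(spark, uri):
    """Read a Parquet dataset using an existing SparkSession."""
    return spark.read.parquet(uri)


def main():
    # Imported here so the URI helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Read a Parquet file").getOrCreate()
    # Placeholder host, port, and path: substitute your cluster's values.
    uri = hdfs_parquet_uri("namenode.example.com", 8020, "/path/to/parquet")
    df = read_parquet(spark, uri)
    df.show(10)
    df.printSchema()
```

Call `main()` on a machine where PySpark and the cluster are reachable. If the default file system is already configured for HDFS, a plain path such as '/path/to/parquet' may be enough.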

Dataframe output for your reference:

[Image: screenshot of the data frame output]
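Since the question expects string columns, it may also help to check the column types in code rather than by eye. The sketch below is only an assumption about how one might do that: `non_string_columns` and `cast_to_string` are hypothetical helpers, and whether a cast is the right fix depends on what type the column really has.

```python
def non_string_columns(dtypes):
    """Return names of columns whose type is not 'string'.

    `dtypes` is a list of (name, type) pairs, the shape returned by df.dtypes.
    """
    return [name for name, dtype in dtypes if dtype != "string"]


def cast_to_string(df, names):
    """Cast the named columns to string (may not suit every source type)."""
    for name in names:
        df = df.withColumn(name, df[name].cast("string"))
    return df
```

With a data frame `df` loaded as in the snippet above, `non_string_columns(df.dtypes)` lists the columns to look at, and `cast_to_string(df, ...)` can then be tried on them.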
