Showing posts with the label Pyspark

Pivot Row To Column Level

I have a spark dataframe t which is the result of a spark.sql('...') query. Here is the fir…

Flatten Nested Array In Spark Dataframe

I'm reading in some JSON of the form: {'a': [{'b': {'c': 1, 'd'…

Unable To Write Pyspark Dataframe Created From Two Zipped Dataframes

I am trying to follow the example given here for combining two dataframes without a shared join key…

GCP Dataproc Custom Image Python Environment

I have an issue when I create a Dataproc custom image and PySpark. My custom image is based on Data…
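The question is truncated, but a frequent cause of this class of problem is that Dataproc's base image ships its own Python, so an interpreter installed by the custom image is only used if Spark is pointed at it. A hedged sketch, with all paths and names illustrative, using cluster properties at creation time:

```shell
# Hypothetical: the custom image installs Python at /opt/conda/bin/python.
# Tell both driver and executors to use it via Spark properties.
gcloud dataproc clusters create my-cluster \
    --image=projects/my-project/global/images/my-custom-image \
    --properties="spark:spark.pyspark.python=/opt/conda/bin/python,spark:spark.pyspark.driver.python=/opt/conda/bin/python"
```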

Spark-submit With Specific Python Libraries

I have PySpark code depending on third-party libraries. I want to execute this code on my cluste…
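One standard way to ship pure-Python dependencies with a job is `spark-submit --py-files`. A sketch, with package and file names purely illustrative:

```shell
# Bundle third-party dependencies into a zip (example package name only)
pip install -t deps/ requests
(cd deps && zip -r ../deps.zip .)

# --py-files distributes the archive onto every executor's PYTHONPATH
spark-submit --py-files deps.zip my_job.py
```

Note this works for pure-Python packages; libraries with compiled extensions generally need to be installed on the cluster nodes themselves.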

Selecting Empty Array Values From A Spark Dataframe

Given a DataFrame with the following rows: rows = [ Row(col1='abc', col2=[8], col3=[18]…

Get Value Out Of Dataframe

In Scala I can do get(#) or getAs[Type](#) to get values out of a dataframe. How should I do it in …

How To Use Pandas UDF Functionality In Pyspark

I have a Spark dataframe with two columns which looks like: +------------------------------------------…