Showing posts with the label Apache Spark

Flatten Nested Array In Spark Dataframe

I'm reading in some JSON of the form: {'a': [{'b': {'c': 1, 'd'…

Unable To Write PySpark DataFrame Created From Two Zipped DataFrames

I am trying to follow the example given here for combining two dataframes without a shared join key…

SparkException: Only One SparkContext May Be Running In This JVM (see SPARK-2243)

I see several posts that contain the same error I am receiving, but none are leadi…

Selecting Empty Array Values From A Spark Dataframe

Given a DataFrame with the following rows: rows = [ Row(col1='abc', col2=[8], col3=[18]…

Elasticsearch Analyze() Not Compatible With Spark In Python?

I'm using the elasticsearch-py client within PySpark using Python 3 and I'm running into a …

Wrapping PySpark Pipeline.__init__ And Decorators

I am trying to wrap the pyspark Pipeline.__init__ constructor and monkey patch in the …

AssertionError: Col Should Be Column

How to create a new column in PySpark and fill this column with the date of today? This is what I t…

Splitting A Column In PySpark

I am trying to split a dataframe in pyspark. This is the data I have: df = sc.parallelize([[1, '…