PySpark: Creating DataFrame with one column - TypeError: Can not infer schema for type: <type 'int'>
I’ve been playing with PySpark recently and wanted to create a DataFrame containing only one column. I tried to do this with the following code:
spark.createDataFrame([(1)], ["count"])
If we run that code, we’ll get the following error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/session.py", line 748, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/session.py", line 416, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/session.py", line 348, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/session.py", line 348, in <genexpr>
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/types.py", line 1062, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <type 'int'>
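The problem is that (1) isn't a tuple. In Python, parentheses on their own only group an expression, so (1) evaluates to the plain integer 1. createDataFrame expects each element of the list to be a row, such as a tuple, and we’ve given it an integer.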
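We can confirm this in plain Python without Spark. The following is a small sketch run under Python 3, which prints <class 'int'> where the Python 2 traceback above shows <type 'int'>:

```python
# (1) is just the integer 1 inside grouping parentheses;
# the trailing comma is what makes a one-element tuple.
not_a_tuple = (1)
one_tuple = (1,)

print(type(not_a_tuple))  # <class 'int'>
print(type(one_tuple))    # <class 'tuple'>

# So the list we passed to createDataFrame held a bare int, not a row.
print([(1)] == [1])       # True
```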
Luckily, we can fix this easily by passing in a single-item tuple, which needs a trailing comma:
spark.createDataFrame([(1,)], ["count"])
If we run that code, we’ll get the expected DataFrame:
| count |
|-------|
| 1     |
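The same trick extends to more than one value. The following is a minimal sketch that wraps each item of a flat Python list in its own one-element tuple before it is handed to createDataFrame. The helper name as_single_column_rows is hypothetical and is not part of PySpark:

```python
def as_single_column_rows(values):
    """Hypothetical helper: wrap each scalar in a one-element tuple.

    createDataFrame treats each list element as a row, so a flat list of
    scalars like [1, 2, 3] needs converting to [(1,), (2,), (3,)] first.
    """
    return [(v,) for v in values]


rows = as_single_column_rows([1, 2, 3])
print(rows)  # [(1,), (2,), (3,)]
```

This assumes an existing spark session, as in the examples above. With that session, spark.createDataFrame(as_single_column_rows([1, 2, 3]), ["count"]) should give a one-column DataFrame with three rows. A plain list comprehension keeps the conversion explicit, so it is obvious that each value becomes its own row.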
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.