pyspark.sql.DataFrameReader.jdbc

DataFrameReader.jdbc(url, table, column=None, lowerBound=None, upperBound=None, numPartitions=None, predicates=None, properties=None)

Construct a DataFrame representing the database table named table, accessible via JDBC URL url and connection properties.

Partitions of the table will be retrieved in parallel if either column or predicates is specified. lowerBound, upperBound and numPartitions are needed when column is specified. If both column and predicates are specified, column will be used.

New in version 1.4.0.
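For instance, a minimal single-partition read might look like the sketch below. The URL, table name, and credentials are placeholders for illustration, and an appropriate JDBC driver is assumed to be available to the cluster:

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> # Neither column nor predicates is given, so the whole table is
>>> # fetched through a single JDBC connection (one partition).
>>> df = spark.read.jdbc(
...     url="jdbc:postgresql://localhost:5432/mydb",
...     table="public.employees",
...     properties={"user": "SYSTEM", "password": "mypassword"},
... )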
Parameters

url : str
    a JDBC URL of the form jdbc:subprotocol:subname
table : str
    the name of the table
column : str, optional
    the name of a column of numeric, date, or timestamp type that will be used for partitioning; if this parameter is specified, then numPartitions, lowerBound (inclusive), and upperBound (exclusive) will form partition strides for generated WHERE clause expressions used to split the column column evenly (see the partitioned-read sketch after this list)
lowerBound : str or int, optional
    the minimum value of column used to decide partition stride
upperBound : str or int, optional
    the maximum value of column used to decide partition stride
numPartitions : int, optional
    the number of partitions
predicates : list, optional
    a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame (see the sketch after the Notes section)
properties : dict, optional
    a dictionary of JDBC database connection arguments, normally at least the properties "user" and "password" with their corresponding values, for example {'user': 'SYSTEM', 'password': 'mypassword'}
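As a sketch of column-based partitioning (same placeholder connection details as above, with a hypothetical numeric id column): the bounds and partition count are turned into stride expressions, and the bounds only shape the strides rather than filter rows, so values outside them still land in the first or last partition.

>>> # Four partitions over the hypothetical "id" column; Spark generates
>>> # WHERE clauses roughly like: id < 2500, 2500 <= id < 5000, ...
>>> df = spark.read.jdbc(
...     url="jdbc:postgresql://localhost:5432/mydb",
...     table="public.employees",
...     column="id",
...     lowerBound=0,
...     upperBound=10000,
...     numPartitions=4,
...     properties={"user": "SYSTEM", "password": "mypassword"},
... )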
Returns

DataFrame
    a DataFrame representing the table, partitioned as described above
Notes
Don’t create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
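Alternatively, a sketch of the predicates form (same placeholder details, with a hypothetical region column): each expression becomes exactly one partition's WHERE clause, which also makes it easy to keep the partition count modest, as the note above advises.

>>> # One partition per predicate: four WHERE clauses, four parallel reads.
>>> predicates = [
...     "region = 'NORTH'",
...     "region = 'SOUTH'",
...     "region = 'EAST'",
...     "region = 'WEST'",
... ]
>>> df = spark.read.jdbc(
...     url="jdbc:postgresql://localhost:5432/mydb",
...     table="public.employees",
...     predicates=predicates,
...     properties={"user": "SYSTEM", "password": "mypassword"},
... )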