IO interfaces. More...

Classes
class	arrow_io_source
	Implementation class for reading from an Apache Arrow file. The file could be a memory-mapped file or other implementation supported by Arrow. More...

class	avro_reader_options
	Settings to use for `read_avro()`. More...

class	avro_reader_options_builder
	Builder to build options for `read_avro()`. More...

class	csv_reader_options
	Settings to use for `read_csv()`. More...

class	csv_reader_options_builder
	Builder to build options for `read_csv()`. More...

class	csv_writer_options
	Settings to use for `write_csv()`. More...

class	csv_writer_options_builder
	Builder to build options for `write_csv()`. More...

class	data_sink
	Interface class for storing the output data from the writers. More...

class	datasource
	Interface class for providing input data to the readers. More...

struct	schema_element
	Allows specifying the target types for nested JSON data via json_reader_options' `set_dtypes` method. More...

class	json_reader_options
	Input arguments to the `read_json` interface. More...

class	json_reader_options_builder
	Builds settings to use for `read_json()`. More...

class	json_writer_options
	Settings to use for `write_json()`. More...

class	json_writer_options_builder
	Builder to build options for `write_json()`. More...

class	orc_reader_options
	Settings to use for `read_orc()`. More...

class	orc_reader_options_builder
	Builds settings to use for `read_orc()`. More...

class	orc_writer_options
	Settings to use for `write_orc()`. More...

class	orc_writer_options_builder
	Builds settings to use for `write_orc()`. More...

class	chunked_orc_writer_options
	Settings to use for `write_orc_chunked()`. More...

class	chunked_orc_writer_options_builder
	Builds settings to use for `write_orc_chunked()`. More...

class	orc_chunked_writer
	Chunked ORC writer class that writes an ORC file in chunked/streamed form. More...

struct	raw_orc_statistics
	Holds column names and buffers containing raw file-level and stripe-level statistics. More...

struct	minmax_statistics
	Base class for column statistics that include optional minimum and maximum. More...

struct	sum_statistics
	Base class for column statistics that include an optional sum. More...

struct	integer_statistics
	Statistics for integral columns. More...

struct	double_statistics
	Statistics for floating point columns. More...

struct	string_statistics
	Statistics for string columns. More...

struct	bucket_statistics
	Statistics for boolean columns. More...

struct	decimal_statistics
	Statistics for decimal columns. More...

struct	timestamp_statistics
	Statistics for timestamp columns. More...

struct	column_statistics
	Contains per-column ORC statistics. More...

struct	parsed_orc_statistics
	Holds column names and parsed file-level and stripe-level statistics. More...

struct	orc_column_schema
	Schema of an ORC column, including the nested columns. More...

struct	orc_schema
	Schema of an ORC file. More...

class	orc_metadata
	Information about content of an ORC file. More...

class	parquet_reader_options
	Settings for `read_parquet()`. More...

class	parquet_reader_options_builder
	Builds parquet_reader_options to use for `read_parquet()`. More...

class	chunked_parquet_reader
	Chunked Parquet reader class that reads a Parquet file iteratively into a series of tables, chunk by chunk. More...

class	parquet_writer_options
	Settings for `write_parquet()`. More...

class	parquet_writer_options_builder
	Class to build `parquet_writer_options`. More...

class	chunked_parquet_writer_options
	Settings for `write_parquet_chunked()`. More...

class	chunked_parquet_writer_options_builder
	Builds options for chunked_parquet_writer_options. More...

class	parquet_chunked_writer
	Chunked Parquet writer class that handles options and writes tables in chunks. More...

struct	parquet_column_schema
	Schema of a parquet column, including the nested columns. More...

struct	parquet_schema
	Schema of a parquet file. More...

class	parquet_metadata
	Information about content of a parquet file. More...

class	writer_compression_statistics
	Statistics about compression performed by a writer. More...

struct	column_name_info
	Detailed name (and optionally nullability) information for output columns. More...

struct	table_metadata
	Table metadata returned by IO readers. More...

struct	table_with_metadata
	Table with table metadata, used by IO readers to return the metadata by value. More...

struct	host_buffer
	Non-owning view of a host memory buffer. More...

struct	source_info
	Source information for read interfaces. More...

struct	sink_info
	Destination information for write interfaces. More...

class	column_in_metadata
	Metadata for a column. More...

class	table_input_metadata
	Metadata for a table. More...

struct	partition_info
	Information used while writing partitioned datasets. More...

class	reader_column_schema
	Schema element for the reader. More...

Typedefs
using	no_statistics = std::monostate
	Monostate type alias for the statistics variant.

using	date_statistics = minmax_statistics< int32_t >
	Statistics for date(time) columns.

using	binary_statistics = sum_statistics< int64_t >
	Statistics for binary columns. More...

Enumerations
enum class	json_recovery_mode_t { FAIL , RECOVER_WITH_NULL }
	Control the error recovery behavior of the json parser. More...

enum class	compression_type { NONE , AUTO , SNAPPY , GZIP , BZIP2 , BROTLI , ZIP , XZ , ZLIB , LZ4 , LZO , ZSTD }
	Compression algorithms. More...

enum class	io_type { FILEPATH , HOST_BUFFER , DEVICE_BUFFER , VOID , USER_IMPLEMENTED }
	Data source or destination types. More...

enum class	quote_style { MINIMAL , ALL , NONNUMERIC , NONE }
	Behavior when handling quotations in field data. More...

enum	statistics_freq { STATISTICS_NONE = 0 , STATISTICS_ROWGROUP = 1 , STATISTICS_PAGE = 2 , STATISTICS_COLUMN = 3 }
	Column statistics granularity type for parquet/orc writers. More...

enum	dictionary_policy { NEVER = 0 , ADAPTIVE = 1 , ALWAYS = 2 }
	Control use of dictionary encoding for parquet writer. More...

Functions
table_with_metadata	read_avro (avro_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Reads an Avro dataset into a set of columns. More...

table_with_metadata	read_csv (csv_reader_options options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Reads a CSV dataset into a set of columns. More...

void	write_csv (csv_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Writes a set of columns to CSV format. More...

table_with_metadata	read_json (json_reader_options options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Reads a JSON dataset into a set of columns. More...

void	write_json (json_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Writes a set of columns to JSON format. More...

table_with_metadata	read_orc (orc_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Reads an ORC dataset into a set of columns. More...

void	write_orc (orc_writer_options const &options)
	Writes a set of columns to ORC format. More...

raw_orc_statistics	read_raw_orc_statistics (source_info const &src_info)
	Reads file-level and stripe-level statistics of ORC dataset. More...

parsed_orc_statistics	read_parsed_orc_statistics (source_info const &src_info)
	Reads file-level and stripe-level statistics of ORC dataset. More...

orc_metadata	read_orc_metadata (source_info const &src_info)
	Reads metadata of ORC dataset. More...

table_with_metadata	read_parquet (parquet_reader_options const &options, rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
	Reads a Parquet dataset into a set of columns. More...

std::unique_ptr< std::vector< uint8_t > >	write_parquet (parquet_writer_options const &options)
	Writes a set of columns to parquet format. More...

std::unique_ptr< std::vector< uint8_t > >	merge_row_group_metadata (std::vector< std::unique_ptr< std::vector< uint8_t >>> const &metadata_list)
	Merges multiple raw metadata blobs that were previously created by write_parquet into a single metadata blob. More...

parquet_metadata	read_parquet_metadata (source_info const &src_info)
	Reads metadata of parquet dataset. More...

template<typename T >
constexpr auto	is_byte_like_type ()
	Returns `true` if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes. More...

Variables
constexpr size_t	default_stripe_size_bytes = 64 * 1024 * 1024
	64MB default orc stripe size

constexpr size_type	default_stripe_size_rows = 1000000
	1M rows default orc stripe rows

constexpr size_type	default_row_index_stride = 10000
	10K rows default orc row index stride

constexpr size_t	default_row_group_size_bytes = 128 * 1024 * 1024
	128MB per row group

constexpr size_type	default_row_group_size_rows = 1000000
	1 million rows per row group

constexpr size_t	default_max_page_size_bytes = 512 * 1024
	512KB per page

constexpr size_type	default_max_page_size_rows = 20000
	20k rows per page

constexpr int32_t	default_column_index_truncate_length = 64
	truncate to 64 bytes

constexpr size_t	default_max_dictionary_size = 1024 * 1024
	1MB dictionary size

constexpr size_type	default_max_page_fragment_size = 5000
	5000 rows per page fragment
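
The defaults above are plain arithmetic expressions. As a rough, standalone illustration, the sketch below shows the byte values some of them evaluate to and a lower bound on how many stripes or row groups a table needs when only the row limit applies. The constants are copied into a local `sketch` namespace rather than taken from the cudf headers, `size_type` is assumed to be a 32-bit signed integer, and `min_units_for_rows` is a hypothetical helper that is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Assumption: stands in for cudf's size_type, taken here to be a 32-bit signed integer.
using size_type = std::int32_t;

// Local copies of some defaults listed above (not the library's own definitions).
constexpr std::size_t default_stripe_size_bytes    = 64 * 1024 * 1024;
constexpr size_type default_stripe_size_rows       = 1000000;
constexpr std::size_t default_row_group_size_bytes = 128 * 1024 * 1024;
constexpr size_type default_row_group_size_rows    = 1000000;
constexpr std::size_t default_max_page_size_bytes  = 512 * 1024;

// The expressions evaluate to these byte counts.
static_assert(default_stripe_size_bytes == 67108864, "64 * 1024 * 1024");
static_assert(default_row_group_size_bytes == 134217728, "128 * 1024 * 1024");
static_assert(default_max_page_size_bytes == 524288, "512 * 1024");

// Hypothetical helper: minimum number of stripes or row groups needed for
// `num_rows` rows when only the row limit is considered (ceiling division).
// A real writer may also split on the byte limit, so this is only a lower bound.
inline std::int64_t min_units_for_rows(std::int64_t num_rows, size_type rows_per_unit)
{
  assert(num_rows >= 0 && rows_per_unit > 0);
  return (num_rows + rows_per_unit - 1) / rows_per_unit;
}

}  // namespace sketch
```

Under these assumptions, a table of 2,500,000 rows needs at least three row groups at the default of 1,000,000 rows per row group.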

Detailed Description

IO interfaces.

Typedef Documentation

◆ binary_statistics

using cudf::io::binary_statistics = sum_statistics<int64_t>

Statistics for binary columns.

The sum is the total number of bytes across all elements.

Definition at line 135 of file orc_metadata.hpp.
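
To make the meaning of the sum concrete, here is a minimal standalone sketch. `sum_statistics_sketch`, `binary_statistics_sketch`, and `binary_sum` are hypothetical names, and modelling the optional sum as `std::optional` is an assumption based on the summary of `sum_statistics` above. None of this is the library's definition.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for sum_statistics<T>: statistics that carry an optional sum.
template <typename T>
struct sum_statistics_sketch {
  std::optional<T> sum;
};

using binary_statistics_sketch = sum_statistics_sketch<std::int64_t>;

// For a binary column, the sum is the total number of bytes across all elements.
inline binary_statistics_sketch binary_sum(std::vector<std::string> const& elements)
{
  std::int64_t total = 0;
  for (auto const& e : elements) {
    total += static_cast<std::int64_t>(e.size());
  }
  return binary_statistics_sketch{total};
}
```

For example, elements of 2, 0, and 3 bytes give a sum of 5 in this sketch.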

Enumeration Type Documentation

◆ compression_type

enum cudf::io::compression_type

Compression algorithms.

Enumerator
NONE	No compression.
AUTO	Automatically detect or select compression format.
SNAPPY	Snappy format, using byte-oriented LZ77.
GZIP	GZIP format, using DEFLATE algorithm.
BZIP2	BZIP2 format, using Burrows-Wheeler transform.
BROTLI	BROTLI format, using LZ77 + Huffman + 2nd order context modeling.
ZIP	ZIP format, using DEFLATE algorithm.
XZ	XZ format, using LZMA(2) algorithm.
ZLIB	ZLIB format, using DEFLATE algorithm.
LZ4	LZ4 format, using LZ77.
LZO	Lempel–Ziv–Oberhumer format.
ZSTD	Zstandard format.

Definition at line 50 of file io/types.hpp.

◆ dictionary_policy

enum cudf::io::dictionary_policy

Control use of dictionary encoding for parquet writer.

Enumerator
NEVER	Never use dictionary encoding.
ADAPTIVE	Use dictionary when it will not impact compression.
ALWAYS	Use dictionary regardless of impact on compression.

Definition at line 197 of file io/types.hpp.

◆ io_type

enum cudf::io::io_type

Data source or destination types.

Enumerator
FILEPATH	Input/output is a file path.
HOST_BUFFER	Input/output is a buffer in host memory.
DEVICE_BUFFER	Input/output is a buffer in device memory.
VOID	Input/output is nothing. No work is done. Useful for benchmarking.
USER_IMPLEMENTED	Input/output is handled by a custom user class.

Definition at line 68 of file io/types.hpp.

◆ quote_style

enum cudf::io::quote_style

Behavior when handling quotations in field data.

Enumerator
MINIMAL	Quote only fields which contain special characters.
ALL	Quote all fields.
NONNUMERIC	Quote all non-numeric fields.
NONE	Never quote fields; disable quotation parsing.

Definition at line 79 of file io/types.hpp.
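
The four styles can be illustrated from the writing side with a small standalone sketch. It mirrors the enumerators in a local `sketch` namespace instead of including cudf headers. The helpers `has_special_chars`, `looks_numeric`, `quoted`, and `format_field` are hypothetical. Treating the delimiter, the quote character, and line breaks as the "special characters", and doubling embedded quotes, are assumptions that follow common CSV practice rather than confirmed library behavior.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Local mirror of the enumerators documented above.
enum class quote_style { MINIMAL, ALL, NONNUMERIC, NONE };

// Assumption: "special" means the delimiter, the quote character, or a line break.
inline bool has_special_chars(std::string const& field, char delimiter, char quote)
{
  return field.find_first_of(std::string{delimiter, quote, '\n', '\r'}) != std::string::npos;
}

// Simplified numeric test: optional sign, then digits with at most one decimal point.
inline bool looks_numeric(std::string const& field)
{
  std::size_t i   = (!field.empty() && (field[0] == '-' || field[0] == '+')) ? 1 : 0;
  bool seen_digit = false;
  bool seen_point = false;
  for (; i < field.size(); ++i) {
    char const c = field[i];
    if (c >= '0' && c <= '9') {
      seen_digit = true;
    } else if (c == '.' && !seen_point) {
      seen_point = true;
    } else {
      return false;
    }
  }
  return seen_digit;
}

// Wraps the field in quotes, doubling any embedded quote characters.
inline std::string quoted(std::string const& field, char quote)
{
  std::string out(1, quote);
  for (char const c : field) {
    if (c == quote) { out.push_back(quote); }
    out.push_back(c);
  }
  out.push_back(quote);
  return out;
}

// Formats one output field according to the chosen style.
inline std::string format_field(std::string const& field,
                                quote_style style,
                                char delimiter = ',',
                                char quote     = '"')
{
  switch (style) {
    case quote_style::ALL: return quoted(field, quote);
    case quote_style::NONNUMERIC: return looks_numeric(field) ? field : quoted(field, quote);
    case quote_style::MINIMAL:
      return has_special_chars(field, delimiter, quote) ? quoted(field, quote) : field;
    case quote_style::NONE: return field;
  }
  return field;  // not reached for valid enumerators
}

}  // namespace sketch
```

The sketch covers only output formatting. The effect of `NONE` on quotation parsing when reading is not modelled here.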

◆ statistics_freq

enum cudf::io::statistics_freq

Column statistics granularity type for parquet/orc writers.

Enumerator
STATISTICS_NONE	No column statistics.
STATISTICS_ROWGROUP	Per-rowgroup column statistics.
STATISTICS_PAGE	Per-page column statistics.
STATISTICS_COLUMN	Full column and offset indices. Implies STATISTICS_ROWGROUP.

Definition at line 89 of file io/types.hpp.

Function Documentation

◆ is_byte_like_type()

template<typename T >

constexpr auto cudf::io::is_byte_like_type ( )

inline constexpr

Returns true if the type is byte-like, meaning it is reasonable to pass as a pointer to bytes.

Template Parameters

T	The representation type

Returns: true if the type is considered a byte-like type

Definition at line 277 of file io/types.hpp.
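
A compile-time check of this kind can be sketched with standard type traits. The version below is a standalone illustration in a local `sketch` namespace. The exact set of types it accepts (the one-byte character and integer types plus `std::byte`, ignoring `const`/`volatile`) is an assumption and may differ from the library's definition in `io/types.hpp`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Assumed set of byte-like types: one-byte character and integer types plus std::byte.
template <typename T>
constexpr bool is_byte_like_type()
{
  using U = std::remove_cv_t<T>;  // cv-qualifiers do not affect the result
  return std::is_same_v<U, char> || std::is_same_v<U, signed char> ||
         std::is_same_v<U, unsigned char> || std::is_same_v<U, std::int8_t> ||
         std::is_same_v<U, std::uint8_t> || std::is_same_v<U, std::byte>;
}

// The result is usable in constant expressions, for example to constrain templates.
static_assert(is_byte_like_type<std::uint8_t const>());
static_assert(!is_byte_like_type<int>());

}  // namespace sketch
```

Because the function is `constexpr`, it can guard a template with `static_assert` or `std::enable_if_t` so that only pointers to byte-like types are accepted as raw buffers.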
