Column Copy#
- group column_copy
Enums
-
enum class out_of_bounds_policy : bool#
Policy to account for possible out-of-bounds indices.
NULLIFY means to nullify output values corresponding to out-of-bounds gather_map values.
DONT_CHECK means do not check whether the indices are out-of-bounds, for better performance.
Values:
-
enumerator NULLIFY#
Output values corresponding to out-of-bounds indices are null.
-
enumerator DONT_CHECK#
No bounds checking is performed, for better performance.
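The difference between the two policies can be sketched on the host with standard containers. This is only a model under stated assumptions: `gather_model` is a hypothetical helper, not a libcudf function, `std::nullopt` stands in for a null output element, and any index outside `[0, source.size())` is treated as out of bounds, which may not match how a real gather interprets negative indices.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Host-side stand-in for the enum documented above (not the libcudf type).
enum class out_of_bounds_policy : bool { NULLIFY, DONT_CHECK };

// Hypothetical helper: builds out[i] = source[gather_map[i]].
// std::nullopt plays the role of a null output element.
std::vector<std::optional<int>> gather_model(std::vector<int> const& source,
                                             std::vector<int> const& gather_map,
                                             out_of_bounds_policy policy)
{
  std::vector<std::optional<int>> out;
  out.reserve(gather_map.size());
  for (int idx : gather_map) {
    bool const in_bounds = idx >= 0 && static_cast<std::size_t>(idx) < source.size();
    if (policy == out_of_bounds_policy::NULLIFY && !in_bounds) {
      // NULLIFY: an out-of-bounds index produces a null output value.
      out.push_back(std::nullopt);
    } else {
      // DONT_CHECK (or an in-bounds index): read directly. With DONT_CHECK the
      // caller must guarantee every index is in bounds; otherwise this read is
      // undefined behavior in this model.
      out.push_back(source[static_cast<std::size_t>(idx)]);
    }
  }
  return out;
}
```

With NULLIFY the bounds test runs for every index, which is the cost that DONT_CHECK avoids.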
-
enum class mask_allocation_policy : int32_t#
Indicates when to allocate a mask, based on an existing mask.
Values:
-
enumerator NEVER#
Do not allocate a null mask, regardless of input.
-
enumerator RETAIN#
Allocate a null mask if the input contains one.
-
enumerator ALWAYS#
Allocate a null mask, regardless of input.
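The three enumerators reduce to a small decision table. The sketch below restates it as a host-side function; `should_allocate_mask` is a hypothetical name introduced for illustration, and the enum is a local stand-in rather than the libcudf type.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the enum documented above (not the libcudf type).
enum class mask_allocation_policy : std::int32_t { NEVER, RETAIN, ALWAYS };

// Hypothetical helper: decides whether an output null mask would be allocated,
// given the policy and whether the input column carries a null mask.
constexpr bool should_allocate_mask(mask_allocation_policy policy, bool input_has_mask)
{
  switch (policy) {
    case mask_allocation_policy::NEVER: return false;            // regardless of input
    case mask_allocation_policy::RETAIN: return input_has_mask;  // follow the input
    case mask_allocation_policy::ALWAYS: return true;            // regardless of input
  }
  return false;  // not reached for the three enumerators above
}
```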
Functions
-
std::unique_ptr<table> reverse(table_view const &source_table, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Reverses the rows within a table.
Creates a new table that is the reverse of source_table.
Example:
source = [[4,5,6], [7,8,9], [10,11,12]]
return = [[6,5,4], [9,8,7], [12,11,10]]
- Parameters:
source_table – Table that will be reversed
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
Reversed table
-
std::unique_ptr<column> reverse(column_view const &source_column, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Reverses the elements of a column.
Creates a new column that is the reverse of source_column.
Example:
source = [4,5,6]
return = [6,5,4]
- Parameters:
source_column – Column that will be reversed
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
Reversed column
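The two reverse overloads can be modeled on the host to show how the examples above are read: each inner list is a column, and reversing a table reverses the row order of every column. `column_model`, `table_model`, and both function names are hypothetical stand-ins, not libcudf types or calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-ins: a column is a vector of values and a table
// is a vector of equally sized columns.
using column_model = std::vector<int>;
using table_model  = std::vector<column_model>;

// Returns a new column whose elements are in reverse order.
column_model reverse_column_model(column_model const& source_column)
{
  return column_model(source_column.rbegin(), source_column.rend());
}

// Returns a new table in which the row order of every column is reversed.
table_model reverse_table_model(table_model const& source_table)
{
  table_model out;
  out.reserve(source_table.size());
  for (auto const& col : source_table) {
    out.push_back(reverse_column_model(col));
  }
  return out;
}
```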
-
std::unique_ptr<column> empty_like(column_view const &input)#
Initializes and returns an empty column of the same type as input.
- Parameters:
input – Immutable view of input column to emulate
- Returns:
An empty column of the same type as input
-
std::unique_ptr<column> empty_like(scalar const &input)#
Initializes and returns an empty column of the same type as input.
- Parameters:
input – Scalar to emulate
- Returns:
An empty column of the same type as input
-
std::unique_ptr<column> allocate_like(column_view const &input, mask_allocation_policy mask_alloc = mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Creates an uninitialized new column of the same size and type as input.
Supports only fixed-width types.
If mask_alloc allocates a validity mask, that mask is also uninitialized, and the caller must set the validity bits and the null count.
- Parameters:
input – Immutable view of input column to emulate
mask_alloc – Optional policy for allocating the null mask. Defaults to RETAIN
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A column of type input.type() with sufficient uninitialized capacity to hold the same number of elements as input
-
std::unique_ptr<column> allocate_like(column_view const &input, size_type size, mask_allocation_policy mask_alloc = mask_allocation_policy::RETAIN, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Creates an uninitialized new column of the specified size and the same type as input.
Supports only fixed-width types.
If mask_alloc allocates a validity mask, that mask is also uninitialized, and the caller must set the validity bits and the null count.
- Parameters:
input – Immutable view of input column to emulate
size – The desired number of elements that the new column should have capacity for
mask_alloc – Optional policy for allocating the null mask. Defaults to RETAIN
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A column of type input.type() with sufficient uninitialized capacity to hold the specified number of elements
-
std::unique_ptr<table> empty_like(table_view const &input_table)#
Creates a table of empty columns with the same types as the columns in input_table.
Creates the cudf::column objects, but does not allocate any underlying device memory for the columns’ data or bitmasks.
- Parameters:
input_table – Immutable view of input table to emulate
- Returns:
A table of empty columns with the same types as the columns in input_table
-
void copy_range_in_place(column_view const &source, mutable_column_view &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream = cudf::get_default_stream())#
Copies a range of elements in-place from one column to another.
Overwrites the range of elements in target indicated by the indices [target_begin, target_begin + N) with the elements from source indicated by the indices [source_begin, source_end), where N = (source_end - source_begin). Use the out-of-place copy function returning std::unique_ptr<column> for use cases requiring memory reallocation, for example strings columns and other variable-width types.
If source and target refer to the same elements and the ranges overlap, the behavior is undefined.
- Throws:
cudf::logic_error – if memory reallocation is required (e.g. for variable-width types).
cudf::logic_error – for an invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()).
cudf::logic_error – if target and source have different types.
cudf::logic_error – if source has null values and target is not nullable.
- Parameters:
source – The column to copy from
target – The preallocated column to copy into
source_begin – The starting index of the source range (inclusive)
source_end – The end index of the source range (exclusive)
target_begin – The starting index of the target range (inclusive)
stream – CUDA stream used for device memory operations and kernel launches
-
std::unique_ptr<column> copy_range(column_view const &source, column_view const &target, size_type source_begin, size_type source_end, size_type target_begin, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Copies a range of elements out-of-place from one column to another.
Creates a new column as if an in-place copy was performed into target. A copy of target is created first, and then the elements indicated by the indices [target_begin, target_begin + N) are copied from the elements indicated by the indices [source_begin, source_end) of source, where N = (source_end - source_begin). Elements outside the range are copied from target into the returned new column.
If source and target refer to the same elements and the ranges overlap, the behavior is undefined.
- Throws:
cudf::logic_error – for an invalid range (if source_begin > source_end, source_begin < 0, source_begin >= source.size(), source_end > source.size(), target_begin < 0, target_begin >= target.size(), or target_begin + (source_end - source_begin) > target.size()).
cudf::logic_error – if target and source have different types.
- Parameters:
source – The column to copy from inside the range
target – The column to copy from outside the range
source_begin – The starting index of the source range (inclusive)
source_end – The end index of the source range (exclusive)
target_begin – The starting index of the target range (inclusive)
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
The result target column
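The range arithmetic and the relationship between the in-place and out-of-place copies can be illustrated with a host-side model. This is a sketch under assumptions: both function names are hypothetical, `int` stands in for `size_type`, `std::vector<int>` stands in for a fixed-width column without nulls, and `std::logic_error` stands in for `cudf::logic_error`. Type and nullability checks are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical in-place model: overwrites target[target_begin, target_begin + N)
// with source[source_begin, source_end), where N = source_end - source_begin.
void copy_range_in_place_model(std::vector<int> const& source,
                               std::vector<int>& target,
                               int source_begin,
                               int source_end,
                               int target_begin)
{
  int const source_size = static_cast<int>(source.size());
  int const target_size = static_cast<int>(target.size());
  int const n           = source_end - source_begin;

  // Mirrors the invalid-range conditions listed under "Throws".
  bool const invalid = source_begin > source_end || source_begin < 0 ||
                       source_begin >= source_size || source_end > source_size ||
                       target_begin < 0 || target_begin >= target_size ||
                       target_begin + n > target_size;
  if (invalid) { throw std::logic_error("invalid range"); }

  std::copy(source.begin() + source_begin,
            source.begin() + source_end,
            target.begin() + target_begin);
}

// Hypothetical out-of-place model: copies target first, then applies the
// in-place copy to that copy, so elements outside the range come from target.
std::vector<int> copy_range_model(std::vector<int> const& source,
                                  std::vector<int> const& target,
                                  int source_begin,
                                  int source_end,
                                  int target_begin)
{
  std::vector<int> result = target;
  copy_range_in_place_model(source, result, source_begin, source_end, target_begin);
  return result;
}
```

For example, copying source indices [1, 4) of `{1, 2, 3, 4, 5}` into a six-element target of zeros at target_begin 2 yields `{0, 0, 2, 3, 4, 0}` and leaves the original target untouched.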
-
std::unique_ptr<column> copy_if_else(column_view const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs[i]
- Throws:
cudf::logic_error – if lhs and rhs are not of the same type
cudf::logic_error – if lhs and rhs are not of the same length
cudf::logic_error – if boolean mask is not of type bool
cudf::logic_error – if boolean mask is not of the same length as lhs and rhs
- Parameters:
lhs – left-hand column_view
rhs – right-hand column_view
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
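The selection rule, including the way a null mask element behaves like false, can be checked with a host-side model. This is a sketch under assumptions: `copy_if_else_model` is a hypothetical name, `std::optional<bool>` models a nullable BOOL8 element, `std::logic_error` stands in for `cudf::logic_error`, and null handling in lhs and rhs is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of the column/column overload.
// A mask element of std::nullopt models a null and selects rhs, like false.
std::vector<int> copy_if_else_model(std::vector<int> const& lhs,
                                    std::vector<int> const& rhs,
                                    std::vector<std::optional<bool>> const& boolean_mask)
{
  if (lhs.size() != rhs.size()) {
    throw std::logic_error("lhs and rhs must have the same length");
  }
  if (boolean_mask.size() != lhs.size()) {
    throw std::logic_error("boolean_mask must have the same length as lhs and rhs");
  }

  std::vector<int> output(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    // output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs[i]
    bool const take_lhs = boolean_mask[i].has_value() && *boolean_mask[i];
    output[i]           = take_lhs ? lhs[i] : rhs[i];
  }
  return output;
}
```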
-
std::unique_ptr<column> copy_if_else(scalar const &lhs, column_view const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs[i]
- Throws:
cudf::logic_error – if lhs and rhs are not of the same type
cudf::logic_error – if boolean mask is not of type bool
cudf::logic_error – if boolean mask is not of the same length as rhs
- Parameters:
lhs – left-hand scalar
rhs – right-hand column_view
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
-
std::unique_ptr<column> copy_if_else(column_view const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs[i] : rhs
- Throws:
cudf::logic_error – if lhs and rhs are not of the same type
cudf::logic_error – if boolean mask is not of type bool
cudf::logic_error – if boolean mask is not of the same length as lhs
- Parameters:
lhs – left-hand column_view
rhs – right-hand scalar
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
-
std::unique_ptr<column> copy_if_else(scalar const &lhs, scalar const &rhs, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Returns a new column, where each element is selected from either lhs or rhs based on the value of the corresponding element in boolean_mask.
Selects each element i in the output column from either rhs or lhs using the following rule:
output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs
- Throws:
cudf::logic_error – if boolean mask is not of type bool
- Parameters:
lhs – left-hand scalar
rhs – right-hand scalar
boolean_mask – column of type_id::BOOL8 representing “left (true) / right (false)” boolean for each element. Null element represents false.
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
new column with the selected elements
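In the scalar/scalar overload neither operand has a length, so this host-side model assumes the output takes its length from boolean_mask. That is an inference from the rule above rather than a documented statement. `copy_if_else_scalars_model` is a hypothetical name, plain `int` values stand in for valid scalars, and `std::optional<bool>` models a nullable mask element.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical host-side model of the scalar/scalar overload.
// Assumption: the output has one element per boolean_mask element.
std::vector<int> copy_if_else_scalars_model(int lhs,
                                            int rhs,
                                            std::vector<std::optional<bool>> const& boolean_mask)
{
  std::vector<int> output;
  output.reserve(boolean_mask.size());
  for (auto const& m : boolean_mask) {
    // output[i] = (boolean_mask.valid(i) and boolean_mask[i]) ? lhs : rhs
    output.push_back((m.has_value() && *m) ? lhs : rhs);
  }
  return output;
}
```

Used this way, the scalar/scalar form amounts to expanding a boolean column into a two-valued column.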
-
std::unique_ptr<scalar> get_element(column_view const &input, size_type index, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Get the element at specified index from a column.
Warning
This function is expensive (it invokes a kernel launch), so it is not recommended for use in performance-sensitive code or inside a loop.
- Throws:
cudf::logic_error – if index is not within the range [0, input.size())
- Parameters:
input – Column view to get the element from
index – Index into input to get the element at
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned scalar’s device memory
- Returns:
Scalar containing the single value
-
std::unique_ptr<table> sample(table_view const &input, size_type const n, sample_with_replacement replacement = sample_with_replacement::FALSE, int64_t const seed = 0, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Gathers n samples from the given input randomly.
Example:
input: {col1: {1, 2, 3, 4, 5}, col2: {6, 7, 8, 9, 10}}
n: 3
replacement: false
output: {col1: {3, 1, 4}, col2: {8, 6, 9}}
replacement: true
output: {col1: {3, 1, 1}, col2: {8, 6, 6}}
- Throws:
cudf::logic_error – if n > input.num_rows() and replacement == FALSE.
cudf::logic_error – if n < 0.
- Parameters:
input – View of a table to sample
n – non-negative number of samples expected from input
replacement – Allow or disallow sampling of the same row more than once
seed – Seed value to initialize the random number generator
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table’s device memory
- Returns:
Table containing samples from input
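Sampling can be thought of as choosing n row indices and then gathering those rows from every column, which is why col1 and col2 stay aligned in the example. The host-side sketch below only models the index selection. `sample_indices_model` is a hypothetical name, a plain `bool` replaces the `sample_with_replacement` enum, and the generator and algorithms used here are assumptions, so the indices it produces are not expected to match libcudf for the same seed. Rejecting a draw from an empty table is a choice made by this model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model: returns n row indices drawn from [0, num_rows).
// Assumes num_rows >= 0.
std::vector<int> sample_indices_model(int num_rows,
                                      int n,
                                      bool with_replacement,
                                      std::int64_t seed)
{
  if (n < 0) { throw std::logic_error("n must be non-negative"); }
  if (!with_replacement && n > num_rows) {
    throw std::logic_error("n exceeds the number of rows without replacement");
  }
  // Model-specific choice: nothing can be drawn from an empty table.
  if (n > 0 && num_rows == 0) { throw std::logic_error("cannot sample an empty table"); }

  std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
  std::vector<int> indices;

  if (with_replacement) {
    // Each draw is independent, so the same row may appear more than once.
    indices.reserve(static_cast<std::size_t>(n));
    if (n > 0) {
      std::uniform_int_distribution<int> dist(0, num_rows - 1);
      for (int i = 0; i < n; ++i) { indices.push_back(dist(rng)); }
    }
  } else {
    // Shuffle all row indices and keep the first n, so every row is unique.
    std::vector<int> all(static_cast<std::size_t>(num_rows));
    std::iota(all.begin(), all.end(), 0);
    std::shuffle(all.begin(), all.end(), rng);
    indices.assign(all.begin(), all.begin() + n);
  }
  return indices;
}
```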
-
bool has_nonempty_nulls(column_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream())#
Checks if a column or its descendants have non-empty null rows.
A LIST or STRING column might have non-empty rows that are marked as null. A STRUCT or LIST column might have child columns that have non-empty null rows. Other types of columns are deemed incapable of having non-empty null rows. For example, fixed-width columns have no concept of an “empty” row.
Note
This function is exact. If it returns true, there exist one or more non-empty null elements.
- Parameters:
input – The column which is (and whose descendants are) to be checked for non-empty null rows.
stream – CUDA stream used for device memory operations and kernel launches
- Returns:
true If either the column or its descendants have non-empty null rows
- Returns:
false If neither the column nor its descendants have non-empty null rows
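A "non-empty null row" is easiest to see in the offsets of a list column: row i is null in the validity mask, yet offsets[i + 1] - offsets[i] is greater than zero, so the child still holds data for it. The host-side sketch below checks exactly that for a single, non-nested list column. `list_column_model` and `has_nonempty_nulls_model` are hypothetical names, and descendants are not traversed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side layout of one list column.
struct list_column_model {
  std::vector<bool> validity;  // validity[i] == false means row i is null
  std::vector<int> offsets;    // offsets.size() == validity.size() + 1
  std::vector<int> child;      // flattened list elements
};

// Returns true if any null row still spans one or more child elements.
bool has_nonempty_nulls_model(list_column_model const& col)
{
  for (std::size_t i = 0; i < col.validity.size(); ++i) {
    bool const is_null     = !col.validity[i];
    bool const is_nonempty = col.offsets[i + 1] > col.offsets[i];
    if (is_null && is_nonempty) { return true; }
  }
  return false;
}
```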
-
bool may_have_nonempty_nulls(column_view const &input)#
Approximates if a column or its descendants may have non-empty null elements.
False positives are possible, but false negatives are not.
Compared to the exact has_nonempty_nulls() function, this function is typically more efficient.
Complexity:
Best case: O(count_descendants(input))
Worst case: O(count_descendants(input) * m), where m is the number of rows in the largest descendant
Note
This function is approximate.
true: Non-empty null elements could exist
false: Non-empty null elements definitely do not exist
- Parameters:
input – The column which is (and whose descendants are) to be checked for non-empty null rows
- Returns:
true If either the column or its descendants have null rows
- Returns:
false If neither the column nor its descendants have null rows
-
std::unique_ptr<column> purge_nonempty_nulls(column_view const &input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())#
Copies input into output while purging any non-empty null rows in the column or its descendants.
If the input column is not of compound type (LIST/STRING/STRUCT/DICTIONARY), the output will be the same as input.
The purge operation only applies directly to LIST and STRING columns, but it applies indirectly to STRUCT/DICTIONARY columns as well, since these columns may have child columns that are LIST or STRING.
Examples:
auto const lists = lists_column_wrapper<int32_t>{ {0,1}, {2,3}, {4,5} }.release();
cudf::detail::set_null_mask(lists->null_mask(), 1, 2, false);
lists[1] is now null, but the lists child column still stores `{2,3}`.
The lists column contents will be:
Validity: 101
Offsets: [0, 2, 4, 6]
Child: [0, 1, 2, 3, 4, 5]
After purging the contents of the list's null rows, the column's contents will be:
Validity: 101
Offsets: [0, 2, 2, 4]
Child: [0, 1, 4, 5]

auto const strings = strings_column_wrapper{ "AB", "CD", "EF" }.release();
cudf::detail::set_null_mask(strings->null_mask(), 1, 2, false);
strings[1] is now null, but the strings column still stores `"CD"`.
The strings column contents will be:
Validity: 101
Offsets: [0, 2, 4, 6]
Child: [A, B, C, D, E, F]
After purging the contents of the strings column's null rows, the column's contents will be:
Validity: 101
Offsets: [0, 2, 2, 4]
Child: [A, B, E, F]

auto const lists = lists_column_wrapper<int32_t>{ {0,1}, {2,3}, {4,5} };
auto const structs = structs_column_wrapper{ {lists}, null_at(1) };
structs[1].child is now null, but the lists column still stores `{2,3}`.
The lists column contents will be:
Validity: 101
Offsets: [0, 2, 4, 6]
Child: [0, 1, 2, 3, 4, 5]
After purging the contents of the list's null rows, the column's contents will be:
Validity: 101
Offsets: [0, 2, 2, 4]
Child: [0, 1, 4, 5]
- Parameters:
input – The column whose null rows are to be checked and purged
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column’s device memory
- Returns:
A new column with equivalent contents to input, but with null rows purged
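The first lists example can be reproduced with a host-side model that rebuilds the offsets and child while skipping the child elements of null rows. This is a sketch of the effect only, not of how libcudf implements it. `list_column_model` and `purge_nonempty_nulls_model` are hypothetical names, and nested or struct columns are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side layout of one list column.
struct list_column_model {
  std::vector<bool> validity;  // validity[i] == false means row i is null
  std::vector<int> offsets;    // offsets.size() == validity.size() + 1
  std::vector<int> child;      // flattened list elements
};

// Returns a copy of `input` in which every null row is empty: valid rows keep
// their child elements, and null rows contribute nothing to the new child.
list_column_model purge_nonempty_nulls_model(list_column_model const& input)
{
  list_column_model out;
  out.validity = input.validity;  // the validity mask itself is unchanged
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < input.validity.size(); ++i) {
    if (input.validity[i]) {
      out.child.insert(out.child.end(),
                       input.child.begin() + input.offsets[i],
                       input.child.begin() + input.offsets[i + 1]);
    }
    // For a null row the new offset repeats the previous one (an empty row).
    out.offsets.push_back(static_cast<int>(out.child.size()));
  }
  return out;
}
```

Applied to validity 101, offsets [0, 2, 4, 6], and child [0, 1, 2, 3, 4, 5], the model produces offsets [0, 2, 2, 4] and child [0, 1, 4, 5], matching the documented result.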