libcudf  23.12.00
Modifying

Files

file  padding.hpp
 
file  strings/reverse.hpp
 
file  side_type.hpp
 
file  strip.hpp
 
file  translate.hpp
 
file  wrap.hpp
 

Enumerations

enum class  cudf::strings::side_type { LEFT, RIGHT, BOTH }
 Direction identifier for cudf::strings::strip and cudf::strings::pad functions.
 
enum class  cudf::strings::filter_type : bool { KEEP, REMOVE }
 Removes or keeps the specified character ranges in cudf::strings::filter_characters.
 

Functions

std::unique_ptr< column >  cudf::strings::pad (strings_column_view const &input, size_type width, side_type side=side_type::RIGHT, std::string_view fill_char=" ", rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Add padding to each string using a provided character.
 
std::unique_ptr< column >  cudf::strings::zfill (strings_column_view const &input, size_type width, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Add '0' as padding to the left of each string.
 
std::unique_ptr< column >  cudf::strings::reverse (strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Reverses the characters within each string.
 
std::unique_ptr< column >  cudf::strings::strip (strings_column_view const &input, side_type side=side_type::BOTH, string_scalar const &to_strip=string_scalar(""), rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Removes the specified characters from the beginning or end (or both) of each string.
 
std::unique_ptr< column >  cudf::strings::translate (strings_column_view const &input, std::vector< std::pair< char_utf8, char_utf8 >> const &chars_table, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Translates individual characters within each string.
 
std::unique_ptr< column >  cudf::strings::filter_characters (strings_column_view const &input, std::vector< std::pair< cudf::char_utf8, cudf::char_utf8 >> characters_to_filter, filter_type keep_characters=filter_type::KEEP, string_scalar const &replacement=string_scalar(""), rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Removes ranges of characters from each string in a strings column.
 
std::unique_ptr< column >  cudf::strings::wrap (strings_column_view const &input, size_type width, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Wraps strings onto multiple lines shorter than width by replacing appropriate white space with new-line characters (ASCII 0x0A).
 

Detailed Description

Enumeration Type Documentation

◆ filter_type

enum class cudf::strings::filter_type : bool

Removes or keeps the specified character ranges in cudf::strings::filter_characters.

Enumerator
KEEP 

All characters but those specified are removed.

REMOVE 

Only the specified characters are removed.

Definition at line 65 of file translate.hpp.

◆ side_type

Direction identifier for cudf::strings::strip and cudf::strings::pad functions.

Enumerator
LEFT 

strip/pad characters from the beginning of the string

RIGHT 

strip/pad characters from the end of the string

BOTH 

strip/pad characters from the beginning and end of the string

Definition at line 29 of file side_type.hpp.

Function Documentation

◆ filter_characters()

std::unique_ptr<column> cudf::strings::filter_characters ( strings_column_view const &  input,
std::vector< std::pair< cudf::char_utf8, cudf::char_utf8 >>  characters_to_filter,
filter_type  keep_characters = filter_type::KEEP,
string_scalar const &  replacement = string_scalar(""),
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Removes ranges of characters from each string in a strings column.

This can also be used to keep only the specified character ranges and remove all others from each string.

Example:
s = ["aeiou", "AEIOU", "0123456789", "bcdOPQ5"]
f = [{'M','Z'}, {'a','l'}, {'4','6'}]
r1 = filter_characters(s, f)
r1 is now ["aei", "OU", "456", "bcdOPQ5"]
r2 = filter_characters(s, f, REMOVE)
r2 is now ["ou", "AEI", "0123789", ""]
r3 = filter_characters(s, f, KEEP, "*")
r3 is now ["aei**", "***OU", "****456***", "bcdOPQ5"]

Null string entries result in null entries in the output column.

Exceptions
cudf::logic_error  if replacement is invalid
Parameters
input  Strings instance for this operation
characters_to_filter  Table of character ranges to filter on
keep_characters  If filter_type::KEEP, the characters_to_filter ranges are retained and all other characters are removed; if filter_type::REMOVE, only those ranges are removed
replacement  Optional replacement string for each character removed
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column with filtered strings
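
The KEEP/REMOVE rule and the replacement string can be checked on the CPU with a small reference model. The sketch below is not libcudf code: `filter_characters_model`, `char_range`, and `check_filter_examples` are hypothetical names, a plain `bool` stands in for `filter_type`, and input is assumed to be ASCII so that one `char` is one character.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side model; not part of libcudf. ASCII-only for brevity.
using char_range = std::pair<char, char>;  // inclusive {low, high}

std::string filter_characters_model(std::string const& input,
                                    std::vector<char_range> const& ranges,
                                    bool keep = true,  // true ~ KEEP, false ~ REMOVE
                                    std::string const& replacement = "")
{
  std::string out;
  for (char ch : input) {
    bool in_range = false;
    for (auto const& r : ranges) {
      if (ch >= r.first && ch <= r.second) {
        in_range = true;
        break;
      }
    }
    // KEEP retains characters inside the ranges; REMOVE retains those outside.
    if (in_range == keep) {
      out.push_back(ch);
    } else {
      out += replacement;  // each removed character becomes `replacement`
    }
  }
  return out;
}

// Mirrors rows of r1, r2 and r3 from the example above.
void check_filter_examples()
{
  std::vector<char_range> const f{{'M', 'Z'}, {'a', 'l'}, {'4', '6'}};
  assert(filter_characters_model("aeiou", f) == "aei");
  assert(filter_characters_model("AEIOU", f, false) == "AEI");
  assert(filter_characters_model("0123456789", f, true, "*") == "****456***");
}
```

Calling `check_filter_examples()` from a test or `main` exercises one row from each of the three documented results.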

◆ pad()

std::unique_ptr<column> cudf::strings::pad ( strings_column_view const &  input,
size_type  width,
side_type  side = side_type::RIGHT,
std::string_view  fill_char = " ",
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Add padding to each string using a provided character.

If the string is already width or more characters, no padding is performed. Also, no strings are truncated.

Null string entries result in corresponding null entries in the output column.

Example:
s = ['aa','bbb','cccc','ddddd']
r = pad(s,4)
r is now ['aa  ','bbb ','cccc','ddddd']
Parameters
input  Strings instance for this operation
width  The minimum number of characters for each string
side  Where to place the padding characters; Default is pad right (left justify)
fill_char  Single UTF-8 character to use for padding; Default is the space character
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column with padded strings
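
A minimal host-side sketch of the padding rule for a single string follows. It is an illustration under assumptions, not the libcudf implementation: `pad_model`, `pad_side`, and `check_pad_examples` are hypothetical names, input is assumed to be ASCII, and the BOTH case is left out because the text above does not say how an odd amount of padding is divided between the two ends.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for cudf::strings::side_type (BOTH intentionally omitted).
enum class pad_side { LEFT, RIGHT };

// Host-side model of the padding rule for one ASCII string.
std::string pad_model(std::string const& s,
                      std::size_t width,
                      pad_side where = pad_side::RIGHT,
                      char fill      = ' ')
{
  // Already `width` or more characters: returned unchanged, never truncated.
  if (s.size() >= width) { return s; }
  std::string const padding(width - s.size(), fill);
  // LEFT places the fill before the text, RIGHT places it after.
  return where == pad_side::LEFT ? padding + s : s + padding;
}

// Mirrors the example above, plus one left-padded case.
void check_pad_examples()
{
  assert(pad_model("aa", 4) == "aa  ");
  assert(pad_model("bbb", 4) == "bbb ");
  assert(pad_model("cccc", 4) == "cccc");
  assert(pad_model("ddddd", 4) == "ddddd");
  assert(pad_model("aa", 4, pad_side::LEFT, '*') == "**aa");
}
```

The default arguments mirror the documented defaults: pad on the right with the space character.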

◆ reverse()

std::unique_ptr<column> cudf::strings::reverse ( strings_column_view const &  input,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Reverses the characters within each string.

Any null string entries return corresponding null output column entries.

Example:
s = ["abcdef", "12345", "", "A"]
r = reverse(s)
r is now ["fedcba", "54321", "", "A"]
Parameters
input  Strings column for this operation
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column
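
Because the reversal is described in terms of characters, a multi-byte UTF-8 character has to move as a unit rather than byte by byte. The host-side sketch below illustrates that idea; `reverse_model` and `check_reverse_examples` are hypothetical names, the input is assumed to be valid UTF-8, and the character-wise handling of multi-byte input is an assumption drawn from the wording above rather than from the libcudf source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Host-side model: reverse by UTF-8 character, not by byte.
// Assumes `s` holds valid UTF-8.
std::string reverse_model(std::string const& s)
{
  std::string out;
  out.reserve(s.size());
  std::size_t end = s.size();
  while (end > 0) {
    std::size_t begin = end - 1;
    // Step back over continuation bytes (10xxxxxx) to reach the lead byte.
    while (begin > 0 && (static_cast<unsigned char>(s[begin]) & 0xC0) == 0x80) { --begin; }
    out.append(s, begin, end - begin);  // copy one whole character
    end = begin;
  }
  return out;
}

// Mirrors the example above, plus one two-byte character (U+00E9).
void check_reverse_examples()
{
  assert(reverse_model("abcdef") == "fedcba");
  assert(reverse_model("12345") == "54321");
  assert(reverse_model("") == "");
  assert(reverse_model("A") == "A");
  assert(reverse_model("t\xC3\xA9st") == "ts\xC3\xA9t");
}
```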

◆ strip()

std::unique_ptr<column> cudf::strings::strip ( strings_column_view const &  input,
side_type  side = side_type::BOTH,
string_scalar const &  to_strip = string_scalar(""),
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Removes the specified characters from the beginning or end (or both) of each string.

The to_strip parameter can contain one or more characters. Every character in to_strip is removed from the selected end(s) of each input string.

If to_strip is the empty string, whitespace characters are removed. Whitespace means the space character plus control characters such as tab and line feed.

Any null string entries return corresponding null output column entries.

Example:
s = [" aaa ", "_bbbb ", "__cccc ", "ddd", " ee _ff gg_"]
r = strip(s,both," _")
r is now ["aaa", "bbbb", "cccc", "ddd", "ee _ff gg"]
Exceptions
cudf::logic_error  if to_strip is invalid.
Parameters
input  Strings column for this operation
side  Indicates characters are to be stripped from the beginning, end, or both of each string; Default is both
to_strip  UTF-8 encoded characters to strip from each string; Default is empty string which indicates strip whitespace characters
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory.
Returns
New strings column.
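
The stripping rule can be sketched on the host as two scans that move inward from each end. This is a hedged model rather than libcudf code: `strip_model`, `strip_side`, and `check_strip_examples` are hypothetical names, input is assumed to be ASCII, and "whitespace" is approximated as any byte less than or equal to the space character (0x20), which is an assumption about where the control-character boundary lies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for cudf::strings::side_type.
enum class strip_side { LEFT, RIGHT, BOTH };

// Host-side model of the stripping rule for one ASCII string.
std::string strip_model(std::string const& s,
                        strip_side where            = strip_side::BOTH,
                        std::string const& to_strip = "")
{
  auto strippable = [&](char ch) -> bool {
    // Empty to_strip means whitespace; assumed here to be bytes <= 0x20.
    if (to_strip.empty()) { return static_cast<unsigned char>(ch) <= 0x20; }
    return to_strip.find(ch) != std::string::npos;
  };
  std::size_t begin = 0;
  std::size_t end   = s.size();
  if (where != strip_side::RIGHT) {
    while (begin < end && strippable(s[begin])) { ++begin; }
  }
  if (where != strip_side::LEFT) {
    while (end > begin && strippable(s[end - 1])) { --end; }
  }
  return s.substr(begin, end - begin);  // interior characters are untouched
}

// Mirrors the example above, plus the whitespace default and one-sided cases.
void check_strip_examples()
{
  assert(strip_model(" aaa ", strip_side::BOTH, " _") == "aaa");
  assert(strip_model("__cccc ", strip_side::BOTH, " _") == "cccc");
  assert(strip_model(" ee _ff gg_", strip_side::BOTH, " _") == "ee _ff gg");
  assert(strip_model("\t hi \n") == "hi");
  assert(strip_model("  hi  ", strip_side::LEFT) == "hi  ");
}
```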

◆ translate()

std::unique_ptr<column> cudf::strings::translate ( strings_column_view const &  input,
std::vector< std::pair< char_utf8, char_utf8 >> const &  chars_table,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Translates individual characters within each string.

This can also be used to remove a character by specifying 0 for the corresponding table entry.

Null string entries result in null entries in the output column.

Example:
s = ["aa","bbb","cccc","abcd"]
t = [['a','A'],['b',''],['d','Q']]
r = translate(s,t)
r is now ["AA", "", "cccc", "AcQ"]
Parameters
input  Strings instance for this operation
chars_table  Table of UTF-8 character mappings
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column with translated strings
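
The table lookup, including removal through a 0 entry, can be modelled on the host as below. This is a sketch under assumptions and not libcudf code: `translate_model` and `check_translate_examples` are hypothetical names, input is assumed to be ASCII, and using the first matching entry when a source character appears more than once in the table is a choice made for this sketch only.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Host-side model of per-character translation for one ASCII string.
// A target of '\0' (0) removes the source character.
std::string translate_model(std::string const& s,
                            std::vector<std::pair<char, char>> const& table)
{
  std::string out;
  for (char ch : s) {
    char mapped = ch;   // characters absent from the table pass through
    bool drop   = false;
    for (auto const& entry : table) {
      if (entry.first == ch) {  // first match wins (sketch assumption)
        if (entry.second == '\0') {
          drop = true;
        } else {
          mapped = entry.second;
        }
        break;
      }
    }
    if (!drop) { out.push_back(mapped); }
  }
  return out;
}

// Mirrors the example above: 'a' -> 'A', 'b' removed, 'd' -> 'Q'.
void check_translate_examples()
{
  std::vector<std::pair<char, char>> const t{{'a', 'A'}, {'b', '\0'}, {'d', 'Q'}};
  assert(translate_model("aa", t) == "AA");
  assert(translate_model("bbb", t) == "");
  assert(translate_model("cccc", t) == "cccc");
  assert(translate_model("abcd", t) == "AcQ");
}
```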

◆ wrap()

std::unique_ptr<column> cudf::strings::wrap ( strings_column_view const &  input,
size_type  width,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Wraps strings onto multiple lines shorter than width by replacing appropriate white space with new-line characters (ASCII 0x0A).

For each string row in the input column longer than width, the corresponding output string row will have newline characters inserted so that each line is no more than width characters. The function attempts to split at existing white space locations, but may split non-white-space sequences if necessary.

Any null string entries return corresponding null output column entries.

Example 1:

width = 3
input_string_tbl = [ "12345", "thesé", nullptr, "ARE THE", "tést strings", "" ];
wrapped_string_tbl = wrap(input_string_tbl, width)
wrapped_string_tbl = [ "12345", "thesé", nullptr, "ARE\nTHE", "tést\nstrings", "" ]

Example 2:

width = 12;
input_string_tbl = ["the quick brown fox jumped over the lazy brown dog", "hello, world"]
wrapped_string_tbl = wrap(input_string_tbl, width)
wrapped_string_tbl = ["the quick\nbrown fox\njumped over\nthe lazy\nbrown dog", "hello, world"]
Parameters
input  Strings column
width  Maximum character width of a line within each string
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
Column of wrapped strings
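
One greedy rule that reproduces both examples above is sketched here on the host: a space becomes a newline whenever the word after it would push the current line past width. This is an illustrative guess consistent with the examples, not the libcudf algorithm. `wrap_model` and `check_wrap_examples` are hypothetical names, input is assumed to be ASCII with single spaces between words, and words longer than width are left intact, as Example 1 shows for "12345".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Host-side greedy model: only spaces are ever replaced, so the output
// always has the same length as the input.
std::string wrap_model(std::string const& s, std::size_t width)
{
  std::string out      = s;
  std::size_t line_len = 0;  // characters on the current line so far
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (out[i] != ' ') {
      ++line_len;
      continue;
    }
    // Measure the word that follows this space.
    std::size_t next = out.find(' ', i + 1);
    if (next == std::string::npos) { next = out.size(); }
    std::size_t const word_len = next - (i + 1);
    if (line_len + 1 + word_len > width) {
      out[i]   = '\n';  // break here; the next word starts a new line
      line_len = 0;
    } else {
      ++line_len;  // the space stays on the current line
    }
  }
  return out;
}

// Mirrors the ASCII rows of Example 1 and both rows of Example 2.
void check_wrap_examples()
{
  assert(wrap_model("12345", 3) == "12345");
  assert(wrap_model("ARE THE", 3) == "ARE\nTHE");
  assert(wrap_model("", 3) == "");
  assert(wrap_model("the quick brown fox jumped over the lazy brown dog", 12) ==
         "the quick\nbrown fox\njumped over\nthe lazy\nbrown dog");
  assert(wrap_model("hello, world", 12) == "hello, world");
}
```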

◆ zfill()

std::unique_ptr<column> cudf::strings::zfill ( strings_column_view const &  input,
size_type  width,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource()
)

Add '0' as padding to the left of each string.

This is equivalent to pad(width,left,'0') but preserves the sign character if it appears in the first position.

If the string is already width or more characters, no padding is performed. No strings are truncated.

Null rows in the input result in corresponding null rows in the output column.

Example:
s = ['1234','-9876','+0.34','-342567', '2+2']
r = zfill(s,6)
r is now ['001234','-09876','+00.34','-342567', '0002+2']
Parameters
input  Strings instance for this operation
width  The minimum number of characters for each string
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column of strings
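
The sign-preserving behaviour is the only difference from a plain left pad, and a short host-side model makes it concrete. The sketch is not libcudf code: `zfill_model` and `check_zfill_examples` are hypothetical names, and input is assumed to be ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Host-side model of zero-filling one ASCII string.
std::string zfill_model(std::string const& s, std::size_t width)
{
  // Already `width` or more characters: returned unchanged, never truncated.
  if (s.size() >= width) { return s; }
  // A leading '+' or '-' stays in the first position.
  std::size_t const sign_len = (!s.empty() && (s[0] == '+' || s[0] == '-')) ? 1 : 0;
  std::string out = s.substr(0, sign_len);
  out.append(width - s.size(), '0');           // zeros go right after the sign
  out.append(s, sign_len, std::string::npos);  // then the rest of the string
  return out;
}

// Mirrors the example above.
void check_zfill_examples()
{
  assert(zfill_model("1234", 6) == "001234");
  assert(zfill_model("-9876", 6) == "-09876");
  assert(zfill_model("+0.34", 6) == "+00.34");
  assert(zfill_model("-342567", 6) == "-342567");
  assert(zfill_model("2+2", 6) == "0002+2");
}
```

Only a sign in the first position is treated specially, which is why "2+2" is padded like any other string.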