 * \htmlonly "æb"-> the first key is key('a'), the second key is key('e'), and
 * the third key is key('b'). \endhtmlonly
 * The key of a character, is an integer composed of primary order(short),
-* secondary order(char), and tertiary order(char). Java strictly defines the
+* secondary order(char), and tertiary order(char). Java strictly defines the
 * size and signedness of its primitive data types. Therefore, the static
-* functions primaryOrder(), secondaryOrder(), and tertiaryOrder() return
+* functions primaryOrder(), secondaryOrder(), and tertiaryOrder() return
 * int32_t to ensure the correctness of the key value.
 *
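The packing described above (a 16-bit primary order plus one byte each for the secondary and tertiary orders, returned together as an int32_t) can be modeled with plain bit arithmetic. This is an illustrative sketch of the documented layout, not ICU code, and the sample `order` value is made up:

```python
def primary_order(order: int) -> int:
    # primary weight: the upper 16 bits (the "short")
    return (order >> 16) & 0xFFFF

def secondary_order(order: int) -> int:
    # secondary weight: bits 8..15 (one "char")
    return (order >> 8) & 0xFF

def tertiary_order(order: int) -> int:
    # tertiary weight: the low 8 bits (one "char")
    return order & 0xFF

order = 0x12340507  # hypothetical collation element
print(hex(primary_order(order)))    # 0x1234
print(hex(secondary_order(order)))  # 0x5
print(hex(tertiary_order(order)))   # 0x7
```

Because the three weights are extracted by masking rather than sign extension, the result is well defined regardless of the sign of the int32_t input.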
 * Example of the iterator usage: (without error checking)
 *
@@ -97,8 +97,8 @@ class UVector32;
 * the comparison level of the collator. The method previous() returns the
 * collation order of the previous character based on the comparison level of
 * the collator. The Collation Element Iterator moves only in one direction
-* between calls to reset(), setOffset(), or setText(). That is, next()
-* and previous() can not be inter-used. Whenever previous() is to be called after
+* between calls to reset(), setOffset(), or setText(). That is, next()
+* and previous() can not be inter-used. Whenever previous() is to be called after
 * next() or vice versa, reset(), setOffset() or setText() has to be called first
 * to reset the status, shifting pointers to either the end or the start of
 * the string (reset() or setText()), or the specified position (setOffset()).
@@ -109,9 +109,9 @@ class UVector32;
 * The result of a forward iterate (next()) and reversed result of the backward
 * iterate (previous()) on the same string are equivalent, if collation orders
 * with the value 0 are ignored.
-* Character based on the comparison level of the collator. A collation order
-* consists of primary order, secondary order and tertiary order. The data
-* type of the collation order is int32_t.
+* Character based on the comparison level of the collator. A collation order
+* consists of primary order, secondary order and tertiary order. The data
+* type of the collation order is int32_t.
 *
 * Note, CollationElementIterator should not be subclassed.
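The one-direction rule documented above (next() and previous() cannot be inter-used without an intervening reset) can be modeled as a small state machine. This toy model is a sketch of the documented contract, not ICU's implementation; the class name and the RuntimeError are inventions for illustration (ICU's real iterator leaves a direction change without reset undefined rather than raising):

```python
class DirectionalIterator:
    """Toy model of the CollationElementIterator direction contract."""

    def __init__(self, orders):
        self.orders = orders
        self.reset()

    def reset(self):
        # back to the initial state: no direction chosen, no position
        self.pos = None
        self.direction = None

    def next(self):
        if self.direction == "backward":
            raise RuntimeError("call reset() before changing direction")
        self.direction = "forward"
        self.pos = 0 if self.pos is None else self.pos + 1
        return self.orders[self.pos] if self.pos < len(self.orders) else None

    def previous(self):
        if self.direction == "forward":
            raise RuntimeError("call reset() before changing direction")
        self.direction = "backward"
        self.pos = len(self.orders) - 1 if self.pos is None else self.pos - 1
        return self.orders[self.pos] if self.pos >= 0 else None

it = DirectionalIterator([10, 20, 30])
it.next()      # 10
it.next()      # 20
it.reset()     # required before switching direction
it.previous()  # 30: after reset, previous() starts from the end
```

After reset() the pointer is unplaced, so the first previous() yields the last collation order and the first next() yields the first one, matching the "shifting pointers to either the end or the start" behavior described above.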
* @see Collator @@ -119,13 +119,13 @@ class UVector32; * @version 1.8 Jan 16 2001 */ class U_I18N_API CollationElementIterator U_FINAL : public UObject { -public: +public: // CollationElementIterator public data member ------------------------------ enum { /** - * NULLORDER indicates that an error has occured while processing + * NULLORDER indicates that an error has occurred while processing * @stable ICU 2.0 */ NULLORDER = (int32_t)0xffffffff @@ -141,7 +141,7 @@ class U_I18N_API CollationElementIterator U_FINAL : public UObject { */ CollationElementIterator(const CollationElementIterator& other); - /** + /** * Destructor * @stable ICU 2.0 */ @@ -176,8 +176,8 @@ class U_I18N_API CollationElementIterator U_FINAL : public UObject { /** * Gets the ordering priority of the next character in the string. * @param status the error code status. - * @return the next character's ordering. otherwise returns NULLORDER if an - * error has occured or if the end of string has been reached + * @return the next character's ordering. otherwise returns NULLORDER if an + * error has occurred or if the end of string has been reached * @stable ICU 2.0 */ int32_t next(UErrorCode& status); @@ -185,8 +185,8 @@ class U_I18N_API CollationElementIterator U_FINAL : public UObject { /** * Get the ordering priority of the previous collation element in the string. * @param status the error code status. - * @return the previous element's ordering. otherwise returns NULLORDER if an - * error has occured or if the start of string has been reached + * @return the previous element's ordering. 
otherwise returns NULLORDER if an + * error has occurred or if the start of string has been reached * @stable ICU 2.0 */ int32_t previous(UErrorCode& status); @@ -216,11 +216,11 @@ class U_I18N_API CollationElementIterator U_FINAL : public UObject { static inline int32_t tertiaryOrder(int32_t order); /** - * Return the maximum length of any expansion sequences that end with the + * Return the maximum length of any expansion sequences that end with the * specified comparison order. * @param order a collation order returned by previous or next. - * @return maximum size of the expansion sequences ending with the collation - * element or 1 if collation element does not occur at the end of any + * @return maximum size of the expansion sequences ending with the collation + * element or 1 if collation element does not occur at the end of any * expansion sequence * @stable ICU 2.0 */ @@ -312,9 +312,9 @@ class U_I18N_API CollationElementIterator U_FINAL : public UObject { friend class UCollationPCE; /** - * CollationElementIterator constructor. This takes the source string and the - * collation object. The cursor will walk thru the source string based on the - * predefined collation rules. If the source string is empty, NULLORDER will + * CollationElementIterator constructor. This takes the source string and the + * collation object. The cursor will walk thru the source string based on the + * predefined collation rules. If the source string is empty, NULLORDER will * be returned on the calls to next(). * @param sourceText the source string. * @param order the collation object. @@ -332,9 +332,9 @@ class U_I18N_API CollationElementIterator U_FINAL : public UObject { // but only contain the part of RBC== related to data and rules. /** - * CollationElementIterator constructor. This takes the source string and the - * collation object. The cursor will walk thru the source string based on the - * predefined collation rules. 
If the source string is empty, NULLORDER will + * CollationElementIterator constructor. This takes the source string and the + * collation object. The cursor will walk thru the source string based on the + * predefined collation rules. If the source string is empty, NULLORDER will * be returned on the calls to next(). * @param sourceText the source string. * @param order the collation object. diff --git a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/format.h b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/format.h index 96883a81..8788f77e 100644 --- a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/format.h +++ b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/format.h @@ -29,8 +29,8 @@ #if U_SHOW_CPLUSPLUS_API /** - * \file - * \brief C++ API: Base class for all formats. + * \file + * \brief C++ API: Base class for all formats. */ #if !UCONFIG_NO_FORMATTING @@ -40,7 +40,7 @@ #include "unicode/fieldpos.h" #include "unicode/fpositer.h" #include "unicode/parsepos.h" -#include "unicode/parseerr.h" +#include "unicode/parseerr.h" #include "unicode/locid.h" U_NAMESPACE_BEGIN @@ -245,7 +245,7 @@ class U_I18N_API Format : public UObject { UErrorCode& status) const; /** Get the locale for this format object. You can choose between valid and actual locale. - * @param type type of the locale we're looking for (valid or actual) + * @param type type of the locale we're looking for (valid or actual) * @param status error code for the operation * @return the locale * @stable ICU 2.8 @@ -254,7 +254,7 @@ class U_I18N_API Format : public UObject { #ifndef U_HIDE_INTERNAL_API /** Get the locale for this format object. You can choose between valid and actual locale. 
- * @param type type of the locale we're looking for (valid or actual) + * @param type type of the locale we're looking for (valid or actual) * @param status error code for the operation * @return the locale * @internal @@ -283,12 +283,12 @@ class U_I18N_API Format : public UObject { */ Format& operator=(const Format&); // Does nothing; for subclasses - + /** * Simple function for initializing a UParseError from a UnicodeString. * * @param pattern The pattern to copy into the parseError - * @param pos The position in pattern where the error occured + * @param pos The position in pattern where the error occurred * @param parseError The UParseError object to fill in * @stable ICU 2.4 */ diff --git a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucol.h b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucol.h index c52f0b1d..cd6d4619 100644 --- a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucol.h +++ b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucol.h @@ -459,7 +459,7 @@ ucol_openRules( const UChar *rules, * instantiating collators (like out of memory or similar), this * API will return an error if an invalid attribute or attribute/value * combination is specified. - * @return A pointer to a UCollator or 0 if an error occured (including an + * @return A pointer to a UCollator or 0 if an error occurred (including an * invalid attribute). 
* @see ucol_open * @see ucol_setAttribute diff --git a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucoleitr.h b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucoleitr.h index 85ec8383..0a2929d4 100644 --- a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucoleitr.h +++ b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/ucoleitr.h @@ -11,7 +11,7 @@ * Modification History: * * Date Name Description -* 02/15/2001 synwee Modified all methods to process its own function +* 02/15/2001 synwee Modified all methods to process its own function * instead of calling the equivalent c++ api (coleitr.h) *******************************************************************************/ @@ -22,8 +22,8 @@ #if !UCONFIG_NO_COLLATION -/** - * This indicates an error has occured during processing or if no more CEs is +/** + * This indicates an error has occurred during processing or if no more CEs is * to be returned. * @stable ICU 2.0 */ @@ -31,7 +31,7 @@ #include "unicode/ucol.h" -/** +/** * The UCollationElements struct. * For usage in C programs. * @stable ICU 2.0 @@ -42,10 +42,10 @@ typedef struct UCollationElements UCollationElements; * \file * \brief C API: UCollationElements * - * The UCollationElements API is used as an iterator to walk through each + * The UCollationElements API is used as an iterator to walk through each * character of an international string. Use the iterator to return the - * ordering priority of the positioned character. The ordering priority of a - * character, which we refer to as a key, defines how a character is collated + * ordering priority of the positioned character. The ordering priority of a + * character, which we refer to as a key, defines how a character is collated * in the given collation object. * For example, consider the following in Slovak and in traditional Spanish collation: *@@ -82,19 +82,19 @@ typedef struct UCollationElements UCollationElements; * ucol_next() returns the collation order of the next. 
* ucol_prev() returns the collation order of the previous character. * The Collation Element Iterator moves only in one direction between calls to - * ucol_reset. That is, ucol_next() and ucol_prev can not be inter-used. - * Whenever ucol_prev is to be called after ucol_next() or vice versa, - * ucol_reset has to be called first to reset the status, shifting pointers to - * either the end or the start of the string. Hence at the next call of - * ucol_prev or ucol_next, the first or last collation order will be returned. - * If a change of direction is done without a ucol_reset, the result is + * ucol_reset. That is, ucol_next() and ucol_prev can not be inter-used. + * Whenever ucol_prev is to be called after ucol_next() or vice versa, + * ucol_reset has to be called first to reset the status, shifting pointers to + * either the end or the start of the string. Hence at the next call of + * ucol_prev or ucol_next, the first or last collation order will be returned. + * If a change of direction is done without a ucol_reset, the result is * undefined. - * The result of a forward iterate (ucol_next) and reversed result of the - * backward iterate (ucol_prev) on the same string are equivalent, if + * The result of a forward iterate (ucol_next) and reversed result of the + * backward iterate (ucol_prev) on the same string are equivalent, if * collation orders with the value 0 are ignored. - * Character based on the comparison level of the collator. A collation order - * consists of primary order, secondary order and tertiary order. The data - * type of the collation order is int32_t. + * Character based on the comparison level of the collator. A collation order + * consists of primary order, secondary order and tertiary order. The data + * type of the collation order is int32_t. 
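The equivalence stated above, that a forward iteration (ucol_next) equals the reversed backward iteration (ucol_prev) once zero-valued collation orders are ignored, is easy to express as a property check. The collation order values below are hypothetical, chosen only to illustrate the filtering:

```python
def significant(orders):
    # drop ignorable collation orders (value 0)
    return [o for o in orders if o != 0]

# hypothetical ucol_next() results: includes an ignorable 0 element
forward = [0x1F400500, 0, 0x1F500500]
# hypothetical ucol_prev() results on the same string: no zeros emitted
backward = [0x1F500500, 0x1F400500]

assert significant(forward) == list(reversed(significant(backward)))
```

Only the zero orders may differ between the two directions; every nonzero collation element must appear in both sequences, in mirrored order.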
* * @see UCollator */ @@ -109,7 +109,7 @@ typedef struct UCollationElements UCollationElements; * @return a struct containing collation element information * @stable ICU 2.0 */ -U_STABLE UCollationElements* U_EXPORT2 +U_STABLE UCollationElements* U_EXPORT2 ucol_openElements(const UCollator *coll, const UChar *text, int32_t textLength, @@ -123,7 +123,7 @@ ucol_openElements(const UCollator *coll, * @return the hash code. * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 ucol_keyHashCode(const uint8_t* key, int32_t length); /** @@ -132,7 +132,7 @@ ucol_keyHashCode(const uint8_t* key, int32_t length); * @param elems The UCollationElements to close. * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 ucol_closeElements(UCollationElements *elems); /** @@ -144,7 +144,7 @@ ucol_closeElements(UCollationElements *elems); * @see ucol_previous * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 ucol_reset(UCollationElements *elems); /** @@ -152,41 +152,41 @@ ucol_reset(UCollationElements *elems); * A single character may contain more than one collation element. * @param elems The UCollationElements containing the text. * @param status A pointer to a UErrorCode to receive any errors. - * @return The next collation elements ordering, otherwise returns UCOL_NULLORDER - * if an error has occured or if the end of string has been reached + * @return The next collation elements ordering, otherwise returns UCOL_NULLORDER + * if an error has occurred or if the end of string has been reached * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 ucol_next(UCollationElements *elems, UErrorCode *status); /** * Get the ordering priority of the previous collation element in the text. * A single character may contain more than one collation element. - * Note that internally a stack is used to store buffered collation elements. + * Note that internally a stack is used to store buffered collation elements. 
* @param elems The UCollationElements containing the text. - * @param status A pointer to a UErrorCode to receive any errors. Noteably + * @param status A pointer to a UErrorCode to receive any errors. Noteably * a U_BUFFER_OVERFLOW_ERROR is returned if the internal stack * buffer has been exhausted. - * @return The previous collation elements ordering, otherwise returns - * UCOL_NULLORDER if an error has occured or if the start of string has + * @return The previous collation elements ordering, otherwise returns + * UCOL_NULLORDER if an error has occurred or if the start of string has * been reached. * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 ucol_previous(UCollationElements *elems, UErrorCode *status); /** - * Get the maximum length of any expansion sequences that end with the + * Get the maximum length of any expansion sequences that end with the * specified comparison order. * This is useful for .... ? * @param elems The UCollationElements containing the text. * @param order A collation order returned by previous or next. 
- * @return maximum size of the expansion sequences ending with the collation - * element or 1 if collation element does not occur at the end of any + * @return maximum size of the expansion sequences ending with the collation + * element or 1 if collation element does not occur at the end of any * expansion sequence * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 ucol_getMaxExpansion(const UCollationElements *elems, int32_t order); /** @@ -201,8 +201,8 @@ ucol_getMaxExpansion(const UCollationElements *elems, int32_t order); * @see ucol_getText * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 -ucol_setText( UCollationElements *elems, +U_STABLE void U_EXPORT2 +ucol_setText( UCollationElements *elems, const UChar *text, int32_t textLength, UErrorCode *status); @@ -216,7 +216,7 @@ ucol_setText( UCollationElements *elems, * @see ucol_setOffset * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 ucol_getOffset(const UCollationElements *elems); /** @@ -231,7 +231,7 @@ ucol_getOffset(const UCollationElements *elems); * @see ucol_getOffset * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 ucol_setOffset(UCollationElements *elems, int32_t offset, UErrorCode *status); @@ -243,7 +243,7 @@ ucol_setOffset(UCollationElements *elems, * @stable ICU 2.6 */ U_STABLE int32_t U_EXPORT2 -ucol_primaryOrder (int32_t order); +ucol_primaryOrder (int32_t order); /** * Get the secondary order of a collation order. @@ -252,7 +252,7 @@ ucol_primaryOrder (int32_t order); * @stable ICU 2.6 */ U_STABLE int32_t U_EXPORT2 -ucol_secondaryOrder (int32_t order); +ucol_secondaryOrder (int32_t order); /** * Get the tertiary order of a collation order. 
@@ -261,7 +261,7 @@ ucol_secondaryOrder (int32_t order); * @stable ICU 2.6 */ U_STABLE int32_t U_EXPORT2 -ucol_tertiaryOrder (int32_t order); +ucol_tertiaryOrder (int32_t order); #endif /* #if !UCONFIG_NO_COLLATION */ diff --git a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/umsg.h b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/umsg.h index 5d235e42..18765b0f 100644 --- a/src/duckdb/extension/icu/third_party/icu/i18n/unicode/umsg.h +++ b/src/duckdb/extension/icu/third_party/icu/i18n/unicode/umsg.h @@ -1,10 +1,10 @@ // © 2016 and later: Unicode, Inc. and others. // License & terms of use: http://www.unicode.org/copyright.html /******************************************************************** - * COPYRIGHT: + * COPYRIGHT: * Copyright (c) 1997-2011, International Business Machines Corporation and * others. All Rights Reserved. - * Copyright (C) 2010 , Yahoo! Inc. + * Copyright (C) 2010 , Yahoo! Inc. ******************************************************************** * * file name: umsg.h @@ -100,8 +100,8 @@ * u_uastrcpy(str, "MyDisk"); * u_uastrcpy(pattern, "The disk {1} contains {0,choice,0#no files|1#one file|1<{0,number,integer} files}"); * for(i=0; i<3; i++){ - * resultlength=0; - * resultLengthOut=u_formatMessage( "en_US", pattern, u_strlen(pattern), NULL, resultlength, &status, testArgs[i], str); + * resultlength=0; + * resultLengthOut=u_formatMessage( "en_US", pattern, u_strlen(pattern), NULL, resultlength, &status, testArgs[i], str); * if(status==U_BUFFER_OVERFLOW_ERROR){ * status=U_ZERO_ERROR; * resultlength=resultLengthOut+1; @@ -175,7 +175,7 @@ * @see u_parseMessage * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 u_formatMessage(const char *locale, const UChar *pattern, int32_t patternLength, @@ -202,7 +202,7 @@ u_formatMessage(const char *locale, * @see u_parseMessage * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 u_vformatMessage( const char *locale, const UChar *pattern, 
int32_t patternLength, @@ -227,7 +227,7 @@ u_vformatMessage( const char *locale, * @see u_formatMessage * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 u_parseMessage( const char *locale, const UChar *pattern, int32_t patternLength, @@ -252,7 +252,7 @@ u_parseMessage( const char *locale, * @see u_formatMessage * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 u_vparseMessage(const char *locale, const UChar *pattern, int32_t patternLength, @@ -281,7 +281,7 @@ u_vparseMessage(const char *locale, * @see u_parseMessage * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 u_formatMessageWithError( const char *locale, const UChar *pattern, int32_t patternLength, @@ -310,7 +310,7 @@ u_formatMessageWithError( const char *locale, * output was truncated. * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 u_vformatMessageWithError( const char *locale, const UChar *pattern, int32_t patternLength, @@ -338,7 +338,7 @@ u_vformatMessageWithError( const char *locale, * @see u_formatMessage * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 u_parseMessageWithError(const char *locale, const UChar *pattern, int32_t patternLength, @@ -366,7 +366,7 @@ u_parseMessageWithError(const char *locale, * @see u_formatMessage * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 u_vparseMessageWithError(const char *locale, const UChar *pattern, int32_t patternLength, @@ -377,7 +377,7 @@ u_vparseMessageWithError(const char *locale, UErrorCode* status); /*----------------------- New experimental API --------------------------- */ -/** +/** * The message format object * @stable ICU 2.0 */ @@ -389,14 +389,14 @@ typedef void* UMessageFormat; * @param pattern A pattern specifying the format to use. * @param patternLength Length of the pattern to use * @param locale The locale for which the messages are formatted. 
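The u_formatMessage example earlier uses a choice argument, `{0,choice,0#no files|1#one file|1<{0,number,integer} files}`. The selection rule for such a spec (segments in ascending limit order; `#` matches when value >= limit, `<` when value > limit; the last matching segment wins, and a value below all limits falls back to the first segment) can be sketched as a toy re-implementation. This is illustrative only, not ICU's parser, and it handles just the selection step, not nested formats:

```python
def choose(value: float, spec: str) -> str:
    """Select a sub-message from an ICU-style choice spec,
    e.g. "0#no files|1#one file|1<{0} files"."""
    selected = None
    for segment in spec.split("|"):
        # the first '#' or '<' separates the numeric limit from the text
        i = min(idx for idx, ch in enumerate(segment) if ch in "#<")
        limit, op, text = float(segment[:i]), segment[i], segment[i + 1:]
        if selected is None:
            selected = text  # value below all limits: first segment wins
        matches = value >= limit if op == "#" else value > limit
        if matches:
            selected = text  # later (higher-limit) matches override
    return selected

spec = "0#no files|1#one file|1<{0} files"
print(choose(0, spec))  # no files
print(choose(1, spec))  # one file
print(choose(3, spec))  # {0} files
```

The selected sub-message still contains placeholders such as `{0}`; in the real API these are then formatted recursively with the remaining arguments.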
- * @param parseError A pointer to UParseError struct to receive any errors - * occured during parsing. Can be NULL. + * @param parseError A pointer to UParseError struct to receive any errors + * occurred during parsing. Can be NULL. * @param status A pointer to an UErrorCode to receive any errors. - * @return A pointer to a UMessageFormat to use for formatting - * messages, or 0 if an error occurred. + * @return A pointer to a UMessageFormat to use for formatting + * messages, or 0 if an error occurred. * @stable ICU 2.0 */ -U_STABLE UMessageFormat* U_EXPORT2 +U_STABLE UMessageFormat* U_EXPORT2 umsg_open( const UChar *pattern, int32_t patternLength, const char *locale, @@ -409,7 +409,7 @@ umsg_open( const UChar *pattern, * @param format The formatter to close. * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 umsg_close(UMessageFormat* format); #if U_SHOW_CPLUSPLUS_API @@ -439,7 +439,7 @@ U_NAMESPACE_END * @return A pointer to a UDateFormat identical to fmt. * @stable ICU 2.0 */ -U_STABLE UMessageFormat U_EXPORT2 +U_STABLE UMessageFormat U_EXPORT2 umsg_clone(const UMessageFormat *fmt, UErrorCode *status); @@ -450,7 +450,7 @@ umsg_clone(const UMessageFormat *fmt, * @param locale The locale the formatter should use. * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 umsg_setLocale(UMessageFormat *fmt, const char* locale); @@ -461,7 +461,7 @@ umsg_setLocale(UMessageFormat *fmt, * @return the locale. * @stable ICU 2.0 */ -U_STABLE const char* U_EXPORT2 +U_STABLE const char* U_EXPORT2 umsg_getLocale(const UMessageFormat *fmt); /** @@ -469,14 +469,14 @@ umsg_getLocale(const UMessageFormat *fmt); * @param fmt The formatter to use * @param pattern The pattern to be applied. * @param patternLength Length of the pattern to use - * @param parseError Struct to receive information on position + * @param parseError Struct to receive information on position * of error if an error is encountered.Can be NULL. 
* @param status Output param set to success/failure code on * exit. If the pattern is invalid, this will be * set to a failure result. * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 umsg_applyPattern( UMessageFormat *fmt, const UChar* pattern, int32_t patternLength, @@ -490,13 +490,13 @@ umsg_applyPattern( UMessageFormat *fmt, * @param resultLength The maximum size of result. * @param status Output param set to success/failure code on * exit. If the pattern is invalid, this will be - * set to a failure result. + * set to a failure result. * @return the pattern of the format * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 umsg_toPattern(const UMessageFormat *fmt, - UChar* result, + UChar* result, int32_t resultLength, UErrorCode* status); @@ -509,13 +509,13 @@ umsg_toPattern(const UMessageFormat *fmt, * @param result A pointer to a buffer to receive the formatted message. * @param resultLength The maximum size of result. * @param status A pointer to an UErrorCode to receive any errors - * @param ... A variable-length argument list containing the arguments + * @param ... A variable-length argument list containing the arguments * specified in pattern. - * @return The total buffer size needed; if greater than resultLength, + * @return The total buffer size needed; if greater than resultLength, * the output was truncated. * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 umsg_format( const UMessageFormat *fmt, UChar *result, int32_t resultLength, @@ -527,17 +527,17 @@ umsg_format( const UMessageFormat *fmt, * This function may perform re-ordering of the arguments depending on the * locale. For all numeric arguments, double is assumed unless the type is * explicitly integer. All choice format arguments must be of type double. - * @param fmt The formatter to use + * @param fmt The formatter to use * @param result A pointer to a buffer to receive the formatted message. 
* @param resultLength The maximum size of result. - * @param ap A variable-length argument list containing the arguments + * @param ap A variable-length argument list containing the arguments * @param status A pointer to an UErrorCode to receive any errors * specified in pattern. - * @return The total buffer size needed; if greater than resultLength, + * @return The total buffer size needed; if greater than resultLength, * the output was truncated. * @stable ICU 2.0 */ -U_STABLE int32_t U_EXPORT2 +U_STABLE int32_t U_EXPORT2 umsg_vformat( const UMessageFormat *fmt, UChar *result, int32_t resultLength, @@ -549,7 +549,7 @@ umsg_vformat( const UMessageFormat *fmt, * For numeric arguments, this function will always use doubles. Integer types * should not be passed. * This function is not able to parse all output from {@link #umsg_format }. - * @param fmt The formatter to use + * @param fmt The formatter to use * @param source The text to parse. * @param sourceLength The length of source, or -1 if null-terminated. * @param count Output param to receive number of elements returned. @@ -558,7 +558,7 @@ umsg_vformat( const UMessageFormat *fmt, * specified in pattern. * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 umsg_parse( const UMessageFormat *fmt, const UChar *source, int32_t sourceLength, @@ -571,7 +571,7 @@ umsg_parse( const UMessageFormat *fmt, * For numeric arguments, this function will always use doubles. Integer types * should not be passed. * This function is not able to parse all output from {@link #umsg_format }. - * @param fmt The formatter to use + * @param fmt The formatter to use * @param source The text to parse. * @param sourceLength The length of source, or -1 if null-terminated. * @param count Output param to receive number of elements returned. 
@@ -581,7 +581,7 @@ umsg_parse( const UMessageFormat *fmt, * @see u_formatMessage * @stable ICU 2.0 */ -U_STABLE void U_EXPORT2 +U_STABLE void U_EXPORT2 umsg_vparse(const UMessageFormat *fmt, const UChar *source, int32_t sourceLength, @@ -593,7 +593,7 @@ umsg_vparse(const UMessageFormat *fmt, /** * Convert an 'apostrophe-friendly' pattern into a standard * pattern. Standard patterns treat all apostrophes as - * quotes, which is problematic in some languages, e.g. + * quotes, which is problematic in some languages, e.g. * French, where apostrophe is commonly used. This utility * assumes that only an unpaired apostrophe immediately before * a brace is a true quote. Other unpaired apostrophes are paired, @@ -613,8 +613,8 @@ umsg_vparse(const UMessageFormat *fmt, * not * @stable ICU 3.4 */ -U_STABLE int32_t U_EXPORT2 -umsg_autoQuoteApostrophe(const UChar* pattern, +U_STABLE int32_t U_EXPORT2 +umsg_autoQuoteApostrophe(const UChar* pattern, int32_t patternLength, UChar* dest, int32_t destCapacity, diff --git a/src/duckdb/extension/icu/third_party/icu/i18n/usrchimp.h b/src/duckdb/extension/icu/third_party/icu/i18n/usrchimp.h index 5438417e..cd3c5a7c 100644 --- a/src/duckdb/extension/icu/third_party/icu/i18n/usrchimp.h +++ b/src/duckdb/extension/icu/third_party/icu/i18n/usrchimp.h @@ -43,7 +43,7 @@ #define isContinuation(CE) (((CE) & UCOL_CONTINUATION_MARKER) == UCOL_CONTINUATION_MARKER) /** - * This indicates an error has occured during processing or there are no more CEs + * This indicates an error has occurred during processing or there are no more CEs * to be returned. */ #define UCOL_PROCESSED_NULLORDER ((int64_t)U_INT64_MAX) @@ -101,7 +101,7 @@ class UCollationPCE : public UMemory { * @param ixHigh a pointer to an int32_t to receive the iterator index after fetching the CE. * @param status A pointer to an UErrorCode to receive any errors. 
* @return The next collation elements ordering, otherwise returns UCOL_PROCESSED_NULLORDER - * if an error has occured or if the end of string has been reached + * if an error has occurred or if the end of string has been reached */ int64_t nextProcessed(int32_t *ixLow, int32_t *ixHigh, UErrorCode *status); /** @@ -114,7 +114,7 @@ class UCollationPCE : public UMemory { * a U_BUFFER_OVERFLOW_ERROR is returned if the internal stack * buffer has been exhausted. * @return The previous collation elements ordering, otherwise returns - * UCOL_PROCESSED_NULLORDER if an error has occured or if the start of + * UCOL_PROCESSED_NULLORDER if an error has occurred or if the start of * string has been reached. */ int64_t previousProcessed(int32_t *ixLow, int32_t *ixHigh, UErrorCode *status); diff --git a/src/duckdb/extension/json/include/json_common.hpp b/src/duckdb/extension/json/include/json_common.hpp index 87205802..ca29cde9 100644 --- a/src/duckdb/extension/json/include/json_common.hpp +++ b/src/duckdb/extension/json/include/json_common.hpp @@ -301,7 +301,7 @@ struct JSONCommon { //! Get JSON pointer (/field/index/... syntax) static inline yyjson_val *GetPointer(yyjson_val *val, const char *ptr, const idx_t &len) { yyjson_ptr_err err; - return len == 1 ? val : unsafe_yyjson_ptr_getx(val, ptr, len, &err); + return unsafe_yyjson_ptr_getx(val, ptr, len, &err); } //! Get JSON path ($.field[index]... 
syntax) static yyjson_val *GetPath(yyjson_val *val, const char *ptr, const idx_t &len); diff --git a/src/duckdb/extension/json/json_functions/json_structure.cpp b/src/duckdb/extension/json/json_functions/json_structure.cpp index 04800572..7982003f 100644 --- a/src/duckdb/extension/json/json_functions/json_structure.cpp +++ b/src/duckdb/extension/json/json_functions/json_structure.cpp @@ -214,8 +214,8 @@ void JSONStructureNode::RefineCandidateTypesObject(yyjson_val *vals[], const idx D_ASSERT(it != key_map.end()); const auto child_idx = it->second; child_vals[child_idx][i] = child_val; + found_key_count += !found_keys[child_idx]; found_keys[child_idx] = true; - found_key_count++; } if (found_key_count != child_count) { @@ -562,10 +562,12 @@ static void MergeNodeVal(JSONStructureNode &merged, const JSONStructureDescripti } if (!merged.initialized) { merged_desc.candidate_types = child_desc.candidate_types; - } else if (!merged_desc.candidate_types.empty() && !child_desc.candidate_types.empty() && - merged_desc.candidate_types.back() != child_desc.candidate_types.back()) { + } else if (merged_desc.candidate_types.empty() != child_desc.candidate_types.empty() // both empty or neither empty + || (!merged_desc.candidate_types.empty() && + merged_desc.candidate_types.back() != child_desc.candidate_types.back())) { // non-empty: check type merged_desc.candidate_types.clear(); // Not the same, default to VARCHAR } + merged.initialized = true; } @@ -704,14 +706,18 @@ static LogicalType StructureToTypeObject(ClientContext &context, const JSONStruc D_ASSERT(node.descriptions.size() == 1 && node.descriptions[0].type == LogicalTypeId::STRUCT); auto &desc = node.descriptions[0]; - // If it's an empty struct we do MAP of JSON instead if (desc.children.empty()) { - // Empty struct - let's do MAP of JSON instead - return LogicalType::MAP(LogicalType::VARCHAR, null_type); + if (map_inference_threshold != DConstants::INVALID_INDEX) { + // Empty struct - let's do MAP of JSON instead + 
return LogicalType::MAP(LogicalType::VARCHAR, null_type); + } else { + return LogicalType::JSON(); + } } // If it's an inconsistent object we also just do MAP with the best-possible, recursively-merged value type - if (IsStructureInconsistent(desc, node.count, node.null_count, field_appearance_threshold)) { + if (map_inference_threshold != DConstants::INVALID_INDEX && + IsStructureInconsistent(desc, node.count, node.null_count, field_appearance_threshold)) { return LogicalType::MAP(LogicalType::VARCHAR, GetMergedType(context, node, max_depth, field_appearance_threshold, map_inference_threshold, depth + 1, null_type)); diff --git a/src/duckdb/extension/parquet/column_writer.cpp b/src/duckdb/extension/parquet/column_writer.cpp index 1d42da05..0b1d867c 100644 --- a/src/duckdb/extension/parquet/column_writer.cpp +++ b/src/duckdb/extension/parquet/column_writer.cpp @@ -2244,7 +2244,8 @@ unique_ptrColumnWriter::CreateWriterRecursive(ClientContext &cont schemas.push_back(std::move(schema_element)); schema_path.push_back(name); - if (type.id() == LogicalTypeId::BLOB && type.GetAlias() == "WKB_BLOB") { + if (type.id() == LogicalTypeId::BLOB && type.GetAlias() == "WKB_BLOB" && + GeoParquetFileMetadata::IsGeoParquetConversionEnabled(context)) { return make_uniq (context, writer, schema_idx, std::move(schema_path), max_repeat, max_define, can_have_nulls, name); } diff --git a/src/duckdb/extension/parquet/geo_parquet.cpp b/src/duckdb/extension/parquet/geo_parquet.cpp index 28c56991..b82cd502 100644 --- a/src/duckdb/extension/parquet/geo_parquet.cpp +++ b/src/duckdb/extension/parquet/geo_parquet.cpp @@ -177,7 +177,14 @@ void GeoParquetColumnMetadataWriter::Update(GeoParquetColumnMetadata &meta, Vect //------------------------------------------------------------------------------ unique_ptr -GeoParquetFileMetadata::TryRead(const duckdb_parquet::format::FileMetaData &file_meta_data, ClientContext &context) { +GeoParquetFileMetadata::TryRead(const 
duckdb_parquet::format::FileMetaData &file_meta_data, + const ClientContext &context) { + + // Conversion not enabled, or spatial is not loaded! + if (!IsGeoParquetConversionEnabled(context)) { + return nullptr; + } + for (auto &kv : file_meta_data.key_value_metadata) { if (kv.key == "geo") { const auto geo_metadata = yyjson_read(kv.value.c_str(), kv.value.size(), 0); @@ -186,14 +193,6 @@ GeoParquetFileMetadata::TryRead(const duckdb_parquet::format::FileMetaData &file return nullptr; } - // Check if the spatial extension is loaded, or try to autoload it. - const auto is_loaded = ExtensionHelper::TryAutoLoadExtension(context, "spatial"); - if (!is_loaded) { - // Spatial extension is not available, we can't make use of the metadata anyway. - yyjson_doc_free(geo_metadata); - return nullptr; - } - try { // Check the root object const auto root = yyjson_doc_get_root(geo_metadata); @@ -368,6 +367,22 @@ void GeoParquetFileMetadata::RegisterGeometryColumn(const string &column_name) { geometry_columns[column_name] = GeoParquetColumnMetadata(); } +bool GeoParquetFileMetadata::IsGeoParquetConversionEnabled(const ClientContext &context) { + Value geoparquet_enabled; + if (!context.TryGetCurrentSetting("enable_geoparquet_conversion", geoparquet_enabled)) { + return false; + } + if (!geoparquet_enabled.GetValue<bool>()) { + // Disabled by setting + return false; + } + if (!context.db->ExtensionIsLoaded("spatial")) { + // Spatial extension is not loaded, we can't convert anyway + return false; + } + return true; +} + unique_ptr<ColumnReader> GeoParquetFileMetadata::CreateColumnReader(ParquetReader &reader, const LogicalType &logical_type, const SchemaElement &s_ele, idx_t schema_idx_p, diff --git a/src/duckdb/extension/parquet/include/geo_parquet.hpp b/src/duckdb/extension/parquet/include/geo_parquet.hpp index ab04dcdf..e9b7ce48 100644 --- a/src/duckdb/extension/parquet/include/geo_parquet.hpp +++ b/src/duckdb/extension/parquet/include/geo_parquet.hpp @@ -120,7 +120,7 @@ class GeoParquetFileMetadata
{ // Try to read GeoParquet metadata. Returns nullptr if not found, invalid or the required spatial extension is not // available. static unique_ptr<GeoParquetFileMetadata> TryRead(const duckdb_parquet::format::FileMetaData &file_meta_data, - ClientContext &context); + const ClientContext &context); void Write(duckdb_parquet::format::FileMetaData &file_meta_data) const; void FlushColumnMeta(const string &column_name, const GeoParquetColumnMetadata &meta); @@ -133,6 +133,8 @@ class GeoParquetFileMetadata { bool IsGeometryColumn(const string &column_name) const; void RegisterGeometryColumn(const string &column_name); + static bool IsGeoParquetConversionEnabled(const ClientContext &context); + private: mutex write_lock; string version = "1.1.0"; diff --git a/src/duckdb/extension/parquet/include/parquet_reader.hpp b/src/duckdb/extension/parquet/include/parquet_reader.hpp index ef8dcaf8..6b536bbd 100644 --- a/src/duckdb/extension/parquet/include/parquet_reader.hpp +++ b/src/duckdb/extension/parquet/include/parquet_reader.hpp @@ -93,6 +93,7 @@ struct ParquetOptions { MultiFileReaderOptions file_options; vector<ParquetColumnDefinition> schema; + idx_t explicit_cardinality = 0; public: void Serialize(Serializer &serializer) const; diff --git a/src/duckdb/extension/parquet/include/parquet_rle_bp_decoder.hpp b/src/duckdb/extension/parquet/include/parquet_rle_bp_decoder.hpp index 27093388..49583f71 100644 --- a/src/duckdb/extension/parquet/include/parquet_rle_bp_decoder.hpp +++ b/src/duckdb/extension/parquet/include/parquet_rle_bp_decoder.hpp @@ -66,7 +66,7 @@ class RleBpDecoder { return 0; } uint8_t ret = 1; - while (((idx_t)(1u << ret) - 1) < val) { + while ((((idx_t)1u << (idx_t)ret) - 1) < val) { ret++; } return ret; diff --git a/src/duckdb/extension/parquet/include/templated_column_reader.hpp b/src/duckdb/extension/parquet/include/templated_column_reader.hpp index 3ebddfee..f9311524 100644 --- a/src/duckdb/extension/parquet/include/templated_column_reader.hpp +++
b/src/duckdb/extension/parquet/include/templated_column_reader.hpp @@ -68,10 +68,6 @@ class TemplatedColumnReader : public ColumnReader { void Offsets(uint32_t *offsets, uint8_t *defines, uint64_t num_values, parquet_filter_t &filter, idx_t result_offset, Vector &result) override { - if (!dict || dict->len == 0) { - throw IOException("Parquet file is likely corrupted, cannot have dictionary offsets without seeing a " - "non-empty dictionary first."); - } if (HasDefines()) { OffsetsInternal (*dict, offsets, defines, num_values, filter, result_offset, result); } else { diff --git a/src/duckdb/extension/parquet/parquet_extension.cpp b/src/duckdb/extension/parquet/parquet_extension.cpp index 596fed87..617dc3ca 100644 --- a/src/duckdb/extension/parquet/parquet_extension.cpp +++ b/src/duckdb/extension/parquet/parquet_extension.cpp @@ -70,8 +70,8 @@ struct ParquetReadBindData : public TableFunctionData { // These come from the initial_reader, but need to be stored in case the initial_reader is removed by a filter idx_t initial_file_cardinality; idx_t initial_file_row_groups; + idx_t explicit_cardinality = 0; // can be set to inject exterior cardinality knowledge (e.g. 
from a data lake) ParquetOptions parquet_options; - MultiFileReaderBindData reader_bind; void Initialize(shared_ptr reader) { @@ -395,6 +395,7 @@ class ParquetScanFunction { table_function.named_parameters["file_row_number"] = LogicalType::BOOLEAN; table_function.named_parameters["debug_use_openssl"] = LogicalType::BOOLEAN; table_function.named_parameters["compression"] = LogicalType::VARCHAR; + table_function.named_parameters["explicit_cardinality"] = LogicalType::UBIGINT; table_function.named_parameters["schema"] = LogicalType::MAP(LogicalType::INTEGER, LogicalType::STRUCT({{{"name", LogicalType::VARCHAR}, {"type", LogicalType::VARCHAR}, @@ -545,7 +546,11 @@ class ParquetScanFunction { result->reader_bind = result->multi_file_reader->BindReader ( context, result->types, result->names, *result->file_list, *result, parquet_options); } - + if (parquet_options.explicit_cardinality) { + auto file_count = result->file_list->GetTotalFileCount(); + result->explicit_cardinality = parquet_options.explicit_cardinality; + result->initial_file_cardinality = result->explicit_cardinality / (file_count ? 
file_count : 1); + } if (return_types.empty()) { // no expected types - just copy the types return_types = result->types; @@ -618,6 +623,8 @@ class ParquetScanFunction { // cannot be combined with hive_partitioning=true, so we disable auto-detection parquet_options.file_options.auto_detect_hive_partitioning = false; + } else if (loption == "explicit_cardinality") { + parquet_options.explicit_cardinality = UBigIntValue::Get(kv.second); } else if (loption == "encryption_config") { parquet_options.encryption_config = ParquetEncryptionConfig::Create(context, kv.second); } @@ -847,13 +854,15 @@ class ParquetScanFunction { static unique_ptr<NodeStatistics> ParquetCardinality(ClientContext &context, const FunctionData *bind_data) { auto &data = bind_data->Cast<ParquetReadBindData>(); - + if (data.explicit_cardinality) { + return make_uniq<NodeStatistics>(data.explicit_cardinality); + } auto file_list_cardinality_estimate = data.file_list->GetCardinality(context); if (file_list_cardinality_estimate) { return file_list_cardinality_estimate; } - - return make_uniq<NodeStatistics>(data.initial_file_cardinality * data.file_list->GetTotalFileCount()); + return make_uniq<NodeStatistics>(MaxValue(data.initial_file_cardinality, (idx_t)1) * + data.file_list->GetTotalFileCount()); } static idx_t ParquetScanMaxThreads(ClientContext &context, const FunctionData *bind_data) { @@ -1573,7 +1582,7 @@ static vector<unique_ptr<Expression>> ParquetWriteSelect(CopyToSelectInput &inpu // Spatial types need to be encoded into WKB when writing GeoParquet.
// But don't perform this conversion if this is an EXPORT DATABASE statement if (input.copy_to_type == CopyToType::COPY_TO_FILE && type.id() == LogicalTypeId::BLOB && type.HasAlias() && - type.GetAlias() == "GEOMETRY") { + type.GetAlias() == "GEOMETRY" && GeoParquetFileMetadata::IsGeoParquetConversionEnabled(context)) { LogicalType wkb_blob_type(LogicalTypeId::BLOB); wkb_blob_type.SetAlias("WKB_BLOB"); @@ -1680,6 +1689,11 @@ void ParquetExtension::Load(DuckDB &db) { config.replacement_scans.emplace_back(ParquetScanReplacement); config.AddExtensionOption("binary_as_string", "In Parquet files, interpret binary data as a string.", LogicalType::BOOLEAN); + + config.AddExtensionOption( + "enable_geoparquet_conversion", + "Attempt to decode/encode geometry data in/as GeoParquet files if the spatial extension is present.", + LogicalType::BOOLEAN, Value::BOOLEAN(true)); } std::string ParquetExtension::Name() { diff --git a/src/duckdb/extension/parquet/parquet_reader.cpp b/src/duckdb/extension/parquet/parquet_reader.cpp index 0508d254..7357767b 100644 --- a/src/duckdb/extension/parquet/parquet_reader.cpp +++ b/src/duckdb/extension/parquet/parquet_reader.cpp @@ -637,8 +637,7 @@ uint32_t ParquetReader::ReadData(duckdb_apache::thrift::protocol::TProtocol &ipr const ParquetRowGroup &ParquetReader::GetGroup(ParquetReaderScanState &state) { auto file_meta_data = GetFileMetadata(); D_ASSERT(state.current_group >= 0 && (idx_t)state.current_group < state.group_idx_list.size()); - D_ASSERT(state.group_idx_list[state.current_group] >= 0 && - state.group_idx_list[state.current_group] < file_meta_data->row_groups.size()); + D_ASSERT(state.group_idx_list[state.current_group] < file_meta_data->row_groups.size()); return file_meta_data->row_groups[state.group_idx_list[state.current_group]]; } diff --git a/src/duckdb/extension/parquet/parquet_writer.cpp b/src/duckdb/extension/parquet/parquet_writer.cpp index f9b864ef..a7a847af 100644 --- a/src/duckdb/extension/parquet/parquet_writer.cpp +++
b/src/duckdb/extension/parquet/parquet_writer.cpp @@ -466,7 +466,7 @@ void ParquetWriter::PrepareRowGroup(ColumnDataCollection &buffer, PreparedRowGro // Validation code adapted from Impala static void ValidateOffsetInFile(const string &filename, idx_t col_idx, idx_t file_length, idx_t offset, const string &offset_name) { - if (offset < 0 || offset >= file_length) { + if (offset >= file_length) { throw IOException("File '%s': metadata is corrupt. Column %d has invalid " "%s (offset=%llu file_size=%llu).", filename, col_idx, offset_name, offset, file_length); diff --git a/src/duckdb/extension/parquet/serialize_parquet.cpp b/src/duckdb/extension/parquet/serialize_parquet.cpp index e6aeac02..b72a78be 100644 --- a/src/duckdb/extension/parquet/serialize_parquet.cpp +++ b/src/duckdb/extension/parquet/serialize_parquet.cpp @@ -7,8 +7,6 @@ #include "duckdb/common/serializer/deserializer.hpp" #include "parquet_reader.hpp" #include "parquet_crypto.hpp" -#include "parquet_reader.hpp" -#include "parquet_writer.hpp" #include "parquet_writer.hpp" namespace duckdb { diff --git a/src/duckdb/src/catalog/catalog_entry/duck_schema_entry.cpp b/src/duckdb/src/catalog/catalog_entry/duck_schema_entry.cpp index 42dea06f..f3c8684f 100644 --- a/src/duckdb/src/catalog/catalog_entry/duck_schema_entry.cpp +++ b/src/duckdb/src/catalog/catalog_entry/duck_schema_entry.cpp @@ -119,6 +119,13 @@ optional_ptr DuckSchemaEntry::AddEntryInternal(CatalogTransaction // first find the set for this entry auto &set = GetCatalogSet(entry_type); dependencies.AddDependency(*this); + if (on_conflict == OnCreateConflict::IGNORE_ON_CONFLICT) { + auto old_entry = set.GetEntry(transaction, entry_name); + if (old_entry) { + return nullptr; + } + } + if (on_conflict == OnCreateConflict::REPLACE_ON_CONFLICT) { // CREATE OR REPLACE: first try to drop the entry auto old_entry = set.GetEntry(transaction, entry_name); @@ -315,7 +322,7 @@ void DuckSchemaEntry::DropEntry(ClientContext &context, DropInfo &info) { throw 
InternalException("Failed to drop entry \"%s\" - entry could not be found", info.name); } if (existing_entry->type != info.type) { - throw CatalogException("Existing object %s is of type %s, trying to replace with type %s", info.name, + throw CatalogException("Existing object %s is of type %s, trying to drop type %s", info.name, CatalogTypeToString(existing_entry->type), CatalogTypeToString(info.type)); } diff --git a/src/duckdb/src/catalog/default/default_functions.cpp b/src/duckdb/src/catalog/default/default_functions.cpp index b4f7deca..f7f4634b 100644 --- a/src/duckdb/src/catalog/default/default_functions.cpp +++ b/src/duckdb/src/catalog/default/default_functions.cpp @@ -12,7 +12,7 @@ namespace duckdb { static const DefaultMacro internal_macros[] = { {DEFAULT_SCHEMA, "current_role", {nullptr}, {{nullptr, nullptr}}, "'duckdb'"}, // user name of current execution context {DEFAULT_SCHEMA, "current_user", {nullptr}, {{nullptr, nullptr}}, "'duckdb'"}, // user name of current execution context - {DEFAULT_SCHEMA, "current_catalog", {nullptr}, {{nullptr, nullptr}}, "current_database()"}, // name of current database (called "catalog" in the SQL standard) + {DEFAULT_SCHEMA, "current_catalog", {nullptr}, {{nullptr, nullptr}}, "main.current_database()"}, // name of current database (called "catalog" in the SQL standard) {DEFAULT_SCHEMA, "user", {nullptr}, {{nullptr, nullptr}}, "current_user"}, // equivalent to current_user {DEFAULT_SCHEMA, "session_user", {nullptr}, {{nullptr, nullptr}}, "'duckdb'"}, // session user name {"pg_catalog", "inet_client_addr", {nullptr}, {{nullptr, nullptr}}, "NULL"}, // address of the remote connection @@ -27,10 +27,10 @@ static const DefaultMacro internal_macros[] = { {"pg_catalog", "pg_typeof", {"expression", nullptr}, {{nullptr, nullptr}}, "lower(typeof(expression))"}, // get the data type of any value - {"pg_catalog", "current_database", {nullptr}, {{nullptr, nullptr}}, "current_database()"}, // name of current database (called "catalog" 
in the SQL standard) - {"pg_catalog", "current_query", {nullptr}, {{nullptr, nullptr}}, "current_query()"}, // the currently executing query (NULL if not inside a plpgsql function) - {"pg_catalog", "current_schema", {nullptr}, {{nullptr, nullptr}}, "current_schema()"}, // name of current schema - {"pg_catalog", "current_schemas", {"include_implicit"}, {{nullptr, nullptr}}, "current_schemas(include_implicit)"}, // names of schemas in search path + {"pg_catalog", "current_database", {nullptr}, {{nullptr, nullptr}}, "main.current_database()"}, // name of current database (called "catalog" in the SQL standard) + {"pg_catalog", "current_query", {nullptr}, {{nullptr, nullptr}}, "main.current_query()"}, // the currently executing query (NULL if not inside a plpgsql function) + {"pg_catalog", "current_schema", {nullptr}, {{nullptr, nullptr}}, "main.current_schema()"}, // name of current schema + {"pg_catalog", "current_schemas", {"include_implicit"}, {{nullptr, nullptr}}, "main.current_schemas(include_implicit)"}, // names of schemas in search path // privilege functions {"pg_catalog", "has_any_column_privilege", {"table", "privilege", nullptr}, {{nullptr, nullptr}}, "true"}, //boolean //does current user have privilege for any column of table diff --git a/src/duckdb/src/common/allocator.cpp b/src/duckdb/src/common/allocator.cpp index d3ef18bb..c1338715 100644 --- a/src/duckdb/src/common/allocator.cpp +++ b/src/duckdb/src/common/allocator.cpp @@ -242,12 +242,13 @@ static void MallocTrim(idx_t pad) { static atomic LAST_TRIM_TIMESTAMP_MS {0}; int64_t last_trim_timestamp_ms = LAST_TRIM_TIMESTAMP_MS.load(); - const int64_t current_timestamp_ms = Timestamp::GetEpochMs(Timestamp::GetCurrentTimestamp()); + int64_t current_timestamp_ms = Timestamp::GetEpochMs(Timestamp::GetCurrentTimestamp()); if (current_timestamp_ms - last_trim_timestamp_ms < TRIM_INTERVAL_MS) { return; // We trimmed less than TRIM_INTERVAL_MS ago } - if 
(!std::atomic_compare_exchange_weak(&LAST_TRIM_TIMESTAMP_MS, &last_trim_timestamp_ms, current_timestamp_ms)) { + if (!LAST_TRIM_TIMESTAMP_MS.compare_exchange_strong(last_trim_timestamp_ms, current_timestamp_ms, + std::memory_order_acquire, std::memory_order_relaxed)) { return; // Another thread has updated LAST_TRIM_TIMESTAMP_MS since we loaded it } diff --git a/src/duckdb/src/common/arrow/arrow_appender.cpp b/src/duckdb/src/common/arrow/arrow_appender.cpp index b478fdb3..632bffc6 100644 --- a/src/duckdb/src/common/arrow/arrow_appender.cpp +++ b/src/duckdb/src/common/arrow/arrow_appender.cpp @@ -225,6 +225,7 @@ static void InitializeFunctionPointers(ArrowAppendData &append_data, const Logic break; case LogicalTypeId::BLOB: case LogicalTypeId::BIT: + case LogicalTypeId::VARINT: if (append_data.options.arrow_offset_size == ArrowOffsetSize::LARGE) { InitializeAppenderForType >(append_data); } else { diff --git a/src/duckdb/src/common/arrow/arrow_converter.cpp b/src/duckdb/src/common/arrow/arrow_converter.cpp index 851d45b5..9c674cd7 100644 --- a/src/duckdb/src/common/arrow/arrow_converter.cpp +++ b/src/duckdb/src/common/arrow/arrow_converter.cpp @@ -142,6 +142,17 @@ void SetArrowFormat(DuckDBArrowSchemaHolder &root_holder, ArrowSchema &child, co child.metadata = root_holder.metadata_info.back().get(); break; } + case LogicalTypeId::VARINT: { + if (options.arrow_offset_size == ArrowOffsetSize::LARGE) { + child.format = "Z"; + } else { + child.format = "z"; + } + auto schema_metadata = ArrowSchemaMetadata::MetadataFromName("duckdb.varint"); + root_holder.metadata_info.emplace_back(schema_metadata.SerializeMetadata()); + child.metadata = root_holder.metadata_info.back().get(); + break; + } case LogicalTypeId::DOUBLE: child.format = "g"; break; diff --git a/src/duckdb/src/common/arrow/schema_metadata.cpp b/src/duckdb/src/common/arrow/schema_metadata.cpp index acbf75c5..7240d40f 100644 --- a/src/duckdb/src/common/arrow/schema_metadata.cpp +++ 
b/src/duckdb/src/common/arrow/schema_metadata.cpp @@ -36,7 +36,12 @@ void ArrowSchemaMetadata::AddOption(const string &key, const string &value) { metadata_map[key] = value; } string ArrowSchemaMetadata::GetOption(const string &key) const { - return metadata_map.at(key); + auto it = metadata_map.find(key); + if (it != metadata_map.end()) { + return it->second; + } else { + return ""; + } } string ArrowSchemaMetadata::GetExtensionName() const { @@ -51,9 +56,6 @@ ArrowSchemaMetadata ArrowSchemaMetadata::MetadataFromName(const string &extensio } bool ArrowSchemaMetadata::HasExtension() { - if (metadata_map.find(ARROW_EXTENSION_NAME) == metadata_map.end()) { - return false; - } auto arrow_extension = GetOption(ArrowSchemaMetadata::ARROW_EXTENSION_NAME); // FIXME: We are currently ignoring the ogc extensions return !arrow_extension.empty() && !StringUtil::StartsWith(arrow_extension, "ogc"); diff --git a/src/duckdb/src/common/enum_util.cpp b/src/duckdb/src/common/enum_util.cpp index c4185af9..9a0db08f 100644 --- a/src/duckdb/src/common/enum_util.cpp +++ b/src/duckdb/src/common/enum_util.cpp @@ -4394,6 +4394,10 @@ const char* EnumUtil::ToChars (MetricsType value) { return "OPERATOR_ROWS_SCANNED"; case MetricsType::OPERATOR_TIMING: return "OPERATOR_TIMING"; + case MetricsType::LATENCY: + return "LATENCY"; + case MetricsType::ROWS_RETURNED: + return "ROWS_RETURNED"; case MetricsType::RESULT_SET_SIZE: return "RESULT_SET_SIZE"; case MetricsType::ALL_OPTIMIZERS: @@ -4495,6 +4499,12 @@ MetricsType EnumUtil::FromString (const char *value) { if (StringUtil::Equals(value, "OPERATOR_TIMING")) { return MetricsType::OPERATOR_TIMING; } + if (StringUtil::Equals(value, "LATENCY")) { + return MetricsType::LATENCY; + } + if (StringUtil::Equals(value, "ROWS_RETURNED")) { + return MetricsType::ROWS_RETURNED; + } if (StringUtil::Equals(value, "RESULT_SET_SIZE")) { return MetricsType::RESULT_SET_SIZE; } @@ -6457,6 +6467,29 @@ SecretPersistType EnumUtil::FromString (const char *value) { throw 
NotImplementedException(StringUtil::Format("Enum value: '%s' not implemented in FromString<SecretPersistType>", value)); } +template<> +const char* EnumUtil::ToChars<SecretSerializationType>(SecretSerializationType value) { + switch(value) { + case SecretSerializationType::CUSTOM: + return "CUSTOM"; + case SecretSerializationType::KEY_VALUE_SECRET: + return "KEY_VALUE_SECRET"; + default: + throw NotImplementedException(StringUtil::Format("Enum value: '%d' not implemented in ToChars<SecretSerializationType>", value)); + } +} + +template<> +SecretSerializationType EnumUtil::FromString<SecretSerializationType>(const char *value) { + if (StringUtil::Equals(value, "CUSTOM")) { + return SecretSerializationType::CUSTOM; + } + if (StringUtil::Equals(value, "KEY_VALUE_SECRET")) { + return SecretSerializationType::KEY_VALUE_SECRET; + } + throw NotImplementedException(StringUtil::Format("Enum value: '%s' not implemented in FromString<SecretSerializationType>", value)); +} + template<> const char* EnumUtil::ToChars<SequenceInfo>(SequenceInfo value) { switch(value) { diff --git a/src/duckdb/src/common/exception.cpp b/src/duckdb/src/common/exception.cpp index b8aac720..4527f3ff 100644 --- a/src/duckdb/src/common/exception.cpp +++ b/src/duckdb/src/common/exception.cpp @@ -292,6 +292,9 @@ PermissionException::PermissionException(const string &msg) : Exception(Exceptio SyntaxException::SyntaxException(const string &msg) : Exception(ExceptionType::SYNTAX, msg) { } +ExecutorException::ExecutorException(const string &msg) : Exception(ExceptionType::EXECUTOR, msg) { +} + ConstraintException::ConstraintException(const string &msg) : Exception(ExceptionType::CONSTRAINT, msg) { } diff --git a/src/duckdb/src/common/extra_type_info.cpp b/src/duckdb/src/common/extra_type_info.cpp index 6c09480c..54f03447 100644 --- a/src/duckdb/src/common/extra_type_info.cpp +++ b/src/duckdb/src/common/extra_type_info.cpp @@ -1,4 +1,5 @@ #include "duckdb/common/extra_type_info.hpp" +#include "duckdb/common/extra_type_info/enum_type_info.hpp" #include "duckdb/common/serializer/deserializer.hpp" #include "duckdb/common/enum_util.hpp"
#include "duckdb/common/numeric_utils.hpp" @@ -220,50 +221,6 @@ PhysicalType EnumTypeInfo::DictType(idx_t size) { } } -template -struct EnumTypeInfoTemplated : public EnumTypeInfo { - explicit EnumTypeInfoTemplated(Vector &values_insert_order_p, idx_t size_p) - : EnumTypeInfo(values_insert_order_p, size_p) { - D_ASSERT(values_insert_order_p.GetType().InternalType() == PhysicalType::VARCHAR); - - UnifiedVectorFormat vdata; - values_insert_order.ToUnifiedFormat(size_p, vdata); - - auto data = UnifiedVectorFormat::GetData (vdata); - for (idx_t i = 0; i < size_p; i++) { - auto idx = vdata.sel->get_index(i); - if (!vdata.validity.RowIsValid(idx)) { - throw InternalException("Attempted to create ENUM type with NULL value"); - } - if (values.count(data[idx]) > 0) { - throw InvalidInputException("Attempted to create ENUM type with duplicate value %s", - data[idx].GetString()); - } - values[data[idx]] = UnsafeNumericCast (i); - } - } - - static shared_ptr Deserialize(Deserializer &deserializer, uint32_t size) { - Vector values_insert_order(LogicalType::VARCHAR, size); - auto strings = FlatVector::GetData (values_insert_order); - - deserializer.ReadList(201, "values", [&](Deserializer::List &list, idx_t i) { - strings[i] = StringVector::AddStringOrBlob(values_insert_order, list.ReadElement ()); - }); - return make_shared_ptr (values_insert_order, size); - } - - const string_map_t &GetValues() const { - return values; - } - - EnumTypeInfoTemplated(const EnumTypeInfoTemplated &) = delete; - EnumTypeInfoTemplated &operator=(const EnumTypeInfoTemplated &) = delete; - -private: - string_map_t values; -}; - EnumTypeInfo::EnumTypeInfo(Vector &values_insert_order_p, idx_t dict_size_p) : ExtraTypeInfo(ExtraTypeInfoType::ENUM_TYPE_INFO), values_insert_order(values_insert_order_p), dict_type(EnumDictType::VECTOR_DICT), dict_size(dict_size_p) { diff --git a/src/duckdb/src/common/field_writer.cpp b/src/duckdb/src/common/field_writer.cpp new file mode 100644 index 00000000..af899ca3 --- 
/dev/null +++ b/src/duckdb/src/common/field_writer.cpp @@ -0,0 +1,97 @@ +#include "duckdb/common/field_writer.hpp" + +namespace duckdb { + +//===--------------------------------------------------------------------===// +// Field Writer +//===--------------------------------------------------------------------===// +FieldWriter::FieldWriter(Serializer &serializer_p) + : serializer(serializer_p), buffer(make_uniq ()), field_count(0), finalized(false) { + buffer->SetVersion(serializer.GetVersion()); +} + +FieldWriter::~FieldWriter() { + if (Exception::UncaughtException()) { + return; + } + D_ASSERT(finalized); + // finalize should always have been called, unless this is destroyed as part of stack unwinding + D_ASSERT(!buffer); +} + +void FieldWriter::WriteData(const_data_ptr_t buffer_ptr, idx_t write_size) { + D_ASSERT(buffer); + buffer->WriteData(buffer_ptr, write_size); +} + +template <> +void FieldWriter::Write(const string &val) { + Write ((uint32_t)val.size()); + if (!val.empty()) { + WriteData(const_data_ptr_cast(val.c_str()), val.size()); + } +} + +void FieldWriter::Finalize() { + D_ASSERT(buffer); + D_ASSERT(!finalized); + finalized = true; + serializer.Write (field_count); + serializer.Write (buffer->blob.size); + serializer.WriteData(buffer->blob.data.get(), buffer->blob.size); + + buffer.reset(); +} + +//===--------------------------------------------------------------------===// +// Field Deserializer +//===--------------------------------------------------------------------===// +FieldDeserializer::FieldDeserializer(Deserializer &root) : root(root), remaining_data(idx_t(-1)) { + SetVersion(root.GetVersion()); +} + +void FieldDeserializer::ReadData(data_ptr_t buffer, idx_t read_size) { + D_ASSERT(remaining_data != idx_t(-1)); + D_ASSERT(read_size <= remaining_data); + root.ReadData(buffer, read_size); + remaining_data -= read_size; +} + +idx_t FieldDeserializer::RemainingData() { + return remaining_data; +} + +void FieldDeserializer::SetRemainingData(idx_t 
remaining_data) { + this->remaining_data = remaining_data; +} + +//===--------------------------------------------------------------------===// +// Field Reader +//===--------------------------------------------------------------------===// +FieldReader::FieldReader(Deserializer &source_p) : source(source_p), field_count(0), finalized(false) { + max_field_count = source_p.Read (); + total_size = source_p.Read (); + D_ASSERT(max_field_count > 0); + D_ASSERT(total_size > 0); + source.SetRemainingData(total_size); +} + +FieldReader::~FieldReader() { + if (Exception::UncaughtException()) { + return; + } + D_ASSERT(finalized); +} + +void FieldReader::Finalize() { + D_ASSERT(!finalized); + finalized = true; + if (field_count < max_field_count) { + // we can handle this case by calling source.ReadData(buffer, source.RemainingData()) + throw SerializationException("Not all fields were read. This file might have been written with a newer version " + "of DuckDB and is incompatible with this version of DuckDB."); + } + D_ASSERT(source.RemainingData() == 0); +} + +} // namespace duckdb diff --git a/src/duckdb/src/common/render_tree.cpp b/src/duckdb/src/common/render_tree.cpp index 9c7dca21..6942d6fc 100644 --- a/src/duckdb/src/common/render_tree.cpp +++ b/src/duckdb/src/common/render_tree.cpp @@ -118,7 +118,7 @@ static unique_ptr CreateNode(const PipelineRenderNode &op) { static unique_ptr CreateNode(const ProfilingNode &op) { auto &info = op.GetProfilingInfo(); InsertionOrderPreservingMap extra_info; - if (info.Enabled(MetricsType::EXTRA_INFO)) { + if (info.Enabled(info.settings, MetricsType::EXTRA_INFO)) { extra_info = op.GetProfilingInfo().extra_info; } @@ -128,11 +128,13 @@ static unique_ptr CreateNode(const ProfilingNode &op) { } auto result = make_uniq (node_name, extra_info); - if (info.Enabled(MetricsType::OPERATOR_CARDINALITY)) { - result->extra_text[RenderTreeNode::CARDINALITY] = info.GetMetricAsString(MetricsType::OPERATOR_CARDINALITY); + if 
(info.Enabled(info.settings, MetricsType::OPERATOR_CARDINALITY)) { + auto cardinality = info.GetMetricAsString(MetricsType::OPERATOR_CARDINALITY); + result->extra_text[RenderTreeNode::CARDINALITY] = cardinality; } - if (info.Enabled(MetricsType::OPERATOR_TIMING)) { - string timing = StringUtil::Format("%.2f", info.metrics.at(MetricsType::OPERATOR_TIMING).GetValue ()); + if (info.Enabled(info.settings, MetricsType::OPERATOR_TIMING)) { + auto value = info.metrics.at(MetricsType::OPERATOR_TIMING).GetValue (); + string timing = StringUtil::Format("%.2f", value); result->extra_text[RenderTreeNode::TIMING] = timing + "s"; } return result; diff --git a/src/duckdb/src/common/row_operations/row_match.cpp b/src/duckdb/src/common/row_operations/row_match.cpp new file mode 100644 index 00000000..b7727e79 --- /dev/null +++ b/src/duckdb/src/common/row_operations/row_match.cpp @@ -0,0 +1,359 @@ +//===--------------------------------------------------------------------===// +// row_match.cpp +// Description: This file contains the implementation of the match operators +//===--------------------------------------------------------------------===// + +#include "duckdb/common/exception.hpp" +#include "duckdb/common/operator/comparison_operators.hpp" +#include "duckdb/common/operator/constant_operators.hpp" +#include "duckdb/common/row_operations/row_operations.hpp" +#include "duckdb/common/types/row/tuple_data_collection.hpp" + +namespace duckdb { + +using ValidityBytes = RowLayout::ValidityBytes; +using Predicates = RowOperations::Predicates; + +template +static idx_t SelectComparison(Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + throw NotImplementedException("Unsupported nested comparison operand for RowOperations::Match"); +} + +template <> +idx_t SelectComparison (Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + 
return VectorOperations::NestedEquals(left, right, sel, count, true_sel, false_sel); +} + +template <> +idx_t SelectComparison (Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + return VectorOperations::NestedNotEquals(left, right, sel, count, true_sel, false_sel); +} + +template <> +idx_t SelectComparison (Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + return VectorOperations::DistinctGreaterThan(left, right, &sel, count, true_sel, false_sel); +} + +template <> +idx_t SelectComparison (Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + return VectorOperations::DistinctGreaterThanEquals(left, right, &sel, count, true_sel, false_sel); +} + +template <> +idx_t SelectComparison (Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + return VectorOperations::DistinctLessThan(left, right, &sel, count, true_sel, false_sel); +} + +template <> +idx_t SelectComparison (Vector &left, Vector &right, const SelectionVector &sel, idx_t count, + SelectionVector *true_sel, SelectionVector *false_sel) { + return VectorOperations::DistinctLessThanEquals(left, right, &sel, count, true_sel, false_sel); +} + +template +static void TemplatedMatchType(UnifiedVectorFormat &col, Vector &rows, SelectionVector &sel, idx_t &count, + idx_t col_offset, idx_t col_no, SelectionVector *no_match, idx_t &no_match_count) { + // Precompute row_mask indexes + idx_t entry_idx; + idx_t idx_in_entry; + ValidityBytes::GetEntryIndex(col_no, entry_idx, idx_in_entry); + + auto data = UnifiedVectorFormat::GetData (col); + auto ptrs = FlatVector::GetData (rows); + idx_t match_count = 0; + if (!col.validity.AllValid()) { + for (idx_t i = 0; i < count; i++) { + auto idx = sel.get_index(i); + + auto 
row = ptrs[idx];
+			ValidityBytes row_mask(row);
+			auto isnull = !row_mask.RowIsValid(row_mask.GetValidityEntry(entry_idx), idx_in_entry);
+
+			auto col_idx = col.sel->get_index(idx);
+			if (!col.validity.RowIsValid(col_idx)) {
+				if (isnull) {
+					// match: move to next value to compare
+					sel.set_index(match_count++, idx);
+				} else {
+					if (NO_MATCH_SEL) {
+						no_match->set_index(no_match_count++, idx);
+					}
+				}
+			} else {
+				auto value = Load<T>(row + col_offset);
+				if (!isnull && OP::template Operation<T>(data[col_idx], value)) {
+					sel.set_index(match_count++, idx);
+				} else {
+					if (NO_MATCH_SEL) {
+						no_match->set_index(no_match_count++, idx);
+					}
+				}
+			}
+		}
+	} else {
+		for (idx_t i = 0; i < count; i++) {
+			auto idx = sel.get_index(i);
+
+			auto row = ptrs[idx];
+			ValidityBytes row_mask(row);
+			auto isnull = !row_mask.RowIsValid(row_mask.GetValidityEntry(entry_idx), idx_in_entry);
+
+			auto col_idx = col.sel->get_index(idx);
+			auto value = Load<T>(row + col_offset);
+			if (!isnull && OP::template Operation<T>(data[col_idx], value)) {
+				sel.set_index(match_count++, idx);
+			} else {
+				if (NO_MATCH_SEL) {
+					no_match->set_index(no_match_count++, idx);
+				}
+			}
+		}
+	}
+	count = match_count;
+}
+
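The matching kernels in this diff all use the same in-place selection-vector compaction idiom: survivors of each column's comparison are written back to the front of `sel`, and `count` shrinks to `match_count` so the next column only re-examines survivors. A minimal standalone sketch of that idiom (the names `CompactMatches` and the plain `std::vector` stand-in for DuckDB's `SelectionVector` are illustrative, not DuckDB's API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using idx_t = uint64_t;

// In-place compaction: keep only indices whose values satisfy the predicate,
// mirroring how TemplatedMatchType shrinks `sel` down to `match_count`.
// Failing indices are optionally appended to `no_match` (the NO_MATCH_SEL path).
static idx_t CompactMatches(std::vector<idx_t> &sel, idx_t count,
                            const std::vector<int32_t> &values, int32_t target,
                            std::vector<idx_t> *no_match) {
	idx_t match_count = 0;
	for (idx_t i = 0; i < count; i++) {
		idx_t idx = sel[i];
		if (values[idx] == target) {
			sel[match_count++] = idx; // match: survives into the next column's pass
		} else if (no_match) {
			no_match->push_back(idx); // record the miss for the caller
		}
	}
	return match_count;
}
```

Because compaction is stable, chaining one call per column leaves `sel` holding exactly the rows that matched every column, which is how the per-predicate loop below composes.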
+//! Forward declaration for recursion
+template <class OP, bool NO_MATCH_SEL>
+static void TemplatedMatchOp(Vector &vec, UnifiedVectorFormat &col, const TupleDataLayout &layout, Vector &rows,
+                             SelectionVector &sel, idx_t &count, idx_t col_no, SelectionVector *no_match,
+                             idx_t &no_match_count, const idx_t original_count);
+
+template <class OP, bool NO_MATCH_SEL>
+static void TemplatedMatchStruct(Vector &vec, UnifiedVectorFormat &col, const TupleDataLayout &layout, Vector &rows,
+                                 SelectionVector &sel, idx_t &count, const idx_t col_no, SelectionVector *no_match,
+                                 idx_t &no_match_count, const idx_t original_count) {
+	// Precompute row_mask indexes
+	idx_t entry_idx;
+	idx_t idx_in_entry;
+	ValidityBytes::GetEntryIndex(col_no, entry_idx, idx_in_entry);
+
+	// Work our way through the validity of the whole struct
+	auto ptrs = FlatVector::GetData<data_ptr_t>(rows);
+	idx_t match_count = 0;
+	if (!col.validity.AllValid()) {
+		for (idx_t i = 0; i < count; i++) {
+			auto idx = sel.get_index(i);
+
+			auto row = ptrs[idx];
+			ValidityBytes row_mask(row);
+			auto isnull = !row_mask.RowIsValid(row_mask.GetValidityEntry(entry_idx), idx_in_entry);
+
+			auto col_idx = col.sel->get_index(idx);
+			if (!col.validity.RowIsValid(col_idx)) {
+				if (isnull) {
+					// match: move to next value to compare
+					sel.set_index(match_count++, idx);
+				} else {
+					if (NO_MATCH_SEL) {
+						no_match->set_index(no_match_count++, idx);
+					}
+				}
+			} else {
+				if (!isnull) {
+					sel.set_index(match_count++, idx);
+				} else {
+					if (NO_MATCH_SEL) {
+						no_match->set_index(no_match_count++, idx);
+					}
+				}
+			}
+		}
+	} else {
+		for (idx_t i = 0; i < count; i++) {
+			auto idx = sel.get_index(i);
+
+			auto row = ptrs[idx];
+			ValidityBytes row_mask(row);
+			auto isnull = !row_mask.RowIsValid(row_mask.GetValidityEntry(entry_idx), idx_in_entry);
+
+			if (!isnull) {
+				sel.set_index(match_count++, idx);
+			} else {
+				if (NO_MATCH_SEL) {
+					no_match->set_index(no_match_count++, idx);
+				}
+			}
+		}
+	}
+	count = match_count;
+
+	// Now we construct row pointers to the structs
+	Vector
struct_rows(LogicalTypeId::POINTER);
+	auto struct_ptrs = FlatVector::GetData<data_ptr_t>(struct_rows);
+
+	const auto col_offset = layout.GetOffsets()[col_no];
+	for (idx_t i = 0; i < count; i++) {
+		auto idx = sel.get_index(i);
+		auto row = ptrs[idx];
+		struct_ptrs[idx] = row + col_offset;
+	}
+
+	// Get the struct layout, child columns, then recurse
+	const auto &struct_layout = layout.GetStructLayout(col_no);
+	auto &struct_entries = StructVector::GetEntries(vec);
+	D_ASSERT(struct_layout.ColumnCount() == struct_entries.size());
+	for (idx_t struct_col_no = 0; struct_col_no < struct_layout.ColumnCount(); struct_col_no++) {
+		auto &struct_vec = *struct_entries[struct_col_no];
+		UnifiedVectorFormat struct_col;
+		struct_vec.ToUnifiedFormat(original_count, struct_col);
+		TemplatedMatchOp<OP, NO_MATCH_SEL>(struct_vec, struct_col, struct_layout, struct_rows, sel, count,
+		                                   struct_col_no, no_match, no_match_count, original_count);
+	}
+}
+
+template <class OP, bool NO_MATCH_SEL>
+static void TemplatedMatchList(Vector &col, Vector &rows, SelectionVector &sel, idx_t &count,
+                               const TupleDataLayout &layout, const idx_t col_no, SelectionVector *no_match,
+                               idx_t &no_match_count) {
+	// Gather a dense Vector containing the column values being matched
+	Vector key(col.GetType());
+	const auto gather_function = TupleDataCollection::GetGatherFunction(col.GetType());
+	gather_function.function(layout, rows, col_no, sel, count, key, *FlatVector::IncrementalSelectionVector(), key,
+	                         gather_function.child_functions);
+
+	// Densify the input column
+	Vector sliced(col, sel, count);
+
+	if (NO_MATCH_SEL) {
+		SelectionVector no_match_sel_offset(no_match->data() + no_match_count);
+		auto match_count = SelectComparison<OP>(sliced, key, sel, count, &sel, &no_match_sel_offset);
+		no_match_count += count - match_count;
+		count = match_count;
+	} else {
+		count = SelectComparison<OP>(sliced, key, sel, count, &sel, nullptr);
+	}
+}
+
+template <class OP, bool NO_MATCH_SEL>
+static void TemplatedMatchOp(Vector &vec, UnifiedVectorFormat &col, const TupleDataLayout &layout, Vector
&rows,
+                             SelectionVector &sel, idx_t &count, idx_t col_no, SelectionVector *no_match,
+                             idx_t &no_match_count, const idx_t original_count) {
+	if (count == 0) {
+		return;
+	}
+	auto col_offset = layout.GetOffsets()[col_no];
+	switch (layout.GetTypes()[col_no].InternalType()) {
+	case PhysicalType::BOOL:
+	case PhysicalType::INT8:
+		TemplatedMatchType<int8_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                             no_match_count);
+		break;
+	case PhysicalType::INT16:
+		TemplatedMatchType<int16_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                              no_match_count);
+		break;
+	case PhysicalType::INT32:
+		TemplatedMatchType<int32_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                              no_match_count);
+		break;
+	case PhysicalType::INT64:
+		TemplatedMatchType<int64_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                              no_match_count);
+		break;
+	case PhysicalType::UINT8:
+		TemplatedMatchType<uint8_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                              no_match_count);
+		break;
+	case PhysicalType::UINT16:
+		TemplatedMatchType<uint16_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                               no_match_count);
+		break;
+	case PhysicalType::UINT32:
+		TemplatedMatchType<uint32_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                               no_match_count);
+		break;
+	case PhysicalType::UINT64:
+		TemplatedMatchType<uint64_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                               no_match_count);
+		break;
+	case PhysicalType::INT128:
+		TemplatedMatchType<hugeint_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                                no_match_count);
+		break;
+	case PhysicalType::FLOAT:
+		TemplatedMatchType<float, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                            no_match_count);
+		break;
+	case PhysicalType::DOUBLE:
+		TemplatedMatchType<double, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                             no_match_count);
+		break;
+	case PhysicalType::INTERVAL:
+		TemplatedMatchType<interval_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+		                                                 no_match_count);
+		break;
+	case PhysicalType::VARCHAR:
+		TemplatedMatchType<string_t, OP, NO_MATCH_SEL>(col, rows, sel, count, col_offset, col_no, no_match,
+
		                                               no_match_count);
+		break;
+	case PhysicalType::STRUCT:
+		TemplatedMatchStruct<OP, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match, no_match_count,
+		                                       original_count);
+		break;
+	case PhysicalType::LIST:
+		TemplatedMatchList<OP, NO_MATCH_SEL>(vec, rows, sel, count, layout, col_no, no_match, no_match_count);
+		break;
+	default:
+		throw InternalException("Unsupported column type for RowOperations::Match");
+	}
+}
+
+template <bool NO_MATCH_SEL>
+static void TemplatedMatch(DataChunk &columns, UnifiedVectorFormat col_data[], const TupleDataLayout &layout,
+                           Vector &rows, const Predicates &predicates, SelectionVector &sel, idx_t &count,
+                           SelectionVector *no_match, idx_t &no_match_count) {
+	for (idx_t col_no = 0; col_no < predicates.size(); ++col_no) {
+		auto &vec = columns.data[col_no];
+		auto &col = col_data[col_no];
+		switch (predicates[col_no]) {
+		case ExpressionType::COMPARE_EQUAL:
+		case ExpressionType::COMPARE_NOT_DISTINCT_FROM:
+		case ExpressionType::COMPARE_DISTINCT_FROM:
+			TemplatedMatchOp<Equals, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match,
+			                                       no_match_count, count);
+			break;
+		case ExpressionType::COMPARE_NOTEQUAL:
+			TemplatedMatchOp<NotEquals, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match,
+			                                          no_match_count, count);
+			break;
+		case ExpressionType::COMPARE_GREATERTHAN:
+			TemplatedMatchOp<GreaterThan, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match,
+			                                            no_match_count, count);
+			break;
+		case ExpressionType::COMPARE_GREATERTHANOREQUALTO:
+			TemplatedMatchOp<GreaterThanEquals, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match,
+			                                                  no_match_count, count);
+			break;
+		case ExpressionType::COMPARE_LESSTHAN:
+			TemplatedMatchOp<LessThan, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match,
+			                                         no_match_count, count);
+			break;
+		case ExpressionType::COMPARE_LESSTHANOREQUALTO:
+			TemplatedMatchOp<LessThanEquals, NO_MATCH_SEL>(vec, col, layout, rows, sel, count, col_no, no_match,
+			                                               no_match_count, count);
+			break;
+		default:
+			throw InternalException("Unsupported comparison type for RowOperations::Match");
+		}
+	}
+}
+
+idx_t RowOperations::Match(DataChunk &columns,
UnifiedVectorFormat col_data[], const TupleDataLayout &layout,
+                           Vector &rows, const Predicates &predicates, SelectionVector &sel, idx_t count,
+                           SelectionVector *no_match, idx_t &no_match_count) {
+	if (no_match) {
+		TemplatedMatch<true>(columns, col_data, layout, rows, predicates, sel, count, no_match, no_match_count);
+	} else {
+		TemplatedMatch<false>(columns, col_data, layout, rows, predicates, sel, count, no_match, no_match_count);
+	}
+
+	return count;
+}
+
+} // namespace duckdb
diff --git a/src/duckdb/src/common/serializer.cpp b/src/duckdb/src/common/serializer.cpp
new file mode 100644
index 00000000..2321fb80
--- /dev/null
+++ b/src/duckdb/src/common/serializer.cpp
@@ -0,0 +1,24 @@
+#include "duckdb/common/serializer.hpp"
+
+namespace duckdb {
+
+template <>
+string Deserializer::Read() {
+	uint32_t size = Read<uint32_t>();
+	if (size == 0) {
+		return string();
+	}
+	auto buffer = make_unsafe_uniq_array<data_t>(size);
+	ReadData(buffer.get(), size);
+	return string(const_char_ptr_cast(buffer.get()), size);
+}
+
+void Deserializer::ReadStringVector(vector<string> &list) {
+	uint32_t sz = Read<uint32_t>();
+	list.resize(sz);
+	for (idx_t i = 0; i < sz; i++) {
+		list[i] = Read<string>();
+	}
+}
+
+} // namespace duckdb
diff --git a/src/duckdb/src/common/serializer/buffered_deserializer.cpp b/src/duckdb/src/common/serializer/buffered_deserializer.cpp
new file mode 100644
index 00000000..e1636eb8
--- /dev/null
+++ b/src/duckdb/src/common/serializer/buffered_deserializer.cpp
@@ -0,0 +1,27 @@
+#include "duckdb/common/serializer/buffered_deserializer.hpp"
+
+#include <cstring>
+
+namespace duckdb {
+
+BufferedDeserializer::BufferedDeserializer(data_ptr_t ptr, idx_t data_size) : ptr(ptr), endptr(ptr + data_size) {
+}
+
+BufferedDeserializer::BufferedDeserializer(BufferedSerializer &serializer)
+    : BufferedDeserializer(serializer.data, serializer.maximum_size) {
+	SetVersion(serializer.GetVersion());
+}
+
+void BufferedDeserializer::ReadData(data_ptr_t buffer, idx_t read_size) {
+	if (ptr + read_size > endptr) {
+		throw
SerializationException("Failed to deserialize: not enough data in buffer to fulfill read request");
+	}
+	memcpy(buffer, ptr, read_size);
+	ptr += read_size;
+}
+
+ClientContext &BufferedContextDeserializer::GetContext() {
+	return context;
+}
+
+} // namespace duckdb
diff --git a/src/duckdb/src/common/serializer/buffered_serializer.cpp b/src/duckdb/src/common/serializer/buffered_serializer.cpp
new file mode 100644
index 00000000..af2ec931
--- /dev/null
+++ b/src/duckdb/src/common/serializer/buffered_serializer.cpp
@@ -0,0 +1,36 @@
+#include "duckdb/common/serializer/buffered_serializer.hpp"
+
+#include <cstring>
+
+namespace duckdb {
+
+BufferedSerializer::BufferedSerializer(idx_t maximum_size)
+    : BufferedSerializer(make_unsafe_uniq_array<data_t>(maximum_size), maximum_size) {
+}
+
+BufferedSerializer::BufferedSerializer(unsafe_unique_array<data_t> data, idx_t size)
+    : maximum_size(size), data(data.get()) {
+	blob.size = 0;
+	blob.data = std::move(data);
+}
+
+BufferedSerializer::BufferedSerializer(data_ptr_t data, idx_t size) : maximum_size(size), data(data) {
+	blob.size = 0;
+}
+
+void BufferedSerializer::WriteData(const_data_ptr_t buffer, idx_t write_size) {
+	if (blob.size + write_size >= maximum_size) {
+		do {
+			maximum_size *= 2;
+		} while (blob.size + write_size > maximum_size);
+		auto new_data = new data_t[maximum_size];
+		memcpy(new_data, data, blob.size);
+		data = new_data;
+		blob.data = unsafe_unique_array<data_t>(new_data);
+	}
+
+	memcpy(data + blob.size, buffer, write_size);
+	blob.size += write_size;
+}
+
+} // namespace duckdb
diff --git a/src/duckdb/src/common/serializer/format_serializer.cpp b/src/duckdb/src/common/serializer/format_serializer.cpp
new file mode 100644
index 00000000..76415a81
--- /dev/null
+++ b/src/duckdb/src/common/serializer/format_serializer.cpp
@@ -0,0 +1,15 @@
+#include "duckdb/common/serializer/format_serializer.hpp"
+
+namespace duckdb {
+
+template <>
+void FormatSerializer::WriteValue(const vector<bool> &vec) {
+	auto count = vec.size();
+
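`BufferedSerializer::WriteData` above grows its backing buffer by repeated doubling before copying the old contents over, which makes appends amortized O(1). A self-contained sketch of that growth idiom (the `GrowableBuffer` class is illustrative, not DuckDB's API):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>

// Doubling-growth write path: when a write would overflow, keep doubling the
// capacity until the write fits, copy the old bytes into the fresh allocation,
// then append. Mirrors the structure of BufferedSerializer::WriteData.
struct GrowableBuffer {
	std::unique_ptr<uint8_t[]> data;
	size_t size = 0;
	size_t capacity;

	explicit GrowableBuffer(size_t initial) : data(new uint8_t[initial]), capacity(initial) {
	}

	void Write(const void *src, size_t n) {
		if (size + n > capacity) {
			do {
				capacity *= 2; // double until the pending write fits
			} while (size + n > capacity);
			std::unique_ptr<uint8_t[]> fresh(new uint8_t[capacity]);
			std::memcpy(fresh.get(), data.get(), size); // preserve existing contents
			data = std::move(fresh);
		}
		std::memcpy(data.get() + size, src, n);
		size += n;
	}
};
```

Doubling (rather than growing by `n`) is what keeps a sequence of k small writes at O(k) total copy cost instead of O(k²).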
	OnListBegin(count);
+	for (auto item : vec) {
+		WriteValue(item);
+	}
+	OnListEnd(count);
+}
+
+} // namespace duckdb
diff --git a/src/duckdb/src/common/sort/comparators.cpp b/src/duckdb/src/common/sort/comparators.cpp
index 82e8069d..560b44cc 100644
--- a/src/duckdb/src/common/sort/comparators.cpp
+++ b/src/duckdb/src/common/sort/comparators.cpp
@@ -24,7 +24,7 @@ bool Comparators::TieIsBreakable(const idx_t &tie_col, const data_ptr_t &row_ptr
 	}
 	const auto &tie_col_offset = row_layout.GetOffsets()[col_idx];
 	auto tie_string = Load<string_t>(row_ptr + tie_col_offset);
-	if (tie_string.GetSize() < sort_layout.prefix_lengths[tie_col]) {
+	if (tie_string.GetSize() < sort_layout.prefix_lengths[tie_col] && tie_string.GetSize() > 0) {
 		// No need to break the tie - we already compared the full string
 		return false;
 	}
@@ -71,7 +71,7 @@ int Comparators::BreakBlobTie(const idx_t &tie_col, const SBScanState &left, con
                               const SortLayout &sort_layout, const bool &external) {
 	data_ptr_t l_data_ptr = left.DataPtr(*left.sb->blob_sorting_data);
 	data_ptr_t r_data_ptr = right.DataPtr(*right.sb->blob_sorting_data);
-	if (!TieIsBreakable(tie_col, l_data_ptr, sort_layout)) {
+	if (!TieIsBreakable(tie_col, l_data_ptr, sort_layout) && !TieIsBreakable(tie_col, r_data_ptr, sort_layout)) {
 		// Quick check to see if ties can be broken
 		return 0;
 	}
diff --git a/src/duckdb/src/common/types/bit.cpp b/src/duckdb/src/common/types/bit.cpp
index f263c2c4..5006d64f 100644
--- a/src/duckdb/src/common/types/bit.cpp
+++ b/src/duckdb/src/common/types/bit.cpp
@@ -22,7 +22,7 @@ idx_t Bit::ComputeBitstringLen(idx_t len) {
 	return result;
 }
 
-static inline idx_t GetBitPadding(const string_t &bit_string) {
+static inline idx_t GetBitPadding(const bitstring_t &bit_string) {
 	auto data = const_data_ptr_cast(bit_string.GetData());
 	D_ASSERT(idx_t(data[0]) <= 8);
 	return data[0];
@@ -37,14 +37,14 @@ static inline idx_t GetBitSize(const string_t &str) {
 	return str_len;
 }
 
-uint8_t Bit::GetFirstByte(const string_t &str) {
+uint8_t
Bit::GetFirstByte(const bitstring_t &str) { D_ASSERT(str.GetSize() > 1); auto data = const_data_ptr_cast(str.GetData()); return data[1] & ((1 << (8 - data[0])) - 1); } -void Bit::Finalize(string_t &str) { +void Bit::Finalize(bitstring_t &str) { // bit strings require all padding bits to be set to 1 // this method sets all padding bits to 1 auto padding = GetBitPadding(str); @@ -55,7 +55,7 @@ void Bit::Finalize(string_t &str) { Bit::Verify(str); } -void Bit::SetEmptyBitString(string_t &target, string_t &input) { +void Bit::SetEmptyBitString(bitstring_t &target, string_t &input) { char *res_buf = target.GetDataWriteable(); const char *buf = input.GetData(); memset(res_buf, 0, input.GetSize()); @@ -63,7 +63,7 @@ void Bit::SetEmptyBitString(string_t &target, string_t &input) { Bit::Finalize(target); } -void Bit::SetEmptyBitString(string_t &target, idx_t len) { +void Bit::SetEmptyBitString(bitstring_t &target, idx_t len) { char *res_buf = target.GetDataWriteable(); memset(res_buf, 0, target.GetSize()); res_buf[0] = ComputePadding(len); @@ -71,7 +71,7 @@ void Bit::SetEmptyBitString(string_t &target, idx_t len) { } // **** casting functions **** -void Bit::ToString(string_t bits, char *output) { +void Bit::ToString(bitstring_t bits, char *output) { auto data = const_data_ptr_cast(bits.GetData()); auto len = bits.GetSize(); @@ -87,7 +87,7 @@ void Bit::ToString(string_t bits, char *output) { } } -string Bit::ToString(string_t str) { +string Bit::ToString(bitstring_t str) { auto len = BitLength(str); auto buffer = make_unsafe_uniq_array_uninitialized (len); ToString(str, buffer.get()); @@ -117,7 +117,7 @@ bool Bit::TryGetBitStringSize(string_t str, idx_t &str_len, string *error_messag return true; } -void Bit::ToBit(string_t str, string_t &output_str) { +void Bit::ToBit(string_t str, bitstring_t &output_str) { auto data = const_data_ptr_cast(str.GetData()); auto len = str.GetSize(); auto output = output_str.GetDataWriteable(); @@ -151,12 +151,12 @@ void Bit::ToBit(string_t 
str, string_t &output_str) { string Bit::ToBit(string_t str) { auto bit_len = GetBitSize(str); auto buffer = make_unsafe_uniq_array_uninitialized (bit_len); - string_t output_str(buffer.get(), UnsafeNumericCast (bit_len)); + bitstring_t output_str(buffer.get(), UnsafeNumericCast (bit_len)); Bit::ToBit(str, output_str); return output_str.GetString(); } -void Bit::BlobToBit(string_t blob, string_t &output_str) { +void Bit::BlobToBit(string_t blob, bitstring_t &output_str) { auto data = const_data_ptr_cast(blob.GetData()); auto output = output_str.GetDataWriteable(); idx_t size = blob.GetSize(); @@ -167,12 +167,12 @@ void Bit::BlobToBit(string_t blob, string_t &output_str) { string Bit::BlobToBit(string_t blob) { auto buffer = make_unsafe_uniq_array_uninitialized (blob.GetSize() + 1); - string_t output_str(buffer.get(), UnsafeNumericCast (blob.GetSize() + 1)); + bitstring_t output_str(buffer.get(), UnsafeNumericCast (blob.GetSize() + 1)); Bit::BlobToBit(blob, output_str); return output_str.GetString(); } -void Bit::BitToBlob(string_t bit, string_t &output_blob) { +void Bit::BitToBlob(bitstring_t bit, string_t &output_blob) { D_ASSERT(bit.GetSize() == output_blob.GetSize() + 1); auto data = const_data_ptr_cast(bit.GetData()); @@ -189,7 +189,7 @@ void Bit::BitToBlob(string_t bit, string_t &output_blob) { } } -string Bit::BitToBlob(string_t bit) { +string Bit::BitToBlob(bitstring_t bit) { D_ASSERT(bit.GetSize() > 1); auto buffer = make_unsafe_uniq_array_uninitialized (bit.GetSize() - 1); @@ -199,32 +199,53 @@ string Bit::BitToBlob(string_t bit) { } // **** scalar functions **** -void Bit::BitString(const string_t &input, const idx_t &bit_length, string_t &result) { +void Bit::BitString(const string_t &input, idx_t bit_length, bitstring_t &result) { char *res_buf = result.GetDataWriteable(); const char *buf = input.GetData(); auto padding = ComputePadding(bit_length); res_buf[0] = padding; + auto padding_len = UnsafeNumericCast (padding); for (idx_t i = 0; i < bit_length; 
i++) { if (i < bit_length - input.GetSize()) { - Bit::SetBit(result, i, 0); + Bit::SetBitInternal(result, i + padding_len, 0); } else { idx_t bit = buf[i - (bit_length - input.GetSize())] == '1' ? 1 : 0; + Bit::SetBitInternal(result, i + padding_len, bit); + } + } + Bit::Finalize(result); +} + +void Bit::ExtendBitString(const bitstring_t &input, idx_t bit_length, bitstring_t &result) { + uint8_t *res_buf = reinterpret_cast (result.GetDataWriteable()); + + auto padding = ComputePadding(bit_length); + res_buf[0] = static_cast (padding); + + idx_t original_length = Bit::BitLength(input); + D_ASSERT(bit_length >= original_length); + idx_t shift = bit_length - original_length; + for (idx_t i = 0; i < bit_length; i++) { + if (i < shift) { + Bit::SetBit(result, i, 0); + } else { + idx_t bit = Bit::GetBit(input, i - shift); Bit::SetBit(result, i, bit); } } Bit::Finalize(result); } -idx_t Bit::BitLength(string_t bits) { +idx_t Bit::BitLength(bitstring_t bits) { return ((bits.GetSize() - 1) * 8) - GetBitPadding(bits); } -idx_t Bit::OctetLength(string_t bits) { +idx_t Bit::OctetLength(bitstring_t bits) { return bits.GetSize() - 1; } -idx_t Bit::BitCount(string_t bits) { +idx_t Bit::BitCount(bitstring_t bits) { idx_t count = 0; const char *buf = bits.GetData(); for (idx_t byte_idx = 1; byte_idx < OctetLength(bits) + 1; byte_idx++) { @@ -235,7 +256,7 @@ idx_t Bit::BitCount(string_t bits) { return count - GetBitPadding(bits); } -idx_t Bit::BitPosition(string_t substring, string_t bits) { +idx_t Bit::BitPosition(bitstring_t substring, bitstring_t bits) { const char *buf = bits.GetData(); auto len = bits.GetSize(); auto substr_len = BitLength(substring); @@ -269,7 +290,7 @@ idx_t Bit::BitPosition(string_t substring, string_t bits) { return 0; } -idx_t Bit::GetBit(string_t bit_string, idx_t n) { +idx_t Bit::GetBit(bitstring_t bit_string, idx_t n) { return Bit::GetBitInternal(bit_string, n + GetBitPadding(bit_string)); } @@ -277,7 +298,7 @@ idx_t Bit::GetBitIndex(idx_t n) { return n 
/ 8 + 1; } -idx_t Bit::GetBitInternal(string_t bit_string, idx_t n) { +idx_t Bit::GetBitInternal(bitstring_t bit_string, idx_t n) { const char *buf = bit_string.GetData(); auto idx = Bit::GetBitIndex(n); D_ASSERT(idx < bit_string.GetSize()); @@ -285,12 +306,12 @@ idx_t Bit::GetBitInternal(string_t bit_string, idx_t n) { return (byte & 1 ? 1 : 0); } -void Bit::SetBit(string_t &bit_string, idx_t n, idx_t new_value) { +void Bit::SetBit(bitstring_t &bit_string, idx_t n, idx_t new_value) { SetBitInternal(bit_string, n + GetBitPadding(bit_string), new_value); Bit::Finalize(bit_string); } -void Bit::SetBitInternal(string_t &bit_string, idx_t n, idx_t new_value) { +void Bit::SetBitInternal(bitstring_t &bit_string, idx_t n, idx_t new_value) { uint8_t *buf = reinterpret_cast (bit_string.GetDataWriteable()); auto idx = Bit::GetBitIndex(n); @@ -305,39 +326,41 @@ void Bit::SetBitInternal(string_t &bit_string, idx_t n, idx_t new_value) { } // **** BITWISE operators **** -void Bit::RightShift(const string_t &bit_string, const idx_t &shift, string_t &result) { +void Bit::RightShift(const bitstring_t &bit_string, idx_t shift, bitstring_t &result) { uint8_t *res_buf = reinterpret_cast (result.GetDataWriteable()); const uint8_t *buf = reinterpret_cast (bit_string.GetData()); res_buf[0] = buf[0]; + auto padding = GetBitPadding(result); for (idx_t i = 0; i < Bit::BitLength(result); i++) { if (i < shift) { - Bit::SetBit(result, i, 0); + Bit::SetBitInternal(result, i + padding, 0); } else { idx_t bit = Bit::GetBit(bit_string, i - shift); - Bit::SetBit(result, i, bit); + Bit::SetBitInternal(result, i + padding, bit); } } Bit::Finalize(result); } -void Bit::LeftShift(const string_t &bit_string, const idx_t &shift, string_t &result) { +void Bit::LeftShift(const bitstring_t &bit_string, idx_t shift, bitstring_t &result) { uint8_t *res_buf = reinterpret_cast (result.GetDataWriteable()); const uint8_t *buf = reinterpret_cast (bit_string.GetData()); res_buf[0] = buf[0]; + auto padding = 
GetBitPadding(result); for (idx_t i = 0; i < Bit::BitLength(bit_string); i++) { if (i < (Bit::BitLength(bit_string) - shift)) { idx_t bit = Bit::GetBit(bit_string, shift + i); - Bit::SetBit(result, i, bit); + Bit::SetBitInternal(result, i + padding, bit); } else { - Bit::SetBit(result, i, 0); + Bit::SetBitInternal(result, i + padding, 0); } } Bit::Finalize(result); } -void Bit::BitwiseAnd(const string_t &rhs, const string_t &lhs, string_t &result) { +void Bit::BitwiseAnd(const bitstring_t &rhs, const bitstring_t &lhs, bitstring_t &result) { if (Bit::BitLength(lhs) != Bit::BitLength(rhs)) { throw InvalidInputException("Cannot AND bit strings of different sizes"); } @@ -353,7 +376,7 @@ void Bit::BitwiseAnd(const string_t &rhs, const string_t &lhs, string_t &result) Bit::Finalize(result); } -void Bit::BitwiseOr(const string_t &rhs, const string_t &lhs, string_t &result) { +void Bit::BitwiseOr(const bitstring_t &rhs, const bitstring_t &lhs, bitstring_t &result) { if (Bit::BitLength(lhs) != Bit::BitLength(rhs)) { throw InvalidInputException("Cannot OR bit strings of different sizes"); } @@ -369,7 +392,7 @@ void Bit::BitwiseOr(const string_t &rhs, const string_t &lhs, string_t &result) Bit::Finalize(result); } -void Bit::BitwiseXor(const string_t &rhs, const string_t &lhs, string_t &result) { +void Bit::BitwiseXor(const bitstring_t &rhs, const bitstring_t &lhs, bitstring_t &result) { if (Bit::BitLength(lhs) != Bit::BitLength(rhs)) { throw InvalidInputException("Cannot XOR bit strings of different sizes"); } @@ -385,7 +408,7 @@ void Bit::BitwiseXor(const string_t &rhs, const string_t &lhs, string_t &result) Bit::Finalize(result); } -void Bit::BitwiseNot(const string_t &input, string_t &result) { +void Bit::BitwiseNot(const bitstring_t &input, bitstring_t &result) { uint8_t *result_buf = reinterpret_cast (result.GetDataWriteable()); const uint8_t *buf = reinterpret_cast (input.GetData()); @@ -396,7 +419,7 @@ void Bit::BitwiseNot(const string_t &input, string_t &result) { 
Bit::Finalize(result); } -void Bit::Verify(const string_t &input) { +void Bit::Verify(const bitstring_t &input) { #ifdef DEBUG // bit strings require all padding bits to be set to 1 auto padding = GetBitPadding(input); diff --git a/src/duckdb/src/common/types/data_chunk.cpp b/src/duckdb/src/common/types/data_chunk.cpp index eea02568..8b00a95f 100644 --- a/src/duckdb/src/common/types/data_chunk.cpp +++ b/src/duckdb/src/common/types/data_chunk.cpp @@ -26,50 +26,53 @@ DataChunk::~DataChunk() { } void DataChunk::InitializeEmpty(const vector &types) { - InitializeEmpty(types.begin(), types.end()); -} - -void DataChunk::Initialize(Allocator &allocator, const vector &types, idx_t capacity_p) { - Initialize(allocator, types.begin(), types.end(), capacity_p); + D_ASSERT(data.empty()); + capacity = STANDARD_VECTOR_SIZE; + for (idx_t i = 0; i < types.size(); i++) { + data.emplace_back(types[i], nullptr); + } } void DataChunk::Initialize(ClientContext &context, const vector &types, idx_t capacity_p) { Initialize(Allocator::Get(context), types, capacity_p); } -idx_t DataChunk::GetAllocationSize() const { - idx_t total_size = 0; - auto cardinality = size(); - for (auto &vec : data) { - total_size += vec.GetAllocationSize(cardinality); - } - return total_size; +void DataChunk::Initialize(Allocator &allocator, const vector &types, idx_t capacity_p) { + auto initialize = vector (types.size(), true); + Initialize(allocator, types, initialize, capacity_p); } -void DataChunk::Initialize(Allocator &allocator, vector ::const_iterator begin, - vector ::const_iterator end, idx_t capacity_p) { - D_ASSERT(data.empty()); // can only be initialized once - D_ASSERT(std::distance(begin, end) != 0); // empty chunk not allowed +void DataChunk::Initialize(ClientContext &context, const vector &types, const vector &initialize, + idx_t capacity_p) { + Initialize(Allocator::Get(context), types, initialize, capacity_p); +} + +void DataChunk::Initialize(Allocator &allocator, const vector &types, const 
vector &initialize, + idx_t capacity_p) { + D_ASSERT(types.size() == initialize.size()); + D_ASSERT(data.empty()); + capacity = capacity_p; - for (; begin != end; begin++) { - VectorCache cache(allocator, *begin, capacity); + for (idx_t i = 0; i < types.size(); i++) { + if (!initialize[i]) { + data.emplace_back(types[i], nullptr); + vector_caches.emplace_back(); + continue; + } + + VectorCache cache(allocator, types[i], capacity); data.emplace_back(cache); vector_caches.push_back(std::move(cache)); } } -void DataChunk::Initialize(ClientContext &context, vector ::const_iterator begin, - vector ::const_iterator end, idx_t capacity_p) { - Initialize(Allocator::Get(context), begin, end, capacity_p); -} - -void DataChunk::InitializeEmpty(vector ::const_iterator begin, vector ::const_iterator end) { - capacity = STANDARD_VECTOR_SIZE; - D_ASSERT(data.empty()); // can only be initialized once - D_ASSERT(std::distance(begin, end) != 0); // empty chunk not allowed - for (; begin != end; begin++) { - data.emplace_back(*begin, nullptr); +idx_t DataChunk::GetAllocationSize() const { + idx_t total_size = 0; + auto cardinality = size(); + for (auto &vec : data) { + total_size += vec.GetAllocationSize(cardinality); } + return total_size; } void DataChunk::Reset() { diff --git a/src/duckdb/src/common/types/vector_cache.cpp b/src/duckdb/src/common/types/vector_cache.cpp index 56664319..49ffe357 100644 --- a/src/duckdb/src/common/types/vector_cache.cpp +++ b/src/duckdb/src/common/types/vector_cache.cpp @@ -118,19 +118,25 @@ class VectorCacheBuffer : public VectorBuffer { idx_t capacity; }; -VectorCache::VectorCache(Allocator &allocator, const LogicalType &type_p, idx_t capacity_p) { +VectorCache::VectorCache() : buffer(nullptr) { +} + +VectorCache::VectorCache(Allocator &allocator, const LogicalType &type_p, const idx_t capacity_p) { buffer = make_buffer (allocator, type_p, capacity_p); } void VectorCache::ResetFromCache(Vector &result) const { - D_ASSERT(buffer); - auto &vcache = 
buffer->Cast<VectorCacheBuffer>();
-	return vcache.GetType();
+	D_ASSERT(buffer);
+	auto &vector_cache = buffer->Cast<VectorCacheBuffer>();
+	return vector_cache.GetType();
 }
 
 } // namespace duckdb
diff --git a/src/duckdb/src/common/vector_operations/comparison_operators.cpp b/src/duckdb/src/common/vector_operations/comparison_operators.cpp
index c66288d8..8a56cdfc 100644
--- a/src/duckdb/src/common/vector_operations/comparison_operators.cpp
+++ b/src/duckdb/src/common/vector_operations/comparison_operators.cpp
@@ -167,6 +167,9 @@ static void NestedComparisonExecutor(Vector &left, Vector &right, Vector &result
 	auto &result_validity = ConstantVector::Validity(result);
 	SelectionVector true_sel(1);
 	auto match_count = ComparisonSelector::Select<OP>(left, right, nullptr, 1, &true_sel, nullptr, result_validity);
+	// since we are dealing with nested types where the values are not NULL, the result is always valid (i.e. true or
+	// false)
+	result_validity.SetAllValid(1);
 	auto result_data = ConstantVector::GetData<bool>(result);
 	result_data[0] = match_count > 0;
 	return;
@@ -182,6 +185,10 @@ static void NestedComparisonExecutor(Vector &left, Vector &right, Vector &result
 	if (!leftv.validity.AllValid() || !rightv.validity.AllValid()) {
 		ComparesNotNull(leftv, rightv, result_validity, count);
 	}
+	ValidityMask original_mask;
+	original_mask.SetAllValid(count);
+	original_mask.Copy(result_validity, count);
+
 	SelectionVector true_sel(count);
 	SelectionVector false_sel(count);
 	idx_t match_count =
@@ -190,12 +197,19 @@ static void NestedComparisonExecutor(Vector &left, Vector &right, Vector &result
 	for (idx_t i = 0; i < match_count; ++i) {
 		const auto idx = true_sel.get_index(i);
 		result_data[idx] = true;
+		// if the row was valid during the null check, set it to valid here as well
+		if
(original_mask.RowIsValid(idx)) { + result_validity.SetValid(idx); + } } const idx_t no_match_count = count - match_count; for (idx_t i = 0; i < no_match_count; ++i) { const auto idx = false_sel.get_index(i); result_data[idx] = false; + if (original_mask.RowIsValid(idx)) { + result_validity.SetValid(idx); + } } } diff --git a/src/duckdb/src/core_functions/aggregate/distributive/bitstring_agg.cpp b/src/duckdb/src/core_functions/aggregate/distributive/bitstring_agg.cpp index 36920a47..f01cc50a 100644 --- a/src/duckdb/src/core_functions/aggregate/distributive/bitstring_agg.cpp +++ b/src/duckdb/src/core_functions/aggregate/distributive/bitstring_agg.cpp @@ -8,6 +8,8 @@ #include "duckdb/execution/expression_executor.hpp" #include "duckdb/common/types/cast_helpers.hpp" #include "duckdb/common/operator/subtract.hpp" +#include "duckdb/common/serializer/deserializer.hpp" +#include "duckdb/common/serializer/serializer.hpp" namespace duckdb { @@ -43,6 +45,21 @@ struct BitstringAggBindData : public FunctionData { } return false; } + + static void Serialize(Serializer &serializer, const optional_ptr bind_data_p, + const AggregateFunction &) { + auto &bind_data = bind_data_p->Cast (); + serializer.WriteProperty(100, "min", bind_data.min); + serializer.WriteProperty(101, "max", bind_data.max); + } + + static unique_ptr Deserialize(Deserializer &deserializer, AggregateFunction &) { + Value min; + Value max; + deserializer.ReadProperty(100, "min", min); + deserializer.ReadProperty(101, "max", max); + return make_uniq (min, max); + } }; struct BitStringAggOperation { @@ -247,7 +264,9 @@ static void BindBitString(AggregateFunctionSet &bitstring_agg, const LogicalType auto function = AggregateFunction::UnaryAggregateDestructor , TYPE, string_t, BitStringAggOperation>( type, LogicalType::BIT); - function.bind = BindBitstringAgg; // create new a 'BitstringAggBindData' + function.bind = BindBitstringAgg; // create new a 'BitstringAggBindData' + function.serialize = 
BitstringAggBindData::Serialize; + function.deserialize = BitstringAggBindData::Deserialize; function.statistics = BitstringPropagateStats; // stores min and max from column stats in BitstringAggBindData bitstring_agg.AddFunction(function); // uses the BitstringAggBindData to access statistics for creating bitstring function.arguments = {type, type, type}; diff --git a/src/duckdb/src/core_functions/aggregate/distributive/minmax.cpp b/src/duckdb/src/core_functions/aggregate/distributive/minmax.cpp index dba09e5a..9642da15 100644 --- a/src/duckdb/src/core_functions/aggregate/distributive/minmax.cpp +++ b/src/duckdb/src/core_functions/aggregate/distributive/minmax.cpp @@ -315,8 +315,8 @@ static AggregateFunction GetMinMaxOperator(const LogicalType &type) { auto internal_type = type.InternalType(); switch (internal_type) { case PhysicalType::VARCHAR: - return AggregateFunction::UnaryAggregateDestructor (type.id(), - type.id()); + return AggregateFunction::UnaryAggregateDestructor (type, + type); case PhysicalType::LIST: case PhysicalType::STRUCT: case PhysicalType::ARRAY: diff --git a/src/duckdb/src/core_functions/aggregate/holistic/approx_top_k.cpp b/src/duckdb/src/core_functions/aggregate/holistic/approx_top_k.cpp index 19b3ae88..b1cf41e0 100644 --- a/src/duckdb/src/core_functions/aggregate/holistic/approx_top_k.cpp +++ b/src/duckdb/src/core_functions/aggregate/holistic/approx_top_k.cpp @@ -48,7 +48,7 @@ struct ApproxTopKValue { uint32_t capacity = 0; }; -struct ApproxTopKState { +struct InternalApproxTopKState { // the top-k data structure has two components // a list of k values sorted on "count" (i.e. 
values[0] has the lowest count) // a lookup map: string_t -> idx in "values" array @@ -169,15 +169,34 @@ struct ApproxTopKState { } }; +struct ApproxTopKState { + InternalApproxTopKState *state; + + InternalApproxTopKState &GetState() { + if (!state) { + state = new InternalApproxTopKState(); + } + return *state; + } + + const InternalApproxTopKState &GetState() const { + if (!state) { + throw InternalException("No state available"); + } + return *state; + } +}; + struct ApproxTopKOperation { template static void Initialize(STATE &state) { - new (&state) STATE(); + state.state = nullptr; } template - static void Operation(STATE &state, const TYPE &input, AggregateInputData &aggr_input, Vector &top_k_vector, + static void Operation(STATE &aggr_state, const TYPE &input, AggregateInputData &aggr_input, Vector &top_k_vector, idx_t offset, idx_t count) { + auto &state = aggr_state.GetState(); if (state.values.empty()) { static constexpr int64_t MAX_APPROX_K = 1000000; // not initialized yet - initialize the K value and set all counters to 0 @@ -208,7 +227,13 @@ struct ApproxTopKOperation { } template - static void Combine(const STATE &source, STATE &target, AggregateInputData &aggr_input) { + static void Combine(const STATE &aggr_source, STATE &aggr_target, AggregateInputData &aggr_input) { + if (!aggr_source.state) { + // source state is empty + return; + } + auto &source = aggr_source.GetState(); + auto &target = aggr_target.GetState(); if (source.values.empty()) { // source is empty return; @@ -279,7 +304,7 @@ struct ApproxTopKOperation { template static void Destroy(STATE &state, AggregateInputData &aggr_input_data) { - state.~STATE(); + delete state.state; } static bool IgnoreNull() { @@ -324,7 +349,7 @@ static void ApproxTopKFinalize(Vector &state_vector, AggregateInputData &, Vecto idx_t new_entries = 0; // figure out how much space we need for (idx_t i = 0; i < count; i++) { - auto &state = *states[sdata.sel->get_index(i)]; + auto &state = 
states[sdata.sel->get_index(i)]->GetState(); if (state.values.empty()) { continue; } @@ -340,7 +365,7 @@ static void ApproxTopKFinalize(Vector &state_vector, AggregateInputData &, Vecto idx_t current_offset = old_len; for (idx_t i = 0; i < count; i++) { const auto rid = i + offset; - auto &state = *states[sdata.sel->get_index(i)]; + auto &state = states[sdata.sel->get_index(i)]->GetState(); if (state.values.empty()) { mask.SetInvalid(rid); continue; diff --git a/src/duckdb/src/core_functions/function_list.cpp b/src/duckdb/src/core_functions/function_list.cpp index c01d3e85..ca77e030 100644 --- a/src/duckdb/src/core_functions/function_list.cpp +++ b/src/duckdb/src/core_functions/function_list.cpp @@ -52,7 +52,6 @@ static const StaticFunctionDefinition internal_functions[] = { DUCKDB_SCALAR_FUNCTION_SET(BitwiseAndFun), DUCKDB_SCALAR_FUNCTION_ALIAS(ListHasAnyFunAlias), DUCKDB_SCALAR_FUNCTION(PowOperatorFun), - DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListNegativeInnerProductFunAlias), DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListDistanceFunAlias), DUCKDB_SCALAR_FUNCTION_SET(LeftShiftFun), DUCKDB_SCALAR_FUNCTION_SET_ALIAS(ListCosineDistanceFunAlias), @@ -117,7 +116,7 @@ static const StaticFunctionDefinition internal_functions[] = { DUCKDB_AGGREGATE_FUNCTION_SET(BitOrFun), DUCKDB_SCALAR_FUNCTION(BitPositionFun), DUCKDB_AGGREGATE_FUNCTION_SET(BitXorFun), - DUCKDB_SCALAR_FUNCTION(BitStringFun), + DUCKDB_SCALAR_FUNCTION_SET(BitStringFun), DUCKDB_AGGREGATE_FUNCTION_SET(BitstringAggFun), DUCKDB_AGGREGATE_FUNCTION(BoolAndFun), DUCKDB_AGGREGATE_FUNCTION(BoolOrFun), diff --git a/src/duckdb/src/core_functions/scalar/bit/bitstring.cpp b/src/duckdb/src/core_functions/scalar/bit/bitstring.cpp index fc176885..9a9a5eae 100644 --- a/src/duckdb/src/core_functions/scalar/bit/bitstring.cpp +++ b/src/duckdb/src/core_functions/scalar/bit/bitstring.cpp @@ -7,28 +7,46 @@ namespace duckdb { //===--------------------------------------------------------------------===// // BitStringFunction 
//===--------------------------------------------------------------------===// +template <bool FROM_STRING> static void BitStringFunction(DataChunk &args, ExpressionState &state, Vector &result) { BinaryExecutor::Execute<string_t, int32_t, string_t>( args.data[0], args.data[1], result, args.size(), [&](string_t input, int32_t n) { if (n < 0) { throw InvalidInputException("The bitstring length cannot be negative"); } - if (idx_t(n) < input.GetSize()) { + idx_t input_length; + if (FROM_STRING) { + input_length = input.GetSize(); + } else { + input_length = Bit::BitLength(input); + } + if (idx_t(n) < input_length) { throw InvalidInputException("Length must be equal or larger than input string"); } idx_t len; - Bit::TryGetBitStringSize(input, len, nullptr); // string verification + if (FROM_STRING) { + Bit::TryGetBitStringSize(input, len, nullptr); // string verification + } len = Bit::ComputeBitstringLen(UnsafeNumericCast<idx_t>(n)); string_t target = StringVector::EmptyString(result, len); - Bit::BitString(input, UnsafeNumericCast<idx_t>(n), target); + if (FROM_STRING) { + Bit::BitString(input, UnsafeNumericCast<idx_t>(n), target); + } else { + Bit::ExtendBitString(input, UnsafeNumericCast<idx_t>(n), target); + } target.Finalize(); return target; }); } -ScalarFunction BitStringFun::GetFunction() { - return ScalarFunction({LogicalType::VARCHAR, LogicalType::INTEGER}, LogicalType::BIT, BitStringFunction); +ScalarFunctionSet BitStringFun::GetFunctions() { + ScalarFunctionSet bitstring; + bitstring.AddFunction( + ScalarFunction({LogicalType::VARCHAR, LogicalType::INTEGER}, LogicalType::BIT, BitStringFunction<true>)); + bitstring.AddFunction( + ScalarFunction({LogicalType::BIT, LogicalType::INTEGER}, LogicalType::BIT, BitStringFunction<false>)); + return bitstring; } //===--------------------------------------------------------------------===// diff --git a/src/duckdb/src/core_functions/scalar/date/date_diff.cpp b/src/duckdb/src/core_functions/scalar/date/date_diff.cpp index 6266dda3..36376a2b 100644 ---
a/src/duckdb/src/core_functions/scalar/date/date_diff.cpp +++ b/src/duckdb/src/core_functions/scalar/date/date_diff.cpp @@ -28,6 +28,14 @@ struct DateDiff { }); } + // We need to truncate down, not towards 0 + static inline int64_t Truncate(int64_t value, int64_t units) { + return (value + (value < 0)) / units - (value < 0); + } + static inline int64_t Diff(int64_t start, int64_t end, int64_t units) { + return Truncate(end, units) - Truncate(start, units); + } + struct YearOperator { template static inline TR Operation(TA startdate, TB enddate) { @@ -204,30 +212,28 @@ template <> int64_t DateDiff::MillisecondsOperator::Operation(timestamp_t startdate, timestamp_t enddate) { D_ASSERT(Timestamp::IsFinite(startdate)); D_ASSERT(Timestamp::IsFinite(enddate)); - return Timestamp::GetEpochMs(enddate) - Timestamp::GetEpochMs(startdate); + return Diff(startdate.value, enddate.value, Interval::MICROS_PER_MSEC); } template <> int64_t DateDiff::SecondsOperator::Operation(timestamp_t startdate, timestamp_t enddate) { D_ASSERT(Timestamp::IsFinite(startdate)); D_ASSERT(Timestamp::IsFinite(enddate)); - return Timestamp::GetEpochSeconds(enddate) - Timestamp::GetEpochSeconds(startdate); + return Diff(startdate.value, enddate.value, Interval::MICROS_PER_SEC); } template <> int64_t DateDiff::MinutesOperator::Operation(timestamp_t startdate, timestamp_t enddate) { D_ASSERT(Timestamp::IsFinite(startdate)); D_ASSERT(Timestamp::IsFinite(enddate)); - return Timestamp::GetEpochSeconds(enddate) / Interval::SECS_PER_MINUTE - - Timestamp::GetEpochSeconds(startdate) / Interval::SECS_PER_MINUTE; + return Diff(startdate.value, enddate.value, Interval::MICROS_PER_MINUTE); } template <> int64_t DateDiff::HoursOperator::Operation(timestamp_t startdate, timestamp_t enddate) { D_ASSERT(Timestamp::IsFinite(startdate)); D_ASSERT(Timestamp::IsFinite(enddate)); - return Timestamp::GetEpochSeconds(enddate) / Interval::SECS_PER_HOUR - - Timestamp::GetEpochSeconds(startdate) / Interval::SECS_PER_HOUR; + 
return Diff(startdate.value, enddate.value, Interval::MICROS_PER_HOUR); } // TIME specialisations diff --git a/src/duckdb/src/core_functions/scalar/date/date_part.cpp b/src/duckdb/src/core_functions/scalar/date/date_part.cpp index c234e1e3..ebe65f78 100644 --- a/src/duckdb/src/core_functions/scalar/date/date_part.cpp +++ b/src/duckdb/src/core_functions/scalar/date/date_part.cpp @@ -412,7 +412,7 @@ struct DatePart { D_ASSERT(input.ColumnCount() == 1); UnaryExecutor::Execute (input.data[0], result, input.size(), [&](int64_t input) { - // milisecond amounts provided to epoch_ms should never be considered infinite + // millisecond amounts provided to epoch_ms should never be considered infinite // instead such values will just throw when converted to microseconds return Timestamp::FromEpochMsPossiblyInfinite(input); }); diff --git a/src/duckdb/src/execution/expression_executor.cpp b/src/duckdb/src/execution/expression_executor.cpp index 716672d8..458348be 100644 --- a/src/duckdb/src/execution/expression_executor.cpp +++ b/src/duckdb/src/execution/expression_executor.cpp @@ -170,10 +170,16 @@ unique_ptr ExpressionExecutor::InitializeState(const Expression void ExpressionExecutor::Execute(const Expression &expr, ExpressionState *state, const SelectionVector *sel, idx_t count, Vector &result) { #ifdef DEBUG - // the result vector has to be used for the first time or has to be reset - // otherwise, the validity mask might contain previous (now incorrect) data + // The result vector must be used for the first time, or must be reset. + // Otherwise, the validity mask can contain previous (now incorrect) data. if (result.GetVectorType() == VectorType::FLAT_VECTOR) { - D_ASSERT(FlatVector::Validity(result).CheckAllValid(count)); + + // We do not initialize vector caches for these expressions. 
+ if (expr.GetExpressionClass() != ExpressionClass::BOUND_REF && + expr.GetExpressionClass() != ExpressionClass::BOUND_CONSTANT && + expr.GetExpressionClass() != ExpressionClass::BOUND_PARAMETER) { + D_ASSERT(FlatVector::Validity(result).CheckAllValid(count)); + } } #endif diff --git a/src/duckdb/src/execution/expression_executor/execute_between.cpp b/src/duckdb/src/execution/expression_executor/execute_between.cpp index ca7d45f7..95ff4507 100644 --- a/src/duckdb/src/execution/expression_executor/execute_between.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_between.cpp @@ -89,9 +89,10 @@ static idx_t BetweenLoopTypeSwitch(Vector &input, Vector &lower, Vector &upper, unique_ptr ExpressionExecutor::InitializeState(const BoundBetweenExpression &expr, ExpressionExecutorState &root) { auto result = make_uniq (expr, root); - result->AddChild(expr.input.get()); - result->AddChild(expr.lower.get()); - result->AddChild(expr.upper.get()); + result->AddChild(*expr.input); + result->AddChild(*expr.lower); + result->AddChild(*expr.upper); + result->Finalize(); return result; } diff --git a/src/duckdb/src/execution/expression_executor/execute_case.cpp b/src/duckdb/src/execution/expression_executor/execute_case.cpp index 37d50af5..cdeae311 100644 --- a/src/duckdb/src/execution/expression_executor/execute_case.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_case.cpp @@ -18,10 +18,11 @@ unique_ptr ExpressionExecutor::InitializeState(const BoundCaseE ExpressionExecutorState &root) { auto result = make_uniq (expr, root); for (auto &case_check : expr.case_checks) { - result->AddChild(case_check.when_expr.get()); - result->AddChild(case_check.then_expr.get()); + result->AddChild(*case_check.when_expr); + result->AddChild(*case_check.then_expr); } - result->AddChild(expr.else_expr.get()); + result->AddChild(*expr.else_expr); + result->Finalize(); return std::move(result); } diff --git a/src/duckdb/src/execution/expression_executor/execute_cast.cpp 
b/src/duckdb/src/execution/expression_executor/execute_cast.cpp index 688ffbb9..c0cca588 100644 --- a/src/duckdb/src/execution/expression_executor/execute_cast.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_cast.cpp @@ -8,8 +8,9 @@ namespace duckdb { unique_ptr ExpressionExecutor::InitializeState(const BoundCastExpression &expr, ExpressionExecutorState &root) { auto result = make_uniq (expr, root); - result->AddChild(expr.child.get()); + result->AddChild(*expr.child); result->Finalize(); + if (expr.bound_cast.init_local_state) { CastLocalStateParameters parameters(root.executor->GetContext(), expr.bound_cast.cast_data); result->local_state = expr.bound_cast.init_local_state(parameters); diff --git a/src/duckdb/src/execution/expression_executor/execute_comparison.cpp b/src/duckdb/src/execution/expression_executor/execute_comparison.cpp index 58a4e480..949bc7ab 100644 --- a/src/duckdb/src/execution/expression_executor/execute_comparison.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_comparison.cpp @@ -12,8 +12,9 @@ namespace duckdb { unique_ptr ExpressionExecutor::InitializeState(const BoundComparisonExpression &expr, ExpressionExecutorState &root) { auto result = make_uniq (expr, root); - result->AddChild(expr.left.get()); - result->AddChild(expr.right.get()); + result->AddChild(*expr.left); + result->AddChild(*expr.right); + result->Finalize(); return result; } diff --git a/src/duckdb/src/execution/expression_executor/execute_conjunction.cpp b/src/duckdb/src/execution/expression_executor/execute_conjunction.cpp index 37161cfd..8ea55d63 100644 --- a/src/duckdb/src/execution/expression_executor/execute_conjunction.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_conjunction.cpp @@ -18,8 +18,9 @@ unique_ptr ExpressionExecutor::InitializeState(const BoundConju ExpressionExecutorState &root) { auto result = make_uniq (expr, root); for (auto &child : expr.children) { - result->AddChild(child.get()); + 
result->AddChild(*child); } + result->Finalize(); return std::move(result); } diff --git a/src/duckdb/src/execution/expression_executor/execute_function.cpp b/src/duckdb/src/execution/expression_executor/execute_function.cpp index 0a7d3261..7fe9df2f 100644 --- a/src/duckdb/src/execution/expression_executor/execute_function.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_function.cpp @@ -14,8 +14,9 @@ unique_ptr ExpressionExecutor::InitializeState(const BoundFunct ExpressionExecutorState &root) { auto result = make_uniq (expr, root); for (auto &child : expr.children) { - result->AddChild(child.get()); + result->AddChild(*child); } + result->Finalize(); if (expr.function.init_local_state) { result->local_state = expr.function.init_local_state(*result, expr, expr.bind_info.get()); diff --git a/src/duckdb/src/execution/expression_executor/execute_operator.cpp b/src/duckdb/src/execution/expression_executor/execute_operator.cpp index f357ff9c..7db87478 100644 --- a/src/duckdb/src/execution/expression_executor/execute_operator.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_operator.cpp @@ -8,8 +8,9 @@ unique_ptr ExpressionExecutor::InitializeState(const BoundOpera ExpressionExecutorState &root) { auto result = make_uniq (expr, root); for (auto &child : expr.children) { - result->AddChild(child.get()); + result->AddChild(*child); } + result->Finalize(); return result; } @@ -33,7 +34,7 @@ void ExpressionExecutor::Execute(const BoundOperatorExpression &expr, Expression intermediate.Reference(false_val); // in rhs is a list of constants - // for every child, OR the result of the comparision with the left + // for every child, OR the result of the comparison with the left // to get the overall result. 
for (idx_t child = 1; child < expr.children.size(); child++) { Vector vector_to_check(expr.children[child]->return_type); diff --git a/src/duckdb/src/execution/expression_executor/execute_reference.cpp b/src/duckdb/src/execution/expression_executor/execute_reference.cpp index 4dac1539..88fdfa63 100644 --- a/src/duckdb/src/execution/expression_executor/execute_reference.cpp +++ b/src/duckdb/src/execution/expression_executor/execute_reference.cpp @@ -6,7 +6,7 @@ namespace duckdb { unique_ptr<ExpressionState> ExpressionExecutor::InitializeState(const BoundReferenceExpression &expr, ExpressionExecutorState &root) { auto result = make_uniq<ExpressionState>(expr, root); - result->Finalize(true); + result->Finalize(); return result; } diff --git a/src/duckdb/src/execution/expression_executor_state.cpp b/src/duckdb/src/execution/expression_executor_state.cpp index 44161f94..070a399d 100644 --- a/src/duckdb/src/execution/expression_executor_state.cpp +++ b/src/duckdb/src/execution/expression_executor_state.cpp @@ -6,20 +6,22 @@ namespace duckdb { -void ExpressionState::AddChild(Expression *expr) { - types.push_back(expr->return_type); - child_states.push_back(ExpressionExecutor::InitializeState(*expr, root)); +void ExpressionState::AddChild(Expression &child_expr) { + types.push_back(child_expr.return_type); + auto child_state = ExpressionExecutor::InitializeState(child_expr, root); + child_states.push_back(std::move(child_state)); + + auto expr_class = child_expr.GetExpressionClass(); + auto initialize_child = expr_class != ExpressionClass::BOUND_REF && expr_class != ExpressionClass::BOUND_CONSTANT && + expr_class != ExpressionClass::BOUND_PARAMETER; + initialize.push_back(initialize_child); } -void ExpressionState::Finalize(bool empty) { +void ExpressionState::Finalize() { if (types.empty()) { return; } - if (empty) { - intermediate_chunk.InitializeEmpty(types); - } else { - intermediate_chunk.Initialize(GetAllocator(), types); - } + intermediate_chunk.Initialize(GetAllocator(), types, initialize); }
Allocator &ExpressionState::GetAllocator() { diff --git a/src/duckdb/src/execution/index/art/fixed_size_allocator.cpp b/src/duckdb/src/execution/index/art/fixed_size_allocator.cpp new file mode 100644 index 00000000..ac1526e2 --- /dev/null +++ b/src/duckdb/src/execution/index/art/fixed_size_allocator.cpp @@ -0,0 +1,238 @@ +#include "duckdb/execution/index/art/fixed_size_allocator.hpp" + +namespace duckdb { + +constexpr idx_t FixedSizeAllocator::BASE[]; +constexpr uint8_t FixedSizeAllocator::SHIFT[]; + +FixedSizeAllocator::FixedSizeAllocator(const idx_t allocation_size, Allocator &allocator) + : allocation_size(allocation_size), total_allocations(0), allocator(allocator) { + + // calculate how many allocations fit into one buffer + + idx_t bits_per_value = sizeof(validity_t) * 8; + idx_t curr_alloc_size = 0; + + bitmask_count = 0; + allocations_per_buffer = 0; + + while (curr_alloc_size < BUFFER_ALLOC_SIZE) { + if (!bitmask_count || (bitmask_count * bits_per_value) % allocations_per_buffer == 0) { + bitmask_count++; + curr_alloc_size += sizeof(validity_t); + } + + auto remaining_alloc_size = BUFFER_ALLOC_SIZE - curr_alloc_size; + auto remaining_allocations = MinValue(remaining_alloc_size / allocation_size, bits_per_value); + + if (remaining_allocations == 0) { + break; + } + + allocations_per_buffer += remaining_allocations; + curr_alloc_size += remaining_allocations * allocation_size; + } + + allocation_offset = bitmask_count * sizeof(validity_t); +} + +FixedSizeAllocator::~FixedSizeAllocator() { + for (auto &buffer : buffers) { + allocator.FreeData(buffer.ptr, BUFFER_ALLOC_SIZE); + } +} + +Node FixedSizeAllocator::New() { + + // no more free pointers + if (buffers_with_free_space.empty()) { + + // add a new buffer + idx_t buffer_id = buffers.size(); + D_ASSERT(buffer_id <= (uint32_t)DConstants::INVALID_INDEX); + auto buffer = allocator.AllocateData(BUFFER_ALLOC_SIZE); + buffers.emplace_back(buffer, 0); + buffers_with_free_space.insert(buffer_id); + + // set the 
bitmask + ValidityMask mask(reinterpret_cast (buffer)); + mask.SetAllValid(allocations_per_buffer); + } + + // return a pointer + D_ASSERT(!buffers_with_free_space.empty()); + auto buffer_id = (uint32_t)*buffers_with_free_space.begin(); + + auto bitmask_ptr = reinterpret_cast (buffers[buffer_id].ptr); + ValidityMask mask(bitmask_ptr); + auto offset = GetOffset(mask, buffers[buffer_id].allocation_count); + + buffers[buffer_id].allocation_count++; + total_allocations++; + if (buffers[buffer_id].allocation_count == allocations_per_buffer) { + buffers_with_free_space.erase(buffer_id); + } + + return Node(buffer_id, offset); +} + +void FixedSizeAllocator::Free(const Node ptr) { + auto bitmask_ptr = reinterpret_cast (buffers[ptr.GetBufferId()].ptr); + ValidityMask mask(bitmask_ptr); + D_ASSERT(!mask.RowIsValid(ptr.GetOffset())); + mask.SetValid(ptr.GetOffset()); + buffers_with_free_space.insert(ptr.GetBufferId()); + + D_ASSERT(total_allocations > 0); + D_ASSERT(buffers[ptr.GetBufferId()].allocation_count > 0); + buffers[ptr.GetBufferId()].allocation_count--; + total_allocations--; +} + +void FixedSizeAllocator::Reset() { + + for (auto &buffer : buffers) { + allocator.FreeData(buffer.ptr, BUFFER_ALLOC_SIZE); + } + buffers.clear(); + buffers_with_free_space.clear(); + total_allocations = 0; +} + +void FixedSizeAllocator::Merge(FixedSizeAllocator &other) { + + D_ASSERT(allocation_size == other.allocation_size); + + // remember the buffer count and merge the buffers + idx_t buffer_count = buffers.size(); + for (auto &buffer : other.buffers) { + buffers.push_back(buffer); + } + other.buffers.clear(); + + // merge the buffers with free spaces + for (auto &buffer_id : other.buffers_with_free_space) { + buffers_with_free_space.insert(buffer_id + buffer_count); + } + other.buffers_with_free_space.clear(); + + // add the total allocations + total_allocations += other.total_allocations; +} + +bool FixedSizeAllocator::InitializeVacuum() { + + if (total_allocations == 0) { + Reset(); 
+ return false; + } + + auto total_available_allocations = allocations_per_buffer * buffers.size(); + D_ASSERT(total_available_allocations >= total_allocations); + auto total_free_positions = total_available_allocations - total_allocations; + + // vacuum_count buffers can be freed + auto vacuum_count = total_free_positions / allocations_per_buffer; + + // calculate the vacuum threshold adaptively + D_ASSERT(vacuum_count < buffers.size()); + idx_t memory_usage = GetMemoryUsage(); + idx_t excess_memory_usage = vacuum_count * BUFFER_ALLOC_SIZE; + auto excess_percentage = (double)excess_memory_usage / (double)memory_usage; + auto threshold = (double)VACUUM_THRESHOLD / 100.0; + if (excess_percentage < threshold) { + return false; + } + + min_vacuum_buffer_id = buffers.size() - vacuum_count; + + // remove all invalid buffers from the available buffer list to ensure that we do not reuse them + auto it = buffers_with_free_space.begin(); + while (it != buffers_with_free_space.end()) { + if (*it >= min_vacuum_buffer_id) { + it = buffers_with_free_space.erase(it); + } else { + it++; + } + } + + return true; +} + +void FixedSizeAllocator::FinalizeVacuum() { + + // free all (now unused) buffers + while (min_vacuum_buffer_id < buffers.size()) { + allocator.FreeData(buffers.back().ptr, BUFFER_ALLOC_SIZE); + buffers.pop_back(); + } +} + +Node FixedSizeAllocator::VacuumPointer(const Node ptr) { + + // we do not need to adjust the bitmask of the old buffer, because we will free the entire + // buffer after the vacuum operation + + auto new_ptr = New(); + + // new increases the allocation count + total_allocations--; + + memcpy(Get(new_ptr), Get(ptr), allocation_size); + return new_ptr; +} + +void FixedSizeAllocator::Verify() const { +#ifdef DEBUG + auto total_available_allocations = allocations_per_buffer * buffers.size(); + D_ASSERT(total_available_allocations >= total_allocations); + D_ASSERT(buffers.size() >= buffers_with_free_space.size()); +#endif +} + +uint32_t 
FixedSizeAllocator::GetOffset(ValidityMask &mask, const idx_t allocation_count) { + + auto data = mask.GetData(); + + // fills up a buffer sequentially before searching for free bits + if (mask.RowIsValid(allocation_count)) { + mask.SetInvalid(allocation_count); + return allocation_count; + } + + // get an entry with free bits + for (idx_t entry_idx = 0; entry_idx < bitmask_count; entry_idx++) { + if (data[entry_idx] != 0) { + + // find the position of the free bit + auto entry = data[entry_idx]; + idx_t first_valid_bit = 0; + + // this loop finds the position of the rightmost set bit in entry and stores it + // in first_valid_bit + for (idx_t i = 0; i < 6; i++) { + // set the left half of the bits of this level to zero and test if the entry is still not zero + if (entry & BASE[i]) { + // first valid bit is in the rightmost s[i] bits + // permanently set the left half of the bits to zero + entry &= BASE[i]; + } else { + // first valid bit is in the leftmost s[i] bits + // shift by s[i] for the next iteration and add s[i] to the position of the rightmost set bit + entry >>= SHIFT[i]; + first_valid_bit += SHIFT[i]; + } + } + D_ASSERT(entry); + + auto prev_bits = entry_idx * sizeof(validity_t) * 8; + D_ASSERT(mask.RowIsValid(prev_bits + first_valid_bit)); + mask.SetInvalid(prev_bits + first_valid_bit); + return (prev_bits + first_valid_bit); + } + } + + throw InternalException("Invalid bitmask of FixedSizeAllocator"); +} + +} // namespace duckdb diff --git a/src/duckdb/src/execution/index/art/plan_art.cpp b/src/duckdb/src/execution/index/art/plan_art.cpp new file mode 100644 index 00000000..2acc5699 --- /dev/null +++ b/src/duckdb/src/execution/index/art/plan_art.cpp @@ -0,0 +1,94 @@ + +#include "duckdb/execution/operator/order/physical_order.hpp" +#include "duckdb/execution/operator/projection/physical_projection.hpp" +#include "duckdb/execution/operator/filter/physical_filter.hpp" +#include "duckdb/execution/operator/schema/physical_create_art_index.hpp" + +#include 
"duckdb/planner/expression/bound_operator_expression.hpp" +#include "duckdb/planner/expression/bound_reference_expression.hpp" +#include "duckdb/planner/operator/logical_create_index.hpp" + +#include "duckdb/execution/index/art/art.hpp" + +namespace duckdb { + +unique_ptr ART::CreatePlan(PlanIndexInput &input) { + // generate a physical plan for the parallel index creation which consists of the following operators + // table scan - projection (for expression execution) - filter (NOT NULL) - order (if applicable) - create index + + auto &op = input.op; + auto &table_scan = input.table_scan; + + vector new_column_types; + vector > select_list; + for (idx_t i = 0; i < op.expressions.size(); i++) { + new_column_types.push_back(op.expressions[i]->return_type); + select_list.push_back(std::move(op.expressions[i])); + } + new_column_types.emplace_back(LogicalType::ROW_TYPE); + select_list.push_back(make_uniq (LogicalType::ROW_TYPE, op.info->scan_types.size() - 1)); + + auto projection = make_uniq (new_column_types, std::move(select_list), op.estimated_cardinality); + projection->children.push_back(std::move(table_scan)); + + // filter operator for IS_NOT_NULL on each key column + + vector filter_types; + vector > filter_select_list; + + for (idx_t i = 0; i < new_column_types.size() - 1; i++) { + filter_types.push_back(new_column_types[i]); + auto is_not_null_expr = + make_uniq (ExpressionType::OPERATOR_IS_NOT_NULL, LogicalType::BOOLEAN); + auto bound_ref = make_uniq (new_column_types[i], i); + is_not_null_expr->children.push_back(std::move(bound_ref)); + filter_select_list.push_back(std::move(is_not_null_expr)); + } + + auto null_filter = + make_uniq (std::move(filter_types), std::move(filter_select_list), op.estimated_cardinality); + null_filter->types.emplace_back(LogicalType::ROW_TYPE); + null_filter->children.push_back(std::move(projection)); + + // determine if we sort the data prior to index creation + // we don't sort, if either VARCHAR or compound key + auto 
perform_sorting = true; + if (op.unbound_expressions.size() > 1) { + perform_sorting = false; + } else if (op.unbound_expressions[0]->return_type.InternalType() == PhysicalType::VARCHAR) { + perform_sorting = false; + } + + // actual physical create index operator + + auto physical_create_index = + make_uniq (op, op.table, op.info->column_ids, std::move(op.info), + std::move(op.unbound_expressions), op.estimated_cardinality, perform_sorting); + + if (perform_sorting) { + + // optional order operator + vector orders; + vector projections; + for (idx_t i = 0; i < new_column_types.size() - 1; i++) { + auto col_expr = make_uniq_base (new_column_types[i], i); + orders.emplace_back(OrderType::ASCENDING, OrderByNullType::NULLS_FIRST, std::move(col_expr)); + projections.emplace_back(i); + } + projections.emplace_back(new_column_types.size() - 1); + + auto physical_order = make_uniq (new_column_types, std::move(orders), std::move(projections), + op.estimated_cardinality); + physical_order->children.push_back(std::move(null_filter)); + + physical_create_index->children.push_back(std::move(physical_order)); + } else { + + // no ordering + physical_create_index->children.push_back(std::move(null_filter)); + } + + return std::move(physical_create_index); +} + +} // namespace duckdb diff --git a/src/duckdb/src/execution/index/index_type_set.cpp b/src/duckdb/src/execution/index/index_type_set.cpp index 4e1dda7e..4fe7cda4 100644 --- a/src/duckdb/src/execution/index/index_type_set.cpp +++ b/src/duckdb/src/execution/index/index_type_set.cpp @@ -5,10 +5,13 @@ namespace duckdb { IndexTypeSet::IndexTypeSet() { - // Register the ART index type + + // Register the ART index type by default IndexType art_index_type; art_index_type.name = ART::TYPE_NAME; art_index_type.create_instance = ART::Create; + art_index_type.create_plan = ART::CreatePlan; + RegisterIndexType(art_index_type); } diff --git a/src/duckdb/src/execution/join_hashtable.cpp b/src/duckdb/src/execution/join_hashtable.cpp 
index e19d2a7e..095745c3 100644
--- a/src/duckdb/src/execution/join_hashtable.cpp
+++ b/src/duckdb/src/execution/join_hashtable.cpp
@@ -453,23 +453,21 @@ static inline data_ptr_t InsertRowToEntry(atomic<ht_entry_t> &entry, const data_
 	// if we expect the entry to be empty, if the operation fails we need to cancel the whole operation as another
 	// key might have been inserted in the meantime that does not match the current key
 	if (EXPECT_EMPTY) {
-
 		// add nullptr to the end of the list to mark the end
 		StorePointer(nullptr, row_ptr_to_insert + pointer_offset);
 		ht_entry_t new_empty_entry = ht_entry_t::GetDesiredEntry(row_ptr_to_insert, salt);
 		ht_entry_t expected_empty_entry = ht_entry_t::GetEmptyEntry();
-		std::atomic_compare_exchange_weak(&entry, &expected_empty_entry, new_empty_entry);
+		entry.compare_exchange_strong(expected_empty_entry, new_empty_entry, std::memory_order_acquire,
+		                              std::memory_order_relaxed);
 
 		// if the expected empty entry actually was null, we can just return the pointer, and it will be a nullptr
 		// if the expected entry was filled in the meantime, we need to cancel the operation and will return the
 		// pointer to the next entry
 		return expected_empty_entry.GetPointerOrNull();
-	}
-
-	// if we expect the entry to be full, we know that even if the insert fails the keys still match so we can
-	// just keep trying until we succeed
-	else {
+	} else {
+		// if we expect the entry to be full, we know that even if the insert fails the keys still match so we can
+		// just keep trying until we succeed
 		ht_entry_t expected_current_entry = entry.load(std::memory_order_relaxed);
 		ht_entry_t desired_new_entry = ht_entry_t::GetDesiredEntry(row_ptr_to_insert, salt);
 		D_ASSERT(expected_current_entry.IsOccupied());
@@ -477,7 +475,8 @@ static inline data_ptr_t InsertRowToEntry(atomic<ht_entry_t> &entry, const data_
 		do {
 			data_ptr_t current_row_pointer = expected_current_entry.GetPointer();
 			StorePointer(current_row_pointer, row_ptr_to_insert + pointer_offset);
-		} while (!std::atomic_compare_exchange_weak(&entry, &expected_current_entry, desired_new_entry));
+		} while (!entry.compare_exchange_weak(expected_current_entry, desired_new_entry, std::memory_order_release,
+		                                      std::memory_order_relaxed));
 
 		return nullptr;
 	}
diff --git a/src/duckdb/src/execution/operator/aggregate/physical_hash_aggregate.cpp b/src/duckdb/src/execution/operator/aggregate/physical_hash_aggregate.cpp
index c4cf4b55..e7d0c756 100644
--- a/src/duckdb/src/execution/operator/aggregate/physical_hash_aggregate.cpp
+++ b/src/duckdb/src/execution/operator/aggregate/physical_hash_aggregate.cpp
@@ -319,15 +319,17 @@ void PhysicalHashAggregate::SinkDistinctGrouping(ExecutionContext &context, Data
 		for (idx_t group_idx = 0; group_idx < grouped_aggregate_data.groups.size(); group_idx++) {
 			auto &group = grouped_aggregate_data.groups[group_idx];
 			auto &bound_ref = group->Cast<BoundReferenceExpression>();
-			filtered_input.data[bound_ref.index].Reference(chunk.data[bound_ref.index]);
+			auto &col = filtered_input.data[bound_ref.index];
+			col.Reference(chunk.data[bound_ref.index]);
+			col.Slice(sel_vec, count);
 		}
 		for (idx_t child_idx = 0; child_idx < aggregate.children.size(); child_idx++) {
 			auto &child = aggregate.children[child_idx];
 			auto &bound_ref = child->Cast<BoundReferenceExpression>();
-
-			filtered_input.data[bound_ref.index].Reference(chunk.data[bound_ref.index]);
+			auto &col = filtered_input.data[bound_ref.index];
+			col.Reference(chunk.data[bound_ref.index]);
+			col.Slice(sel_vec, count);
 		}
-		filtered_input.Slice(sel_vec, count);
 		filtered_input.SetCardinality(count);
 
 		radix_table.Sink(context, filtered_input, sink_input, empty_chunk, empty_filter);
diff --git a/src/duckdb/src/execution/operator/csv_scanner/buffer_manager/csv_buffer_manager.cpp b/src/duckdb/src/execution/operator/csv_scanner/buffer_manager/csv_buffer_manager.cpp
index c8a7d167..064595f3 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/buffer_manager/csv_buffer_manager.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/buffer_manager/csv_buffer_manager.cpp
@@ -119,15 +119,15 @@ void CSVBufferManager::ResetBuffer(const idx_t buffer_idx) {
 	}
 }
 
-idx_t CSVBufferManager::GetBufferSize() {
+idx_t CSVBufferManager::GetBufferSize() const {
 	return buffer_size;
 }
 
-idx_t CSVBufferManager::BufferCount() {
+idx_t CSVBufferManager::BufferCount() const {
 	return cached_buffers.size();
 }
 
-bool CSVBufferManager::Done() {
+bool CSVBufferManager::Done() const {
 	return done;
 }
 
@@ -144,7 +144,7 @@ void CSVBufferManager::ResetBufferManager() {
 	}
 }
 
-string CSVBufferManager::GetFilePath() {
+string CSVBufferManager::GetFilePath() const {
 	return file_path;
 }
 
diff --git a/src/duckdb/src/execution/operator/csv_scanner/scanner/base_scanner.cpp b/src/duckdb/src/execution/operator/csv_scanner/scanner/base_scanner.cpp
index 63e93eda..757598e1 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/scanner/base_scanner.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/scanner/base_scanner.cpp
@@ -1,6 +1,6 @@
 #include "duckdb/execution/operator/csv_scanner/base_scanner.hpp"
 
-#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp"
+#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp"
 #include "duckdb/execution/operator/csv_scanner/skip_scanner.hpp"
 
 namespace duckdb {
diff --git a/src/duckdb/src/execution/operator/csv_scanner/scanner/csv_schema.cpp b/src/duckdb/src/execution/operator/csv_scanner/scanner/csv_schema.cpp
index 5d6a9b0d..139398d7 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/scanner/csv_schema.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/scanner/csv_schema.cpp
@@ -60,14 +60,53 @@ bool CSVSchema::Empty() const {
 	return columns.empty();
 }
 
-bool CSVSchema::SchemasMatch(string &error_message, vector<string> &names, vector<LogicalType> &types,
-                             const string &cur_file_path) {
-	D_ASSERT(names.size() == types.size());
+bool CSVSchema::SchemasMatch(string &error_message, SnifferResult &sniffer_result, const string &cur_file_path,
+                             bool is_minimal_sniffer) const {
+	D_ASSERT(sniffer_result.names.size() == sniffer_result.return_types.size());
 	bool match = true;
 	unordered_map<string, TypeIdxPair> current_schema;
-	for (idx_t i = 0; i < names.size(); i++) {
+
+	for (idx_t i = 0; i < sniffer_result.names.size(); i++) {
 		// Populate our little schema
-		current_schema[names[i]] = {types[i], i};
+		current_schema[sniffer_result.names[i]] = {sniffer_result.return_types[i], i};
+	}
+	if (is_minimal_sniffer) {
+		auto min_sniffer = static_cast<AdaptiveSnifferResult &>(sniffer_result);
+		if (!min_sniffer.more_than_one_row) {
+			bool min_sniff_match = true;
+			// If we don't have more than one row, either the names must match or the types must match.
+			for (auto &column : columns) {
+				if (current_schema.find(column.name) == current_schema.end()) {
+					min_sniff_match = false;
+					break;
+				}
+			}
+			if (min_sniff_match) {
+				return true;
+			}
+			// Otherwise, the types must match.
+			min_sniff_match = true;
+			if (sniffer_result.return_types.size() == columns.size()) {
+				idx_t return_type_idx = 0;
+				for (auto &column : columns) {
+					if (column.type != sniffer_result.return_types[return_type_idx++]) {
+						min_sniff_match = false;
+						break;
+					}
+				}
+			} else {
+				min_sniff_match = false;
+			}
+			if (min_sniff_match) {
+				// If we got here, we have the right types but the wrong names; let's fix the names
+				idx_t sniff_name_idx = 0;
+				for (auto &column : columns) {
+					sniffer_result.names[sniff_name_idx++] = column.name;
+				}
+				return true;
+			}
+		}
+		// If we got to this point, the minimal sniffer doesn't match, we throw an error.
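The minimal-sniffer acceptance rule above (for single-row files: accept if the stored column names all appear, otherwise accept if the types line up positionally and adopt the stored names) can be sketched independently of `CSVSchema`. All names and types here are hypothetical stand-ins, not DuckDB's classes:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the stored schema columns and the sniffed result.
struct Col {
	std::string name;
	std::string type;
};

bool MinimalSniffMatch(const std::vector<Col> &schema, std::vector<std::string> &names,
                       const std::vector<std::string> &types) {
	// 1) Accept if every stored column name appears among the sniffed names.
	std::unordered_set<std::string> sniffed(names.begin(), names.end());
	bool all_names = true;
	for (auto &col : schema) {
		if (sniffed.count(col.name) == 0) {
			all_names = false;
			break;
		}
	}
	if (all_names) {
		return true;
	}
	// 2) Otherwise accept if the types match positionally, and repair the names in place.
	if (types.size() != schema.size()) {
		return false;
	}
	for (size_t i = 0; i < schema.size(); i++) {
		if (schema[i].type != types[i]) {
			return false;
		}
	}
	for (size_t i = 0; i < schema.size(); i++) {
		names[i] = schema[i].name;
	}
	return true;
}
```

The in-place name repair mirrors the hunk's behavior of overwriting `sniffer_result.names` when only the names disagree.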
+	}
 	// Here we check if the schema of a given file matched our original schema
 	// We consider it's not a match if:
diff --git a/src/duckdb/src/execution/operator/csv_scanner/scanner/string_value_scanner.cpp b/src/duckdb/src/execution/operator/csv_scanner/scanner/string_value_scanner.cpp
index 173ca8a1..9662e849 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/scanner/string_value_scanner.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/scanner/string_value_scanner.cpp
@@ -258,7 +258,7 @@ void StringValueResult::AddValueToVector(const char *value_ptr, const idx_t size
 				// We check for a weird case, where we ignore an extra value, if it is a null value
 				return;
 			}
-			validity_mask[chunk_col_id]->SetInvalid(number_of_rows);
+			validity_mask[chunk_col_id]->SetInvalid(static_cast<idx_t>(number_of_rows));
 		}
 		cur_col_id++;
 		chunk_col_id++;
@@ -447,7 +447,11 @@
 }
 
 DataChunk &StringValueResult::ToChunk() {
-	parse_chunk.SetCardinality(number_of_rows);
+	if (number_of_rows < 0) {
+		throw InternalException("CSVScanner: ToChunk() function. Has a negative number of rows, this indicates an "
+		                        "issue with the error handler.");
+	}
+	parse_chunk.SetCardinality(static_cast<idx_t>(number_of_rows));
 	return parse_chunk;
 }
@@ -658,7 +662,7 @@ bool LineError::HandleErrors(StringValueResult &result) {
 			result.RemoveLastLine();
 		} else {
 			// Otherwise, we add it to the borked rows to remove it later and just cleanup the column variables.
-			result.borked_rows.insert(result.number_of_rows);
+			result.borked_rows.insert(static_cast<idx_t>(result.number_of_rows));
 			result.cur_col_id = 0;
 			result.chunk_col_id = 0;
 		}
@@ -740,9 +744,9 @@ bool StringValueResult::AddRowInternal() {
 	}
 	if (current_errors.HandleErrors(*this)) {
-		line_positions_per_row[number_of_rows] = current_line_position;
+		line_positions_per_row[static_cast<idx_t>(number_of_rows)] = current_line_position;
 		number_of_rows++;
-		if (number_of_rows >= result_size) {
+		if (static_cast<idx_t>(number_of_rows) >= result_size) {
 			// We have a full chunk
 			return true;
 		}
@@ -769,7 +773,7 @@ bool StringValueResult::AddRowInternal() {
 			if (empty) {
 				static_cast<string_t *>(vector_ptr[chunk_col_id])[number_of_rows] = string_t();
 			} else {
-				validity_mask[chunk_col_id]->SetInvalid(number_of_rows);
+				validity_mask[chunk_col_id]->SetInvalid(static_cast<idx_t>(number_of_rows));
 			}
 			cur_col_id++;
 			chunk_col_id++;
@@ -799,11 +803,11 @@ bool StringValueResult::AddRowInternal() {
 			RemoveLastLine();
 		}
 	}
-	line_positions_per_row[number_of_rows] = current_line_position;
+	line_positions_per_row[static_cast<idx_t>(number_of_rows)] = current_line_position;
 	cur_col_id = 0;
 	chunk_col_id = 0;
 	number_of_rows++;
-	if (number_of_rows >= result_size) {
+	if (static_cast<idx_t>(number_of_rows) >= result_size) {
 		// We have a full chunk
 		return true;
 	}
@@ -861,12 +865,12 @@ bool StringValueResult::EmptyLine(StringValueResult &result, const idx_t buffer_
 			if (empty) {
 				static_cast<string_t *>(result.vector_ptr[0])[result.number_of_rows] = string_t();
 			} else {
-				result.validity_mask[0]->SetInvalid(result.number_of_rows);
+				result.validity_mask[0]->SetInvalid(static_cast<idx_t>(result.number_of_rows));
 			}
 			result.number_of_rows++;
 		}
 	}
-	if (result.number_of_rows >= result.result_size) {
+	if (static_cast<idx_t>(result.number_of_rows) >= result.result_size) {
 		// We have a full chunk
 		return true;
 	}
@@ -1043,15 +1047,15 @@ void StringValueScanner::Flush(DataChunk &insert_chunk) {
 	}
 	if (!result.borked_rows.empty()) {
 		// We must remove the borked lines from our chunk
-		SelectionVector succesful_rows(parse_chunk.size());
+		SelectionVector successful_rows(parse_chunk.size());
 		idx_t sel_idx = 0;
 		for (idx_t row_idx = 0; row_idx < parse_chunk.size(); row_idx++) {
 			if (result.borked_rows.find(row_idx) == result.borked_rows.end()) {
-				succesful_rows.set_index(sel_idx++, row_idx);
+				successful_rows.set_index(sel_idx++, row_idx);
 			}
 		}
 		// Now we slice the result
-		insert_chunk.Slice(succesful_rows, sel_idx);
+		insert_chunk.Slice(successful_rows, sel_idx);
 	}
 }
@@ -1389,7 +1393,7 @@ void StringValueResult::SkipBOM() const {
 void StringValueResult::RemoveLastLine() {
 	// potentially de-nullify values
 	for (idx_t i = 0; i < chunk_col_id; i++) {
-		validity_mask[i]->SetValid(number_of_rows);
+		validity_mask[i]->SetValid(static_cast<idx_t>(number_of_rows));
 	}
 	// reset column trackers
 	cur_col_id = 0;
@@ -1470,10 +1474,6 @@ void StringValueScanner::SetStart() {
 		}
 		return;
 	}
-	if (state_machine->options.IgnoreErrors()) {
-		// If we are ignoring errors we don't really need to figure out a line.
-		return;
-	}
 	// The result size of the data after skipping the row is one line
 	// We have to look for a new line that fits our schema
 	// 1. We walk until the next new line
@@ -1524,7 +1524,7 @@
 }
 
 void StringValueScanner::FinalizeChunkProcess() {
-	if (result.number_of_rows >= result.result_size || iterator.done) {
+	if (static_cast<idx_t>(result.number_of_rows) >= result.result_size || iterator.done) {
 		// We are done
 		if (!sniffing) {
 			if (csv_file_scan) {
@@ -1562,14 +1562,18 @@
 			if (result.current_errors.HasErrorType(UNTERMINATED_QUOTES)) {
 				has_unterminated_quotes = true;
 			}
-			result.current_errors.HandleErrors(result);
+			if (result.current_errors.HandleErrors(result)) {
+				result.number_of_rows++;
+			}
 		}
 		if (states.IsQuotedCurrent() && !has_unterminated_quotes) {
 			// If we finish the execution of a buffer, and we end in a quoted state, it means we have unterminated
 			// quotes
 			result.current_errors.Insert(UNTERMINATED_QUOTES, result.cur_col_id, result.chunk_col_id,
 			                             result.last_position);
-			result.current_errors.HandleErrors(result);
+			if (result.current_errors.HandleErrors(result)) {
+				result.number_of_rows++;
+			}
 		}
 		if (!iterator.done) {
 			if (iterator.pos.buffer_pos >= iterator.GetEndPos() || iterator.pos.buffer_idx > iterator.GetBufferIdx() ||
@@ -1580,9 +1584,9 @@
 	} else {
 		// 2) If a boundary is not set
 		// We read until the chunk is complete, or we have nothing else to read.
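Most of the casts in the hunks above wrap `number_of_rows` (a signed counter, which error handling can drive through intermediate states) in `static_cast<idx_t>` before comparing it against the unsigned `result_size`. The pitfall being avoided, sketched in isolation with hypothetical helper names:

```cpp
#include <cstdint>

// The guarded test the scanner effectively wants: "is the chunk full?"
// number_of_rows is signed, result_size is unsigned (idx_t is an unsigned
// 64-bit type in DuckDB), so the conversion is made explicit and sign-checked.
bool ChunkFull(int64_t number_of_rows, uint64_t result_size) {
	return number_of_rows >= 0 && static_cast<uint64_t>(number_of_rows) >= result_size;
}

// The naive mixed comparison: the signed operand is converted to unsigned,
// so -1 becomes UINT64_MAX and the "full" test fires spuriously.
bool NaiveChunkFull(int64_t number_of_rows, uint64_t result_size) {
	return static_cast<uint64_t>(number_of_rows) >= result_size;
}
```

The `ToChunk()` hunk earlier takes the complementary precaution: it throws if the counter is still negative when a chunk is materialized.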
-		while (!FinishedFile() && result.number_of_rows < result.result_size) {
+		while (!FinishedFile() && static_cast<idx_t>(result.number_of_rows) < result.result_size) {
 			MoveToNextBuffer();
-			if (result.number_of_rows >= result.result_size) {
+			if (static_cast<idx_t>(result.number_of_rows) >= result.result_size) {
 				return;
 			}
 			if (cur_buffer_handle) {
@@ -1592,7 +1596,7 @@
 		iterator.done = FinishedFile();
 		if (result.null_padding && result.number_of_rows < STANDARD_VECTOR_SIZE && result.chunk_col_id > 0) {
 			while (result.chunk_col_id < result.parse_chunk.ColumnCount()) {
-				result.validity_mask[result.chunk_col_id++]->SetInvalid(result.number_of_rows);
+				result.validity_mask[result.chunk_col_id++]->SetInvalid(static_cast<idx_t>(result.number_of_rows));
 				result.cur_col_id++;
 			}
 			result.number_of_rows++;
diff --git a/src/duckdb/src/execution/operator/csv_scanner/sniffer/csv_sniffer.cpp b/src/duckdb/src/execution/operator/csv_scanner/sniffer/csv_sniffer.cpp
index bee31f88..950d7489 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/sniffer/csv_sniffer.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/sniffer/csv_sniffer.cpp
@@ -1,4 +1,4 @@
-#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp"
+#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp"
 #include "duckdb/common/types/value.hpp"
 
 namespace duckdb {
@@ -41,7 +41,7 @@ void MatchAndReplace(CSVOption<T> &original, CSVOption<T> &sniffed, const string
 		// We verify that the user input matches the sniffed value
 		if (original != sniffed) {
 			error += "CSV Sniffer: Sniffer detected value different than the user input for the " + name;
-			error += " options \n Set: " + original.FormatValue() + " Sniffed: " + sniffed.FormatValue() + "\n";
+			error += " options \n Set: " + original.FormatValue() + ", Sniffed: " + sniffed.FormatValue() + "\n";
 		}
 	} else {
 		// We replace the value of original with the sniffed value
@@ -88,15 +88,14 @@ void CSVSniffer::SetResultOptions() {
 	options.dialect_options.rows_until_header = best_candidate->GetStateMachine().dialect_options.rows_until_header;
 }
 
-SnifferResult CSVSniffer::MinimalSniff() {
+AdaptiveSnifferResult CSVSniffer::MinimalSniff() {
 	if (set_columns.IsSet()) {
 		// Nothing to see here
-		return SnifferResult(*set_columns.types, *set_columns.names);
+		return AdaptiveSnifferResult(*set_columns.types, *set_columns.names, true);
 	}
 	// Return Types detected
 	vector<LogicalType> return_types;
 	// Column Names detected
-	vector<string> names;
 
 	buffer_manager->sniffing = true;
 	constexpr idx_t result_size = 2;
@@ -106,7 +105,8 @@ AdaptiveSnifferResult CSVSniffer::MinimalSniff() {
 	ColumnCountScanner count_scanner(buffer_manager, state_machine, error_handler, result_size);
 	auto &sniffed_column_counts = count_scanner.ParseChunk();
 	if (sniffed_column_counts.result_position == 0) {
-		return {{}, {}};
+		// The file is an empty file, we just return
+		return {{}, {}, false};
 	}
 	state_machine->dialect_options.num_cols = sniffed_column_counts[0].number_of_columns;
@@ -130,20 +130,20 @@ AdaptiveSnifferResult CSVSniffer::MinimalSniff() {
 	// Possibly Gather Header
 	vector<HeaderValue> potential_header;
-	if (start_row != 0) {
-		for (idx_t col_idx = 0; col_idx < data_chunk.ColumnCount(); col_idx++) {
-			auto &cur_vector = data_chunk.data[col_idx];
-			auto vector_data = FlatVector::GetData<string_t>(cur_vector);
-			auto &validity = FlatVector::Validity(cur_vector);
-			HeaderValue val;
-			if (validity.RowIsValid(0)) {
-				val = HeaderValue(vector_data[0]);
-			}
-			potential_header.emplace_back(val);
+
+	for (idx_t col_idx = 0; col_idx < data_chunk.ColumnCount(); col_idx++) {
+		auto &cur_vector = data_chunk.data[col_idx];
+		auto vector_data = FlatVector::GetData<string_t>(cur_vector);
+		auto &validity = FlatVector::Validity(cur_vector);
+		HeaderValue val;
+		if (validity.RowIsValid(0)) {
+			val = HeaderValue(vector_data[0]);
 		}
+		potential_header.emplace_back(val);
 	}
-	names = DetectHeaderInternal(buffer_manager->context, potential_header, *state_machine, set_columns,
-	                             best_sql_types_candidates_per_column_idx, options, *error_handler);
+
+	vector<string> names = DetectHeaderInternal(buffer_manager->context, potential_header, *state_machine, set_columns,
+	                                            best_sql_types_candidates_per_column_idx, options, *error_handler);
 
 	for (idx_t column_idx = 0; column_idx < best_sql_types_candidates_per_column_idx.size(); column_idx++) {
 		LogicalType d_type = best_sql_types_candidates_per_column_idx[column_idx].back();
@@ -153,10 +153,10 @@ AdaptiveSnifferResult CSVSniffer::MinimalSniff() {
 		detected_types.push_back(d_type);
 	}
 
-	return {detected_types, names};
+	return {detected_types, names, sniffed_column_counts.result_position > 1};
 }
 
-SnifferResult CSVSniffer::AdaptiveSniff(CSVSchema &file_schema) {
+SnifferResult CSVSniffer::AdaptiveSniff(const CSVSchema &file_schema) {
 	auto min_sniff_res = MinimalSniff();
 	bool run_full = error_handler->AnyErrors() || detection_error_handler->AnyErrors();
 	// Check if we are happy with the result or if we need to do more sniffing
@@ -164,8 +164,7 @@ SnifferResult CSVSniffer::AdaptiveSniff(CSVSchema &file_schema) {
 		// If we got no errors, we also run full if schemas do not match.
 		if (!set_columns.IsSet() && !options.file_options.AnySet()) {
 			string error;
-			run_full =
-			    !file_schema.SchemasMatch(error, min_sniff_res.names, min_sniff_res.return_types, options.file_path);
+			run_full = !file_schema.SchemasMatch(error, min_sniff_res, options.file_path, true);
 		}
 	}
 	if (run_full) {
@@ -173,14 +172,14 @@ SnifferResult CSVSniffer::AdaptiveSniff(CSVSchema &file_schema) {
 		auto full_sniffer = SniffCSV();
 		if (!set_columns.IsSet() && !options.file_options.AnySet()) {
 			string error;
-			if (!file_schema.SchemasMatch(error, full_sniffer.names, full_sniffer.return_types, options.file_path) &&
+			if (!file_schema.SchemasMatch(error, full_sniffer, options.file_path, false) &&
 			    !options.ignore_errors.GetValue()) {
 				throw InvalidInputException(error);
 			}
 		}
 		return full_sniffer;
 	}
-	return min_sniff_res;
+	return min_sniff_res.ToSnifferResult();
 }
 
 SnifferResult CSVSniffer::SniffCSV(bool force_match) {
 	buffer_manager->sniffing = true;
@@ -228,8 +227,8 @@ SnifferResult CSVSniffer::SniffCSV(bool force_match) {
 		if (set_names.size() == names.size()) {
 			for (idx_t i = 0; i < set_columns.Size(); i++) {
 				if (set_names[i] != names[i]) {
-					header_error += "Column at position: " + to_string(i) + " Set name: " + set_names[i] +
-					                " Sniffed Name: " + names[i] + "\n";
+					header_error += "Column at position: " + to_string(i) + ", Set name: " + set_names[i] +
+					                ", Sniffed Name: " + names[i] + "\n";
 					match = false;
 				}
 			}
diff --git a/src/duckdb/src/execution/operator/csv_scanner/sniffer/dialect_detection.cpp b/src/duckdb/src/execution/operator/csv_scanner/sniffer/dialect_detection.cpp
index bf142a93..43cef4fc 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/sniffer/dialect_detection.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/sniffer/dialect_detection.cpp
@@ -1,5 +1,5 @@
 #include "duckdb/common/shared_ptr.hpp"
-#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp"
+#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp"
 #include "duckdb/main/client_data.hpp"
 #include "duckdb/execution/operator/csv_scanner/csv_reader_options.hpp"
@@ -302,6 +302,8 @@ void CSVSniffer::AnalyzeDialectCandidate(unique_ptr<ColumnCountScanner> scanner,
 	// Whether there are more values (rows) available that are consistent, exceeding the current best.
 	bool more_values = consistent_rows > best_consistent_rows && num_cols >= max_columns_found;
 
+	bool more_columns = consistent_rows == best_consistent_rows && num_cols > max_columns_found;
+
 	// If additional padding is required when compared to the previous padding count.
 	bool require_more_padding = padding_count > prev_padding_count;
@@ -338,10 +340,10 @@ void CSVSniffer::AnalyzeDialectCandidate(unique_ptr<ColumnCountScanner> scanner,
 	// - There are more values and no additional padding is required.
 	// - There's more than one column and less padding is required.
 	if (rows_consistent &&
-	    (single_column_before || (more_values && !require_more_padding) ||
+	    (single_column_before || ((more_values || more_columns) && !require_more_padding) ||
 	     (more_than_one_column && require_less_padding)) &&
 	    !invalid_padding && comments_are_acceptable) {
-		if (!candidates.empty() && set_columns.IsSet() && max_columns_found == candidates.size()) {
+		if (!candidates.empty() && set_columns.IsSet() && max_columns_found == set_columns.Size()) {
 			// We have a candidate that fits our requirements better
 			return;
 		}
diff --git a/src/duckdb/src/execution/operator/csv_scanner/sniffer/header_detection.cpp b/src/duckdb/src/execution/operator/csv_scanner/sniffer/header_detection.cpp
index fd050400..9475f594 100644
--- a/src/duckdb/src/execution/operator/csv_scanner/sniffer/header_detection.cpp
+++ b/src/duckdb/src/execution/operator/csv_scanner/sniffer/header_detection.cpp
@@ -1,5 +1,5 @@
 #include "duckdb/common/types/cast_helpers.hpp"
-#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp"
+#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp"
 #include "duckdb/execution/operator/csv_scanner/csv_reader_options.hpp"
#include "utf8proc.hpp" @@ -114,9 +114,9 @@ bool CSVSniffer::DetectHeaderWithSetColumn(ClientContext &context, vector Error(error); + error_handler->Error(error, true); } // Assert that it's all good at this point. D_ASSERT(best_candidate && !best_format_candidates.empty()); diff --git a/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_refinement.cpp b/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_refinement.cpp index 43d69318..8d3e2684 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_refinement.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_refinement.cpp @@ -1,4 +1,4 @@ -#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp" +#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp" #include "duckdb/execution/operator/csv_scanner/csv_casting.hpp" namespace duckdb { diff --git a/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_replacement.cpp b/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_replacement.cpp index 34fa4146..a693144d 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_replacement.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/sniffer/type_replacement.cpp @@ -1,4 +1,4 @@ -#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp" +#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp" namespace duckdb { void CSVSniffer::ReplaceTypes() { diff --git a/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine.cpp b/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine.cpp index 665c5b39..eae140f7 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine.cpp @@ -1,5 +1,5 @@ #include "duckdb/execution/operator/csv_scanner/csv_state_machine.hpp" -#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp" +#include 
"duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp" #include "utf8proc_wrapper.hpp" #include "duckdb/main/error_manager.hpp" #include "duckdb/execution/operator/csv_scanner/csv_state_machine_cache.hpp" diff --git a/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine_cache.cpp b/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine_cache.cpp index 6c93cc93..9c40809c 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine_cache.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/state_machine/csv_state_machine_cache.cpp @@ -1,6 +1,6 @@ #include "duckdb/execution/operator/csv_scanner/csv_state_machine.hpp" #include "duckdb/execution/operator/csv_scanner/csv_state_machine_cache.hpp" -#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp" +#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp" namespace duckdb { @@ -26,10 +26,10 @@ void CSVStateMachineCache::Insert(const CSVStateMachineOptions &state_machine_op switch (cur_state) { case CSVState::QUOTED: case CSVState::QUOTED_NEW_LINE: + case CSVState::ESCAPE: InitializeTransitionArray(transition_array, cur_state, CSVState::QUOTED); break; case CSVState::UNQUOTED: - case CSVState::ESCAPE: InitializeTransitionArray(transition_array, cur_state, CSVState::INVALID); break; case CSVState::COMMENT: diff --git a/src/duckdb/src/execution/operator/csv_scanner/table_function/csv_file_scanner.cpp b/src/duckdb/src/execution/operator/csv_scanner/table_function/csv_file_scanner.cpp index e3589486..3e457580 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/table_function/csv_file_scanner.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/table_function/csv_file_scanner.cpp @@ -1,6 +1,6 @@ #include "duckdb/execution/operator/csv_scanner/csv_file_scanner.hpp" -#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp" +#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp" 
#include "duckdb/execution/operator/csv_scanner/skip_scanner.hpp" #include "duckdb/function/table/read_csv.hpp" diff --git a/src/duckdb/src/execution/operator/csv_scanner/table_function/global_csv_state.cpp b/src/duckdb/src/execution/operator/csv_scanner/table_function/global_csv_state.cpp index 4f3e9dce..cefb1341 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/table_function/global_csv_state.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/table_function/global_csv_state.cpp @@ -1,6 +1,6 @@ #include "duckdb/execution/operator/csv_scanner/global_csv_state.hpp" -#include "duckdb/execution/operator/csv_scanner/csv_sniffer.hpp" +#include "duckdb/execution/operator/csv_scanner/sniffer/csv_sniffer.hpp" #include "duckdb/execution/operator/csv_scanner/scanner_boundary.hpp" #include "duckdb/execution/operator/csv_scanner/skip_scanner.hpp" #include "duckdb/execution/operator/persistent/csv_rejects_table.hpp" diff --git a/src/duckdb/src/execution/operator/csv_scanner/util/csv_reader_options.cpp b/src/duckdb/src/execution/operator/csv_scanner/util/csv_reader_options.cpp index 21f910ec..97fb22a9 100644 --- a/src/duckdb/src/execution/operator/csv_scanner/util/csv_reader_options.cpp +++ b/src/duckdb/src/execution/operator/csv_scanner/util/csv_reader_options.cpp @@ -4,6 +4,7 @@ #include "duckdb/common/string_util.hpp" #include "duckdb/common/enum_util.hpp" #include "duckdb/common/multi_file_reader.hpp" +#include "duckdb/common/set.hpp" namespace duckdb { @@ -404,7 +405,7 @@ string CSVReaderOptions::ToString(const string ¤t_file_path) const { auto &skip_rows = dialect_options.skip_rows; auto &header = dialect_options.header; - string error = " file=" + current_file_path + "\n "; + string error = " file = " + current_file_path + "\n "; // Let's first print options that can either be set by the user or by the sniffer // delimiter error += FormatOptionLine("delimiter", delimiter); @@ -427,13 +428,13 @@ string CSVReaderOptions::ToString(const string ¤t_file_path) 
const { // Now we do options that can only be set by the user, that might hold some general significance // null padding - error += "null_padding=" + std::to_string(null_padding) + "\n "; + error += "null_padding = " + std::to_string(null_padding) + "\n "; // sample_size - error += "sample_size=" + std::to_string(sample_size_chunks * STANDARD_VECTOR_SIZE) + "\n "; + error += "sample_size = " + std::to_string(sample_size_chunks * STANDARD_VECTOR_SIZE) + "\n "; // ignore_errors - error += "ignore_errors=" + ignore_errors.FormatValue() + "\n "; + error += "ignore_errors = " + ignore_errors.FormatValue() + "\n "; // all_varchar - error += "all_varchar=" + std::to_string(all_varchar) + "\n"; + error += "all_varchar = " + std::to_string(all_varchar) + "\n"; // Add information regarding sniffer mismatches (if any) error += sniffer_user_mismatch_error; @@ -452,15 +453,15 @@ static Value StringVectorToValue(const vector &vec) { static uint8_t GetCandidateSpecificity(const LogicalType &candidate_type) { //! 
Const ht with accepted auto_types and their weights in specificity const duckdb::unordered_map auto_type_candidates_specificity { - {(uint8_t)LogicalTypeId::VARCHAR, 0}, {(uint8_t)LogicalTypeId::DOUBLE, 1}, - {(uint8_t)LogicalTypeId::FLOAT, 2}, {(uint8_t)LogicalTypeId::DECIMAL, 3}, - {(uint8_t)LogicalTypeId::BIGINT, 4}, {(uint8_t)LogicalTypeId::INTEGER, 5}, - {(uint8_t)LogicalTypeId::SMALLINT, 6}, {(uint8_t)LogicalTypeId::TINYINT, 7}, - {(uint8_t)LogicalTypeId::TIMESTAMP, 8}, {(uint8_t)LogicalTypeId::DATE, 9}, - {(uint8_t)LogicalTypeId::TIME, 10}, {(uint8_t)LogicalTypeId::BOOLEAN, 11}, - {(uint8_t)LogicalTypeId::SQLNULL, 12}}; - - auto id = (uint8_t)candidate_type.id(); + {static_cast (LogicalTypeId::VARCHAR), 0}, {static_cast (LogicalTypeId::DOUBLE), 1}, + {static_cast (LogicalTypeId::FLOAT), 2}, {static_cast (LogicalTypeId::DECIMAL), 3}, + {static_cast (LogicalTypeId::BIGINT), 4}, {static_cast (LogicalTypeId::INTEGER), 5}, + {static_cast (LogicalTypeId::SMALLINT), 6}, {static_cast (LogicalTypeId::TINYINT), 7}, + {static_cast