using TidierFiles
-
-read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missingstring = ["4"])
+The path can be a file available either locally or on the web.
+read_csv("https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv", skip = 2, n_max = 3, col_select = ["ID", "Score"], missingstring = ["4"])
3×2 DataFrame
Row │ ID Score
diff --git a/previews/PR4/reference/index.html b/previews/PR4/reference/index.html
index 93f76c0..c1e2b5d 100644
--- a/previews/PR4/reference/index.html
+++ b/previews/PR4/reference/index.html
@@ -480,7 +480,7 @@
-source
+source
#
TidierFiles.read_csv
— Method.
read_csv(file; delim=',',col_names=true, skip=0, n_max=Inf,
@@ -503,7 +503,7 @@
-source
+source
#
TidierFiles.read_delim
— Method.
read_delim(file; delim=' ',col_names=true, skip=0, n_max=Inf,
@@ -529,7 +529,7 @@
-source
+source
#
TidierFiles.read_dta
— Method.
function read_dta(data_file; encoding=nothing, col_select=nothing, skip=0, n_max=Inf)
@@ -550,7 +550,7 @@
-source
+source
#
TidierFiles.read_fwf
— Method.
read_fwf(filepath::String; num_lines::Int=4, col_names=nothing)
@@ -582,7 +582,7 @@
-source
+source
#
TidierFiles.read_sas
— Method.
diff --git a/previews/PR4/search/search_index.json b/previews/PR4/search/search_index.json
index 66a2434..afbe279 100644
--- a/previews/PR4/search/search_index.json
+++ b/previews/PR4/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#what-is-tidierfilesjl","title":"What is TidierFiles.jl?","text":"TidierFiles.jl is a 100% Julia implementation of the readr and haven R packages. Powered by the CSV.jl, XLSX.jl and ReadStatTables.jl packages, TidierFiles.jl seeks to harmonize file reading/writing by unifying the arguments across multiple file types.
TidierFiles.jl currently supports
Example
read_csv
and write_csv
read_tsv
and write_tsv
read_xlsx
and write_xlsx
read_delim
and write_delim
read_table
and write_table
read_fwf
and fwf_empty
read_sav
and write_sav
(.sav and .por) read_sas
and write_sas
(.sas7bdat and .xpt) read_dta
and write_dta
(.dta)
Read functions include the following arguments and support HTTP reading:
path
missingstring
col_names
col_select
num_threads
skip
n_max
delim
(where applies)
using TidierFiles\n\nread_csv(\"https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv\", skip = 2, n_max = 3, col_select = [\"ID\", \"Score\"], missingstring = [\"4\"])\n
3\u00d72 DataFrame\n Row \u2502 ID Score \n \u2502 Int64? Int64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 77\n 2 \u2502 missing 85\n 3 \u2502 5 95\n
"},{"location":"reference/","title":"Reference","text":""},{"location":"reference/#index","title":"Index","text":" TidierFiles.fwf_empty
TidierFiles.read_csv
TidierFiles.read_delim
TidierFiles.read_dta
TidierFiles.read_fwf
TidierFiles.read_sas
TidierFiles.read_sav
TidierFiles.read_table
TidierFiles.read_tsv
TidierFiles.read_xlsx
TidierFiles.write_csv
TidierFiles.write_dta
TidierFiles.write_sas
TidierFiles.write_sav
TidierFiles.write_table
TidierFiles.write_tsv
TidierFiles.write_xlsx
"},{"location":"reference/#reference-exported-functions","title":"Reference - Exported functions","text":"# TidierFiles.fwf_empty
\u2014 Method.
fwf_empty(filepath::String; num_lines::Int=4, col_names=nothing)\n
Analyze a fixed-width format (FWF) file to automatically determine column widths and provide column names.
Arguments
filepath
::String: Path to the FWF file to analyze.
num_lines::Int=4: Number of lines to sample from the beginning of the file for analysis. Default is 4.
col_names
: Optional; a vector of strings specifying column names. If not provided, column names are generated as Column_1, Column_2, etc.
Returns
- A tuple containing two elements:
- A vector of integers representing the detected column widths.
- A vector of strings representing the column names.
Examples
julia> fwf_data = \n \"John Smith 35 12345 Software Engineer 120,000 \\nJane Doe 29 2345 Marketing Manager 95,000 \\nAlice Jones 42 123456 CEO 250,000 \\nBob Brown 31 12345 Product Manager 110,000 \\nCharlie Day 28 345 Sales Associate 70,000 \\nDiane Poe 35 23456 Data Scientist 130,000 \\nEve Stone 40 123456 Chief Financial Off 200,000 \\nFrank Moore 33 1234 Graphic Designer 80,000 \\nGrace Lee 27 123456 Software Developer 115,000 \\nHank Zuse 45 12345 System Analyst 120,000 \";\n\njulia> open(\"fwftest.txt\", \"w\") do file\n write(file, fwf_data)\n end;\n\njulia> path = \"fwftest.txt\";\n\njulia> fwf_empty(path)\n([13, 5, 8, 20, 8], [\"Column_1\", \"Column_2\", \"Column_3\", \"Column_4\", \"Column_5\"])\n\njulia> fwf_empty(path, num_lines=4, col_names = [\"Name\", \"Age\", \"ID\", \"Position\", \"Salary\"])\n([13, 5, 8, 20, 8], [\"Name\", \"Age\", \"ID\", \"Position\", \"Salary\"])\n
source
# TidierFiles.read_csv
\u2014 Method.
read_csv(file; delim=',',col_names=true, skip=0, n_max=Inf, \n comment=nothing, missingstring=\"\", col_select, escape_double=true, col_types=nothing, num_threads = 1)\n
Reads a CSV file or URL into a DataFrame, with options to specify delimiter, column names, and other CSV parsing options.
Arguments
file
: Path to the CSV file or a URL to a CSV file. delim
: The character delimiting fields in the file. Default is ','. col_names
: Indicates if the first row of the CSV is used as column names. Can be true, false, or an array of strings. Default is true. skip
: Number of initial lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf (read all rows). col_select
: Optional vector of symbols or strings to select which columns to load. comment
: Character that starts a comment line. Lines beginning with this character are ignored. Default is nothing (no comment lines). missingstring
: String that represents missing values in the CSV. Default is \"\", can be set to a vector of multiple items. escape_double
: Indicates whether to interpret two consecutive quote characters as a single quote in the data. Default is true. num_threads
: specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_csv(df, \"csvtest.csv\");\n\njulia> read_csv(\"csvtest.csv\", skip = 2, n_max = 3, missingstring = [\"95\", \"Charlie\"])\n3\u00d73 DataFrame\n Row \u2502 ID Name Score \n \u2502 Int64 String7 Int64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 missing 77\n 2 \u2502 4 David 85\n 3 \u2502 5 Eva missing \n
source
# TidierFiles.read_delim
\u2014 Method.
read_delim(file; delim=' ',col_names=true, skip=0, n_max=Inf, \n comment=nothing, missingstring=\"\", col_select, escape_double=true, col_types=nothing)\n
Reads a delimited file or URL into a DataFrame, with options to specify delimiter, column names, and other CSV parsing options.
Arguments
file
: Path to the delimited file or a URL to a delimited file. delim
: The character delimiting fields in the file. Default is '\\t' (tab). col_names
: Indicates if the first row of the CSV is used as column names. Can be true, false, or an array of strings. Default is true. skip
: Number of initial lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf (read all rows). col_select
: Optional vector of symbols or strings to select which columns to load. comment
: Character that starts a comment line. Lines beginning with this character are ignored. Default is nothing (no comment lines). missingstring
: String that represents missing values in the CSV. Default is \"\", can be set to a vector of multiple items. escape_double
: Indicates whether to interpret two consecutive quote characters as a single quote in the data. Default is true. col_types
: An optional specification of column types, can be a single type applied to all columns, or a collection of types with one for each column. Default is nothing (types are inferred). num_threads
: specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_csv(df, \"csvtest.csv\");\n\njulia> read_delim(\"csvtest.csv\", delim = \",\", col_names = false, num_threads = 4) # col_names are false here for the purpose of demonstration\n6\u00d73 DataFrame\n Row \u2502 Column1 Column2 Column3 \n \u2502 String3 String7 String7 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 ID Name Score\n 2 \u2502 1 Alice 88\n 3 \u2502 2 Bob 92\n 4 \u2502 3 Charlie 77\n 5 \u2502 4 David 85\n 6 \u2502 5 Eva 95\n
source
# TidierFiles.read_dta
\u2014 Method.
function read_dta(data_file; encoding=nothing, col_select=nothing, skip=0, n_max=Inf)\n
Read data from a Stata (.dta) file into a DataFrame, supporting both local and remote sources.
Arguments
-filepath
: The path to the .dta file or a URL pointing to such a file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. If not provided, defaults to the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. This can be a vector of column names or indices. If nothing, all columns are read. skip=0: Number of rows at the beginning of the file to skip before reading. n_max=Inf: Maximum number of rows to read from the file, after skipping. If Inf, read all available rows. num_threads
: specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_dta(df, \"test.dta\");\n\njulia> read_dta(\"test.dta\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String3 Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.read_fwf
\u2014 Method.
read_fwf(filepath::String; num_lines::Int=4, col_names=nothing)\n
Read fixed-width format (FWF) files into a DataFrame.
Arguments
filepath
::String: Path to the FWF file to read. widths_colnames
::Tuple{Vector{Int}, Union{Nothing, Vector{String}}}: A tuple containing two elements: - A vector of integers specifying the widths of each field. - Optionally, a vector of strings specifying column names. If nothing, column names are generated as Column1, Column2, etc. skip_to
=0: Number of lines at the beginning of the file to skip before reading data. n_max
=nothing: Maximum number of lines to read from the file. If nothing, read all lines.
Examples
julia> fwf_data = \n \"John Smith 35 12345 Software Engineer 120,000 \\nJane Doe 29 2345 Marketing Manager 95,000 \\nAlice Jones 42 123456 CEO 250,000 \\nBob Brown 31 12345 Product Manager 110,000 \\nCharlie Day 28 345 Sales Associate 70,000 \\nDiane Poe 35 23456 Data Scientist 130,000 \\nEve Stone 40 123456 Chief Financial Off 200,000 \\nFrank Moore 33 1234 Graphic Designer 80,000 \\nGrace Lee 27 123456 Software Developer 115,000 \\nHank Zuse 45 12345 System Analyst 120,000 \";\n\njulia> open(\"fwftest.txt\", \"w\") do file\n write(file, fwf_data)\n end;\n\njulia> path = \"fwftest.txt\";\n\njulia> read_fwf(path, fwf_empty(path, num_lines=4, col_names = [\"Name\", \"Age\", \"ID\", \"Position\", \"Salary\"]), skip_to=3, n_max=3)\n3\u00d75 DataFrame\n Row \u2502 Name Age ID Position Salary \n \u2502 String String String String String \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 Bob Brown 31 12345 Product Manager 110,000\n 2 \u2502 Charlie Day 28 345 Sales Associate 70,000\n 3 \u2502 Diane Poe 35 23456 Data Scientist 130,000\n
source
# TidierFiles.read_sas
\u2014 Method.
function read_sas(data_file; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads)\n
Read data from a SAS (.sas7bdat and .xpt) file into a DataFrame, supporting both local and remote sources.
Arguments
-filepath
: The path to the .sas7bdat or .xpt file or a URL pointing to such a file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. If not provided, defaults to the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. This can be a vector of column names or indices. If nothing, all columns are read. skip=0: Number of rows at the beginning of the file to skip before reading. n_max=Inf: Maximum number of rows to read from the file, after skipping. If Inf, read all available rows. num_threads
: specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sas(df, \"test.sas7bdat\");\n\njulia> read_sas(\"test.sas7bdat\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String3 Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sas(df, \"test.xpt\");\n\njulia> read_sas(\"test.xpt\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String3 Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.read_sav
\u2014 Method.
function read_sav(data_file; encoding=nothing, col_select=nothing, skip=0, n_max=Inf)\n
Read data from a SPSS (.sav and .por) file into a DataFrame, supporting both local and remote sources.
Arguments
-filepath
: The path to the .sav or .por file or a URL pointing to such a file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. If not provided, defaults to the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. This can be a vector of column names or indices. If nothing, all columns are read. skip=0: Number of rows at the beginning of the file to skip before reading. n_max=Inf: Maximum number of rows to read from the file, after skipping. If Inf, read all available rows. num_threads
: specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sav(df, \"test.sav\");\n\njulia> read_sav(\"test.sav\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sav(df, \"test.por\");\n\njulia> read_sav(\"test.por\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.read_table
\u2014 Method.
read_table(file; col_names=true, skip=0, n_max=Inf, comment=nothing, col_select, missingstring=\"\", kwargs...)\n
Read a table from a file where columns are separated by any amount of whitespace, processing it into a DataFrame.
Arguments
-file
: The path to the file to read. -col_names
=true: Indicates whether the first non-skipped line should be treated as column names. If false, columns are named automatically. -skip
: Number of lines at the beginning of the file to skip before processing starts. -n_max
: The maximum number of lines to read from the file, after skipping. Inf means read all lines. -col_select
: Optional vector of symbols or strings to select which columns to load. -comment
: A character or string indicating the start of a comment. Lines starting with this character are ignored. -missingstring
: The string that represents missing values in the table. -kwargs
: Additional keyword arguments passed to CSV.File.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_table(df, \"tabletest.txt\");\n\njulia> read_table(\"tabletest.txt\", skip = 2, n_max = 3, col_select = [\"Name\"])\n3\u00d71 DataFrame\n Row \u2502 Name \n \u2502 String7 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 Charlie\n 2 \u2502 David\n 3 \u2502 Eva\n
source
# TidierFiles.read_tsv
\u2014 Method.
read_tsv(file; delim=' ',col_names=true, skip=0, n_max=Inf, \n comment=nothing, missingstring=\"\", col_select, escape_double=true, col_types=nothing)\n
Reads a TSV file or URL into a DataFrame, with options to specify delimiter, column names, and other CSV parsing options.
Arguments
file
: Path to the TSV file or a URL to a TSV file. delim
: The character delimiting fields in the file. Default is '\\t' (tab). col_names
: Indicates if the first row of the CSV is used as column names. Can be true, false, or an array of strings. Default is true. skip
: Number of initial lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf (read all rows). col_select
: Optional vector of symbols or strings to select which columns to load. comment
: Character that starts a comment line. Lines beginning with this character are ignored. Default is nothing (no comment lines). missingstring
: String that represents missing values in the CSV. Default is \"\", can be set to a vector of multiple items. escape_double
: Indicates whether to interpret two consecutive quote characters as a single quote in the data. Default is true. num_threads
: specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_tsv(df, \"tsvtest.tsv\");\n\njulia> read_tsv(\"tsvtest.tsv\", skip = 2, n_max = 3, missingstring = [\"Charlie\"])\n3\u00d73 DataFrame\n Row \u2502 ID Name Score \n \u2502 Int64 String7 Int64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 missing 77\n 2 \u2502 4 David 85\n 3 \u2502 5 Eva 95\n
source
# TidierFiles.read_xlsx
\u2014 Method.
read_xlsx(path; sheet, range, col_names, col_types, missingstring, trim_ws, skip, n_max, guess_max)\n
Read data from an Excel file into a DataFrame.
Arguments
-path
: The path to the Excel file to be read. -sheet
: Specifies the sheet to be read. Can be either the name of the sheet as a string or its index as an integer. If nothing, the first sheet is read. -range
: Specifies a specific range of cells to be read from the sheet. If nothing, the entire sheet is read. -col_names
: Indicates whether the first row of the specified range should be treated as column names. If false, columns will be named automatically. -col_types
: Allows specifying column types explicitly. Can be a single type applied to all columns, a list or a dictionary mapping column names or indices to types. If nothing, types will be inferred. -missingstring
: The value or vector that represents missing values in the Excel file. -trim_ws
: Whether to trim leading and trailing whitespace from cells in the Excel file. -skip
: Number of rows to skip at the beginning of the sheet or range before reading data. -n_max
: The maximum number of rows to read from the sheet or range, after skipping. Inf means read all available rows. -guess_max
: The maximum number of rows to scan for type guessing and column names detection. Only relevant if col_types is nothing or col_names is true. If nothing, a default heuristic is used.
Examples
julia> df = DataFrame(integers=[1, 2, 3, 4],\n strings=[\"This\", \"Package makes\", \"File reading/writing\", \"even smoother\"],\n floats=[10.2, 20.3, 30.4, 40.5]);\n\njulia> df2 = DataFrame(AA=[\"aa\", \"bb\"], AB=[10.1, 10.2]);\n\njulia> write_xlsx((\"REPORT_A\" => df, \"REPORT_B\" => df2); path=\"xlsxtest.xlsx\", overwrite = true);\n\njulia> read_xlsx(\"xlsxtest.xlsx\", sheet = \"REPORT_A\", skip = 1, n_max = 4, missingstring = [2])\n3\u00d73 DataFrame\n Row \u2502 integers strings floats \n \u2502 Any String Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 missing Package makes 20.3\n 2 \u2502 3 File reading/writing 30.4\n 3 \u2502 4 even smoother 40.5\n
source
# TidierFiles.write_csv
\u2014 Method.
write_csv(DataFrame, filepath; na = \"\", append = false, col_names = true, missingstring, eol = \"\\n\", num_threads = Threads.nthreads())\n
Write a DataFrame to a CSV (comma-separated values) file.
Arguments
x
: The DataFrame to write to the CSV file. file
: The path to the output CSV file. missingstring
: = \"\": The string to represent missing values in the output file. Default is an empty string. append
: Whether to append to the file if it already exists. Default is false. col_names
: = true: Whether to write column names as the first line of the file. Default is true. eol
: = \"\\n\": The end-of-line character to use in the output file. Default is the newline character.
num_threads
= Threads.nthreads(): The number of threads to use for writing the file. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_csv(df, \"csvtest.csv\");\n
source
# TidierFiles.write_dta
\u2014 Method.
write_dta(df, path)\n
Write a DataFrame to a Stata (.dta) file.
Arguments -df
: The DataFrame to be written to a file. -path
: String as path where the .dta file will be created. If a file at this path already exists, it will be overwritten.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_dta(df, \"test.dta\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.write_sas
\u2014 Method.
write_sas(df, path)\n
Write a DataFrame to a SAS (.sas7bdat or .xpt) file.
Arguments -df
: The DataFrame to be written to a file. -path
: String as path where the .sas7bdat or .xpt file will be created. If a file at this path already exists, it will be overwritten.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sas(df, \"test.sas7bdat\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sas(df, \"test.xpt\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.write_sav
\u2014 Method.
write_sav(df, path)\n
Write a DataFrame to a SPSS (.sav or .por) file.
Arguments -df
: The DataFrame to be written to a file. -path
: String as path where the .sav or .por file will be created. If a file at this path already exists, it will be overwritten.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sav(df, \"test.sav\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sav(df, \"test.por\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.write_table
\u2014 Method.
write_table(x, file; delim = ' ', na, append, col_names, eol, num_threads)\n
Write a DataFrame to a file, allowing for customization of the delimiter and other options.
Arguments
-x
: The DataFrame to write to a file. -file
: The path to the file where the DataFrame will be written. -delim: Character to use as the field delimiter. The default is tab (' '), making it a TSV (tab-separated values) file by default, but can be changed to accommodate other formats. -missingstring
: The string to represent missing data in the output file. -append
: Whether to append to the file if it already exists. If false, the file will be overwritten. -col_names
: Whether to write column names as the first line of the file. If appending to an existing file with append = true, column names will not be written regardless of this parameter's value. -eol
: The end-of-line character to use in the file. Defaults to \"\\n\". -num_threads
: Number of threads to use for writing the file. Uses the number of available Julia threads by default.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_table(df, \"tabletest.txt\");\n
source
# TidierFiles.write_tsv
\u2014 Method.
write_tsv(DataFrame, filepath; na = \"\", append = false, col_names = true, missingstring, eol = \"\\n\", num_threads = Threads.nthreads())\n
Write a DataFrame to a TSV (tab-separated values) file.
Arguments
x
: The DataFrame to write to the TSV file. file
: The path to the output TSV file. missingstring
: = \"\": The string to represent missing values in the output file. Default is an empty string. append
: Whether to append to the file if it already exists. Default is false. col_names
: = true: Whether to write column names as the first line of the file. Default is true. eol
: = \"\\n\": The end-of-line character to use in the output file. Default is the newline character.
num_threads
= Threads.nthreads(): The number of threads to use for writing the file. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_tsv(df, \"tsvtest.tsv\");\n
source
# TidierFiles.write_xlsx
\u2014 Method.
write_xlsx(x; path, overwrite)\n
Write a DataFrame, or multiple DataFrames, to an Excel file.
"},{"location":"reference/#arguments-x-the-data-to-write-can-be-a-single-pairstring-dataframe-for-writing-one-sheet-or-a-tuple-of-such-pairs-for-writing-multiple-sheets-the-string-in-each-pair-specifies-the-sheet-name-and-the-dataframe-is-the-data-to-write-to-that-sheet-path-the-path-to-the-excel-file-where-the-data-will-be-written-overwrite-defaults-to-false-whether-to-overwrite-an-existing-file-if-false-an-error-is-thrown-when-attempting-to-write-to-an-existing-file","title":"Arguments -x
: The data to write. Can be a single Pair{String, DataFrame} for writing one sheet, or a Tuple of such pairs for writing multiple sheets. The String in each pair specifies the sheet name, and the DataFrame is the data to write to that sheet. -path
: The path to the Excel file where the data will be written. -overwrite
: Defaults to false. Whether to overwrite an existing file. If false, an error is thrown when attempting to write to an existing file.","text":"Examples
julia> df = DataFrame(integers=[1, 2, 3, 4],\n strings=[\"This\", \"Package makes\", \"File reading/writing\", \"even smoother\"],\n floats=[10.2, 20.3, 30.4, 40.5]);\n\njulia> df2 = DataFrame(AA=[\"aa\", \"bb\"], AB=[10.1, 10.2]);\n\njulia> write_xlsx((\"REPORT_A\" => df, \"REPORT_B\" => df2); path=\"xlsxtest.xlsx\", overwrite = true);\n
source
"},{"location":"reference/#reference-internal-functions","title":"Reference - Internal functions","text":""},{"location":"examples/generated/UserGuide/delim/","title":"Delimited Files","text":"The goal of reading and writing throughout TidierFiles.jl is to use consistent syntax. The functions on this page focus on delimited files and are powered by CSV.jl.
using TidierFiles\n
"},{"location":"examples/generated/UserGuide/delim/#read_csvtsvdelim","title":"read_csv/tsv/delim","text":"read_csv(\"https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv\", skip = 2, n_max = 3, col_select = [\"ID\", \"Score\"], missingstring = [\"4\"])\n\n#read_csv(file; delim=',', col_names=true, skip=0, n_max=Inf, comment=nothing, missingstring=\"\", col_select=nothing, escape_double=true, col_types=nothing, num_threads=1)\n\n#read_tsv(file; delim='\\t', col_names=true, skip=0, n_max=Inf, comment=nothing, missingstring=\"\", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())\n\n#read_delim(file; delim='\\t', col_names=true, skip=0, n_max=Inf, comment=nothing, missingstring=\"\", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())\n\n#These functions read a delimited file (CSV, TSV, or custom delimiter) into a DataFrame. The arguments are:\n
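3\u00d72 DataFrame\n Row \u2502 ID Score \n \u2502 Int64? Int64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 77\n 2 \u2502 missing 85\n 3 \u2502 5 95\n
file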
: Path to the file or a URL. delim
: Field delimiter. Default is ',' for read_csv
, '\\t' for read_tsv
and read_delim
. col_names
: Use first row as column names. Can be true
, false
, or an array of strings. Default is true
. skip
: Number of lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf
(read all rows). comment
: Character indicating comment lines to ignore. Default is nothing
. missingstring
: String(s) representing missing values. Default is \"\"
. col_select
: Optional vector of symbols or strings to select columns to load. Default is nothing
. escape_double
: Interpret two consecutive quote characters as a single quote. Default is true
. col_types
: Optional specification of column types. Default is nothing
(types are inferred). num_threads
: Number of threads to use for parallel execution. Default is 1 for read_csv
and the number of available threads for read_tsv
and read_delim
.
The functions return a DataFrame containing the parsed data from the file.
"},{"location":"examples/generated/UserGuide/delim/#write_csv-and-write_tsv","title":"write_csv
and write_tsv
","text":"write_csv(x, file; missingstring=\"\", append=false, col_names=true, eol=\"\\n\", num_threads=Threads.nthreads())
write_tsv(x, file; missingstring=\"\", append=false, col_names=true, eol=\"\\n\", num_threads=Threads.nthreads())
These functions write a DataFrame to a CSV or TSV file. The arguments are:
x
: The DataFrame to write. file
: The path to the output file. missingstring
: The string to represent missing values. Default is an empty string. append
: Whether to append to an existing file. Default is false
. col_names
: Whether to write column names as the first line. Default is true
. eol
: The end-of-line character. Default is \"\\n\"
. num_threads
: The number of threads to use for writing. Default is the number of available threads.
"},{"location":"examples/generated/UserGuide/delim/#read_table","title":"read_table
","text":"read_table(file; col_names=true, skip=0, n_max=Inf, comment=nothing, col_select=nothing, missingstring=\"\", num_threads)
This function reads a table from a whitespace-delimited file into a DataFrame. The arguments are:
file
: The path to the file to read. col_names
: Whether the first non-skipped line contains column names. Default is true
. skip
: Number of lines to skip before processing. Default is 0. n_max
: Maximum number of lines to read. Default is Inf
(read all lines). comment
: Character or string indicating comment lines to ignore. Default is nothing
. col_select
: Optional vector of symbols or strings to select columns to load. Default is nothing
. missingstring
: The string representing missing values. Default is \"\"
. num_threads
: The number of threads to use for reading. Default is the number of available threads.
"},{"location":"examples/generated/UserGuide/delim/#write_table","title":"write_table
","text":"write_table(x, file; delim='\\t', missingstring=\"\", append=false, col_names=true, eol=\"\\n\", num_threads=Threads.nthreads())
This function writes a DataFrame to a file with customizable delimiter and options. The arguments are:
x
: The DataFrame to write. file
: The path to the output file. delim
: The field delimiter. Default is '\\t'
(tab-separated). missingstring
: The string to represent missing values. Default is \"\"
. append
: Whether to append to an existing file. Default is false
. col_names
: Whether to write column names as the first line. Default is true
. eol
: The end-of-line character. Default is \"\\n\"
. num_threads
: The number of threads to use for writing. Default is the number of available threads.
This page was generated using Literate.jl.
"},{"location":"examples/generated/UserGuide/stats/","title":"Stats Files","text":"The functions for reading and writing stats files are made possible by ReadStatTables.jl
"},{"location":"examples/generated/UserGuide/stats/#reading-stats-files","title":"reading stats files","text":"read_dta(filepath; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads=1) read_sas(filepath; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads=1) read_sav(filepath; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads=1)
These functions read data from Stata (.dta), SAS (.sas7bdat and .xpt), and SPSS (.sav and .por) files into a DataFrame. The arguments are:
filepath
: The path to the file or a URL pointing to the file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. Default is the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. Can be a vector of column names or indices. Default is nothing
(all columns are read). skip
: Number of rows to skip at the beginning of the file. Default is 0. n_max
: Maximum number of rows to read after skipping. Default is Inf
(read all rows). num_threads
: Number of concurrent tasks or threads to use for processing. Default is 1.
"},{"location":"examples/generated/UserGuide/stats/#writing-stats-files","title":"writing stats files","text":"write_sav(df, path) write_sas(df, path) write_dta(df, path)
These functions write a DataFrame to SPSS (.sav or .por), SAS (.sas7bdat or .xpt), and Stata (.dta) files. The arguments are:
df
: The DataFrame to be written to a file. path
: The path where the file will be created. If a file at this path already exists, it will be overwritten.
This page was generated using Literate.jl.
"},{"location":"examples/generated/UserGuide/xl/","title":"Excel Files","text":"Reading and writing XLSX files are made possible by XLSX.jl
"},{"location":"examples/generated/UserGuide/xl/#read_xlsx","title":"read_xlsx
","text":"read_xlsx(path; sheet=nothing, range=nothing, col_names=true, col_types=nothing, missingstring=\"\", trim_ws=true, skip=0, n_max=Inf, guess_max=nothing)
This function reads data from an Excel file into a DataFrame. The arguments are:
path
: The path or URL to the Excel file to be read. sheet
: The sheet to be read. Can be a sheet name (string) or index (integer). Default is the first sheet. range
: A specific range of cells to be read from the sheet. Default is the entire sheet. col_names
: Whether the first row of the range contains column names. Default is true
. col_types
: Explicit specification of column types. Can be a single type, a list, or a dictionary mapping column names or indices to types. Default is nothing
(types are inferred). missingstring
: The string representing missing values. Default is \"\"
. trim_ws
: Whether to trim leading and trailing whitespace from cells. Default is true
. skip
: Number of rows to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf
(read all rows). guess_max
: Maximum number of rows to scan for type guessing and column names detection. Default is nothing
(a default heuristic is used).
"},{"location":"examples/generated/UserGuide/xl/#write_xlsx","title":"write_xlsx
","text":"write_xlsx(x; path, overwrite=false)
This function writes a DataFrame, or multiple DataFrames, to an Excel file. The arguments are:
x
: The data to write. Can be a single Pair{String, DataFrame}
for writing one sheet, or a Tuple
of such pairs for writing multiple sheets. The String
in each pair specifies the sheet name, and the DataFrame
is the data to write to that sheet. path
: The path to the output Excel file. overwrite
: Whether to overwrite an existing file. Default is false
.
This page was generated using Literate.jl.
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#tidierfilesjl","title":"TidierFiles.jl","text":""},{"location":"#what-is-tidierfilesjl","title":"What is TidierFiles.jl?","text":"TidierFiles.jl is a 100% Julia implementation of the readr, haven, readxl, and writexl R packages.
Powered by the CSV.jl, XLSX.jl and ReadStatTables.jl packages, TidierFiles.jl aims to bring a consistent interface to the reading and writing of tabular data, including a consistent syntax to read files locally versus from the web and consistent keyword arguments across data formats.
Currently supported file types:
read_csv
and write_csv
read_tsv
and write_tsv
read_xlsx
and write_xlsx
read_delim
and write_delim
read_table
and write_table
read_fwf
and fwf_empty
read_sav
and write_sav
(.sav and .por) read_sas
and write_sas
(.sas7bdat and .xpt) read_dta
and write_dta
(.dta)
"},{"location":"#examples","title":"Examples","text":"Here is an example of how to write and read a CSV file.
using TidierFiles\nusing DataFrames, Dates # DataFrame, Date, and Dates.Time are used below\n\ndf = DataFrame(\n integers = [1, 2, 3, 4],\n strings = [\"This\", \"Package makes\", \"File reading/writing\", \"even smoother\"],\n floats = [10.2, 20.3, 30.4, 40.5],\n dates = [Date(2018,2,20), Date(2018,2,21), Date(2018,2,22), Date(2018,2,23)],\n times = [Dates.Time(19,10), Dates.Time(19,20), Dates.Time(19,30), Dates.Time(19,40)]\n )\n\nwrite_csv(df, \"testing.csv\", col_names = true)\n\nread_csv(\"testing.csv\", missingstring=[\"40.5\", \"10.2\"])\n
4\u00d75 DataFrame\n Row \u2502 integers strings floats dates times \n \u2502 Int64 String31 Float64? Date Time \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 1 This missing 2018-02-20 19:10:00\n 2 \u2502 2 Package makes 20.3 2018-02-21 19:20:00\n 3 \u2502 3 File reading/writing 30.4 2018-02-22 19:30:00\n 4 \u2502 4 even smoother missing 2018-02-23 19:40:00:00\n
The file reading functions include the following keyword arguments:
path
missingstring
col_names
col_select
num_threads
skip
n_max
delim
(where applicable)
The path can be a file available either locally or on the web.
read_csv(\"https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv\", skip = 2, n_max = 3, col_select = [\"ID\", \"Score\"], missingstring = [\"4\"])\n
3\u00d72 DataFrame\n Row \u2502 ID Score \n \u2502 Int64? Int64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 77\n 2 \u2502 missing 85\n 3 \u2502 5 95\n
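As a minimal sketch of the shared keyword arguments, the same skip and n_max values can be passed to both read_csv and read_tsv. It assumes DataFrames.jl is installed alongside TidierFiles.jl; the data frame and the temporary paths returned by tempname() are illustrative and not part of the package.

```julia
using TidierFiles
using DataFrames

# Illustrative data; not taken from the package's test files.
df = DataFrame(ID = 1:5, Score = [88, 92, 77, 85, 95])

# tempname() returns a fresh temporary path, so no existing file is touched.
csv_path = tempname()
tsv_path = tempname()
write_csv(df, csv_path)
write_tsv(df, tsv_path)

# The same keyword arguments are passed to both readers.
from_csv = read_csv(csv_path, skip = 2, n_max = 3)
from_tsv = read_tsv(tsv_path, skip = 2, n_max = 3)
```

Only the reader function changes between the two calls; the keyword arguments stay the same.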
"},{"location":"reference/","title":"Reference","text":""},{"location":"reference/#index","title":"Index","text":" TidierFiles.fwf_empty
TidierFiles.read_csv
TidierFiles.read_delim
TidierFiles.read_dta
TidierFiles.read_fwf
TidierFiles.read_sas
TidierFiles.read_sav
TidierFiles.read_table
TidierFiles.read_tsv
TidierFiles.read_xlsx
TidierFiles.write_csv
TidierFiles.write_dta
TidierFiles.write_sas
TidierFiles.write_sav
TidierFiles.write_table
TidierFiles.write_tsv
TidierFiles.write_xlsx
"},{"location":"reference/#reference-exported-functions","title":"Reference - Exported functions","text":"# TidierFiles.fwf_empty
\u2014 Method.
fwf_empty(filepath::String; num_lines::Int=4, col_names=nothing)\n
Analyze a fixed-width format (FWF) file to automatically determine column widths and provide column names.
Arguments
filepath
::String: Path to the FWF file to analyze.
num_lines::Int=4: Number of lines to sample from the beginning of the file for analysis. Default is 4.
col_names
: Optional; a vector of strings specifying column names. If not provided, column names are generated as Column_1, Column_2, etc.
Returns
- A tuple containing two elements:
- A vector of integers representing the detected column widths.
- A vector of strings representing the column names.
Examples
julia> fwf_data = \n \"John Smith 35 12345 Software Engineer 120,000 \\nJane Doe 29 2345 Marketing Manager 95,000 \\nAlice Jones 42 123456 CEO 250,000 \\nBob Brown 31 12345 Product Manager 110,000 \\nCharlie Day 28 345 Sales Associate 70,000 \\nDiane Poe 35 23456 Data Scientist 130,000 \\nEve Stone 40 123456 Chief Financial Off 200,000 \\nFrank Moore 33 1234 Graphic Designer 80,000 \\nGrace Lee 27 123456 Software Developer 115,000 \\nHank Zuse 45 12345 System Analyst 120,000 \";\n\njulia> open(\"fwftest.txt\", \"w\") do file\n write(file, fwf_data)\n end;\n\njulia> path = \"fwftest.txt\";\n\njulia> fwf_empty(path)\n([13, 5, 8, 20, 8], [\"Column_1\", \"Column_2\", \"Column_3\", \"Column_4\", \"Column_5\"])\n\njulia> fwf_empty(path, num_lines=4, col_names = [\"Name\", \"Age\", \"ID\", \"Position\", \"Salary\"])\n([13, 5, 8, 20, 8], [\"Name\", \"Age\", \"ID\", \"Position\", \"Salary\"])\n
source
# TidierFiles.read_csv
\u2014 Method.
read_csv(file; delim=',',col_names=true, skip=0, n_max=Inf, \n comment=nothing, missingstring=\"\", col_select, escape_double=true, col_types=nothing, num_threads = 1)\n
Reads a CSV file or URL into a DataFrame, with options to specify delimiter, column names, and other CSV parsing options.
Arguments
file
: Path to the CSV file or a URL to a CSV file. delim
: The character delimiting fields in the file. Default is ','. col_names
: Indicates if the first row of the CSV is used as column names. Can be true, false, or an array of strings. Default is true. skip
: Number of initial lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf (read all rows). col_select
: Optional vector of symbols or strings to select which columns to load. comment
: Character that starts a comment line. Lines beginning with this character are ignored. Default is nothing (no comment lines). missingstring
: String that represents missing values in the CSV. Default is \"\", can be set to a vector of multiple items. escape_double
: Indicates whether to interpret two consecutive quote characters as a single quote in the data. Default is true. num_threads
: Specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_csv(df, \"csvtest.csv\");\n\njulia> read_csv(\"csvtest.csv\", skip = 2, n_max = 3, missingstring = [\"95\", \"Charlie\"])\n3\u00d73 DataFrame\n Row \u2502 ID Name Score \n \u2502 Int64 String7 Int64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 missing 77\n 2 \u2502 4 David 85\n 3 \u2502 5 Eva missing \n
source
# TidierFiles.read_delim
\u2014 Method.
read_delim(file; delim='\\t', col_names=true, skip=0, n_max=Inf, \n comment=nothing, missingstring=\"\", col_select, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())\n
Reads a delimited file or URL into a DataFrame, with options to specify delimiter, column names, and other CSV parsing options.
Arguments
file
: Path to the delimited file or a URL to a delimited file. delim
: The character delimiting fields in the file. Default is '\\t'. col_names
: Indicates if the first row of the file is used as column names. Can be true, false, or an array of strings. Default is true. skip
: Number of initial lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf (read all rows). col_select
: Optional vector of symbols or strings to select which columns to load. comment
: Character that starts a comment line. Lines beginning with this character are ignored. Default is nothing (no comment lines). missingstring
: String that represents missing values in the file. Default is \"\", can be set to a vector of multiple items. escape_double
: Indicates whether to interpret two consecutive quote characters as a single quote in the data. Default is true. col_types
: An optional specification of column types, can be a single type applied to all columns, or a collection of types with one for each column. Default is nothing (types are inferred). num_threads
: Specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_csv(df, \"csvtest.csv\");\n\njulia> read_delim(\"csvtest.csv\", delim = \",\", col_names = false, num_threads = 4) # col_names are false here for the purpose of demonstration\n6\u00d73 DataFrame\n Row \u2502 Column1 Column2 Column3 \n \u2502 String3 String7 String7 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 ID Name Score\n 2 \u2502 1 Alice 88\n 3 \u2502 2 Bob 92\n 4 \u2502 3 Charlie 77\n 5 \u2502 4 David 85\n 6 \u2502 5 Eva 95\n
source
# TidierFiles.read_dta
\u2014 Method.
read_dta(filepath; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads=1)\n
Read data from a Stata (.dta) file into a DataFrame, supporting both local and remote sources.
Arguments
filepath
: The path to the .dta file or a URL pointing to such a file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. If not provided, defaults to the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. This can be a vector of column names or indices. If nothing, all columns are read. skip=0: Number of rows at the beginning of the file to skip before reading. n_max=Inf: Maximum number of rows to read from the file, after skipping. If Inf, read all available rows. num_threads
: Specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_dta(df, \"test.dta\");\n\njulia> read_dta(\"test.dta\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String3 Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.read_fwf
\u2014 Method.
read_fwf(filepath::String, widths_colnames::Tuple{Vector{Int}, Union{Nothing, Vector{String}}}; skip_to=0, n_max=nothing)\n
Read fixed-width format (FWF) files into a DataFrame.
Arguments
filepath
::String: Path to the FWF file to read. widths_colnames
::Tuple{Vector{Int}, Union{Nothing, Vector{String}}}: A tuple containing two elements: - A vector of integers specifying the widths of each field. - Optionally, a vector of strings specifying column names. If nothing, column names are generated as Column1, Column2, etc. skip_to
=0: Number of lines at the beginning of the file to skip before reading data. n_max
=nothing: Maximum number of lines to read from the file. If nothing, read all lines.
Examples
julia> fwf_data = \n \"John Smith 35 12345 Software Engineer 120,000 \\nJane Doe 29 2345 Marketing Manager 95,000 \\nAlice Jones 42 123456 CEO 250,000 \\nBob Brown 31 12345 Product Manager 110,000 \\nCharlie Day 28 345 Sales Associate 70,000 \\nDiane Poe 35 23456 Data Scientist 130,000 \\nEve Stone 40 123456 Chief Financial Off 200,000 \\nFrank Moore 33 1234 Graphic Designer 80,000 \\nGrace Lee 27 123456 Software Developer 115,000 \\nHank Zuse 45 12345 System Analyst 120,000 \";\n\njulia> open(\"fwftest.txt\", \"w\") do file\n write(file, fwf_data)\n end;\n\njulia> path = \"fwftest.txt\";\n\njulia> read_fwf(path, fwf_empty(path, num_lines=4, col_names = [\"Name\", \"Age\", \"ID\", \"Position\", \"Salary\"]), skip_to=3, n_max=3)\n3\u00d75 DataFrame\n Row \u2502 Name Age ID Position Salary \n \u2502 String String String String String \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 Bob Brown 31 12345 Product Manager 110,000\n 2 \u2502 Charlie Day 28 345 Sales Associate 70,000\n 3 \u2502 Diane Poe 35 23456 Data Scientist 130,000\n
source
# TidierFiles.read_sas
\u2014 Method.
read_sas(filepath; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads=1)\n
Read data from a SAS (.sas7bdat and .xpt) file into a DataFrame, supporting both local and remote sources.
Arguments
filepath
: The path to the .sas7bdat or .xpt file or a URL pointing to such a file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. If not provided, defaults to the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. This can be a vector of column names or indices. If nothing, all columns are read. skip=0: Number of rows at the beginning of the file to skip before reading. n_max=Inf: Maximum number of rows to read from the file, after skipping. If Inf, read all available rows. num_threads
: Specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sas(df, \"test.sas7bdat\");\n\njulia> read_sas(\"test.sas7bdat\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String3 Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sas(df, \"test.xpt\");\n\njulia> read_sas(\"test.xpt\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String3 Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.read_sav
\u2014 Method.
read_sav(filepath; encoding=nothing, col_select=nothing, skip=0, n_max=Inf, num_threads=1)\n
Read data from a SPSS (.sav and .por) file into a DataFrame, supporting both local and remote sources.
Arguments
filepath
: The path to the .sav or .por file or a URL pointing to such a file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. If not provided, defaults to the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. This can be a vector of column names or indices. If nothing, all columns are read. skip=0: Number of rows at the beginning of the file to skip before reading. n_max=Inf: Maximum number of rows to read from the file, after skipping. If Inf, read all available rows. num_threads
: Specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Defaults to 1.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sav(df, \"test.sav\");\n\njulia> read_sav(\"test.sav\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sav(df, \"test.por\");\n\njulia> read_sav(\"test.por\")\n2\u00d72 DataFrame\n Row \u2502 AA AB \n \u2502 String Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.read_table
\u2014 Method.
read_table(file; col_names=true, skip=0, n_max=Inf, comment=nothing, col_select, missingstring=\"\", kwargs...)\n
Read a table from a file where columns are separated by any amount of whitespace, processing it into a DataFrame.
Arguments
file
: The path to the file to read. col_names
=true: Indicates whether the first non-skipped line should be treated as column names. If false, columns are named automatically. skip
: Number of lines at the beginning of the file to skip before processing starts. n_max
: The maximum number of lines to read from the file, after skipping. Inf means read all lines. col_select
: Optional vector of symbols or strings to select which columns to load. comment
: A character or string indicating the start of a comment. Lines starting with this character are ignored. missingstring
: The string that represents missing values in the table. kwargs
: Additional keyword arguments passed to CSV.File.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_table(df, \"tabletest.txt\");\n\njulia> read_table(\"tabletest.txt\", skip = 2, n_max = 3, col_select = [\"Name\"])\n3\u00d71 DataFrame\n Row \u2502 Name \n \u2502 String7 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 Charlie\n 2 \u2502 David\n 3 \u2502 Eva\n
source
# TidierFiles.read_tsv
\u2014 Method.
read_tsv(file; delim='\\t', col_names=true, skip=0, n_max=Inf, \n comment=nothing, missingstring=\"\", col_select, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())\n
Reads a TSV file or URL into a DataFrame, with options to specify delimiter, column names, and other CSV parsing options.
Arguments
file
: Path to the TSV file or a URL to a TSV file. delim
: The character delimiting fields in the file. Default is '\\t'. col_names
: Indicates if the first row of the TSV is used as column names. Can be true, false, or an array of strings. Default is true. skip
: Number of initial lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf (read all rows). col_select
: Optional vector of symbols or strings to select which columns to load. comment
: Character that starts a comment line. Lines beginning with this character are ignored. Default is nothing (no comment lines). missingstring
: String that represents missing values in the TSV. Default is \"\", can be set to a vector of multiple items. escape_double
: Indicates whether to interpret two consecutive quote characters as a single quote in the data. Default is true. num_threads
: Specifies the number of concurrent tasks or threads to use for processing, allowing for parallel execution. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_tsv(df, \"tsvtest.tsv\");\n\njulia> read_tsv(\"tsvtest.tsv\", skip = 2, n_max = 3, missingstring = [\"Charlie\"])\n3\u00d73 DataFrame\n Row \u2502 ID Name Score \n \u2502 Int64 String7 Int64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 3 missing 77\n 2 \u2502 4 David 85\n 3 \u2502 5 Eva 95\n
source
# TidierFiles.read_xlsx
\u2014 Method.
read_xlsx(path; sheet, range, col_names, col_types, missingstring, trim_ws, skip, n_max, guess_max)\n
Read data from an Excel file into a DataFrame.
Arguments
path
: The path to the Excel file to be read. sheet
: Specifies the sheet to be read. Can be either the name of the sheet as a string or its index as an integer. If nothing, the first sheet is read. range
: Specifies a specific range of cells to be read from the sheet. If nothing, the entire sheet is read. col_names
: Indicates whether the first row of the specified range should be treated as column names. If false, columns will be named automatically. col_types
: Allows specifying column types explicitly. Can be a single type applied to all columns, a list or a dictionary mapping column names or indices to types. If nothing, types will be inferred. missingstring
: The value or vector that represents missing values in the Excel file. trim_ws
: Whether to trim leading and trailing whitespace from cells in the Excel file. skip
: Number of rows to skip at the beginning of the sheet or range before reading data. n_max
: The maximum number of rows to read from the sheet or range, after skipping. Inf means read all available rows. guess_max
: The maximum number of rows to scan for type guessing and column names detection. Only relevant if col_types is nothing or col_names is true. If nothing, a default heuristic is used.
Examples
julia> df = DataFrame(integers=[1, 2, 3, 4],\n strings=[\"This\", \"Package makes\", \"File reading/writing\", \"even smoother\"],\n floats=[10.2, 20.3, 30.4, 40.5]);\n\njulia> df2 = DataFrame(AA=[\"aa\", \"bb\"], AB=[10.1, 10.2]);\n\njulia> write_xlsx((\"REPORT_A\" => df, \"REPORT_B\" => df2); path=\"xlsxtest.xlsx\", overwrite = true);\n\njulia> read_xlsx(\"xlsxtest.xlsx\", sheet = \"REPORT_A\", skip = 1, n_max = 4, missingstring = [2])\n3\u00d73 DataFrame\n Row \u2502 integers strings floats \n \u2502 Any String Float64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 missing Package makes 20.3\n 2 \u2502 3 File reading/writing 30.4\n 3 \u2502 4 even smoother 40.5\n
source
# TidierFiles.write_csv
\u2014 Method.
write_csv(DataFrame, filepath; na = \"\", append = false, col_names = true, missingstring, eol = \"\\n\", num_threads = Threads.nthreads())\n
Write a DataFrame to a CSV (comma-separated values) file.
Arguments
x
: The DataFrame to write to the CSV file. file
: The path to the output CSV file. missingstring
: The string to represent missing values in the output file. Default is an empty string. append
: Whether to append to the file if it already exists. Default is false. col_names
: Whether to write column names as the first line of the file. Default is true. eol
: The end-of-line character to use in the output file. Default is \"\\n\". num_threads
: The number of threads to use for writing the file. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_csv(df, \"csvtest.csv\");\n
source
# TidierFiles.write_dta
\u2014 Method.
write_dta(df, path)\n
Write a DataFrame to a Stata (.dta) file.
Arguments
df
: The DataFrame to be written to a file. path
: Path, given as a string, where the .dta file will be created. If a file at this path already exists, it will be overwritten.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_dta(df, \"test.dta\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.write_sas
\u2014 Method.
write_sas(df, path)\n
Write a DataFrame to a SAS (.sas7bdat or .xpt) file.
Arguments
df
: The DataFrame to be written to a file. path
: Path, given as a string, where the .sas7bdat or .xpt file will be created. If a file at this path already exists, it will be overwritten.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sas(df, \"test.sas7bdat\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sas(df, \"test.xpt\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.write_sav
\u2014 Method.
write_sav(df, path)\n
Write a DataFrame to a SPSS (.sav or .por) file.
Arguments
df
: The DataFrame to be written to a file. path
: Path, given as a string, where the .sav or .por file will be created. If a file at this path already exists, it will be overwritten.
Examples
julia> df = DataFrame(AA=[\"sav\", \"por\"], AB=[10.1, 10.2]);\n\njulia> write_sav(df, \"test.sav\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n\njulia> write_sav(df, \"test.por\")\n2\u00d72 ReadStatTable:\n Row \u2502 AA AB \n \u2502 String Float64? \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n 1 \u2502 sav 10.1\n 2 \u2502 por 10.2\n
source
# TidierFiles.write_table
\u2014 Method.
write_table(x, file; delim = '\\t', na, append, col_names, eol, num_threads)\n
Write a DataFrame to a file, allowing for customization of the delimiter and other options.
Arguments
-x
: The DataFrame to write to a file. -file
: The path to the file where the DataFrame will be written. -delim: Character to use as the field delimiter. The default is tab ('\\t'), making it a TSV (tab-separated values) file by default, but can be changed to accommodate other formats. -missingstring
: The string to represent missing data in the output file. -append
: Whether to append to the file if it already exists. If false, the file will be overwritten. -col_names
: Whether to write column names as the first line of the file. If appending to an existing file with append = true, column names will not be written regardless of this parameter's value. -eol
: The end-of-line character to use in the file. Defaults to \"\\n\". -num_threads
: Number of threads to use for writing the file. Uses the number of available Julia threads by default.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_table(df, \"tabletest.txt\");\n
source
# TidierFiles.write_tsv
\u2014 Method.
write_tsv(DataFrame, filepath; na = \"\", append = false, col_names = true, missingstring, eol = \"\\n\", num_threads = Threads.nthreads())\n
Write a DataFrame to a TSV (tab-separated values) file.
Arguments
x
: The DataFrame to write to the TSV file. file
: The path to the output TSV file. missingstring
: = \"\": The string to represent missing values in the output file. Default is an empty string. append
: Whether to append to the file if it already exists. Default is false. col_names
: = true: Whether to write column names as the first line of the file. Default is true. eol
: = \"
\": The end-of-line character to use in the output file. Default is the newline character.
num_threads
= Threads.nthreads(): The number of threads to use for writing the file. Default is the number of available threads.
Examples
julia> df = DataFrame(ID = 1:5, Name = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eva\"], Score = [88, 92, 77, 85, 95]);\n\njulia> write_tsv(df, \"tsvtest.tsv\");\n
source
# TidierFiles.write_xlsx
\u2014 Method.
write_xlsx(x; path, overwrite)\n
Write a DataFrame, or multiple DataFrames, to an Excel file.
"},{"location":"reference/#arguments-x-the-data-to-write-can-be-a-single-pairstring-dataframe-for-writing-one-sheet-or-a-tuple-of-such-pairs-for-writing-multiple-sheets-the-string-in-each-pair-specifies-the-sheet-name-and-the-dataframe-is-the-data-to-write-to-that-sheet-path-the-path-to-the-excel-file-where-the-data-will-be-written-overwrite-defaults-to-false-whether-to-overwrite-an-existing-file-if-false-an-error-is-thrown-when-attempting-to-write-to-an-existing-file","title":"Arguments -x
: The data to write. Can be a single Pair{String, DataFrame} for writing one sheet, or a Tuple of such pairs for writing multiple sheets. The String in each pair specifies the sheet name, and the DataFrame is the data to write to that sheet. -path
: The path to the Excel file where the data will be written. -overwrite
: Defaults to false. Whether to overwrite an existing file. If false, an error is thrown when attempting to write to an existing file.","text":"Examples
julia> df = DataFrame(integers=[1, 2, 3, 4],\n strings=[\"This\", \"Package makes\", \"File reading/writing\", \"even smoother\"],\n floats=[10.2, 20.3, 30.4, 40.5]);\n\njulia> df2 = DataFrame(AA=[\"aa\", \"bb\"], AB=[10.1, 10.2]);\n\njulia> write_xlsx((\"REPORT_A\" => df, \"REPORT_B\" => df2); path=\"xlsxtest.xlsx\", overwrite = true);\n
source
"},{"location":"reference/#reference-internal-functions","title":"Reference - Internal functions","text":""},{"location":"examples/generated/UserGuide/delim/","title":"Delimited Files","text":"The goal of reading and writing throughout TidierFiles.jl is to use consistent syntax. This functions on this page focus on delimited files and are powered by CSV.jl.
using TidierFiles\n
"},{"location":"examples/generated/UserGuide/delim/#read_csvtsvdelim","title":"read_csv/tsv/delim","text":"read_csv(\"https://raw.githubusercontent.com/TidierOrg/TidierFiles.jl/main/testing_files/csvtest.csv\", skip = 2, n_max = 3, col_select = [\"ID\", \"Score\"], missingstring = [\"4\"])\n\n#read_csv(file; delim=',', col_names=true, skip=0, n_max=Inf, comment=nothing, missingstring=\"\", col_select=nothing, escape_double=true, col_types=nothing, num_threads=1)\n\n#read_tsv(file; delim='\\t', col_names=true, skip=0, n_max=Inf, comment=nothing, missingstring=\"\", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())\n\n#read_delim(file; delim='\\t', col_names=true, skip=0, n_max=Inf, comment=nothing, missingstring=\"\", col_select=nothing, escape_double=true, col_types=nothing, num_threads=Threads.nthreads())\n\n#These functions read a delimited file (CSV, TSV, or custom delimiter) into a DataFrame. The arguments are:\n
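3\u00d72 DataFrame\n Row \u2502 ID       Score \n     \u2502 Int64?   Int64 \n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n   1 \u2502       3     77\n   2 \u2502 missing     85\n   3 \u2502       5     95\n file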
: Path to the file or a URL. delim
: Field delimiter. Default is ',' for read_csv
, '\\t' for read_tsv
and read_delim
. col_names
: Use first row as column names. Can be true
, false
, or an array of strings. Default is true
. skip
: Number of lines to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf
(read all rows). comment
: Character indicating comment lines to ignore. Default is nothing
. missingstring
: String(s) representing missing values. Default is \"\"
. col_select
: Optional vector of symbols or strings to select columns to load. Default is nothing
. escape_double
: Interpret two consecutive quote characters as a single quote. Default is true
. col_types
: Optional specification of column types. Default is nothing
(types are inferred). num_threads
: Number of threads to use for parallel execution. Default is 1 for read_csv
and the number of available threads for read_tsv
and read_delim
.
The functions return a DataFrame containing the parsed data from the file.
"},{"location":"examples/generated/UserGuide/delim/#write_csv-and-write_tsv","title":"write_csv
and write_tsv
","text":"writecsv(x, file; missingstring=\"\", append=false, colnames=true, eol=\"\\n\", num_threads=Threads.nthreads())
writetsv(x, file; missingstring=\"\", append=false, colnames=true, eol=\"\\n\", num_threads=Threads.nthreads())
These functions write a DataFrame to a CSV or TSV file. The arguments are:
x
: The DataFrame to write. file
: The path to the output file. missingstring
: The string to represent missing values. Default is an empty string. append
: Whether to append to an existing file. Default is false
. col_names
: Whether to write column names as the first line. Default is true
. eol
: The end-of-line character. Default is \"\\n\"
. num_threads
: The number of threads to use for writing. Default is the number of available threads.
"},{"location":"examples/generated/UserGuide/delim/#read_table","title":"read_table
","text":"readtable(file; colnames=true, skip=0, nmax=Inf, comment=nothing, colselect=nothing, missingstring=\"\", num_threads)
This function reads a table from a whitespace-delimited file into a DataFrame. The arguments are:
file
: The path to the file to read. col_names
: Whether the first non-skipped line contains column names. Default is true
. skip
: Number of lines to skip before processing. Default is 0. n_max
: Maximum number of lines to read. Default is Inf
(read all lines). comment
: Character or string indicating comment lines to ignore. Default is nothing
. col_select
: Optional vector of symbols or strings to select columns to load. Default is nothing
. missingstring
: The string representing missing values. Default is \"\"
. num_threads
: The number of threads to use for reading. Default is the number of available threads.
"},{"location":"examples/generated/UserGuide/delim/#write_table","title":"write_table
","text":"writetable(x, file; delim='\\t', missingstring=\"\", append=false, colnames=true, eol=\"\\n\", num_threads=Threads.nthreads())
This function writes a DataFrame to a file with customizable delimiter and options. The arguments are:
x
: The DataFrame to write. file
: The path to the output file. delim
: The field delimiter. Default is '\\t'
(tab-separated). missingstring
: The string to represent missing values. Default is \"\"
. append
: Whether to append to an existing file. Default is false
. col_names
: Whether to write column names as the first line. Default is true
. eol
: The end-of-line character. Default is \"\\n\"
. num_threads
: The number of threads to use for writing. Default is the number of available threads.
This page was generated using Literate.jl.
"},{"location":"examples/generated/UserGuide/stats/","title":"Stats Files","text":"The functions for reading and writing stats files are made possible by ReadStatTables.jl
"},{"location":"examples/generated/UserGuide/stats/#reading-stats-files","title":"reading stats files","text":"readdta(filepath; encoding=nothing, colselect=nothing, skip=0, nmax=Inf, numthreads=1) readsas(filepath; encoding=nothing, colselect=nothing, skip=0, nmax=Inf, numthreads=1) readsav(filepath; encoding=nothing, colselect=nothing, skip=0, nmax=Inf, numthreads=1)
These functions read data from Stata (.dta), SAS (.sas7bdat and .xpt), and SPSS (.sav and .por) files into a DataFrame. The arguments are:
filepath
: The path to the file or a URL pointing to the file. If a URL is provided, the file will be downloaded and then read. encoding
: Optional; specifies the encoding of the input file. Default is the package's or function's default. col_select
: Optional; allows specifying a subset of columns to read. Can be a vector of column names or indices. Default is nothing
(all columns are read). skip
: Number of rows to skip at the beginning of the file. Default is 0. n_max
: Maximum number of rows to read after skipping. Default is Inf
(read all rows). num_threads
: Number of concurrent tasks or threads to use for processing. Default is 1.
"},{"location":"examples/generated/UserGuide/stats/#writing-stats-files","title":"writing stats files","text":"writesav(df, path) writesas(df, path) write_dta(df, path)
These functions write a DataFrame to SPSS (.sav or .por), SAS (.sas7bdat or .xpt), and Stata (.dta) files. The arguments are:
df
: The DataFrame to be written to a file. path
: The path where the file will be created. If a file at this path already exists, it will be overwritten.
This page was generated using Literate.jl.
"},{"location":"examples/generated/UserGuide/xl/","title":"Excel Files","text":"Reading and writing XLSX files are made possible by XLSX.jl
"},{"location":"examples/generated/UserGuide/xl/#read_xlsx","title":"read_xlsx
","text":"readxlsx(path; sheet=nothing, range=nothing, colnames=true, coltypes=nothing, missingstring=\"\", trimws=true, skip=0, nmax=Inf, guessmax=nothing)
This function reads data from an Excel file into a DataFrame. The arguments are:
path
: The path or URL to the Excel file to be read. sheet
: The sheet to be read. Can be a sheet name (string) or index (integer). Default is the first sheet. range
: A specific range of cells to be read from the sheet. Default is the entire sheet. col_names
: Whether the first row of the range contains column names. Default is true
. col_types
: Explicit specification of column types. Can be a single type, a list, or a dictionary mapping column names or indices to types. Default is nothing
(types are inferred). missingstring
: The string representing missing values. Default is \"\"
. trim_ws
: Whether to trim leading and trailing whitespace from cells. Default is true
. skip
: Number of rows to skip before reading data. Default is 0. n_max
: Maximum number of rows to read. Default is Inf
(read all rows). guess_max
: Maximum number of rows to scan for type guessing and column names detection. Default is nothing
(a default heuristic is used).
"},{"location":"examples/generated/UserGuide/xl/#write_xlsx","title":"write_xlsx
","text":"write_xlsx(x; path, overwrite=false)
This function writes a DataFrame, or multiple DataFrames, to an Excel file. The arguments are:
x
: The data to write. Can be a single Pair{String, DataFrame}
for writing one sheet, or a Tuple
of such pairs for writing multiple sheets. The String
in each pair specifies the sheet name, and the DataFrame
is the data to write to that sheet. path
: The path to the output Excel file. overwrite
: Whether to overwrite an existing file. Default is false
.
This page was generated using Literate.jl.
"}]}
\ No newline at end of file
diff --git a/previews/PR4/sitemap.xml.gz b/previews/PR4/sitemap.xml.gz
index ea1ade2..7e678cc 100644
Binary files a/previews/PR4/sitemap.xml.gz and b/previews/PR4/sitemap.xml.gz differ