miguelraz · storopoli · Apr 18, 2021
diff --git a/literate_notebooks/src-PT-BR/01_constructors.jl b/literate_notebooks/src-PT-BR/01_constructors.jl
@@ -0,0 +1,143 @@
+# # Introduction to DataFrames
+# **[Bogumił Kamiński](http://bogumilkaminski.pl/about/), May 23, 2018**
+# 
+# Let's get started by loading the `DataFrames` package.
+
+using DataFrames
+
+# ## Constructors and conversion
+
+#-
+
+# ### Constructors
+# 
+# In this section, you'll see many ways to create a `DataFrame` using the `DataFrame()` constructor.
+# 
+# First, we could create an empty DataFrame,
+
+DataFrame() # empty DataFrame
+
+# Or we could call the constructor using keyword arguments to add columns to the `DataFrame`.
+
+DataFrame(A=1:3, B=rand(3), C=randstring.([3,3,3]))
+
+# We can create a `DataFrame` from a dictionary, in which case keys from the dictionary will be sorted to create the `DataFrame` columns.
+
+x = Dict("A" => [1,2], "B" => [true, false], "C" => ['a', 'b'])
+DataFrame(x)
+
+# Rather than explicitly creating a dictionary first, as above, we could pass `DataFrame` arguments with the syntax of dictionary key-value pairs. 
+# 
+# Note that in this case, we use symbols to denote the column names, and the columns keep the order in which they are passed rather than being sorted. For example, `:A`, the symbol, produces `A`, the name of the first column here:
+
+DataFrame(:A => [1,2], :B => [true, false], :C => ['a', 'b'])
+
+# Here we create a `DataFrame` from a vector of vectors, and each vector becomes a column.
+
+DataFrame([rand(3) for i in 1:3])
+
+# For now, we can construct a `DataFrame` from a `Vector` of atoms, which creates a `DataFrame` with a single row. In future releases of DataFrames.jl, this will throw an error.
+
+DataFrame(rand(3))
+
+# Instead, use a transposed vector if you have a vector of atoms; this effectively passes a two-dimensional array to the constructor, which is supported.
+
+DataFrame(transpose([1, 2, 3]))
+
+# Pass a second argument to give the columns names.
+
+DataFrame([1:3, 4:6, 7:9], [:A, :B, :C])
+
+# Here we create a `DataFrame` from a matrix,
+
+DataFrame(rand(3,4))
+
+# and here we do the same but also pass column names.
+
+DataFrame(rand(3,4), Symbol.('a':'d'))
+
+# We can also construct an uninitialized DataFrame.
+# 
+# Here we pass column types, names, and the number of rows; we get `missing` in column `:C` because `Any >: Missing`.
+
+DataFrame([Int, Float64, Any], [:A, :B, :C], 1)
+
+# Here we create a `DataFrame`, but column `:C` is `#undef`, and Jupyter has a problem displaying it. (This works fine at the REPL.)
+# 
+# This will be fixed in the next release of DataFrames!
+
+DataFrame([Int, Float64, String], [:A, :B, :C], 1)
+
+# To initialize a `DataFrame` with column names but no rows, use
+
+DataFrame([Int, Float64, String], [:A, :B, :C], 0) 
+
+# This syntax gives us a quick way to create a homogeneous `DataFrame`.
+
+DataFrame(Int, 3, 5)
+
+# This example is similar, but has non-homogeneous columns.
+
+DataFrame([Int, Float64], 4)
+
+# Finally, we can create a `DataFrame` by copying an existing `DataFrame`.
+# 
+# Note that `copy` creates a shallow copy.
+
+y = DataFrame(x) # build a `DataFrame` from the dictionary `x` defined above
+z = copy(y)      # copy the existing `DataFrame`
+(y === z), isequal(y, z)
+
+# ### Conversion to a matrix
+# 
+# Let's start by creating a `DataFrame` with two rows and two columns.
+
+x = DataFrame(x=1:2, y=["A", "B"])
+
+# We can create a matrix by passing this `DataFrame` to `Matrix`.
+
+Matrix(x)
+
+# This would work even if the `DataFrame` had some `missing`s:
+
+x = DataFrame(x=1:2, y=[missing,"B"])
+
+#-
+
+Matrix(x)
+
+# In the two previous matrix examples, Julia created matrices with elements of type `Any`. We can see more clearly that the element type of the matrix is inferred when we pass, for example, a `DataFrame` of integers to `Matrix`, creating a 2D `Array` of `Int64`s:
+
+x = DataFrame(x=1:2, y=3:4)
+
+#-
+
+Matrix(x)
+
+# In this next example, Julia correctly identifies that `Union` is needed to express the type of the resulting `Matrix` (which contains `missing`s).
+
+x = DataFrame(x=1:2, y=[missing,4])
+
+#-
+
+Matrix(x)
+
+# Note that we can't force a conversion of `missing` values to `Int`s!
+
+Matrix{Int}(x)
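+
+# Because an uncaught error stops a script, one option is to wrap the call in `try`/`catch` and inspect the exception. This is only a sketch added for illustration: the exact exception type depends on the DataFrames version, so it is printed rather than assumed.
+
+try
+    Matrix{Int}(x)
+catch err
+    # report the exception type instead of letting the error propagate
+    println("Conversion failed with ", typeof(err))
+end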
+
+# ### Handling of duplicate column names
+# 
+# We can pass the `makeunique` keyword argument to allow passing duplicate names (they get deduplicated).
+
+df = DataFrame(:a=>1, :a=>2, :a_1=>3; makeunique=true)
+
+# Without `makeunique`, duplicate names will not be allowed in the future.
+
+df = DataFrame(:a=>1, :a=>2, :a_1=>3)
+
+# A constructor that is passed column names as keyword arguments is a corner case.
+# You cannot pass `makeunique` to allow duplicates here.
+
+df = DataFrame(a=1, a=2, makeunique=true)
+
diff --git a/literate_notebooks/src-PT-BR/02_basicinfo.jl b/literate_notebooks/src-PT-BR/02_basicinfo.jl
@@ -0,0 +1,76 @@
+# # Introduction to DataFrames
+# **[Bogumił Kamiński](http://bogumilkaminski.pl/about/), May 23, 2018**
+
+using DataFrames # load package
+
+# ## Getting basic information about a data frame
+# 
+# Let's start by creating a `DataFrame` object, `x`, so that we can learn how to get information on that data frame.
+
+x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])
+
+# The standard `size` function works to get dimensions of the `DataFrame`,
+
+size(x), size(x, 1), size(x, 2)
+
+# as do `nrow` and `ncol`, familiar from R; `length` gives the number of columns.
+
+nrow(x), ncol(x), length(x)
+
+# `describe` gives basic summary statistics of data in your `DataFrame`.
+
+describe(x)
+
+# Use `showcols` to get information about the columns stored in a `DataFrame`.
+
+showcols(x)
+
+# `names` will return the names of all columns,
+
+names(x)
+
+# and `eltypes` returns their types.
+
+eltypes(x)
+
+# Here we create a large `DataFrame`
+
+y = DataFrame(rand(1:10, 1000, 10));
+
+# and then we can use `head` to peek into its top rows
+
+head(y)
+
+# and `tail` to see its bottom rows.
+
+tail(y, 3)
+
+# ### Most elementary get and set operations
+# 
+# Given the `DataFrame`, `x`, here are three ways to grab one of its columns as a `Vector`:
+
+x[1], x[:A], x[:, 1]
+
+# To grab one row as a DataFrame, we can index as follows.
+
+x[1, :]
+
+# We can grab a single cell or element with the same syntax used to grab an element of an array.
+
+x[1, 1]
+
+# A range of cells can be set to a scalar,
+
+x[1:2, 1:2] = 1
+x
+
+# to a vector of length equal to the number of assigned rows,
+
+x[1:2, 1:2] = [1,2]
+x
+
+# or to another data frame of matching size.
+
+x[1:2, 1:2] = DataFrame([5 6; 7 8])
+x
+
diff --git a/literate_notebooks/src-PT-BR/03_missingvalues.jl b/literate_notebooks/src-PT-BR/03_missingvalues.jl
@@ -0,0 +1,112 @@
+# # Introduction to DataFrames
+# **[Bogumił Kamiński](http://bogumilkaminski.pl/about/), May 23, 2018**
+
+using DataFrames # load package
+
+# ## Handling missing values
+# 
+# A singleton type `Missings.Missing` allows us to deal with missing values.
+
+missing, typeof(missing)
+
+# Arrays automatically create an appropriate union type.
+
+x = [1, 2, missing, 3]
+
+# `ismissing` checks whether the passed value is missing.
+
+ismissing(1), ismissing(missing), ismissing(x), ismissing.(x)
+
+# We can extract the type combined with `Missing` from a `Union` via `Missings.T`.
+# 
+# (This is useful for arrays!)
+
+eltype(x), Missings.T(eltype(x))
+
+# `missing` comparisons produce `missing`.
+
+missing == missing, missing != missing, missing < missing
+
+# This is also true when `missing`s are compared with values of other types.
+
+1 == missing, 1 != missing, 1 < missing
+
+# `isequal`, `isless`, and `===` produce results of type `Bool`.
+
+isequal(missing, missing), missing === missing, isequal(1, missing), isless(1, missing)
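+
+# A small sketch of why this distinction matters in practice (an illustrative addition, not part of the original notebook): broadcasting `==` over an array containing `missing` yields `missing` entries, whereas broadcasting `isequal` yields only `Bool`s, and `count(ismissing, v)` counts the missing entries. The name `v` is hypothetical.
+
+v = [1, 2, missing, 3]
+v .== 2, isequal.(v, 2), count(ismissing, v)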
+
+# In the next few examples, we see that many (not all) functions handle `missing`.
+
+map(x -> x(missing), [sin, cos, zero, sqrt]) # part 1
+
+#-
+
+map(x -> x(missing, 1), [+, - , *, /, div]) # part 2 
+
+#-
+
+map(x -> x([1,2,missing]), [minimum, maximum, extrema, mean, any, float]) # part 3
+
+# `skipmissing` returns an iterator that skips missing values. We can use `collect` and `skipmissing` to create an array that excludes these missing values.
+
+collect(skipmissing([1, missing, 2, missing]))
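+
+# As a hedged aside (not part of the original notebook): because `skipmissing` returns a lazy iterator, a reduction such as `sum` can consume it directly without `collect`. In this sketch the first sum propagates `missing`, while the second sums only the non-missing values.
+
+sum([1, missing, 2, missing]), sum(skipmissing([1, missing, 2, missing]))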
+
+# Similarly, here we combine `collect` and `Missings.replace` to create an array that replaces all missing values with some value (`NaN` in this case).
+
+collect(Missings.replace([1.0, missing, 2.0, missing], NaN))
+
+# Another way to do this:
+
+coalesce.([1.0, missing, 2.0, missing], NaN)
+
+# Caution: `nothing` would also be replaced here (for Julia 0.7, a more sophisticated behavior of `coalesce` that avoids this problem is planned).
+
+coalesce.([1.0, missing, nothing, missing], NaN)
+
+# You can use `recode` if you have homogeneous output types.
+
+recode([1.0, missing, 2.0, missing], missing=>NaN)
+
+# You can use `unique` or `levels` to get unique values with or without missings, respectively.
+
+unique([1, missing, 2, missing]), levels([1, missing, 2, missing])
+
+# In this next example, we convert `x` to `y` with `allowmissing`, where `y` has a type that accepts missings.
+
+x = [1,2,3]
+y = allowmissing(x)
+
+# Then, we convert back with `disallowmissing`. This would fail if `y` contained missing values!
+
+z = disallowmissing(y)
+x,y,z
+
+# In this next example, we show that the type of each column in `x` is initially `Int64`. After using `allowmissing!` to accept missing values in columns 1 and 3, the types of those columns become `Union`s of `Int64` and `Missings.Missing`.
+
+x = DataFrame(Int, 2, 3)
+println("Before: ", eltypes(x))
+allowmissing!(x, 1) # make first column accept missings
+allowmissing!(x, :x3) # make :x3 column accept missings
+println("After: ", eltypes(x))
+
+# In this next example, we'll use `completecases` to find all the rows of a `DataFrame` that have complete data.
+
+x = DataFrame(A=[1, missing, 3, 4], B=["A", "B", missing, "C"])
+println(x)
+println("Complete cases:\n", completecases(x))
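+
+# Since `completecases` returns a Boolean vector, it can also be used directly as a row index. A minimal sketch (the name `complete_rows` is illustrative and not from the original notebook); it leaves `x` itself unchanged:
+
+complete_rows = x[completecases(x), :]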
+
+# We can use `dropmissing` or `dropmissing!` to remove the rows with incomplete data from a `DataFrame` and either create a new `DataFrame` or mutate the original in-place.
+
+y = dropmissing(x)
+dropmissing!(x)
+[x, y]
+
+# When we call `showcols` on a `DataFrame` with dropped missing values, the columns still allow missing values.
+
+showcols(x)
+
+# Since we've excluded missing values, we can safely use `disallowmissing!` so that the columns will no longer accept missing values.
+
+disallowmissing!(x)
+showcols(x)
+
diff --git a/literate_notebooks/src-PT-BR/04_loadsave.jl b/literate_notebooks/src-PT-BR/04_loadsave.jl
@@ -0,0 +1,64 @@
+# # Introduction to DataFrames
+# **[Bogumił Kamiński](http://bogumilkaminski.pl/about/), May 23, 2018**
+
+using DataFrames # load package
+
+# ## Load and save DataFrames
+# We do not cover all features of these packages. Please refer to their documentation to learn more.
+# 
+# Here we'll load `CSV` to read and write CSV files and `JLD`, which allows us to work with a Julia native binary format.
+
+using CSV
+using JLD
+
+# Let's create a simple `DataFrame` for testing purposes,
+
+x = DataFrame(A=[true, false, true], B=[1, 2, missing],
+              C=[missing, "b", "c"], D=['a', missing, 'c'])
+
+
+# and use `eltypes` to look at the columnwise types.
+
+eltypes(x)
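+
+# Before writing to disk, it can help to check whether the target file already exists. A minimal sketch using `isfile` from Base (the message text is illustrative, not from the original notebook):
+
+isfile("x.csv") && println("x.csv already exists and would be overwritten")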
+
+# Let's use `CSV` to save `x` to disk; make sure `x.csv` does not conflict with some file in your working directory.
+
+CSV.write("x.csv", x)
+
+# Now we can see how it was saved by reading `x.csv`.
+
+print(read("x.csv", String))
+
+# We can also load it back. `use_mmap=false` disables memory mapping so that on Windows the file can be deleted in the same session.
+
+y = CSV.read("x.csv", use_mmap=false)
+
+# When loading a `DataFrame` from a CSV file, all columns allow `Missing` by default. Note that the column types have changed!
+
+eltypes(y)
+
+# Now let's save `x` to a file in a binary format; make sure that `x.jld` does not exist in your working directory.
+
+save("x.jld", "x", x)
+
+# After loading `x.jld` as `y`, `y` is identical to `x`.
+
+y = load("x.jld", "x")
+
+# Note that the column types of `y` are the same as those of `x`!
+
+eltypes(y)
+
+# Next, we'll create the files `bigdf.csv` and `bigdf.jld`, so be careful that you don't already have these files on disk!
+# 
+# In particular, we'll time how long it takes us to write a `DataFrame` with 10^3 rows and 10^2 columns to `.csv` and `.jld` files. *You can expect JLD to be faster!* Use `compress=true` to reduce file sizes.
+
+bigdf = DataFrame(Bool, 10^3, 10^2)
+@time CSV.write("bigdf.csv", bigdf)
+@time save("bigdf.jld", "bigdf", bigdf)
+getfield.(stat.(["bigdf.csv", "bigdf.jld"]), :size)
+
+# Finally, let's clean up. Do not run the next cell unless you are sure that it will not erase your important files.
+
+foreach(rm, ["x.csv", "x.jld", "bigdf.csv", "bigdf.jld"])
+