Selecting rows
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0491718 | 0.691857 | 0.840384 | 0.198521 | 0.802561 |
| 2 | 0.119079 | 0.767518 | 0.89077 | 0.00819786 | 0.661425 |
| 3 | 0.393271 | 0.087253 | 0.138227 | 0.592041 | 0.347513 |
| 4 | 0.0240943 | 0.855718 | 0.347737 | 0.801055 | 0.778149 |
using : as row selector will copy columns
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0491718 | 0.691857 | 0.840384 | 0.198521 | 0.802561 |
| 2 | 0.119079 | 0.767518 | 0.89077 | 0.00819786 | 0.661425 |
| 3 | 0.393271 | 0.087253 | 0.138227 | 0.592041 | 0.347513 |
| 4 | 0.0240943 | 0.855718 | 0.347737 | 0.801055 | 0.778149 |
this is the same as
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0491718 | 0.691857 | 0.840384 | 0.198521 | 0.802561 |
| 2 | 0.119079 | 0.767518 | 0.89077 | 0.00819786 | 0.661425 |
| 3 | 0.393271 | 0.087253 | 0.138227 | 0.592041 | 0.347513 |
| 4 | 0.0240943 | 0.855718 | 0.347737 | 0.801055 | 0.778149 |
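The calls that produced the outputs above are not shown, so here is a minimal sketch of the copying behaviour on a small hand-built data frame. The names `df`, `a`, `b`, `df_copy1` and `df_copy2` are illustrative assumptions.

```julia
using DataFrames

# Illustrative data; names are assumptions made for this sketch
df = DataFrame(a = [1, 2, 3], b = [4.0, 5.0, 6.0])

df_copy1 = df[:, :]   # `:` as row selector copies the columns
df_copy2 = copy(df)   # equivalent: a new data frame with copied columns

# Mutating a copy leaves the original untouched
df_copy1.a[1] = 100
println(df.a[1])              # the original still holds 1
println(df_copy1.a === df.a)  # false: different underlying vectors
println(df[!, :a] === df.a)   # true: `!` returns the stored column without copying
```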
you can get a subset of rows of a data frame without copying by using view, which returns a SubDataFrame
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Float64 | Float64 | Float64 |
| 1 | 0.0491718 | 0.691857 | 0.840384 |
| 2 | 0.119079 | 0.767518 | 0.89077 |
| 3 | 0.393271 | 0.087253 | 0.138227 |
the view keeps a reference to its parent, together with the selected row and column indices
(4×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────────────
1 │ 0.0491718 0.691857 0.840384 0.198521 0.802561
2 │ 0.119079 0.767518 0.89077 0.00819786 0.661425
3 │ 0.393271 0.087253 0.138227 0.592041 0.347513
4 │ 0.0240943 0.855718 0.347737 0.801055 0.778149, (1:3, 1:3))
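As a hedged illustration of what such a view gives you, the sketch below builds a `SubDataFrame` on a small data frame and checks that it shares memory with its parent. The data and the names `df` and `sdf` are assumptions for this example.

```julia
using DataFrames

df = DataFrame(a = 1:4, b = 11:14, c = 21:24)   # illustrative data
sdf = view(df, 1:3, 1:2)                        # SubDataFrame: rows 1-3, columns 1-2

# The view shares memory with its parent
df[1, :a] = 100
println(sdf[1, :a])            # reflects the change made through df

println(parent(sdf) === df)    # the parent is the original data frame
println(parentindices(sdf))    # row and column indices into the parent
```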
selecting a single row returns a DataFrameRow object which is also a view
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 3 | 0.393271 | 0.087253 | 0.138227 | 0.592041 | 0.347513 |
(4×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────────────
1 │ 0.0491718 0.691857 0.840384 0.198521 0.802561
2 │ 0.119079 0.767518 0.89077 0.00819786 0.661425
3 │ 0.393271 0.087253 0.138227 0.592041 0.347513
4 │ 0.0240943 0.855718 0.347737 0.801055 0.778149, (3, Base.OneTo(5)), 3)
let us add a column to a data frame by broadcasting the assignment of a scalar
4-element Vector{Int64}:
1
1
1
1
| Row | x1 | x2 | x3 | x4 | x5 | Z |
|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 |
| 1 | 0.0491718 | 0.691857 | 0.840384 | 0.198521 | 0.802561 | 1 |
| 2 | 0.119079 | 0.767518 | 0.89077 | 0.00819786 | 0.661425 | 1 |
| 3 | 0.393271 | 0.087253 | 0.138227 | 0.592041 | 0.347513 | 1 |
| 4 | 0.0240943 | 0.855718 | 0.347737 | 0.801055 | 0.778149 | 1 |
Earlier we used : for column selection in a view (SubDataFrame and DataFrameRow). In this case a view will have all columns of the parent after the parent is mutated.
| Row | x1 | x2 | x3 | x4 | x5 | Z |
|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 |
| 3 | 0.393271 | 0.087253 | 0.138227 | 0.592041 | 0.347513 | 1 |
(4×6 DataFrame
Row │ x1 x2 x3 x4 x5 Z
│ Float64 Float64 Float64 Float64 Float64 Int64
─────┼────────────────────────────────────────────────────────────
1 │ 0.0491718 0.691857 0.840384 0.198521 0.802561 1
2 │ 0.119079 0.767518 0.89077 0.00819786 0.661425 1
3 │ 0.393271 0.087253 0.138227 0.592041 0.347513 1
4 │ 0.0240943 0.855718 0.347737 0.801055 0.778149 1, (3, Base.OneTo(6)), 3)
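A small sketch of this behaviour, using made-up data and the illustrative names `df` and `dfr`: a row view created with `:` as the column selector picks up a column that is added to the parent afterwards.

```julia
using DataFrames

df = DataFrame(a = 1:4, b = 11:14)   # illustrative data
dfr = df[3, :]                       # DataFrameRow: a view of row 3, all columns

df[!, :Z] .= 1                       # add a column by broadcasting a scalar

println(names(dfr))                  # the row view now also contains column Z
println(rownumber(dfr))              # still row 3 of df
```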
Note that parent and parentindices refer to the true source of data for a DataFrameRow, while rownumber refers to the row number in the object that was directly used to create the DataFrameRow.
(4×1 DataFrame
Row │ a
│ Int64
─────┼───────
1 │ 1
2 │ 2
3 │ 3
4 │ 4, (3, Base.OneTo(1)), 1)
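One way to obtain an output of this shape is to take a row from a view that reorders the rows of its parent. The sketch below assumes the intermediate view `dfv` with rows 3 and 2; that choice is an assumption consistent with the printed result, not something the text states.

```julia
using DataFrames

df = DataFrame(a = 1:4)
dfv = view(df, [3, 2], :)   # a view holding rows 3 and 2 of df (assumed for illustration)
dfr = dfv[1, :]             # first row of the view, i.e. row 3 of df

println(parent(dfr) === df)      # true: the parent is the original data frame
println(parentindices(dfr)[1])   # 3: row index in the parent
println(rownumber(dfr))          # 1: row index in dfv
```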
Reordering rows
We create a data frame with random data and hope that x.x is not sorted, which is quite likely with 12 rows.
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 1 | 0.830334 | 0.0 |
| 2 | 2 | 0.573132 | 0.0 |
| 3 | 3 | 0.176625 | 0.0 |
| 4 | 4 | 0.114935 | 0.0 |
| 5 | 5 | 0.7864 | 0.0 |
| 6 | 6 | 0.892598 | 0.0 |
| 7 | 7 | 0.452015 | 1.0 |
| 8 | 8 | 0.206873 | 1.0 |
| 9 | 9 | 0.286582 | 1.0 |
| 10 | 10 | 0.918916 | 1.0 |
| 11 | 11 | 0.991071 | 1.0 |
| 12 | 12 | 0.796831 | 1.0 |
check if a DataFrame or a subset of its columns is sorted
we sort x in place
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 4 | 0.114935 | 0.0 |
| 2 | 3 | 0.176625 | 0.0 |
| 3 | 8 | 0.206873 | 1.0 |
| 4 | 9 | 0.286582 | 1.0 |
| 5 | 7 | 0.452015 | 1.0 |
| 6 | 2 | 0.573132 | 0.0 |
| 7 | 5 | 0.7864 | 0.0 |
| 8 | 12 | 0.796831 | 1.0 |
| 9 | 1 | 0.830334 | 0.0 |
| 10 | 6 | 0.892598 | 0.0 |
| 11 | 10 | 0.918916 | 1.0 |
| 12 | 11 | 0.991071 | 1.0 |
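The sortedness check and the in-place sort can be sketched on a tiny data frame with fixed values (the data below is made up so that the result is predictable):

```julia
using DataFrames

x = DataFrame(id = 1:4, x = [0.8, 0.2, 0.5, 0.1])  # illustrative values

println(issorted(x))        # true: rows are ordered when compared on all columns (id increases)
println(issorted(x, :x))    # false: column x alone is not sorted

sort!(x, :x)                # reorders the rows of x in place
println(x.id)               # [4, 2, 3, 1]
```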
now we create a new DataFrame, sorted by id
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 1 | 0.830334 | 0.0 |
| 2 | 2 | 0.573132 | 0.0 |
| 3 | 3 | 0.176625 | 0.0 |
| 4 | 4 | 0.114935 | 0.0 |
| 5 | 5 | 0.7864 | 0.0 |
| 6 | 6 | 0.892598 | 0.0 |
| 7 | 7 | 0.452015 | 1.0 |
| 8 | 8 | 0.206873 | 1.0 |
| 9 | 9 | 0.286582 | 1.0 |
| 10 | 10 | 0.918916 | 1.0 |
| 11 | 11 | 0.991071 | 1.0 |
| 12 | 12 | 0.796831 | 1.0 |
here we sort by two columns: the first (y) in decreasing order, the second (x) in increasing order
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 8 | 0.206873 | 1.0 |
| 2 | 9 | 0.286582 | 1.0 |
| 3 | 7 | 0.452015 | 1.0 |
| 4 | 12 | 0.796831 | 1.0 |
| 5 | 10 | 0.918916 | 1.0 |
| 6 | 11 | 0.991071 | 1.0 |
| 7 | 4 | 0.114935 | 0.0 |
| 8 | 3 | 0.176625 | 0.0 |
| 9 | 2 | 0.573132 | 0.0 |
| 10 | 5 | 0.7864 | 0.0 |
| 11 | 1 | 0.830334 | 0.0 |
| 12 | 6 | 0.892598 | 0.0 |
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 8 | 0.206873 | 1.0 |
| 2 | 9 | 0.286582 | 1.0 |
| 3 | 7 | 0.452015 | 1.0 |
| 4 | 12 | 0.796831 | 1.0 |
| 5 | 10 | 0.918916 | 1.0 |
| 6 | 11 | 0.991071 | 1.0 |
| 7 | 4 | 0.114935 | 0.0 |
| 8 | 3 | 0.176625 | 0.0 |
| 9 | 2 | 0.573132 | 0.0 |
| 10 | 5 | 0.7864 | 0.0 |
| 11 | 1 | 0.830334 | 0.0 |
| 12 | 6 | 0.892598 | 0.0 |
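Two equivalent ways of expressing such a mixed-direction sort are sketched below on made-up data: the `rev` keyword with one flag per column, and `order` applied to an individual column.

```julia
using DataFrames

x = DataFrame(id = 1:4, x = [0.8, 0.2, 0.5, 0.1], y = [0, 0, 1, 1])  # illustrative data

s1 = sort(x, [:y, :x], rev = [true, false])   # y decreasing, then x increasing
s2 = sort(x, [order(:y, rev = true), :x])     # the same ordering expressed with `order`

println(s1.id)      # [4, 3, 2, 1]
println(s1 == s2)   # true
```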
this is how you can shuffle rows
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 8 | 0.206873 | 1.0 |
| 2 | 12 | 0.796831 | 1.0 |
| 3 | 2 | 0.573132 | 0.0 |
| 4 | 1 | 0.830334 | 0.0 |
| 5 | 5 | 0.7864 | 0.0 |
| 6 | 9 | 0.286582 | 1.0 |
| 7 | 6 | 0.892598 | 0.0 |
| 8 | 4 | 0.114935 | 0.0 |
| 9 | 3 | 0.176625 | 0.0 |
| 10 | 7 | 0.452015 | 1.0 |
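A minimal sketch of row shuffling, assuming `shuffle` from the `Random` standard library and illustrative data. The output above contains 10 of the 12 rows, so the shuffled index there presumably covered only ten rows; the sketch shuffles all of them.

```julia
using DataFrames
using Random

x = DataFrame(id = 1:5, val = 'a':'e')   # illustrative data

# Index the rows with a random permutation of the row numbers
shuffled = x[shuffle(1:nrow(x)), :]

println(sort(shuffled.id) == x.id)   # true: the same rows, possibly in another order
```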
it is also easy to swap rows using broadcasted assignment
| Row | id | x | y |
|---|---|---|---|
| | Int64 | Float64 | Float64 |
| 1 | 10 | 0.918916 | 1.0 |
| 2 | 2 | 0.573132 | 0.0 |
| 3 | 3 | 0.176625 | 0.0 |
| 4 | 4 | 0.114935 | 0.0 |
| 5 | 5 | 0.7864 | 0.0 |
| 6 | 6 | 0.892598 | 0.0 |
| 7 | 7 | 0.452015 | 1.0 |
| 8 | 8 | 0.206873 | 1.0 |
| 9 | 9 | 0.286582 | 1.0 |
| 10 | 1 | 0.830334 | 0.0 |
| 11 | 11 | 0.991071 | 1.0 |
| 12 | 12 | 0.796831 | 1.0 |
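The swap can be sketched as follows on a small, made-up data frame. The right-hand side `x[[4, 1], :]` is a freshly allocated data frame, so the assignment does not overwrite values it still needs to read.

```julia
using DataFrames

x = DataFrame(id = 1:4, val = ['a', 'b', 'c', 'd'])   # illustrative data

# Rows 1 and 4 receive the contents of rows 4 and 1, respectively
x[[1, 4], :] .= x[[4, 1], :]

println(x.id)    # [4, 2, 3, 1]
println(x.val)   # ['d', 'b', 'c', 'a']
```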
Merging/adding rows
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
concatenate by rows; the data frames must have the same column names; this is the same as vcat
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
you can efficiently vcat a vector of DataFrames using reduce
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
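Both forms of vertical concatenation can be sketched like this (the data and the names `v1` and `v2` are illustrative):

```julia
using DataFrames

x = DataFrame(a = [1, 2], b = [3.0, 4.0])   # illustrative data

v1 = [x; x]                    # the same as vcat(x, x)
v2 = reduce(vcat, [x, x, x])   # concatenates a whole vector of data frames

println(nrow(v1), " ", nrow(v2))   # 4 6
```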
create y with the columns of x in reverse order
| Row | x5 | x4 | x3 | x2 | x1 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.306016 | 0.140855 | 0.338402 | 0.218366 | 0.0294498 |
| 2 | 0.843511 | 0.4 | 0.0526195 | 0.52931 | 0.271436 |
| 3 | 0.896884 | 0.321968 | 0.188894 | 0.38624 | 0.32389 |
vcat is still possible as it does column name matching
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
but column names must still match
ArgumentError("column(s) x1 and x2 are missing from argument(s) 2")
unless you pass :intersect, :union or specific column names as keyword argument cols
| Row | x3 | x4 | x5 |
|---|---|---|---|
| | Float64 | Float64 | Float64 |
| 1 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.188894 | 0.321968 | 0.896884 |
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64? | Float64? | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | missing | missing | 0.338402 | 0.140855 | 0.306016 |
| 5 | missing | missing | 0.0526195 | 0.4 | 0.843511 |
| 6 | missing | missing | 0.188894 | 0.321968 | 0.896884 |
| Row | x1 | x5 |
|---|---|---|
| | Float64? | Float64 |
| 1 | 0.0294498 | 0.306016 |
| 2 | 0.271436 | 0.843511 |
| 3 | 0.32389 | 0.896884 |
| 4 | missing | 0.306016 |
| 5 | missing | 0.843511 |
| 6 | missing | 0.896884 |
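The three outputs above correspond to the three kinds of `cols` value. A minimal sketch on made-up integer data (the names `vi`, `vu` and `vs` are illustrative):

```julia
using DataFrames

x = DataFrame(x1 = [1, 2], x2 = [3, 4], x3 = [5, 6])   # illustrative data
y = x[:, [:x3, :x2]]                                   # a subset of the columns, in another order

vi = vcat(x, y, cols = :intersect)   # keeps only the columns present in both: x2 and x3
vu = vcat(x, y, cols = :union)       # keeps all columns and fills the gaps with missing
vs = vcat(x, y, cols = [:x1, :x3])   # keeps exactly the listed columns

println(names(vi))
println(vu.x1)
println(names(vs))
```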
append! modifies x in place
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
here the set of column names must match (their order may differ) unless the cols keyword argument is passed
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
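A hedged sketch of in-place appending with name matching, using two small made-up data frames whose columns are in a different order:

```julia
using DataFrames

x = DataFrame(a = [1, 2], b = [3, 4])   # illustrative data
y = DataFrame(b = [30], a = [10])       # same column names, different order

append!(x, y)   # modifies x in place; columns are matched by name

println(x.a)    # [1, 2, 10]
println(x.b)    # [3, 4, 30]
```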
the standard repeat function works on rows; the inner and outer keyword arguments are also accepted
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 10 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 11 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 12 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 13 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 14 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 15 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 16 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 17 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 18 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
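The difference between repeating the whole data frame and repeating each row can be sketched on a two-row example (the data and names are illustrative):

```julia
using DataFrames

x = DataFrame(a = [1, 2])   # illustrative data

whole = repeat(x, 2)                        # the whole data frame twice
each  = repeat(x, inner = 2, outer = 1)     # each row twice, one after another

println(whole.a)   # [1, 2, 1, 2]
println(each.a)    # [1, 1, 2, 2]
```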
push! adds one row at the end of x; one must pass the correct number of values unless the cols keyword argument is passed
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 10 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
push! also works with dictionaries
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 10 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| 11 | 11.0 | 12.0 | 13.0 | 14.0 | 15.0 |
and NamedTuples via name matching
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 10 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| 11 | 11.0 | 12.0 | 13.0 | 14.0 | 15.0 |
| 12 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
and DataFrameRow also via name matching
| Row | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 2 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 3 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 4 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 5 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 6 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 7 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
| 8 | 0.271436 | 0.52931 | 0.0526195 | 0.4 | 0.843511 |
| 9 | 0.32389 | 0.38624 | 0.188894 | 0.321968 | 0.896884 |
| 10 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| 11 | 11.0 | 12.0 | 13.0 | 14.0 | 15.0 |
| 12 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| 13 | 0.0294498 | 0.218366 | 0.338402 | 0.140855 | 0.306016 |
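The four kinds of row that `push!` accepts in the steps above can be sketched together on a small two-column data frame (the data is made up for illustration):

```julia
using DataFrames

x = DataFrame(a = [1.0], b = [2.0])   # illustrative data

push!(x, [3, 4])                    # a vector: matched by position, converted to Float64
push!(x, Dict(:a => 5, :b => 6))    # a dictionary: matched by name
push!(x, (b = 8, a = 7))            # a NamedTuple: matched by name, order does not matter
push!(x, x[1, :])                   # a DataFrameRow: matched by name

println(x.a)   # [1.0, 3.0, 5.0, 7.0, 1.0]
println(x.b)   # [2.0, 4.0, 6.0, 8.0, 2.0]
```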
Please consult the documentation of push!, append! and vcat for the allowed values of the cols keyword argument.
This keyword argument governs how these functions match the columns of the passed arguments. In addition, append! and push! support a promote keyword argument that decides whether column type promotion is allowed.
Here is a quick example of how heterogeneous data can be stored in a data frame using this functionality:
3-element Vector{NamedTuple}:
(a = 1, b = 2)
(a = missing, b = 10, c = 20)
(b = "s", c = 1, d = 1)
| Row | a | b | c | d |
|---|---|---|---|---|
| | Int64? | Any | Int64? | Int64? |
| 1 | 1 | 2 | missing | missing |
| 2 | missing | 10 | 20 | missing |
| 3 | missing | s | 1 | 1 |
and we see that push! dynamically added columns as needed and updated their element types
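A sketch of a loop that would build such a data frame from the vector of NamedTuples printed above. The variable names `source` and `df` are assumptions; with `cols = :union`, promotion of column types is enabled by default.

```julia
using DataFrames

source = [(a = 1, b = 2), (a = missing, b = 10, c = 20), (b = "s", c = 1, d = 1)]

df = DataFrame()
for row in source
    # :union adds columns that are new and fills absent values with missing
    push!(df, row, cols = :union)
end

println(df)
```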
Subsetting/removing rows
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | b |
| 3 | 3 | c |
| 4 | 4 | d |
| 5 | 5 | e |
| 6 | 6 | f |
| 7 | 7 | g |
| 8 | 8 | h |
| 9 | 9 | i |
| 10 | 10 | j |
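rows can be selected by indexing
selecting a single row with an integer index creates a DataFrameRow
while selecting with a range of rows, even one of length 1, creates a DataFrame
view creates a view (a SubDataFrame) instead of a copy
a view can also restrict the columns; the output below selects columns 1 and 2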
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | b |
| 3 | 3 | c |
| 4 | 4 | d |
| 5 | 5 | e |
| 6 | 6 | f |
| 7 | 7 | g |
| 8 | 8 | h |
| 9 | 9 | i |
| 10 | 10 | j |
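The indexing variants listed above can be sketched as follows. The data frame matches the one shown at the start of this section; the names `r`, `d`, `v1` and `v2` are illustrative.

```julia
using DataFrames

x = DataFrame(id = 1:10, val = 'a':'j')

r  = x[1, :]            # DataFrameRow: a view of a single row
d  = x[1:1, :]          # DataFrame with one row (a copy)
v1 = view(x, 1:2, :)    # SubDataFrame: rows 1-2, all columns
v2 = view(x, :, 1:2)    # SubDataFrame: all rows, columns 1 and 2

println(r isa DataFrameRow, " ", d isa DataFrame, " ", v1 isa SubDataFrame)
println(size(v2))
```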
indexing by a Bool array, exact length match is required
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 3 | c |
| 3 | 5 | e |
| 4 | 7 | g |
| 5 | 9 | i |
alternatively we can also create a view
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 3 | c |
| 3 | 5 | e |
| 4 | 7 | g |
| 5 | 9 | i |
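A minimal sketch of Boolean row selection, both as a copy and as a view, on a shorter made-up data frame. The mask must have exactly as many elements as the data frame has rows.

```julia
using DataFrames

x = DataFrame(id = 1:6, val = 'a':'f')   # illustrative data
mask = repeat([true, false], 3)          # one Bool per row: keep every other row

picked = x[mask, :]         # a new DataFrame
vpicked = view(x, mask, :)  # a SubDataFrame sharing memory with x

println(picked.id)   # [1, 3, 5]
```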
we can delete one row in place
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | b |
| 3 | 3 | c |
| 4 | 4 | d |
| 5 | 5 | e |
| 6 | 6 | f |
| 7 | 8 | h |
| 8 | 9 | i |
| 9 | 10 | j |
or a collection of rows, also in place
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | b |
| 3 | 3 | c |
| 4 | 4 | d |
| 5 | 5 | e |
| 6 | 9 | i |
| 7 | 10 | j |
you can also create a new DataFrame when deleting rows using Not indexing; x itself is left unchanged, as the second output shows
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 3 | c |
| 2 | 4 | d |
| 3 | 5 | e |
| 4 | 9 | i |
| 5 | 10 | j |
| Row | id | val |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | b |
| 3 | 3 | c |
| 4 | 4 | d |
| 5 | 5 | e |
| 6 | 9 | i |
| 7 | 10 | j |
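A sequence of calls consistent with the three outputs above is sketched below. It assumes a DataFrames.jl release in which `deleteat!` is defined for data frames (older releases used `delete!`); the name `y` is illustrative.

```julia
using DataFrames

x = DataFrame(id = 1:10, val = 'a':'j')

deleteat!(x, 7)      # removes row 7 in place
deleteat!(x, 6:7)    # removes the current rows 6 and 7 in place

y = x[Not(1:2), :]   # a new data frame without the first two rows; x is unchanged

println(x.id)   # [1, 2, 3, 4, 5, 9, 10]
println(y.id)   # [3, 4, 5, 9, 10]
```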
now we move to row filtering
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 4 |
| 3 | 3 | 4 | 5 |
| 4 | 4 | 5 | 6 |
create a new DataFrame where filtering function operates on DataFrameRow
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 3 | 4 | 5 |
| 2 | 4 | 5 | 6 |
the same but as a view
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 3 | 4 | 5 |
| 2 | 4 | 5 | 6 |
or
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 3 | 4 | 5 |
| 2 | 4 | 5 | 6 |
in place modification of x, using the do-block syntax for a more complex transformation
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 1 | 2 | 3 |
| 2 | 3 | 4 | 5 |
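The filtering variants can be sketched as below. The predicates are assumptions chosen to reproduce the outputs shown above, and the in-place step is run on a copy `z` so that the view `f2` stays valid.

```julia
using DataFrames

x = DataFrame(x1 = 1:4, x2 = 2:5, x3 = 3:6)

f1 = filter(r -> r.x1 > 2.5, x)                # predicate receives a DataFrameRow
f2 = filter(r -> r.x1 > 2.5, x, view = true)   # the same rows, returned as a view
f3 = filter(:x1 => >(2.5), x)                  # predicate receives values of column x1

# In-place filtering with a do-block for a more complex condition
z = copy(x)
filter!(z) do r
    if r.x1 > 2.5
        return r.x2 < 4.5
    end
    r.x3 < 3.5
end

println(f1.x1)   # [3, 4]
println(z.x1)    # [1, 3]
```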
A common operation is selection of rows for which a value in a column is contained in a given set. Here are a few ways in which you can achieve this.
| Row | x | y |
|---|---|---|
| | Int64 | Int64 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 1 |
| 6 | 6 | 2 |
| 7 | 7 | 3 |
| 8 | 8 | 4 |
| 9 | 9 | 1 |
| 10 | 10 | 2 |
| 11 | 11 | 3 |
| 12 | 12 | 4 |
We select rows for which column y has value 1 or 4.
| Row | x | y |
|---|---|---|
| | Int64 | Int64 |
| 1 | 1 | 1 |
| 2 | 4 | 4 |
| 3 | 5 | 1 |
| 4 | 8 | 4 |
| 5 | 9 | 1 |
| 6 | 12 | 4 |
| Row | x | y |
|---|---|---|
| | Int64 | Int64 |
| 1 | 1 | 1 |
| 2 | 4 | 4 |
| 3 | 5 | 1 |
| 4 | 8 | 4 |
| 5 | 9 | 1 |
| 6 | 12 | 4 |
| Row | x | y |
|---|---|---|
| | Int64 | Int64 |
| 1 | 1 | 1 |
| 2 | 4 | 4 |
| 3 | 5 | 1 |
| 4 | 8 | 4 |
| 5 | 9 | 1 |
| 6 | 12 | 4 |
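Three ways of writing this set-membership selection, all giving the same rows, are sketched below. The construction of `df` with `mod1` is an assumption that reproduces the data frame shown above.

```julia
using DataFrames

df = DataFrame(x = 1:12, y = mod1.(1:12, 4))

a = filter(row -> row.y in [1, 4], df)   # row-wise predicate on a DataFrameRow
b = filter(:y => in([1, 4]), df)         # predicate applied to values of column y
c = df[in.(df.y, Ref([1, 4])), :]        # Boolean mask; Ref keeps the set from being broadcast over

println(a.x)   # [1, 4, 5, 8, 9, 12]
```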
DataFrames.jl also provides a subset function that works on whole columns and allows for multiple conditions:
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 4 |
| 3 | 3 | 4 | 5 |
| 4 | 4 | 5 | 6 |
| Row | x1 | x2 | x3 |
|---|---|---|---|
| | Int64 | Int64 | Int64 |
| 1 | 1 | 2 | 3 |
Similarly an in-place subset! function is provided.
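A hedged sketch of both functions: the conditions are assumptions chosen to match the one-row result shown above, and `ByRow` turns a scalar predicate into one that works on a whole column.

```julia
using DataFrames

x = DataFrame(x1 = 1:4, x2 = 2:5, x3 = 3:6)

# A row is kept only if all conditions hold for it
s = subset(x, :x1 => ByRow(<(2.5)), :x2 => ByRow(<(2.5)))
println(s)

# The in-place variant modifies x itself
subset!(x, :x1 => ByRow(<(2.5)))
println(x.x1)   # [1, 2]
```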