{ "cells": [ { "cell_type": "markdown", "source": [ "# Basic information about a data frame" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "using DataFrames" ], "metadata": {}, "execution_count": 1 }, { "cell_type": "markdown", "source": [ "Let's start by creating a `DataFrame` object, `x`, so that we can learn how to get information on that data frame." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼──────────────────────────\n 1 │ 1 1.0 a\n 2 │ 2 \u001b[90m missing \u001b[0m b", "text/html": [ "
<div>2×3 DataFrame</div>
<table>
<thead>
<tr><th>Row</th><th>A</th><th>B</th><th>C</th></tr>
<tr><th></th><th>Int64</th><th>Float64?</th><th>String</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>1</td><td>1.0</td><td>a</td></tr>
<tr><td>2</td><td>2</td><td>missing</td><td>b</td></tr>
</tbody>
</table>
" ] }, "metadata": {}, "execution_count": 2 } ], "cell_type": "code", "source": [ "x = DataFrame(A=[1, 2], B=[1.0, missing], C=[\"a\", \"b\"])" ], "metadata": {}, "execution_count": 2 }, { "cell_type": "markdown", "source": [ "The standard `size` function returns the dimensions of the `DataFrame`," ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "((2, 3), 2, 3)" }, "metadata": {}, "execution_count": 3 } ], "cell_type": "code", "source": [ "size(x), size(x, 1), size(x, 2)" ], "metadata": {}, "execution_count": 3 }, { "cell_type": "markdown", "source": [ "as do `nrow` and `ncol`, familiar from R." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "(2, 3)" }, "metadata": {}, "execution_count": 4 } ], "cell_type": "code", "source": [ "nrow(x), ncol(x)" ], "metadata": {}, "execution_count": 4 }, { "cell_type": "markdown", "source": [ "`describe` gives basic summary statistics of the data in your `DataFrame` (see the help for `describe` to learn how to customize the statistics shown)." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m3×7 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m variable \u001b[0m\u001b[1m mean \u001b[0m\u001b[1m min \u001b[0m\u001b[1m median \u001b[0m\u001b[1m max \u001b[0m\u001b[1m nmissing \u001b[0m\u001b[1m eltype \u001b[0m\n │\u001b[90m Symbol \u001b[0m\u001b[90m Union… \u001b[0m\u001b[90m Any \u001b[0m\u001b[90m Union… \u001b[0m\u001b[90m Any \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Type \u001b[0m\n─────┼───────────────────────────────────────────────────────────────────────\n 1 │ A 1.5 1 1.5 2 0 Int64\n 2 │ B 1.0 1.0 1.0 1.0 1 Union{Missing, Float64}\n 3 │ C \u001b[90m \u001b[0m a \u001b[90m \u001b[0m b 0 String", "text/html": [ "
<div>3×7 DataFrame</div>
<table>
<thead>
<tr><th>Row</th><th>variable</th><th>mean</th><th>min</th><th>median</th><th>max</th><th>nmissing</th><th>eltype</th></tr>
<tr><th></th><th>Symbol</th><th>Union…</th><th>Any</th><th>Union…</th><th>Any</th><th>Int64</th><th>Type</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>A</td><td>1.5</td><td>1</td><td>1.5</td><td>2</td><td>0</td><td>Int64</td></tr>
<tr><td>2</td><td>B</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1</td><td>Union{Missing, Float64}</td></tr>
<tr><td>3</td><td>C</td><td></td><td>a</td><td></td><td>b</td><td>0</td><td>String</td></tr>
</tbody>
</table>
" ] }, "metadata": {}, "execution_count": 5 } ], "cell_type": "code", "source": [ "describe(x)" ], "metadata": {}, "execution_count": 5 }, { "cell_type": "markdown", "source": [ "You can limit the columns shown by `describe` using the `cols` keyword argument:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×7 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m variable \u001b[0m\u001b[1m mean \u001b[0m\u001b[1m min \u001b[0m\u001b[1m median \u001b[0m\u001b[1m max \u001b[0m\u001b[1m nmissing \u001b[0m\u001b[1m eltype \u001b[0m ⋯\n │\u001b[90m Symbol \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Real \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Real \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Type \u001b[0m ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ A 1.5 1 1.5 2 0 Int64 ⋯\n 2 │ B 1.0 1.0 1.0 1.0 1 Union{Missing, Float6\n\u001b[36m 1 column omitted\u001b[0m", "text/html": [ "
<div>2×7 DataFrame</div>
<table>
<thead>
<tr><th>Row</th><th>variable</th><th>mean</th><th>min</th><th>median</th><th>max</th><th>nmissing</th><th>eltype</th></tr>
<tr><th></th><th>Symbol</th><th>Float64</th><th>Real</th><th>Float64</th><th>Real</th><th>Int64</th><th>Type</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>A</td><td>1.5</td><td>1</td><td>1.5</td><td>2</td><td>0</td><td>Int64</td></tr>
<tr><td>2</td><td>B</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1</td><td>Union{Missing, Float64}</td></tr>
</tbody>
</table>
" ] }, "metadata": {}, "execution_count": 6 } ], "cell_type": "code", "source": [ "describe(x, cols=1:2)" ], "metadata": {}, "execution_count": 6 }, { "cell_type": "markdown", "source": [ "`names` returns the names of all columns as strings:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "3-element Vector{String}:\n \"A\"\n \"B\"\n \"C\"" }, "metadata": {}, "execution_count": 7 } ], "cell_type": "code", "source": [ "names(x)" ], "metadata": {}, "execution_count": 7 }, { "cell_type": "markdown", "source": [ "You can also get the names of columns with a given element type (`eltype`):" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "1-element Vector{String}:\n \"C\"" }, "metadata": {}, "execution_count": 8 } ], "cell_type": "code", "source": [ "names(x, String)" ], "metadata": {}, "execution_count": 8 }, { "cell_type": "markdown", "source": [ "Use `propertynames` to get a vector of `Symbol`s:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "3-element Vector{Symbol}:\n :A\n :B\n :C" }, "metadata": {}, "execution_count": 9 } ], "cell_type": "code", "source": [ "propertynames(x)" ], "metadata": {}, "execution_count": 9 }, { "cell_type": "markdown", "source": [ "Broadcasting `eltype` over `eachcol(x)` returns the element types of the columns:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "3-element Vector{Type}:\n Int64\n Union{Missing, Float64}\n String" }, "metadata": {}, "execution_count": 10 } ], "cell_type": "code", "source": [ "eltype.(eachcol(x))" ], "metadata": {}, "execution_count": 10 }, { "cell_type": "markdown", "source": [ "Here we create a large `DataFrame`" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m1000×10 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 
\u001b[0m\u001b[1m x5 \u001b[0m\u001b[1m x6 \u001b[0m\u001b[1m x7 \u001b[0m\u001b[1m x8 \u001b[0m\u001b[1m x9 \u001b[0m\u001b[1m x10 \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n──────┼──────────────────────────────────────────────────────────────────────\n 1 │ 1 3 6 5 3 3 1 8 8 3\n 2 │ 10 8 1 7 8 3 10 10 7 9\n 3 │ 9 9 9 4 2 8 3 4 10 8\n 4 │ 6 4 6 7 8 2 3 5 8 1\n 5 │ 3 1 8 5 4 6 4 8 5 1\n 6 │ 7 2 1 7 3 8 6 8 2 4\n 7 │ 2 9 9 4 6 7 8 7 2 1\n 8 │ 3 3 5 10 7 7 8 2 2 7\n ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮\n 994 │ 2 9 2 9 3 2 8 3 1 1\n 995 │ 2 8 4 3 10 10 3 7 5 2\n 996 │ 7 3 1 7 5 5 2 2 3 2\n 997 │ 7 4 7 2 2 6 8 4 6 8\n 998 │ 4 8 1 1 3 3 2 3 8 8\n 999 │ 1 9 4 4 10 6 4 3 8 2\n 1000 │ 7 1 4 2 1 8 1 1 7 3\n\u001b[36m 985 rows omitted\u001b[0m", "text/html": [ "
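<div>1000×10 DataFrame (975 rows omitted)</div>
<table>
<thead>
<tr><th>Row</th><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>x5</th><th>x6</th><th>x7</th><th>x8</th><th>x9</th><th>x10</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>1</td><td>3</td><td>6</td><td>5</td><td>3</td><td>3</td><td>1</td><td>8</td><td>8</td><td>3</td></tr>
<tr><td>2</td><td>10</td><td>8</td><td>1</td><td>7</td><td>8</td><td>3</td><td>10</td><td>10</td><td>7</td><td>9</td></tr>
<tr><td>3</td><td>9</td><td>9</td><td>9</td><td>4</td><td>2</td><td>8</td><td>3</td><td>4</td><td>10</td><td>8</td></tr>
<tr><td>4</td><td>6</td><td>4</td><td>6</td><td>7</td><td>8</td><td>2</td><td>3</td><td>5</td><td>8</td><td>1</td></tr>
<tr><td>5</td><td>3</td><td>1</td><td>8</td><td>5</td><td>4</td><td>6</td><td>4</td><td>8</td><td>5</td><td>1</td></tr>
<tr><td>6</td><td>7</td><td>2</td><td>1</td><td>7</td><td>3</td><td>8</td><td>6</td><td>8</td><td>2</td><td>4</td></tr>
<tr><td>7</td><td>2</td><td>9</td><td>9</td><td>4</td><td>6</td><td>7</td><td>8</td><td>7</td><td>2</td><td>1</td></tr>
<tr><td>8</td><td>3</td><td>3</td><td>5</td><td>10</td><td>7</td><td>7</td><td>8</td><td>2</td><td>2</td><td>7</td></tr>
<tr><td>9</td><td>8</td><td>10</td><td>6</td><td>2</td><td>3</td><td>7</td><td>3</td><td>2</td><td>1</td><td>3</td></tr>
<tr><td>10</td><td>3</td><td>9</td><td>9</td><td>5</td><td>5</td><td>9</td><td>5</td><td>6</td><td>3</td><td>6</td></tr>
<tr><td>11</td><td>6</td><td>3</td><td>7</td><td>3</td><td>4</td><td>5</td><td>7</td><td>8</td><td>9</td><td>7</td></tr>
<tr><td>12</td><td>9</td><td>3</td><td>5</td><td>1</td><td>8</td><td>1</td><td>3</td><td>2</td><td>6</td><td>8</td></tr>
<tr><td>13</td><td>7</td><td>1</td><td>7</td><td>9</td><td>3</td><td>4</td><td>10</td><td>1</td><td>5</td><td>1</td></tr>
<tr><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td><td>⋮</td></tr>
<tr><td>989</td><td>3</td><td>10</td><td>9</td><td>8</td><td>3</td><td>7</td><td>9</td><td>6</td><td>3</td><td>9</td></tr>
<tr><td>990</td><td>8</td><td>10</td><td>1</td><td>8</td><td>3</td><td>10</td><td>4</td><td>1</td><td>2</td><td>5</td></tr>
<tr><td>991</td><td>8</td><td>3</td><td>3</td><td>1</td><td>6</td><td>8</td><td>4</td><td>10</td><td>7</td><td>9</td></tr>
<tr><td>992</td><td>6</td><td>5</td><td>1</td><td>8</td><td>3</td><td>7</td><td>7</td><td>6</td><td>6</td><td>10</td></tr>
<tr><td>993</td><td>3</td><td>7</td><td>10</td><td>7</td><td>3</td><td>10</td><td>2</td><td>6</td><td>1</td><td>4</td></tr>
<tr><td>994</td><td>2</td><td>9</td><td>2</td><td>9</td><td>3</td><td>2</td><td>8</td><td>3</td><td>1</td><td>1</td></tr>
<tr><td>995</td><td>2</td><td>8</td><td>4</td><td>3</td><td>10</td><td>10</td><td>3</td><td>7</td><td>5</td><td>2</td></tr>
<tr><td>996</td><td>7</td><td>3</td><td>1</td><td>7</td><td>5</td><td>5</td><td>2</td><td>2</td><td>3</td><td>2</td></tr>
<tr><td>997</td><td>7</td><td>4</td><td>7</td><td>2</td><td>2</td><td>6</td><td>8</td><td>4</td><td>6</td><td>8</td></tr>
<tr><td>998</td><td>4</td><td>8</td><td>1</td><td>1</td><td>3</td><td>3</td><td>2</td><td>3</td><td>8</td><td>8</td></tr>
<tr><td>999</td><td>1</td><td>9</td><td>4</td><td>4</td><td>10</td><td>6</td><td>4</td><td>3</td><td>8</td><td>2</td></tr>
<tr><td>1000</td><td>7</td><td>1</td><td>4</td><td>2</td><td>1</td><td>8</td><td>1</td><td>1</td><td>7</td><td>3</td></tr>
</tbody>
</table>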
" ] }, "metadata": {}, "execution_count": 11 } ], "cell_type": "code", "source": [ "y = DataFrame(rand(1:10, 1000, 10), :auto)" ], "metadata": {}, "execution_count": 11 }, { "cell_type": "markdown", "source": [ "and then we can use `first` to peek into its first few rows" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m5×10 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\u001b[1m x5 \u001b[0m\u001b[1m x6 \u001b[0m\u001b[1m x7 \u001b[0m\u001b[1m x8 \u001b[0m\u001b[1m x9 \u001b[0m\u001b[1m x10 \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n─────┼──────────────────────────────────────────────────────────────────────\n 1 │ 1 3 6 5 3 3 1 8 8 3\n 2 │ 10 8 1 7 8 3 10 10 7 9\n 3 │ 9 9 9 4 2 8 3 4 10 8\n 4 │ 6 4 6 7 8 2 3 5 8 1\n 5 │ 3 1 8 5 4 6 4 8 5 1", "text/html": [ "
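<div>5×10 DataFrame</div>
<table>
<thead>
<tr><th>Row</th><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>x5</th><th>x6</th><th>x7</th><th>x8</th><th>x9</th><th>x10</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>1</td><td>3</td><td>6</td><td>5</td><td>3</td><td>3</td><td>1</td><td>8</td><td>8</td><td>3</td></tr>
<tr><td>2</td><td>10</td><td>8</td><td>1</td><td>7</td><td>8</td><td>3</td><td>10</td><td>10</td><td>7</td><td>9</td></tr>
<tr><td>3</td><td>9</td><td>9</td><td>9</td><td>4</td><td>2</td><td>8</td><td>3</td><td>4</td><td>10</td><td>8</td></tr>
<tr><td>4</td><td>6</td><td>4</td><td>6</td><td>7</td><td>8</td><td>2</td><td>3</td><td>5</td><td>8</td><td>1</td></tr>
<tr><td>5</td><td>3</td><td>1</td><td>8</td><td>5</td><td>4</td><td>6</td><td>4</td><td>8</td><td>5</td><td>1</td></tr>
</tbody>
</table>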
" ] }, "metadata": {}, "execution_count": 12 } ], "cell_type": "code", "source": [ "first(y, 5)" ], "metadata": {}, "execution_count": 12 }, { "cell_type": "markdown", "source": [ "and `last` to see its bottom rows." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m3×10 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\u001b[1m x5 \u001b[0m\u001b[1m x6 \u001b[0m\u001b[1m x7 \u001b[0m\u001b[1m x8 \u001b[0m\u001b[1m x9 \u001b[0m\u001b[1m x10 \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n─────┼──────────────────────────────────────────────────────────────────────\n 1 │ 4 8 1 1 3 3 2 3 8 8\n 2 │ 1 9 4 4 10 6 4 3 8 2\n 3 │ 7 1 4 2 1 8 1 1 7 3", "text/html": [ "
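<div>3×10 DataFrame</div>
<table>
<thead>
<tr><th>Row</th><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>x5</th><th>x6</th><th>x7</th><th>x8</th><th>x9</th><th>x10</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>4</td><td>8</td><td>1</td><td>1</td><td>3</td><td>3</td><td>2</td><td>3</td><td>8</td><td>8</td></tr>
<tr><td>2</td><td>1</td><td>9</td><td>4</td><td>4</td><td>10</td><td>6</td><td>4</td><td>3</td><td>8</td><td>2</td></tr>
<tr><td>3</td><td>7</td><td>1</td><td>4</td><td>2</td><td>1</td><td>8</td><td>1</td><td>1</td><td>7</td><td>3</td></tr>
</tbody>
</table>
" ] }, "metadata": {}, "execution_count": 13 } ], "cell_type": "code", "source": [ "last(y, 3)" ], "metadata": {}, "execution_count": 13 }, { "cell_type": "markdown", "source": [ "Calling `first` or `last` without a number of rows returns the first or last row of the `DataFrame` as a `DataFrameRow`:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1mDataFrameRow\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\u001b[1m x5 \u001b[0m\u001b[1m x6 \u001b[0m\u001b[1m x7 \u001b[0m\u001b[1m x8 \u001b[0m\u001b[1m x9 \u001b[0m\u001b[1m x10 \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n─────┼──────────────────────────────────────────────────────────────────────\n 1 │ 1 3 6 5 3 3 1 8 8 3", "text/html": [ "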
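<div>DataFrameRow (10 columns)</div>
<table>
<thead>
<tr><th>Row</th><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>x5</th><th>x6</th><th>x7</th><th>x8</th><th>x9</th><th>x10</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>1</td><td>3</td><td>6</td><td>5</td><td>3</td><td>3</td><td>1</td><td>8</td><td>8</td><td>3</td></tr>
</tbody>
</table>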
" ] }, "metadata": {}, "execution_count": 14 } ], "cell_type": "code", "source": [ "first(y)" ], "metadata": {}, "execution_count": 14 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1mDataFrameRow\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\u001b[1m x5 \u001b[0m\u001b[1m x6 \u001b[0m\u001b[1m x7 \u001b[0m\u001b[1m x8 \u001b[0m\u001b[1m x9 \u001b[0m\u001b[1m x10 \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n──────┼──────────────────────────────────────────────────────────────────────\n 1000 │ 7 1 4 2 1 8 1 1 7 3", "text/html": [ "
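<div>DataFrameRow (10 columns)</div>
<table>
<thead>
<tr><th>Row</th><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>x5</th><th>x6</th><th>x7</th><th>x8</th><th>x9</th><th>x10</th></tr>
<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th><th>Int64</th></tr>
</thead>
<tbody>
<tr><td>1000</td><td>7</td><td>1</td><td>4</td><td>2</td><td>1</td><td>8</td><td>1</td><td>1</td><td>7</td><td>3</td></tr>
</tbody>
</table>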
" ] }, "metadata": {}, "execution_count": 15 } ], "cell_type": "code", "source": [ "last(y)" ], "metadata": {}, "execution_count": 15 }, { "cell_type": "markdown", "source": [ "## Displaying large data frames\n", "Create a wide and tall data frame:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m100×100 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\u001b[1m x5 \u001b[0m\u001b[1m x6 \u001b[0m\u001b[1m x7 \u001b[0m ⋯\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float6\u001b[0m ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.789962 0.0624244 0.735969 0.460358 0.995564 0.778658 0.6534 ⋯\n 2 │ 0.906916 0.801493 0.916937 0.303229 0.0785325 0.504828 0.1142\n 3 │ 0.191172 0.640318 0.591957 0.768282 0.976985 0.947779 0.0449\n 4 │ 0.152854 0.519964 0.588042 0.586559 0.355327 0.272534 0.7055\n 5 │ 0.438492 0.979425 0.0891575 0.945882 0.730485 0.642267 0.8786 ⋯\n 6 │ 0.978017 0.685007 0.13794 0.527277 0.959323 0.353505 0.8836\n 7 │ 0.355382 0.387504 0.0386789 0.817752 0.920758 0.947085 0.2679\n 8 │ 0.623965 0.573277 0.380789 0.272908 0.484304 0.343541 0.6340\n ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱\n 94 │ 0.932557 0.702032 0.219581 0.886079 0.588761 0.271663 0.5069 ⋯\n 95 │ 0.957338 0.00108312 0.0452597 0.0415551 0.0603021 0.932539 0.1623\n 96 │ 0.217519 0.0452365 0.96423 0.987456 0.199681 0.927534 0.8782\n 97 │ 0.775986 0.767202 0.244489 0.912083 0.464894 0.428593 0.2336\n 98 │ 0.522384 0.740745 0.807622 0.15677 0.842836 0.485635 0.8578 ⋯\n 99 │ 0.512551 0.294041 0.213607 0.895076 0.816713 0.363689 0.8900\n 100 │ 0.61419 0.176781 0.0414305 0.791225 0.0696568 0.889425 0.0994\n\u001b[36m 94 columns and 85 rows omitted\u001b[0m", "text/html": [ "
<div>100×100 DataFrame (75 rows omitted)</div>
<p>Columns x1 to x100, all of type Float64; rows 1 to 13 and 89 to 100 are displayed.</p>
" ] }, "metadata": {}, "execution_count": 16 } ], "cell_type": "code", "source": [ "df = DataFrame(rand(100, 100), :auto)" ], "metadata": {}, "execution_count": 16 }, { "cell_type": "markdown", "source": [ "we can see that 92 of its columns were not printed. Also we get its first 30 rows. You can easily change this behavior by changing the value of `ENV[\"LINES\"]` and `ENV[\"COLUMNS\"]`." ], "metadata": {} }, { "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100×100 DataFrame\n", " Row │ x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 ⋯\n", " │ Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Flo ⋯\n", "─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n", " 1 │ 0.789962 0.0624244 0.735969 0.460358 0.995564 0.778658 0.653409 0.8649 0.744414 0.718615 0.817303 0.842155 0.90281 0.678668 0.196716 0.400046 0.975851 0.408243 0.1 ⋯\n", " ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱\n", " 82 columns and 99 rows omitted" ] } ], "cell_type": "code", "source": [ "withenv(\"LINES\" => 10, \"COLUMNS\" => 200) do\n", " show(df)\n", "end" ], "metadata": {}, "execution_count": 17 }, { "cell_type": "markdown", "source": [ "### Most elementary get and set operations\n", "Given the `DataFrame` `x` we have created earlier, here are various ways to grab one of its columns as a `Vector`." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼──────────────────────────\n 1 │ 1 1.0 a\n 2 │ 2 \u001b[90m missing \u001b[0m b", "text/html": [ "
2×3 DataFrame
RowABC
Int64Float64?String
111.0a
22missingb
" ] }, "metadata": {}, "execution_count": 18 } ], "cell_type": "code", "source": [ "x = DataFrame(A=[1, 2], B=[1.0, missing], C=[\"a\", \"b\"])" ], "metadata": {}, "execution_count": 18 }, { "cell_type": "markdown", "source": [ "all get the vector stored in our DataFrame without copying it" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "([1, 2], [1, 2], [1, 2])" }, "metadata": {}, "execution_count": 19 } ], "cell_type": "code", "source": [ "x.A, x[!, 1], x[!, :A]" ], "metadata": {}, "execution_count": 19 }, { "cell_type": "markdown", "source": [ "the same using string indexing" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "([1, 2], [1, 2])" }, "metadata": {}, "execution_count": 20 } ], "cell_type": "code", "source": [ "x.\"A\", x[!, \"A\"]" ], "metadata": {}, "execution_count": 20 }, { "cell_type": "markdown", "source": [ "note that this creates a copy" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "2-element Vector{Int64}:\n 1\n 2" }, "metadata": {}, "execution_count": 21 } ], "cell_type": "code", "source": [ "x[:, 1]" ], "metadata": {}, "execution_count": 21 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "false" }, "metadata": {}, "execution_count": 22 } ], "cell_type": "code", "source": [ "x[:, 1] === x[:, 1]" ], "metadata": {}, "execution_count": 22 }, { "cell_type": "markdown", "source": [ "To grab one row as a `DataFrame`, we can index as follows." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m1×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼─────────────────────────\n 1 │ 1 1.0 a", "text/html": [ "
1×3 DataFrame
RowABC
Int64Float64?String
111.0a
" ] }, "metadata": {}, "execution_count": 23 } ], "cell_type": "code", "source": [ "x[1:1, :]" ], "metadata": {}, "execution_count": 23 }, { "cell_type": "markdown", "source": [ "Indexing with a single row number instead, as in `x[1, :]`, produces a `DataFrameRow`, which is treated as a 1-dimensional object similar to a `NamedTuple`." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1mDataFrameRow\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼─────────────────────────\n 1 │ 1 1.0 a", "text/html": [ "
DataFrameRow (3 columns)
RowABC
Int64Float64?String
111.0a
" ] }, "metadata": {}, "execution_count": 24 } ], "cell_type": "code", "source": [ "x[1, :]" ], "metadata": {}, "execution_count": 24 }, { "cell_type": "markdown", "source": [ "We can grab a single cell or element with the same syntax used to grab an element of an array." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "1" }, "metadata": {}, "execution_count": 25 } ], "cell_type": "code", "source": [ "x[1, 1]" ], "metadata": {}, "execution_count": 25 }, { "cell_type": "markdown", "source": [ "or a new `DataFrame` that is a subset of rows and columns" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×2 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\n─────┼──────────────────\n 1 │ 1 1.0\n 2 │ 2 \u001b[90m missing \u001b[0m", "text/html": [ "
2×2 DataFrame
RowAB
Int64Float64?
111.0
22missing
" ] }, "metadata": {}, "execution_count": 26 } ], "cell_type": "code", "source": [ "x[1:2, 1:2]" ], "metadata": {}, "execution_count": 26 }, { "cell_type": "markdown", "source": [ "You can also use a `Regex` to select columns, and `Not` from InvertedIndices.jl to select both rows and columns." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m1×1 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\n │\u001b[90m Int64 \u001b[0m\n─────┼───────\n 1 │ 2", "text/html": [ "
1×1 DataFrame
RowA
Int64
12
" ] }, "metadata": {}, "execution_count": 27 } ], "cell_type": "code", "source": [ "x[Not(1), r\"A\"]" ], "metadata": {}, "execution_count": 27 }, { "cell_type": "markdown", "source": [ "`!` indicates that underlying columns are not copied" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×2 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼───────────────────\n 1 │ 1.0 a\n 2 │\u001b[90m missing \u001b[0m b", "text/html": [ "
2×2 DataFrame
RowBC
Float64?String
11.0a
2missingb
" ] }, "metadata": {}, "execution_count": 28 } ], "cell_type": "code", "source": [ "x[!, Not(1)]" ], "metadata": {}, "execution_count": 28 }, { "cell_type": "markdown", "source": [ "`:` means that the columns will get copied" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×2 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼───────────────────\n 1 │ 1.0 a\n 2 │\u001b[90m missing \u001b[0m b", "text/html": [ "
2×2 DataFrame
RowBC
Float64?String
11.0a
2missingb
" ] }, "metadata": {}, "execution_count": 29 } ], "cell_type": "code", "source": [ "x[:, Not(1)]" ], "metadata": {}, "execution_count": 29 }, { "cell_type": "markdown", "source": [ "Assignment of a scalar to a data frame can be done in ranges using broadcasting:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼─────────────────────────\n 1 │ 1 1.0 a\n 2 │ 1 1.0 b", "text/html": [ "
2×3 DataFrame
RowABC
Int64Float64?String
111.0a
211.0b
" ] }, "metadata": {}, "execution_count": 30 } ], "cell_type": "code", "source": [ "x[1:2, 1:2] .= 1\n", "x" ], "metadata": {}, "execution_count": 30 }, { "cell_type": "markdown", "source": [ "Assignment of a vector of length equal to the number of assigned rows using broadcasting" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼─────────────────────────\n 1 │ 1 1.0 a\n 2 │ 2 2.0 b", "text/html": [ "
2×3 DataFrame
RowABC
Int64Float64?String
111.0a
222.0b
" ] }, "metadata": {}, "execution_count": 31 } ], "cell_type": "code", "source": [ "x[1:2, 1:2] .= [1, 2]\n", "x" ], "metadata": {}, "execution_count": 31 }, { "cell_type": "markdown", "source": [ "Assignment of another data frame of matching size and column names, again using broadcasting:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m A \u001b[0m\u001b[1m B \u001b[0m\u001b[1m C \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Float64? \u001b[0m\u001b[90m String \u001b[0m\n─────┼─────────────────────────\n 1 │ 5 6.0 a\n 2 │ 7 8.0 b", "text/html": [ "
2×3 DataFrame
RowABC
Int64Float64?String
156.0a
278.0b
" ] }, "metadata": {}, "execution_count": 32 } ], "cell_type": "code", "source": [ "x[1:2, 1:2] .= DataFrame([5 6; 7 8], [:A, :B])\n", "x" ], "metadata": {}, "execution_count": 32 }, { "cell_type": "markdown", "source": [ "**Caution**\n", "\n", "With `df[!, :col]` and `df.col` syntax you get a direct (non copying) access to a column of a data frame.\n", "This is potentially unsafe as you can easily corrupt data in the `df` data frame if you resize, sort, etc. the column obtained in this way.\n", "Therefore such access should be used with caution.\n", "\n", "Similarly `df[!, cols]` when `cols` is a collection of columns produces a new data frame that holds the same (not copied) columns as the source `df` data frame. Similarly, modifying the data frame obtained via `df[!, cols]` might cause problems with the consistency of `df`.\n", "\n", "The `df[:, :col]` and `df[:, cols]` syntaxes always copy columns so they are safe to use (and should generally be preferred except for performance or memory critical use cases)." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Here are examples of how `Cols` and `Between` can be used to select columns of a data frame." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m4×5 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\u001b[1m x5 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼───────────────────────────────────────────────────\n 1 │ 0.143664 0.243967 0.541132 0.90432 0.253281\n 2 │ 0.649823 0.974319 0.523402 0.547555 0.906718\n 3 │ 0.532381 0.154004 0.618354 0.0428888 0.897352\n 4 │ 0.878731 0.371432 0.413427 0.829586 0.771033", "text/html": [ "
4×5 DataFrame
Rowx1x2x3x4x5
Float64Float64Float64Float64Float64
10.1436640.2439670.5411320.904320.253281
20.6498230.9743190.5234020.5475550.906718
30.5323810.1540040.6183540.04288880.897352
40.8787310.3714320.4134270.8295860.771033
" ] }, "metadata": {}, "execution_count": 33 } ], "cell_type": "code", "source": [ "x = DataFrame(rand(4, 5), :auto)" ], "metadata": {}, "execution_count": 33 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m4×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼───────────────────────────────\n 1 │ 0.243967 0.541132 0.90432\n 2 │ 0.974319 0.523402 0.547555\n 3 │ 0.154004 0.618354 0.0428888\n 4 │ 0.371432 0.413427 0.829586", "text/html": [ "
4×3 DataFrame
Rowx2x3x4
Float64Float64Float64
10.2439670.5411320.90432
20.9743190.5234020.547555
30.1540040.6183540.0428888
40.3714320.4134270.829586
" ] }, "metadata": {}, "execution_count": 34 } ], "cell_type": "code", "source": [ "x[:, Between(:x2, :x4)]" ], "metadata": {}, "execution_count": 34 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m4×4 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\u001b[1m x4 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼─────────────────────────────────────────\n 1 │ 0.143664 0.243967 0.541132 0.90432\n 2 │ 0.649823 0.974319 0.523402 0.547555\n 3 │ 0.532381 0.154004 0.618354 0.0428888\n 4 │ 0.878731 0.371432 0.413427 0.829586", "text/html": [ "
4×4 DataFrame
Rowx1x2x3x4
Float64Float64Float64Float64
10.1436640.2439670.5411320.90432
20.6498230.9743190.5234020.547555
30.5323810.1540040.6183540.0428888
40.8787310.3714320.4134270.829586
" ] }, "metadata": {}, "execution_count": 35 } ], "cell_type": "code", "source": [ "x[:, Cols(\"x1\", Between(\"x2\", \"x4\"))]" ], "metadata": {}, "execution_count": 35 }, { "cell_type": "markdown", "source": [ "## Views\n", "You can simply create a view of a `DataFrame` (it is more efficient than creating a materialized selection). Here are the possible return value options." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "2-element view(::Vector{Float64}, 1:2) with eltype Float64:\n 0.14366370663453387\n 0.6498227286696711" }, "metadata": {}, "execution_count": 36 } ], "cell_type": "code", "source": [ "@view x[1:2, 1]" ], "metadata": {}, "execution_count": 36 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "0-dimensional view(::Vector{Float64}, 1) with eltype Float64:\n0.14366370663453387" }, "metadata": {}, "execution_count": 37 } ], "cell_type": "code", "source": [ "@view x[1, 1]" ], "metadata": {}, "execution_count": 37 }, { "cell_type": "markdown", "source": [ "a DataFrameRow, the same as for x[1, 1:2] without a view" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1mDataFrameRow\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼────────────────────\n 1 │ 0.143664 0.243967", "text/html": [ "
DataFrameRow (2 columns)
Rowx1x2
Float64Float64
10.1436640.243967
" ] }, "metadata": {}, "execution_count": 38 } ], "cell_type": "code", "source": [ "@view x[1, 1:2]" ], "metadata": {}, "execution_count": 38 }, { "cell_type": "markdown", "source": [ "a SubDataFrame" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×2 SubDataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼────────────────────\n 1 │ 0.143664 0.243967\n 2 │ 0.649823 0.974319", "text/html": [ "
2×2 SubDataFrame
Rowx1x2
Float64Float64
10.1436640.243967
20.6498230.974319
" ] }, "metadata": {}, "execution_count": 39 } ], "cell_type": "code", "source": [ "@view x[1:2, 1:2]" ], "metadata": {}, "execution_count": 39 }, { "cell_type": "markdown", "source": [ "## Adding new columns to a data frame" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m0×0 DataFrame\u001b[0m", "text/html": [ "
0×0 DataFrame
" ] }, "metadata": {}, "execution_count": 40 } ], "cell_type": "code", "source": [ "df = DataFrame()" ], "metadata": {}, "execution_count": 40 }, { "cell_type": "markdown", "source": [ "using `setproperty!` (element assignment)" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m3×1 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m a \u001b[0m\n │\u001b[90m Int64 \u001b[0m\n─────┼───────\n 1 │ 1\n 2 │ 2\n 3 │ 3", "text/html": [ "
3×1 DataFrame
Rowa
Int64
11
22
33
" ] }, "metadata": {}, "execution_count": 41 } ], "cell_type": "code", "source": [ "x = [1, 2, 3]\n", "df.a = x\n", "df" ], "metadata": {}, "execution_count": 41 }, { "cell_type": "markdown", "source": [ "no copy is performed (sharing the same memory address)" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 42 } ], "cell_type": "code", "source": [ "df.a === x" ], "metadata": {}, "execution_count": 42 }, { "cell_type": "markdown", "source": [ "using `setindex!`" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m3×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m a \u001b[0m\u001b[1m b \u001b[0m\u001b[1m c \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n─────┼─────────────────────\n 1 │ 1 1 1\n 2 │ 2 2 2\n 3 │ 3 3 3", "text/html": [ "
3×3 DataFrame
Rowabc
Int64Int64Int64
1111
2222
3333
" ] }, "metadata": {}, "execution_count": 43 } ], "cell_type": "code", "source": [ "df[!, :b] = x\n", "df[:, :c] = x\n", "df" ], "metadata": {}, "execution_count": 43 }, { "cell_type": "markdown", "source": [ "no copy is performed" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 44 } ], "cell_type": "code", "source": [ "df.b === x" ], "metadata": {}, "execution_count": 44 }, { "cell_type": "markdown", "source": [ "With `:` a copy is performed,\n", "so for plain assignment `!` and `:` have different effects" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "false" }, "metadata": {}, "execution_count": 45 } ], "cell_type": "code", "source": [ "df.c === x" ], "metadata": {}, "execution_count": 45 }, { "cell_type": "markdown", "source": [ "Element-wise assignment" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m3×5 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m a \u001b[0m\u001b[1m b \u001b[0m\u001b[1m c \u001b[0m\u001b[1m d \u001b[0m\u001b[1m e \u001b[0m\n │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n─────┼───────────────────────────────────\n 1 │ 1 1 1 1 1\n 2 │ 2 2 2 2 2\n 3 │ 3 3 3 3 3", "text/html": [ "
3×5 DataFrame
Rowabcde
Int64Int64Int64Int64Int64
111111
222222
333333
" ] }, "metadata": {}, "execution_count": 46 } ], "cell_type": "code", "source": [ "df[!, :d] .= x\n", "df[:, :e] .= x\n", "df" ], "metadata": {}, "execution_count": 46 }, { "cell_type": "markdown", "source": [ "both copy, so in this case `!` and `:` have the same effect" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "(false, false)" }, "metadata": {}, "execution_count": 47 } ], "cell_type": "code", "source": [ "df.d === x, df.e === x" ], "metadata": {}, "execution_count": 47 }, { "cell_type": "markdown", "source": [ "note that in our data frame columns `:a` and `:b` store the vector `x` (not a copy)" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 48 } ], "cell_type": "code", "source": [ "df.a === df.b === x" ], "metadata": {}, "execution_count": 48 }, { "cell_type": "markdown", "source": [ "This can lead to silent errors. For example, this code leads to a bug (note that calling `pairs` on `eachcol(df)` creates an iterator of (column name, column) pairs):" ], "metadata": {} }, { "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a: 3\n", "b: 2\n", "c: 3\n", "d: 3\n", "e: 3\n" ] } ], "cell_type": "code", "source": [ "try\n", " for (n, c) in pairs(eachcol(df))\n", " println(\"$n: \", pop!(c))\n", " end\n", "catch e\n", " show(e)\n", "end" ], "metadata": {}, "execution_count": 49 }, { "cell_type": "markdown", "source": [ "note that for column `:b` we printed `2` as `3` was removed from it when we used `pop!` on column `:a`.\n", "Such mistakes sometimes happen. Because of this, DataFrames.jl performs consistency checks before doing an expensive operation (most notably before showing a data frame)." ], "metadata": {} }, { "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AssertionError(\"Data frame is corrupt: length of column :c (2) does not match length of column 1 (1). 
The column vector has likely been resized unintentionally (either directly or because it is shared with another data frame).\")" ] } ], "cell_type": "code", "source": [ "try\n", " show(df)\n", "catch e\n", " show(e)\n", "end" ], "metadata": {}, "execution_count": 50 }, { "cell_type": "markdown", "source": [ "We can investigate the columns to find out what happened:" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "5-element Vector{Pair{Symbol, AbstractVector}}:\n :a => [1]\n :b => [1]\n :c => [1, 2]\n :d => [1, 2]\n :e => [1, 2]" }, "metadata": {}, "execution_count": 51 } ], "cell_type": "code", "source": [ "collect(pairs(eachcol(df)))" ], "metadata": {}, "execution_count": 51 }, { "cell_type": "markdown", "source": [ "The output confirms that the data frame `df` got corrupted.\n", "DataFrames.jl supports a complete set of `getindex`, `getproperty`, `setindex!`, `setproperty!`, `view`, broadcasting, and broadcasting assignment operations. The details are explained here: http://juliadata.github.io/DataFrames.jl/latest/lib/indexing/." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Comparisons" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "using DataFrames" ], "metadata": {}, "execution_count": 52 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼──────────────────────────────\n 1 │ 0.938257 0.811633 0.362354\n 2 │ 0.975227 0.753141 0.141313", "text/html": [ "
2×3 DataFrame
Rowx1x2x3
Float64Float64Float64
10.9382570.8116330.362354
20.9752270.7531410.141313
" ] }, "metadata": {}, "execution_count": 53 } ], "cell_type": "code", "source": [ "df = DataFrame(rand(2, 3), :auto)" ], "metadata": {}, "execution_count": 53 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼──────────────────────────────\n 1 │ 0.938257 0.811633 0.362354\n 2 │ 0.975227 0.753141 0.141313", "text/html": [ "
2×3 DataFrame
Rowx1x2x3
Float64Float64Float64
10.9382570.8116330.362354
20.9752270.7531410.141313
" ] }, "metadata": {}, "execution_count": 54 } ], "cell_type": "code", "source": [ "df2 = copy(df)" ], "metadata": {}, "execution_count": 54 }, { "cell_type": "markdown", "source": [ "compares column names and contents" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 55 } ], "cell_type": "code", "source": [ "df == df2" ], "metadata": {}, "execution_count": 55 }, { "cell_type": "markdown", "source": [ "create a minimally different data frame and use `isapprox` for comparison" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m2×3 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m x1 \u001b[0m\u001b[1m x2 \u001b[0m\u001b[1m x3 \u001b[0m\n │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\n─────┼──────────────────────────────\n 1 │ 0.938257 0.811633 0.362354\n 2 │ 0.975227 0.753141 0.141313", "text/html": [ "
2×3 DataFrame
Rowx1x2x3
Float64Float64Float64
10.9382570.8116330.362354
20.9752270.7531410.141313
" ] }, "metadata": {}, "execution_count": 56 } ], "cell_type": "code", "source": [ "df3 = df2 .+ eps()" ], "metadata": {}, "execution_count": 56 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "false" }, "metadata": {}, "execution_count": 57 } ], "cell_type": "code", "source": [ "df == df3" ], "metadata": {}, "execution_count": 57 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 58 } ], "cell_type": "code", "source": [ "isapprox(df, df3)" ], "metadata": {}, "execution_count": 58 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "false" }, "metadata": {}, "execution_count": 59 } ], "cell_type": "code", "source": [ "isapprox(df, df3, atol=eps() / 2)" ], "metadata": {}, "execution_count": 59 }, { "cell_type": "markdown", "source": [ "`missing` values are handled as in Julia Base" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "\u001b[1m1×1 DataFrame\u001b[0m\n\u001b[1m Row \u001b[0m│\u001b[1m a \u001b[0m\n │\u001b[90m Missing \u001b[0m\n─────┼─────────\n 1 │\u001b[90m missing \u001b[0m", "text/html": [ "
1×1 DataFrame
Rowa
Missing
1missing
" ] }, "metadata": {}, "execution_count": 60 } ], "cell_type": "code", "source": [ "df = DataFrame(a=missing)" ], "metadata": {}, "execution_count": 60 }, { "cell_type": "markdown", "source": [ "Equality test shows missing." ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "missing" }, "metadata": {}, "execution_count": 61 } ], "cell_type": "code", "source": [ "df == df" ], "metadata": {}, "execution_count": 61 }, { "cell_type": "markdown", "source": [ "The same object?" ], "metadata": {} }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 62 } ], "cell_type": "code", "source": [ "df === df" ], "metadata": {}, "execution_count": 62 }, { "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "true" }, "metadata": {}, "execution_count": 63 } ], "cell_type": "code", "source": [ "isequal(df, df)" ], "metadata": {}, "execution_count": 63 }, { "cell_type": "markdown", "source": [ "---\n", "\n", "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*" ], "metadata": {} } ], "nbformat_minor": 3, "metadata": { "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.10.5" }, "kernelspec": { "name": "julia-1.10", "display_name": "Julia 1.10.5", "language": "julia" } }, "nbformat": 4 }