# Basic information about a data frame

In [1]:
using DataFrames

Let's start by creating a `DataFrame` object, `x`, so that we can learn how to get information on that data frame.

In [2]:
x = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

Row,A,B,C
,Int64,Float64?,String
1,1,1.0,a
2,2,missing,b


The standard `size` function works to get dimensions of the `DataFrame`,

In [3]:
size(x), size(x, 1), size(x, 2)

((2, 3), 2, 3)

as do `nrow` and `ncol`, which are named after their R counterparts.

In [4]:
nrow(x), ncol(x)

(2, 3)

`describe` gives basic summary statistics of data in your `DataFrame` (check out the help of `describe` for information on how to customize shown statistics).

In [5]:
describe(x)

Row,variable,mean,min,median,max,nmissing,eltype
,Symbol,Union…,Any,Union…,Any,Int64,Type
1,A,1.5,1,1.5,2,0,Int64
2,B,1.0,1.0,1.0,1.0,1,"Union{Missing, Float64}"
3,C,,a,,b,0,String
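
As a small sketch of the customization mentioned above: `describe` accepts the names of built-in statistics as `Symbol`s, and a custom statistic as a `function => name` pair. The column name `:largest` below is an arbitrary label chosen for illustration, and the data frame repeats `x` from this section.

```julia
using DataFrames

x = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

# Keep only the mean and the number of missing values.
d = describe(x, :mean, :nmissing)

# Add a custom statistic as a `function => name` pair;
# `:largest` is just an illustrative column name.
d2 = describe(x, :nmissing, maximum => :largest)
```

The result is itself a `DataFrame`, so it can be indexed and filtered like any other data frame.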


You can limit the columns shown by `describe` using the `cols` keyword argument:

In [6]:
describe(x, cols=1:2)

Row,variable,mean,min,median,max,nmissing,eltype
,Symbol,Float64,Real,Float64,Real,Int64,Type
1,A,1.5,1.0,1.5,2.0,0,Int64
2,B,1.0,1.0,1.0,1.0,1,"Union{Missing, Float64}"


`names` will return the names of all columns as strings

In [7]:
names(x)

3-element Vector{String}:
 "A"
 "B"
 "C"

You can also get the names of columns with a given element type (`eltype`):

In [8]:
names(x, String)

1-element Vector{String}:
 "C"

Use `propertynames` to get a vector of `Symbol`s:

In [9]:
propertynames(x)

3-element Vector{Symbol}:
 :A
 :B
 :C
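
Besides an element type, `names` also accepts other column selectors as its second argument. The following is a minimal sketch on the same `x`; note that the type filter matches columns whose element type is a subtype of the given type, so a column that allows `missing` needs `Union{Missing, ...}` to be matched.

```julia
using DataFrames

x = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

only_real  = names(x, Real)                  # B is excluded: its eltype allows missing
with_miss  = names(x, Union{Missing, Real})  # now both A and B match
without_a  = names(x, Not(:A))               # every column except A
by_pattern = names(x, r"^[AB]$")             # columns whose name matches the regex
```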

Broadcasting `eltype` over `eachcol(x)` returns the element types of the columns:

In [10]:
eltype.(eachcol(x))

3-element Vector{Type}:
 Int64
 Union{Missing, Float64}
 String

Here we create a larger `DataFrame`

In [11]:
y = DataFrame(rand(1:10, 1000, 10), :auto)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,1,3,6,5,3,3,1,8,8,3
2,10,8,1,7,8,3,10,10,7,9
3,9,9,9,4,2,8,3,4,10,8
4,6,4,6,7,8,2,3,5,8,1
5,3,1,8,5,4,6,4,8,5,1
6,7,2,1,7,3,8,6,8,2,4
7,2,9,9,4,6,7,8,7,2,1
8,3,3,5,10,7,7,8,2,2,7
9,8,10,6,2,3,7,3,2,1,3
10,3,9,9,5,5,9,5,6,3,6


and then we can use `first` to peek into its first few rows

In [12]:
first(y, 5)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,1,3,6,5,3,3,1,8,8,3
2,10,8,1,7,8,3,10,10,7,9
3,9,9,9,4,2,8,3,4,10,8
4,6,4,6,7,8,2,3,5,8,1
5,3,1,8,5,4,6,4,8,5,1


and `last` to see its bottom rows.

In [13]:
last(y, 3)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,4,8,1,1,3,3,2,3,8,8
2,1,9,4,4,10,6,4,3,8,2
3,7,1,4,2,1,8,1,1,7,3


Using `first` and `last` without the number of rows returns the first/last row of the `DataFrame` as a `DataFrameRow`:

In [14]:
first(y)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1,1,3,6,5,3,3,1,8,8,3


In [15]:
last(y)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64
1000,7,1,4,2,1,8,1,1,7,3


## Displaying large data frames
Create a wide and tall data frame:

In [16]:
df = DataFrame(rand(100, 100), :auto)

Row,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x50,x51,x52,x53,x54,x55,x56,x57,x58,x59,x60,x61,x62,x63,x64,x65,x66,x67,x68,x69,x70,x71,x72,x73,x74,x75,x76,x77,x78,x79,x80,x81,x82,x83,x84,x85,x86,x87,x88,x89,x90,x91,x92,x93,x94,x95,x96,x97,x98,x99,x100
Unnamed: 0_level_1,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,0.789962,0.0624244,0.735969,0.460358,0.995564,0.778658,0.653409,0.8649,0.744414,0.718615,0.817303,0.842155,0.90281,0.678668,0.196716,0.400046,0.975851,0.408243,0.129669,0.1608,0.728455,0.810081,0.94314,0.949985,0.382518,0.398125,0.316978,0.565363,0.217461,0.181603,0.170618,0.994069,0.686484,0.0137787,0.0881711,0.637445,0.59441,0.310437,0.087433,0.980031,0.176949,0.0105923,0.367188,0.96485,0.50751,0.144118,0.635031,0.51495,0.955666,0.196045,0.93247,0.348846,0.151913,0.511562,0.371784,0.276612,0.927345,0.574142,0.256917,0.823203,0.80771,0.34148,0.520293,0.121687,0.840483,0.204535,0.216077,0.29907,0.739836,0.532872,0.890434,0.662174,0.364033,0.00789043,0.316984,0.891117,0.332019,0.329171,0.25549,0.891567,0.569998,0.565552,0.962039,0.715544,0.819078,0.731385,0.496026,0.6974,0.821201,0.708521,0.525228,0.491904,0.582272,0.91102,0.898055,0.81731,0.948104,0.0559782,0.65191,0.782342
2,0.906916,0.801493,0.916937,0.303229,0.0785325,0.504828,0.11424,0.0990922,0.173723,0.764554,0.661564,0.677853,0.850174,0.426791,0.803742,0.849076,0.629116,0.899457,0.0307474,0.367295,0.276108,0.317422,0.287991,0.733444,0.602123,0.0883216,0.988887,0.0222579,0.421692,0.046251,0.338589,0.658156,0.420062,0.483967,0.269088,0.912539,0.521364,0.398309,0.904374,0.128759,0.845563,0.265137,0.940024,0.909791,0.174972,0.379679,0.72579,0.511081,0.845378,0.768309,0.865279,0.598697,0.961984,0.428272,0.697565,0.475775,0.0471513,0.337792,0.696306,0.128999,0.291203,0.266301,0.25725,0.813233,0.322388,0.765776,0.632218,0.897858,0.536752,0.466092,0.975,0.019028,0.104923,0.0942328,0.253896,0.216972,0.16876,0.199435,0.365484,0.520608,0.580947,0.346234,0.46632,0.387265,0.573617,0.253971,0.935074,0.123601,0.603316,0.2704,0.623716,0.40031,0.336837,0.994516,0.711772,0.498722,0.588764,0.405056,0.762578,0.810863
3,0.191172,0.640318,0.591957,0.768282,0.976985,0.947779,0.0449646,0.702167,0.806419,0.760546,0.462788,0.665196,0.68471,0.333927,0.273586,0.343196,0.614686,0.23853,0.665337,0.640259,0.12159,0.543603,0.746536,0.223446,0.780712,0.139613,0.857278,0.527547,0.303426,0.668916,0.746008,0.852587,0.506523,0.565938,0.736907,0.434405,0.597016,0.817368,0.448656,0.802766,0.578649,0.661492,0.762909,0.886029,0.99614,0.525082,0.932952,0.0848876,0.542636,0.0237858,0.731222,0.403012,0.0551489,0.24299,0.517912,0.850367,0.594318,0.62938,0.0900461,0.281503,0.601483,0.377034,0.781816,0.23537,0.770827,0.301995,0.918257,0.415067,0.88697,0.0819572,0.254009,0.247897,0.902931,0.403027,0.89484,0.0262883,0.0191619,0.0845353,0.69958,0.848151,0.826964,0.0973835,0.521548,0.535922,0.895431,0.228704,0.677998,0.481316,0.0656549,0.412536,0.429962,0.0591372,0.465167,0.189953,0.976226,0.350392,0.535618,0.819281,0.354681,0.596328
4,0.152854,0.519964,0.588042,0.586559,0.355327,0.272534,0.705556,0.94091,0.279872,0.137491,0.926185,0.591034,0.484718,0.427958,0.451731,0.590535,0.481584,0.0655017,0.79735,0.255349,0.60009,0.898634,0.99412,0.410166,0.87283,0.0498867,0.773406,0.646541,0.0560735,0.530586,0.217024,0.337691,0.092964,0.13451,0.127027,0.732276,0.744345,0.0786001,0.236577,0.193732,0.806846,0.0803981,0.167407,0.627898,0.47613,0.618222,0.102311,0.927473,0.91135,0.255356,0.518048,0.486895,0.53087,0.401906,0.689675,0.171761,0.669048,0.0621531,0.59158,0.549113,0.692023,0.481487,0.294933,0.589698,0.0226371,0.764085,0.777256,0.258074,0.87056,0.365023,0.269927,0.672063,0.0639364,0.227701,0.786819,0.811428,0.790961,0.116625,0.454946,0.779165,0.509907,0.958328,0.00392277,0.739633,0.100159,0.752988,0.150669,0.0670422,0.509912,0.747937,0.781694,0.179175,0.877489,0.266761,0.750775,0.604885,0.227372,0.102778,0.519049,0.83224
5,0.438492,0.979425,0.0891575,0.945882,0.730485,0.642267,0.878645,0.163101,0.0735955,0.21277,0.188028,0.43106,0.773998,0.391424,0.274476,0.952568,0.319698,0.25497,0.890972,0.370678,0.946526,0.796653,0.0266869,0.28131,0.558991,0.151908,0.017412,0.605246,0.681085,0.931187,0.339746,0.99863,0.634039,0.267856,0.628497,0.96877,0.494759,0.10337,0.849618,0.847402,0.173661,0.70678,0.336419,0.674687,0.328679,0.610705,0.0762114,0.116168,0.879191,0.874264,0.365542,0.705772,0.995417,0.639454,0.718854,0.120333,0.931438,0.204427,0.0598947,0.393186,0.563545,0.657745,0.501017,0.416358,0.897751,0.563896,0.88921,0.705617,0.102253,0.624221,0.791442,0.858432,0.342829,0.610995,0.263944,0.813046,0.682503,0.311277,0.33614,0.373246,0.380557,0.842904,0.0340886,0.977756,0.827696,0.905881,0.633318,0.201186,0.0448009,0.136333,0.852752,0.890638,0.207635,0.593463,0.922308,0.302028,0.74313,0.911329,0.741833,0.189258
6,0.978017,0.685007,0.13794,0.527277,0.959323,0.353505,0.883649,0.497946,0.508299,0.364806,0.884226,0.318954,0.191665,0.529934,0.283208,0.584212,0.642504,0.397067,0.439616,0.332097,0.758513,0.767744,0.252775,0.874276,0.512381,0.855634,0.768054,0.247361,0.893618,0.677997,0.849732,0.200949,0.916447,0.473907,0.887693,0.204561,0.3219,0.0677458,0.403429,0.210113,0.648227,0.840452,0.740517,0.821338,0.248555,0.248092,0.00993002,0.950479,0.508806,0.668246,0.435753,0.543561,0.51706,0.0568403,0.573082,0.772492,0.58292,0.657442,0.580815,0.0526001,0.937641,0.274714,0.157932,0.118331,0.483643,0.862516,0.896659,0.374468,0.49819,0.337619,0.360727,0.35471,0.0689635,0.86436,0.825771,0.859273,0.787955,0.937837,0.596613,0.175667,0.957355,0.812427,0.125764,0.918091,0.4757,0.864551,0.192622,0.874821,0.491157,0.830136,0.702021,0.580504,0.63501,0.997995,0.113292,0.355345,0.448688,0.31359,0.742707,0.784953
7,0.355382,0.387504,0.0386789,0.817752,0.920758,0.947085,0.267968,0.508267,0.0744801,0.0416993,0.416237,0.154995,0.795377,0.863224,0.725302,0.858817,0.556121,0.0196886,0.640486,0.664734,0.801778,0.174084,0.747256,0.47967,0.0561347,0.947339,0.969936,0.392161,0.441867,0.962798,0.432254,0.211471,0.503355,0.233193,0.279647,0.907691,0.802452,0.538268,0.238805,0.473371,0.143532,0.502939,0.878033,0.0284768,0.154575,0.475727,0.980766,0.837161,0.485342,0.355029,0.310826,0.195347,0.00953382,0.513923,0.0947596,0.0375894,0.351014,0.094923,0.513024,0.859913,0.90293,0.789147,0.37969,0.897667,0.0389584,0.189168,0.603927,0.423735,0.839033,0.991428,0.854764,0.43871,0.918347,0.927122,0.489236,0.763886,0.892663,0.19698,0.520183,0.509847,0.706177,0.766694,0.0136834,0.156429,0.132012,0.791273,0.47403,0.274121,0.521274,0.592208,0.698895,0.802799,0.648142,0.0471296,0.633471,0.995833,0.190746,0.431809,0.814291,0.260273
8,0.623965,0.573277,0.380789,0.272908,0.484304,0.343541,0.634027,0.307012,0.862936,0.230815,0.652491,0.79243,0.514264,0.571855,0.0450283,0.741723,0.425562,0.760773,0.848076,0.491073,0.0404891,0.636267,0.00637674,0.243197,0.804827,0.78336,0.301242,0.335687,0.931932,0.247274,0.912292,0.449109,0.795803,0.641221,0.176784,0.969964,0.670471,0.995402,0.0421617,0.522538,0.333508,0.0797808,0.546625,0.215847,0.6817,0.535863,0.260338,0.843419,0.889026,0.506287,0.713106,0.695158,0.642464,0.0573208,0.805722,0.536209,0.624832,0.698036,0.0400037,0.721292,0.960602,0.607294,0.772479,0.738393,0.198255,0.100064,0.802141,0.304446,0.425035,0.905866,0.157392,0.0451183,0.953696,0.752476,0.456385,0.543769,0.867963,0.976257,0.866777,0.0794833,0.959075,0.691634,0.744851,0.421241,0.978213,0.471006,0.489963,0.966642,0.854582,0.769771,0.248433,0.962077,0.0555362,0.628293,0.938299,0.920196,0.140157,0.796787,0.567865,0.407918
9,0.924967,0.774597,0.232752,0.204111,0.366567,0.902385,0.921746,0.473791,0.472789,0.461728,0.511422,0.22958,0.182011,0.0234031,0.741959,0.609675,0.406637,0.398294,0.622153,0.404439,0.437776,0.00542105,0.748615,0.259586,0.524832,0.648169,0.55465,0.48709,0.523374,0.231486,0.592701,0.681264,0.488907,0.430831,0.627875,0.351296,0.220672,0.496731,0.988125,0.915999,0.604512,0.374545,0.403834,0.807666,0.475156,0.359549,0.691342,0.82921,0.842037,0.665612,0.752709,0.326745,0.369885,0.745714,0.501246,0.124589,0.889516,0.866604,0.856362,0.667124,0.446837,0.68607,0.762944,0.705123,0.157266,0.994174,0.16359,0.63707,0.443053,0.764881,0.51556,0.846679,0.00181272,0.200586,0.781975,0.223445,0.838355,0.171095,0.475408,0.0767908,0.86955,0.518845,0.461329,0.795333,0.891451,0.425662,0.995346,0.562735,0.187163,0.751779,0.245888,0.239954,0.570511,0.593536,0.580887,0.93795,0.797648,0.101556,0.130643,0.132767
10,0.870018,0.523782,0.146663,0.0688282,0.910187,0.030617,0.7195,0.424132,0.739784,0.106837,0.913481,0.310718,0.0253776,0.730706,0.69732,0.220479,0.121729,0.772284,0.792031,0.175478,0.242331,0.94515,0.620682,0.25098,0.773386,0.474092,0.64773,0.531523,0.63011,0.47455,0.0115982,0.233914,0.730183,0.240237,0.432014,0.0660656,0.579257,0.801662,0.734037,0.280745,0.467608,0.227433,0.318556,0.070222,0.0707096,0.455186,0.514461,0.314014,0.64161,0.0681791,0.45958,0.483418,0.308741,0.124927,0.880256,0.850842,0.0393761,0.0398096,0.122609,0.488181,0.160364,0.112608,0.220929,0.207202,0.494636,0.525268,0.0256017,0.857939,0.509653,0.624037,0.506282,0.385628,0.363783,0.589266,0.752108,0.80989,0.731094,0.870923,0.663211,0.787562,0.128138,0.826731,0.838876,0.762065,0.694518,0.0854203,0.149463,0.711009,0.895898,0.891102,0.0596153,0.934214,0.405129,0.95902,0.0679759,0.223175,0.731698,0.958341,0.490405,0.0775291


We can see that 92 of its columns were not printed, and that only its first 30 rows are shown. You can easily change this behavior by changing the values of `ENV["LINES"]` and `ENV["COLUMNS"]`.

In [17]:
withenv("LINES" => 10, "COLUMNS" => 200) do
    show(df)
end

100×100 DataFrame
 Row │ x1        x2         x3         x4        x5         x6        x7         x8         x9        x10        x11       x12       x13        x14       x15       x16       x17       x18        x19 ⋯
     │ Float64   Float64    Float64    Float64   Float64    Float64   Float64    Float64    Float64   Float64    Float64   Float64   Float64    Float64   Float64   Float64   Float64   Float64    Flo ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.789962  0.0624244  0.735969   0.460358  0.995564   0.778658  0.653409   0.8649     0.744414  0.718615   0.817303  0.842155  0.90281    0.678668  0.196716  0.400046  0.975851  0.408243   0.1 ⋯
  ⋮  │    ⋮          ⋮          ⋮         ⋮          ⋮         ⋮          ⋮          ⋮         ⋮          ⋮         ⋮         ⋮          ⋮         ⋮         ⋮         ⋮         ⋮
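
As an alternative sketch that does not touch environment variables, `show` also takes the `allrows` and `allcols` keyword arguments. The example below renders into buffers so that the two modes can be compared; the `:limit` and `:displaysize` context values are assumptions chosen here to mimic a small 20×80 terminal.

```julia
using DataFrames

df = DataFrame(rand(100, 100), :auto)

# Truncated rendering: ask for limited output on an assumed 20x80 display.
buf = IOBuffer()
show(IOContext(buf, :limit => true, :displaysize => (20, 80)), df)
truncated = String(take!(buf))

# Full rendering: print every row and every column.
buf = IOBuffer()
show(buf, df, allrows=true, allcols=true)
full = String(take!(buf))
```

In interactive use, `show(df, allcols=true)` or `show(df, allrows=true)` is usually enough.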

## Most elementary get and set operations
Given the `DataFrame` `x` we have created earlier, here are various ways to grab one of its columns as a `Vector`.

In [18]:
x = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

Row,A,B,C
,Int64,Float64?,String
1,1,1.0,a
2,2,missing,b


All of these get the vector stored in our `DataFrame` without copying it:

In [19]:
x.A, x[!, 1], x[!, :A]

([1, 2], [1, 2], [1, 2])

The same works with string indexing:

In [20]:
x."A", x[!, "A"]

([1, 2], [1, 2])

Note that indexing rows with `:` creates a copy:

In [21]:
x[:, 1]

2-element Vector{Int64}:
 1
 2

In [22]:
x[:, 1] === x[:, 1]

false
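
To make the practical difference concrete, here is a minimal sketch on a fresh one-column data frame `d` (a name introduced only for this illustration): writing through the vector obtained with `!` changes the data frame, while writing to the copy obtained with `:` does not.

```julia
using DataFrames

d = DataFrame(A=[1, 2])

shared = d[!, :A]   # the very vector stored inside d
copied = d[:, :A]   # a fresh copy of that vector

shared[1] = 100     # visible through d
copied[2] = -1      # d is unaffected

d.A                 # [100, 2]
```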

To grab one row as a `DataFrame`, we can index as follows.

In [23]:
x[1:1, :]

Row,A,B,C
,Int64,Float64?,String
1,1,1.0,a


This produces a `DataFrameRow`, which is treated as a 1-dimensional object similar to a `NamedTuple`:

In [24]:
x[1, :]

Row,A,B,C
,Int64,Float64?,String
1,1,1.0,a
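
A short sketch of how such a row can be used, again on a fresh data frame `d` introduced for illustration. A `DataFrameRow` is a view, so assigning to one of its fields writes through to the parent data frame; converting it with `NamedTuple` gives an independent snapshot.

```julia
using DataFrames

d = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

r = d[1, :]                     # DataFrameRow: a view of row 1 of d
vals = (r.A, r[:B], r["C"])     # read by property, Symbol, or String

nt = NamedTuple(r)              # independent snapshot of the row

r.A = 10                        # writes through to d
d[1, :A]                        # 10
```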


We can grab a single cell or element with the same syntax that is used to grab an element of an array.

In [25]:
x[1, 1]

1

or a new `DataFrame` that is a subset of rows and columns

In [26]:
x[1:2, 1:2]

Row,A,B
,Int64,Float64?
1,1,1.0
2,2,missing


You can also use a `Regex` to select columns, and `Not` from InvertedIndices.jl to select both rows and columns:

In [27]:
x[Not(1), r"A"]

Row,A
,Int64
1,2


`!` indicates that underlying columns are not copied

In [28]:
x[!, Not(1)]

Row,B,C
,Float64?,String
1,1.0,a
2,missing,b


`:` means that the columns will get copied

In [29]:
x[:, Not(1)]

Row,B,C
,Float64?,String
1,1.0,a
2,missing,b


A scalar can be assigned to a range of cells of a data frame using broadcasting:

In [30]:
x[1:2, 1:2] .= 1
x

Row,A,B,C
,Int64,Float64?,String
1,1,1.0,a
2,1,1.0,b


A vector whose length equals the number of assigned rows can also be assigned using broadcasting:

In [31]:
x[1:2, 1:2] .= [1, 2]
x

Row,A,B,C
,Int64,Float64?,String
1,1,1.0,a
2,2,2.0,b


Another data frame of matching size and column names can be assigned as well, again using broadcasting:

In [32]:
x[1:2, 1:2] .= DataFrame([5 6; 7 8], [:A, :B])
x

Row,A,B,C
,Int64,Float64?,String
1,5,6.0,a
2,7,8.0,b


**Caution**

With the `df[!, :col]` and `df.col` syntax you get direct (non-copying) access to a column of a data frame.
This is potentially unsafe, as you can easily corrupt data in the `df` data frame if you resize, sort, etc. the column obtained in this way.
Therefore such access should be used with caution.

Similarly, `df[!, cols]`, when `cols` is a collection of columns, produces a new data frame that holds the same (not copied) columns as the source `df` data frame. Modifying the data frame obtained via `df[!, cols]` might therefore cause problems with the consistency of `df`.

The `df[:, :col]` and `df[:, cols]` syntaxes always copy columns, so they are safe to use (and should generally be preferred except for performance- or memory-critical use cases).

Here are examples of how `Cols` and `Between` can be used to select columns of a data frame.

In [33]:
x = DataFrame(rand(4, 5), :auto)

Row,x1,x2,x3,x4,x5
,Float64,Float64,Float64,Float64,Float64
1,0.143664,0.243967,0.541132,0.90432,0.253281
2,0.649823,0.974319,0.523402,0.547555,0.906718
3,0.532381,0.154004,0.618354,0.0428888,0.897352
4,0.878731,0.371432,0.413427,0.829586,0.771033


In [34]:
x[:, Between(:x2, :x4)]

Row,x2,x3,x4
,Float64,Float64,Float64
1,0.243967,0.541132,0.90432
2,0.974319,0.523402,0.547555
3,0.154004,0.618354,0.0428888
4,0.371432,0.413427,0.829586


In [35]:
x[:, Cols("x1", Between("x2", "x4"))]

Row,x1,x2,x3,x4
,Float64,Float64,Float64,Float64
1,0.143664,0.243967,0.541132,0.90432
2,0.649823,0.974319,0.523402,0.547555
3,0.532381,0.154004,0.618354,0.0428888
4,0.878731,0.371432,0.413427,0.829586
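
The selectors can also be combined with each other. The sketch below, on a fresh random data frame `d` with the same column names, shows a few combinations that may be handy: moving a column to the front, excluding a range, and mixing a regex with a named column.

```julia
using DataFrames

d = DataFrame(rand(4, 5), :auto)

moved = d[:, Cols(:x5, :)]             # x5 first, then the remaining columns
outer = d[:, Not(Between(:x2, :x4))]   # everything outside the x2..x4 range
mixed = d[:, Cols(r"x[12]", :x5)]      # regex matches plus one named column
```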


## Views
You can create a view of a `DataFrame` with `@view` (it is more efficient than creating a materialized selection). Here are the possible return value options.

In [36]:
@view x[1:2, 1]

2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.14366370663453387
 0.6498227286696711

In [37]:
@view x[1, 1]

0-dimensional view(::Vector{Float64}, 1) with eltype Float64:
0.14366370663453387

A `DataFrameRow`, the same as for `x[1, 1:2]` without a view:

In [38]:
@view x[1, 1:2]

Row,x1,x2
,Float64,Float64
1,0.143664,0.243967


A `SubDataFrame`:

In [39]:
@view x[1:2, 1:2]

Row,x1,x2
,Float64,Float64
1,0.143664,0.243967
2,0.649823,0.974319
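
Because a view does not copy data, writing to it changes the parent data frame. A minimal sketch, using a small data frame `d` introduced only for this illustration:

```julia
using DataFrames

d = DataFrame(a=[1, 2, 3], b=[4, 5, 6])

sdf = @view d[2:3, :]   # SubDataFrame over rows 2 and 3 of d
sdf[1, :a] = 20         # row 1 of the view is row 2 of d

d.a                     # [1, 20, 3]
parent(sdf) === d       # true: the view keeps a reference to d
```

If an independent data frame is needed later, `copy(sdf)` or `DataFrame(sdf)` materializes the view.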


## Adding new columns to a data frame

In [40]:
df = DataFrame()

Using `setproperty!` (that is, the `df.col = value` syntax):

In [41]:
x = [1, 2, 3]
df.a = x
df

Row,a
,Int64
1,1
2,2
3,3


No copy is performed (`df.a` and `x` are the same object in memory):

In [42]:
df.a === x

true

Using `setindex!`:

In [43]:
df[!, :b] = x
df[:, :c] = x
df

Row,a,b,c
,Int64,Int64,Int64
1,1,1,1
2,2,2,2
3,3,3,3


With `!`, no copy is performed:

In [44]:
df.b === x

true

With `:`, the vector is copied, so here `!` and `:` have different effects:

In [45]:
df.c === x

false

Element-wise assignment

In [46]:
df[!, :d] .= x
df[:, :e] .= x
df

Row,a,b,c,d,e
,Int64,Int64,Int64,Int64,Int64
1,1,1,1,1,1
2,2,2,2,2,2
3,3,3,3,3,3


Both copy, so in this case `!` and `:` have the same effect:

In [47]:
df.d === x, df.e === x

(false, false)

Note that in our data frame columns `:a` and `:b` store the vector `x` itself (not a copy):

In [48]:
df.a === df.b === x

true

This can lead to silent errors. For example this code leads to a bug (note that calling `pairs` on `eachcol(df)` creates an iterator of (column name, column) pairs):

In [49]:
try
    for (n, c) in pairs(eachcol(df))
        println("$n: ", pop!(c))
    end
catch e
    show(e)
end

a: 3
b: 2
c: 3
d: 3
e: 3


Note that for column `:b` we printed `2`, as `3` was removed from it when we used `pop!` on column `:a`.
Such mistakes sometimes happen. Because of this, DataFrames.jl performs consistency checks before doing an expensive operation (most notably before showing a data frame).

In [50]:
try
    show(df)
catch e
    show(e)
end

AssertionError("Data frame is corrupt: length of column :c (2) does not match length of column 1 (1). The column vector has likely been resized unintentionally (either directly or because it is shared with another data frame).")

We can investigate the columns to find out what happened:

In [51]:
collect(pairs(eachcol(df)))

5-element Vector{Pair{Symbol, AbstractVector}}:
 :a => [1]
 :b => [1]
 :c => [1, 2]
 :d => [1, 2]
 :e => [1, 2]

The output confirms that the data frame `df` got corrupted.
DataFrames.jl supports a complete set of `getindex`, `getproperty`, `setindex!`, `setproperty!`, `view`, broadcasting, and broadcasting assignment operations. The details are explained here: http://juliadata.github.io/DataFrames.jl/latest/lib/indexing/.
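
One way to avoid the aliasing problem shown above is to let the `DataFrame` constructor copy its inputs, which it does by default; passing `copycols=false` opts out of that. The following is a minimal sketch with a vector `v` introduced for illustration.

```julia
using DataFrames

v = [1, 2, 3]

safe = DataFrame(a=v, b=v)                     # each column is a separate copy
shared = DataFrame(a=v, b=v, copycols=false)   # both columns alias v

safe.a === safe.b       # false
shared.a === shared.b   # true

push!(v, 4)             # safe is unaffected; shared now has columns of length 4
```

After the `push!`, `shared` still has columns of equal length only because both of them alias the same vector; with mixed aliased and copied columns it would be corrupted in the way shown above.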

## Comparisons

In [52]:
using DataFrames

In [53]:
df = DataFrame(rand(2, 3), :auto)

Row,x1,x2,x3
,Float64,Float64,Float64
1,0.938257,0.811633,0.362354
2,0.975227,0.753141,0.141313


In [54]:
df2 = copy(df)

Row,x1,x2,x3
,Float64,Float64,Float64
1,0.938257,0.811633,0.362354
2,0.975227,0.753141,0.141313


`==` compares column names and contents:

In [55]:
df == df2

true

Create a minimally different data frame and use `isapprox` for comparison:

In [56]:
df3 = df2 .+ eps()

Row,x1,x2,x3
,Float64,Float64,Float64
1,0.938257,0.811633,0.362354
2,0.975227,0.753141,0.141313


In [57]:
df == df3

false

In [58]:
isapprox(df, df3)

true

In [59]:
isapprox(df, df3, atol=eps() / 2)

false
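
The last two results follow from how tolerances default in Base `isapprox`, and the data frame method appears to use the same convention: when `atol` is left at zero, the relative tolerance defaults to `sqrt(eps())`, but once a positive `atol` is given, the relative tolerance defaults to zero. A scalar sketch of the same effect:

```julia
a = 1.0
b = a + eps()                         # the next Float64 after 1.0

exact  = a == b                       # false: the values differ by eps()
loose  = isapprox(a, b)               # true: default rtol is sqrt(eps())
strict = isapprox(a, b, atol=eps()/2) # false: atol > 0 makes rtol default to 0
```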

`missing` values are handled as in Julia Base:

In [60]:
df = DataFrame(a=missing)

Row,a
,Missing
1,missing


The equality test returns `missing`:

In [61]:
df == df

missing

The same object?

In [62]:
df === df

true

In [63]:
isequal(df, df)

true
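
For reference, this is the same three-valued logic that Base applies to scalars and arrays, sketched below without any data frame. `==` returns `missing` when the answer depends on a missing value, `false` as soon as some pair of elements definitely differs, and `isequal` always returns a `Bool`.

```julia
r1 = missing == missing                   # missing
r2 = isequal(missing, missing)            # true
r3 = [1, missing] == [1, missing]         # missing: equality cannot be decided
r4 = [1, missing] == [2, missing]         # false: the first elements already differ
r5 = isequal([1, missing], [1, missing])  # true
```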

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*