Academic Year 105, Second Semester   Big Data Analysis: Homework (1)   Department/Year:    Name:
###############################################################################
# Students taking this course should type out the programs in [Week03-Lecture] themselves as practice
# (you may discuss with each other, but you may not copy code)
# http://www.hmwu.idv.tw/web/R/B01-1-hmwu_R-DataManipulation.pdf
# Scope: Pages 43/112~90/112 (the dplyr and tidyr parts)
#
# Upload deadline: 2017/04/02, 24:00
# Upload to: course website, [Homework/Exam Upload Area], account: bigdata105  password: 1fxx (classroom number)
# Upload directory: 20170402-HW1-dplyr-tidyr
#
# You may write your own comments, take notes, or modify the code and run it.
# Don't just type along with the lecture notes; think about what each line means as you type.
# The ability to learn on your own really matters! Your life is yours, not your teacher's.
# Don't cut corners or be dishonest. Typing (or copying) some code at random and uploading it
# just to get by wastes time and is meaningless.
# Learning to program is inherently hard and tedious; if one day you get the chance to use
# the skills you have learned, you will be happy~
# For any other questions, send the instructor a private message on FB. Keep it up!!
#
# Paste the code you ran and its output (as in the example below)
###############################################################################
> # 45/112
> install.packages("nycflights13")
Installing package into ‘C:/Users/userpc/Documents/R/win-library/3.3’
(as ‘lib’ is unspecified)
Warning: package ‘nycflights13’ is in use and will not be installed
> library(nycflights13) # load the package
> dim(flights) # check the dimensions of the data
[1] 336776     19
> head(flights, 4)
# A tibble: 4 × 19
   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
 1 2013     1   1      517            515         2      830            819        11      UA   1545  N14228    EWR
 2 2013     1   1      533            529         4      850            830        20      UA   1714  N24211    LGA
 3 2013     1   1      542            540         2      923            850        33      AA   1141  N619AA    JFK
 4 2013     1   1      544            545        -1     1004           1022       -18      B6    725  N804JB    JFK
# ... with 6 more variables: dest, air_time, distance, hour, minute, time_hour
>
> # 46/112
> library(dplyr)
> filter(flights, month == 1, day == 1) # write your own comment here
# A tibble: 842 × 19
   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
 1 2013     1   1      517            515         2      830            819        11      UA   1545  N14228    EWR
 2 2013     1   1      533            529         4      850            830        20      UA   1714  N24211    LGA
 3 2013     1   1      542            540         2      923            850        33      AA   1141  N619AA    JFK
 4 2013     1   1      544            545        -1     1004           1022       -18      B6    725  N804JB    JFK
 5 2013     1   1      554            600        -6      812            837       -25      DL    461  N668DN    LGA
 6 2013     1   1      554            558        -4      740            728        12      UA   1696  N39463    EWR
 7 2013     1   1      555            600        -5      913            854        19      B6    507  N516JB    EWR
 8 2013     1   1      557            600        -3      709            723       -14      EV   5708  N829AS    LGA
 9 2013     1   1      557            600        -3      838            846        -8      B6     79  N593JB    JFK
10 2013     1   1      558            600        -2      753            745         8      AA    301  N3ALAA    LGA
# ... with 832 more rows, and 6 more variables: dest, air_time, distance, hour, minute,
#   time_hour
> flights[flights$month == 1 & flights$day == 1, ]
# A tibble: 842 × 19
   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
 1 2013     1   1      517            515         2      830            819        11      UA   1545  N14228    EWR
 2 2013     1   1      533            529         4      850            830        20      UA   1714  N24211    LGA
 3 2013     1   1      542            540         2      923            850        33      AA   1141  N619AA    JFK
 4 2013     1   1      544            545        -1     1004           1022       -18      B6    725  N804JB    JFK
 5 2013     1   1      554            600        -6      812            837       -25      DL    461  N668DN    LGA
 6 2013     1   1      554            558        -4      740            728        12      UA   1696  N39463    EWR
 7 2013     1   1      555            600        -5      913            854        19      B6    507  N516JB    EWR
 8 2013     1   1      557            600        -3      709            723       -14      EV   5708  N829AS    LGA
 9 2013     1   1      557            600        -3      838            846        -8      B6     79  N593JB    JFK
10 2013     1   1      558            600        -2      753            745         8      AA    301  N3ALAA    LGA
# ... with 832 more rows, and 6 more variables: dest, air_time, distance, hour, minute,
#   time_hour
> table(flights$carrier)

   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US    VX    WN    YV
18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536  5162 12275   601
> filter(flights, carrier %in% c("OO", "YV"))
# A tibble: 633 × 19
   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
 1 2013     1   3     1428           1435        -7     1539           1559       -20      YV   3750  N509MJ    LGA
 2 2013     1   3     1551           1602       -11     1659           1722       -23      YV   3771  N508MJ    LGA
 3 2013     1   4     1430           1435        -5     1546           1559       -13      YV   3750  N511MJ    LGA
 4 2013     1   4     1731           1602        89     1837           1722        75      YV   3771  N513MJ    LGA
 5 2013     1   6     1557           1605        -8     1714           1729       -15      YV   3771  N511MJ    LGA
 6 2013     1   7     1430           1435        -5     1541           1559       -18      YV   3750  N507MJ    LGA
 7 2013     1   7     1556           1602        -6     1721           1722        -1      YV   3771  N509MJ    LGA
 8 2013     1   8     1432           1435        -3     1537           1559       -22      YV   3750  N513MJ    LGA
 9 2013     1   8     1555           1602        -7     1727           1722         5      YV   3771  N506MJ    LGA
10 2013     1   9     1432           1435        -3     1543           1559       -16      YV   3750  N505MJ    LGA
# ... with 623 more rows, and 6 more variables: dest, air_time, distance, hour, minute,
#   time_hour
> filter(flights, carrier != "AA") # try modifying the code
# A tibble: 304,047 × 19
   year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin
 1 2013     1   1      517            515         2      830            819        11      UA   1545  N14228    EWR
 2 2013     1   1      533            529         4      850            830        20      UA   1714  N24211    LGA
 3 2013     1   1      544            545        -1     1004           1022       -18      B6    725  N804JB    JFK
 4 2013     1   1      554            600        -6      812            837       -25      DL    461  N668DN    LGA
 5 2013     1   1      554            558        -4      740            728        12      UA   1696  N39463    EWR
 6 2013     1   1      555            600        -5      913            854        19      B6    507  N516JB    EWR
 7 2013     1   1      557            600        -3      709            723       -14      EV   5708  N829AS    LGA
 8 2013     1   1      557            600        -3      838            846        -8      B6     79  N593JB    JFK
 9 2013     1   1      558            600        -2      849            851        -2      B6     49  N793JB    JFK
10 2013     1   1      558            600        -2      853            856        -3      B6     71  N657JB    JFK
# ... with 304,037 more rows, and 6 more variables: dest, air_time, distance, hour, minute,
#   time_hour
>
>
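# Supplementary sketch (not from the lecture notes): the filter() calls in the
# transcript can be checked by hand on a table small enough to read at a glance.
# The data frame `toy` and all of its values are made up for illustration; only
# library(dplyr), filter(), %in%, and the base-R indexing form come from the
# transcript. It assumes the dplyr package is already installed.

```r
library(dplyr)

# Hypothetical toy data (NOT taken from nycflights13)
toy <- data.frame(
  month   = c(1, 1, 2, 1),
  day     = c(1, 2, 1, 1),
  carrier = c("UA", "AA", "OO", "YV"),
  stringsAsFactors = FALSE
)

# Conditions separated by commas in filter() are combined with AND
by_filter <- filter(toy, month == 1, day == 1)

# The same selection written with base-R logical indexing
by_index <- toy[toy$month == 1 & toy$day == 1, ]

# %in% keeps rows whose value matches any element of the vector
oo_or_yv <- filter(toy, carrier %in% c("OO", "YV"))

# != drops the rows that match the given value
not_aa <- filter(toy, carrier != "AA")

print(by_filter)
print(oo_or_yv)
print(not_aa)
```

# On this toy table both selection styles keep the same two rows; working through
# a case this small is one way to confirm what each condition means before
# running it on the full flights data.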