Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

0 votes
202 views
in Technique by (71.8m points)

r - html_nodes returns an empty list for a complex table

I would like to scrape the table on the following website: https://www.timeshighereducation.com/world-university-rankings/2021/world-ranking#!/page/0/length/25/sort_by/rank/sort_order/asc/cols/stats I am using the following code, but it is not working: html_nodes returns an empty list. Thank you in advance.

library(rvest)
library(xml2)
library(dplyr)
link <- "https://www.timeshighereducation.com/world-university-rankings/2021/world-ranking#!/page/0/length/25/sort_by/rank/sort_order/asc/cols/stats"
page <- read_html(link)

rank <- page %>% html_nodes(".sorting_2") %>% html_text()
university <- page %>% html_nodes(".ranking-institution-title") %>% html_text()
statistics <- page %>% html_nodes(".stats") %>% html_text()


1 Answer

0 votes
by (71.8m points)

The Terms and Services of this site state that you may not: "Use data mining, robot, spider, scraping or similar automated data gathering, extraction or publication tools for any purpose."

That being said, you can read the JSON file that @QHarr found:

library(jsonlite)
url <- "https://www.timeshighereducation.com/sites/default/files/the_data_rankings/world_university_rankings_2021_0__fa224219a267a5b9c4287386a97c70ea.json"
x <- read_json(url, simplifyVector = TRUE)
head(x$data) # first rows of the data frame with the universities

Now you have a well-structured R list. The $data element contains a data frame with one row per university and the stats in columns. The other 3 list elements only provide supplementary information.
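
As a rough illustration of working with such a list, here is a minimal sketch that parses a small inline payload instead of the real file, so it runs offline. The payload and every field name in it (rank, name, location, scores_overall, and the three extra top-level elements) are assumptions made up for the example; check names(x) and names(x$data) on the real object to see what is actually there.

```r
library(jsonlite)

# Toy payload imitating the assumed layout: a "data" array of records plus
# three supplementary top-level elements. All names here are hypothetical.
payload <- '{
  "data": [
    {"rank": "1", "name": "University A", "location": "Country X", "scores_overall": "95.6"},
    {"rank": "2", "name": "University B", "location": "Country Y", "scores_overall": "94.9"},
    {"rank": "3", "name": "University C", "location": "Country X", "scores_overall": "94.0"}
  ],
  "subjects": [],
  "locations": [],
  "pillars": []
}'

x <- fromJSON(payload, simplifyVector = TRUE)

names(x)     # top-level list elements
str(x$data)  # data frame, one row per university

# If values arrive as strings, as in this toy payload, convert them
# before sorting or filtering numerically.
rankings <- x$data
rankings$scores_overall <- as.numeric(rankings$scores_overall)

# Keep a few columns for the universities in one (hypothetical) location.
subset_x <- rankings[rankings$location == "Country X",
                     c("rank", "name", "scores_overall")]
print(subset_x)
```

The same steps should carry over to the object returned by read_json(url, simplifyVector = TRUE), with the real column names substituted.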

