fetch_flight_delays
- skrub.datasets.fetch_flight_delays(data_home=None)
Fetch the flight delays dataset (regression), available at https://github.com/skrub-data/skrub-data-files
This is a regression use-case, where the goal is to predict flight delays. Size on disk: 657 MB.
- Parameters:
- data_home
str or path-like, default=None The directory in which to download and unzip the files.
- Returns:
- bunch
Bunch A dictionary-like object with the following keys:
- flights : DataFrame of shape (2370030, 12)
Information about the flights, including departure and arrival airports, and delay.
- airports : DataFrame of shape (3376, 7)
Information about airports, such as city and coordinates. The airport’s iata can be matched to the flights’ Origin and Dest.
- weather : DataFrame of shape (11282238, 5)
Weather data that could be used to help improve the delay predictions. Note that the weather data is not measured at the airports directly but at weather stations, whose location and information are provided in stations.
- stations : DataFrame of shape (124245, 9)
Information about the weather stations. weather and stations can be joined on their ID columns. Weather stations can only be matched to the nearest airport based on latitude and longitude.
- metadata : dict
A dictionary containing the name of the dataset.
- flights_path : str
The path to the flights CSV file.
- airports_path : str
The path to the airports CSV file.
- weather_path : str
The path to the weather CSV file.
- stations_path : str
The path to the stations CSV file.
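The sketch below illustrates how the returned tables might be used. It is a minimal example under stated assumptions, not part of skrub: `load_flight_tables`, `haversine_km`, and `nearest_airport` are hypothetical helper names, and the coordinates are illustrative values rather than rows from the dataset. The loader only wraps the documented `fetch_flight_delays` call. The matching helper shows one possible way to pair a weather station with its nearest airport by latitude and longitude, using the great-circle (haversine) distance. The column names that hold coordinates in `airports` and `stations` are not stated above, so extracting `(iata, lat, lon)` tuples from the DataFrames is left to the reader.

```python
import math


def load_flight_tables(data_home=None):
    """Return the four tables as a plain dict.

    Hypothetical convenience wrapper; the first call downloads the
    dataset (about 657 MB on disk).
    """
    # Imported lazily so the helpers below work without skrub installed.
    from skrub.datasets import fetch_flight_delays

    bunch = fetch_flight_delays(data_home=data_home)
    return {
        "flights": bunch.flights,
        "airports": bunch.airports,
        "weather": bunch.weather,
        "stations": bunch.stations,
    }


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    # 6371 km is the mean Earth radius.
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_airport(station_lat, station_lon, airports):
    """Return (iata, distance_km) of the closest airport.

    `airports` is an iterable of (iata, lat, lon) tuples.
    """
    return min(
        (
            (iata, haversine_km(station_lat, station_lon, lat, lon))
            for iata, lat, lon in airports
        ),
        key=lambda pair: pair[1],
    )


# Illustrative coordinates only; these are not taken from the dataset.
airports = [("JFK", 40.64, -73.78), ("LAX", 33.94, -118.41)]
code, dist_km = nearest_airport(40.70, -74.20, airports)
print(code, round(dist_km, 1))
```

This brute-force search compares every station with every airport. With 124245 stations and 3376 airports that is several hundred million distance computations, so a vectorized or spatially indexed approach would likely be preferable on the full tables.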
Gallery examples
Spatial join for flight data: Joining across multiple columns
Interpolation join: infer missing rows when joining two tables