fetch_flight_delays

skrub.datasets.fetch_flight_delays(data_home=None)

Fetch the flight delays dataset (regression), available at https://github.com/skrub-data/skrub-data-files.

This is a regression use case where the goal is to predict flight delays. Size on disk: 657 MB.

Parameters:
data_home : str or path-like, default=None

The directory in which to download and unzip the files.

Returns:
bunch : Bunch

A dictionary-like object with the following keys:

flights : DataFrame of shape (2370030, 12)

Information about the flights, including departure and arrival airports, and delay.

airports : DataFrame of shape (3376, 7)

Information about airports, such as city and coordinates. Each airport's iata value can be matched to the Origin and Dest values in flights.

weather : DataFrame of shape (11282238, 5)

Weather data that could help improve the delay predictions. Note that the weather is not measured at the airports themselves but at weather stations, whose locations and other details are provided in stations.

stations : DataFrame of shape (124245, 9)

Information about the weather stations. weather and stations can be joined on their ID columns. There is no direct key linking stations to airports: a weather station can only be matched to its nearest airport, based on latitude and longitude.

metadata : dict

A dictionary containing the name of the dataset.

flights_path : str

The path to the flights CSV file.

airports_path : str

The path to the airports CSV file.

weather_path : str

The path to the weather CSV file.

stations_path : str

The path to the stations CSV file.
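
Because stations and airports share no key, linking them requires a nearest-neighbour match on coordinates. The sketch below illustrates one possible approach on toy rows, using only the standard library and the haversine great-circle distance. It is not part of skrub: the helper names (haversine_km, nearest_station), the coordinate field names (lat, long), and the sample rows are assumptions made for illustration, and the real column names should be checked on the DataFrames in the returned bunch. Here each airport is assigned its closest station; the same function can be applied in the other direction.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * r * math.asin(math.sqrt(a))


def nearest_station(airport, stations):
    """Return the station row closest to the given airport row (hypothetical helper)."""
    return min(
        stations,
        key=lambda s: haversine_km(
            airport["lat"], airport["long"], s["lat"], s["long"]
        ),
    )


# Toy rows; the field names "lat" and "long" are assumptions, not the
# confirmed column names of the real airports and stations tables.
airports = [
    {"iata": "JFK", "lat": 40.64, "long": -73.78},
    {"iata": "LAX", "lat": 33.94, "long": -118.41},
]
stations = [
    {"ID": "S1", "lat": 40.78, "long": -73.97},   # near New York
    {"ID": "S2", "lat": 34.05, "long": -118.24},  # near Los Angeles
    {"ID": "S3", "lat": 41.88, "long": -87.63},   # near Chicago
]

# Map each airport code to the ID of its nearest station.
airport_to_station = {
    a["iata"]: nearest_station(a, stations)["ID"] for a in airports
}
print(airport_to_station)  # {'JFK': 'S1', 'LAX': 'S2'}
```

This brute-force search compares every airport with every station, which is fine for a handful of rows but slow for the full tables (3376 airports and 124245 stations); a spatial index would likely be a better fit at that scale.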