A comprehensive guide to accessing cloud storage (S3, GCS, Azure) and remote filesystems in Python. It covers three major libraries (fsspec, pyarrow.fs, and obstore) and their integration with data engineering tools.

| Feature | fsspec | pyarrow.fs | obstore |
|---|---|---|---|
| Best For | Broad compatibility, ecosystem integration | Arrow-native workflows, Parquet | High-throughput, performance-critical |
| Backends | S3, GCS, Azure, HTTP, FTP, 20+ more | S3, GCS, HDFS, local | S3, GCS, Azure, local |
| Performance | Good (with caching) | Excellent for Parquet | 9x faster for concurrent ops |
| Dependencies | Backend-specific (s3fs, gcsfs) | Bundled with PyArrow | Zero Python deps (Rust) |
| Async Support | Yes (aiohttp) | Limited | Native sync/async |
| DataFrame Integration | Universal | PyArrow-native | Via fsspec wrapper |
| Maturity | Very mature (2018+) | Mature | New (2025), rapidly evolving |
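
To make the fsspec column a little more concrete, here is a minimal sketch of its URL-based access pattern. It assumes `fsspec` is installed and uses the built-in `memory://` backend so it runs without credentials or network access. The helper name `roundtrip` and the path `memory://demo/hello.txt` are illustrative and not part of this guide.

```python
import fsspec


def roundtrip(url: str, payload: bytes) -> bytes:
    """Write payload to the given URL, then read it back through fsspec."""
    # fsspec picks the backend from the URL scheme (memory://, s3://, gs://, ...).
    with fsspec.open(url, "wb") as f:
        f.write(payload)
    with fsspec.open(url, "rb") as f:
        return f.read()


# memory:// is fsspec's in-process backend; the path is a hypothetical example.
result = roundtrip("memory://demo/hello.txt", b"hello")
print(result)
```

In principle the same helper should work with a URL such as `s3://...` or `gs://...`, provided the matching backend package from the Dependencies row (s3fs, gcsfs) is installed and credentials are configured. That is the trade-off the table describes: one interface, backend-specific installs.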