In-Memory Analytics with Apache Arrow

This is the code repository for In-Memory Analytics with Apache Arrow, published by Packt.

Accelerate data analytics for efficient processing of flat and hierarchical data structures

What is this book about?

Apache Arrow is an open source, columnar in-memory data format designed for efficient data processing and analytics. This book harnesses the author’s 15 years of experience to show you a standardized way to work with tabular data across various programming languages and environments, enabling high-performance data processing and exchange.

This book covers the following exciting features:

Use Apache Arrow libraries to access data files, both locally and in the cloud
Understand the zero-copy elements of the Apache Arrow format
Improve the read performance of data pipelines by memory-mapping Arrow files
Produce and consume Apache Arrow data efficiently by sharing memory with the C API
Leverage the Arrow compute engine, Acero, to perform complex operations
Create Arrow Flight servers and clients for transferring data quickly
Build the Arrow libraries locally and contribute to the community

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

>>> import numba.cuda
>>> import pyarrow as pa
>>> from pyarrow import cuda
>>> import numpy as np
>>> from pyarrow.cffi import ffi

Following is what you need for this book: This book is for developers, data engineers, and data scientists looking to explore the capabilities of Apache Arrow from the ground up. Whether you’re building utilities for data analytics and query engines, or building full pipelines with tabular data, this book can help you out regardless of your preferred programming language. A basic understanding of data analysis concepts is helpful, but not required. Code examples are provided using C++, Python, and Go throughout the book.

With the following software and hardware list, you can run all code files present in the book (Chapters 1-12).

Software and Hardware List

Chapter	Software required	OS required
1-12	Python 3.8 or higher	Windows, macOS, or Linux
1-12	C++ compiler capable of C++17 or higher	Windows, macOS, or Linux
1-12	conda/mamba (optional)	Windows, macOS, or Linux
1-12	vcpkg (optional)	Windows
1-12	MSYS2 (optional)	Windows
1-12	CMake 3.16 or higher	Windows, macOS, or Linux
1-12	make or ninja	macOS or Linux
1-12	Docker	Windows, macOS, or Linux
1-12	Go 1.21 or higher	Windows, macOS

Related products

Polars Cookbook [Packt] [Amazon]
Getting Started with DuckDB [Packt] [Amazon]

Get to Know the Author

Matt Topol is a software engineering enthusiast with roots at Brooklyn Polytechnic (now NYU-Poly). He joined FactSet Research Systems, Inc. in 2009, specializing in financial data systems. Matt’s career spans infrastructure and app development, team leadership, and architecting large-scale distributed systems for financial analytics. He is a key member of the Apache Arrow Project Management Committee (PMC), dedicated to expanding the Arrow community. Recently, he joined Voltron Data to focus on Arrow’s Golang library, promoting it globally through conferences and talks. Originally from Brentwood, NY, he now resides in Connecticut. Outside of work, Matt enjoys coding, creating intricate fantasy games, and enthusiastically sharing his expertise.
