- Getting Started with Rust
- Some links on Rust
- Borrowing and Lifetime Tricks
- Macros
- Cool Rust Projects
- Rust Error Handling
- Rust Concurrency
- Data Processing and Data Structures
- Rust and Scala/Java
- CLI and Misc
- IDE/Editor/Tooling
- Testing and CI/CD
- Performance and Low-Level Stuff
Getting Started with Rust
Why the developers who use Rust love it so much - from StackOverflow survey, really good quotes
If you want a Rust REPL, check out evcxr.
I highly recommend rust-analyzer to support fast compile checks, references, refactorings, etc. in your editor. VSCode works pretty well - install rust-analyzer and the "Even Better TOML" extension and you should be set.
- The Rust Book - probably the best starting point
- Rustlings - small exercises to learn
- Easy Rust Youtube Channel - great videos
- Rust By Example - also the guide on their site is pretty good.
- Complete(sh) Rust Cheat Sheet
- explaine.rs - paste Rust code into the window and hover over keywords to get explanations. Great for learning.
- Rustlang in a Nutshell - great introduction
- Mental models for learning Rust - really really short blurb on how to approach learning Rust
- Rust Borrowing and Ownership - easy-to-read, short summary of basic ownership, borrowing, and lifetime references
- A Java Programmer Understanding Rust Ownership
- Rust Error Handling for Pythonistas
- Zero to Production in Rust
Easy short intros:
- A Rust Gem: The Rust Map API - comparing C++, Java, and Rust Map APIs and why Option and not having nulls makes the Rust Map API superior
Online resources and help:
- cheats.rs - Awesome quick ref.
- The Rust Discord #beginners channel has been pretty helpful for me
- Rust IRC channel
- Rust for Rubyists
- Rust Playpen - closest thing to a REPL :(
- makepad - Web-based Rust + WebASM multimedia playground
- Awesome Rust Mentors
Some links on Rust
- Rust Design Patterns - super helpful resource
- What you Can't Do in Rust and What To Do Instead - great guide for anti-patterns
- Rust: A Unique Perspective - comprehensive summary about Rust ownership from angle of unique access, covers Rc/Arc etc.
- Rust is for Professionals - great perspective on what makes Rust unique and so appealing
- Tour of Rust's Standard Library Traits - really great detailed guide with an explanation about traits, generics, associated types, etc.
- Common Rust Lifetime Misconceptions -- a great detailed dive into nuances
- Learn Rust with Too Many Linked Lists - hilarious.
- Shared Mutability in Rust: Acyclic Graphs - really good article on mutable child entities, and how to share things which need to be mutable (hint: don't, instead use an "arena" pattern where a single owner mutates things)
- Jon Gjengset on Rust Lifetime Annotations - actually check out his Youtube channel, lots of great tutorials
- The Evolution of Rust Programmers - hilarious look at different coding styles
- Fireflowers: Rust in the words of its Practitioners - just brilliant commentary on what Rust is.
- Oxidizing the Interview - hilarious read on a Rust technical interview
- Rust and the Three Laws of Informatics - great detailed guide to how Rust allows developers to uncompromisingly achieve correctness, maintainability, AND efficiency
- Why Scientists are turning to Rust - from Nature mag
- Rust After the Honeymoon - by Bryan Cantrill, a list of top favorite Rust tricks/properties. Did you know that `{:#x?}` will pretty-print structs in HEX??
- Prefer Rust over C/C++ - when to and when not to prefer Rust
- C2Rust and Quake - a tool to auto translate C to Rust!
- Clear Explanation of Rust's Module System - easy intro guide
- On Rusts Module System - good explanation of paths, naming, modules -- see this when compiler complains about cannot find symbols
- Null The Billion Dollar Mistake and how Rust Provides a Solution
- Speed without wizardry - how using Rust is safer and better than using hacks in Javascript
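The `{:#x?}` trick mentioned in the list above is easy to try out. Here is a minimal sketch; the `Header` struct and `render` helper are made-up names for illustration only:

```rust
// Hypothetical struct, only here to have something to print.
#[derive(Debug)]
struct Header {
    magic: u32,
    flags: u8,
}

// `#` asks for the pretty (multi-line) Debug layout, `x` asks for lower-case hex integers.
fn render(h: &Header) -> String {
    format!("{:#x?}", h)
}

fn main() {
    let h = Header { magic: 0xCAFE_BABE, flags: 0x1f };
    println!("{}", render(&h));
}
```

Handy when dumping bit flags, offsets, or magic numbers from a struct without writing a custom `Debug` impl.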
Dealing with strings is confusing in Rust, because there are two types: a heap-allocated `String`, and `&str`, a borrowed pointer to a slice of string bytes. Knowing which to use, and defining structures on them, immediately exposes the steep learning curve of ownership.
See the Guide to Strings for some help.
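To make the `String` vs `&str` split concrete, here is a small sketch. The struct and function names (`OwnedName`, `BorrowedName`, `first_word`) are hypothetical, picked just for this example:

```rust
// Owns its text on the heap: no lifetime parameter needed.
struct OwnedName {
    name: String,
}

// Only borrows its text: the lifetime ties it to whatever owns the bytes.
struct BorrowedName<'a> {
    name: &'a str,
}

// Taking &str lets callers pass a &String or a string literal alike.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned = OwnedName { name: String::from("Ferris the crab") };
    // `borrowed` cannot outlive `owned`, and the compiler enforces that.
    let borrowed = BorrowedName { name: first_word(&owned.name) };
    println!("{} / {}", owned.name, borrowed.name);
}
```

A common rule of thumb: take `&str` in function arguments, and store `String` in structs unless you really want to deal with lifetime parameters.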
Specific topics:
- Default Values for Maintainability - short and easy guide
- Async Rust - A really concise and great intro to async/await
- Async Rust: Futures, Tasks, Wakers; Oh My! - another great concise intro, starting with basic async concepts/syntax and diving into details about Wakers and the Future mechanism
- Rust Async is Colored - great deep dive into async vs sync, connecting the two worlds, and implications
- Shared/Exclusive Refs, not Mutable/Immutable - excellent explanation from @dtolnay on thinking about `&mut T` as exclusive, not mutable. Also explains interior mutability and `RefCell` etc., and why they allow `&self` safely while providing mutation.
- Elegant library APIs in Rust - lots of good tips here
- Rain's Rust CLI Guide - how to write and organize Rust CLI apps
- Effectively using Iterators in Rust - on differences between `iter()`, `into_iter()`, types, etc.
- Generics and Associated Types - when to use each one
- Returning Iterators - really helpful article, this is not easy
  - Recursive Iterators in Rust - yelch, using Box
  - Internal-iterator - a potentially better solution for easily implementing some iterators
  - propane - Creating iterators via generator/yield API
- Generic Return Types in Rust - deep dive into `Iterator.collect()`, traits, and Rust's type system
- Rust-san - sanitizers for Rust code, if the basic compiler checks are not enough :)
- Pretty State Machines in Rust - great article on diff state machine patterns, use of enums and structs
- Init Struct Pattern - on patterns for initializing structs
- Rc and RefCell tricks - good explanations of the two
- COW, Rust vs C++ - great dive into details of copy-on-write. Might be a great pattern for working with things like strings, where cloning might be expensive.
- Magical Zero-Sized Types and Proofs - for type masochists
- Structural Typing in Rust - HLists, ability to use path-based and shape/signature based trait typing instead of by name
- How Rust Solved Dependency Hell - neat look at what's underneath Cargo to help solve dep issues. Rustc can handle multiple versions of a dependency.
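The shared/exclusive idea from the list above is easier to see in code. This is just a sketch using the standard library's `RefCell`; the `Counter` type is a made-up example:

```rust
use std::cell::RefCell;

// Hypothetical type: mutation happens behind a shared `&self`.
struct Counter {
    hits: RefCell<u32>,
}

impl Counter {
    fn new() -> Self {
        Counter { hits: RefCell::new(0) }
    }

    // Takes &self (shared), yet mutates: RefCell checks exclusivity at runtime.
    fn record(&self) {
        *self.hits.borrow_mut() += 1;
    }

    fn total(&self) -> u32 {
        *self.hits.borrow()
    }
}

fn main() {
    let c = Counter::new();
    c.record();
    c.record();
    println!("total = {}", c.total());

    // While a shared borrow is alive, an exclusive one is refused.
    let reader = c.hits.borrow();
    assert!(c.hits.try_borrow_mut().is_err());
    drop(reader);
    assert!(c.hits.try_borrow_mut().is_ok());
}
```

So `&self` really means "shared access", and the `RefCell` is what makes sure only one party gets exclusive access at any moment, just checked at runtime instead of compile time.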
Borrowing and Lifetime Tricks
If you need to borrow multiple items mutably from a Vec/array/SmallVec/etc.:
- The thread on solutions
- You can use split_at_mut() but this is clumsy
- Arref gives a great solution
- There is a nightly get_many_mut() API
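For reference, here is roughly what the `split_at_mut()` route looks like on stable Rust. The `two_mut` helper is a hypothetical name, and this is only a sketch of the pattern, not what Arref does internally:

```rust
// Borrow two distinct elements mutably by splitting the slice between them.
fn two_mut<T>(items: &mut [T], i: usize, j: usize) -> (&mut T, &mut T) {
    assert!(i != j, "indices must differ");
    if i < j {
        // left = [0, j), right = [j, len): i is in left, j is right[0]
        let (left, right) = items.split_at_mut(j);
        (&mut left[i], &mut right[0])
    } else {
        // left = [0, i), right = [i, len): j is in left, i is right[0]
        let (left, right) = items.split_at_mut(i);
        (&mut right[0], &mut left[j])
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4];
    let (a, b) = two_mut(&mut v, 3, 0);
    std::mem::swap(a, b);
    println!("{:?}", v);
}
```

It works, but you can see why it is clumsy: the index juggling has to be redone for every number of elements you want to borrow.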
If you have a Trait with an associated type that must deal with lifetimes: https://stackoverflow.com/questions/33734640/how-do-i-specify-lifetime-parameters-in-an-associated-type
Macros
I started writing Rust macros and it is not only lots of fun, but pretty essential for writing concise, performant code IMO. Writing Rust has lots of boilerplate sometimes, especially owing to not having real inheritance. I recommend starting with `macro_rules!` macros, which are fancy templates and really easy. Here are some links to help:
- Macros in Rust - a Tutorial - really easy tutorial, esp for `macro_rules!`
- Macros - a Methodical Introduction - very detailed book mostly about `macro_rules!` with explanations for the minutiae of parsing
- How to Write Hygienic Rust Macros - important and short. Read this to ensure your macros work everywhere - so users don't have to worry about imports, etc.
- Rust Macros case studies
- Overview of Macros in Rust - from Steve Klabnik
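To show the "fancy template" flavor of `macro_rules!`, here is a small sketch that stamps out several newtype structs from one definition. The macro name `newtype_id` and the generated types are made up for this example:

```rust
// One template, many structs: each identifier becomes its own ID type.
macro_rules! newtype_id {
    ($($name:ident),+ $(,)?) => {
        $(
            #[derive(Debug, Clone, Copy, PartialEq, Eq)]
            struct $name(u64);

            impl $name {
                fn value(self) -> u64 {
                    self.0
                }
            }
        )+
    };
}

// Expands to two separate types, so a UserId can't be passed where an OrderId is expected.
newtype_id!(UserId, OrderId);

fn main() {
    let u = UserId(7);
    let o = OrderId(42);
    println!("{:?} -> {}, {:?} -> {}", u, u.value(), o, o.value());
}
```

This is the kind of boilerplate that inheritance might cover in other languages, and that a ten-line macro handles here.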
Some crates that may help write macros:
- spez - match and specialize on the type of an expression. "A trick to do specialization in non-generic contexts on stable Rust"
- concat-ident - macro to concat multiple identifiers etc. and use the result, perhaps as a struct or method name. Very useful in macros
Cool Rust Projects
NOTE: there's a separate section for Data-related projects.
CLI tools:
- XSV - a fast CSV parsing and analysis tool
- zoxide - a supercharged replacement for cd with rank-based search of your most frequently used dirs
- mcfly - Upgraded, smarter Ctrl-R for bash etc. (note: fish users already have this built in, basically)
- Ripgrep - insanely fast grep utility, great for code searches. Shows off power of Rust regex library
- Bat - A super `cat` with syntax highlighting, git integration, other features
- Bottom - Cross-platform fancy `top` in Rust - process/sys mon with graphs, very useful!
- gitui - awesome, fast Git terminal UI. It will change your life!
- skim - `sk` is a general purpose fuzzy-finder; it can work with ripgrep and other utils too
- zellij - terminal mux/session detach like tmux/screen, but with a pretty UI and plugins
- pueue - instead of using tmux, queue and manage your background tasks
- xh - HTTPie clone / much better `curl` alternative
- Dust - Rust graphical-text faster and friendlier version of du
- Diskonaut - another Text-UI folder/file space usage and browsing tool
- fd - Rust CLI, friendlier and faster replacement for `find`
- rustscan - Really fast port scanner, this should easily replace lsof / netstat
- sd - Easier to use sed. You can search and replace in like all files under subdir with `sd old_str new_str **`.
- Nushell - Rust shell that turns all output into tabular data. Pretty cool!
- delta - git-delta: colorful git diff viewer
- ruplacer - Source code search and replace tool
- imagecli - CLI for image batch processing
- Hyperfine - Rust performance benchmarking CLI
- Alacritty - GPU accelerated terminal emulator
- jql - Rust version of popular `jq` JSON CLI processor, though not as powerful
- rq - a Record Query/Transform tool, translate CSV, Avro, CBOR, Json etc etc to and from each other
- htmlq - like jq but for HTML
- Starship - "The minimal, blazing-fast, and infinitely customizable prompt for any shell!"
- Kubesql - SQL queries for Kube metadata!
- grex - CLI tool to create regexes given a set of strings to match!
- Scaphandre - Metrics agent for collecting power consumption metrics!
- kdash - Text UI Kubernetes dashboard
- Josh - Cool git proxy allows you to treat part of a large monorepo like its own smaller git repo!
Wasm:
- Wasmer - general purpose WASM runtime
- Krustlet - WebAssembly (instead of containers) runtime on Kubernetes!! Use Rust + wasm + WASI for a truly portable k8s-based deploy!
- Extism - a universal WASM-based plugin system, multi language, but written in Rust
- lunatic - Erlang-like server side WASM runtime with supervision and channel-based message passing, plus hot reloading!
- CosmWasm - Rust/WASM for programming smart contract on Cosmos ecosystem
Others:
- TabNine - an ML-based autocompleter, written in Rust
- kiro - a CLI text editor with syntax highlighting, like a friendlier vim
- ox - another CLI/Text UI lightweight text editor
- async-std - the standard library with async APIs
- Convey - Layer 4 load balancer
- Ockam - End to end secure messaging lib/platform between cloud and IoT devices
- Parsec - abstraction layer for hardware security and cryptography
- Gazebo - useful utilities for all apps, by the Facebook Rust team. They also have blog posts such as on Dupe
Do Rust in Turkish, Spanish and other languages! :)
Languages etc.
- BLisp - a statically-typed Lisp built on top of Rust
- RustPython
Rust Error Handling
Error handling survey - really good summary of the Rust error library landscape as of late 2019.
- Anyhow - streamlined error handling with context....
- Snafu - adding context to errors
- Error-stack - I really like the philosophy behind this crate. It makes it easy to "stack" errors - you get not a backtrace, but stacked detailed errors, with the inner error showing through.
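To get a feel for what "adding context" and "stacked" errors mean, here is a std-only sketch built on `Error::source()`. It is NOT the API of Anyhow, Snafu, or Error-stack; `ConfigError`, `parse_port`, and `render_chain` are hypothetical names for illustration:

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// Outer error adds context (which key failed) and keeps the inner cause.
#[derive(Debug)]
struct ConfigError {
    key: String,
    source: ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid value for `{}`", self.key)
    }
}

impl Error for ConfigError {
    // Exposes the inner error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn parse_port(raw: &str) -> Result<u16, ConfigError> {
    raw.parse::<u16>()
        .map_err(|source| ConfigError { key: "port".to_string(), source })
}

// Collects the outer message followed by every inner cause.
fn render_chain(err: &dyn Error) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push(e.to_string());
        cur = e.source();
    }
    out
}

fn main() {
    if let Err(e) = parse_port("eighty") {
        for (depth, msg) in render_chain(&e).iter().enumerate() {
            println!("{}: {}", depth, msg);
        }
    }
}
```

The crates above mostly save you from writing this plumbing by hand, each with a different take on how the context gets attached.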
Rust Concurrency
- Rust Concurrency: Five Easy Pieces - a great intro to threads, using message queues, determinism, and more
- Async stacktraces - this is SUPER COOL!!!
- tokio-console - remote async debugging facility!
- Rust Parallelism for non C/C++ Devs - great resource on the low-level primitives like `Mutex` and `RwLock`
- Fearless Concurrency with Hazard Pointers - using the `conc` crate and `Atomic` which implements hazard pointers for fine-grained and safe protection of readers and garbage
- Bastion - Erlang/Akka-style, remote supervised actor framework
- Kompact - Kompics style message-passing "component system" with actor model and networking built in
- Actyx - really cool "decentralized event database, streaming and processing engine" based on event sourcing concepts, built by one of Akka's founders
- Actors with Tokio - not using any Actor framework, just channels
- Use mio if you want a lower-level event loop, or thin_main_loop
Shared Data Across Multiple Threads
Sometimes one needs to share a large data structure across threads and several of them must access it.
The most general way to share a data structure is to use `Arc<RwLock<...>>` or `Arc<Mutex<...>>`. The `Arc` keeps track of lifetimes and lets different threads exist for different lengths of time, and is inexpensive since it is usually only accessed once at thread spawn. The `Mutex` or `RwLock` lets different threads mutate it safely, assuming the data structure is not thread-safe.
A thread-safe data structure could be used in place of the `RwLock` or `Mutex`.
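Here is a minimal sketch of the `Arc<RwLock<...>>` pattern using only the standard library. The `count_words` function and its word-count use case are made up for illustration:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Each thread gets its own Arc clone and takes the write lock to update the map.
fn count_words(chunks: Vec<Vec<&'static str>>) -> HashMap<&'static str, usize> {
    let shared: Arc<RwLock<HashMap<&'static str, usize>>> =
        Arc::new(RwLock::new(HashMap::new()));

    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            // The Arc is touched once per thread, at spawn time.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for word in chunk {
                    let mut map = shared.write().unwrap();
                    *map.entry(word).or_insert(0) += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Readers only need the shared (read) lock.
    let result = shared.read().unwrap().clone();
    result
}

fn main() {
    let counts = count_words(vec![vec!["a", "b"], vec!["a", "c"], vec!["a"]]);
    println!("{:?}", counts);
}
```

Swapping `RwLock` for `Mutex` changes only the lock calls (`lock()` instead of `read()`/`write()`); which one wins depends on how read-heavy the workload is.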
Scoped threads could be used if only one owner will mutate the data structure, and one wants to share immutable refs with other threads for reading. With plain `thread::spawn`, rustc has no way of proving the lifetime of a thread or when it will be joined, so any immutable refs created from the owner thread cannot be shared with it and fail the rustc lifetime checks. Scoped threads, from the Crossbeam crate or from `std::thread::scope` (stable since Rust 1.63), are a way around that, as they give rustc a guarantee that the threads will be joined before the owner goes away.
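A quick sketch of scoped threads borrowing from the owner, assuming Rust 1.63 or newer where `std::thread::scope` is available. The `parallel_sum` function is a hypothetical example:

```rust
use std::thread;

// Worker threads read plain &[u64] borrows; no Arc, no 'static requirement.
fn parallel_sum(data: &[u64], parts: usize) -> u64 {
    let parts = parts.max(1);
    // Round up so every element lands in some chunk; chunks() needs a non-zero size.
    let chunk_len = ((data.len() + parts - 1) / parts).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // All threads are joined before scope() returns, which is the guarantee rustc needs.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

With `thread::spawn` the same closure would be rejected, because `chunk` borrows from `data` and spawned threads must be `'static`.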
Arc-swap is an alternative to Arc that is designed for occasional updates - enables atomic swapping of the object underneath the Arc, and allows one to read without contention (unlike Mutex/RwLock).
Also see beef - a leaner version of Cow.
Data Processing and Data Structures
- Are we learning yet? - list of ML Rust crates
  - Linfa - Rust ML framework
- Timely Dataflow - distributed data-parallel compute engine in Rust!!!
- Hydroflow - a brand new Rust based optimized streaming dataflow engine, relational data, based on very advanced UC Berkeley research on optimization.
- DataFusion - a Rust query engine which is part of Apache Arrow!
  - NOTE: there is now a Ballista project that is basically like Spark - distributed Data Fusion.
- Amadeus - distributed streams / Parquet / big data processing
- Fluvio - distributed, persistent queuing / stream processing framework using WASM for programmability, written in Rust!
- Arroyo - another stream processing framework, streaming SQL and Rust pipelines!
- Weld - Stanford's high-performance runtime for data analytics
- Cleora - Super fast Rust tool for billion-scale hypergraph vector embedding ML
- Node crunch - simple lightweight distributed compute framework
- Project Midas - distributed compute framework and terminal UI using Lua as scripting language
- Cube Store - Rust and Arrow/DataFusion-based rollup/aggregation/cache layer for SQL datastores, too bad it's mostly for JS
- Noria - "data-flow for high-performance web apps" - basically a materialized view cache that updates in real time as database data updates
- polars - super fast and high level DataFrame implementation for both Rust and Python, much faster and higher level than using Arrow itself
- Bagua - distributed learning/training framework, the very fast communication core is written in Rust
- Similari - similarity search/computation engine for ML in Rust
- Toshi - ElasticSearch written in Rust using Tantivy as the engine
- MeiliDB - fast full-text search engine
- Quickwit - Log search DB, like Elastic but built on top of Tantivy
- Datafuse - distributed "Real-Time Data Processing & Analytics DBMS", similar to Clickhouse "but faster"
- sonic - Fast, very lightweight and schemaless search/text index. NOT a document store, but an index store.
- Sanakirja - a transactional KV DB engine/local store, claims to be fastest around
- Sled - an embedded database engine using latch-free Bw-tree on latch-free page cache techniques for speed
- Skytable - Rust "realtime NoSQL" key-value database
- IOx - New in-memory columnar InfluxDB engine using Arrow, DataFusion, rust! Persists using parquet. Super awesome stuff.
- IndraDB - Graph database/library written in Rust! and inspired by Facebook's TAO.
- TerminusDB-store - a Rust RDF triple data store
- BonsaiDB - NoSQL document store written in Rust with Rust schemas
- Vector - high performance observability data pipeline, for transforming, aggregating, routing logs, metrics, traces, etc.
  - includes a Vector Remap Language for general transformation
- Tremor - a simple event processing / log and metric processing and forwarding system, with scripting and streaming query support. Much more capable than Telegraf.
- MinSQL - interesting POC on lightweight SQL based log search, w automatic field parsing etc.
- pq - Parse and Query log files as time series, extracting structured records out of common log files
- plotters - Rust data visualization / graphing library
- Stateright - distributed protocol/model checker with UI, linearizability checker!
- Clepsydra - Graydon Hoare working on distributed database protocol - in Rust!
- crepe - Datalog, declarative logic programs as macros in Rust
JSON Processing
For JSON DOM (IR) processing, using the mimalloc allocator provided me a 2x speedup with serde-json. Then, switching to json-rust provided another 1.8x speedup. The speedup is completely unreal, much faster than JVM. The main reason I guess is that json-rust has a `Short` DOM class for short strings, which requires no heap allocation.
- simdjson-rs - SIMD-enabled JSON parser. NOTE: no writing of JSON.
- pjson - JSON streaming parser
- streamson - efficient JSON processing for large documents
Cool Data Structures
- leapfrog - fast, concurrent `HashMap`, lock-free if types support atomic ops.
  - What's neat about its API is that instead of locking at bucket level, and blocking inserts if a reader is taking too long, it never returns references to data and relies on an atomic API
- concread - Concurrently Readable (Copy on Write, MVCC) datastructures - "allow multiple readers with transactions to proceed while single writers can operate" - guaranteeing readers same version. There is a hashmap and ARCache.
- flurry - Rust impl of Java's ConcurrentHashMap. Uses seize for ref-count-based GC.
- im - Immutable data structures for Rust
- rust-phf - generate efficient lookup tables at compile time using perfect hash functions!
- odht - "hash table that can be mapped from disk into memory without need for up-front decoding" - deterministic binary representation, and platform and endianness independent. Sounds sweet!
- Patricia Tree - Radix-tree based map for more compact storage
- probabilistic-collections - Bloom/Cuckoo/Quotient filters, CountMinSketch, HyperLogLog, streaming approx set membership, etc.
- priq - "blazing fast" priority queue built using arrays
- Using Finite State Automata and Rust to quickly index and find data amongst HUGE amount of strings
- ahash - this seems to be the fastest hash algo for hash keys
- Metrohash - a really fast hash algorithm
- IndexMap - O(1) obtain by index, iteration by index order
- FM-Index, a neat structure that allows for fast exact string indexing and counting while compressing original string data at the same time. There is a Rust crate
- Heapless - static data structures with fixed size; Vec, heap, map, set, queues
- dashmap - "Blazing fast concurrent HashMap for Rust". NOTE: I don't recommend this project, I used it in my Ying profiler but it can deadlock in unpredictable ways
- Easy Persistent Data Structures in Rust - replacing `Box` with `Rc`
- VecMap - map for small integer keys, may use less space
Geospatial and Graph
- The base Geometry processing crate is geo.
  - Geo does not (as of 0.18) handle intersections, difference, XOR etc. Try geo-booleanop for a Rust-only implementation using Martinez-Rueda algorithm
  - Or use geos based on the C library
- spatial-join - Spatial joins and proximity maps!
- Rstar - n-dimensional R*-Tree for geospatial indexing and nearest-neighbor
- spade - R-trees and Delaunay triangulations
- Hora Search - Nearest-Neighbor (NN) / geo search library that includes multiple algorithms including HNSW, SSG, PQIVF, etc.
- Petgraph - Graph data structure for Rust, considered perhaps most mature right now
String Performance
Rust has native UTF8 string processing, which is AWESOME for performance. However, there are two concerns usually:
- Small string memory efficiency. The native `String` type uses three words just for pointer, length, and capacity, which might be longer than the string itself;
- Minimizing number of heap allocations
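You can check the fixed per-string overhead yourself with `size_of`. A small sketch; the numbers reflect how current compilers lay these types out, which is not a formal language guarantee, and `inline_overhead` is a made-up helper name:

```rust
use std::mem::size_of;

// Bytes spent on the String header itself, before any heap allocation.
fn inline_overhead() -> usize {
    size_of::<String>()
}

fn main() {
    let word = size_of::<usize>();
    // pointer + length + capacity
    println!("String   : {} bytes ({} words)", inline_overhead(), inline_overhead() / word);
    // pointer + length, no capacity
    println!("Box<str> : {} bytes", size_of::<Box<str>>());
    println!("&str     : {} bytes", size_of::<&str>());

    let s = String::from("hi");
    println!("payload of {:?} is only {} bytes", s, s.len());
}
```

On a 64-bit target that is 24 bytes of header for a 2-byte payload, plus the heap allocation, which is the gap the crates below try to close.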
Here are some solutions:
- String - string type with configurable byte storage, including stack byte arrays!
- Inlinable String - stores strings up to 30 chars inline, automatic promotion to heap string if needed.
- Also see smallstr
- flexstr - Enum String type to unify literals, inlined, and heap strings
- kstring - intended for map keys: immutable, inlined for small keys, and have Ref/Cow types to allow efficient sharing. :)
- nested - reduce `Vec<String>` type structures to just two allocations, probably more memory efficient too.
- tinyset - space efficient sets and maps, can be combined with nested perhaps
- bumpalo can do really cheap group allocations in a `Bump` and has custom `String` and `Vec` versions. At least lowers allocation overhead.
Rust and Scala/Java
- The presence of true unsigned types is really nice for low-level work. I hit a bug in Scala where I used >> instead of >>>. In Rust you declare a type as unsigned and don't have to worry about this.
- Immutable byte slices and reference types again are awesome for low-level work.
- Trait monomorphisation is awesome for ensuring trait methods can be inlined. JVM cannot do this when there is more than one implementation of a trait.
- Being able to examine assembly directly from compiler output is super nice for low level perf work (compared to examining bytecode and not knowing the final output until runtime)
- OTOH, rustc is definitely much much stricter (IMO) compared to scalac. Much of this is for good reason though, for example lack of integer/primitive coercion, ownership, etc. gives safety guarantees.
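The `>>` vs `>>>` point from the list above, as a tiny sketch. In Rust the shift behavior follows from the type, so there is only one operator; the two function names are made up for the example:

```rust
// On unsigned types, >> is a logical shift: zeros come in from the left.
fn logical_shr(bits: u32, n: u32) -> u32 {
    bits >> n
}

// On signed types, >> is an arithmetic shift: the sign bit is copied in.
fn arithmetic_shr(bits: i32, n: u32) -> i32 {
    bits >> n
}

fn main() {
    println!("{:#x}", logical_shr(0x8000_0000, 31));
    println!("{}", arithmetic_shr(i32::MIN, 31));
    // Same bit pattern, reinterpreted as unsigned, shifts logically.
    println!("{}", (i32::MIN as u32) >> 31);
}
```

Declare the value as `u32`/`u64` once and the Scala-style `>>` vs `>>>` mistake simply cannot happen.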
Rust and Python
- PyO3 seems to be a gold standard of Rust-based Python module development.
- Maturin - for building and publishing PyO3-based/Rust Python modules, or mixed Rust/Python projects
- There are older posts too: Wrapping Rust Types as Python classes and RustyPy but they are much more work than PyO3
- PyOxidizer - a Rust tool to package Python apps, interpreter, and all dependencies as a single binary, by wrapping app in a Rust program with a custom Rust Py module importer. Also helps embed Python code in Rust apps.
- Oh no, my data science is getting Rusty! - neat post from CrowdStrike on integrating Rust with Python for improved performance AND safety
Rust-OtherLanguage Integration / Rust FFI
- Calling Rust from Java - especially see the hint for using jnr-ffi
- There is also j4rs for calling Java from Rust
- SaferFFI - a neat library to make exposing C-like APIs much safer esp dealing with pointers, nulls, borrowing etc.
- Exposing a Rust library to C - has some great tips on creating .so's and working with strings
- cc-rs - C/C++ build integration with Cargo
- It seems to me Circle CI's support for multiple docker images and explicit manifest style makes it very easy to set up multiple language and dependency support
- Supporting multiple languages in Travis CI
- Running LLVM on GraalVM - using GraalVM to embed and run LLVM bitcode! Too bad GraalVM is commercial/Oracle only
CLI and Misc
- Structopt - define CLI options using a struct!
- tui-rs - Rust terminal UI for CLI apps. Check out list of projects it refers to also. Lots of options!
- Hot Reloading in Rust - great article on how to hot-reload dynamic linked libraries in Rust, and on the potential pitfalls, with plenty of links.
IDE/Editor/Tooling
- EVCXR - a Rust REPL!!! With deps, and tab-completion for methods!!
- comby-rust - rewrite Rust code using comby
- rustviz - Visualize borrowing and ownership!
- no-panics-whatsoever - crate to detect and ensure at compile time there aren't panics in your code
- cargo-bloat - what's taking up space in my Rust binary
- cargo-limit - clean up, sort and limit error/warning output. Great for those of us running cargo in shells!
- mutagen - mutation testing tool for Rust programs. Generates "mutations" in your code to try to break test coverage!
- cargo-rr - time travel/recording/reverse debugger framework for Rust using rr
- For more explanation see Print debugging should go away
- cargo_hakari - A crate to speed up builds of workspace-hack packages ... for when you have multiple crates or complex builds, and you have duplicate dependencies
- inkwell - LLVM API, including LLVM IR generation and running LLVM JIT to run snippets in your code
Dependency conflicts? Use `cargo tree -i` to look up reverse dependencies for specific packages (which crates are using which deps). For example, `cargo tree -i arrow:5.0.0-SNAPSHOT`.
- RustAnalyzer - LSP-based plugin/server for IDE functionality in Sublime/VSCode/EMacs/etc
- Configuring Rustfmt
- Godbolt - A "compiler explorer", not Rust specific but neat to play with compiler settings and diff targets.
- Cargo-play - run Rust scripts without needing to set up a project
- Also see cargo-eval and runner for diff ways of easily running scripts without projects
BTW for Rust 1.51+ you can speed up MacOS builds with this in your Cargo.toml (see the release notes):
[profile.dev]
split-debuginfo = "unpacked"
Testing and CI/CD
The two standard property testing crates are Quickcheck and proptest. Personally I prefer proptest due to much better control over input generation (without having to define your own type class).
- Rust Continuous Delivery - hints on using Docker, caching deps, and automated cloud-based CI/CD workflows for Rust
- Cargo-nextest looks like a really good project to help with test organization, test CI, running tests faster, etc.
- Faster Build Times on MacOS
- 5x Faster Rust Docker Builds with cargo-chef - you need this for faster Rust app deploys!
- Are We Observable Yet? - an introduction to Rust telemetry
- Miri - can run binaries and test suites of cargo projects and detect certain classes of undefined behavior, and it also reports memory leaks!!
Cross-compilation
A common concern - how do I build different versions of my Rust lib/app for say OSX and also Linux?
- Easiest way now seems to be to use cross - I tried it and literally as easy as `cargo install cross` and `cross build --target ...` as long as you have Docker.
  - NOTE: crates with non-Rust code (eg jemalloc, mimalloc) often have trouble
- Also see rust-musl-builder, another Docker-based solution
- musl is the best target for Linux as it removes need for G/LIBC dependencies and versioning. Musl creates a single static binary for super easy deploys.
- For automation, maybe better to create a single Docker image which combines crossbuild (which has a recipe for OSXCross + other targets) with a rustup container like abronan/rust-circleci which allows building both nightly and stable. Use Docker multi-stage builds to make combining multiple images easier
Finally, the Taking Rust everywhere with Rustup blog has a good guide on how to use rustup to install cross toolchains, but the above steps to install OS specific linkers are still important.
Performance and Low-Level Stuff
A big part of the appeal of Rust for me is super fast, SAFE, built in UTF8 string processing, access to detailed memory layout, things like SIMD. Basically, to be able to idiomatically, safely, and beautifully (functionally?) do super fast and efficient data processing.
- Cheap Tricks - Rust Performance - set of quick Cargo settings to try
- How to Write Fast Rust Code - really good guide
- High Performance Rust - a book
- Optimizing String Processing in Rust - really useful stuff
- Achieving warp speed with Rust - great tips on performance optimization
- Deep Dive into Dynamic Dispatch - great details and perf comparison
- Rust to Assembly - great series of blog posts detailing how various parts of Rust compile down to assembly
- Modern storage is plenty fast - using a new Rust crate called glommio one can achieve multi-GB per sec read throughputs from modern SSDs. So maybe we don't need memory after all.
  - Along the same lines, not Rust-specific but ScyllaDB and I/O Access Methods - discussions of mmap vs AIO/DIO vs standard Linux I/O
  - Direct I/O Writes - why doing direct I/O writes may end up better than buffered
- Representations - super important to understand low-level memory layouts for structs. C vs packed vs .... including alignment issues.
- Precise memory layouts and how to dump out Rust struct memory layouts
  - Or just use the memoffset crate
- MemFlow - framework to inspect machine memory. Think about DMA/IO, debugger, or Plasma-type memory/DB applications.
- Rust uses system malloc by default. How to switch the default allocator.
  - Use jemallocator and jemalloc-ctl crates for stats, deep dives, etc. Jemalloc from Facebook supposed to be fast.
  - Also see MiMalloc - a high perf allocator from Microsoft. I got 2x improvement for JSON workloads!
  - There are even epoch GCs available
  - Also look into the arena and typed_arena crates... very cheap allocations within a region, then free entire region at once.
  - Also see bumpalo - bump allocator which includes custom versions of standard collections
- Watch out for dynamic dispatch (when you need to use `Box<dyn MyTrait>` etc). One solution is to use enum_dispatch.
  - Related: auto_enum - a way to return enums when you might need to return `impl A` for some trait A when you might be returning diff implementations
  - Can also use ambassador - to delegate trait implementations
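To see what trading `Box<dyn Trait>` for an enum looks like, here is a hand-written sketch. It is NOT the enum_dispatch crate's API, just the general shape of the enum-based approach, and all the types here (`Shape`, `Circle`, `Square`, `AnyShape`) are made up:

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }
struct Square { side: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

// Closed set of implementations: a match replaces the vtable lookup.
enum AnyShape {
    Circle(Circle),
    Square(Square),
}

impl Shape for AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Circle(c) => c.area(),
            AnyShape::Square(s) => s.area(),
        }
    }
}

// Dynamic dispatch: one heap allocation per element plus an indirect call.
fn total_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Enum dispatch: elements stored inline, calls are statically known.
fn total_enum(shapes: &[AnyShape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let boxed: Vec<Box<dyn Shape>> =
        vec![Box::new(Circle { r: 1.0 }), Box::new(Square { side: 2.0 })];
    let inline = vec![
        AnyShape::Circle(Circle { r: 1.0 }),
        AnyShape::Square(Square { side: 2.0 }),
    ];
    println!("{} {}", total_dyn(&boxed), total_enum(&inline));
}
```

The catch is that the enum only works when you know every implementation up front, which is why the `match` boilerplate is worth generating with a macro.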
If small binary size is what you're after, check out Min-sized-Rust.
Rust now has a super slick asm! inline assembly feature (it started out on nightly and has been stable since Rust 1.59). The way that it integrates Rust variables/expressions with auto register assignment is super awesome.
NOTE: simplest way to increase perf may be to enable certain CPU instructions: set -x RUSTFLAGS "-C target-feature=+sse3,+sse4.2,+lzcnt,+avx,+avx2"
NOTE2: `lazy_static` accesses are not cheap. Don't use it in hot code paths.
Perf profiling:
Note: this section is mostly about profiling tools -- detailed breakdowns of bottlenecks, as opposed to benchmarking (which is repeatable, systematic measurement). The two benchmarking tools I recommend are criterion and Iai.
NEW: I've created a Docker image for Linux perf profiling, super easy to use. The best combo is cargo flamegraph followed by perf and asm analysis.
- cargo-flamegraph -- this is now the easiest way to get a FlameGraph on OSX and profile your Rust binaries. To make it work with bench and Criterion:
  - First run `cargo bench` to build your bench executable
  - If you haven't already, `cargo install flamegraph` (recommend at least v0.1.13)
  - `sudo flamegraph target/release/bench-aba573ea464f3f67 --profile-time 180 <filter> --bench` (replace bench-aba* with the name of your bench executable)
    - The `--profile-time` is needed for flamegraph to collect enough stats
  - `open -a Safari flamegraph.svg`
  - NOTE: you need to turn on `debug = true` in release profile for symbols
  - This method works better for apps than small benchmarks btw, as inlined methods won't show up in the graph.
-
Rust Profiling with Instruments on OSX - but apparently cannot export CSV to FlameGraph :(
- Note that you can now just install cargo instruments
- Also useful for heap/memory analysis, including tracking retained vs transient allocations
-
Rust Performance: Perf and Flamegraph - including finding hot assembly instructions
-
samply - used to be called perfrecord, Rust CPU CLI command profiler using Firefox as UI. WIP.
-
Iai - a one-shot Rust profiler that uses Valgrind underneath
-
Top-down Microarchitecture Analysis Method - TMAM is a formal microprocessor perf analysis method from Intel, works with perf to find out what CPU-level bottlenecks are (mem IO? branch predictions? etc.)
-
Rust Profiling with DTrace and FlameGraphs on OSX - probably the best bet (besides Instruments), can handle any native executable too
- From @blaagh: though the predicate should be "/pid == $target/" rather than using execname.
- DTrace Guide is probably pretty useful here
Hyperfine - Rust performance benchmarking CLI
-
Tools for Profiling Rust - cpuprofiler might possibly work on OSX. It does compile. The cpuprofiler crate requires surrounding blocks of your code though.
-
Rust Profiling talk - discusses both OSX and Linux, as well as Instruments and Intel VTune
-
2017 RustConf - Improving Rust Performance through Profiling
-
Flamer - an alternative to generating FlameGraphs if one is willing to instrument code. Warning: might require nightly Rust features.
-
cargo-profiler - only works in Linux :(
-
coz and its Cargo plugin, coz-rs -- "a new kind of profiler that unlocks optimization opportunities missed by traditional profilers. Coz employs a novel technique we call causal profiling that measures optimization potential"
-
Rust Perf Book Profiling Page - lots of good links
cargo-asm can dump out assembly or LLVM/IR output from a particular method. I have found this useful for really low level perf analysis. NOTE: if the method is generic, you need to give a "monomorphised" or filled out method. Also, methods declared inline won't show up.
- What I like to do with asm output: check if rustc has inlined certain methods. Also you can clearly see where dynamic dispatch happens and how complicated generated code seems. More complicated code usually == slower.
- llvm-mca - really detailed static analysis and runtime prediction at the machine instruction level
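To make the inlining and dispatch checks concrete, here is a small pair of functions worth comparing in asm output. It is only a sketch: the trait and type names are invented, and #[inline(never)] is used so that both functions keep their own symbol. For the generic version, the monomorphised instance to look for would be total_static::<Square>.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Generic: monomorphised per type, so `area` can be inlined into the loop.
#[inline(never)] // keep a symbol so the function shows up in asm output
fn total_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Trait objects: each `area` call goes through the vtable (an indirect call).
#[inline(never)]
fn total_dynamic(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Square(1.0)), Box::new(Square(2.0))];
    println!("{} {}", total_static(&squares), total_dynamic(&boxed));
}
```

In the dynamic version you would expect to see an indirect call inside the loop, while the static version typically reduces to a plain multiply-add loop.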
What works on a Mac (but see cargo flamegraph above for easier way):
sudo dtrace -c './target/release/bench-2022f41cf9c87baf --profile-time 120' -o out.stacks -n 'profile-997 /pid == $target/ { @[ustack(100)] = count(); }'
~/src/github/FlameGraph/stackcollapse.pl out.stacks | ~/src/github/FlameGraph/flamegraph.pl >rust-bench.svg
open -a Safari rust-bench.svg
where -c bench.... is the executable output of cargo bench.
I was hoping cargo-with would allow us to run above dtrace command with the name of the bench output, but alas it doesn't seem to work with bench. (NOTE: they are working on a PR to fix this! :)
NOTE: The built-in benchmark harness behind cargo bench (the #[bench] attribute) requires nightly Rust, it doesn't work on stable! For benchmarking I highly recommend criterion, which works on stable and has extra features such as gnuplot output, parameterized benchmarking and run-to-run comparisons, and can run for a longer time so that it works with profilers such as dtrace.
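For intuition about what a benchmark harness does on stable Rust, here is a rough std-only timing loop. It is not criterion's API and has none of its statistics, warm-up, or outlier handling; the function names are hypothetical. std::hint::black_box (Rust 1.66+) keeps the optimizer from deleting the work being measured.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Iterative Fibonacci, used here only as something to measure.
fn fib(n: u64) -> u64 {
    (0..n).fold((0u64, 1u64), |(a, b), _| (b, a + b)).0
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// `iters` must be non-zero.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f()); // prevent the result from being optimized away
    }
    start.elapsed() / iters
}

fn main() {
    let per_call = time_it(100_000, || fib(black_box(40)));
    println!("fib(40): {:?} per call", per_call);
}
```

A long iteration count also gives an external profiler such as dtrace enough time to collect samples.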
Memory/Heap Profiling
The options I've tried out:
- Bytehound - really slick, but only works on Linux (using perf).
- No need to modify apps, uses LD_PRELOAD
- extracts full stack traces plus every alloc/dealloc, but claims it uses custom unwinding code that's much much faster
- tracks memory usage over time, as well as leaks explicitly, and memory fragmentation
- can give you flamegraphs of memory allocations or just leaks!
- Has a really nice UI/webapp that's bundled together
- Has many options to write out profiling data to different locations or over network
- Problems:
- Creates giant profiling data files. There are options to slim it down though, such as keeping only allocations that live longer than a particular threshold
- Bundled viewer does not seem to be able to load debug symbols when profiling data does not include them :(
- It seems the only way to really include full symbols in the profiling info is to run profiling with a debug build. However this blows up the size of the data file even more... hundreds of MBs from just a few minutes of run time!
- jeprof: If you use jemallocator and install jemalloc as your global allocator, you can get some profiling for free.
- Jemalloc Heap Profiling
- How to parse jeprof text output
- Pros: Jemalloc profiling is sampling based and very lightweight. It can be used in production with minimal perf impact.
- The profile files are also very small
- Cons: it's, like, really hard to use. For example, enabling it via environment variable - the instructions are not very clear, and there is no way to write the files to anything other than the current directory
- Runtime config: set both environment variables MALLOC_CONF and _RJEM_MALLOC_CONF (which one works depends on environment)
- Compile time config, for jemallocator users: JEMALLOC_SYS_WITH_MALLOC_CONF
- Con: The stats collected are about total memory allocated, with no differentiation for short/temporal vs long-lived allocations
- Con: It's not built for Rust and difficult to infer stacktraces. Many symbols are mangled.
- It is possible to do differential analysis: use one profile as a "base" and then diff vs other profiles. However, the profile files use sequence numbers, so it's hard to tell which profile to use for what time.
- Also there is no way to sort the output and the options for simplifying the output don't work very well
- dhat - Swap out the global allocator, will profile your allocations & max heap usage
- One advantage DHAT has over jeprof/jemalloc is lifetime / allocation length information. This can be used to figure out long-held things
- DHAT also tracks the entire call graph so it can produce a useful tree
- Its online viewer is also much easier to use than jeprof
- Unfortunately DHAT tracks every allocation so it's not good for production use
- DHAT also crashes on some workloads. This is really annoying.
- Heaptrack (see also working with Rust) works for Rust, but only on Linux.
After the above frustrations and investigations, I decided to write my own custom memory profiler - Ying - a sampling profiler, built for rich Rust stack traces including inlined methods, which tracks retained memory and lifetimes. Definitely experimental right now.
- memory-profiler - written in Rust by the Nokia team!
- stats_alloc can dump out incremental stats about allocation. Or just use jemalloc-ctl.
- deepsize - macro to recursively find size of an object
- Parity-util-mem - can find the size of collections as well?
- Measuring Memory Usage in Rust - thoughts on working around the fact we don't have a GC to track deep memory usage
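Several of the tools above work by swapping out the global allocator. The sketch below shows the basic mechanism with a counting wrapper around the system allocator. It is an illustration only, not how dhat, stats_alloc, or Ying are actually implemented, and all names in it are made up.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every byte ever requested.
struct CountingAlloc;

static TOTAL_BYTES: AtomicUsize = AtomicUsize::new(0);
static TOTAL_ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        TOTAL_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        TOTAL_ALLOCS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds GlobalAlloc's contract for `layout`.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this same layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Cumulative bytes requested so far (never decreases).
fn allocated_bytes() -> usize {
    TOTAL_BYTES.load(Ordering::Relaxed)
}

fn main() {
    let before = allocated_bytes();
    // black_box keeps the optimizer from removing the unused allocation.
    let v = black_box(Vec::<u8>::with_capacity(4096));
    let after = allocated_bytes();
    println!("Vec::with_capacity(4096) requested {} bytes", after - before);
    println!("allocations so far: {}", TOTAL_ALLOCS.load(Ordering::Relaxed));
    drop(v);
}
```

Because every allocation pays for the extra atomic operations, a counter like this shares the cost problem noted for DHAT; sampling is what makes a profiler cheap enough for production.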
Fast String Parsing
-
nom - a direct parser using macros, commonly accepted as fastest generic parser
-
pest is a PEG parser using an external, easy to understand syntax file. Not quite as fast but might be easier to understand and debug. There is also a book.
-
combine is a parser combinator library, supposedly just as fast as nom, syntax seems slightly
-
simdutf8 - SIMD lightning fast UTF-8 validation
Bitpacking, Binary Structures, Serialization
- bitpacking - insanely fast integer bitpacking library
- packed_struct - bitfield packing/unpacking; can also pack arrays of bitfields; mixed endianness, etc.
- rkyv - Zero-copy deserialization, for generic Rust structs, even trait objects. Uses relative pointers.
- binary-layout - "type-safe, inplace, zero-copy access to structured binary data" including open-ended byte arrays at the end
- zerovec - Clients upgrading to zerovec benefit from zero heap allocations when deserializing read-only data.
- Speeding up incoming message parsing using nom - a detailed guide to using nom for deserialization, much faster than Serde
The ideal, performance-wise, is to not need serialization at all, i.e. to be able to read directly from portions of a binary byte slice. There are some libraries for doing this, such as flatbuffers, flatdata (for which there is a Rust crate), or Cap'n Proto. However, there may be times when you want more control or things like Cap'n Proto are not good enough.
How do we perform low-level byte/bit twiddling and precise memory access? Unfortunately, all structs in Rust basically need to have known sizes. There's something called dynamically sized types basically like slices where you can have the last element of a struct be an array of unknown size; however, they are virtually impossible to create and work with, and this only covers some cases anyhow. So we will unfortunately need a combination of techniques. In order of preference:
- Overall scroll is the best general-purpose struct serialization crate; it helps with reading integers and other fields too, and takes care of endianness. It generates pretty efficient code. It is a bit of a pain working with numeric enums however.
- num_enum - a way to derive TryFrom for numeric enums helps a little bit.
- I have found plain works really well. Mark your structs with #[repr(C)]. It only helps with size and alignment, not endianness - so maybe more for in-memory structures or when you are sure you don't need the code to work across platforms with different endianness. If your structures are not aligned then use #[repr(C, packed)] (equivalently #[repr(C, packed(1))]).
- Use a crate such as bytes or scroll to help extract and write structs and primitives to/from buffers. Might need extra copying though. Also see iobuf
- rel-ptr - small library for relative pointers/offsets, should be super useful for custom file formats and binary/persistent data structures
- arrayref might help extract fixed size arrays from longer ones.
- bytemuck for casts
- Also zerocopy with FromBytes and AsBytes traits for easy transmuting
- bitmatch could be great for bitfield parsing
- Also see zero
- Allocate a Vec::<u8> and transmute specific portions to/from structs of known size, or convert pointers within regions back to references:

    // as_mut_ptr (not as_ptr) gives a pointer we are allowed to write through
    let foobar: *mut Foobar = mybytes.as_mut_ptr() as *mut Foobar;
    // as_mut() only checks for null; the bytes must also be aligned and valid for Foobar
    let foobar_ref: &mut Foobar = unsafe { foobar.as_mut() }.expect("Cannot convert foobar to ref");
- Or structview which offers types for unaligned integers etc.
- There are some DST crates worth checking out: slice-dst, thin-dst
- As a last resort, work with raw pointer math using the add/sub/offset methods, but this is REALLY UNSAFE.
    let foobar: *mut Foobar = mybytes.as_mut_ptr() as *mut Foobar;
    // Raw field writes: nothing checks bounds, alignment, or validity here
    unsafe {
        (*foobar).foo = 17;
        (*foobar).bar = -1;
    }
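Before reaching for raw pointers, it is worth seeing how far plain std gets you. The sketch below decodes a hypothetical 8-byte header in two ways: a safe parse with explicit little-endian handling, and an unsafe native-endian copy via ptr::read_unaligned, which works at any offset in the buffer. The Header layout and function names are invented for illustration.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical record header: 8 bytes, no padding.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    id: u32,
    len: u16,
    flags: u16,
}

/// Safe parse: every field is decoded as little-endian, on any platform.
fn parse_le(bytes: &[u8]) -> Option<Header> {
    let b = bytes.get(..size_of::<Header>())?;
    Some(Header {
        id: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        len: u16::from_le_bytes([b[4], b[5]]),
        flags: u16::from_le_bytes([b[6], b[7]]),
    })
}

/// Unsafe native-endian copy out of the buffer; no alignment requirement.
fn read_native(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < size_of::<Header>() {
        return None;
    }
    // SAFETY: the length was checked above; Header is repr(C) with only
    // integer fields and no padding, so any bit pattern is a valid Header;
    // read_unaligned does not require the pointer to be aligned.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const Header) })
}

fn main() {
    let buf = [0x2Au8, 0, 0, 0, 0x10, 0, 0x01, 0, 0xFF];
    println!("{:?}", parse_le(&buf));
    println!("{:?}", read_native(&buf[1..])); // deliberately odd offset
}
```

The copy in read_native costs a few bytes of memcpy but avoids the undefined behavior that a misaligned &mut Foobar would cause.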
Want to zero memory quickly? Use slice_fill for memset optimization on older compilers; since Rust 1.50 the standard library has slice::fill built in.
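On Rust 1.50 or later, the standard library's slice::fill method covers the common case. A minimal sketch (the helper name is hypothetical); for byte slices this typically compiles down to a memset, though that is up to the optimizer:

```rust
/// Zeroes a byte buffer in place.
fn zero(buf: &mut [u8]) {
    buf.fill(0);
}

fn main() {
    let mut buf = vec![0xAAu8; 1 << 20];
    zero(&mut buf);
    println!("first byte after zeroing: {}", buf[0]);

    // fill works for any Clone element type, not just bytes.
    let mut words = [7u64; 16];
    words.fill(0);
    println!("sum of words: {}", words.iter().sum::<u64>());
}
```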
Also check out the crazy number of crates available under compression - including various interesting radix and trie data structures, and more compression algorithms that one has never heard of.
Enums, Thin Pointers, Type Wrapping
A frequent problem, esp when working with data, is to have a "union" of different types. Perhaps Option will suffice, but sometimes we need to wrap Vec<A> and Vec<B> together in the same type. We don't want to just use Box<dyn MyTrait> as that allocates and results in dynamic dispatch. Here are some crates and patterns that may help in working with enums, or alternatives:
- enum_dispatch - macro to implement the dyn MyTrait trait object pattern for enums, so we get fast static dispatch. Basically implements traits for underlying types in enums
- enum_delegate is an alternative that works with associated types in traits - but not generics
- strum - derive strings and discriminant enums using macros
- You can use std::mem::discriminant, a built-in function, to get an opaque, comparable discriminant value for an enum variant. For the numeric value of a fieldless enum, cast it with as.
- Also enum discriminants can be explicitly specified using #[repr(..)], see here - you can then transmute the enum into something explicit
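The enum-wrapping pattern from this list can be written by hand, which is roughly the boilerplate that a macro like enum_dispatch saves you from. This is a sketch with made-up trait and type names, not the macro's actual expansion. It also shows the two ways of looking at discriminants: the opaque std::mem::discriminant and an explicit #[repr(u8)] cast.

```rust
use std::mem;

trait Column {
    fn num_rows(&self) -> usize;
    fn total(&self) -> f64;
}

impl Column for Vec<i64> {
    fn num_rows(&self) -> usize { self.len() }
    fn total(&self) -> f64 { self.iter().map(|&x| x as f64).sum() }
}

impl Column for Vec<f64> {
    fn num_rows(&self) -> usize { self.len() }
    fn total(&self) -> f64 { self.iter().sum() }
}

/// One type that can hold either vector, with no Box and no vtable.
enum AnyColumn {
    Ints(Vec<i64>),
    Floats(Vec<f64>),
}

// Forward every trait method with a match: static dispatch per variant.
impl Column for AnyColumn {
    fn num_rows(&self) -> usize {
        match self {
            AnyColumn::Ints(v) => v.num_rows(),
            AnyColumn::Floats(v) => v.num_rows(),
        }
    }
    fn total(&self) -> f64 {
        match self {
            AnyColumn::Ints(v) => v.total(),
            AnyColumn::Floats(v) => v.total(),
        }
    }
}

/// Fieldless enum with explicit numeric discriminants.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Int = 1,
    Float = 2,
}

fn main() {
    let cols = vec![AnyColumn::Ints(vec![1, 2, 3]), AnyColumn::Floats(vec![0.5, 1.5])];
    let rows: usize = cols.iter().map(|c| c.num_rows()).sum();
    let total: f64 = cols.iter().map(|c| c.total()).sum();
    println!("rows = {}, total = {}", rows, total);

    // mem::discriminant is opaque: it can be compared, but it is not a number.
    let same = mem::discriminant(&cols[0]) == mem::discriminant(&AnyColumn::Ints(vec![]));
    println!("same variant: {}, Kind::Int = {}, Kind::Float = {}", same, Kind::Int as u8, Kind::Float as u8);
}
```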
Some non-enum crates that can also help:
- ptr_union - "Pointer union types the size of a pointer by storing the tag in the alignment bits" :)
- erasable - "Type-erased thin pointers" - need to see how this is different from std::any::Any
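To illustrate the idea of storing a tag in a pointer's alignment bits, here is a toy tagged box. It is a sketch only and not ptr_union's API: the type and method names are invented, it relies on heap allocations of u64 being at least 4-byte aligned (checked with an assert), and it round-trips the pointer through usize, which strict-provenance tooling may flag.

```rust
/// A heap-allocated u64 whose two lowest address bits carry a small tag.
struct TaggedBox {
    bits: usize,
}

const TAG_MASK: usize = 0b11;

impl TaggedBox {
    fn new(value: u64, tag: usize) -> Self {
        assert!(tag <= TAG_MASK, "tag must fit in two bits");
        let addr = Box::into_raw(Box::new(value)) as usize;
        // The low bits of an aligned address are zero, so they are free for the tag.
        assert_eq!(addr & TAG_MASK, 0, "allocation not aligned enough for a tag");
        TaggedBox { bits: addr | tag }
    }

    fn tag(&self) -> usize {
        self.bits & TAG_MASK
    }

    fn value(&self) -> u64 {
        // SAFETY: masking off the tag recovers the pointer from Box::into_raw,
        // which stays valid until Drop runs.
        unsafe { *((self.bits & !TAG_MASK) as *const u64) }
    }
}

impl Drop for TaggedBox {
    fn drop(&mut self) {
        // SAFETY: rebuilds the Box created in `new`, exactly once.
        unsafe { drop(Box::from_raw((self.bits & !TAG_MASK) as *mut u64)) }
    }
}

fn main() {
    let t = TaggedBox::new(99, 2);
    println!("tag = {}, value = {}", t.tag(), t.value());
    println!("size = {} bytes", std::mem::size_of::<TaggedBox>());
}
```

The payoff is that the tag costs no extra space: the whole thing is still one machine word.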
SIMD
There is this great article on Towards fearless SIMD, about why SIMD is hard, and how to make it easier, along with pointers to many interesting crates doing SIMD. (There is a built in crate, std::simd, but it is really lacking. However, packed_simd will soon be merged into it.)
Another great article: Learning SIMD with Rust by finding planets. SIMD is really about parallelism: it is better to do multiple operations in a parallel (vertical) fashion, vector on vector, than to do horizontal operations where the different components of a wide register depend on one another.
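The vertical-versus-horizontal distinction can be shown without any SIMD crate. In the sketch below (function names and the lane count are arbitrary choices), the first sum is one long dependency chain, while the second keeps eight independent accumulators and does a single horizontal reduction at the end. Whether the compiler actually vectorizes the second form depends on target and flags, and since float addition is not associative the two can differ slightly on general inputs.

```rust
const LANES: usize = 8;

/// Horizontal: every add depends on the previous one.
fn sum_horizontal(xs: &[f32]) -> f32 {
    let mut acc = 0.0;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Vertical: LANES independent accumulators, combined once at the end.
fn sum_vertical(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i]; // lane i only ever touches lane i
        }
    }
    // The single horizontal step, plus any leftover elements.
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let xs: Vec<f32> = (1..=100).map(|i| i as f32).collect();
    println!("{} {}", sum_horizontal(&xs), sum_vertical(&xs));
}
```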
-
ssimd - an effort to bring std::simd/packed_simd to Rust stable, with auto vectorization (meaning auto detect and implement code paths and fallbacks for when SIMD not available!)
-
faster - "SIMD for Humans" -- probably my favorite one, very high level translation of numeric map loops into SIMD
-
fearless_simd, the blog post author's crate. Runtime CPU detection and use of the most optimal code, no need for unsafe, but only focused on f32.
-
SIMDeez - abstracts intrinsic SIMD instructions over different instruction sets & vector widths, runtime detection
-
simd_aligned and simd_aligned_rust - work with SIMD and packed_simd using vectors which have guaranteed alignment
-
aligned - newtype with byte alignment, for stack or heap!
-
https://www.rustsim.org/blog/2020/03/23/simd-aosoa-in-nalgebra/
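Guaranteed alignment, which several of the crates above provide, can also be had with a plain newtype and #[repr(align(..))]. This is a sketch of the idea rather than the API of aligned or simd_aligned; the type name is made up, and 32 bytes is chosen to match the width of an AVX register.

```rust
use std::mem::align_of;

/// Eight f32 lanes, forced onto a 32-byte boundary.
#[repr(C, align(32))]
#[derive(Debug, Clone, Copy, PartialEq)]
struct F32x8([f32; 8]);

/// Lane-wise add; a simple loop the compiler is free to vectorize.
fn add(a: &F32x8, b: &F32x8) -> F32x8 {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a.0[i] + b.0[i];
    }
    F32x8(out)
}

fn main() {
    let on_stack = F32x8([1.0; 8]);
    let on_heap = Box::new(F32x8([2.0; 8]));
    println!("align = {}", align_of::<F32x8>());
    println!("heap address % 32 = {}", &*on_heap as *const F32x8 as usize % 32);
    println!("{:?}", add(&on_stack, &on_heap));
}
```

The alignment holds for both stack values and Box allocations, because the allocator is handed the type's layout.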
NOTE: shuffle in packed_simd is not very fast. Replace with native instructions if possible.