Abstract Nonsense

A place for musings, observations, design notes, code snippets - my thought gists.

⚡️Apache Spark 4.0 released

Apache Spark 4.0 has been released. It’s the first major version update since Spark 3.0 in 2020.

Here are some of the highlights I’m excited about:

  • A new SQL pipe syntax. It seems to be a trend with modern SQL engines to include “pipe” syntax support now (e.g. BigQuery). I’m a fan of functional-programming-inspired design patterns and the excellent work by the PRQL team, so I’m glad to see this next evolution of SQL play out.
  • A structured logging framework. Spark logs are notoriously lengthy, and structured output means you can now use Spark to consume Spark logs! Coupled with improvements to stack traces in PySpark, hopefully this will mean less grepping through tortuously long stack traces.
  • A new DESCRIBE TABLE AS JSON option. I really dislike unstructured command line outputs that you have to parse with awkward bashisms. JSON input/outputs and manipulation with jq is a far more expressive consumption pattern that I feel captures the spirit of command line processing.
  • A new PySpark Plotting API! It’s interesting to see it supports plotly on the backend as an engine. I’ll be curious to see how this plays out going forward… Being able to do #BigData ETL as well as visualisation and analytics within the one tool is a very powerful combination.
  • A new lightweight python-only Spark Connect PyPI package. Now that Spark Connect is getting more traction, it’s nice to be able to pip install Spark on small clients without having to ship massive jars around.
  • A bug fix for inaccurate Decimal arithmetic. This is interesting only insofar as it reminds me that even well-established, well-tested, correctness-first, open-source software with industry backing can still be subject to really nasty correctness bugs!
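
To give a flavour of why structured logs are nicer to work with, here’s a minimal sketch that counts errors per logger from JSON-lines log output using only the Python standard library. The sample lines are made up, and the field names (`ts`, `level`, `msg`, `logger`) are assumptions about the log schema rather than something confirmed here, so check them against your own log files.

```python
import json
from collections import Counter

# Hypothetical sample lines. The field names are assumed, not taken from real Spark output.
SAMPLE_LOG_LINES = [
    '{"ts": "2025-01-01T10:00:00Z", "level": "INFO", "msg": "Starting job 0", "logger": "example.Scheduler"}',
    '{"ts": "2025-01-01T10:00:01Z", "level": "ERROR", "msg": "Task 3 failed", "logger": "example.TaskRunner"}',
    "a stray plain-text line that is not JSON",
    '{"ts": "2025-01-01T10:00:02Z", "level": "ERROR", "msg": "Executor lost", "logger": "example.Scheduler"}',
]


def parse_log_lines(lines):
    """Yield one dict per line that parses as a JSON object; skip everything else."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def errors_by_logger(lines):
    """Count ERROR-level records per logger name."""
    return Counter(
        record.get("logger", "<unknown>")
        for record in parse_log_lines(lines)
        if record.get("level") == "ERROR"
    )


if __name__ == "__main__":
    for logger, count in errors_by_logger(SAMPLE_LOG_LINES).most_common():
        print(f"{logger}: {count}")
```

No regexes and no grep: once each line is a record, filtering and grouping become ordinary data manipulation, which is the same reason the logs can be loaded into a DataFrame.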

Databricks has some excellent coverage on the main release and the new pipe syntax specifically.

Answering the Unasked

I’m not sure exactly where this originated from, but I’m quite delighted by this exam question:

State some substantive question which you thought might appear on this exam, but did not. Answer this question (correctly).

As an interview question, I’ll sometimes ask: “Tell me something interesting you’ve discovered or learned recently.” I find it goes a long way towards understanding how the candidate thinks, how they convey technical knowledge to others, and how real their passion and interest for the domain is.

View Transition Web API

The (relatively) new View Transition API is really slick! Simply adding the following CSS to my blog enabled cross-document view transitions - no JavaScript required!

@view-transition {
  navigation: auto;
}

Go ahead and give it a try now! Simply click a link to another page on this site and you should observe a seamless transition occur.

If you want to add even more pizzazz, you can declare CSS keyframe animations:

/* Create a custom animation */
@keyframes move-out {
  from {
    transform: translateX(0%);
  }

  to {
    transform: translateX(-100%);
  }
}

@keyframes move-in {
  from {
    transform: translateX(100%);
  }

  to {
    transform: translateX(0%);
  }
}

/* Apply the custom animation to the old and new page states */
::view-transition-old(root) {
  animation: 0.4s ease-in both move-out;
}

::view-transition-new(root) {
  animation: 0.4s ease-in both move-in;
}

For a blog like this there’s no real use, but for more complex web applications, the View Transition API is a really seamless way to integrate smooth transitions.

As of writing, it’s supported by the major browsers, excepting Firefox 😔.

Things rewrites their server architecture in Swift

I’ve been a long-time user of Cultured Code’s Things to-do app. It’s slick, has well-designed ergonomics, and is perfectly minimalistic. Things’ Markdown support is tasteful, and its approach to task management is structured but pared back.

They’ve just announced a rewrite of their existing server-side infrastructure stack in Swift. The linked post and blog post are worth a read.

From a technical perspective, I’ve always appreciated its rock-solid proprietary Things Cloud syncing service. In particular, I find it interesting that the app asks for Local Network access to enable faster syncing:

“Things” would like to find and connect to devices on your local network. Things uses the local network to provide faster sync between your devices.

I’d always thought they implemented some CRDT data structure and synchronised it on the LAN as well as via the server, but according to their FAQ, their synchronisation is only server-side:

None of your data is transmitted across the local network. Things merely sends a notification to your other devices telling them that new information is available, so that they can download it from the cloud.
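
To make the FAQ’s description concrete, here’s a minimal sketch of that notify-then-pull pattern. It’s a toy model under my own assumptions, not how Things Cloud is actually implemented: the `Cloud` and `Device` classes, the integer cursor, and the plain method call standing in for the local network message are all hypothetical.

```python
class Cloud:
    """Stand-in for the sync server: an append-only list of changes."""

    def __init__(self):
        self.changes = []

    def push(self, change):
        self.changes.append(change)

    def pull_since(self, cursor):
        """Return the changes after `cursor`, plus the new cursor position."""
        return self.changes[cursor:], len(self.changes)


class Device:
    """Hypothetical client that only ever exchanges data with the cloud."""

    def __init__(self, name, cloud):
        self.name = name
        self.cloud = cloud
        self.cursor = 0   # how many cloud changes this device has seen
        self.items = []
        self.peers = []   # other devices reachable on the local network

    def sync(self):
        new_changes, self.cursor = self.cloud.pull_since(self.cursor)
        self.items.extend(new_changes)

    def add_item(self, title):
        self.sync()                # catch up first so the cursor stays consistent
        self.items.append(title)
        self.cloud.push(title)
        self.cursor += 1           # our own change is already applied locally
        for peer in self.peers:
            peer.on_local_notification()

    def on_local_notification(self):
        # The notification carries no payload: it is only a hint to go and sync.
        self.sync()


if __name__ == "__main__":
    cloud = Cloud()
    mac, phone = Device("mac", cloud), Device("phone", cloud)
    mac.peers.append(phone)
    phone.peers.append(mac)
    mac.add_item("Buy milk")
    print(phone.items)
```

The point of the design is that the local network message is purely a latency optimisation. Drop it and the devices still converge the next time they sync with the server, just more slowly.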

btop of your resources

btop is now my default terminal resource monitor, supplanting top, htop, and all the others of that ilk. I wanted to spare a few words for its beautiful (and functional!) text-based user interface (TUI):

  • pane management: btop divides your terminal window into multiple information-dense panes displaying CPU, memory, network, and process information simultaneously. What’s fantastic about btop is that user ergonomics and customisation are clearly front and centre: each pane is numbered, and toggling off/on a pane is as simple as pressing the corresponding pane number. Instead of fiddling with a config file and refreshing (as many other command-line tools require), you can effortlessly switch between panes on-the-fly.
  • command input: In a similar vein, attached to each pane is a set of commands that configure that view. A single letter of the command is highlighted in red, and pressing that letter will toggle that filter/sort/configuration in that panel. Putting the commands front-and-centre shifts the mental burden of recalling “What command displays my processes hierarchically” (e) from searching the manual to just looking at the screen.
  • global configuration: If you want your customisations to be sticky across sessions, there’s a cleanly navigable and expressive configuration window that lets you apply globally persistent configurations. This is much nicer than setting command line flags or editing a config file.
  • cursor support: Despite running in a terminal, you can simply click on processes to select them, or use scroll wheels to navigate long lists. This blending of terminal efficiency with GUI-like interactions creates a really slick experience that respects both keyboard purists and those who don’t mind the forbidden practice of mouse navigation.
  • process management: As an added bonus, selecting any process will allow you to send any signal straight from the TUI.

What makes btop stand out is its intuitive keyboard navigation system. Unlike many other CLIs, btop maps essential functions to single keystrokes. This design philosophy means the interaction mode gets out of the way - toggling through complex system information and controls is always just a keystroke away. To borrow from The Design of Everyday Things, this feels like a set of masterfully crafted affordances.

Of course, this only works because of the constraints of what btop provides - unlike other CLIs with more complex combinations of configurations, resource management is effectively a set of independent components tied together into a master ‘view controller’.
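
As a rough illustration of that “independent components plus a master view controller” shape, here’s a minimal sketch of single-keystroke pane toggling. This is not btop’s code (btop isn’t written in Python); the `Pane` and `ViewController` names and the pane labels are placeholders of my own.

```python
class Pane:
    """One independent component; it knows nothing about its siblings."""

    def __init__(self, name):
        self.name = name
        self.visible = True

    def toggle(self):
        self.visible = not self.visible


class ViewController:
    """Routes single keystrokes to panes, numbered from 1 in display order."""

    def __init__(self, pane_names):
        self.panes = {
            str(number): Pane(name)
            for number, name in enumerate(pane_names, start=1)
        }

    def handle_key(self, key):
        """Toggle the pane bound to `key`; return False if the key is unbound."""
        pane = self.panes.get(key)
        if pane is None:
            return False
        pane.toggle()
        return True

    def visible_panes(self):
        return [pane.name for pane in self.panes.values() if pane.visible]


if __name__ == "__main__":
    controller = ViewController(["cpu", "mem", "net", "proc"])
    controller.handle_key("2")  # hide the second pane
    print(controller.visible_panes())
```

Because each pane’s state is independent, the controller never has to reason about combinations of settings, which is what keeps a one-key-per-action scheme workable.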