I'd suggest building in https://perspective.finos.org/ for data viz. We use DuckDB paired with Perspective for a client-side BI use case, and it's been great.
we're using Perspective in crabwalk[0] (it's like dbt specifically built for duckdb and written in rust) and it's amazing paired with duckdb. Near instant loads for hundreds of thousands of rows and you can keep everything in arrow.
It does look interesting, but for the local ETL use case I'm missing the pitch versus just having my own collection of SQL scripts. Presumably the all-local case needs less complexity. Unless the idea is that this will eventually support more connectors/backends and work as a full dbt replacement?
Wow, that's really cool! Part of my PhD thesis was about writing stable treemapping algorithms for temporal data. The idea is that you don't want your treemap cells to fly around like what I'm seeing in your demo; you want them to stay more or less in the same position without sacrificing too much of the cells' aspect ratios. We've come up with a pretty effective and fast method to do that; check out the paper and a demo down below. Maybe we could even collaborate to get this implemented in Perspective.
The online demo looks great and promising; too bad it's unusable for me. I tried installing it with conda from conda-forge, with no luck. I tried installing it with pip, with the same result. I also cloned the repository from GitHub and tried to build it, but that failed too, though I don't remember the details.
Why some software is so difficult to install beats me.
The UI looks nice and is by itself a welcome addition.
I am somewhat at odds with it being a default extension built into the DuckDB release. This is still a feature/product coming from a company other than the makers of DuckDB [1], though they did announce a partnership with the makers of this UI [2]. Whilst DuckDB has so far thrived without VC money, MotherDuck has (at least) 100M in VC funding [3].
I guess I'm wondering where the line is between free and open source work and commercial work here. My assumption has been that the line falls between what DuckDB ships and what others in the community do. This release seems to change that.
Yes, I do like and use nice, free things. And I understand that things have to be paid for by someone. That someone is sometimes even me. I guess I'd like clarification on the future of DuckDB as its popularity and reach grow.
edit: I don't want to leave this negative-sounding post here without an addendum. I'm just concerned about DuckDB's future monetization strategies and roadmap. DuckDB is a good, useful, versatile tool. I mainly use it from Python through Jupyter, in the browser, and natively. I haven't felt the need for commercial services (plus purchasing them in my professional setting is too convoluted). This UI, whilst undoubtedly useful, seems to lean towards the commercial side. I merely wanted some clarity on what that might entail. I do hope DuckDB and its community achieve even greater things, with requisite compensation for those who work to ensure this.
One of the DuckDB maintainers here. To clarify - the UI is not built into the DuckDB release. It is an extension that is downloaded and installed like any other extension. This extension happens to be developed by MotherDuck. We collaborated with them to streamline the experience - but fundamentally the extension is not distributed as part of DuckDB and works similarly to other extensions.
To be specific, the work we did was:
* Add the `-ui` command to the shell. This executes a SQL query (`CALL start_ui()`). The query that gets executed can be customized by the user through the `.ui_command` option, e.g. by setting `.ui_command my_ui_function()`.
* The ui extension is automatically installed and loaded when the `start_ui` function is executed, similar to other trusted extensions we distribute. The automatic install and load can be disabled through configuration (`SET autoinstall_known_extensions=false`, `SET autoload_known_extensions=false`) and is also disabled when `SET enable_external_access=false`.
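The auto-install/auto-load flow described above can be thought of as a catalog lookup with a fallback. The following is a simplified, hypothetical Python model, not DuckDB's implementation: `Catalog`, `KNOWN_EXTENSION_FUNCTIONS`, `EXTENSION_PAYLOADS` and the returned string are invented for illustration, and only `start_ui` and the three setting names come from the comment. It also assumes that disabling auto-install still lets an already-installed extension auto-load.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical registry: which trusted extension provides which function.
KNOWN_EXTENSION_FUNCTIONS = {"start_ui": "ui"}

# Hypothetical payloads: the functions each extension registers once loaded.
EXTENSION_PAYLOADS: dict[str, dict[str, Callable[[], str]]] = {
    "ui": {"start_ui": lambda: "UI server started"},
}


@dataclass
class Catalog:
    """Toy model of a function catalog with an auto-install/auto-load fallback."""

    autoinstall_known_extensions: bool = True
    autoload_known_extensions: bool = True
    enable_external_access: bool = True
    installed: set[str] = field(default_factory=set)
    functions: dict[str, Callable[[], str]] = field(default_factory=dict)

    def call(self, name: str) -> str:
        # The fallback only runs when the function is missing from the catalog.
        if name not in self.functions:
            self._try_autoload(name)
        if name not in self.functions:
            raise LookupError(f"function {name!r} does not exist")
        return self.functions[name]()

    def _try_autoload(self, name: str) -> None:
        ext = KNOWN_EXTENSION_FUNCTIONS.get(name)
        # Unknown function, auto-load disabled, or external access disabled: give up.
        if ext is None or not self.autoload_known_extensions or not self.enable_external_access:
            return
        if ext not in self.installed:
            if not self.autoinstall_known_extensions:
                return  # installing would require a download, which is disabled
            self.installed.add(ext)  # stands in for downloading the extension
        self.functions.update(EXTENSION_PAYLOADS[ext])
```

For example, `Catalog().call("start_ui")` succeeds, while `Catalog(autoload_known_extensions=False).call("start_ui")` raises `LookupError`. Triggering the fallback only on a catalog miss is what keeps the mechanism invisible for functions that already exist.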
The nature of the UI as an extension is somewhat hard to understand, since its installation method differs from that of other extensions, even core ones. Some extensions autoload, some require an INSTALL query, and this one has its own special built-in query. At least in terms of user experience, it feels more ingrained than other extensions.
Then there's the (to me) entirely new feature of an extension providing an HTTP proxy for an external web service. This part could have been explained more prominently.
Edit: the OP states that "built-in local UI for DuckDB" and "full-featured local web user interface is available out-of-the-box". These statements make me think this feature comes with the release binary, not that it's an extension.
To clarify my point: for me it's not the possible confusion of what this plugin does or how, but what this collaboration means for the future of DuckDB's no-cost and commercial usage.
I agree that in certain places the blog post seems to hint that this functionality is fully baked in; we've adjusted the blog post to be more explicit about the fact that this is an extension.
We have collaborated with MotherDuck on streamlining the experience of launching the UI through auto-installation, but the DuckDB Foundation still remains in full control of DuckDB and the extension ecosystem. This has no impact on that.
For further clarification:
* The auto-installation mechanism is identical to that of other trusted extensions - the auto-installation is triggered when a specific function is called that does not exist in the catalog - in this case the `start_ui` function. See [1]. The query I mentioned just calls that function. The only special feature here is the addition of the CLI flag (and what that flag executes is user-configurable).
* The HTTP server is necessary for the extension to function as the extension needs to communicate with the browser. The server is open-source as part of the extension code [2]. The server (1) fetches web resources (javascript/css) from ui.duckdb.org, and (2) communicates with localhost to co-ordinate the UI with DuckDB. Outside of these the server doesn't interface with other external web services.
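A minimal sketch of the "localhost HTTP server that talks to the browser" pattern, using only Python's standard library. This is not the extension's code: the handler name, the `/api/status` endpoint and its JSON payload are hypothetical, and the asset proxying to ui.duckdb.org is left out. What it illustrates is binding to 127.0.0.1, so the port is reachable only from the same machine.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LocalUiHandler(BaseHTTPRequestHandler):
    """Serves a placeholder page and a hypothetical JSON API endpoint."""

    def do_GET(self) -> None:
        if self.path == "/":
            self._send(200, "text/html", b"<html><body>UI shell</body></html>")
        elif self.path == "/api/status":
            body = json.dumps({"ok": True, "database": ":memory:"}).encode()
            self._send(200, "application/json", body)
        else:
            self._send(404, "text/plain", b"not found")

    def _send(self, status: int, content_type: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the example


def start_local_server(port: int = 4213) -> ThreadingHTTPServer:
    # Bind to the loopback interface only; other machines cannot connect.
    server = ThreadingHTTPServer(("127.0.0.1", port), LocalUiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(port: int, path: str) -> tuple[int, bytes]:
    """Tiny client used to exercise the server (bypasses any HTTP proxy settings)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Passing `port=0` lets the OS pick a free port (read it back from `server.server_address[1]`); the 4213 default only mirrors the UI's default port. Call `server.shutdown()` to stop it.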
Reminiscent of what Deno are doing with their Deno K/V feature, which works in the open source project using SQLite but gets a big upgrade if you use it with Deno Deploy: https://til.simonwillison.net/deno/deno-kv
I'm OK with this. Commercial open source projects need a business model. I get why this can be controversial, but the ecosystem needs to find ways to fund future development and I'm willing to compromise on purity if it means people are getting paid for their work.
(Actually it looks like the UI feature may depend on loading closed source assets across the Internet? If so that changes my comfort level a lot, I'm not keen on that compromise.)
I don't see this as the same thing. Deno is an OS product within a commercial enterprise. DuckDB is an OS project/org; MotherDuck is a for-profit company. They have tight integration and partnerships but were largely independent. This seems to be blurring that line. There is a huge ecosystem around SQLite without this confusion.
I had thought that the commercial side of the (heh) mother company here, DuckDB Labs, is support contracts and the like, whilst MotherDuck is just another VC-funded company in the DuckDB ecosystem. This new extension being added to the list of default extensions blurs the line. That it seemingly is a proxy to a closed-source product from another company makes things even murkier. I can see a point for a for-pay external extension, but this one feels more like an ad for another company's services.
DuckDB Labs has stock in MotherDuck to align ownership.
I actually really like the close partnerships in theory because they align incentives, but this crosses the line by not being open enough. The tight MotherDuck integration with DuckDB for externally hosted DuckDB/MotherDuck databases is fine and good: preferential treatment where the software makes it easy to use the sponsoring service. The local UI, which is actually fully dependent on the external service, is off-putting. It's a little less bad because it's an extension, but it's still worrying from a governance and principles perspective.
That doesn't change what they're saying. The self-hosted backend you're linking is a network-accessible version of the local SQLite backend. The hosted backend is transparently globally replicated and built on FoundationDB, with a very different (better) scaling story.
Given the FLOSS implementation, if one wanted, they could create their own Deno KV backend on top of anything they like... Azure Cosmos, DynamoDB, CockroachDB are all possible, and given the relatively small API, it should be relatively easy to do if anyone wanted to.
I think the primary concern is whether DuckDB will pull something like RedisLabs did: stay open source until it gets enough traction, then pull the rug.
To be fair, the “traction” here was AWS using their massive competitive levers to kill RedisLabs’ long-existing (and quite reasonable/tolerated by open source) monetization avenue, risking the continued funding for redis.
I think this is a bit of a non issue. The UI is just that, a UI. Take it or leave it. If it makes your life easier, great. If not, nothing changes about how you use DuckDB.
There is always going to be some overlap between open source contributions and commercial interests, but unless a real problem emerges, like core features getting locked behind paywalls, there is no real cause for concern. If that happens, then sure, let's talk about it and raise the issue in a public forum. But for now it is just a nice convenience feature that some people (like me) will find useful.
i’m one of the co-founders at MotherDuck. our team is building the UI in collaboration with the team at DuckDB Labs.
this is a first release. we know there are going to be tons of feature requests (including @antman’s request for simple charts). feel free to chime in on this thread and we’ll keep an eye on it!
meanwhile, hope you enjoy this release! we had lots of fun building it.
Y'all at MotherDuck are doing such a great job that I encourage you not to muddle the open/closed source divide, at least not this early in the startup lifecycle. Having a local MotherDuck interface is awesome, and it doesn't gain much by being 'open source'. Wait to cash out on the community goodwill when the rewards are higher.
Just a few days ago I was looking for existing column explorers like the one on Kaggle Datasets, but I couldn't find anything. And this one by DuckDB is better!
I have seen a ton of DB GUI clients and cloud-based data tools for analytics, and it puzzles me that MotherDuck's column explorer (its column data distribution view) is hands down the best I know.
It seems nobody else besides them cares.
Seeing data distribution, unique values, min/ max/ percentiles is so easy and powerful.
Really commend whoever came up with that.
It's a bit of a shame this metadata can't itself be queried; it would be immensely useful for automatic data profiling/QA at scale.
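For a sense of what such a per-column profile contains, here is a minimal sketch in plain Python that computes count, nulls, distinct values, min/max and quartiles for one numeric column. It is an illustration only, not how the Column Explorer works; the function name and output keys are invented. On the query side, DuckDB's `SUMMARIZE` statement returns similar per-column statistics (some of them approximate) as a result set.

```python
import statistics
from typing import Optional


def profile_column(values: list[Optional[float]]) -> dict[str, Optional[float]]:
    """Exact per-column summary; None is treated as NULL."""
    present = sorted(v for v in values if v is not None)
    if len(present) >= 2:
        # quantiles(n=4) returns the three quartile cut points.
        q25, q50, q75 = statistics.quantiles(present, n=4)
    else:
        q25 = q50 = q75 = present[0] if present else None
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": present[0] if present else None,
        "max": present[-1] if present else None,
        "q25": q25,
        "q50": q50,
        "q75": q75,
    }
```

`profile_column([1, 2, 2, 3, None, 5, 8, None])` reports 8 values, 2 nulls, 5 distinct, min 1, max 8 and a median of 2.5. Exact statistics like these need a full sort of the column, which is one reason profiling at scale usually falls back to approximate sketches.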
True; at the moment the UI is not open source. We've talked about releasing the Column Explorer as a standalone component, but haven't been able to prioritize it yet. We'd like to!
it's partly bc this would be extremely slow and expensive with many other databases (e.g. it'd be really slow on postgres, very expensive on snowflake).
(I designed and built the Column Explorer feature)
Observable's column summary feature is very nice! But I do think there's a very common lineage around these kinds of diagnostics which motivated both Observable's and ours. See Jeff Heer's profiler paper[1] for more.
I'm very passionate about this area because I think "first mile problems" are underserved by most tools, but they take the longest to work out.
We had to do some gnarly things[2] to make this feature work well; and there's a lot of room to make it scale nicely and cover all DuckDB data types.
I find it outstandingly pleasant: it's easy and intuitive to navigate, and the information stays clear even in a small window or with many columns.
The notebook style of exploring data in a database is absolutely great, but I have yet to find a great implementation of it.
Azure Data Studio can connect to a variety of databases and has completions, but tends to forget if you've set a cell to output a plot. It also doesn't have good functionality for carrying over results from one cell to the next.
Jupyter notebooks don't have any kind of autocompletion against a database (at least to my knowledge), but you do get a lot of control of how you want to store things between cells and display things.
This DuckDB UI looks great, and while DuckDB can read a lot of files, I'm not sure it has enough connectors to be a general database exploration notebook.
I love DuckDB Labs. They get to work on their cool engine. Get paid by Databricks to build Delta Support. Get paid by MotherDuck to build a UI. Always making the core open-source offering better, but getting massively VC funded companies to pay for it.
I’ve been using IntelliJ’s JDBC-based UI, this will add a lot more capability. I’m using the manifold-sql[1] project with duckdb for analytics, amazing.
The article says nothing about licensing. Can I put this in front of paying customers without bothering with signing contracts and forwarding cash to someone else?
Anecdote. Last year I had to work with a heavy analytics process. The whole thing was 4 or 5 large steps and was written with PySpark. It was really slow, and memory on my system ran quite low (on an 8 GB system with a generous swap), sometimes even halting the whole pipeline. For one heavy step we tried out DuckDB, and I was blown away by how performant it was compared to PySpark. It was not only fast as hell but its memory footprint was extremely low as well; it almost seemed as if something was wrong, and I had to recheck several times that it was correct. And yes, it did what it was supposed to do. Now this is a place where I do actually care about how fast and performant a thing can be, not the nanoseconds that each JS frontend framework of the day claims to win. KUDOS to the DuckDB team.
And even when you have quite a few machines' worth of processing, single-threaded streaming of data on a single machine can still beat out a distributed framework, as the overhead of distribution is large.
A favorite paper: "Scalability! But at what COST?" The authors show that a single-machine implementation (even a single-threaded one) can wipe the floor with implementations running at maximum parallelism.
We didn't develop the PySpark pipeline. It was given to us to improve, and we did a whole rewrite to make it cleaner and more understandable. We also tried switching the persistence approach, to test whether it was a better choice in case a step failed and we needed to resume from a previous one. I also had zero hands-on experience with PySpark and DuckDB. But yes, I was amazed at how far PySpark was falling behind DuckDB. I wasn't expecting such a difference. Ah, also: this pipeline did indeed run in the cloud, but it was not possible to test it there, so the only choice was to run it locally.
Weirdly, as cool as this looks, it's a bit concerning to me. It feels like this is marking a milestone in the history of a great open source project where they are doing one or many of the following:
1) Biting off more than they can chew,
2) Putting significant effort into something that's outside of their core value proposition,
3) Leaning more in the direction of supporting things with a for profit company that gradually cannibalizes the open source side.
Others have commented on the frontend not being open source at the moment (I hope they eventually come around and open-source it). But I just wanted to say how great this feels. In particular, being able to launch from within the CLI is a godsend, because sometimes you start in the CLI and then realise you are better served by a GUI due to data complexity, etc.
Would it be possible to install duckdb extensions in python using packages instead of dialing back home to the extension service? Lots of companies block direct connections to that service but allow packages via JFrog's Artifactory.
I don't have anything to say in regards to DuckDB or this UI. But, I do find it funny that their homepage animation causes google to index the description as:
This leaves a bad taste in my mouth, because motherduck is going to try and use this to squeeze more money out of duckdb. It’s a slippery slope from here on out.
To me, Motherduck have so far been excellent stewards for DuckDB. I want them to find a sustainable business model.
I doubt they’ll ever enshittify DuckDB core. It’s clear they’re only aiming for better integration with their paid service via peripherals like UI to improve the experience, but you also don’t need to use it?
I see many folks trying to build UIs for multiple databases when excellent open source solutions like DBeaver exist. Is there a reason to use this UI instead of DBeaver, through which I can interact with almost all major databases?
Just wondering about reactivity, imports, exports, plain file storage? I don't expect it to be there on a first release, but that's where my mind goes if I see a reference to the notebook form factor.
At risk of harping on a tired topic, have you thought about embedding an AI query generator? For ad-hoc queries, which are mostly what I use DuckDB for, I've found it's almost always fastest to paste the schema into ChatGPT, tell it what I'm looking for, then paste the response back into the DuckDB CLI, but the whole process isn't very ergonomic.
I think I’m sort of after duckbook.ai, but with access to a local duckdb.
Thanks for sharing. We haven't cracked the code on doing this locally, but we are working on similar features and functionality in MotherDuck, like the prompt() and embedding() functions. More to come; we're definitely thinking about it!
The UI looks quite nice. I am heavily using DBeaver with various analytical DBs. Right now, though, I'm not sure what the built-in UI offers that DBeaver doesn't...
What is the best method for using the UI with a remote server that only has SSH access? The database is too large to rsync locally, and opening ports seems risky.
> Support for the UI is implemented in a DuckDB extension. The extension embeds a localhost HTTP server, which serves the UI browser application, and also exposes an API for communication with DuckDB. In this way, the UI leverages the native DuckDB instance from which it was started, enabling full access to your local memory, compute, and file system.
Given the above I'm not sure it supports SSH functionality? Since it exposes an API though there is probably a way to access it, but the easiest solution is probably the one you don't want, which is to open the expected port and just hit it up in a browser. You could open it only to your (office/VPN) IP address, that way at least you're only exposing the port to yourself.
My IP is dynamic, so it seems I would need to wrap it in a script that handles opening and closing the port. I didn't see any authentication built into the UI. Seems like a great local tool, but harder to get right in production.
And re-reading a bit, it does appear to support remote data warehouses, as it has MotherDuck integration, and that is what MotherDuck is. Someone will probably add an interface to make this kind of thing possible for privately hosted DBs. The question is whether it will work dynamically via an SSH tunnel or be exclusively API-driven. And does it depend on the (closed-source, I think?) MotherDuck authentication system?
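One common pattern for reaching a localhost-only web service on a machine you can only SSH into is local port forwarding, which avoids opening any port on the server: start the UI on the server, run `ssh -N -L 4213:localhost:4213 user@server` on your laptop, and browse to http://localhost:4213/ (4213 being the UI's default port). Nobody in the thread confirms this works with the DuckDB UI; it assumes the server-side DuckDB process can still fetch the UI assets from ui.duckdb.org. The Python helper below only builds and launches that command; `tunnel_command`, `open_tunnel` and the host name are hypothetical.

```python
import shlex
import subprocess


def tunnel_command(remote: str, local_port: int = 4213, remote_port: int = 4213) -> list[str]:
    """Build an SSH local port forward: laptop 127.0.0.1:local_port -> remote localhost:remote_port."""
    return [
        "ssh",
        "-N",  # do not run a remote command; only keep the forward open
        "-L",
        f"127.0.0.1:{local_port}:localhost:{remote_port}",
        remote,
    ]


def open_tunnel(remote: str, local_port: int = 4213, remote_port: int = 4213) -> subprocess.Popen:
    # Runs until terminated; call .terminate() on the returned process to close the tunnel.
    return subprocess.Popen(tunnel_command(remote, local_port, remote_port))


print(shlex.join(tunnel_command("analyst@db.example.com")))
# ssh -N -L 127.0.0.1:4213:localhost:4213 analyst@db.example.com
```

Because the forward listens only on the laptop's loopback interface and rides on existing SSH authentication, nothing new is exposed on the server, and a dynamic client IP doesn't matter.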
DuckDB is mind-blowingly awesome. Like SQLite, it is a lightweight, embeddable, serverless database that can run in memory, but it is columnar (optimized for analytics). It can work with files on the filesystem, S3, etc. without copying them (it reads only the necessary regions of the file), just by doing `select * from 's3://....something.parquet'`. It supports automatic compression and automatic indexing. It can read JSON Lines, Parquet, CSV, its own database format, SQLite databases, Excel, Google Sheets... It has a very convenient SQL dialect with many QoL improvements and extensions (and is largely PostgreSQL-compatible). Best of all: it's incredibly fast. Sometimes it's so fast that I find myself puzzled ("how can it possibly analyze 10M rows in 0.1 seconds?"), and I find it difficult to replicate the performance in pure Rust. It is an extremely useful tool. In the last year it has become one of my everyday tools, because the scope of problems you can just throw DuckDB at is gigantic. If you have a whole bunch of structured data that you want to learn something about, chances are DuckDB is the ideal tool.
PS: Not associated with the DuckDB team at all; I just love DuckDB so much that I shill for it when I see it on HN.
I'm sorry, I must be exceptionally stupid (or haven't seriously worked in this particular problem domain and thus lack awareness), but I still can't figure out the use cases from this feature list.
What sort of thing should I be working on, to think "oh, maybe I want this DuckDB thing here to do this for me?"
I guess I don't really get the "that you want to learn something about" bit.
I’m not the person you asked, but here are some random, assorted examples of “structured data you want to learn something about”:
- data you’ve pulled from an API, such as stock history or weather data,
- banking records you want to analyze for patterns, trends, unauthorized transactions, etc
- your personal fitness data, such as workouts, distance, pace, etc
- your personal sleep patterns (data retrieved from a sleep tracking device),
- data you’ve pulled from an enterprise database at work — could be financial data, transactions, inventory, transit times, or anything else stored there that you might need to pull and analyze.
Here’s a personal example: I recently downloaded a publicly available dataset that came in the form of a 30 MB csv file. But instead of using commas to separate fields, it used the pipe character (‘|’). I used DuckDB to quickly read the data from the file. I could have actually queried the file directly using DuckDB SQL, but in my case I saved it to a local DuckDB database and queried it from there.
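In DuckDB this is typically a one-liner such as `SELECT * FROM read_csv('data.csv', delim = '|')`, and its CSV reader can usually detect the delimiter on its own. As a standard-library analogue of that detection step (not DuckDB's sniffer), Python's `csv.Sniffer` can pick out the pipe delimiter from a sample; the file contents and the `read_delimited` name below are made up.

```python
import csv
import io


def read_delimited(text: str) -> list[list[str]]:
    """Sniff the delimiter from a sample of the text, then parse every row."""
    # Restrict candidates to common separators so letters are never chosen.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",|;\t")
    return list(csv.reader(io.StringIO(text), dialect))


sample = "vendor|amount|city\nAcme, Inc.|10|Oslo\nGlobex|20|Lima\n"
rows = read_delimited(sample)
print(rows[1])  # ['Acme, Inc.', '10', 'Oslo']
```

The comma inside "Acme, Inc." is a typical reason publishers pick `|` in the first place: it never collides with punctuation in the values.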
My dumb guy heuristic for DuckDB vs SQLite is something like:
- Am I doing data analysis?
- Is it read-heavy, write-light, using complex queries over large datasets?
- Is the dataset large (several GB to terabytes or more)?
- Do I want to use parquet/csv/json data without transformation steps?
- Do I need to distribute the workload across multiple cores?
If any of those are a yes, I might want DuckDB
- Do I need to write data frequently?
- Are ACID transactions important?
- Do I need concurrent writers?
- Are my data sets tiny?
- Are my queries super simple?
If most of the first questions are no and some of these are yes, SQLite is the right call
One way to think about it: SQLite for columnar/analytical data.
It works great against local files, but my favorite DuckDB feature is that it can run queries against remote Parquet files, fetching just the ranges of bytes it needs to answer the query using HTTP range queries.
This means you can run e.g. a count(*) against a 15GB parquet file from your laptop and only fetch a few hundred KBs of data (if that).
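The trick relies on the Parquet layout: a file ends with its footer metadata (schema, row groups with their row counts, and the byte offsets of each column chunk), followed by a 4-byte little-endian footer length and the magic bytes `PAR1`. A reader can therefore request the last 8 bytes (`Range: bytes=-8`), then the footer, then only the column chunks a query needs; a `count(*)` can be answered from the row counts in the footer alone. The sketch below shows only that range arithmetic with the standard library; DuckDB's actual request pattern is not shown here and may differ, and the function names are hypothetical.

```python
import struct
import urllib.request

MAGIC = b"PAR1"


def footer_range(file_size: int, tail: bytes) -> tuple[int, int]:
    """Return the inclusive (start, end) byte range of the footer metadata,
    given the file size and the file's last 8 bytes."""
    if len(tail) != 8 or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file tail")
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - 8 - footer_len
    end = file_size - 9  # last byte before the 4-byte length field
    return start, end


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    # HTTP byte ranges are inclusive on both ends; nothing is sent until urlopen().
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
```

With a 132-byte file whose last 8 bytes encode a 20-byte footer, `footer_range` returns `(104, 123)`, so the second request transfers just those 20 bytes.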
Small intro: it's primarily a relational database for analytical data. It's an "in-process" database, meaning it runs inside your application's process rather than as a separate server, so you can load certain files at runtime and query them. That's primarily how it differs from regular relational systems.
for the average developer I think the killer feature is allowing you to query over whatever data you want (csv, json, parquet, even gsheets) as equals, directly from their file form - can even join across them
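In DuckDB that kind of cross-format join is roughly `SELECT * FROM read_csv('orders.csv') JOIN read_json('customers.json') USING (customer_id)`. To make the idea concrete without DuckDB, the sketch below performs the same inner join by hand as a hash join; the data, column names and the assumption that the key is an integer are all hypothetical. The manual cast is the part that DuckDB's type inference largely saves you.

```python
import csv
import io
import json


def join_csv_json(csv_text: str, json_text: str, key: str) -> list[dict]:
    """Inner-join CSV rows with a JSON array of objects on an integer `key` (hash join)."""
    # Build phase: index the JSON objects by key.
    index = {int(obj[key]): obj for obj in json.loads(json_text)}
    joined = []
    # Probe phase: CSV fields are strings, so cast the key before looking it up.
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = index.get(int(row[key]))
        if match is not None:
            joined.append({**row, **match})
    return joined


orders = "order_id,customer_id,total\n1,10,9.50\n2,11,3.00\n3,99,1.25\n"
customers = '[{"customer_id": 10, "name": "Ada"}, {"customer_id": 11, "name": "Lin"}]'
for r in join_csv_json(orders, customers, "customer_id"):
    print(r["order_id"], r["name"], r["total"])
# 1 Ada 9.50
# 2 Lin 3.00
```

Order 3 has no matching customer, so the inner join drops it.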
It has great CSV and JSON processing, so I find it's almost better thought of as an Excel-like tool than a database. Great for running quick analysis and exploratory work. Example: I need to do some reporting mock-ups on Jira data; DuckDB sucks it all in (or queries exports in place), makes it easy to clean, filter, pivot, etc., and export to CSV.
If you're developing in the data space you should consider your "small data" scenarios (ex: the vast majority of our clients have < 1GB of analytical data; Snowflake, etc. is overkill). Building a DW that exists entirely in a local browser session is possible now; that's a big deal.
Love to see this! This is something RethinkDB (RIP) got right from the start IMO, and I like to see tooling like this available from the manufacturer :)
This looks pretty great. The UI looked fantastic, and the post mentioned that it was open source. However what's open source appears to be the DuckDB extension, which forwards the requests to a remote URL. I've not been able to find the code for the actual UI.
Is the actual UI open source, or is that something MotherDuck is allowing to be used by this while remaining proprietary? Right now it doesn't appear like this would work without an internet connection.
Yeah, this is really concerning. The handwaving around "keeping the ui up to date" by hosting it on ui.duckdb.org instead of embedding it doesn't taste great to me.
At least it's hosted on duckdb.org and not mother duck, but I really would expect to see that source somewhere. Disappointing unless I've missed it.
Breadcrumbs in the extension src: https://github.com/duckdb/duckdb-ui/blob/963e0e4d4c6f84b2536...
Yes. Confirmation from Jeff Raymakers, a software engineer at MotherDuck: the UI is not open source.
> Jeff Raymakers — Today at 9:25 AM
> The language in the blog post is misleading, and we're going to correct it.
> The UI extension is open source, but the UI itself is not.
The docs say that the extension's server is configured here: https://duckdb.org/docs/stable/extensions/ui#remote-url
But yeah, I can't find docs nor source for the UI. And the extension docs refer to MotherDuck's own UI: https://motherduck.com/docs/getting-started/motherduck-quick...
So it's a bit of a confusing setup.
It’s quite funny the docs also say this about the configurable url:
> Be sure you trust any URL you configure, as the application can access the data you load into DuckDB.
That’s certainly not what I would expect if someone gave me a “local UI” for some database. I’ve only just once toyed with duckdb, was planning to look more at it - looks like will need to have my guard and see what actually is “local” and doesn’t ship my data to a remote url.
How is this promoted as a "local UI" if it gets the UI from a remote URL?
Maybe the closed source UI is downloaded upon first execution for installation and then cached locally?
Or is this a web app that loads from the remote URL each time?
It's a web interface, but it is served from the local machine. The default is http://localhost:4213/
See the note just above this link on data locations and the optional, explicit opt-in to MotherDuck:
https://duckdb.org/2025/03/12/duckdb-ui.html#features
I'm a co-author of the blog post. I agree that the wording was confusing – apologies for the confusion. I added a note at the end:
> The repository does not contain the source code for the frontend, which is currently not available as open-source. Releasing it as open-source is under consideration.
The actual UI is not open source.
(Someone could write an actually open source UI extension for duckdb, but that would require a lot of investment that so far only motherduck has been able to provide.)
I've looked at quite a few options, and this one (the product of a single person) is a great base, and MIT licensed: https://github.com/caioricciuti/duck-ui
If you want to support a real open-source UI, take a look.
I’ll never understand how any UI projects don’t include an actual screenshot of their project as the first thing on their landing page. It seems so obvious.
I find the SqlLab in apache superset to be very good, and I have duckdb as a data source (anything that supports SqlAlchemy works). It works very well. To be honest, when I first saw the screenshot, I thought it was SqlLab. I haven't actually tried the duckdb ui, though.
Honestly I hope they keep some things proprietary. Just making everything FOSS is not a sustainable business model, and I would quite like DuckDB to continue to exist.
I have similar concerns for Astral. Frankly they're single-handedly unshitifying Python, and it would be a tragedy if they run out of money and we're back to dealing with Pip.
Concur, this is rather confusing wording and the GUI components are closed source as far as I can see.
The UI aesthetics look similar to the excellent Rill, also powered by DuckDB: https://www.rilldata.com/
Rill has better built in visualizations and pivot tables and overall a polished product with open-source code in Go/Svelte. But the DuckDB UI has very nice Jupyter notebook-style "cells" for editing SQL queries.
Rill founder here, I have no comment on the UI similarity :) but I would emphasize our vision is building DuckDB-powered metrics layers and exploratory dashboards -- which we presented at DuckCon #6 last month, PDF below [1] -- and less on notebook style UIs like Hex and Jupyter.
Rill is fully open-source under the Apache license. [2]
[1] https://blobs.duckdb.org/events/duckcon6/mike-driscoll-rill-...
[2] https://github.com/rilldata/rill
I love HN. Random comments about some service out there and replies are like "I am the founder" or "I wrote that".
Is there a video of your talk?
Yes thanks to DuckCon team it’s here:
https://youtu.be/_IqvrFWY7ZM?si=1ux9SGUsh4kDs-ff
Alongside several great talks, including Rusty Conover presenting Airport (Arrow + DuckDB) and Christophe Blefari (Bl3f) introducing a new, lightweight orchestrator called yato.
Thank you for the additional recommendations!
WhatTheDuck does SQL with duckdb-wasm
Pygwalker does open-source descriptive statistics and charts from pandas dataframes: https://github.com/Kanaries/pygwalker
ydata-profiling does open-source Exploratory Data Analysis (EDA) with Pandas and Spark DataFrames and integrates with various apps: https://github.com/ydataai/ydata-profiling (see the #integrations and #use-cases sections)
xeus-sqlite is a xeus kernel for jupyter and jupyterlite which has Vega visualizations for sql queries: https://github.com/jupyter-xeus/xeus-sqlite
jupyterlite-xeus installs packages specified in an environment.yml from emscripten-forge: https://jupyterlite-xeus.readthedocs.io/en/latest/environmen...
emscripten-forge has xeus-sqlite and pandas and numpy and so on; but not yet duckdb-wasm: https://repo.mamba.pm/emscripten-forge
duckdb-wasm "Feature Request: emscripten-forge package" https://github.com/duckdb/duckdb-wasm/discussions/1978
Hamilton Ulmer was involved in both. Back when Twitter was a thing it was really cool to follow his process.
You still can on Bluesky https://bsky.app/profile/hamilton.bsky.social/post/3lk6yxzan...
I suggest https://perspective.finos.org/ for data viz to be built in. We use DuckDB paired with Perspective for client-side BI use case, and it's been great.
+1
we're using Perspective in crabwalk[0] (it's like dbt specifically built for duckdb and written in rust) and it's amazing paired with duckdb. Near instant loads for hundreds of thousands of rows and you can keep everything in arrow.
0 - https://github.com/definite-app/crabwalk
Where are you using/advocating crabwalk?
It does look interesting, but for the local ETL use case, I am missing the pitch on just having my own collection of SQL scripts. Presumably the all-local case needs less complexity. Unless the idea is that this will eventually support more connectors/backends and work as a full dbt replacement?
A few features:
* Built-in column level lineage (i.e. dump in 20 .sql files and crabwalk automatically figures out lineage)
* Visualize the lineage
* Clean handling of input / output (e.g. simply specify @config output and you can export results to parquet, csv, etc.)
* Tests are not yet implemented, but crabwalk will have built-in support for tests (e.g. uniqueness, joins, etc.)
we're using it in our product (https://www.definite.app/), but only for lineage right now.
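To give a feel for what "dump in 20 .sql files and figure out lineage" involves, here is a deliberately crude sketch in Python. It is table-level only (not column-level) and is not how crabwalk works: it assumes each model is named after the table it defines, pulls table names that follow FROM/JOIN with a regex, and orders the models with a topological sort. The model names and SQL are invented.

```python
import re
from graphlib import TopologicalSorter

# Crude table-level lineage (an illustration, not crabwalk's implementation).
# Assumes each model is named after the table it creates and that inputs appear
# as bare identifiers after FROM/JOIN; CTEs, subqueries and quoted names are ignored.
REF_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_.]*)", re.IGNORECASE)


def table_lineage(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the other models its SQL reads from."""
    lineage = {}
    for name, sql in models.items():
        refs = {m.group(1) for m in REF_PATTERN.finditer(sql)}
        # Keep references to other known models; raw sources outside the project are dropped.
        lineage[name] = {ref for ref in refs if ref in models and ref != name}
    return lineage


def run_order(models: dict[str, str]) -> list[str]:
    """Return an order in which every model runs after the models it depends on."""
    return list(TopologicalSorter(table_lineage(models)).static_order())


models = {
    "orders_clean": "select * from raw_orders where amount > 0",
    "customers_clean": "select distinct * from raw_customers",
    "revenue": "select c.id, sum(o.amount) as total from orders_clean o "
               "join customers_clean c on o.customer_id = c.id group by c.id",
}
print(table_lineage(models)["revenue"])
print(run_order(models))
```

Column-level lineage needs a real SQL parser that resolves aliases, CTEs and expressions; the regex here stops at tables.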
Have a look at https://sql-workbench.com eventually, as it's using DuckDB WASM & Perspective to render the query results. Let me know what you think!
This is actually how I discovered Perspective!
Hahaha, nice. It's a small world.
Glad you dig it! Check out our pro version too: it also supports DuckDB, Python/Pyodide and more! https://prospective.co
Wow, that's really cool! Part of my PhD thesis was about writing stable treemapping algorithms for temporal data. The idea is that you want your treemap cells not to fly around like what I'm seeing in your demo, but to remain more or less in the same position without sacrificing too much of the cells' aspect ratios. We've come up with a pretty effective and fast method to do that; check out the paper and a demo below. Maybe we could even collaborate to get this implemented in Perspective.
https://github.com/EduardoVernier/eduardovernier.github.io/b...
https://youtu.be/Bf-MRxhNMdI?list=PLy5Y4CMtJ7mKaUBrSZ3YgwrFY... (see the GIT method)
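The two goals described there can each be put into a number. The sketch below is a generic illustration, not the method from the linked paper, and the helper names are hypothetical: it scores a layout by mean aspect ratio (1.0 means every cell is square) and scores stability by how far cell centers move between two time steps. It assumes every rectangle has positive width and height.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Rect:
    x: float
    y: float
    w: float  # assumed > 0
    h: float  # assumed > 0

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def aspect_ratio(r: Rect) -> float:
    """max(w/h, h/w): 1.0 for a square, larger for thinner cells."""
    return max(r.w / r.h, r.h / r.w)


def mean_aspect_ratio(layout: dict[str, Rect]) -> float:
    return sum(aspect_ratio(r) for r in layout.values()) / len(layout)


def mean_center_drift(before: dict[str, Rect], after: dict[str, Rect]) -> float:
    """Average distance moved by cells present in both frames (lower is more stable)."""
    shared = before.keys() & after.keys()
    if not shared:
        return 0.0
    total = 0.0
    for key in shared:
        (x0, y0), (x1, y1) = before[key].center(), after[key].center()
        total += hypot(x1 - x0, y1 - y0)
    return total / len(shared)
```

A stable treemap algorithm is essentially trading these two numbers off against each other as the data changes.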
That looks much better, thanks I will read up.
The online demo looks great and promising, too bad it's unusable for me. I've tried installing it with conda from conda-forge and no luck. I've tried installing it with pip, the same. I've also cloned the repository from github, tried to build it and failed, but I don't remember the details.
Why some software is so difficult to install beats me.
Have you ever reported an issue? I use perspective heavily on a variety of platforms both conda and pypi without any problems.
Not yet, because I wanted to give it one more try while documenting all the steps.
Why Perspective? If going for a D3 wrapper, Plot would offer more flexibility.
We've built a nice integration for Plot + DuckDB, found here: https://www.duckplot.com/
Congratulations on the launch. Looks very cool. If anyone is looking for a local, non-web-based editor, please check out qstudio: https://www.timestored.com/qstudio/help/duckdb-sql-editor
I use studio for kdb, didn't know it can be used with duckdb too.
The UI looks nice and is by itself a welcome addition.
I am somewhat at odds with it being a default extension built into the DuckDB release. This is still a feature/product coming from a company other than the makers of DuckDB [1], though they did announce a partnership with the makers of this UI [2]. Whilst DuckDB has so far thrived without VC money, MotherDuck has (at least) 100M in VC [3].
I guess I'm wondering where the lines are between free and open source work compared to commercial work here. My assumption has been that the line is what DuckDB ships and what others in the community do. This release seems to change that.
Yes, I do like and use nice, free things. And I understand that things have to be paid for by someone. That someone even sometimes is me. I guess I'd like clarification on the future of DuckDB as its popularity and reach is growing.
[1] https://duckdblabs.com
[2] https://duckdblabs.com/news/2022/11/15/motherduck-partnershi...
[3] https://motherduck.com/blog/motherduck-open-for-all-with-ser...
edit: I don't want to leave this negative-sounding post here without an addendum. I'm just concerned about future monetization strategies and the roadmap of DuckDB. DuckDB is a good, useful, versatile tool. I mainly use it from Python through Jupyter, in the browser and natively. I haven't felt the need for commercial services (plus purchasing them in my professional setting is too convoluted). This UI, whilst undoubtedly useful, seems to be leaning towards the commercial side. I merely wanted some clarity on what it might entail. I do hope DuckDB and its community achieve even greater, better things, with requisite compensation for those who work to ensure this.
One of the DuckDB maintainers here. To clarify - the UI is not built into the DuckDB release. It is an extension that is downloaded and installed like any other extension. This extension happens to be developed by MotherDuck. We collaborated with them to streamline the experience - but fundamentally the extension is not distributed as part of DuckDB and works similarly to other extensions.
To be specific, the work we did was:
* Add the -ui command to the shell. This executes a SQL query (CALL start_ui()). The query that gets executed can be customized by the user through the .ui_command option - e.g. by setting .ui_command my_ui_function().
* The ui extension is automatically installed and loaded when the start_ui function is executed - similar to other trusted extensions we distribute. The automatic install and load can be disabled through configuration (SET autoinstall_known_extensions=false, SET autoload_known_extensions=false) and is also disabled when SET enable_external_access=false.
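The decision described in these two bullets (together with the catalog-miss trigger explained later in the thread) can be modelled as a small function. This is a hypothetical Python sketch, not DuckDB's C++ code: the three setting names mirror the ones quoted above, while `KNOWN_EXTENSION_FUNCTIONS` and `resolve_function` are invented for illustration.

```python
# Hypothetical model of DuckDB's auto-install/auto-load decision. The three
# setting names are the ones quoted in the thread; the lookup table and
# function are invented for illustration and are not DuckDB's implementation.
KNOWN_EXTENSION_FUNCTIONS = {"start_ui": "ui"}

DEFAULT_SETTINGS = {
    "autoinstall_known_extensions": True,
    "autoload_known_extensions": True,
    "enable_external_access": True,
}


def resolve_function(name, catalog, installed, settings=None):
    """Return the steps needed to make `name` callable, or raise LookupError."""
    settings = DEFAULT_SETTINGS if settings is None else settings
    if name in catalog:
        return []  # the function already exists: nothing to install or load
    extension = KNOWN_EXTENSION_FUNCTIONS.get(name)
    if extension is None:
        raise LookupError(f"function {name} does not exist")
    # Per the thread, enable_external_access=false disables both install and load.
    external_ok = settings["enable_external_access"]
    steps = []
    if extension not in installed:
        if not (settings["autoinstall_known_extensions"] and external_ok):
            raise LookupError(f"{name} requires extension '{extension}': run INSTALL {extension}")
        steps.append(f"INSTALL {extension}")
    if not (settings["autoload_known_extensions"] and external_ok):
        raise LookupError(f"{name} requires extension '{extension}': run LOAD {extension}")
    steps.append(f"LOAD {extension}")
    return steps


# The CLI's -ui flag runs a configurable query, by default CALL start_ui().
print(resolve_function("start_ui", catalog=set(), installed=set()))
```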
The nature of UI as an extension is somewhat hard to understand, since its installation method differs from other extensions. Even core ones. Some extensions autoload, some require INSTALL query, and this one has its own special builtin query. It at least feels more ingrained than other extensions by its user experience.
Then there's the (to me) entirely new feature of an extension providing an HTTP proxy for an external web service. This part could have been explained more prominently.
Edit: the OP states that "built-in local UI for DuckDB" and "full-featured local web user interface is available out-of-the-box". These statements make me think this feature comes with the release binary, not that it's an extension.
To clarify my point: for me it's not the possible confusion of what this plugin does or how, but what this collaboration means for the future of DuckDB's no-cost and commercial usage.
I agree that the blog post seems to hint at the fact that this functionality is fully baked in in certain places - we've adjusted the blog post to be more explicit on the fact that this is an extension.
We have collaborated with MotherDuck on streamlining the experience of launching the UI through auto-installation, but the DuckDB Foundation still remains in full control of DuckDB and the extension ecosystem. This has no impact on that.
For further clarification:
* The auto-installation mechanism is identical to that of other trusted extensions - the auto-installation is triggered when a specific function is called that does not exist in the catalog - in this case the `start_ui` function. See [1]. The query I mentioned just calls that function. The only special feature here is the addition of the CLI flag (and what that flag executes is user-configurable).
* The HTTP server is necessary for the extension to function as the extension needs to communicate with the browser. The server is open-source as part of the extension code [2]. The server (1) fetches web resources (javascript/css) from ui.duckdb.org, and (2) communicates with localhost to co-ordinate the UI with DuckDB. Outside of these the server doesn't interface with other external web services.
[1] https://github.com/duckdb/duckdb/blob/main/src/include/duckd...
[2] https://github.com/duckdb/duckdb-ui
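The division of labour in the second bullet boils down to a routing rule: static web resources come from ui.duckdb.org, everything else is answered against the local DuckDB instance. A hypothetical sketch follows; the `/api/` prefix and asset suffixes are assumptions, not the extension's real URL scheme (that lives in the duckdb-ui repository linked above). It also shows why the caveat quoted near the top of this section matters: the split only says where bytes are fetched from, and the remotely fetched JavaScript still runs with access to the data you load.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical routing rule for a local UI server. The "/api/" prefix and the
# asset suffixes are assumptions for illustration, not taken from duckdb-ui.
REMOTE_ASSET_ORIGIN = "https://ui.duckdb.org/"
ASSET_SUFFIXES = (".js", ".css", ".html", ".svg", ".woff2")


def route(request_path: str, remote_origin: str = REMOTE_ASSET_ORIGIN) -> tuple[str, str]:
    """Decide how the localhost server should answer a browser request.

    Returns ("local", path) for API calls handled by the embedded DuckDB
    instance, or ("remote", url) for static assets proxied from the UI host.
    """
    path = urlsplit(request_path).path  # drop any query string
    if path.startswith("/api/"):
        return ("local", path)
    if path == "/" or path.endswith(ASSET_SUFFIXES):
        return ("remote", urljoin(remote_origin, path.lstrip("/") or "index.html"))
    return ("local", path)


for p in ["/", "/assets/app.js?v=1", "/api/query"]:
    print(p, "->", route(p))
```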
Reminiscent of what Deno are doing with their Deno K/V feature, which works in the open source project using SQLite but gets a big upgrade if you use it with Deno Deploy: https://til.simonwillison.net/deno/deno-kv
I'm OK with this. Commercial open source projects need a business model. I get why this can be controversial, but the ecosystem needs to find ways to fund future development and I'm willing to compromise on purity if it means people are getting paid for their work.
(Actually it looks like the UI feature may depend on loading closed source assets across the Internet? If so that changes my comfort level a lot, I'm not keen on that compromise.)
I don't see this as the same thing. Deno is an OS product within a commercial enterprise. DuckDB is an OS project/org; MotherDuck is a for-profit company. They have tight integration and partnerships but were largely independent. This seems to be blurring that line. There is a huge ecosystem around SQLite without this confusion.
I had thought that the commercial side of the (heh) mother company here, DuckDB Labs, was support contracts and the like, whilst MotherDuck is just another VC-funded company in the DuckDB ecosystem. This new extension being added to the list of default extensions blurs the line. That it seemingly is a proxy to a closed-source product from another company makes things even murkier. I can see a point for a for-pay external extension, but this one feels more like an ad for another company's services.
DuckDB labs has stock in MotherDuck to align ownership.
I actually really like the close partnerships in theory because they align incentives, but this crosses the line by not being open enough. The tight MotherDuck integration with DuckDB for externally hosted DuckDB/MotherDuck databases is fine and good: preferential treatment where the software makes it easy to use the sponsoring service. The local UI, which is actually fully dependent on the external service, is off-putting. It's a little less bad because it's an extension, but it's still worrying from a governance and principles perspective.
https://github.com/denoland/denokv
You have been able to self-host Deno KV for over a year.
That doesn't change what they're saying. The self-hosted backend you're linking is a network-accessible version of the local SQLite backend. The hosted backend is transparently globally replicated and built on FoundationDB, with a very different (better) scaling story.
Given the floss implementation, if one wanted, they could create their own DenoKV backed by anything they like... Azure Cosmos, DynamoDB, CockroachLabs are all possible, and given the relatively small API, should be relatively easy to do if anyone wanted to do such a thing.
Right, that's mentioned in my article: https://til.simonwillison.net/deno/deno-kv#user-content-upda...
I think the primary concern is whether DuckDB will pull something like RedisLabs did: stay open source until it gets enough traction, and then pull the rug.
To be fair, the “traction” here was AWS using their massive competitive levers to kill RedisLabs’ long-existing (and quite reasonable/tolerated by open source) monetization avenue, risking the continued funding for redis.
To characterize this as a rug pull is unfair IMO.
I think this is a bit of a non issue. The UI is just that, a UI. Take it or leave it. If it makes your life easier, great. If not, nothing changes about how you use DuckDB.
There is always going to be some overlap between open source contributions and commercial interests but unless a real problem emerges like core features getting locked behind paywalls there is no real cause for concern. If that happens then sure let’s talk about it and raise the issue in a public forum. But for now it is just a nice convenience feature that some people (like me) will find useful.
i’m one of the co-founders at MotherDuck. our team is building the UI in collaboration with the team at DuckDB Labs.
this is a first release. we know there are going to be tons of feature requests (including @antman’s request for simple charts). feel free to chime in on this thread and we’ll keep an eye on it!
meanwhile, hope you enjoy this release! we had lots of fun building it.
> The DuckDB UI is also fully open source: visit the duckdb/duckdb-ui repository if you want to dive in deeper.
Is this really the case? The repo doesn’t seem to have any ui elements?
We updated the video [if that's the reference], because it is not yet open source. Thanks for pointing that out!
Is it going to be open source?
Y'all at MotherDuck are doing such a great job that I encourage you to not try and muddle the open/closed source divide, at least not this early in the startup lifecycle. Having a local MotherDuck interface is awesome, and doesn't gain much by being 'open source'. Wait to cash out on the community good will when the rewards are higher.
Is this feature open source?
Maybe you've already seen, but it appears the answer is no, based on xemoka's comment here quoting someone at duckdb
https://news.ycombinator.com/item?id=43344932
How do you compete with Supabase? Do you have a built-in authentication system? Anything like edge functions?
I've been trying to build a small card game with Supabase and I'm sorta stuck...
Supabase IMO sits in the middle of the curve between Firebase and PocketBase, not really the same use case as DuckDB & MotherDuck.
Motherduck has pretty generous free usage limits, I figured it was worth asking...
Not really the same use-case. DuckDB is more for read-heavy analytical uses.
I really like the columns explorer, https://motherduck.com/blog/introducing-column-explorer/.
Just a few days ago I was looking for existing column explorers like the one on Kaggle Datasets, but I was not able to find anything. And this one by DuckDB is better!
I have seen a ton of DB GUI clients / cloud-based data tools for analytics purposes, and the fact that MotherDuck's column explorer / column data distribution view is hands down the best I know puzzles me.
It seems nobody else besides them cares.
Seeing data distribution, unique values, min/ max/ percentiles is so easy and powerful.
Really commend whoever came up with that.
It's a bit of a shame this metadata cannot be queried itself, would be immensely useful for automatic data profiling/ QA at scale.
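Computing these statistics yourself is not much code, which is one route to the automated profiling/QA the comment wants. A minimal stdlib Python sketch for a single column, treating None as NULL; `profile_column` is a hypothetical helper. For data already in DuckDB, its `SUMMARIZE` statement returns similar per-column figures (min, max, approximate distinct count, quartiles, null percentage) as an ordinary result set.

```python
from statistics import quantiles


def profile_column(values):
    """Summarise one column: counts, distinct values, min/max and quartiles.

    None is treated as NULL. Assumes the non-null values share one comparable
    type; quartiles are only computed for two or more numeric values.
    """
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "quartiles": None,
    }
    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numeric) >= 2:
        # n=4 yields the three cut points q25, q50, q75 (default "exclusive" method).
        profile["quartiles"] = quantiles(numeric, n=4)
    return profile


print(profile_column([3, 1, None, 4, 1, 5, 9, 2, 6]))
```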
A similar open source column explorer is :
https://github.com/manzt/quak
See its demo:
https://manzt.github.io/quak/?source=https://pub-2fc10ef6724...
Do you know if there is any open source TypeScript component that can be used in a project?
None of the UI is OSS as far as I am aware. :/
True; at the moment the UI is not open source. We've talked about releasing the Column Explorer as a standalone component, but haven't been able to prioritize it yet. We'd like to!
it's partly bc this would be extremely slow and expensive with many other databases (e.g. it'd be really slow on postgres, very expensive on snowflake).
Seems heavily inspired by the column summary of ObservableHQ, but that's nice!
(I designed and built the Column Explorer feature)
Observable's column summary feature is very nice! But I do think there's a very common lineage around these kinds of diagnostics which motivated both Observable's and ours. See Jeff Heer's profiler paper[1] for more.
I'm very passionate about this area because I think "first mile problems" are underserved by most tools, but they take the longest to work out.
We had to do some gnarly things[2] to make this feature work well; and there's a lot of room to make it scale nicely and cover all DuckDB data types.
[1] http://vis.stanford.edu/papers/profiler [2] https://motherduck.com/blog/introducing-column-explorer/
Personally, thank you for this awesome work!
I find it outstandingly pleasant: it is easy and intuitive to navigate, and the information stays clear even in the density of a small window or with many columns.
Kudos to you!
Interesting, I guess there's plenty of ideas to grab from their work too! https://observablehq.com/documentation/cells/data-table
The notebook style of exploring data in a database is absolutely great, but I have yet to find a great implementation of it.
Azure Data Studio can connect to a variety of databases and has completions, but it tends to forget if you've set a cell to output a plot. It also doesn't have good functionality for carrying over results from one cell to the next.
Jupyter notebooks don't have any kind of autocompletion against a database (at least to my knowledge), but you do get a lot of control of how you want to store things between cells and display things.
This DuckDB UI looks great, and while DuckDB can read a lot of files, I'm not sure if it has enough connectors to be a general database exploration notebook
I love DuckDB Labs. They get to work on their cool engine. Get paid by Databricks to build Delta Support. Get paid by MotherDuck to build a UI. Always making the core open-source offering better, but getting massively VC funded companies to pay for it.
I’ve been using IntelliJ’s JDBC-based UI, this will add a lot more capability. I’m using the manifold-sql[1] project with duckdb for analytics, amazing.
1. https://github.com/manifold-systems/manifold/blob/master/doc...
The article says nothing about licensing. Can I put this in front of paying customers without bothering with signing contracts and forwarding cash to someone else?
Anecdote. Last year I had to work with a heavy analytics process. The whole thing was 4 or 5 large steps and was written with PySpark. It was really slow, and memory on my system ran quite low (on an 8 GB system with a generous swap), sometimes even stalling the whole pipeline. For one heavy step we tried out DuckDB, and I was blown away by how performant it was compared to PySpark. It was not only fast as hell, but its memory footprint was extremely low as well; so low that I rechecked several times in case something was wrong, and yes, it did what it was supposed to do. Now this is a place where I do actually care about how fast and performant a thing can be, not the nanoseconds that each JS frontend framework of the day claims to win. KUDOS to the DuckDB team.
simple example along these lines running on AWS (specifically paired with Iceberg): https://www.definite.app/blog/cloud-iceberg-duckdb-aws
Spark is never going to be the right choice when running on a single system.
Spark is for when you have hundreds of machines' worth of processing to do
> Spark is for when you have hundreds of machines' worth of processing to do
Absolutely agree. However, most uses of Spark I've seen in my career are people thinking they have hundreds of machines worth of processing to do.
And even when you have quite a lot of machines' worth of processing, some single-threaded streaming of data on a single machine can still beat out any distributed framework, as the overhead of distribution is large.
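The single-machine streaming approach mentioned there can be as simple as one pass over the rows with a running aggregate, so memory grows with the number of groups rather than the number of rows. A minimal Python sketch, assuming a CSV with hypothetical `key` and `value` columns:

```python
import csv
from collections import defaultdict
from typing import Iterable


def stream_group_sum(lines: Iterable[str], key_col: str = "key",
                     value_col: str = "value") -> dict[str, float]:
    """Sum value_col per key_col in one pass over CSV text.

    Rows are consumed one at a time, so memory holds only the running totals
    (one per distinct key), not the whole file.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


# Usage with a real file (hypothetical name):
# with open("events.csv", newline="") as f:
#     print(stream_group_sum(f))
print(stream_group_sum(["key,value", "a,1", "b,2", "a,3.5"]))
```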
A favorite paper, “Scalability! But at what COST?”. Authors show a single machine implementation (even single threaded) can wipe the floor with the maximum parallel capable implementation.
http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/
We didn't develop the PySpark pipeline. It was given to us to improve, and we did a whole rewrite to make it cleaner and more understandable. We also tried a persistence switch to test whether it was a better choice, so that if a step failed we could resume from a previous one. I also had zero hands-on experience with PySpark and DuckDB. But yes, I was amazed at how far PySpark fell behind DuckDB; I wasn't expecting such a difference. Also, this pipeline did indeed run in the cloud, but it was not possible to test it there, so the only choice was to run it locally.
Weirdly, as cool as this looks, it's a bit concerning to me. It feels like this is marking a milestone in the history of a great open source project where they are doing one or many of the following:
1) Biting off more than they can chew,
2) Putting significant effort into something that's outside of their core value proposition,
3) Leaning more in the direction of supporting things with a for profit company that gradually cannibalizes the open source side.
Maybe I'm being too cynical. I hope I'm wrong.
duckdb labs didn't make the UI, motherduck did. The extension just launches the web UI.
you might have a point on #3, but they need to pay the bills somehow.
The fact that this is what they need to do to pay the bills doesn't decrease my concern, it increases it.
Others commented on the frontend not being open source at the moment (I hope they will eventually come around and open-source it). But I just wanted to say how great this feels. In particular, being able to launch it from within the CLI is a godsend, because sometimes you start in the CLI and then realise you are better served by a GUI due to data complexity, etc.
Would it be possible to install duckdb extensions in python using packages instead of dialing back home to the extension service? Lots of companies block direct connections to that service but allow packages via JFrog's Artifactory.
I don't have anything to say in regards to DuckDB or this UI. But, I do find it funny that their homepage animation causes google to index the description as:
DuckDB is a fast ana| database system.
This leaves a bad taste in my mouth, because motherduck is going to try and use this to squeeze more money out of duckdb. It’s a slippery slope from here on out.
To me, Motherduck have so far been excellent stewards for DuckDB. I want them to find a sustainable business model.
I doubt they’ll ever enshittify DuckDB core. It’s clear they’re only aiming for better integration with their paid service via peripherals like UI to improve the experience, but you also don’t need to use it?
In the end, it’s all extensions that you can develop.
I see many folks trying to build UIs for multiple databases when excellent open-source solutions like DBeaver exist. Is there a reason to use this UI over DBeaver, through which I can interact with almost all major databases?
Really cool. Could you elaborate a bit more on what the 'notebook' form factor entails? Should we expect the same as other notebook environments?
Our notebook form factor is unique compared to other notebook environments - we don't serialize the results.
We also have some added bonuses for query profiling and data exploration like the Column Explorer.
The easiest way to give it a whirl is to type 'duckdb -ui' in the CLI.
Let us know if you have any other questions
Just wondering about reactivity, imports, exports, plain file storage? I don't expect it to be there on a first release, but that's where my mind goes if I see a reference to the notebook form factor.
is it a jupyter or marimo style notebook or some third thing?
This looks great!
At the risk of harping on a tired topic, have you thought about embedding an AI query generator? For the ad-hoc queries I mostly use DuckDB for, I’ve found it’s almost always fastest to paste the schema into ChatGPT, tell it what I’m looking for, then paste the response back into the DuckDB CLI, but the whole process isn’t very ergonomic.
I think I’m sort of after duckbook.ai, but with access to a local duckdb.
Thanks for sharing. We haven't cracked the code on doing this locally, but we are working on similar features and functionality in MotherDuck, like the prompt() and embedding() functions. More to come; we're definitely thinking about it!
You can potentially use Ollama running a model locally, e.g. https://ollama.com/library/duckdb-nsql
The UI of duckbook.ai is great! I wish someone would open-source something similar!
If the vision here is to build a local-first version of MotherDuck, the future of small data is very very bright.
The UI looks quite nice. I am heavily using DBeaver with various different analytical DBs. Right now I am not sure though what the built-in UI offers, which is not in DBeaver...
What is the best method for using the UI with a remote server that only has SSH access? The database is too large to rsync locally, and it seems risky to start opening ports.
> Support for the UI is implemented in a DuckDB extension. The extension embeds a localhost HTTP server, which serves the UI browser application, and also exposes an API for communication with DuckDB. In this way, the UI leverages the native DuckDB instance from which it was started, enabling full access to your local memory, compute, and file system.
Given the above I'm not sure it supports SSH functionality? Since it exposes an API though there is probably a way to access it, but the easiest solution is probably the one you don't want, which is to open the expected port and just hit it up in a browser. You could open it only to your (office/VPN) IP address, that way at least you're only exposing the port to yourself.
My IP is dynamic, so it seems I would need to wrap it in a script that handles opening and closing the port. I didn’t see any authentication built into the UI. Seems like a great local tool, but harder to get right in production.
True, but then again it is called a "local UI".
And re-reading a bit it does appear to support remote data warehouses, as it has Mother Duck integration, and that is what Mother Duck is. Someone will probably add an interface to make this kind of thing possible for privately hosted DBs. The question is will it be dynamic via SSH tunnel or is it exclusively API driven? And does it depend on the closed source (I think?) Mother Duck authentication system.
SSH port forwarding?
It looks like the port is configurable, so that should make it easier to avoid conflicts but I wonder how the performance would be impacted.
I was able to get it working and it seemed fast enough. However I don't have any local databases of similar size to compare to.
ssh -F ssh.config -L 4213:localhost:4213 dev 'DUCKDB_HTTPPORT=4213 ~/.duckdb/cli/latest/duckdb -ui'
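If you want to script that tunnel, for example to use a different local port when 4213 is taken, building the argument list avoids quoting mistakes. A minimal Python sketch using only the standard `ssh -F` and `-L` options; `dev` and `ssh.config` come from the comment above, `ssh_tunnel_argv` is a hypothetical helper, and the remote command is passed through as a plain string rather than assuming any particular DuckDB setting or environment variable. If the UI listens on a different remote port, `remote_port` has to match it.

```python
import shlex
from typing import Optional


def ssh_tunnel_argv(host: str, remote_command: str, local_port: int = 4213,
                    remote_port: int = 4213, config_file: Optional[str] = None) -> list[str]:
    """Build an ssh invocation forwarding local_port to localhost:remote_port on host.

    The browser opens http://localhost:<local_port>/ locally while DuckDB and
    its UI server run on the remote machine; no port is opened publicly.
    """
    argv = ["ssh"]
    if config_file:
        argv += ["-F", config_file]
    argv += ["-L", f"{local_port}:localhost:{remote_port}", host, remote_command]
    return argv


def tunnel_command_line(argv: list[str]) -> str:
    """Render the argument list as a correctly quoted shell command."""
    return shlex.join(argv)


argv = ssh_tunnel_argv("dev", "duckdb -ui", config_file="ssh.config")
print(tunnel_command_line(argv))
# To actually open the tunnel: subprocess.run(argv)
```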
This is such a needed addition! Huge duckdb fan, congrats team!
I'm a bit out of the loop here, but what's the use case for DuckDB?
DuckDB is mind-blowingly awesome. Like SQLite, it is a lightweight, embeddable, serverless, in-memory database, but it's columnar (optimized for analytics). It can work with files on the filesystem, S3, etc. without copying them (it just reads the necessary regions of the file), simply by doing `select * from 's3://....something.parquet'`. It supports automatic compression and automatic indexing. It can read JSON Lines, Parquet, CSV, its own database format, SQLite databases, Excel, Google Sheets... It has a very convenient SQL dialect with many QoL improvements and extensions (and is PostgreSQL compatible). Best of all: it's incredibly fast. Sometimes it's so fast that I find myself puzzled, "how can it possibly analyze 10M rows in 0.1 seconds?", and I find it difficult to replicate the performance in pure Rust. It is an extremely useful tool. In the last year, it has become one of my everyday tools, because the scope of problems you can just throw DuckDB at is gigantic. If you have a whole bunch of structured data that you want to learn something about, chances are DuckDB is the ideal tool.
PS: Not associated with DuckDB team at all, I just love DuckDB so much that I shill for them when I see them in HN.
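The columnar point above is a large part of the speed: an aggregate over one column only has to touch that column's values, stored contiguously. A toy Python contrast of the two layouts follows; it illustrates the idea only and does not model DuckDB's compression or vectorized execution.

```python
from array import array

# The same three-column table stored two ways.
rows = [(i, f"user{i % 100}", i * 0.5) for i in range(1000)]  # row-oriented

columns = {  # column-oriented: one contiguous array per column
    "id": array("q", (r[0] for r in rows)),
    "user": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}


def total_amount_rows(table):
    # Every row tuple is visited even though only one field is needed.
    return sum(r[2] for r in table)


def total_amount_columns(table):
    # Only the 'amount' array is read; 'id' and 'user' are never touched.
    return sum(table["amount"])


print(total_amount_rows(rows), total_amount_columns(columns))
```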
I'm sorry, I must be exceptionally stupid (or haven't seriously worked in this particular problem domain and thus lacking awareness), but I still can't figure out the use cases from this feature list.
What sort of thing should I be working on, to think "oh, maybe I want this DuckDB thing here to do this for me?"
I guess I don't really get the "that you want to learn something about" bit.
If you’re using SQLite already, then it’s the same use case but better at analytics
If you’re using excel power query and XLOOKUPs, then it’s similar but dramatically faster and without the excel autocorrection nonsense
If you’re doing data processing that fits on your local machine, e.g. 50 MB, 10 GB, or 50 GB CSVs, then it should be your default.
If you’re using pandas/numpy, this is probably better/faster/easier
Basically if you’re doing one-time data mangling tasks with quick python scripts or excel or similar, you should probably be looking at SQLite/duckdb.
For bigger/repeatable jobs, then just consider it a competitor to doing things with multiple CSV/JSON files.
I’m not the person you asked, but here are some random, assorted examples of “structured data you want to learn something about”:
- data you’ve pulled from an API, such as stock history or weather data,
- banking records you want to analyze for patterns, trends, unauthorized transactions, etc
- your personal fitness data, such as workouts, distance, pace, etc
- your personal sleep patterns (data retrieved from a sleep tracking device),
- data you’ve pulled from an enterprise database at work — could be financial data, transactions, inventory, transit times, or anything else stored there that you might need to pull and analyze.
Here’s a personal example: I recently downloaded a publicly available dataset that came in the form of a 30 MB csv file. But instead of using commas to separate fields, it used the pipe character (‘|’). I used DuckDB to quickly read the data from the file. I could have actually queried the file directly using DuckDB SQL, but in my case I saved it to a local DuckDB database and queried it from there.
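The commenter used DuckDB, where a pipe delimiter can be passed to the CSV reader (for example `read_csv('file.csv', delim = '|')`). As a dependency-free sketch of the same workflow (parse a pipe-delimited file, save it to a local database, query it from there), here is a version using Python's standard `csv` and `sqlite3` modules instead of DuckDB. The sample data, table name, and the `load_pipe_delimited` helper are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited sample standing in for the downloaded file.
RAW = """state|year|amount
CA|2022|10
CA|2023|15
NY|2023|7
"""

def load_pipe_delimited(text: str, conn: sqlite3.Connection, table: str) -> int:
    """Parse '|'-separated text and store it in `table`; returns rows loaded."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = next(reader)  # first line holds the column names
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')  # untyped columns
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)

# ":memory:" keeps the sketch self-contained; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
n = load_pipe_delimited(RAW, conn, "records")

# Values were read as text, so cast before aggregating.
totals = conn.execute(
    "SELECT state, SUM(CAST(amount AS INTEGER)) FROM records "
    "GROUP BY state ORDER BY state"
).fetchall()
print(n, totals)
```

Loading once and querying the saved table mirrors what the commenter did; DuckDB additionally lets you skip the load step and query the file in place.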
Hope that helps.
My dumb guy heuristic for DuckDB vs SQLite is something like:
If any of those are a yes, I might want DuckDB. If most of the first questions are no and some of these are yes, SQLite is the right call.
Wow... sounds pretty good... you should be doing PR for them... I might give it a try, sounds like I should.
One way to think about it: SQLite for columnar/analytical data.
It works great against local files, but my favorite DuckDB feature is that it can run queries against remote Parquet files, fetching just the ranges of bytes it needs to answer the query using HTTP range queries.
This means you can run, e.g., a `count(*)` against a 15GB Parquet file from your laptop and fetch only a few hundred KB of data (if that).
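Part of why so few bytes are needed is the Parquet layout: a file ends with its metadata, then a 4-byte little-endian metadata length, then the magic bytes `PAR1`. A reader can fetch the last 8 bytes with a suffix range request (`Range: bytes=-8`), learn the metadata length, fetch only the metadata, and then request just the column chunks the query needs. The sketch below shows only that first step against a hypothetical in-memory "file"; it is not DuckDB's code, and it skips parsing the Thrift-encoded metadata that lists the column chunk offsets.

```python
import struct

MAGIC = b"PAR1"
TAIL_REQUEST = "bytes=-8"  # suffix range: the last 8 bytes of the file

def range_header(start: int, end_inclusive: int) -> str:
    """Format an HTTP Range header value (both offsets are inclusive)."""
    return f"bytes={start}-{end_inclusive}"

def metadata_range(file_size: int, tail: bytes) -> tuple[str, int]:
    """From the file size and its last 8 bytes, compute the Range header
    that fetches the footer metadata, plus the metadata length."""
    if len(tail) != 8 or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file (missing trailing PAR1 magic)")
    (meta_len,) = struct.unpack("<I", tail[:4])  # little-endian uint32
    meta_end = file_size - 8  # metadata ends right before length + magic
    return range_header(meta_end - meta_len, meta_end - 1), meta_len

# Hypothetical file: magic, 1000 bytes of column data, 50 bytes of metadata.
fake = MAGIC + b"x" * 1000 + b"m" * 50 + struct.pack("<I", 50) + MAGIC
header, meta_len = metadata_range(len(fake), fake[-8:])
print(TAIL_REQUEST, header, meta_len)
```

In this toy case two small requests (8 bytes, then 50 bytes) locate everything else, no matter how large the column data is; on a real 15GB file the same idea is what keeps the transfer small.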
A small intro: it's primarily a relational database for analytical data. It's an "in-process" database, meaning it runs inside your application's process and you can import certain files at runtime and query them. That's primarily how it differs from regular (client-server) relational systems.
For the average developer, I think the killer feature is that it lets you query whatever data you have (CSV, JSON, Parquet, even Google Sheets) as equals, directly in its file form: you can even join across them.
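DuckDB does this by querying the files in place. For readers without DuckDB at hand, here is a rough stdlib-only analogue of "CSV and JSON as equals": it loads a CSV source and a JSON-lines source into an in-memory SQLite database, after which they join like any other tables. This is a swapped-in technique (load first, then query), not how DuckDB works internally; the data and table names are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inputs: users as CSV, events as JSON lines.
USERS_CSV = "id,name\n1,ada\n2,bob\n"
EVENTS_JSONL = (
    '{"user_id": 1, "kind": "login"}\n'
    '{"user_id": 1, "kind": "query"}\n'
    '{"user_id": 2, "kind": "login"}\n'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")

# CSV rows arrive as dicts of strings; INTEGER affinity converts "1" to 1.
conn.executemany("INSERT INTO users VALUES (:id, :name)",
                 csv.DictReader(io.StringIO(USERS_CSV)))
# Each non-empty JSON line becomes a dict with matching named parameters.
conn.executemany("INSERT INTO events VALUES (:user_id, :kind)",
                 (json.loads(l) for l in EVENTS_JSONL.splitlines() if l.strip()))

# Once loaded, the CSV-sourced and JSON-sourced tables join normally.
rows = conn.execute(
    "SELECT u.name, COUNT(*) FROM users u JOIN events e ON e.user_id = u.id "
    "GROUP BY u.name ORDER BY u.name"
).fetchall()
print(rows)
```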
It has great CSV and JSON processing, so I find it's almost better thought of as an Excel-like tool than a database. Great for running quick analysis and exploratory work. Example: I need to do some reporting mock-ups on Jira data; DuckDB sucks it all in (or queries exports in place) and makes it easy to clean, filter, pivot, etc., and export to CSV.
If you're developing in the data space, you should consider your "small data" scenarios (e.g. the vast majority of our clients have < 1GB of analytical data; Snowflake etc. is overkill). Building a data warehouse (DW) that exists entirely in a local browser session is possible now; that's a big deal.
This looks nice! It could be a replacement for me for duckdb-parquet, a plugin for Datasette that lets you run it on top of DuckDB instead of SQLite.
Is there any way for us to centrally log the SQL queries executed from multiple laptops against our S3 Iceberg store?
We use a canvas-based, windowed approach with DuckDB, but we specialize in system performance data.
https://yeet.cx/play
Nice! Hoping it gets a pivot table UI and some simple graphing capability.
https://news.ycombinator.com/item?id=43347834
Wow, big fan of duckdb and this is a great step forward.
It's a start to something great. Keep it up!
Love to see this! This is something rethinkdb (RIP) got right from the start IMO and I like to see tooling like this available from the manufacturer :)
Congrats Jeff, Ryan, Antony, Dan, Sheila!
Duckdb and polars have changed my Python development completely. Great packages that can work together, excited to see this.
Wondering why Polars alone is not enough?
Some of us are better with SQL than with pandas/Polars syntax.
But Polars has SQL too.
Amazing! Allow publishing please!
Amazing feature/release!
Real hacker news! I definitely have to try it.
how come DuckDB manages to keep delivering such great new features?
it would be awesome if these worked:
Just came here to say, the demo video was awesome!
Refreshing to see neither a Loom recording nor a high-budget video set in a Japandi-style office designed to go viral.
Is it possible to use DuckDB on a per-user basis? Does Motherduck enable this?
DuckDB is single player and single node.
MotherDuck lets you run a fleet of DuckDB instances as a managed cloud service.
Yet another web application.
With the WASM work in DuckDB, this is actually a great use case. For so many workflows, you can do everything in a local browser session.
The best kind of cross-platform application