If you're going to use SQLite as an application file format, you should:
1. Enable the secure_delete pragma <https://antonz.org/sqlite-secure-delete/> so that when your user deletes something, the data is actually erased. Otherwise, when a user shares one of your application's files with someone else, the recipient could recover information that the sender thought they had deleted.
2. Enable the options described at <https://www.sqlite.org/security.html#untrusted_sqlite_databa...> under "Untrusted SQLite Database Files" to make it safer to open files from untrusted sources. No one wants to get pwned when they open an email attachment.
3. Be aware that when it comes to handling security vulnerabilities, the SQLite developers consider this use case to be niche ("few real-world applications" open SQLite database files from untrusted sources, they say) and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed. https://www.sqlite.org/cves.html
They fail to mention any of this on their marketing pages about how you should use SQLite as an application file format.
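A minimal sketch of both recommendations using Python's built-in `sqlite3` (the pragma names come from the linked SQLite docs; how you issue them depends on your binding):

```python
import sqlite3

def open_hardened(path):
    """Open a SQLite file with settings suited to an application file format."""
    conn = sqlite3.connect(path)
    # 1. Actually erase deleted content instead of leaving it in free pages.
    conn.execute("PRAGMA secure_delete = ON")
    # 2. Hardening from the "Untrusted SQLite Database Files" checklist:
    conn.execute("PRAGMA trusted_schema = OFF")  # don't run functions named in the schema
    conn.execute("PRAGMA cell_size_check = ON")  # extra corruption detection
    conn.execute("PRAGMA mmap_size = 0")         # avoid mmap on possibly corrupt files
    return conn

conn = open_hardened(":memory:")
print(conn.execute("PRAGMA secure_delete").fetchone()[0])  # 1 means ON
```

The checklist on sqlite.org lists more measures (limits, `SQLITE_DBCONFIG_DEFENSIVE`, running `PRAGMA integrity_check` before use); the above is only the subset reachable through pragmas.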
>and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed.
I think that's an unfair reading. SQLite runs fuzzers itself and quickly addresses bugs found by external fuzzers. There's an entire section in their documentation about their own fuzzers and thanking third-party fuzzers, including credit to individual engineers.
The tone of the CVE docs is because people freak out about CVEs flagged by automated tools when the CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
> The tone of the CVE docs is because people freak out about CVEs flagged by automated tools when the CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
The CVE docs:
> The attacker can submit a maliciously crafted database file to the application that the application will then open and query
This is exactly the normal use case GP talks about with application file formats.
That's true, but most usage of SQLite is not as an application file format, and many of those CVEs don't apply even to that use case. The reason people have policies around CVE scanning is that CVEs often represent real vulnerabilities. But there's also stuff like "this regex has exponential or polynomial runtime on bad inputs", which is a real security issue for some projects and not others, depending on what the input to the regex is. That's also true for SQLite, and I'm guessing that the author of that page has spent a bunch of time explaining to people worried about some CVE that their usage is not vulnerable. The maintainer of cURL has expressed similar frustration.
> but most usage of sqlite is not as an application file format,
This is exactly the OTHER way around. Most usages of SQLite are as an application file format. Firefox stores bookmarks, history, and cookies in SQLite files in the profiles folder. Messaging apps (WhatsApp, Signal, etc.) use SQLite for chat history. macOS and Windows use SQLite in various subsystems, e.g. Spotlight metadata and application caches. Mobile apps use SQLite heavily. And probably ten thousand other cases as a file format if I bothered to look up more.
Mobile apps store SQLite dbs in their private data directory that only they can access. In order to exploit a vulnerability you'd have to first break the sandbox. Desktop OSes generally have far weaker protections than that, if you have access to the user's profile directory you can steal all of their credentials or plant executables etc.
When I think application file format I think of something like .txt, .pdf, or .doc, where it's expected that untrusted input will be passed around. In that case it makes a lot more sense to restrict which features of SQLite are accessible, and even then I'd worry about using it widely - there's so much surface area, plus the user confusion of shm and wal files.
On the other hand, exploiting weaknesses in MITRE’s CVE program to create ticket management primitives, creating “shellcode” that composes them to implement a feature request tracking API, using it to manage your open source organization’s feature roadmap, sure would make for a great 2600 article…
To be fair, PRAGMA trusted_schema=OFF is recommended by the docs; it just isn't the default. The docs also recommend the SQLITE_DIRECTONLY flag on all custom SQL functions.
"Most applications can use SQLite without having to worry about bugs in obscure SQL inputs." And then they recommend SQLite as a document interchange format.
Untrusted database file is not the same as untrusted SQL input.
There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
If you want to highlight an inconsistency, this is way more worrying:
> “All historical vulnerabilities reported against SQLite require at least one of these preconditions: (…) 2. The attacker can submit a maliciously crafted database file to the application that the application will then open and query. Few real-world applications meet either of these preconditions…”
However, most of the rest of the page is speaking of arbitrary SQL input, not purposely broken database files.
> There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
Views and triggers can contain arbitrary SQL and can be defined by a malicious database file, though these can be disabled as described on the "Defense Against The Dark Arts" page.
That leaves default column values and indexes on expressions, which can execute a limited subset of SQL. I'd be worried about certain arbitrary SQL input vulnerabilities being reachable this way.
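The "schema is stored as SQL text" point is easy to see from Python: a view's body is just DDL text in `sqlite_master`, chosen by whoever wrote the file, and it executes when the view is queried:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (1), (2)")
# The view body is arbitrary SQL chosen by whoever authored the database file.
conn.execute("CREATE VIEW v AS SELECT x * 100 AS x FROM t")

# The schema is literally stored as SQL text:
for name, sql in conn.execute("SELECT name, sql FROM sqlite_master"):
    print(name, "->", sql)

# Querying the view executes that stored SQL.
print(conn.execute("SELECT x FROM v").fetchall())  # [(100,), (200,)]
```

Here the stored SQL is harmless, but a crafted file could put anything there, which is exactly why PRAGMA trusted_schema=OFF and the other mitigations exist.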
Although this is indeed a worrying statement, it seems true to me. Most users of sqlite control the SQL they use. The problem I would expect from using a database as a document interchange format is that a maliciously crafted database could result in a CVE. The page acknowledges this possibility, even while pointing out (in their CVE list) that it hasn't happened so far, or is rare (it's hard to parse some of their descriptions).
I'm not that concerned with bugs in sqlite. sqlite is high quality software, and the application that uses it is a more likely source of vulnerabilities.
But I do see a problem if you really need to use a sqlite that's compiled with particular non-default options.
Say I design a file format and implement it, and my implementation uses an sqlite library that's compiled with all the right options. Then I evangelize my file format, telling everyone that it's really just an sqlite database and sooo easy to work with.
First thing that happens is that someone writes a neat little utility for working with the files, written in language X, which comes with a handy sqlite3 library. But that library is not compiled with the right options, and boom, you have a vulnerable utility.
Most of the recommended [1] settings are available on a per-connection basis, through PRAGMAs, sqlite3_db_config, sqlite3_limit, etc.; some are global settings, like sqlite3_hard_heap_limit64.
A binding can expose those settings. It's not a given that a third-party utility will use them, but it can.
Ah, I missed that 9.a-c were alternatives. And that, in the absence of custom tables or functions, they are merely defense in depth for something that is already secure, barring bugs. I withdraw my concern.
Dr. Hipp occasionally gets on a soapbox and extolls the virtue of sqlite databases for use as an application file format. He also preaches about the superiority of Fossil over Git. His arguments generally make sense. I tolerate his sermons because he is one of the truly great software developers of our time, and a personal hero of mine.
These are thought-experiments to help better understand how SQLite works.
This is exactly how supporting documentation should be written so that others read it.
The problem is that better is not an abstract measure. Better at what, for what purpose, in what context? I like Fossil in the abstract, but it isn’t integrated well into any of my tools; there is only one hosting service I know of; and they took away the wysiwyg option from the built-in wiki (a preference of mine). So it isn’t better for me.
Your better will be measured against different criteria, etc.
One thing I would call out, if you use SQLite as an application format:
BLOB type is limited to 2GiB in size (int32). Depending on your use cases, that might seem high, or not.
People would argue that if you store that much binary data in a SQLite database, it is not really appropriate. But an application format often needs to bundle large binary data in one nice file, rather than many files that you have to copy together to make it work.
Also you almost certainly want to do this anyway so you can stream the blobs into/out of the network/filesystem, well before you have GBs in a single blob.
That's right, but it is much easier to just use a blob without application logic to worry about chunking. It is the same reason we use SQLite in the first place: a lot of transaction/rollback logic is now in the SQLite layer, not the application layer.
So the limitation is really a structural issue that Dr. Hipp at some point might resolve (or not), but pretty much has to be resolved by SQLite core team, not outside contributors (of course you can resolve it by forking, but...).
This is essential if you want to have encryption/compression + range access at the same time.
I've been using chunk sizes of 128 megabytes for my media archive. This seems to be a reasonable tradeoff between range retrieval delay and per object overhead (e.g. s3 put/get cost).
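A chunked layout like that is only a few lines of schema. Here's a rough sketch (the table shape and chunk size are illustrative, not from any particular application):

```python
import sqlite3

CHUNK = 64 * 1024  # pick per your range-retrieval / per-object-overhead tradeoff

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE chunks(
    file_id INTEGER, seq INTEGER, data BLOB,
    PRIMARY KEY (file_id, seq))""")

def write_blob(conn, file_id, payload: bytes):
    """Split a payload into fixed-size chunks, one row each."""
    for seq, off in enumerate(range(0, len(payload), CHUNK)):
        conn.execute("INSERT INTO chunks VALUES (?, ?, ?)",
                     (file_id, seq, payload[off:off + CHUNK]))

def read_blob(conn, file_id) -> bytes:
    """Reassemble the payload in chunk order."""
    rows = conn.execute(
        "SELECT data FROM chunks WHERE file_id = ? ORDER BY seq", (file_id,))
    return b"".join(row[0] for row in rows)

payload = bytes(range(256)) * 1000  # 256 kB test payload
write_blob(conn, 1, payload)
assert read_blob(conn, 1) == payload
```

Range access then becomes "fetch only the rows whose chunks cover the requested byte range", which also plays well with per-chunk encryption or compression.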
SQLite can't be reliably used on networked file systems because it heavily relies on locking being correctly implemented. I recently had to add a check for such file systems in my application [1] because I noticed a related corruption firsthand. Simpler file formats do not impose such requirements. SQLite is certainly good, but not for this use.
In the context of this article, that's largely irrelevant: ZIP cannot be used in a multi-user scenario at all, so even if sqlite isn't perfect, it's still miles better than the ZIP format it replaces in this thought experiment.
That's pretty broad and over-generalized. Network file systems without good lock support are almost always a bad setup by an administrator. Both NFS and CIFS can work with network-wide locks just fine.
SQLite advises against using a network file system to avoid potential issues, but you can successfully do it.
As noted in my other comment, those "potential" issues are real and do happen from time to time. Unless SQLite gives some set of configurations to avoid such issues, I can't agree that it's over-generalized.
Are the typical Synology, Qnap, or TrueNAS devices with default Linux, macOS and Windows clients going to be set up correctly by default? If any of the typical things someone is likely to setup following wizards in a home or small office is likely to result in lock not working correctly for SQLite, then it is fair for them to warn against using it on a network file system.
As an application format, you don't generally expect people to be editing an ODF file at the same time though, so network locking doesn't really disqualify it for use as a document format.
> As an application format, you don't generally expect people to be editing an ODF file at the same time though
Oh hell yes you do. Excel spreadsheets are notorious for people wanting to collaborate on them, and PowerPoint sheets come in close second. It used to be an absolute PITA but at least Office 365 makes the pains bearable.
An interesting skim, but it would have been more meaningful if it had tackled text documents or spreadsheets to show what additional functionality would be enabled with those beyond "versioning".
Maybe it's just me, but I see the presentation functionality as one of the less used aspects of the OpenOffice family.
What he listed as the first improvement, "Replace ZIP with SQLite" would certainly apply to the other ODF formats.
He advocates breaking the XML into smaller pieces in SQLite. I suppose making each slide a new XML record could make sense. Moving over to spreadsheets, I don't know how ODF does it now, but making each sheet a separate XML could make sense.
Thinking about Writer documents, I wonder what a good smaller unit would be. I think one XML per page would be too fine a granularity. You could consider one record per chapter. I doubt one record per paragraph would make sense, but it could be fun to try different ideas.
While reading I was musing that one way to handle text could be a linked-list storage format. To make that work, you’d need the editor to work on a block concept, and I don’t think document editors work like that?
Spreadsheets might be a little easier because you can separate out by sheet or even down to a row/column level?
I've been trying out SQLite for a side project of mine, a virtual whiteboard. I haven't quite got my head around it, but it seems to be much less of a bother than interacting with file system APIs so far. The problem I haven't really solved is how sync and maybe collaboration are going to interact with it; so far I have:
1. Plaintext format (JSON or similar) or SQLite dump files versioned by git
2. Some sort of modern local first CRDT thing (Turso, libsql, Electric SQL)
3. Server/Client architecture that can also be run locally
SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.
That provides a C-level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful, as it supports the session extension. There is an example giving a feel for what it is like to use.
> it is still bothersome that changing a single character in a 50 megabyte presentation causes one to burn through 50 megabytes of the finite write life on the SSD.
I used to worry a lot about this but it has never once actually come up for me. 50 megabytes is a pretty extreme example, but even so, if you edit this document fewer than several million times it won't matter.
Serializing the object graph all over again can be way faster than mapping into a tabular model. There are JSON serializers that can push multiple gigabytes per second per core. It might even be the case that, once you factor in the SSD controller quirks, the tabular updates could cause more blocks to be written than just dumping a big fat json stream all at once.
Anki's storage format is SQLite (or was a few years ago). That made it really lovely when I wanted to import the contents (including the review logs) of an Anki deck I'd been using for a decade into a custom system I was designing. Just pop open the `sqlite3` REPL, poke around to see what it looks like, then write standard SQL queries to get the data out.
If I remember correctly, the Mendix project file format is simply a SQLite db. I thought the designers were lazy, but it turns out it's a reasonable decision.
Recently, the DuckDB team raised a similar question about data lake catalog formats: why not just use an SQL database for that? It's simpler and more efficient as well.
It seems like it would be relatively straightforward to make an sqlite based file format and just have users add a plugin if for some reason they couldn't upgrade their older version of LibreOffice etc. I agree with the other commenter who mentioned that the benefits for text and spreadsheet files need more explanation. But it seems like a good enough idea to have a LibreOffice working group perform a more in depth study. If significant memory reduction is real and that would translate to fewer crashes, it would be a huge boost even if it had no other benefits, IMHO.
Interesting read! I find the idea to use SQL queries to get only the relevant data quite convincing. I do wonder how this would work in practice though. Any changes the user makes would have to be inserted with SQL to allow for the new data to be included in SQL queries, but users also expect to be able to make changes and then not save them (or save them into a different file).
Should one make a massive transaction that is only committed when saving? Is it possible to commit such a transaction to a different file when using Save As?
Or maybe for editing one would need to copy the file to a separate temporary location, constantly commit to that file, and when saving move the temporary file over the original file (this way we aren't losing the resilience against corruption SQLite offers).
Or is there a better way to do this? I don't like storing pending changes into the original file since it kinda goes against how users expect files to work (and could cause them to accidentally leak data).
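The temp-copy-then-rename option sketched above, assuming a "save" is just an atomic replace of the original (a sketch, not a complete editor):

```python
import os, shutil, sqlite3, tempfile

def begin_edit(original):
    """Copy the document to a temp file next to it and edit that copy."""
    dirname = os.path.dirname(os.path.abspath(original))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".sqlite")
    os.close(fd)
    shutil.copy(original, tmp)
    return tmp

def save(tmp, original):
    """Atomically replace the original with the edited copy."""
    conn = sqlite3.connect(tmp)
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")  # fold any -wal file back in
    conn.close()
    os.replace(tmp, original)  # atomic; a crash leaves the original intact

# Demo: create a document, edit a copy, then "save".
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "doc.sqlite")
c = sqlite3.connect(original)
c.execute("CREATE TABLE body(text)")
c.execute("INSERT INTO body VALUES ('draft')")
c.commit(); c.close()

tmp = begin_edit(original)
c = sqlite3.connect(tmp)
c.execute("UPDATE body SET text = 'final'")
c.commit(); c.close()
save(tmp, original)
```

"Save As" falls out for free: pass a different destination path to `os.replace`. Unsaved changes only ever live in the temp copy, so nothing leaks into the original file.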
You could insert any modifications and just mark whichever row is the currently saved one.
This would also work as a really crude undo tree.
I don't really know if it actually goes against users' expectations; Office kinda "saves" stuff for you and stores temporary versions anyway, to be presented in case you forgot to save.
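A toy version of that row-marking idea (the schema is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE doc(
    rev INTEGER PRIMARY KEY,
    parent REFERENCES doc(rev),   -- parent revision: a crude undo tree
    content TEXT,
    saved INTEGER DEFAULT 0)""")

def edit(conn, parent, content):
    """Every edit becomes a new revision row pointing at its parent."""
    cur = conn.execute("INSERT INTO doc(parent, content) VALUES (?, ?)",
                       (parent, content))
    return cur.lastrowid

def mark_saved(conn, rev):
    """'Save' just moves the marker; old revisions stay for undo."""
    conn.execute("UPDATE doc SET saved = 0")
    conn.execute("UPDATE doc SET saved = 1 WHERE rev = ?", (rev,))

r1 = edit(conn, None, "hello")
r2 = edit(conn, r1, "hello world")
mark_saved(conn, r2)
print(conn.execute("SELECT content FROM doc WHERE saved = 1").fetchone()[0])
```

The accidental-data-leak concern from the parent comment still applies, though: every abandoned revision remains in the file unless you garbage-collect it (and use secure_delete) before sharing.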
The fundamental problem in my mind is the mixing of binary and text content. An optimal solution would separate them, allowing systems like Git to do the versioning. But separating the tightly coupled parts into their own files would also be annoying sharing/management-wise.
Base64-encoding the images into strings, as one could do with HTML, would probably not be ideal for compression. As a matter of fact, text files as such are not ideal compression-wise.
So I suppose if a binary format can't be avoided, SQLite would be as good as any other container format. But without built-in collaboration protocol support, like CRDTs, with history truncation (and diverged histories can always fall back to diff), I don't think it'd be good enough to justify the migration.
> SQLite database has a lot of capability, which this essay has only begun to touch upon. But hopefully this quick glimpse has convinced some readers that using an SQL database as an application file format is worth a second look.
It really is. One of the experiments we have been doing recently, to make bug reporting from Android devices easier (and, to an extent, reduce user frustration and fatigue), is to store app logs (unstructured) in an in-memory SQLite table. It lends itself very well to on-device LLMs (like Gemma 3n or Qwen2.5 0.5b), as users can Q&A to learn just what the app is doing and why it won't work the way they want it to. On-device LLMs are limited (context length and/or embeddings) and too many writes (in batches of 1000 rows) to the in-memory SQLite table (surprisingly) eat up battery like no tomorrow, so this "chat to know what the app is doing" isn't rolled out to everyone yet.
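The log-table half of that setup is simple to sketch (schema invented for illustration; the on-device LLM side is out of scope here):

```python
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs(ts REAL, tag TEXT, msg TEXT)")

def log(tag, msg):
    conn.execute("INSERT INTO logs VALUES (?, ?, ?)", (time.time(), tag, msg))

log("net", "request to /sync failed: timeout")
log("db", "retrying migration 12")
log("net", "request to /sync failed: timeout")

# Once logs are rows instead of a flat text stream, the assistant
# (or a human) can ask structured questions about app behavior:
rows = conn.execute(
    "SELECT msg, COUNT(*) FROM logs WHERE tag = 'net' GROUP BY msg").fetchall()
print(rows)  # [('request to /sync failed: timeout', 2)]
```

Batching inserts inside a single transaction is the usual way to cut per-write overhead, which may also help with the battery cost mentioned above.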
The problem they're alluding to, I think, isn't the query side, it's the creation side. adb logcat and logging in Android in general is one hell of a clusterfuck, not being helped by logging in Java being a PITA.
Didn’t Apple actually move to SQLite for their Pages/Numbers format? I remember reading years ago that it was rocky (the transition), but was maybe eventually smoothed out?
What if, instead of APIs for data sets, we simply placed a SQLite file onto a web server as a static asset, so you could just periodically do a GET and have a local copy?
This works as long as the data is "small" and you have no ACL for it. Assuming you mean automatic downloads.
Devdocs does something similar, but there you request to download the payload manually, and the data is still browsable online without you having to download all of it. The data is also split in a convenient manner (by programming language/library). In other words, you can download individual parts. The UI also remains available offline, which is pretty cool.
You can do this today by using the WASM-compiled SQLite module with a custom Javascript VFS that implements the SQLite VFS api appropriately for your backend. I've used it extensively in the past to serve static data sets direct from S3 for low cost.
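Even without the WASM VFS, the plain "periodically GET the file" version needs nothing special on the client: open the downloaded copy read-only. The `mode=ro` and `immutable=1` URI parameters are standard SQLite; the dataset here is invented:

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "dataset.db")

# Stand-in for the published static asset: build a small database file.
pub = sqlite3.connect(path)
pub.execute("CREATE TABLE cities(name TEXT, pop INTEGER)")
pub.executemany("INSERT INTO cities VALUES (?, ?)",
                [("Oslo", 709000), ("Bergen", 286000)])
pub.commit()
pub.close()

# Client side: after a plain GET saved the file locally, open it read-only.
# immutable=1 skips locking entirely, which is safe only because this
# local copy cannot change underneath us.
conn = sqlite3.connect(f"file:{path}?mode=ro&immutable=1", uri=True)
biggest = conn.execute("SELECT name FROM cities ORDER BY pop DESC").fetchone()[0]
print(biggest)  # Oslo
```

The security caveats from earlier in the thread apply in full here: the downloaded file is untrusted input, so the hardening pragmas belong on this connection too.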
> The use of a ZIP archive to encapsulate XML files plus resources is an elegant approach to an application file format. It is clearly superior to a custom binary file format.
Can anyone expand on this? Why would it be better than a binary format?
Having to map between SQLite and the application language seems like it’d add lots of complexity, but I don’t have any experience with custom file formats so would love some advice.
XML isn't great for exchange/interchange either due to security problems and inconsistencies in implementations. A big part of the problem is that xml has a lot of complexity, which leads to a bigger attack surface when parsing and processing untrusted data. And then xml entities are just inherently insecure, unless you disable some of their capabilities (like using remote files, and unlimited recursion).
That said, creating a format that can convey rich untrusted data is a hard problem.
Complex features are inherently complex. Say you want external resources or some scripts in a document. No matter what storage format you use, those are more attack surface. The problem is not storage, but what is done with the information. And very often that is a lot, poorly thought out, and even more poorly implemented.
But most applications don't need those features. And if they do, that should be part of the application logic, with appropriate controls. Having your parsing library make arbitrary http requests is a bad idea.
Oh, I'm not saying sqlite is better than xml for data exchange. As mentioned in other comments, sqlite's security posture towards an untrusted database is problematic. My point is that xml has problems too.
I remember I played with some software called "The Illumination Software Creator" [1], and I remember the saved project files were just SQLite databases.
I actually thought it was kind of cool, because I was able to play with it easily with some SQLite explorer tool (I forget which one) and I could easily look at how the save files actually worked.
I haven't really used SQLite for anything serious [2], but always found the idea of it kind of charming. Maybe I should dust it off and try it again.
What is it that makes you think Lunduke is pseudo-intellectual? He certainly doesn't try to pose as a scholar. If you are like most of his haters, you just refuse to believe that smart people can be conservatives.
There’s no way to discuss Lunduke without getting into politics, so I’ll leave it that Lunduke is clearly a very intelligent person who IMO mistakes his knowledge in some areas for general expertise in other unrelated fields.
It’s a common trap to fall into. See also: Ben Carson. Both of them are obviously intelligent and highly skilled in their professional fields. And both have let that convince themselves that they know everything about everything.
I don't think Lunduke is a Ben Carson type. That would be ridiculous. He has opinions about things outside his area of expertise, like all of us, but he also has some unique experiences like having worked for Microsoft and OpenSUSE. His opinions on tech are pretty solid. I also agree with his politics for the most part.
I used to think he was reasonably smart, but after a certain point I realized that his knowledge of basically anything he talked about was extremely surface-level, and he doesn't appear to know much beyond that.
I disliked him before he went super conservative, but now his YouTube channel boils down to “OMG GUYS LOOK AT HOW WOKE EVERYTHING IS WOKE WOKE WOKE WOKE WOKE PEOPLE ARE HATERS ON ME BECAUSE I SAID SOMETHING THEY DONT LIKE WOKE WOKE!”
I think Lunduke probably has moderate or high technical skills. However, the more technical you get, the smaller your audience can be.
I do get a little tired of the woke stuff, but a Youtuber has to follow a specific pattern to get traffic. It's an important message. I'm sure he takes it at least a little personally that he is banned from forums and conferences, blocked from talking to various companies about their activities, has his technical achievements (see: the top comment I replied to here, and his awful treatment by OpenSUSE folks) ignored due to irrelevant (and popular) political views, is antagonized for being Jewish, etc. He wants to be a tech journalist but he is persecuted over politics. So if he complains about it a lot, I expect that and appreciate him taking the heat for saying what we all think.
> I do get a little tired of the woke stuff, but a Youtuber has to follow a specific pattern to get traffic. It's an important message.
Yes. This is why I called it a low effort grift.
The anti-woke stuff was overplayed in 2016, and it's even more tiring and stupid now. You're free to think it's "important", but it's not. It's just lazy shit he does instead of actual "journalism" (which I suppose is what he calls it).
> I'm sure he takes it at least a little personally that he is banned from forums, conferences, talking to various companies about their activities, has his technical achievements
> He wants to be a tech journalist but he is persecuted over politics.
He's not "persecuted" over politics. He's putting his opinions out there specifically to get a reaction, and then he pretends to be surprised that people actually react to his opinion. You could say it's persecution, but it's really not: everyone draws a line on this stuff.
For example: if someone was super public about lowering the age of consent to three years old then you probably wouldn't be super upset when he's no longer invited to conferences. That could technically be considered a "political opinion" and I'm sure that he would claim he's being persecuted and we would collectively roll our eyes.
Obviously Lunduke isn't that bad, at least as far as I know, but my point is that he's making provocative statements and unless he's the biggest moron on the planet then he has to know that.
It's something that bothers me; people like Lunduke will write shit specifically to be provocative (like writing a completely braindead thing about trans people not existing) and get a reaction. That is his goal. Then he acts surprised that people react negatively to the thing that he wanted and expected people to act negatively to. It's low-effort attention-seeking behavior.
I have said lots of provocative stuff throughout my life, it can be fun to make people uncomfortable. Some of it I am a bit embarrassed by, but I haven't made an entire career out of making people upset and pretending to not understand why they're upset.
> ignored due to irrelevant (and popular) political views
A viewpoint being "popular" has no bearing on whether or not it's harmful, so I have no idea why you brought it up.
If Lunduke has posted a very public negative opinion about a group of people that are active in a community (e.g. trans people in the FOSS world), then it's not "irrelevant" for people to not want to affiliate with him.
> I expect that and appreciate him taking the heat for saying what we all think.
We don't "all" think that. Pretending to be upset over a trans person working on software or purposefully misgendering people is not something I have ever really wanted to say, and even if I did I would just fucking say it instead of parasocially bonding with some wish.com wannabe demagogue.
Also, I'm not even completely convinced by his "achievements". I'm sure he worked at Microsoft and OpenSUSE, but that's not saying much. I worked at Apple for several years. I never worked at Canonical, but I did at one point get an offer from them. I don't want to give too much correlating data about myself, but I have also worked at an extremely popular social media website. A lot of people on this forum can make similar claims. It doesn't make me or him particularly special.
Big tech companies hire a lot of very stupid people. They hire a lot of very smart people too, but having worked at Microsoft, even in the 90s, isn't an indication of intelligence or major achievements, and frankly I kind of get the impression that he embellishes his achievements to try and make himself seem more credible, though I have no evidence of that.
Also, wasn't he basically just a spokesperson for OpenSUSE? I didn't think he was doing anything technical there.
ETA:
This is all to say, it's not like Lunduke has been cut out of conferences and the like just for voting republican or anything. I've met plenty of people in tech who are conservative, don't try to hide that fact, and they're not shunned or anything, so I don't buy the conspiracy theory that conservative voices are "persecuted" in tech spaces.
Lunduke goes a step further by being outwardly hostile towards LGBTQ groups, and then pretends he's not doing that. This is why he's been considered so unbelievably insufferable in the tech world: his entire way of speaking is dishonest.
>The anti-woke stuff was overplayed in 2016, and it's even more tiring and stupid now. You're free to think it's "important", but it's not. It's just lazy shit he does instead of actual "journalism" (which I suppose is what he calls it).
The woke stuff directly affects my job prospects and quality of life. That makes it important. I've been suffering because of it ever since 2012. It escalated between 2016 and 2024, and only now is the pendulum swinging the other way.
>We don't "all" think that. Pretending to be upset over a trans person working on software or purposefully misgendering people is not something I have ever really wanted to say, and even if I did I would just fucking say it instead of parasocially bonding with some wish.com wannabe demagogue.
He finds much more lurid stories to report than "mere" misgendering nonsense. The pronoun thing is just a litmus test to see whether someone is in a woke cult or not. He reports on actual interesting stories, like lawsuits, new software, outsourcing, layoffs, drama in various communities, etc. (some of which occasionally involves pronouns, yes, but it's no exaggeration to say that these woke people want you banished or even dead if you refuse to go with their delusions).
>Big tech companies hire a lot of very stupid people.
It happens. I don't think Lunduke is stupid though. He is in PR more than pure tech. That doesn't make him stupid. Neither does him being conservative.
>I don't buy the conspiracy theory that conservative voices are "persecuted" in tech spaces.
It's not a theory. He regularly reports on awful treatment of conservatives. You'd be surprised at how malicious some of these woke people are. People have been banned from conferences for being seen on Twitter wearing a MAGA hat. They have been fired for being lukewarm about woke shit. I don't blame you for being out of touch, since only Lunduke seems to be willing to report the stuff, and you refuse to watch. But closing yourself off to all evidence against your views and saying "No you guys are just imagining it!" is the actually dishonest take.
>Lunduke goes a step further by being outwardly hostile towards LGBTQ groups, and then pretends he's not doing that. This is why he's been considered so unbelievably insufferable in the tech world: his entire way of speaking is dishonest.
He is not speaking to them or about them in the way they demand, you mean. People have a right to simply refuse to engage in the constant celebration of certain lifestyles and worldviews. Going to work should not require being lectured about how awesome it is for people to engage in abnormal sexual behaviors, or celebration and advancement of people based on their race or sex alone. Liberals have no problem demanding such things on a constant basis, ostracizing and seeking to banish anyone who disagrees even 5%, and that is exactly why we need people like Lunduke to bravely issue scathing critiques of these practices. Besides that, his tech news is kind of interesting and unique, and he has a good sense of humor.
Juggling all the fragments inside the database, garbage collecting all the unused ones, and maintaining consistency are all quite challenging in this use case.
has anyone actually used the `content BLOB` pattern at a larger scale? Suppose I have tens of thousands of small JPEGs; would they be better off in a .sqlite file?
This applies to any secondary index. The data themselves can only be ordered by a single criterion. It may be a meaningful one, but I guess in most cases it is merely the internal ID, which means you will have to scan the whole table too.
XML was meant for documents so in most cases the sequence of elements is given. But technically if I compose XML myself I can lay it out the way I want and thus can have it sorted too. This means it will be directly searchable without an index: read a bit at the middle, find an element name, see where we are, choose head or tail, repeat.
Blindly seeking into XML data is a risky, error-prone approach. It's not impossible to do, but doing it correctly is difficult: even if the tags you're looking for are unique, there are a lot of messy edge cases involving comments and <![CDATA[...]]> blocks.
You can store the content of an XML document in a database faithfully enough to reconstruct it exactly. Any system that can produce XML documents is an "XML database".
If you're going to use SQLite as an application file format, you should:
1. Enable the secure_delete pragma <https://antonz.org/sqlite-secure-delete/> so that when your user deletes something, the data is actually erased. Otherwise, when a user shares one of your application's files with someone else, the recipient could recover information that the sender thought they had deleted.
2. Enable the options described at <https://www.sqlite.org/security.html#untrusted_sqlite_databa...> under "Untrusted SQLite Database Files" to make it safer to open files from untrusted sources. No one wants to get pwned when they open an email attachment.
3. Be aware that when it comes to handling security vulnerabilities, the SQLite developers consider this use case to be niche ("few real-world applications" open SQLite database files from untrusted sources, they say) and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed. https://www.sqlite.org/cves.html
They fail to mention any of this on their marketing pages about how you should use SQLite as an application file format.
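For anyone wanting to follow points 1 and 2, here's a minimal sketch of what enabling those settings looks like from Python's stdlib `sqlite3` module. The pragma names are standard SQLite; whether each one is available depends on the SQLite version your library was built against, and some of the "untrusted database" mitigations (like `SQLITE_DBCONFIG_DEFENSIVE`) aren't reachable via pragmas at all and need the C-level `sqlite3_db_config` interface or a binding that exposes it.

```python
import sqlite3

# ":memory:" here just for the sketch; a real app would open its document file.
conn = sqlite3.connect(":memory:")

# 1. Actually erase deleted content instead of leaving it in free pages.
conn.execute("PRAGMA secure_delete = ON")

# 2. Some of the "Untrusted SQLite Database Files" mitigations that
#    are reachable via pragmas:
conn.execute("PRAGMA trusted_schema = OFF")  # don't run SQL functions referenced by the schema
conn.execute("PRAGMA cell_size_check = ON")  # extra b-tree page sanity checks
conn.execute("PRAGMA mmap_size = 0")         # avoid memory-mapped I/O on possibly hostile files
```

Note that these are per-connection settings, so every tool that opens the file needs to apply them again.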
>and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed.
I think that's an unfair reading. SQLite runs fuzzers itself and quickly addresses bugs found by external fuzzers. There's an entire section in their documentation about their own fuzzers and thanking third-party fuzzers, including credit to individual engineers.
https://www.sqlite.org/testing.html
The tone of the CVE docs is because people freak out about CVEs flagged by automated tools when the CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
> The tone of the CVE docs is because people freak out about CVEs flagged by automated tools when the CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
The CVE docs:
> The attacker can submit a maliciously crafted database file to the application that the application will then open and query
This is exactly the normal use case GP talks about with application file formats.
That's true, but most usage of SQLite is not as an application file format, and many of those CVEs don't apply even to that use case. The reason people have policies around CVE scanning is that CVEs often represent real vulnerabilities. But there's also stuff like "this regex has exponential or polynomial runtime on bad inputs", which is a real security issue for some projects and not others, depending on what the input to the regex is. That's also true for SQLite, and I'm guessing that the author of that page has spent a bunch of time explaining to people worried about some CVE that their usage is not vulnerable. The maintainer of cURL has expressed similar frustration.
> but most usage of sqlite is not as an application file format,
This is exactly the OTHER way around. Most usages of SQLite are as an application file format. Firefox stores bookmarks, history, and cookies in SQLite files in the profiles folder. Messaging apps (WhatsApp, Signal, etc.) use SQLite for chat history. macOS and Windows use SQLite in various subsystems, e.g. Spotlight metadata and application caches. Mobile apps use SQLite heavily. And probably ten thousand other cases as a file format if I bothered to look up more.
Mobile apps store SQLite dbs in their private data directory that only they can access. In order to exploit a vulnerability you'd have to first break the sandbox. Desktop OSes generally have far weaker protections than that, if you have access to the user's profile directory you can steal all of their credentials or plant executables etc.
When I think application file format I think of something like .txt, .pdf, or .doc, where it's expected that untrusted input will be passed around. In that case it makes a lot more sense to restrict which features of SQLite are accessible, and even then I'd worry about using it widely: there's so much surface area, plus the user confusion of shm and wal files.
On the other hand, exploiting weaknesses in MITRE’s CVE program to create ticket management primitives, creating “shellcode” that composes them to implement a feature request tracking API, using it to manage your open source organization’s feature roadmap, sure would make for a great 2600 article…
To be fair, PRAGMA trusted_schema=OFF is recommended by the docs, it just isn't default. The docs also recommend the SQLITE_DIRECTONLY flag on all custom SQL functions.
Hrm, using sqlite as an application format would be a good use case for Limbo.
"Most applications can use SQLite without having to worry about bugs in obscure SQL inputs." And then they recommend SQLite as a document interchange format.
Untrusted database file is not the same as untrusted SQL input.
There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
If you want to highlight an inconsistency, this is way more worrying:
> “All historical vulnerabilities reported against SQLite require at least one of these preconditions: (…) 2. The attacker can submit a maliciously crafted database file to the application that the application will then open and query. Few real-world applications meet either of these preconditions…”
However, most of the rest of the page is speaking of arbitrary SQL input, not purposely broken database files.
> There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
Views and triggers can contain arbitrary SQL and can be defined by a malicious database file, though these can be disabled as described on the "Defense Against The Dark Arts" page.
That leaves default column values and indexes on expressions, which can execute a limited subset of SQL. I'd be worried about certain arbitrary SQL input vulnerabilities being reachable this way.
Although this is indeed a worrying statement, it seems true to me. Most users of sqlite control the SQL they use. The problem I would expect from using a database as a document interchange format is that a maliciously crafted database could result in a CVE. The page acknowledges this possibility, even while pointing out (in their CVE list) that it hasn't happened so far, or is rare (it's hard to parse some of their descriptions).
I'm not that concerned with bugs in sqlite. sqlite is high quality software, and the application that uses it is a more likely source of vulnerabilities.
But I do see a problem if you really need to use a sqlite that's compiled with particular non-default options.
Say I design a file format and implement it, and my implementation uses an sqlite library that's compiled with all the right options. Then I evangelize my file format, telling everyone that it's really just an sqlite database and sooo easy to work with.
First thing that happens is that someone writes a neat little utility for working with the files, written in language X, which comes with a handy sqlite3 library. But that library is not compiled with the right options, and boom, you have a vulnerable utility.
Most of the recommended [1] settings are available on a per-connection basis, through PRAGMAs, sqlite3_db_config, sqlite3_limit, etc.; some are global settings, like sqlite3_hard_heap_limit64.
A binding can expose those settings. It's not a given that a third-party utility will use them, but it can.
1: https://www.sqlite.org/security.html
Ah, I missed that 9.a-c were alternatives. And that, in the absence of custom tables or functions, they are merely defense in depth for something that is already secure, barring bugs. I withdraw my concern.
Dr. Hipp occasionally gets on a soapbox and extolls the virtue of sqlite databases for use as an application file format. He also preaches about the superiority of Fossil over Git. His arguments generally make sense. I tolerate his sermons because he is one of the truly great software developers of our time, and a personal hero of mine.
These are thought experiments to help better understand how SQLite works. This is exactly how supporting documentation should be written so that others will actually read it.
He even went over the top with the disclaimers.
I was skeptical at the start but by the end I didn't care if it was a good idea or a bad one, I learned so much it was a great read.
The problem is that better is not an abstract measure. It is better at what, for what purpose, in what context? I like fossil in the abstract, but it isn't integrated well into any of my tools; there is only one hosting service I know of; and they took away the WYSIWYG option from the built-in wiki (a preference of mine). So it isn't better for me.
Your better will be measured against different criteria, etc.
One thing I would call out, if you use SQLite as an application format:
BLOB type is limited to 2GiB in size (int32). Depending on your use cases, that might seem high, or not.
People would argue that if you store that much binary data in a SQLite database, it is not really appropriate. But an application format usually has this requirement to bundle large binary data in one nice file, rather than many files that you need to copy together to make it work.
You can split your data up across multiple blobs
Also you almost certainly want to do this anyway so you can stream the blobs into/out of the network/filesystem, well before you have GBs in a single blob.
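A minimal sketch of that chunking approach, in Python with the stdlib `sqlite3` module. The schema, the 1 MiB chunk size, and the helper names are all illustrative choices, not anything prescribed by SQLite:

```python
import io
import sqlite3

CHUNK = 1 << 20  # 1 MiB per row; a tunable, hypothetical choice

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks(
        file_id INTEGER,
        seq     INTEGER,          -- chunk index within the file
        data    BLOB,
        PRIMARY KEY (file_id, seq)
    )
""")

def write_file(file_id, stream):
    # Store the payload as fixed-size chunks so no single row
    # approaches SQLite's per-blob size limit, and so a partial
    # range of the file can be read or rewritten independently.
    seq = 0
    while True:
        piece = stream.read(CHUNK)
        if not piece:
            break
        conn.execute("INSERT INTO chunks VALUES (?, ?, ?)",
                     (file_id, seq, piece))
        seq += 1

def read_file(file_id):
    # Stream the chunks back in order instead of materializing
    # the whole payload in memory.
    for (data,) in conn.execute(
            "SELECT data FROM chunks WHERE file_id = ? ORDER BY seq",
            (file_id,)):
        yield data
```

Random access is then just `WHERE file_id = ? AND seq BETWEEN ? AND ?`, which is what makes range retrieval cheap.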
Singular sqlite blobs are streamable too! But for streaming in you need to know the size in advance.
That's right, but it is much easier to just use a blob without application-level chunking logic to worry about. It is the same reason we use SQLite in the first place: a lot of transaction/rollback logic now lives in the SQLite layer, not the application layer.
Also, SQLite did provide good support for read / write the blob in streamable fashion, see: https://www.sqlite.org/c3ref/blob_read.html
So the limitation is really a structural issue that Dr. Hipp at some point might resolve (or not), but pretty much has to be resolved by SQLite core team, not outside contributors (of course you can resolve it by forking, but...).
This is essential if you want to have encryption/compression + range access at the same time.
I've been using chunk sizes of 128 megabytes for my media archive. This seems to be a reasonable tradeoff between range retrieval delay and per object overhead (e.g. s3 put/get cost).
SQLite can't be reliably used on networked file systems because it heavily relies on locking being correctly implemented. I recently had to add a check for such file systems in my application [1] because I noticed a related corruption firsthand. Simpler file formats don't have such requirements. SQLite is certainly good, but not for this use.
[1] https://github.com/lifthrasiir/angel/commit/50a15e703ef2c1af...
In the context of this article, that's largely irrelevant: ZIP cannot be used in a multi-user scenario at all, so even if sqlite isn't perfect, it's still miles better than the ZIP format it replaces in this thought experiment.
That's pretty broad and over-generalized. A network file system without good lock support is almost always a bad setup by an administrator. Both NFS and CIFS can work with network-wide locks just fine.
SQLite advises against using a networking file system to avoid potential issues, but you can successfully do it.
As noted in my other comment, those "potential" issues are real and do happen from time to time. Unless SQLite gives some set of configurations to avoid such issues, I can't agree that it's over-generalized.
Are the typical Synology, QNAP, or TrueNAS devices with default Linux, macOS, and Windows clients going to be set up correctly by default? If any of the typical things someone is likely to set up following wizards in a home or small office is likely to result in locking not working correctly for SQLite, then it is fair for them to warn against using it on a network file system.
As an application format, you don't generally expect people to be editing an ODF file at the same time though, so network locking doesn't really disqualify it for use as a document format.
> As an application format, you don't generally expect people to be editing an ODF file at the same time though
Oh hell yes you do. Excel spreadsheets are notorious for people wanting to collaborate on them, and PowerPoint decks come in a close second. It used to be an absolute PITA, but at least Office 365 makes the pains bearable.
Easy fix is an empty lock file adjacent to the real one.
Yeah, but only if SQLite did support that mode in some built-in VFS implementation...
Which network filesystems are still corrupting sqlite files?
SQLite on NFSv3 has been rock solid on some NFS servers for a decade.
Maybe name and shame?
Specifically I had an issue over 9p used by WSL2. (I never thought it was networked before this incident.)
It seems odd to break a wide range of valid configs for something so obscure.
In that case the application would keep a temporary file and copy over when saving
Maybe, but how would the application know if /data/foo.bar is a local file or mounted via NFS/SMB/etc?
it would always use such a temporary file and update the "real" file only on explicit saves with fast mv or cp operations
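That temp-file-plus-atomic-rename pattern can be sketched with the stdlib `sqlite3` backup API. The function name and error handling here are my own illustration, not from the article; the key properties are that the snapshot is written beside the destination (same volume, so the rename is atomic) and that a crash mid-save leaves the old document untouched:

```python
import os
import sqlite3
import tempfile

def save_as(working: sqlite3.Connection, dest_path: str):
    # Snapshot the working database into a temp file in the same
    # directory, then atomically swap it into place. os.replace is
    # atomic on POSIX (and on Windows for same-volume paths).
    dirname = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    os.close(fd)
    try:
        snapshot = sqlite3.connect(tmp_path)
        working.backup(snapshot)   # consistent page-level copy
        snapshot.close()
        os.replace(tmp_path, dest_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

"Save As" falls out for free: it's the same call with a different `dest_path`, while the working copy keeps accumulating edits.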
An interesting skim, but it would have been more meaningful if it had tackled text documents or spreadsheets to show what additional functionality would be enabled with those beyond "versioning".
Maybe it's just me, but I see the presentation functionality as one of the less used aspects of the OpenOffice family.
What he listed as the first improvement, "Replace ZIP with SQLite" would certainly apply to the other ODF formats.
He advocates breaking the XML into smaller pieces in SQLite. I suppose making each slide a new XML record could make sense. Moving over to spreadsheets, I don't know how ODF does it now, but making each sheet a separate XML could make sense.
Thinking about Write documents, I wonder what a good smaller unit would be. I think one XML per page would be too fine a granularity. You could consider one record per chapter. I doubt one record per paragraph would make sense, but it could be fun to try different ideas.
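To make the "one XML record per slide" idea concrete, here's a hypothetical layout (table and column names are my own, not from the article or ODF): one row of XML per slide instead of one monolithic content.xml, so the editor can load or update a single slide without rewriting the whole document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE slides(
        page INTEGER PRIMARY KEY,   -- slide number
        xml  TEXT NOT NULL          -- that slide's markup fragment
    )
""")
conn.execute("INSERT INTO slides VALUES (1, '<slide><title>Intro</title></slide>')")
conn.execute("INSERT INTO slides VALUES (2, '<slide><title>Agenda</title></slide>')")

# Editing slide 2 rewrites only its row; slide 1's bytes never move.
conn.execute("UPDATE slides SET xml = ? WHERE page = 2",
             ('<slide><title>Agenda (revised)</title></slide>',))
conn.commit()
```

The same shape would work per sheet for spreadsheets or per chapter for text documents; the open question is just what granularity makes the update cost worth the bookkeeping.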
> I think one XML per page would be too fine a granularity.
If I add a 1/3 page graphic on page 2, it'd have to repaginate pages 2-n of that chapter, modifying n-1 XML files...
Splitting the presentation into multiple fragments makes it more difficult to generate/alter a presentation using xslt.
While reading I was musing one way to handle text could be to use a linked list format as storage? To make it work like that, you’d need the editor to work on a block concept and I don’t think document editors work like that?
Spreadsheets might be a little easier because you can separate out by sheet or even down to a row/column level?
Part of me wants to try it now…
I've been trying out SQLite for a side project of mine, a virtual whiteboard. I haven't quite got my head around it, but so far it seems to be much less of a bother than interacting with file system APIs. The problem I haven't really solved is how sync and maybe collaboration are going to interact with it; so far I have:
1. Plaintext format (JSON or similar) or SQLite dump files versioned by git
2. Some sort of modern local first CRDT thing (Turso, libsql, Electric SQL)
3. Server/Client architecture that can also be run locally
Has anyone had any success in this department?
SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.
https://sqlite.org/sessionintro.html
That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:
https://rogerbinns.github.io/apsw/example-session.html
CRDTs are the way to go if you need something very robust for lots of offline work.
> it is still bothersome that changing a single character in a 50 megabyte presentation causes one to burn through 50 megabytes of the finite write life on the SSD.
I used to worry a lot about this but it has never once actually come up for me. 50 megabytes is a pretty extreme example, but even so if you edit this document fewer than several million times it won't matter.
Serializing the object graph all over again can be way faster than mapping into a tabular model. There are JSON serializers that can push multiple gigabytes per second per core. It might even be the case that, once you factor in the SSD controller quirks, the tabular updates could cause more blocks to be written than just dumping a big fat json stream all at once.
Anki's storage format is SQLite (or was a few years ago). That made it really lovely when I wanted to import the contents (including the view logs) of Anki deck I'd been using for a decade into a custom system I was designing. Just pop up the `sqlite3` REPL, poke around and see what it looks like, then write standard SQL queries to get the data out.
If I remember correctly Mendix project file format is simply a sqlite db. I thought the designer was lazy but it turns out it's a reasonable decision.
Recently, the DuckDB team raised a similar question about data lake catalog formats: why not just use a SQL database for that? It's simpler and more efficient as well.
With regard to DuckDB catalogs, I think a database is preferred for that. In particular, the tutorials assume PostGres.
It should be Postgres not PostGres. The latter looks weird.
It seems like it would be relatively straightforward to make an SQLite-based file format and just have users add a plugin if for some reason they couldn't upgrade their older version of LibreOffice, etc. I agree with the other commenter who mentioned that the benefits for text and spreadsheet files need more explanation. But it seems like a good enough idea to have a LibreOffice working group perform a more in-depth study. If the significant memory reduction is real and would translate to fewer crashes, that would be a huge boost even if it had no other benefits, IMHO.
Interesting read! I find the idea to use SQL queries to get only the relevant data quite convincing. I do wonder how this would work in practice though. Any changes the user makes would have to be inserted with SQL to allow for the new data to be included in SQL queries, but users also expect to be able to make changes and then not save them (or save them into a different file).
Should one make a massive transaction that is only committed when saving? Is it possible to commit such a transaction to a different file when using Save As?
Or maybe for editing one would need to copy the file to a separate temporary location, constantly commit to that file, and when saving move the temporary file over the original file (this way we aren't losing the resilience against corruption SQLite offers).
Or is there a better way to do this? I don't like storing pending changes into the original file since it kinda goes against how users expect files to work (and could cause them to accidentally leak data).
You could insert any modifications and just mark whatever row the current saved one is
This would also work as a really crude undo tree
I don't really know if it actually goes against users expectations, Office kinda "saves" stuff for you and stores them as temporary versions anyway, to be presented in case you forgot to save
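A rough sketch of that "insert modifications and mark the current row" idea, which also gives you the crude undo for free. The schema and helper names here are illustrative, not anything from the thread's applications:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every edit inserts a new revision; 'current' marks the one the
# user sees. Old rows double as a crude undo history.
conn.execute("""
    CREATE TABLE revisions(
        rev     INTEGER PRIMARY KEY AUTOINCREMENT,
        content TEXT,
        current INTEGER DEFAULT 0
    )
""")

def edit(content):
    with conn:
        conn.execute("UPDATE revisions SET current = 0")
        conn.execute("INSERT INTO revisions(content, current) VALUES (?, 1)",
                     (content,))

def undo():
    with conn:
        cur = conn.execute(
            "SELECT rev FROM revisions WHERE current = 1").fetchone()
        if cur:
            conn.execute("UPDATE revisions SET current = 0 WHERE rev = ?", cur)
            # Step back to the newest revision older than the current one.
            conn.execute("""
                UPDATE revisions SET current = 1
                WHERE rev = (SELECT max(rev) FROM revisions WHERE rev < ?)
            """, cur)

def current():
    row = conn.execute(
        "SELECT content FROM revisions WHERE current = 1").fetchone()
    return row[0] if row else None
```

An explicit "save" would then just prune every revision newer than some checkpoint, or copy the current row into a clean file.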
The fundamental problem in my mind is the mixing of binary and text content. An optimal solution would separate them, allowing systems like Git do the versioning. But separating the tightly coupled parts into own files would also be annoying sharing/management wise.
Base64-encoding the images into strings, like one could do with HTML, would probably not be ideal for compression. As a matter of fact, text files as such would not be ideal compression-wise.
So I suppose if a binary format can't be avoided, SQLite would be as good as any other container format. But without built-in collaboration protocol support, like CRDTs, with history truncation (and diverged histories can always fall back to diff), I don't think it'd be good enough to justify the migration.
> SQLite database has a lot of capability, which this essay has only begun to touch upon. But hopefully this quick glimpse has convinced some readers that using an SQL database as an application file format is worth a second look.
It really is. One of the experiments we have been doing recently, to make bug reporting from Android easier (and, to an extent, reduce user frustration and fatigue), is to store app logs (unstructured) in an in-memory SQLite table. It lends itself very well to on-device LLMs (like Gemma 3n or Qwen2.5 0.5b), as users can Q&A to learn just what the app is doing and why it won't work the way they want it to. On-device LLMs are limited (context length and/or embeddings), and too many writes (in batches of 1000 rows) to the in-memory SQLite table (surprisingly) eat up battery like no tomorrow, so this "chat to know what the app is doing" isn't rolled out to everyone, yet.
What kinds of queries are being done on the logs such that it makes sense to use sqlite instead of, like, just a ring buffer?
The problem they're alluding to, I think, isn't the query side, it's the creation side. adb logcat and logging in Android in general is one hell of a clusterfuck, not being helped by logging in Java being a PITA.
Didn’t Apple actually move to SQLite for their Pages/Numbers format? I remember reading years ago that it was rocky (the transition), but was maybe eventually smoothed out?
Given n=1 https://freeiworktemplates.com/2022/05/pages-concessions-sta... seems to imply the answer is "no, it's a zip" and that seems to hold even for the interior files
What if instead of API's for data sets, we simply placed a sqlite file onto a web server as a static asset, so you could just periodically do a GET and have a local copy.
A few years ago someone posted a site that showed how to query portions of a SQLite file without having to pull the whole thing down.
https://news.ycombinator.com/item?id=27016630
>> I implemented a virtual file system that fetches chunks of the database with HTTP Range requests
That's wild!
This works as long as the data is "small" and you have no ACL for it. Assuming you mean automatic downloads.
Devdocs does something similar, but there you request to download the payload manually, and the data is still browsable online without you having to download all of it. The data is also split in a convenient manner (by programming language/library). In other words, you can download individual parts. The UI also remains available offline, which is pretty cool.
https://devdocs.io/
You can do this today by using the WASM-compiled SQLite module with a custom Javascript VFS that implements the SQLite VFS api appropriately for your backend. I've used it extensively in the past to serve static data sets direct from S3 for low cost.
More industrious people have apparently wrapped this up on NPM: https://www.npmjs.com/package/sqlite-wasm-http
With an S3 object lambda, I suppose you could generate the sqlite file on the fly.
> The use of a ZIP archive to encapsulate XML files plus resources is an elegant approach to an application file format. It is clearly superior to a custom binary file format.
Can anyone expand on this? Why would it be better than a binary format?
I was watching a talk Andrew Kelley gave about a simple binary format he’s using in Zig: https://www.hytradboi.com/2025/05c72e39-c07e-41bc-ac40-85e83...
Having to map between SQLite and the application language seems like it’d add lots of complexity, but I don’t have any experience with custom file formats so would love some advice.
Previous discussions, 2 and 5 years ago: https://hn.algolia.com/?query=https%3A%2F%2Fwww.sqlite.org%2...
I love SQLite.
As a document _exchange_/_interchange_ format, what I prefer for durability is a non-binary format (e.g. XML based).
For local use, I agree SQLite might be much faster than ZIP, and of course the ability to query based on SQL has its own flexibility merits.
XML isn't great for exchange/interchange either due to security problems and inconsistencies in implementations. A big part of the problem is that xml has a lot of complexity, which leads to a bigger attack surface when parsing and processing untrusted data. And then xml entities are just inherently insecure, unless you disable some of their capabilities (like using remote files, and unlimited recursion).
That said, creating a format that can convey rich untrusted data is a hard problem.
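One crude mitigation for the entity problem (an assumption of this sketch, not a complete defense) is to refuse any document that declares a DTD at all, since entities are behind the classic XML attacks: billion-laughs expansion and external-entity file reads. The helper name is mine:

```python
import re
import xml.etree.ElementTree as ET

def parse_untrusted(xml_bytes: bytes):
    # Reject any document that declares a DTD. Well-formed XML
    # requires "<!DOCTYPE" in uppercase, but match loosely anyway.
    if re.search(rb"<!DOCTYPE", xml_bytes, re.IGNORECASE):
        raise ValueError("DTDs not allowed in untrusted XML")
    return ET.fromstring(xml_bytes)
```

This obviously breaks legitimate documents that rely on a DTD, which is exactly the tradeoff the parent comment is pointing at: the features that make XML rich are the same ones that make it hard to accept from strangers.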
Part of the problem though with saying SQLite instead of XML is a lot of things would lend themselves to XML in SQLite.
Complex features are inherently complex. Say you want external resources or some scripts in a document. No matter what storage format you use, those add more attack surface. The problem is not storage, but what is done with the information. And very often that is a lot, and it is poorly thought out and even more poorly implemented.
But most applications don't need those features. And if they do, that should be part of the application logic, with appropriate controls. Having your parsing library make arbitrary http requests is a bad idea.
Oh, I'm not saying sqlite is better than xml for data exchange. As mentioned in other comments, sqlite's security posture towards an untrusted database is problematic. My point is that xml has problems too.
I remember I played with some software called "The Illumination Software Creator" [1], and I remember the saved project files were just SQLite databases.
I actually thought it was kind of cool, because I was able to play with it easily with some SQLite explorer tool (I forget which one) and I could easily look at how the save files actually worked.
I haven't really used SQLite for anything serious [2], but always found the idea of it kind of charming. Maybe I should dust it off and try it again.
[1] https://en.wikipedia.org/wiki/Illumination_Software_Creator by Bryan Lunduke before I realized how much of a pseudo-intellectual dimwit that he is.
[2] At least outside of the "included" database in a few web frameworks.
What is it that makes you think Lunduke is pseudo-intellectual? He certainly doesn't try to pose as a scholar. If you are like most of his haters, you just refuse to believe that smart people can be conservatives.
There’s no way to discuss Lunduke without getting into politics, so I’ll leave it that Lunduke is clearly a very intelligent person who IMO mistakes his knowledge in some areas for general expertise in other unrelated fields.
It’s a common trap to fall into. See also: Ben Carson. Both of them are obviously intelligent and highly skilled in their professional fields. And both have let that convince themselves that they know everything about everything.
I don't think Lunduke is a Ben Carson type. That would be ridiculous. He has opinions about things outside his area of expertise, like all of us, but he also has some unique experiences like having worked for Microsoft and OpenSUSE. His opinions on tech are pretty solid. I also agree with his politics for the most part.
I would hear what he has to say about his tech experiences. I would not be in a room where he was discussing his politics.
I used to think he was reasonably smart but after a certain point I realized that his knowledge of basically anything he talked about was extremely surface level, and doesn’t appear to know much after that.
I disliked him before he went super conservative, but now his YouTube channel boils down to “OMG GUYS LOOK AT HOW WOKE EVERYTHING IS WOKE WOKE WOKE WOKE WOKE PEOPLE ARE HATERS ON ME BECAUSE I SAID SOMETHING THEY DONT LIKE WOKE WOKE!”
It’s typical low effort grifter stuff.
I think Lunduke probably has moderate or high technical skills. However, the more technical you get, the smaller your audience can be.
I do get a little tired of the woke stuff, but a Youtuber has to follow a specific pattern to get traffic. It's an important message. I'm sure he takes it at least a little personally that he is banned from forums, conferences, talking to various companies about their activities, has his technical achievements (see: the top comment I replied to here, and his awful treatment by OpenSUSE folks), ignored due to irrelevant (and popular) political views, antagonized for being Jewish, etc. He wants to be a tech journalist but he is persecuted over politics. So if he complains about it a lot, I expect that and appreciate him taking the heat for saying what we all think.
> I do get a little tired of the woke stuff, but a Youtuber has to follow a specific pattern to get traffic. It's an important message.
Yes. This is why I called it a low effort grift.
The anti-woke stuff was overplayed in 2016, and it's even more tiring and stupid now. You're free to think it's "important", but it's not. It's just lazy shit he does instead of actual "journalism" (which I suppose is what he calls it).
> I'm sure he takes it at least a little personally that he is banned from forums, conferences, talking to various companies about their activities, has his technical achievements
> He wants to be a tech journalist but he is persecuted over politics.
He's not "persecuted" over politics. He's putting his opinions out there specifically to get a reaction, and then he pretends to be surprised that people actually react to his opinion. You could say it's persecution, but it's really not: everyone draws a line on this stuff.
For example: if someone was super public about lowering the age of consent to three years old then you probably wouldn't be super upset when he's no longer invited to conferences. That could technically be considered a "political opinion" and I'm sure that he would claim he's being persecuted and we would collectively roll our eyes.
Obviously Lunduke isn't that bad, at least as far as I know, but my point is that he's making provocative statements and unless he's the biggest moron on the planet then he has to know that.
It's something that bothers me; people like Lunduke will write shit specifically to be provocative (like writing a completely braindead thing about trans people not existing) and get a reaction. That is his goal. Then he acts surprised that people react negatively to the thing that he wanted and expected people to act negatively to. It's low-effort attention-seeking behavior.
I have said lots of provocative stuff throughout my life, it can be fun to make people uncomfortable. Some of it I am a bit embarrassed by, but I haven't made an entire career out of making people upset and pretending to not understand why they're upset.
> ignored due to irrelevant (and popular) political views
A viewpoint being "popular" has no bearing on whether or not it's harmful, so I have no idea why you brought it up.
If Lunduke has posted a very public negative opinion about a group of people that are active in a community (e.g. trans people in the FOSS world), then it's not "irrelevant" for people to not want to affiliate with him.
> I expect that and appreciate him taking the heat for saying what we all think.
We don't "all" think that. Pretending to be upset over a trans person working on software or purposefully misgendering people is not something I have ever really wanted to say, and even if I did I would just fucking say it instead of parasocially bonding with some wish.com wannabe demagogue.
Also, I'm not even completely convinced by his "achievements". I'm sure he worked at Microsoft and OpenSUSE, but that's not saying much. I used to work for Apple for several years. I never worked at Canonical, but I did at one point get an offer from them. I don't want to give too much correlation data about myself, but I have also worked at an extremely popular social media website. A lot of people on this forum can make similar claims. It doesn't make me or him particularly special.
Big tech companies hire a lot of very stupid people. They hire a lot of very smart people too, but having worked at Microsoft, even in the 90s, isn't an indication of intelligence or of major achievements, and frankly I kind of get the impression that he embellishes his achievements to try and make himself seem more credible, though I have no evidence of that.
Also, wasn't he basically just a spokesperson for OpenSUSE? I didn't think he was doing anything technical there.
ETA:
This is all to say, it's not like Lunduke has been cut out of conferences and the like just for voting Republican or anything. I've met plenty of people in tech who are conservative, don't try to hide that fact, and they're not shunned or anything, so I don't buy the conspiracy theory that conservative voices are "persecuted" in tech spaces.
Lunduke goes a step further by being outwardly hostile towards LGBTQ groups, and then pretends he's not doing that. This is why he's been considered so unbelievably insufferable in the tech world: his entire way of speaking is dishonest.
>The anti-woke stuff was overplayed in 2016, and it's even more tiring and stupid now. You're free to think it's "important", but it's not. It's just lazy shit he does instead of actual "journalism" (which I suppose is what he calls it).
The woke stuff directly affects my job prospects and quality of life. That makes it important. I've been suffering because of it ever since 2012. It escalated between 2016 and 2024, and only now is the pendulum swinging the other way.
>We don't "all" think that. Pretending to be upset over a trans person working on software or purposefully misgendering people is not something I have ever really wanted to say, and even if I did I would just fucking say it instead of parasocially bonding with some wish.com wannabe demagogue.
He finds much more lurid stories to report than "mere" misgendering nonsense. The pronoun thing is just a litmus test to see whether someone is in a woke cult or not. He reports on actual interesting stories, like lawsuits, new software, outsourcing, layoffs, drama in various communities, etc. (some of which occasionally involves pronouns, yes, but it's no exaggeration to say that these woke people want you banished or even dead if you refuse to go with their delusions).
>Big tech companies hire a lot of very stupid people.
It happens. I don't think Lunduke is stupid though. He is in PR more than pure tech. That doesn't make him stupid. Neither does him being conservative.
>I don't buy the conspiracy theory that conservative voices are "persecuted" in tech spaces.
It's not a theory. He regularly reports on awful treatment of conservatives. You'd be surprised at how malicious some of these woke people are. People have been banned from conferences for being seen on Twitter wearing a MAGA hat. They have been fired for being lukewarm about woke shit. I don't blame you for being out of touch, since only Lunduke seems to be willing to report the stuff, and you refuse to watch. But closing yourself off to all evidence against your views and saying "No you guys are just imagining it!" is the actually dishonest take.
>Lunduke goes a step further by being outwardly hostile towards LGBTQ groups, and then pretends he's not doing that. This is why he's been considered so unbelievably insufferable in the tech world: his entire way of speaking is dishonest.
He is not speaking to them or about them in the way they demand, you mean. People have a right to simply refuse to engage in the constant celebration of certain lifestyles and worldviews. Going to work should not require being lectured about how awesome it is for people to engage in abnormal sexual behaviors, or celebration and advancement of people based on their race or sex alone. Liberals have no problem demanding such things on a constant basis, ostracizing and seeking to banish anyone who disagrees even 5%, and that is exactly why we need people like Lunduke to bravely issue scathing critiques of these practices. Besides that, his tech news is kind of interesting and unique, and he has a good sense of humor.
Juggling all the fragments inside the database, garbage collecting all the unused ones, and maintaining consistency are all quite challenging in this use case.
(2014) https://web.archive.org/web/20141027044008/www.sqlite.org/af...
I would change this: "Do what works, not what your database professor said you ought to do."
To this: "Unless you work for Google or Facebook, just do what works, not what your database professor said you ought to do."
Has anyone actually used the `content BLOB` pattern at a larger scale? Suppose I have tens of thousands of small JPEGs; would they be better off in a .sqlite file?
SQLite claims [1] this is a good use case.
1. https://www.sqlite.org/fasterthanfs.html
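For what it's worth, the pattern itself is trivial to sketch. Here's a minimal example of storing small files as BLOB rows; the table and column names are illustrative assumptions on my part, not anything from the SQLite docs, and at tens of thousands of images you'd want to batch the inserts inside a single transaction:

```python
import sqlite3

# Hypothetical "content BLOB" table: one row per small file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        name    TEXT PRIMARY KEY,
        content BLOB NOT NULL
    )
""")

def put_image(name, data):
    # INSERT OR REPLACE keeps the example idempotent on re-runs.
    with conn:  # wraps the statement in a transaction
        conn.execute(
            "INSERT OR REPLACE INTO images (name, content) VALUES (?, ?)",
            (name, data),
        )

def get_image(name):
    row = conn.execute(
        "SELECT content FROM images WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

put_image("photo_001.jpg", b"\xff\xd8\xff\xe0 fake jpeg bytes")
```

The fasterthanfs page's claim is essentially that for blobs of roughly this size, one `SELECT` beats one `open()`/`read()`/`close()` round trip per file.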
FWIW, AutoCAD uses a database format for its file data.
wouldn’t an XML database be easier?
You can't* index into XML. You have to read through the whole document until you get to the part you want.
*: without adding an index of your own, at which point it isn't really XML anymore, it's some kind of homebrew XML-based archive format.
This applies to any secondary index. The data themselves can only be ordered by a single criterion. It may be a meaningful one, but I guess in most cases it is merely the internal ID, which means you would have to scan the whole table too.
XML was meant for documents, so in most cases the sequence of elements is given. But technically, if I compose the XML myself, I can lay it out the way I want and thus have it sorted too. This means it will be directly searchable without an index: read a bit at the middle, find an element name, see where we are, choose head or tail, repeat.
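The "read a bit at the middle, choose head or tail" idea can be sketched as a binary search over byte offsets. This only works under layout constraints that are my assumption, not anything XML guarantees: one element per line, a fixed `key="..."` attribute, and lines sorted by key (the `xml_bisect` name and the `<item>` element are made up for illustration):

```python
import io

def line_key(line):
    # Extract the key attribute from one b'<item key="...">' element.
    return line.split(b'key="', 1)[1].split(b'"', 1)[0]

def probe(f, off):
    # Return (key, line) of the first full element at or after byte `off`,
    # or (None, None) at end of file.
    f.seek(off)
    if off > 0:
        f.readline()            # discard the partial element we landed in
    line = f.readline()
    if not line:
        return None, None
    return line_key(line), line

def xml_bisect(f, target, size):
    # Find the smallest offset whose next full element has key >= target,
    # then check that element for an exact match.
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        key, _ = probe(f, mid)
        if key is None or key >= target:
            hi = mid
        else:
            lo = mid + 1
    key, line = probe(f, lo)
    return line if key == target else None

# Synthetic sorted "XML" file with zero-padded keys so byte-wise
# comparison matches numeric order.
data = b"".join(
    b'<item key="%05d">payload</item>\n' % k for k in range(0, 1000, 7)
)
f = io.BytesIO(data)
```

So a lookup touches O(log n) small reads instead of the whole file — but note this is exactly the "homebrew XML-based archive format" the sibling comment describes, not general XML.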
Blindly seeking into XML data is a risky, error-prone approach. It's not impossible to do, but doing it correctly is difficult: even if the tags you're looking for are unique, there are a lot of messy edge cases involving comments and <![CDATA[ ]]> sections.
You can store the content of an XML document in a database faithfully enough to reconstruct it exactly. Any system that can produce XML documents is an "XML database".
Does an embeddable XML database engine exist at a similar level of reliability?
They could resurrect xindice!
No.
Why?
LOL!
I'm a fan of both as a Linux user. Interesting thought experiment.