id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
751195017,MDU6SXNzdWU3NTExOTUwMTc=,1111,Accessing a database's `.json` is slow for very large SQLite files,15178711,open,0,,,3,2020-11-26T00:27:27Z,2021-01-04T19:57:53Z,,CONTRIBUTOR,,"I have a SQLite DB that's pretty large, 23GB and something like 300 million rows. I expect that most queries I run on it will be slow, which is fine, but there are some things that Datasette does that make working with the DB very slow. Specifically, when I access the `.json` metadata for a table (which I believe comes from `datasette/views/database.py`), it takes 43 seconds for the request to complete:
```bash
$ time curl localhost:9999/out.json
{""database"": ""out"", ""size"": 24291454976, ""tables"": [{""name"": ""PageviewsHour"", ""columns"": [""file"", ""code"", ""page"", ""pageviews""], ""primary_keys"": [], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": [{""other_table"": ""PageviewsHourFiles"", ""column"": ""file"", ""other_column"": ""file_id""}]}, ""private"": false}, {""name"": ""PageviewsHourFiles"", ""columns"": [""file_id"", ""filename"", ""sha256"", ""size"", ""day"", ""hour""], ""primary_keys"": [""file_id""], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [{""other_table"": ""PageviewsHour"", ""column"": ""file_id"", ""other_column"": ""file""}], ""outgoing"": []}, ""private"": false}, {""name"": ""sqlite_sequence"", ""columns"": [""name"", ""seq""], ""primary_keys"": [], ""count"": 1, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": []}, ""private"": false}], ""hidden_count"": 0, ""views"": [], ""queries"": [], ""private"": false, ""allow_execute_sql"": true, ""query_ms"": 43340.23213386536}
real 0m43.417s
user 0m0.006s
sys 0m0.016s
```
I suspect this is because a `COUNT(*)` is happening under the hood, which, when I run it through the `sqlite3` CLI directly, does take around the same time:
```bash
$ time sqlite3 out.db < <(echo ""select count(*) from PageviewsHour;"")
362794272
real 0m44.523s
user 0m2.497s
sys 0m6.703s
```
I'm using the `.json` request in the [Observable Datasette Client](https://observablehq.com/@asg017/datasette-client) to 1) verify that a link passed in is a reachable Datasette instance, and 2) get a quick look at the metadata for a db. A few different solutions I can think of:
1. Have some other endpoint, like `/-/datasette.json`, that the Observable Datasette client can fetch from to verify that the passed-in URL is a valid Datasette instance (doesn't solve the slowness problem, feel free to split this issue into 2)
2. Have a way to turn off table counts when accessing a database's `.json` view, like `?no_count=1` or something
3. Maybe have a timeout on the `table_counts()` function if it takes too long. Which is odd, because it seems like it already does that (I think?), so I can debug a little more if that's the case (a rough sketch of this idea is below)
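To illustrate proposal 3, here's a minimal sketch (not Datasette's actual implementation) of interrupting a slow `COUNT(*)` with Python's `sqlite3` progress handler:
```python
import sqlite3
import time

def count_with_timeout(conn, table, limit_s=1.0):
    # Abort the COUNT(*) if it runs past the deadline; the progress handler
    # fires every N SQLite VM instructions, and a truthy return value
    # interrupts the query with an OperationalError.
    deadline = time.monotonic() + limit_s
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 10000)
    try:
        return conn.execute('select count(*) from [{}]'.format(table)).fetchone()[0]
    except sqlite3.OperationalError:
        return None  # count unknown, mirroring the 'count': null in the JSON above
    finally:
        conn.set_progress_handler(None, 10000)
```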
More than happy to debug further, or send a PR if you like one of the proposals above!",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1111/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1060631257,I_kwDOBm6k_c4_N_LZ,1528,"Add new `""sql_file""` key to Canned Queries in metadata?",15178711,open,0,,,3,2021-11-22T21:58:01Z,2022-06-10T03:23:08Z,,CONTRIBUTOR,,"Currently for canned queries, you have to inline SQL in your `metadata.yaml` like so:
```yaml
databases:
fixtures:
queries:
neighborhood_search:
sql: |-
select neighborhood, facet_cities.name, state
from facetable
join facet_cities on facetable.city_id = facet_cities.id
where neighborhood like '%' || :text || '%'
order by neighborhood
title: Search neighborhoods
```
This works fine, but for a few reasons, I usually have my canned queries already written in separate `.sql` files. I'd like to re-use those instead of re-writing them.
So, I'd like to see a new `""sql_file""` key that works like so:
`metadata.yaml`:
```yaml
databases:
fixtures:
queries:
neighborhood_search:
sql_file: neighborhood_search.sql
title: Search neighborhoods
```
`neighborhood_search.sql`:
```sql
select neighborhood, facet_cities.name, state
from facetable
join facet_cities on facetable.city_id = facet_cities.id
where neighborhood like '%' || :text || '%'
order by neighborhood
```
Both of these would work in the exact same way, where Datasette would instead open + include `neighborhood_search.sql` on startup.
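A hedged sketch of how Datasette could resolve the new key at startup (`resolve_canned_query` is a hypothetical helper, not existing Datasette code):
```python
from pathlib import Path

def resolve_canned_query(query: dict, metadata_dir: Path) -> dict:
    # Inline the contents of a referenced .sql file, resolved relative
    # to the directory containing metadata.yaml.
    if 'sql_file' in query:
        query = dict(query, sql=(metadata_dir / query['sql_file']).read_text())
        del query['sql_file']
    return query
```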
A few reasons why I'd like to keep my canned query SQL separate from `metadata.yaml`:
- Keeping SQL in standalone SQL files means syntax highlighting and other text editor integrations in my code
- Multiline strings in yaml, while functional, are a tad cumbersome and are hard to edit
- Works well with other tools (can pipe `.sql` files into the `sqlite3` CLI, or use them with other SQLite clients more easily)
- Typically my canned queries are quite long compared to everything else in my metadata.yaml, so I'd love to separate it where possible
Let me know if this is a feature you'd like to see; I can try to send up a PR if this sounds right!",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1528/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1339663518,I_kwDOBm6k_c5P2aSe,1784,"Include ""entrypoint"" option on `--load-extension`?",15178711,closed,0,,,2,2022-08-16T00:22:57Z,2022-08-23T18:34:31Z,2022-08-23T18:34:31Z,CONTRIBUTOR,,"## Problem
SQLite extensions have the option to define multiple ""entrypoints"" in each loadable extension. For example, the upcoming version of `sqlite-lines` will have 2 entrypoints: the default `sqlite3_lines_init` (which SQLite will automatically guess for) and `sqlite3_lines_noread_init`. The `sqlite3_lines_noread_init` version omits functions that read from the filesystem, which is necessary for security purposes when running untrusted SQL (which Datasette does).
(Similar multiple entrypoints will also be added for sqlite-http).
The `--load-extension` flag, however, doesn't give the option to specify a different entrypoint, so the default one is always used.
## Proposal
I'd like the `--load-extension` flag to accept a new optional value that specifies a custom entrypoint, like so:
```
datasette my.db \
--load-extension ./lines0 sqlite3_lines0_noread_init
```
Then, under the hood, this line of code:
https://github.com/simonw/datasette/blob/7af67b54b7d9bca43e948510fc62f6db2b748fa8/datasette/app.py#L562
Would look something like this:
```python
conn.execute(""SELECT load_extension(?, ?)"", [extension, entrypoint])
```
One potential problem: for backwards compatibility, I'm not sure if Click allows CLI flags to take a variable number of arguments (""arity""). So I guess it could also use a `:` delimiter like `--static`:
```
datasette my.db \
--load-extension ./lines0:sqlite3_lines0_noread_init
```
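Parsing that `:` form could be as simple as splitting on the last colon (a rough sketch of the proposal, not actual Datasette code):
```python
def parse_load_extension(value):
    # './lines0:sqlite3_lines0_noread_init' -> ('./lines0', 'sqlite3_lines0_noread_init')
    # './lines0'                            -> ('./lines0', None)
    if ':' in value:
        path, entrypoint = value.rsplit(':', 1)
        return path, entrypoint
    return value, None
```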
Or maybe even a new flag name?
```
datasette my.db \
--load-extension-entrypoint ./lines0 sqlite3_lines0_noread_init
```
Personally I prefer the `:` option... and maybe even `--load-extension` -> `--load`? Definitely out of scope for this issue tho
```
datasette my.db \
--load ./lines0:sqlite3_lines0_noread_init
```",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1784/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1344823170,PR_kwDOBm6k_c49e3_k,1789,Add new entrypoint option to `--load-extension`,15178711,closed,0,,,9,2022-08-19T19:27:47Z,2022-08-23T18:42:52Z,2022-08-23T18:34:30Z,CONTRIBUTOR,simonw/datasette/pulls/1789,"Closes #1784
The `--load-extension` flag can now accept an optional ""entrypoint"" value, to specify which entrypoint SQLite should load from the given extension.
```bash
# would load default entrypoint like before
datasette data.db --load-extension ext
# loads the extension with the ""sqlite3_foo_init"" entrypoint
datasette data.db --load-extension ext:sqlite3_foo_init
# loads the extension with the ""sqlite3_bar_init"" entrypoint
datasette data.db --load-extension ext:sqlite3_bar_init
```
For testing, I added a small SQLite extension in C at `tests/ext.c`. If compiled, then pytest will run the unit tests in `test_load_extensions.py` to verify that Datasette loads in extensions correctly (and loads the correct entrypoints). Compiling the extension requires a C compiler; I compiled it on my Mac with:
```
gcc ext.c -I path/to/sqlite -fPIC -shared -o ext.dylib
```
Where `path/to/sqlite` is a directory that contains the SQLite amalgamation header files.
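The conditional test run uses the usual skip-if-missing pattern; a sketch (hypothetical file layout) of what that looks like:
```python
import pytest
from pathlib import Path

# Skip the whole module when the compiled extension isn't present.
COMPILED_EXTENSION = Path(__file__).parent / 'ext.dylib'

pytestmark = pytest.mark.skipif(
    not COMPILED_EXTENSION.exists(),
    reason='tests/ext.c has not been compiled',
)
```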
Re documentation: I added a bit to the help text for `--load-extension` (which I believe should auto-add to the documentation?), and the existing extension documentation is SpatiaLite-specific. Let me know if a new extensions documentation page would be helpful!",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1789/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1620515757,I_kwDOBm6k_c5glxut,2039,Subtle bug with `--load-extension` and `--static` flags with absolute Windows paths with `C:\`,15178711,open,0,,,0,2023-03-12T21:18:52Z,2023-03-12T21:18:52Z,,CONTRIBUTOR,,"From the Datasette Discord: a user tried running the following command on Windows:
```
datasette --load-extension=""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""
```
This failed with `""The specified module could not be found""`, because the entrypoint option introduced in #1789 splits the input differently. Instead of loading the extension found at `""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""`, it tried to load the extension at `""C""` with entrypoint `""\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""`.
This is tricky because most absolute Windows paths have a colon in them, like `C:\foo.txt` or `D:\bar.txt`. I'd imagine the `--static` flag is also vulnerable to this type of bug.
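One possible fix (a sketch only, assuming the `:` entrypoint syntax is kept) would be to refuse to split on a colon that belongs to a drive letter:
```python
import re

def split_extension_and_entrypoint(value):
    # 'C:\spatialite\mod_spatialite.dll'    -> whole value, no entrypoint
    # './lines0:sqlite3_lines0_noread_init' -> ('./lines0', 'sqlite3_lines0_noread_init')
    match = re.match(r'^([A-Za-z]:[/\\].*?|[^:]+)(?::([^:]+))?$', value)
    if match:
        return match.group(1), match.group(2)
    return value, None
```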
The ""solution"" is to use a relative path instead, but that doesn't feel that great. ",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1781530343,I_kwDOBm6k_c5qL_7n,2093,"Proposal: Combine settings, metadata, static, etc. into a single `datasette.toml` File",15178711,open,0,,,7,2023-06-29T21:18:23Z,2023-07-02T02:17:47Z,,CONTRIBUTOR,,"Very often I get tripped up when trying to configure my Datasette instances. For example: if I want to change the port my app listens on, do I set that with a CLI flag, a `--setting` flag, inside `metadata.json`, or an env var? If I want to up the time limit of SQL statements, is that under `metadata.json` or a setting? Where does my plugin configuration go?
Normally I need to look it up in the Datasette docs, and I quickly find my answer, but the number of places where ""config"" can go is overwhelming:
- Flat CLI flags like `--port`, `--host`, `--cors`, etc.
- `--setting` flags, like `default_page_size`, `sql_time_limit_ms`, etc.
- Inside `metadata.json`, including plugin configuration
Typically my Datasette deploys are extremely long shell commands, with multiple `--setting` and other CLI flags.
## Proposal: Consolidate all ""config"" into `datasette.toml`
I propose that we add a new `datasette.toml` that combines ""settings"", ""metadata"", and other common CLI flags like `--port` and `--cors` into a single file. It would be similar to ""Cargo.toml"" in Rust projects, ""package.json"" in Node projects, and ""pyproject.toml"" in Python, etc.
A sample of what it could look like:
```toml
# ""top level"" configuration that are currently CLI flags on `datasette serve`
[config]
port = 8020
host = ""0.0.0.0""
cors = true
# replaces multiple `--setting` flags
[settings]
base_url = ""/app/datasette/""
default_allow_sql = true
sql_time_limit_ms = 3500
# replaces `metadata.json`.
# The contents of datasette-metadata.json could be defined in this file instead, but supporting separate files is nice (since those are easy to machine-generate)
[metadata]
include=""./datasette-metadata.json""
# plugin-specific
[plugins]
[plugins.datasette-auth-github]
client_id = {env = ""DATASETTE_AUTH_GITHUB_CLIENT_ID""}
client_secret = {env = ""GITHUB_CLIENT_SECRET""}
[plugins.datasette-cluster-map]
latitude_column = ""lat""
longitude_column = ""lon""
```
## Pros
- Instead of multiple files and CLI flags, everything could be in one tidy file
- Editing config in a separate file is easier than editing CLI flags, since you don't have to kill a process + edit a command every time
- New users will know to ""just edit my `datasette.toml`"" instead of needing to learn metadata + settings + CLI flags
- Better dev experience for multiple environments. For example, you could have `datasette -c datasette-dev.toml` for local dev environments (enables SQL, debug plugins, long timeouts, etc.), and a `datasette -c datasette-prod.toml` for ""production"" (lower timeouts, fewer plugins, monitoring plugins, etc.)
## Cons
- Yet another config-management system. Now Datasette users will need to know about metadata, settings, CLI flags, _and_ `datasette.toml`. However with enough documentation + announcements + examples, I think we can get ahead of it.
- If toml is chosen, we'd need to add a toml parser for Python versions <3.11 (a sketch of the fallback import follows this list)
- Multiple sources of config require priority. For example: Would `--setting default_allow_sql off` override the value inside `[settings]`? What about `--port`?
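A minimal sketch of that fallback import, assuming `tomli` would be added as a dependency on older Pythons:
```python
import sys

if sys.version_info >= (3, 11):
    import tomllib  # standard library on 3.11+
else:
    import tomli as tomllib  # third-party backport with the same API

with open('datasette.toml', 'rb') as f:
    config = tomllib.load(f)
```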
## Other Notes
### Toml
I chose toml over json because toml supports comments. I chose toml over yaml because Python 3.11 has built-in support for it. I also find toml easier to work with since it doesn't have the odd ""gotchas"" that YAML has (e.g. `3.10` resolving to `3.1`, Norway's `NO` resolving to `false`, etc.). It also mimics `pyproject.toml`, which is nice. Happy to change my mind about this, however.
### Plugin config will be difficult
Plugin config is currently in `metadata.json` in two places:
1. Top level, under `""plugins.[plugin-name]""`. This fits well into `datasette.toml` as `[plugins.plugin-name]`
2. Table level, under `""databases.[db-name].tables.[table-name].plugins.[plugin-name]""`. This doesn't fit that well into `datasette.toml`, unless it's nested under `[metadata]`?
### Extensions, static, one-off plugins?
We could also include equivalents of `--plugins-dir`, `--static`, and `--load-extension` into `datasette.toml`, but I'd imagine there's a few security concerns there to think through.
### Explicitly list which plugins to use?
I believe Datasette by default will load all installed plugins on startup, but maybe `datasette.toml` could specify a list of plugins to use? For example, a dev version of `datasette.toml` could specify `datasette-pretty-traces`, but the prod version could leave it out.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2093/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1783304750,I_kwDOBm6k_c5qSxIu,2094,JS Plugin Hooks for the Code Editor,15178711,open,0,,,0,2023-07-01T00:51:57Z,2023-07-01T00:51:57Z,,CONTRIBUTOR,,"When #2052 merges, I'd like to add support for registering extensions/functions in the Datasette code editor.
I'd eventually like to build a JS plugin for [`sqlite-docs`](https://github.com/asg017/sqlite-docs), to add things like:
- Inline documentation for tables/columns on hover
- Inline docs for custom functions that are loaded in
- More detailed autocomplete for tables/columns/functions
I did some hacking to see what this would look like, see here:
There could be a new hook that allows JS plugins to add new ""extensions"" to the CodeMirror EditorView, here:
https://github.com/simonw/datasette/blob/8cd60fd1d899952f1153460469b3175465f33f80/datasette/static/cm-editor-6.0.1.js#L25
Will need some more planning. For example, the CodeMirror bundle in Datasette has functions that we could re-export for plugins to use (so we don't load 2 versions of `""@codemirror/autocomplete""`, for example).",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2094/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1801394744,I_kwDOCGYnMM5rXxo4,567,Plugin system,15178711,open,0,,,2,2023-07-12T17:02:14Z,2023-07-17T21:42:37Z,,NONE,,"I'd like there to be a plugin system for sqlite-utils, similar to the datasette/llm plugins. I'd like to make plugins that would do things like:
- Register SQLite extensions for more SQL functions + virtual tables
- Register new subcommands
- Different input file formats for `sqlite-utils memory`
- Different output file formats (in addition to `--csv`, `--tsv`, `--nl`, etc.); a speculative sketch of what a hook could look like follows this list
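A speculative sketch of a pluggy-based hook (mirroring Datasette's plugin system); `prepare_connection` here is an assumed hook name, not an existing sqlite-utils API:
```python
import pluggy

hookspec = pluggy.HookspecMarker('sqlite_utils')
hookimpl = pluggy.HookimplMarker('sqlite_utils')

class SqliteUtilsSpec:
    @hookspec
    def prepare_connection(self, conn):
        '''Modify a connection, e.g. load extensions or register SQL functions.'''

# A plugin would then implement it like this:
@hookimpl
def prepare_connection(conn):
    conn.enable_load_extension(True)
    conn.load_extension('./lines0')  # hypothetical extension path
```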
A few real-world use-cases of plugins I'd like to see in sqlite-utils:
- Register many of my sqlite extensions in sqlite-utils (`sqlite-http`, `sqlite-lines`, `sqlite-regex`, etc.)
- New subcommands to work with `sqlite-vss` vector tables
- Input/output Parquet/Avro/Arrow IPC files with `sqlite-arrow`",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/567/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,