id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
751195017,MDU6SXNzdWU3NTExOTUwMTc=,1111,Accessing a database's `.json` is slow for very large SQLite files,15178711,open,0,,,3,2020-11-26T00:27:27Z,2021-01-04T19:57:53Z,,CONTRIBUTOR,,"I have a SQLite DB that's pretty large, 23GB and something like 300 million rows. I expect that most queries I run on it will be slow, which is fine, but there are some things that Datasette does that makes working with the DB very slow. Specifically, when I access the `.json` metadata for a table (which I believe it comes from `datasette/views/database.py`, it takes 43 seconds for the request to come in:

```bash
$ time curl localhost:9999/out.json
{""database"": ""out"", ""size"": 24291454976, ""tables"": [{""name"": ""PageviewsHour"", ""columns"": [""file"", ""code"", ""page"", ""pageviews""], ""primary_keys"": [], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": [{""other_table"": ""PageviewsHourFiles"", ""column"": ""file"", ""other_column"": ""file_id""}]}, ""private"": false}, {""name"": ""PageviewsHourFiles"", ""columns"": [""file_id"", ""filename"", ""sha256"", ""size"", ""day"", ""hour""], ""primary_keys"": [""file_id""], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [{""other_table"": ""PageviewsHour"", ""column"": ""file_id"", ""other_column"": ""file""}], ""outgoing"": []}, ""private"": false}, {""name"": ""sqlite_sequence"", ""columns"": [""name"", ""seq""], ""primary_keys"": [], ""count"": 1, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": []}, ""private"": false}], ""hidden_count"": 0, ""views"": [], ""queries"": [], ""private"": false, ""allow_execute_sql"": true, ""query_ms"": 43340.23213386536}
real	0m43.417s
user	0m0.006s
sys	0m0.016s
```

I suspect this is because a `COUNT(*)` is happening under the hood, which, when I run it through sqlite directly, does take around the same time:

```bash
$ time sqlite3 out.db < <(echo ""select count(*) from PageviewsHour;"")
362794272

real	0m44.523s
user	0m2.497s
sys	0m6.703s
```

I'm using the `.json` request in the [Observable Datasette Client](https://observablehq.com/@asg017/datasette-client) to 1) verify that a link passed in is a reachable Datasette instance, and 2) a quick way to look at metadata for a db. A few different solutions I can think of:

1. Have some other endpoint, like `/-/datasette.json` that the Observable Datasette client can fetch from to verify that the passed in URL is a valid Datasette (doesnt solve the slow problem, feel free to split this issue into 2)
2. Have a way to turn off table counts when accessing a database's `.json` view, like `?no_count=1` or something
3. Maybe have a timeout on the `table_counts()` function if it takes too long. which is odd, because it seems like it already does that (I think?), I can debug a little more if that's the case

More than happy to debug further, or send a PR if you like one of the proposals above!",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1111/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1060631257,I_kwDOBm6k_c4_N_LZ,1528,"Add new `""sql_file""` key to Canned Queries in metadata?",15178711,open,0,,,3,2021-11-22T21:58:01Z,2022-06-10T03:23:08Z,,CONTRIBUTOR,,"Currently for canned queries, you have to inline SQL in your `metadata.yaml` like so:

```yaml
databases:
  fixtures:
    queries:
      neighborhood_search:
        sql: |-
          select neighborhood, facet_cities.name, state
          from facetable
            join facet_cities on facetable.city_id = facet_cities.id
          where neighborhood like '%' || :text || '%'
          order by neighborhood
        title: Search neighborhoods
```

This works fine, but for a few reasons, I usually have my canned queries already written in separate `.sql` files. I'd like to instead re-use those instead of re-writing it. 

So, I'd like to see a new `""sql_file""` key that works like so:

`metadata.yaml`:

```yaml
databases:
  fixtures:
    queries:
      neighborhood_search:
        sql_file: neighborhood_search.sql
        title: Search neighborhoods
```
`neighborhood_search.sql`:
```sql
select neighborhood, facet_cities.name, state
from facetable
join facet_cities on facetable.city_id = facet_cities.id
where neighborhood like '%' || :text || '%'
order by neighborhood
```

Both of these would work in the exact same way, where Datasette would instead open + include `neighborhood_search.sql` on startup. 


A few reasons why I'd like to keep my canned queries SQL separate from metadata.yaml:

- Keeping SQL in standalone SQL files means syntax highlighting and other text editor integrations in my code
- Multiline strings in yaml, while functional, are a tad cumbersome and are hard to edit
- Works well with other tools (can pipe `.sql` files into the `sqlite3` CLI, or use with other SQLite clients easier)
- Typically my canned queries are quite long compared to everything else in my metadata.yaml, so I'd love to separate it where possible

Let me know if this is a feature you'd like to see, I can try to send up a PR if this sounds right!",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1528/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1339663518,I_kwDOBm6k_c5P2aSe,1784,"Include ""entrypoint"" option on `--load-extension`?",15178711,closed,0,,,2,2022-08-16T00:22:57Z,2022-08-23T18:34:31Z,2022-08-23T18:34:31Z,CONTRIBUTOR,,"## Problem 

SQLite extensions have the option to define multiple ""entrypoints"" in each loadable extension. For example, the upcoming version of `sqlite-lines` will have 2 entrypoints: the default `sqlite3_lines_init` (which SQLite will automatically guess for) and `sqlite3_lines_noread_init`. The `sqlite3_lines_noread_init` version omits functions that read from the filesystem, which is necessary for security purposes when running untrusted SQL (which Datasette does).

(Similar multiple entrypoints will also be added for sqlite-http).



The `--load-extension` flag, however, doesn't give the option to specify a different entrypoint, so the default one is always used. 

## Proposal

I want there to be a new command line option of the `--load-extension` flag to specify a custom entrypoint like so:
```
datasette my.db \
  --load-extension ./lines0 sqlite3_lines0_noread_init
```

Then, under the hood, this line of code:

https://github.com/simonw/datasette/blob/7af67b54b7d9bca43e948510fc62f6db2b748fa8/datasette/app.py#L562

Would look something like this:

```python
 conn.execute(""SELECT load_extension(?, ?)"", [extension, entrypoint]) 
```

One potential problem: For backward compatibility, I'm not sure if Click allows cli flags to have variable number of options (""arity""). So I guess it could also use a `:` delimiter like `--static`:

```
datasette my.db \
  --load-extension ./lines0:sqlite3_lines0_noread_init
```

Or maybe even a new flag name?

```
datasette my.db \
  --load-extension-entrypoint ./lines0 sqlite3_lines0_noread_init
```


Personally I prefer the `:` option... and maybe even `--load-extension` -> `--load`? Definitely out of scope for this issue tho

```
datasette my.db \
  --load./lines0:sqlite3_lines0_noread_init
```",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1784/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1344823170,PR_kwDOBm6k_c49e3_k,1789,Add new entrypoint option to `--load-extension`,15178711,closed,0,,,9,2022-08-19T19:27:47Z,2022-08-23T18:42:52Z,2022-08-23T18:34:30Z,CONTRIBUTOR,simonw/datasette/pulls/1789,"Closes #1784 

The `--load-extension` flag can now accept an optional ""entrypoint"" value, to specify which entrypoint SQLite should load from the given extension. 

```bash
# would load default entrypoint like before
datasette data.db --load-extension ext

# loads the extensions with the ""sqlite3_foo_init"" entrpoint
datasette data.db --load-extension ext:sqlite3_foo_init

# loads the extensions with the ""sqlite3_bar_init"" entrpoint
datasette data.db --load-extension ext:sqlite3_bar_init
```

For testing, I added a small SQLite extension in C at `tests/ext.c`. If compiled, then pytest will run the unit tests in `test_load_extensions.py`to verify that Datasette loads in extensions correctly (and loads the correct entrypoints). Compiling the extension requires a C compiler, I compiled it on my Mac with:

```
gcc ext.c -I path/to/sqlite -fPIC -shared -o ext.dylib
```

Where `path/to/sqlite` is a directory that contains the SQLite amalgamation header files.

Re documentation: I added a bit to the help text for `--load-extension` (which I believe should auto-add to documentation?), and the existing extension documentation is spatialite specific. Let me know if a new extensions documentation page would be helpful!",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1789/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1620515757,I_kwDOBm6k_c5glxut,2039,Subtle bug with `--load-extension` and `--static` flags with absolute Windows paths with`C:\`,15178711,open,0,,,0,2023-03-12T21:18:52Z,2023-03-12T21:18:52Z,,CONTRIBUTOR,,"From the Datasette discord: A user tried running the following command on windows:

```
datasette  --load-extension=""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""
```
This failed with `""The specified module could not be found""`, because the entrypoint option introduced in #1789 splits the input differently. Instead of loading the extension found at `""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""`, it instead tried to load the extension at `""C""` with entrypoint `""\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll"". 

This is hard because most absolute windows paths have a colon in them, like `C:\foo.txt` or `D:\bar.txt`.  I'd image the `--static` flag is also vulnerable to this type of bug.


The ""solution"" is to use a relative path instead, but that doesn't feel that great. ",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1781530343,I_kwDOBm6k_c5qL_7n,2093,"Proposal: Combine settings, metadata, static, etc. into a single `datasette.toml` File",15178711,open,0,,,7,2023-06-29T21:18:23Z,2023-07-02T02:17:47Z,,CONTRIBUTOR,,"Very often I get tripped up when trying to configure my Datasette instances. For example: if I want to change the port my app listen too, do I do that with a CLI flag, a `--setting` flag, inside `metadata.json`, or an env var? If I want to up the time limit of SQL statements, is that under `metadata.json` or a setting? Where does my plugin configuration go?

Normally I need to look it up in Datasette docs, and I quickly find my answer, but the number of places where ""config"" goes it overwhelming.

- Flat CLI flags like `--port`, `--host`, `--cors`, etc.
- `--setting`, like `default_page_size`, `sql_time_limit_ms` etc
- Inside `metadata.json`, including plugin configuration

Typically my Datasette deploys are extremely long shell commands, with multiple `--setting` and other CLI flags.

## Proposal: Consolidate all ""config"" into `datasette.toml`

I propose that we add a new `datasette.toml` that combines ""settings"", ""metadata"", and other common CLI flags like `--port` and `--cors` into a single file. It would be similar to ""Cargo.toml"" in Rust projects, ""package.json"" in Node projects, and ""pyproject.toml"" in Python, etc.

A sample of what it could look like:

```toml
# ""top level"" configuration that are currently CLI flags on `datasette serve`
[config]
port = 8020
host = ""0.0.0.0""
cors = true

# replaces multiple `--setting` flags
[settings]
base_url = ""/app/datasette/""
default_allow_sql = true
sql_time_limit_ms = 3500

# replaces `metadata.json`.
# The contents of datasette-metadata.json could be defined in this file instead, but supporting separate files is nice (since those are easy to machine-generate)
[metadata]
include=""./datasette-metadata.json""

# plugin-specific 
[plugins]
[plugins.datasette-auth-github]
client_id = {env = ""DATASETTE_AUTH_GITHUB_CLIENT_ID""}
client_secret = {env = ""GITHUB_CLIENT_SECRET""}

[plugins.datasette-cluster-map]

latitude_column = ""lat""
longitude_column = ""lon""
```

## Pros
- Instead of multiple files and CLI flags, everything could be in one tidy file
- Editing config in a separate file is easier than editing CLI flags, since you don't have to kill a process + edit a command every time
- New users will know ""just edit my `datasette.toml` instead of needing to learn metadata + settings + CLI flags
- Better dev experience for multiple environment. For example, could have `datasette -c datasette-dev.toml` for local dev environments (enables SQL, debug plugins, long timeouts, etc.), and a `datasette -c datasette-prod.toml` for ""production"" (lower timeouts, less plugins, monitoring plugins, etc.)

## Cons
- Yet another config-management system. Now Datasette users will need to know about metadata, settings, CLI flags, _and_ `datasette.toml`. However with enough documentation + announcements + examples, I think we can get ahead of it.
- If toml is chosen, would need to add a toml parser for Python version <3.11
- Multiple sources of config require priority. For example: Would `--setting default_allow_sql off` override the value inside `[settings]`? What about `--port`? 

## Other Notes

### Toml

I chose toml over json because toml supports comments. I chose toml over yaml because Python 3.11 has builtin support for it. I also find toml easier to work with since it doesn't have the odd ""gotchas"" that YAML has (""ex `3.10` resolving to `3.1`, Norway `NO` resolving to `false`, etc.). It also mimics `pyproject.toml` which is nice. Happy to change my mind about this however


### Plugin config will be difficult

Plugin config is currently in `metadata.json` in two places:

1. Top level, under `""plugins.[plugin-name]""`. This fits well into `datasette.toml` as `[plugins.plugin-name]`
2. Table level, under `""databases.[db-name].tables.[table-name].plugins.[plugin-name]`. This doesn't fit that well into `datasette.toml`, unless it's nested under `[metadata]`?

### Extensions, static, one-off plugins?

We could also include equivalents of `--plugins-dir`, `--static`, and `--load-extension` into `datasette.toml`, but I'd imagine there's a few security concerns there to think through. 


### Explicitly list with plugins to use?

I believe Datasette by default will load all install plugins on startup, but maybe `datasette.toml` can specify a list of plugins to use? For example, a dev version of `datasette.toml` can specify `datasette-pretty-traces`, but the prod version can leave it out",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2093/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1783304750,I_kwDOBm6k_c5qSxIu,2094,JS Plugin Hooks for the Code Editor,15178711,open,0,,,0,2023-07-01T00:51:57Z,2023-07-01T00:51:57Z,,CONTRIBUTOR,,"When #2052 merges, I'd like to add support to add extensions/functions to the Datasette code editor. 

I'd eventually like to build a JS plugin for [`sqlite-docs`](https://github.com/asg017/sqlite-docs), to add things like:

- Inline documentation for tables/columns on hover
- Inline docs for custom functions that are loaded in
- More detailed autocomplete for tables/columns/functions

I did some hacking to see what this would look like, see here:

<img width=""1223"" alt=""image"" src=""https://github.com/simonw/datasette/assets/15178711/64f95cbc-1492-4365-896f-b88c6d08a649"">
<img width=""1223"" alt=""image"" src=""https://github.com/simonw/datasette/assets/15178711/73e602ba-5f45-417a-997e-5aea1738527a"">

There can be a new hook that allows JS plugins to add new ""extension"" in the CodeMirror editorview here:

https://github.com/simonw/datasette/blob/8cd60fd1d899952f1153460469b3175465f33f80/datasette/static/cm-editor-6.0.1.js#L25

Will need some more planning. For example, the Codemirror bundle in Datasette has functions that we could re-export for plugins to use (so we don't load 2 version of `""@codemirror/autocomplete""`, for example. ",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2094/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1801394744,I_kwDOCGYnMM5rXxo4,567,Plugin system,15178711,open,0,,,2,2023-07-12T17:02:14Z,2023-07-17T21:42:37Z,,NONE,,"I'd like there to be a plugin system for sqlite-utils, similar to the datasette/llm plugins. I'd like to make plugins that would do things like:

- Register SQLite extensions for more SQL functions + virtual tables
- Register new subcommands
- Different input file formats for `sqlite-utils memory`
- Different output file formats (in addition to `--csv` `--tsv` `--nl` etc.

A few real-world use-cases of plugins I'd like to see in sqlite-utils:

- Register many of my sqlite extensions in sqlite-utils (`sqlite-http`, `sqlite-lines`, `sqlite-regex`, etc.)
- New subcommands to work with `sqlite-vss` vector tables
- Input/ouput Parquet/Avro/Arrow IPC files with `sqlite-arrow`",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/567/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,