issues
8 rows where user = 15178711
id ▼ | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
751195017 | MDU6SXNzdWU3NTExOTUwMTc= | 1111 | Accessing a database's `.json` is slow for very large SQLite files | asg017 15178711 | open | 0 | 3 | 2020-11-26T00:27:27Z | 2021-01-04T19:57:53Z | CONTRIBUTOR | I have a SQLite DB that's pretty large, 23GB and something like 300 million rows. I expect that most queries I run on it will be slow, which is fine, but there are some things that Datasette does that make working with the DB very slow. Specifically, when I access the `.json` metadata for a table (which I believe comes from `datasette/views/database.py`), it takes 43 seconds for the request to come in: ```bash $ time curl localhost:9999/out.json {"database": "out", "size": 24291454976, "tables": [{"name": "PageviewsHour", "columns": ["file", "code", "page", "pageviews"], "primary_keys": [], "count": null, "hidden": false, "fts_table": null, "foreign_keys": {"incoming": [], "outgoing": [{"other_table": "PageviewsHourFiles", "column": "file", "other_column": "file_id"}]}, "private": false}, {"name": "PageviewsHourFiles", "columns": ["file_id", "filename", "sha256", "size", "day", "hour"], "primary_keys": ["file_id"], "count": null, "hidden": false, "fts_table": null, "foreign_keys": {"incoming": [{"other_table": "PageviewsHour", "column": "file_id", "other_column": "file"}], "outgoing": []}, "private": false}, {"name": "sqlite_sequence", "columns": ["name", "seq"], "primary_keys": [], "count": 1, "hidden": false, "fts_table": null, "foreign_keys": {"incoming": [], "outgoing": []}, "private": false}], "hidden_count": 0, "views": [], "queries": [], "private": false, "allow_execute_sql": true, "query_ms": 43340.23213386536} real 0m43.417s user 0m0.006s sys 0m0.016s ``` I suspect this is because a `COUNT(*)` is happening under the hood, which, when I run it through sqlite directly, does take around the same time: ```bash $ time sqlite3 out.db < <(echo "select count(*) from PageviewsHour;") 362794272 real 0m44.523s user 0m2.497s sys 0m6.703s ``` I'm using the `.json` request in the [Observable Datasette Client](https://observablehq.com/@asg017/datasette-client) to 1) verify that a link passed in is a reachable Datasette instance, and 2) get a quick look at the metadata for a db. A few differe… (see the row-count estimate sketch below this table) | datasette 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/1111/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | ||||||||
1060631257 | I_kwDOBm6k_c4_N_LZ | 1528 | Add new `"sql_file"` key to Canned Queries in metadata? | asg017 15178711 | open | 0 | 3 | 2021-11-22T21:58:01Z | 2022-06-10T03:23:08Z | CONTRIBUTOR | Currently for canned queries, you have to inline SQL in your `metadata.yaml` like so: ```yaml databases: fixtures: queries: neighborhood_search: sql: |- select neighborhood, facet_cities.name, state from facetable join facet_cities on facetable.city_id = facet_cities.id where neighborhood like '%' || :text || '%' order by neighborhood title: Search neighborhoods ``` This works fine, but for a few reasons I usually have my canned queries already written in separate `.sql` files. I'd like to re-use those instead of re-writing them. So, I'd like to see a new `"sql_file"` key that works like so: `metadata.yaml`: ```yaml databases: fixtures: queries: neighborhood_search: sql_file: neighborhood_search.sql title: Search neighborhoods ``` `neighborhood_search.sql`: ```sql select neighborhood, facet_cities.name, state from facetable join facet_cities on facetable.city_id = facet_cities.id where neighborhood like '%' || :text || '%' order by neighborhood ``` Both of these would work in exactly the same way: Datasette would open + include `neighborhood_search.sql` on startup (see the `sql_file` resolution sketch below this table). A few reasons why I'd like to keep my canned queries SQL separate from metadata.yaml: - Keeping SQL in standalone SQL files means syntax highlighting and other text editor integrations in my code - Multiline strings in yaml, while functional, are a tad cumbersome and hard to edit - Works well with other tools (can pipe `.sql` files into the `sqlite3` CLI, or use them with other SQLite clients more easily) - Typically my canned queries are quite long compared to everything else in my metadata.yaml, so I'd love to separate them where possible Let me know if this is a feature you'd like to see; I can try to send up a PR if this sounds right! | datasette 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/1528/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | ||||||||
1339663518 | I_kwDOBm6k_c5P2aSe | 1784 | Include "entrypoint" option on `--load-extension`? | asg017 15178711 | closed | 0 | 2 | 2022-08-16T00:22:57Z | 2022-08-23T18:34:31Z | 2022-08-23T18:34:31Z | CONTRIBUTOR | ## Problem SQLite extensions have the option to define multiple "entrypoints" in each loadable extension. For example, the upcoming version of `sqlite-lines` will have 2 entrypoints: the default `sqlite3_lines_init` (which SQLite will automatically guess for) and `sqlite3_lines_noread_init`. The `sqlite3_lines_noread_init` version omits functions that read from the filesystem, which is necessary for security purposes when running untrusted SQL (which Datasette does). (Similar multiple entrypoints will also be added for sqlite-http). The `--load-extension` flag, however, doesn't give the option to specify a different entrypoint, so the default one is always used. ## Proposal I want there to be a new command line option on the `--load-extension` flag to specify a custom entrypoint, like so: ``` datasette my.db \ --load-extension ./lines0 sqlite3_lines0_noread_init ``` Then, under the hood, this line of code: https://github.com/simonw/datasette/blob/7af67b54b7d9bca43e948510fc62f6db2b748fa8/datasette/app.py#L562 Would look something like this: ```python conn.execute("SELECT load_extension(?, ?)", [extension, entrypoint]) ``` One potential problem: for backward compatibility, I'm not sure if Click allows CLI flags to have a variable number of arguments ("arity"). So I guess it could also use a `:` delimiter like `--static` (see the parsing sketch below this table): ``` datasette my.db \ --load-extension ./lines0:sqlite3_lines0_noread_init ``` Or maybe even a new flag name? ``` datasette my.db \ --load-extension-entrypoint ./lines0 sqlite3_lines0_noread_init ``` Personally I prefer the `:` option... and maybe even `--load-extension` -> `--load`? Definitely out of scope for this issue tho ``` datasette my.db \ --load ./lines0:sqlite3_lines0_noread_init ``` | datasette 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/1784/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | completed | ||||||
1344823170 | PR_kwDOBm6k_c49e3_k | 1789 | Add new entrypoint option to `--load-extension` | asg017 15178711 | closed | 0 | 9 | 2022-08-19T19:27:47Z | 2022-08-23T18:42:52Z | 2022-08-23T18:34:30Z | CONTRIBUTOR | simonw/datasette/pulls/1789 | Closes #1784 The `--load-extension` flag can now accept an optional "entrypoint" value, to specify which entrypoint SQLite should load from the given extension. ```bash # would load default entrypoint like before datasette data.db --load-extension ext # loads the extension with the "sqlite3_foo_init" entrypoint datasette data.db --load-extension ext:sqlite3_foo_init # loads the extension with the "sqlite3_bar_init" entrypoint datasette data.db --load-extension ext:sqlite3_bar_init ``` For testing, I added a small SQLite extension in C at `tests/ext.c`. If compiled, then pytest will run the unit tests in `test_load_extensions.py` to verify that Datasette loads in extensions correctly (and loads the correct entrypoints). Compiling the extension requires a C compiler; I compiled it on my Mac with: ``` gcc ext.c -I path/to/sqlite -fPIC -shared -o ext.dylib ``` Where `path/to/sqlite` is a directory that contains the SQLite amalgamation header files. Re documentation: I added a bit to the help text for `--load-extension` (which I believe should auto-add to documentation?), and the existing extension documentation is SpatiaLite-specific. Let me know if a new extensions documentation page would be helpful! | datasette 107914493 | pull | {"url": "https://api.github.com/repos/simonw/datasette/issues/1789/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | 0 | |||||
1620515757 | I_kwDOBm6k_c5glxut | 2039 | Subtle bug with `--load-extension` and `--static` flags with absolute Windows paths with `C:\` | asg017 15178711 | open | 0 | 0 | 2023-03-12T21:18:52Z | 2023-03-12T21:18:52Z | CONTRIBUTOR | From the Datasette Discord: a user tried running the following command on Windows: ``` datasette --load-extension="C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll" ``` This failed with `"The specified module could not be found"`, because the entrypoint option introduced in #1789 splits the input differently. Instead of loading the extension found at `"C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll"`, it instead tried to load the extension at `"C"` with entrypoint `"\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll"`. This is hard because most absolute Windows paths have a colon in them, like `C:\foo.txt` or `D:\bar.txt`. I'd imagine the `--static` flag is also vulnerable to this type of bug. The "solution" is to use a relative path instead, but that doesn't feel that great (see the parsing sketch below this table). | datasette 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/2039/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | ||||||||
1781530343 | I_kwDOBm6k_c5qL_7n | 2093 | Proposal: Combine settings, metadata, static, etc. into a single `datasette.toml` File | asg017 15178711 | open | 0 | 7 | 2023-06-29T21:18:23Z | 2023-07-02T02:17:47Z | CONTRIBUTOR | Very often I get tripped up when trying to configure my Datasette instances. For example: if I want to change the port my app listens on, do I do that with a CLI flag, a `--setting` flag, inside `metadata.json`, or an env var? If I want to up the time limit of SQL statements, is that under `metadata.json` or a setting? Where does my plugin configuration go? Normally I need to look it up in the Datasette docs, and I quickly find my answer, but the number of places where "config" goes is overwhelming. - Flat CLI flags like `--port`, `--host`, `--cors`, etc. - `--setting`, like `default_page_size`, `sql_time_limit_ms`, etc. - Inside `metadata.json`, including plugin configuration Typically my Datasette deploys are extremely long shell commands, with multiple `--setting` and other CLI flags. ## Proposal: Consolidate all "config" into `datasette.toml` I propose that we add a new `datasette.toml` that combines "settings", "metadata", and other common CLI flags like `--port` and `--cors` into a single file. It would be similar to "Cargo.toml" in Rust projects, "package.json" in Node projects, and "pyproject.toml" in Python, etc. A sample of what it could look like (see the config-loading sketch below this table): ```toml # "top level" configuration options that are currently CLI flags on `datasette serve` [config] port = 8020 host = "0.0.0.0" cors = true # replaces multiple `--setting` flags [settings] base_url = "/app/datasette/" default_allow_sql = true sql_time_limit_ms = 3500 # replaces `metadata.json`. # The contents of datasette-metadata.json could be defined in this file instead, but supporting separate files is nice (since those are easy to machine-generate) [metadata] include="./datasette-metadata.json" # plugin-specific [plugins] [plugins.datasette-auth-github] client_id = {env = "DATASETTE_AUTH_GITHUB_CLIENT_ID"} client_secret = {env = "GITHUB_CLIENT_SECRET"} [plugins.datasette-cluster-map] latitude_column = "lat" longitude_column = "lon" ``` ## Pros - Instead of multiple files and CLI flags, everything could b… | datasette 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/2093/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | ||||||||
1783304750 | I_kwDOBm6k_c5qSxIu | 2094 | JS Plugin Hooks for the Code Editor | asg017 15178711 | open | 0 | 0 | 2023-07-01T00:51:57Z | 2023-07-01T00:51:57Z | CONTRIBUTOR | When #2052 merges, I'd like to add support for adding extensions/functions to the Datasette code editor. I'd eventually like to build a JS plugin for [`sqlite-docs`](https://github.com/asg017/sqlite-docs), to add things like: - Inline documentation for tables/columns on hover - Inline docs for custom functions that are loaded in - More detailed autocomplete for tables/columns/functions I did some hacking to see what this would look like, see here: <img width="1223" alt="image" src="https://github.com/simonw/datasette/assets/15178711/64f95cbc-1492-4365-896f-b88c6d08a649"> <img width="1223" alt="image" src="https://github.com/simonw/datasette/assets/15178711/73e602ba-5f45-417a-997e-5aea1738527a"> There can be a new hook that allows JS plugins to add new "extensions" to the CodeMirror EditorView here: https://github.com/simonw/datasette/blob/8cd60fd1d899952f1153460469b3175465f33f80/datasette/static/cm-editor-6.0.1.js#L25 This will need some more planning. For example, the CodeMirror bundle in Datasette has functions that we could re-export for plugins to use (so we don't load 2 versions of `"@codemirror/autocomplete"`, for example). | datasette 107914493 | issue | {"url": "https://api.github.com/repos/simonw/datasette/issues/2094/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0} | ||||||||
1801394744 | I_kwDOCGYnMM5rXxo4 | 567 | Plugin system | asg017 15178711 | open | 0 | 2 | 2023-07-12T17:02:14Z | 2023-07-17T21:42:37Z | NONE | I'd like there to be a plugin system for sqlite-utils, similar to the datasette/llm plugins. I'd like to make plugins that would do things like: - Register SQLite extensions for more SQL functions + virtual tables - Register new subcommands - Different input file formats for `sqlite-utils memory` - Different output file formats (in addition to `--csv`, `--tsv`, `--nl`, etc.) A few real-world use-cases of plugins I'd like to see in sqlite-utils (see the hook sketch below this table): - Register many of my sqlite extensions in sqlite-utils (`sqlite-http`, `sqlite-lines`, `sqlite-regex`, etc.) - New subcommands to work with `sqlite-vss` vector tables - Input/output Parquet/Avro/Arrow IPC files with `sqlite-arrow` | sqlite-utils 140912432 | issue | {"url": "https://api.github.com/repos/simonw/sqlite-utils/issues/567/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}
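For the `COUNT(*)` slowness in #1111, one cheap workaround is to estimate the row count instead of computing it exactly. Below is a minimal Python sketch, assuming an ordinary rowid table like the issue's `PageviewsHour` in `out.db`; it is an illustration, not what Datasette itself does.

```python
import sqlite3

def estimated_row_count(db_path, table):
    # max(rowid) is answered from the rowid index rather than a full table
    # scan, so it returns almost instantly even on a 300-million-row table.
    # It equals COUNT(*) only if rows were never deleted; otherwise it is
    # an upper bound. Bracket-quoting assumes a trusted table name.
    conn = sqlite3.connect(db_path)
    try:
        (estimate,) = conn.execute(
            "SELECT max(rowid) FROM [{}]".format(table)
        ).fetchone()
        return estimate
    finally:
        conn.close()

print(estimated_row_count("out.db", "PageviewsHour"))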
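A minimal sketch of how the `sql_file` key proposed in #1528 could be resolved at startup. The key name comes from the issue; the function name and the assumption that paths are relative to the metadata file are hypothetical, and none of this is a shipped Datasette feature.

```python
from pathlib import Path

def resolve_sql_files(metadata, base_dir):
    # Walk every canned query and swap a "sql_file" reference for the file's
    # contents, so downstream code only ever sees an inline "sql" key.
    for db in metadata.get("databases", {}).values():
        for query in db.get("queries", {}).values():
            if isinstance(query, dict) and "sql_file" in query:
                query["sql"] = (Path(base_dir) / query.pop("sql_file")).read_text()
    return metadata

metadata = {
    "databases": {
        "fixtures": {
            "queries": {
                "neighborhood_search": {
                    "sql_file": "neighborhood_search.sql",
                    "title": "Search neighborhoods",
                }
            }
        }
    }
}
resolved = resolve_sql_files(metadata, ".")
```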
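The `:` delimiter proposed in #1784 is exactly what ran into the Windows drive-letter bug in #2039. Here is a sketch of a split that handles both cases, using `ntpath.splitdrive` to peel off a leading `C:`-style drive before looking for the separator; this is illustrative only, not Datasette's actual fix.

```python
import ntpath

def parse_load_extension(value):
    # ntpath.splitdrive() recognizes "C:"-style drive prefixes on any
    # platform, so "C:\ext.dll" keeps its drive while "./lines0:entrypoint"
    # still splits on the path:entrypoint separator.
    drive, rest = ntpath.splitdrive(value)
    if ":" in rest:
        path, entrypoint = rest.split(":", 1)
        return drive + path, entrypoint
    return value, None

assert parse_load_extension("./lines0:sqlite3_lines0_noread_init") == (
    "./lines0",
    "sqlite3_lines0_noread_init",
)
assert parse_load_extension(
    r"C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll"
) == (r"C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll", None)
```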
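A sketch of what loading the `datasette.toml` proposed in #2093 could look like, including the `{env = "..."}` indirection used for plugin secrets in the sample. All key names follow the issue's example; `tomllib` requires Python 3.11+, and none of this is a shipped Datasette configuration format.

```python
import os
import tomllib

def resolve_env(value):
    # Replace {"env": "NAME"} markers with the value of that environment
    # variable, recursing through nested tables.
    if isinstance(value, dict) and set(value) == {"env"}:
        return os.environ[value["env"]]
    if isinstance(value, dict):
        return {key: resolve_env(val) for key, val in value.items()}
    return value

with open("datasette.toml", "rb") as f:
    config = resolve_env(tomllib.load(f))

port = config.get("config", {}).get("port", 8001)  # CLI-flag section
settings = config.get("settings", {})              # --setting section
plugins = config.get("plugins", {})                # per-plugin config
```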
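For the sqlite-utils plugin system in #567, a pluggy-based layout like Datasette's is the obvious shape. The sketch below shows what two such hooks could look like; the hook names mirror Datasette's but are illustrative here, not a committed sqlite-utils API.

```python
import pluggy

hookspec = pluggy.HookspecMarker("sqlite_utils")
hookimpl = pluggy.HookimplMarker("sqlite_utils")

class SqliteUtilsHookSpec:
    @hookspec
    def prepare_connection(self, conn):
        """Called with each sqlite3 connection, so a plugin can load SQLite
        extensions or register custom SQL functions."""

    @hookspec
    def register_commands(self, cli):
        """Called with the root Click group, so a plugin can add subcommands."""

# Plugins would ship a module of @hookimpl functions and advertise it via a
# "sqlite_utils" entry point group in their packaging metadata.
pm = pluggy.PluginManager("sqlite_utils")
pm.add_hookspecs(SqliteUtilsHookSpec)
pm.load_setuptools_entrypoints("sqlite_utils")
```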
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT,
   [active_lock_reason] TEXT,
   [performed_via_github_app] TEXT,
   [reactions] TEXT,
   [draft] INTEGER,
   [state_reason] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);