{"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397842667", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397842667, "node_id": "MDEyOklzc3VlQ29tbWVudDM5Nzg0MjY2Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-16T22:38:15Z", "updated_at": "2018-06-18T05:55:11Z", "author_association": "OWNER", "body": "Still todo:\r\n\r\n- [x] Streaming version\r\n- [ ] Tidy up the \"This data as ...\" UI\r\n- [x] Default .csv (and .json) links to use `?_labels=on` (only if at least one foreign key detected)\r\n\r\n\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397915258", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397915258, "node_id": "MDEyOklzc3VlQ29tbWVudDM5NzkxNTI1OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T00:01:05Z", "updated_at": "2018-06-18T00:01:05Z", "author_association": "OWNER", "body": "Someone malicious could use a UNION to generate an unpleasantly large CSV response. I'll add another config setting which limits the response size to 100MB but can be turned off by setting it to 0.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397915403", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397915403, "node_id": "MDEyOklzc3VlQ29tbWVudDM5NzkxNTQwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T00:03:17Z", "updated_at": "2018-06-18T00:14:37Z", "author_association": "OWNER", "body": "Since CSV streaming export doesn't work for custom SQL queries (since they don't support `_next=` pagination) there's no need to provide a option that disables streams just for custom SQL.\r\n\r\nRelated: the UI should not show the option to download everything on custom SQL pages.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397916091", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397916091, "node_id": "MDEyOklzc3VlQ29tbWVudDM5NzkxNjA5MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T00:13:43Z", "updated_at": "2018-06-18T00:15:50Z", "author_association": "OWNER", "body": "I was also worried about the performance of pagination over custom `_sort` orders or views which use offset pagination - but Datasette's SQL time limits should prevent those from getting out of hand. 
This does mean that a streaming CSV file may be cut short by an error. If that happens, we should ensure the error is written out as the last line of the CSV, so anyone who tries to import the file gets a relevant error message telling them that the export did not complete.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397916321", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397916321, "node_id": "MDEyOklzc3VlQ29tbWVudDM5NzkxNjMyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T00:17:44Z", "updated_at": "2018-06-18T00:18:05Z", "author_association": "OWNER", "body": "The export UI could be a GET form controlling various parameters. This would discourage crawlers from hitting the export links and would also allow us to express the full range of export options.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397918264", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397918264, "node_id": "MDEyOklzc3VlQ29tbWVudDM5NzkxODI2NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T00:49:35Z", "updated_at": "2018-06-18T00:49:35Z", "author_association": "OWNER", "body": "Simpler design: the top of the page will link to basic .json and .csv and \"advanced\", which will be a fragment link to an advanced export form at the bottom of the page.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397923253", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397923253, "node_id": "MDEyOklzc3VlQ29tbWVudDM5NzkyMzI1Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T01:49:52Z", "updated_at": "2018-06-18T03:02:28Z", "author_association": "OWNER", "body": "Ideally the downloadable filenames of exported CSVs would differ across different querystring parameters. Maybe `Street_Trees-56cbd54.csv`, where `56cbd54` is a hash of the querystring?
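
A sketch of how such a filename could be derived (illustrative only; the choice of hash and the seven-character truncation are assumptions, and `csv_filename` is a hypothetical helper):

```python
import hashlib

def csv_filename(table, querystring):
    # A short, stable digest: identical filters always produce the same
    # filename, while different querystrings produce different ones.
    digest = hashlib.sha1(querystring.encode("utf-8")).hexdigest()[:7]
    return "{}-{}.csv".format(table, digest)

# csv_filename("Street_Trees", "_size=4&_labels=on")
# -> something like "Street_Trees-1a2b3c4.csv"
```
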
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397949002", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397949002, "node_id": "MDEyOklzc3VlQ29tbWVudDM5Nzk0OTAwMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T05:53:17Z", "updated_at": "2018-06-18T05:53:17Z", "author_association": "OWNER", "body": "Advanced export pane:\r\n\r\n![2018-06-17 at 10 52 pm](https://user-images.githubusercontent.com/9599/41520166-3809a45a-7281-11e8-9dfa-2b10f4cb9672.png)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-397952129", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 397952129, "node_id": "MDEyOklzc3VlQ29tbWVudDM5Nzk1MjEyOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T06:15:36Z", "updated_at": "2018-06-18T06:15:51Z", "author_association": "OWNER", "body": "Advanced export pane demo: https://latest.datasette.io/fixtures-35b6eb6/facetable?_size=4", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/316#issuecomment-398030903", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/316", "id": 398030903, "node_id": "MDEyOklzc3VlQ29tbWVudDM5ODAzMDkwMw==", "user": {"value": 132230, "label": "gavinband"}, "created_at": "2018-06-18T12:00:43Z", "updated_at": "2018-06-18T12:00:43Z", "author_association": "NONE", "body": "I should add that I'm using datasette version 0.22, Python 2.7.10 on Mac OS X. Happy to send more info if helpful.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 333238932, "label": "datasette inspect takes a very long time on large dbs"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/266#issuecomment-398098582", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/266", "id": 398098582, "node_id": "MDEyOklzc3VlQ29tbWVudDM5ODA5ODU4Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T15:40:32Z", "updated_at": "2018-06-18T15:40:32Z", "author_association": "OWNER", "body": "This is now released in Datasette 0.23! 
http://datasette.readthedocs.io/en/latest/changelog.html#v0-23", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323681589, "label": "Export to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/316#issuecomment-398101670", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/316", "id": 398101670, "node_id": "MDEyOklzc3VlQ29tbWVudDM5ODEwMTY3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T15:49:35Z", "updated_at": "2018-06-18T15:50:38Z", "author_association": "OWNER", "body": "Wow, I've gone as high as 7GB, but I've never tried it against 600GB.\r\n\r\n`datasette inspect` is indeed expected to take a long time for large databases. That's why it's available as a separate command: by running `datasette inspect` to generate `inspect-data.json` you can execute the expensive introspection just once against a large database and then have `datasette serve` take advantage of that cached metadata (hence avoiding `datasette serve` hanging on startup).\r\n\r\nAs you spotted, most of the time is spent in those counts. I imagine you don't need those row counts in order for the rest of Datasette to function correctly (they are mainly used for display purposes - on the https://latest.datasette.io/fixtures index page for example).\r\n\r\nIf your database changes infrequently, for the moment I recommend running `datasette inspect` once to generate the `inspect-data.json` file (let me know how long it takes) and then passing that file to `datasette serve mydb.db --inspect-file=inspect-data.json`.\r\n\r\nIf your database DOES change frequently, then this workaround won't help you much. Let me know and I'll see how much work it would take to make those row counts optional rather than required.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 333238932, "label": "datasette inspect takes a very long time on large dbs"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/265#issuecomment-398102537", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/265", "id": 398102537, "node_id": "MDEyOklzc3VlQ29tbWVudDM5ODEwMjUzNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T15:52:15Z", "updated_at": "2018-06-18T15:52:15Z", "author_association": "OWNER", "body": "https://latest.datasette.io/ now always hosts the latest version of the code. I've started linking to it from our documentation.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 323677499, "label": "Add links to example Datasette instances to appropiate places in docs"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/316#issuecomment-398109204", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/316", "id": 398109204, "node_id": "MDEyOklzc3VlQ29tbWVudDM5ODEwOTIwNA==", "user": {"value": 132230, "label": "gavinband"}, "created_at": "2018-06-18T16:12:45Z", "updated_at": "2018-06-18T16:12:45Z", "author_association": "NONE", "body": "Hi Simon,\r\nThanks for the response. OK, I'll try running `datasette inspect` up front.\r\nIn principle the db won't change. 
However, the site's in development and it's likely I'll need to add views and some auxiliary (smaller) tables as I go along. I'll need to be careful with this if it involves an inspect step in each iteration, though.\r\ng.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 333238932, "label": "datasette inspect takes a very long time on large dbs"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/271#issuecomment-398133924", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/271", "id": 398133924, "node_id": "MDEyOklzc3VlQ29tbWVudDM5ODEzMzkyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2018-06-18T17:32:22Z", "updated_at": "2018-06-18T17:32:22Z", "author_association": "OWNER", "body": "As seen in #316, inspect is already taking a VERY long time to run against large (600GB) databases.\r\n\r\nTo get this working I may have to make inspect an optional optimization and run introspection for columns and primary keys on demand.\r\n\r\nThe one catch here is the `count(*)` queries - Datasette may need to learn not to return full table counts in circumstances where the count has not been pre-calculated and takes more than Xms to generate.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 324162476, "label": "Mechanism for automatically picking up changes when on-disk .db file changes"}, "performed_via_github_app": null}
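
A sketch of what such a time-bounded count could look like (illustrative only, not Datasette's implementation; the `count_rows` helper and the default budget are assumptions):

```python
import sqlite3
import time

def count_rows(conn, table, budget_ms=100):
    # SQLite calls the progress handler every N virtual machine opcodes;
    # returning a non-zero value aborts the query with OperationalError.
    deadline = time.monotonic() + budget_ms / 1000.0
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 10000
    )
    try:
        return conn.execute(
            "select count(*) from [{}]".format(table)
        ).fetchone()[0]
    except sqlite3.OperationalError:
        return None  # count took longer than the budget; show no count instead
    finally:
        conn.set_progress_handler(None, 10000)
```

When a pre-calculated count is available (for example from `inspect-data.json`) the query need not run at all; the budget would only apply to the on-demand fallback.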