
issue_comments_fts

9,947 rows


Link  rowid  body  issue_comments_fts  rank
338523957 338523957 I also need to solve for weird primary keys. If it’s a single integer or a single char field that’s easy. But what if it is a compound key with more than one char field? What delimiter can I use that will definitely be safe? Let’s say I use hyphen. Now I need to find a durable encoding for any hyphens that might exist in the key fields themselves. How about I use URL encoding for every non-alphanumeric character? That will turn hyphens into (I think) %2D. It should also solve for unicode characters, and it means the vast majority of keys (integers) will display neatly, including a compound key of e.g. 5678-345 1
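A minimal sketch of that encoding idea (function names are mine; note that `urllib.parse.quote` alone won't do it, since it never escapes hyphens):

```python
from urllib.parse import unquote

def encode_part(value):
    # Percent-encode every byte that is not an ASCII letter or digit,
    # so "-" becomes "%2D" and unicode survives as UTF-8 bytes.
    return "".join(
        chr(b) if chr(b).isascii() and chr(b).isalnum() else "%{:02X}".format(b)
        for b in str(value).encode("utf-8")
    )

def encode_key(*parts):
    return "-".join(encode_part(p) for p in parts)

def decode_key(key):
    return [unquote(part) for part in key.split("-")]

print(encode_key(5678, 345))   # 5678-345
print(encode_key("a-b", "c"))  # a%2Db-c
```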
338524454 338524454 Table rendering logic needs to detect the primary key field and turn it into a hyperlink. If there is a compound primary key it should add an extra column at the start of the table which displays the compound key as a link 1  
338524857 338524857 https://stackoverflow.com/a/14468878/6083 Looks like I should order by compound primary key and implement cursor-based pagination. 1  
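A sketch of that keyset approach (table and column names are placeholders; SQLite has supported row-value comparisons like this since 3.15):

```python
def fetch_page(conn, last_seen=None, size=100):
    # Resume strictly after the last (a, b) pair from the previous
    # page instead of using OFFSET, which degrades on large tables.
    if last_seen is None:
        return conn.execute(
            "select a, b, other from mytable order by a, b limit ?",
            [size],
        ).fetchall()
    return conn.execute(
        "select a, b, other from mytable "
        "where (a, b) > (?, ?) order by a, b limit ?",
        [last_seen[0], last_seen[1], size],
    ).fetchall()
```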
338526148 338526148 https://github.com/ahupp/python-magic/blob/master/README.md 1  
338530389 338530389 This means I need a good solution for these compile time options while running in development mode 1  
338530480 338530480 How about when the service starts up it checks for a compile.json file and, if it is missing, creates it using the same code we run at compile time normally 1  
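Roughly this shape, with `build_metadata()` as a hypothetical stand-in for whatever the compile step normally runs:

```python
import json
import os

def ensure_compile_json(path="compile.json"):
    # Reuse the compile-time code path at startup if the file is
    # missing; otherwise just load what a previous run produced.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    metadata = build_metadata()  # hypothetical: the compile-time step
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)
    return metadata
```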
338530704 338530704 Needed by https://github.com/simonw/stateless-datasets/issues/4#issuecomment-338530389 1  
338531827 338531827 Many of the applications I want to implement with this would benefit from having permanent real URLs. So let’s have both. The sha1 urls will serve far future cache headers (and an etag derived from their path). The non sha1 URLs will serve 302 uncached redirects to the sha1 locations. We will have a setting that lets people opt out of this behavior. 1  
338697223 338697223 Now returning this: { "error": "attempt to write a readonly database", "ok": false } 1  
338768860 338768860 I could use the table-reflow mechanism demonstrated here: http://demos.jquerymobile.com/1.4.3/table-reflow/ 1  
338769538 338769538 Maybe this should be handled by views instead? https://stateless-datasets-wreplxalgu.now.sh/ lists some views https://stateless-datasets-wreplxalgu.now.sh/?sql=select%20*%20from%20%22Order%20Subtotals%22 is an example showing the content of a view. What would the URL to views be? I don't think a view can share a name with a table, so the same URL scheme could work for both. 1  
338789734 338789734 URL design: /database/table.json - redirects to /database-6753f4a/table.json So we always redirect to the version with the truncated hash in the URL. 1  
338797522 338797522 https://stackoverflow.com/a/18134919/6083 is a good answer about how many characters of the hash are needed to be unique. I say we default to 7 characters, like git does - but allow extras to be configured. 1  
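A sketch, assuming any stable digest of the file contents will do for this:

```python
import hashlib

HASH_LENGTH = 7  # git-style default, imagined here as configurable

def db_hash(path, length=HASH_LENGTH):
    # Hash the database file in chunks so large files stay cheap,
    # then truncate to the configured number of hex characters.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:length]
```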
338799438 338799438 Can I take advantage of HTTP/2 so even if you get redirected I start serving you the correct resource straight away? 1  
338804173 338804173 Looks like the easiest way to implement HTTP/2 server push today is to run behind Cloudflare and use this: Link: </asset/to/push.js>; rel=preload; as=script https://blog.cloudflare.com/announcing-support-for-http-2-server-push-2/ Here's the W3C draft: https://w3c.github.io/preload/ From https://w3c.github.io/preload/#as-attribute it looks like I should use `as=fetch` if the content is intended for consumption by fetch() or XMLHTTPRequest. Unclear if I should throw `as=fetch crossorigin` in there. Need to experiment on that. 1  
338806718 338806718 Here's what the homepage of cloudflare.com does (with newlines added within the link header for clarity): $ curl -i 'https://www.cloudflare.com/' HTTP/1.1 200 OK Date: Mon, 23 Oct 2017 21:45:58 GMT Content-Type: text/html; charset=utf-8 Transfer-Encoding: chunked Connection: keep-alive link: </vendor/bitdashplayer.min.css>; rel=preload; as=style, </vendor/bitdash-controls.min.css>; rel=preload; as=style, </video/marketing-video/cloudflare-marketing-video.mpd>; rel=preload, </video/marketing-video/cloudflare-marketing-video.m3u8>; rel=preload, </video/marketing-video/video_0_800000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/video_0_800000/dash/segment_0.m4s>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/segment_0.m4s>; rel=preload; as=video, </video/new-short-optimized-22.mp4>; rel=preload; as=video, </video/marketing-video/poster.jpg>; rel=preload; as=image The original header looked like this: link: </vendor/bitdashplayer.min.css>; rel=preload; as=style, </vendor/bitdash-controls.min.css>; rel=preload; as=style, </video/marketing-video/cloudflare-marketing-video.mpd>; rel=preload, </video/marketing-video/cloudflare-marketing-video.m3u8>; rel=preload, </video/marketing-video/video_0_800000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/init.mp4>; rel=preload; as=video, </video/marketing-video/video_0_800000/dash/segment_0.m4s>; rel=preload; as=video, </video/marketing-video/audio_0_128000/dash/segment_0.m4s>; rel=preload; as=video, </video/new-short-optimized-22.mp4>; rel=preload; as=video, </video/marketing-video/poster.jpg>; rel=preload; as=image 1  
338834213 338834213 If I can’t detect a primary key, I won’t provide a URL for those records 1
338852971 338852971 I'm not going to bother with this. 1  
338853083 338853083 Fixed in 9d219140694551453bfa528e0624919eb065f9d6 1  
338854988 338854988 /database-name/table-name?name__contains=simon&sort=id+desc Note that if there's a column called "sort" you can still do sort__exact=blah 1  
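One way that parsing could look, as a sketch (the lookup names mirror the Django-style ones mentioned in the next comment):

```python
def parse_querystring(args):
    # Split "column__lookup=value" pairs out of the querystring.
    # "sort" is reserved, but a real column named "sort" stays
    # reachable as "sort__exact", per the note above.
    filters, sort = [], None
    for key, value in args.items():
        if key == "sort":
            sort = value
        elif "__" in key:
            column, lookup = key.rsplit("__", 1)
            filters.append((column, lookup, value))
        else:
            filters.append((key, "exact", value))
    return filters, sort

print(parse_querystring({"name__contains": "simon", "sort": "id desc"}))
# ([('name', 'contains', 'simon')], 'id desc')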
338857568 338857568 I can find the primary keys using: PRAGMA table_info(myTable) 1  
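In Python that looks something like this:

```python
def primary_keys(conn, table):
    # conn is a sqlite3.Connection. PRAGMA arguments can't be bound
    # as ? parameters, so the table name must come from a trusted
    # source such as sqlite_master.
    rows = conn.execute("PRAGMA table_info({})".format(table)).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
    # column's 1-based position in the primary key, or 0 if not in it.
    return [row[1] for row in sorted(rows, key=lambda r: r[5]) if row[5]]
```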
338859620 338859620 I’m going to implement everything in https://docs.djangoproject.com/en/1.11/ref/models/querysets/#field-lookups with the exception of range and the various date ones. 1  
338859709 338859709 I’m going to need to write unit tests for this, as this depends on #9 1
338861511 338861511 Some tables won't have primary keys, in which case I won't generate pages for individual records. 1  
338863155 338863155 I’m going to use py.test and start with all tests in a single tests.py module 1  
338872286 338872286 I'm going to use `,` as the separator between elements of a compound primary key. If those elements themselves include a comma I will use `%2C` in its place. 1  
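As a sketch (with the caveat that a literal "%2C" already present in a value would decode ambiguously):

```python
def encode_compound(values):
    # "," separates the elements; literal commas become "%2C".
    return ",".join(str(v).replace(",", "%2C") for v in values)

def decode_compound(encoded):
    return [part.replace("%2C", ",") for part in encoded.split(",")]

print(encode_compound(["a,b", "c"]))  # a%2Cb,c
print(decode_compound("a%2Cb,c"))     # ['a,b', 'c']
```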
338882110 338882110 Well, I've started it at least. 1  
338882207 338882207 Next step: generate links to these. 1  
339003850 339003850 As of b46e370ee6126aa2fa85cf789a31da38aed98496 this is done. 1  
339019873 339019873 Here's what I've got now: <img width="693" alt="localhost_8006_northwind-40d049b_categories_json" src="https://user-images.githubusercontent.com/9599/31950748-1bc85f3e-b891-11e7-8205-d89da6129f4b.png"> 1  
339027711 339027711 I have code to detect primary keys on tables... but what should I do for tables that lack primary keys? How should I even sort them? 1  
339028979 339028979 Looks like I can use the SQLite specific “rowid” in that case. It isn’t guaranteed to stay consistent across a VACUUM but that’s ok because we are immutable anyway. https://www.sqlite.org/lang_createtable.html#rowid 1  
339138809 339138809 May as well support most of https://sqlite.org/lang_expr.html 1  
339186887 339186887 Still to do: - [x] `gt`, `gte`, `lt`, `lte` - [x] `like` - [x] `glob` 1  
339210353 339210353 I'm going to call this one done for the moment. The date filters can go in a stretch goal. 1  
339366612 339366612 I had to manually set the content disposition header: return await response.file_stream( filepath, headers={ 'Content-Disposition': 'attachment; filename="{}"'.format(filepath) } ) In the next release of Sanic I can just use the filename= argument instead: https://github.com/channelcat/sanic/commit/07e95dba4f5983afc1e673df14bdd278817288aa 1
339382054 339382054 Could this be as simple as using the iterative JSON encoder and adding a yield statement in between each chunk? 1  
339388215 339388215 First experiment: hook up an iterative CSV dump (just because that’s a tiny bit easier to get started with than iterative JSON). Have it execute a big select statement and then iterate through the result set 100 rows at a time using sqlite fetchmany() - also have it async sleep for a second in between each batch of 100. Can this work without needing python threads? 1
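A rough shape for that experiment (`write_chunk` stands in for whatever awaitable the framework provides for pushing bytes to the response; the fetchmany() calls themselves are still blocking SQLite reads, which is exactly the open question):

```python
import asyncio
import csv
import io

async def stream_csv(conn, sql, write_chunk):
    cursor = conn.execute(sql)
    while True:
        rows = cursor.fetchmany(100)
        if not rows:
            break
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        await write_chunk(buf.getvalue())
        await asyncio.sleep(1)  # the deliberate pause in the experiment
```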
339388771 339388771 If this does work, I need to figure out what to do about the HTML view. Assuming I can iteratively produce JSON and CSV, what to do about HTML? One option: render the first 500 rows as HTML, then hand off to an infinite scroll experience that iteratively loads more rows as JSON. 1
339389105 339389105 The gold standard here is to be able to serve up increasingly large datasets without blocking the event loop and while using a sustainable amount of RAM 1  
339389328 339389328 Ideally we can get some serious gains from the fact that our database file is opened with the immutable option. 1  
339395551 339395551 Simplest implementation will be to create a temporary directory somewhere, copy in a Dockerfile and the databases and run “now” in it. Ideally I can use symlinks rather than copying potentially large database files around. 1  
339406634 339406634 It certainly looks like some of the stuff in https://sqlite.org/pragma.html could be used to screw around with things. Example: `PRAGMA case_sensitive_like = 1` - would that affect future queries? 1  
339413825 339413825 Could I use https://sqlparse.readthedocs.io/en/latest/ to parse incoming statements and ensure they are pure SELECTs? Would that prevent people from using a compound SELECT statement to trigger an evil PRAGMA of some sort? 1  
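If sqlparse works out, the check might be as small as this sketch (whether get_type() catches every compound-statement trick is the open question above):

```python
import sqlparse

def is_single_select(sql):
    statements = sqlparse.parse(sql)
    return len(statements) == 1 and statements[0].get_type() == "SELECT"
```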
339420462 339420462 https://sitesforprofit.com/responsive-table-plugins-and-patterns has some useful links. I really like the pattern from https://css-tricks.com/responsive-data-tables/ /* Max width before this PARTICULAR table gets nasty This query will take effect for any screen smaller than 760px and also iPads specifically. */ @media only screen and (max-width: 760px), (min-device-width: 768px) and (max-device-width: 1024px) { /* Force table to not be like tables anymore */ table, thead, tbody, th, td, tr { display: block; } /* Hide table headers (but not display: none;, for accessibility) */ thead tr { position: absolute; top: -9999px; left: -9999px; } tr { border: 1px solid #ccc; } td { /* Behave like a "row" */ border: none; border-bottom: 1px solid #eee; position: relative; padding-left: 50%; } td:before { /* Now like a table header */ position: absolute; /* Top/left values mimic padding */ top: 6px; left: 6px; width: 45%; padding-right: 10px; white-space: nowrap; } /* Label the data */ td:nth-of-type(1):before { content: "First Name"; } td:nth-of-type(2):before { content: "Last Name"; } td:nth-of-type(3):before { content: "Job Title"; } td:nth-of-type(4):before { content: "Favorite Color"; } td:nth-of-type(5):before { content: "Wars of Trek?"; } td:nth-of-type(6):before { content: "Porn Name"; } td:nth-of-type(7):before { content: "Date of Birth"; } td:nth-of-type(8):before { content: "Dream Vacation City"; } td:nth-of-type(9):before { content: "GPA"; } td:nth-of-type(10):before { conten… 1  
339510770 339510770 It looks like I should double quote my columns and ensure they are correctly escaped https://blog.christosoft.de/2012/10/sqlite-escaping-table-acolumn-names/ - hopefully using ? placeholders for column names will work. I should use ? for tables too. 1  
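The standard identifier quoting from that article, as a sketch (worth noting: ? placeholders only bind values, never column or table names, so the quoting function has to carry the load):

```python
def escape_sqlite_identifier(name):
    # Wrap in double quotes and double any embedded quote characters.
    return '"{}"'.format(name.replace('"', '""'))

print(escape_sqlite_identifier('weird "col" name'))  # "weird ""col"" name"
```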
339514819 339514819 I’m going to have a single command-line app that does everything. Name to be decided - options include dataset, stateless, datasite (I quite like that - it reflects SQLite and the fact that you create a website) 1  
339515822 339515822 datasite . - starts web app in current directory, serving all DB files datasite . -p 8001 - serves on custom port datasite blah.db blah2.db - serves specified files You can’t specify more than one directory. You can specify as many files as you like. If you specify two files with different paths but the same name then they must be accessed by hash. datasite publish . - publishes current directory to the internet! Uses now by default, if it detects it on your path. Other publishers will eventually be added as plugins. datasite publish http://path-to-db.db - publishes a DB available at a URL. Works by constructing the Dockerfile with wget calls in it. datasite blah.db -m metadata.json If you specify a directory it looks for metadata.json in that directory. Otherwise you can pass an explicit metadata file path with -m or --metadata 1
339516032 339516032 Another potential name: datapi 1  
339517846 339517846 I’m going to use Click for this http://nvie.com/posts/writing-a-cli-in-python-in-under-60-seconds/ https://kushaldas.in/posts/building-command-line-tools-in-python-with-click.html 1  
339724700 339724700 Here’s how to make the “serve” subcommand the default if it is called with no arguments: @click.group(invoke_without_command=True) def serve(): # ... 1  
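The fuller documented Click pattern needs a context check, something like:

```python
import click

@click.group(invoke_without_command=True)
@click.pass_context
def cli(ctx):
    # With invoke_without_command=True the group body runs even when
    # no subcommand is given; fall through to serve() in that case.
    if ctx.invoked_subcommand is None:
        ctx.invoke(serve)

@cli.command()
def serve():
    click.echo("serving...")
```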
339866724 339866724 <img width="624" alt="databases" src="https://user-images.githubusercontent.com/9599/32087550-4e10b8a4-ba91-11e7-992b-e628d77114d0.png"> 1  
339891755 339891755 Deploys to Now aren't working at the moment - they aren't showing the uploaded databases, because I've broken the path handling somehow. I need to do a bit more work here. 1  
340561577 340561577 http://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/quickstart.html describes how to package this for PyPI 1  
340787868 340787868 Here’s how I can (I think) provide safe execution of arbitrary SQL while blocking PRAGMA calls: let people use named parameters in their SQL and apply strict filtering to the SQL query but not to the parameter values. cur.execute( "select * from people where name_last=:who and age=:age", { "who": who, "age": age }) In URL form: ?sql=select...&who=Terry&age=34 Now we can apply strict, dumb validation rules to the SQL part while allowing anything in the named parameters - so people can execute a search for PRAGMA without being able to execute a PRAGMA statement. 1
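The "strict, dumb" validation half might start out as simple as this sketch, since parameter values never touch the SQL text:

```python
import re

def validate_sql(sql):
    # Reject anything that isn't a single SELECT; values arrive
    # separately as named parameters, so searching *for* the word
    # PRAGMA stays possible while executing one does not.
    if not re.match(r"\s*select\b", sql, re.IGNORECASE):
        raise ValueError("Only SELECT queries are allowed")
    if ";" in sql:
        raise ValueError("Multiple statements are not allowed")
    return sql
```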
341938424 341938424 Done: https://github.com/simonw/stateless-datasets/commit/edaa10587e60946e0c1935333f6b79553db33798 1  
341945420 341945420 To simplify things a bit, I'm going to require that every database is explicitly listed in the command line. I won't support "serve everything in this directory" for the moment. 1  
342030075 342030075 ... I tried that, I don't like it. I'm going to bring back "directory serving" by allowing you to pass a directory as an argument to `datasite` (including `datasite .`). I may even make `.` the default if you don't provide anything at all. 1  
342032943 342032943 Default look with Bootstrap 4 looks like this: <img width="871" alt="flights" src="https://user-images.githubusercontent.com/9599/32423242-1b4458ce-c25a-11e7-910f-2dc1de909b8f.png"> 1  
342484889 342484889 I’m going to call this feature “count values” 1  
342521344 342521344 GDS Registries could be fun too: https://registers.cloudapps.digital/ 1  
343164111 343164111 Implemented in 31b21f5c5e15fc3acab7fabb170c1da71dc3c98c 1  
343168796 343168796 Won't fix: ujson is not compatible with the custom JSON encoder I'm using here: https://github.com/simonw/immutabase/blob/b2dee11fcd989d9e2a7bf4de1e23dbc320c05013/immutabase/app.py#L401-L416 1  
343237982 343237982 More terms: * publish * share * docker * host * stateless I want to capture the idea of publishing an immutable database in a stateless container. 1  
343238262 343238262 The name should ideally be available on PyPI and should make sense as both a command line application and a library. 1  
343239062 343239062 This looks promising: https://github.com/esnme/ultrajson/issues/124#issuecomment-323882878 1  
343266326 343266326 http://sanic.readthedocs.io/en/latest/sanic/testing.html 1  
343281876 343281876 How about datasette? 1  
343551356 343551356 I'm going with datasette. 1  
343557070 343557070 https://file.io/ looks like it could be good for this. It's been around since 2015, and lets you upload a temporary file which can be downloaded once. $ curl -s -F "file=@database.db" "https://file.io/?expires=1d" {"success":true,"key":"ySrl1j","link":"https://file.io/ySrl1j","expiry":"1 day"} Downloading from that URL serves up the data with a `Content-disposition` header containing the filename: simonw$ curl -vv https://file.io/ySrl1j | more % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 34.232.1.167... * Connected to file.io (34.232.1.167) port 443 (#0) * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: file.io * Server certificate: Amazon * Server certificate: Amazon Root CA 1 * Server certificate: Starfield Services Root Certificate Authority - G2 > GET /ySrl1j HTTP/1.1 > Host: file.io > User-Agent: curl/7.43.0 > Accept: */* > < HTTP/1.1 200 OK < Date: Fri, 10 Nov 2017 18:14:38 GMT < Content-Type: undefined < Transfer-Encoding: chunked < Connection: keep-alive < X-Powered-By: Express < X-RateLimit-Limit: 5 < X-RateLimit-Remaining: 4 < Access-Control-Allow-Origin: * < Access-Control-Allow-Headers: Cache-Control,X-reqed-With,x-requested-with < Content-disposition: attachment; filename=database.db ... 1  
343581130 343581130 I'm going to handle this a different way. I'm going to support a local history of your own queries stored in localStorage, but if you want to share a query you have to do it with a URL. If people really want canned query support, they can do that using custom templates - see #12 - or by adding views to their database before they publish it. 1  
343581332 343581332 I'm not going to use Sanic's mechanism for this. I'll use arguments passed to my cli instead. 1  
343643332 343643332 Here's what a table looks like now at a smaller screen size: <img width="496" alt="parlgov-development__info_data_source" src="https://user-images.githubusercontent.com/9599/32686818-7ffa691e-c662-11e7-964d-201fce0694e4.png"> 1  
343644891 343644891 I can detect something is a view like this: SELECT name from sqlite_master WHERE type ='view'; 1  
343644976 343644976 Simplest version of this: 1. Create a temporary directory 2. Write a Dockerfile into it that pulls an image and pip installs datasette 3. Add symlinks to the DBs they listed (so we don't have to copy them) 4. Shell out to "now" 5. Done! 1  
343645249 343645249 Doing this works: import os os.link('/tmp/databases/northwind.db', '/tmp/tmp-blah/northwind.db') That creates a link in tmp-blah - and then when I delete that entire directory like so: import shutil shutil.rmtree('/tmp/tmp-blah') The original database is not deleted, just the link. 1  
343645327 343645327 I can create the temporary directory like so: import tempfile t = tempfile.TemporaryDirectory() t <TemporaryDirectory '/var/folders/w9/0xm39tk94ng9h52g06z4b54c0000gp/T/tmpkym70wlp'> t.name '/var/folders/w9/0xm39tk94ng9h52g06z4b54c0000gp/T/tmpkym70wlp' And then to delete it all: t.cleanup() 1  
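Putting those pieces together, the whole publish flow might be as small as this (the Dockerfile contents are purely illustrative, and os.link() requires source and destination on the same filesystem):

```python
import os
import subprocess
import tempfile

def publish(db_paths):
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "Dockerfile"), "w") as f:
            f.write("FROM python:3\nRUN pip install datasette\n")
        for path in db_paths:
            # Hard links avoid copying potentially huge database files
            os.link(path, os.path.join(tmp, os.path.basename(path)))
        subprocess.check_call(["now"], cwd=tmp)
    # Cleanup removes the links and Dockerfile, not the originals
```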
343646740 343646740 I'm happy with this now that I've implemented the publish command in #26 1  
343647102 343647102 http://2016.padjo.org/tutorials/data-primer-census-acs1-demographics/ has a sqlite database: http://2016.padjo.org/files/data/starterpack/census-acs-1year/acs-1-year-2015.sqlite I tested this by deploying it here: https://datasette-fewuggrvwr.now.sh/ 1  
343647300 343647300 Still needed: - [ ] A link to the homepage from some kind of navigation bar in the header - [ ] link to github.com/simonw/datasette in the footer - [ ] Slightly better titles (maybe ditch the visited link colours for titles only? should keep those for primary key links) - [ ] Links to the .json and .jsono versions of every view 1  
343675165 343675165 The plugin system can also allow alternative providers for the `publish` command - e.g. maybe hook up hyper.sh as an option for publishing containers. 1  
343676574 343676574 See also #14 1  
343683566 343683566 I’m going to solve this by making it an optional argument you can pass to the serve command. Then the Dockerfile can still build and use it but it won’t interfere with tests or dev. If argument is not passed, we will calculate hashes on startup and calculate table row counts on demand. 1  
343690060 343690060 "parlgov-development.db": { "url": "http://www.parlgov.org/" }, "nhsadmin.sqlite": { "url": "https://github.com/psychemedia/openHealthDataDoodles" } 1  
343691342 343691342 Closing this, opening a fresh ticket for the navigation stuff. 1  
343697291 343697291 I'm going to bundle sql and sql_params together into a query nested object like this: { "query": { "sql": "select ...", "params": { "p0": "blah" } } } 1  
343698214 343698214 I'm closing #50 - more tests will be added in the future, but the framework is neatly in place for them now. 1  
343699115 343699115 This needs to incorporate a sensible way of presenting custom SQL query results too. And let's get a textarea in there for executing SQL while we're at it. 1  
343705966 343705966 https://github.com/fivethirtyeight/data has a ton of CSVs 1  
343707624 343707624 Split the SQL thing out into #65 1  
343707676 343707676 Here's the new design: <img width="691" alt="parlgov-development" src="https://user-images.githubusercontent.com/9599/32695161-82821226-c708-11e7-835c-b3d91850b2e0.png"> Also lists views at the bottom (refs #54): <img width="345" alt="parlgov-development" src="https://user-images.githubusercontent.com/9599/32695164-99efa7de-c708-11e7-8272-bc5f5b870b84.png"> 1  
343708447 343708447 I ditched the metadata file concept. 1  
343709217 343709217 <img width="982" alt="nhsadmin" src="https://user-images.githubusercontent.com/9599/32695392-3ea12612-c70f-11e7-873b-9e6ad2c869e7.png"> 1  
343715915 343715915 con = sqlite3.connect('existing_db.db') with open('dump.sql', 'w') as f: for line in con.iterdump(): f.write('%s\n' % line) 1  
343752404 343752404 Re-opening this - I've decided to bring back this concept, see #68 1  
343752579 343752579 By default I'll allow LIMIT and OFFSET up to a maximum of X (where X is let's say 50,000 to start with, but can be custom configured to a larger number or set to None for no limit). 1  
343752683 343752683 Maybe SQL views should have their own Sanic view class (`ViewView` is kinda funny), subclassed from `TableView`? 1  
343753999 343753999 For initial launch, I could just support this as some optional command line arguments you pass to the publish command: datasette publish data.db --title="Title" --source="url" 1  
343754058 343754058 I’m going to store this stuff in a file called metadata.json and move the existing automatically generated metadata to a file called build.json 1  
343769692 343769692 I have created a Docker Hub public repository for this: https://hub.docker.com/r/simonwillison/datasette/ 1  


CREATE VIRTUAL TABLE [issue_comments_fts] USING FTS5 (
                [body],
                content=[issue_comments]
            );