github: issue_comments: 13 rows where issue = 612287234

13 rows where issue = 612287234

Search:

descending

id ▼	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
623805823	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623805823	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzgwNTgyMw==	simonw 9599	2020-05-05T02:45:56Z	2020-05-05T02:45:56Z	MEMBER	I filed an issue with `osxphotos` about this here: https://github.com/RhetTbull/osxphotos/issues/121	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623806085	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806085	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzgwNjA4NQ==	simonw 9599	2020-05-05T02:47:18Z	2020-05-05T02:47:18Z	MEMBER	In https://github.com/RhetTbull/osxphotos/issues/121#issuecomment-623249263 Rhet Turnbull spotted a table called `ZSCENEIDENTIFIER` which looked like it might have the right data, but the columns in it aren't particularly helpful: ``` Z_PK,Z_ENT,Z_OPT,ZSCENEIDENTIFIER,ZASSETATTRIBUTES,ZCONFIDENCE 8,49,1,731,5,0.11834716796875 9,49,1,684,6,0.0233648251742125 10,49,1,1702,1,0.026153564453125 ``` I love the look of those confidence scores, but what do the numbers mean?	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623806533	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806533	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzgwNjUzMw==	simonw 9599	2020-05-05T02:50:16Z	2020-05-05T02:50:16Z	MEMBER	I figured there must be a separate database that Photos uses to store the text of the identified labels. I used "Open Files and Ports" in Activity Monitor against the Photos app to try and spot candidates... and found `/Users/simon/Pictures/Photos Library.photoslibrary/database/search/psi.sqlite` - a 53MB SQLite database file. <img width="1365" alt="Item-0_and_Item-0_and_Item-0_and_Item-0" src="https://user-images.githubusercontent.com/9599/81031213-61ea0800-8e40-11ea-8237-cfce4a5128e0.png"> Here's the schema of that file: ``` $ sqlite3 psi.sqlite .schema CREATE TABLE word_embedding(word TEXT, extended_word TEXT, score DOUBLE); CREATE INDEX word_embedding_index ON word_embedding(word); CREATE VIRTUAL TABLE word_embedding_prefix USING fts5(extended_word) /* word_embedding_prefix(extended_word) */; CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_data'(id INTEGER PRIMARY KEY, block BLOB); CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID; CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_content'(id INTEGER PRIMARY KEY, c0); CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_docsize'(id INTEGER PRIMARY KEY, sz BLOB); CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_config'(k PRIMARY KEY, v) WITHOUT ROWID; CREATE TABLE groups(category INT2, owning_groupid INT, content_string TEXT, normalized_string TEXT, lookup_identifier TEXT, token_ranges_0 INT8, token_ranges_1 INT8, UNIQUE(category, owning_groupid, content_string, lookup_identifier, token_ranges_0, token_ranges_1)); CREATE TABLE assets(uuid_0 INT, uuid_1 INT, creationDate INT, UNIQUE(uuid_0, uuid_1)); CREATE TABLE ga(groupid INT, assetid INT, PRIMARY KEY(groupid, assetid)); CREATE TABLE collections(uuid_0 INT, uuid_1 INT, startDate INT, endDate INT, title TEXT, subtitle TEXT, keyAssetUUID_0 INT, keyAssetUUID_1 INT, typeAndNumberOfAssets INT32, sortDate DOUBLE, UNIQUE(uuid_0, uuid_1)); CREATE TABLE gc(groupid INT, collectionid INT, PRIMARY KEY(groupid, collectionid)); CREATE…	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623806687	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806687	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzgwNjY4Nw==	simonw 9599	2020-05-05T02:51:16Z	2020-05-05T02:51:16Z	MEMBER	Running datasette against it directly doesn't work: ``` simon@Simons-MacBook-Pro search % datasette psi.sqlite Serve! files=('psi.sqlite',) (immutables=()) on port 8001 Usage: datasette serve [OPTIONS] [FILES]... Error: Connection to psi.sqlite failed check: no such tokenizer: PSITokenizer ``` Instead, I created a new SQLite database with a copy of some of the key tables, like this: ``` sqlite-utils rows psi.sqlite groups \| sqlite-utils insert /tmp/search.db groups - sqlite-utils rows psi.sqlite assets \| sqlite-utils insert /tmp/search.db assets - sqlite-utils rows psi.sqlite ga \| sqlite-utils insert /tmp/search.db ga - sqlite-utils rows psi.sqlite collections \| sqlite-utils insert /tmp/search.db collections - sqlite-utils rows psi.sqlite gc \| sqlite-utils insert /tmp/search.db gc - sqlite-utils rows psi.sqlite lookup \| sqlite-utils insert /tmp/search.db lookup - ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623807568	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623807568	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzgwNzU2OA==	simonw 9599	2020-05-05T02:56:06Z	2020-05-05T02:56:06Z	MEMBER	I'm pretty sure this is what I'm after. The `groups` table has what looks like identified labels in the rows with category = 2025: <img width="1122" alt="words__groups__2_528_rows_where_where_category___2025" src="https://user-images.githubusercontent.com/9599/81031361-e0df4080-8e40-11ea-9060-6d850aa52140.png"> Then there's a `ga` table that maps groups to assets: <img width="304" alt="words__ga__633_653_rows" src="https://user-images.githubusercontent.com/9599/81031387-f48aa700-8e40-11ea-9a3d-da23903be928.png"> And an `assets` table which looks like it has one row for every one of my photos: <img width="645" alt="words__assets__40_419_rows" src="https://user-images.githubusercontent.com/9599/81031402-04a28680-8e41-11ea-8047-e9199d068563.png"> One major challenge: these UUIDs are split into two integer numbers, `uuid_0` and `uuid_1` - but the main photos database uses regular UUIDs like this: ![image](https://user-images.githubusercontent.com/9599/81031481-39164280-8e41-11ea-983b-005ced641a18.png) I need to figure out how to match up these two different UUID representations. I asked on Twitter if anyone has any ideas: https://twitter.com/simonw/status/1257500689019703296	{"total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623811131	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623811131	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzgxMTEzMQ==	simonw 9599	2020-05-05T03:16:18Z	2020-05-05T03:16:18Z	MEMBER	Here's how to convert two integers unto a UUID using Java. Not sure if it's the solution I need though (or how to do the same thing in Python): https://repl.it/repls/EuphoricSomberClasslibrary <img width="1494" alt="Repl_it_-_EuphoricSomberClasslibrary" src="https://user-images.githubusercontent.com/9599/81032267-0d488c00-8e44-11ea-9be7-680eaccd1611.png"> ```java import java.util.UUID; class Main { public static void main(String[] args) { java.util.UUID uuid = new java.util.UUID( 2544182952487526660L, -3640314103732024685L ); System.out.println( uuid ); } } ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623845014	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623845014	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg0NTAxNA==	RhetTbull 41546558	2020-05-05T03:55:14Z	2020-05-05T03:56:24Z	CONTRIBUTOR	I'm traveling w/o access to my Mac so can't help with any code right now. I suspected ZSCENEIDENTIFIER was a foreign key into one of these psi.sqlite tables. But looks like you're on to something connecting groups to assets. As for the UUID, I think there's two ints because each is 64-bits but UUIDs are 128-bits. Thus they need to be combined to get the 128 bit UUID. You might be able to use Apple's [NSUUID](https://developer.apple.com/documentation/foundation/nsuuid?language=objc), for example, by wrapping with pyObjC. Here's one [example](https://github.com/ronaldoussoren/pyobjc/blob/881c82a7ba90f193934b52b44143360c80dce5e5/pyobjc-framework-Cocoa/PyObjCTest/test_nsuuid.py) of using this in PyObjC's test suite. Interesting it's stored this way instead of a UUIDString as in Photos.sqlite. Perhaps it for faster indexing.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623846880	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623846880	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg0Njg4MA==	simonw 9599	2020-05-05T04:06:08Z	2020-05-05T04:06:08Z	MEMBER	This function seems to convert them into UUIDs that match my photos: ```python def to_uuid(uuid_0, uuid_1): b = uuid_0.to_bytes(8, 'little', signed=True) + uuid_1.to_bytes(8, 'little', signed=True) return str(uuid.UUID(bytes=b)).upper() ```	{"total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623855841	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623855841	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg1NTg0MQ==	simonw 9599	2020-05-05T04:54:28Z	2020-05-05T04:54:28Z	MEMBER	Things were not matching up for me correctly: <img width="1143" alt="search__select_json_object__img_src____https___photos_simonwillison_net_i______photos_sha256___________photos_ext______w_400___as_photo__groups_content_string__assets_uuid_0__assets_uuid_1__to_uuid_assets_uuid_0__assets_uuid_1__as_uuid__pho" src="https://user-images.githubusercontent.com/9599/81035923-ca8db080-8e51-11ea-95a7-6ee60bae7502.png"> I think that's because my import script didn't correctly import the existing `rowid` values.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623855885	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623855885	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg1NTg4NQ==	simonw 9599	2020-05-05T04:54:39Z	2020-05-05T04:54:53Z	MEMBER	Trying this import mechanism instead: `sqlite3 /Users/simon/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite .dump \| grep -v 'CREATE INDEX' \| grep -v 'CREATE TRIGGER' \| grep -v 'CREATE VIRTUAL TABLE' \| sqlite3 search.db`	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623857417	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623857417	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg1NzQxNw==	simonw 9599	2020-05-05T05:01:47Z	2020-05-05T05:01:47Z	MEMBER	Even that didn't work - it didn't copy across the rowid values. I'm pretty sure that's what's wrong here: ``` sqlite3 /Users/simon/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite 'select rowid, uuid_0, uuid_1 from assets limit 10' 1619605\|-9205353363298198838\|4814875488794983828 1641378\|-9205348195631362269\|390804289838822030 1634974\|-9205331524553603243\|-3834026796261633148 1619083\|-9205326176986145401\|7563404215614709654 22131\|-9205315724827218763\|8370531509591906734 1645633\|-9205247376092758131\|-1311540150497601346 ```	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623863902	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623863902	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg2MzkwMg==	simonw 9599	2020-05-05T05:31:53Z	2020-05-05T05:31:53Z	MEMBER	Yes! Turning those `rowid` values into `id` with this script did the job: ```python import sqlite3 import sqlite_utils conn = sqlite3.connect( "/Users/simon/Pictures/Photos Library.photoslibrary/database/search/psi.sqlite" ) def all_rows(table): result = conn.execute("select rowid as id, * from {}".format(table)) cols = [c[0] for c in result.description] for row in result.fetchall(): yield dict(zip(cols, row)) if __name__ == "__main__": db = sqlite_utils.Database("psi_copy.db") for table in ("assets", "collections", "ga", "gc", "groups"): db[table].upsert_all(all_rows(table), pk="id", alter=True) ``` Then I ran this query: ```sql select json_object('img_src', 'https://photos.simonwillison.net/i/' \|\| photos.sha256 \|\| '.' \|\| photos.ext \|\| '?w=400') as photo, group_concat(strip_null_chars(groups.content_string), ' ') as words, assets.uuid_0, assets.uuid_1, to_uuid(assets.uuid_0, assets.uuid_1) as uuid from assets join ga on assets.id = ga.assetid join groups on ga.groupid = groups.id join photos on photos.uuid = to_uuid(assets.uuid_0, assets.uuid_1) where groups.category = 2024 group by assets.id order by random() limit 10 ``` And got these results! <img width="1054" alt="psi_copy__select_json_object__img_src____https___photos_simonwillison_net_i______photos_sha256___________photos_ext______w_400___as_photo__group_concat_strip_null_chars_groups_content_string________as_words__assets_uuid_0__assets_uuid_1__to" src="https://user-images.githubusercontent.com/9599/81037264-f1021a80-8e56-11ea-9924-6f9f55a0fb4b.png">	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234
623865250	https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623865250	https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16	MDEyOklzc3VlQ29tbWVudDYyMzg2NTI1MA==	simonw 9599	2020-05-05T05:38:16Z	2020-05-05T05:38:16Z	MEMBER	It looks like `groups.content_string` often has a null byte in it. I should clean this up as part of the import.	{"total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0}	Import machine-learning detected labels (dog, llama etc) from Apple Photos 612287234

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);