
Brad & Butter

Here is one robust proposal for a Turing Test (semi in jest):

We will use human proxies as intelligence equivalents, selected from five major demographics by IQ: 96~104 (avg 100, the general public), 105~113 (avg 109, above-average intelligence), 114~122 (avg 118, undergraduate equivalents), 123~131 (avg 127, graduate equivalents), and 132~140 (avg 136, professionals or PhD equivalents). Each round of the test needs 4 people per demographic. For the conversational representatives (the "interviewer" side), a pool of 15 people would be sufficient; ideally, this pool should intellectually match the human proxies. REASON: some management theories claim that intellectual homophily fosters communicative clarity and that leader-follower dynamics break down once the intelligence difference falls outside the 9~27 point range, so large intellectual gaps should be avoided.
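To make the band arithmetic explicit, here is a minimal Python sketch (the names `Band`, `BANDS`, and `band_for` are hypothetical, not part of the proposal). It assumes each band is 9 points wide, starts at 96, and has its stated average at the midpoint, which matches all five ranges listed above.

```python
# Sketch: the five IQ bands from the proposal, each 9 points wide,
# with the stated average at the band's midpoint.
from typing import NamedTuple, Optional


class Band(NamedTuple):
    low: int
    high: int
    label: str

    @property
    def average(self) -> int:
        # Midpoint of the inclusive range, e.g. (96 + 104) // 2 == 100.
        return (self.low + self.high) // 2


LABELS = [
    "general public",
    "above-average intelligence",
    "undergraduate equivalents",
    "graduate equivalents",
    "professionals or PhD equivalents",
]

# Bands start at 96 and step by 9: 96~104, 105~113, ..., 132~140.
BANDS = [Band(96 + 9 * i, 104 + 9 * i, label) for i, label in enumerate(LABELS)]


def band_for(score: int) -> Optional[Band]:
    """Return the band a proxy's score falls into, or None if outside 96~140."""
    for band in BANDS:
        if band.low <= score <= band.high:
            return band
    return None


if __name__ == "__main__":
    for band in BANDS:
        print(f"{band.low}~{band.high} (avg {band.average}): {band.label}")
```

A proxy scoring outside 96~140 falls into no band and would not be recruited under this reading.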

There will be five stages of testing, one for each intelligence demographic. Each stage has three rounds of four proxies each, and every proxy is assigned to a single round only. The task is for the general public (and the conversational representatives) to pick out the AI from a pool of five candidates (the four proxies plus the AI), three times in a row. Three conversational representatives are selected for each stage. REASON: to avoid biases in both representatives and proxies, since personality psychometrics have not yet been factored in by default.
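The head counts and the difficulty of "three times in a row" can be checked with a short Python sketch. It assumes (as the pool of 15 suggests) that each representative serves in one stage only, and that the baseline judge guesses uniformly at random among the five candidates in each round; the constant names are illustrative.

```python
# Sketch: head counts implied by the protocol, plus the chance baseline
# for a judge who guesses uniformly at random in every round.
from fractions import Fraction

STAGES = 5              # one stage per intelligence demographic
ROUNDS_PER_STAGE = 3
PROXIES_PER_ROUND = 4
REPS_PER_STAGE = 3      # assumed disjoint across stages
CANDIDATES_PER_ROUND = PROXIES_PER_ROUND + 1  # four proxies plus the AI

# Each proxy sits in exactly one round, so no proxy is reused.
total_proxies = STAGES * ROUNDS_PER_STAGE * PROXIES_PER_ROUND
total_reps = STAGES * REPS_PER_STAGE

# Probability of picking the AI in all three rounds by pure guessing.
chance_all_rounds = Fraction(1, CANDIDATES_PER_ROUND) ** ROUNDS_PER_STAGE

print(total_proxies, total_reps, chance_all_rounds, float(chance_all_rounds))
```

Under these assumptions the protocol needs 60 proxies and 15 representatives, and a random guesser identifies the AI in all three rounds with probability 1/125 (0.8%), so a consistent three-for-three identification is hard to attribute to luck.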

Conversations between the representatives and the candidates (human or AI) take place in monitored chatrooms, 30 minutes each. Trolling and invoking external information are encouraged (a strong AI should learn to say "I don't know" and to detect irony), but asking for personal details is not. The AI may claim to be neurodivergent; however, since intelligence level for human parity has already been factored in, discrepancies between those claims and its simulated behavior will count against it passing the test. REASON: counter-signaling, critical thinking, and uncertainty are innate human features.

After the conversations are done (at this point each proxy has had 3 conversations and each representative 12), the transcripts are released to the general public for surveying. For brevity, each survey package should include only one of the following: (a) one representative's interviews, from which respondents guess which candidate in each round is the AI, to measure diversity of mimicry; or (b) three sets of transcripts featuring the same four proxies but different representatives, from which respondents guess the non-proxy AI, to measure spoofing consistency.
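The per-person conversation counts in parentheses above can be reproduced with a small Python sketch of one stage's schedule. This assumes, beyond what the proposal states, that every representative chats with all five candidates in every round of their stage; under that reading the "12" counts only conversations with human proxies, and the AI adds three more per representative. The representative and proxy IDs are hypothetical.

```python
# Sketch of one stage's conversation schedule, ASSUMING every representative
# chats with every candidate (four proxies plus the AI) in every round.
from collections import Counter
from itertools import product

ROUNDS = 3
PROXIES_PER_ROUND = 4
REPS = ["rep_A", "rep_B", "rep_C"]  # hypothetical representative IDs


def build_stage_schedule():
    """Return a list of (representative, round, candidate) conversations."""
    schedule = []
    for rnd in range(ROUNDS):
        # Proxy IDs embed the round number, so each proxy belongs to one round only.
        candidates = [f"proxy_{rnd}_{i}" for i in range(PROXIES_PER_ROUND)] + ["AI"]
        for rep, cand in product(REPS, candidates):
            schedule.append((rep, rnd, cand))
    return schedule


schedule = build_stage_schedule()
per_proxy = Counter(c for _, _, c in schedule if c != "AI")
per_rep_human = Counter(r for r, _, c in schedule if c != "AI")
per_rep_total = Counter(r for r, _, _ in schedule)

print(set(per_proxy.values()))      # conversations per proxy
print(set(per_rep_human.values()))  # conversations per representative with proxies
print(set(per_rep_total.values()))  # conversations per representative including the AI
```

Each proxy ends up with 3 conversations (one per representative), each representative with 12 proxy conversations, and 15 in total once the AI chats are counted.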

