数据科学家如何夺冠新加坡首届GPT|chatgptedit插件_算命

开通VIP，畅享免费电子书等14项超值服

首页

好书

留言交流

下载APP

联系客服

2024.05.09湖南

原文：HowIWonSingapore’sGPT-4PromptEngineeringCompetition[1]

Lastmonth,IhadtheincrediblehonorofwinningSingapore’sfirsteverGPT-4PromptEngineeringcompetition,whichbroughttogetherover400prompt-lybrilliantparticipants,organisedbytheGovernmentTechnologyAgencyofSingapore(GovTech).

上个月，我非常荣幸地在新加坡政府科技局（GovTech）组织的首届GPT-4提示工程大赛中脱颖而出，这场比赛吸引了超过400名杰出的参与者。

Promptengineeringisadisciplinethatblendsbothartandscience—itisasmuchtechnicalunderstandingasitisofcreativityandstrategicthinking.ThisisacompilationofthepromptengineeringstrategiesIlearnedalongtheway,thatpushanyLLMtodoexactlywhatyouneedandmore!

本文涵盖以下主题，其中代表初学者友好的技巧，而代表高级策略：

1.[]StructuringpromptsusingtheCO-STARframework[3]使用CO-STAR框架构建提示结构

2.[]Sectioningpromptsusingdelimiters[4]使用分隔符对提示进行分组

3.[]CreatingsystempromptswithLLMguardrails[5]结合大语言模型(LLM)的安全防护，制定系统提示

4.[]AnalyzingdatasetsusingonlyLLMs,withoutpluginsorcode[6]**—仅使用LLMs分析数据集，无需插件或代码-

Withahands-onexampleofanalyzingareal-worldKaggledatasetusingGPT-4**通过一个实际的案例，展示如何利用GPT-4分析来自Kaggle的现实世界数据集

@CO-STAR框架—作者提供的图像

Here’showitworks:如何应用CO-STAR框架：

ThishelpstheLLMunderstandthespecificscenariobeingdiscussed,ensuringitsresponseisrelevant.

BeingclearaboutyourobjectivehelpstheLLMtofocusitsresponseonmeetingthatspecificgoal.

明确你要求大语言模型完成的任务清晰地界定任务目标，可以使大语言模型更专注地调整其回应，以实现这一具体目标。

Thiscouldbeaparticularfamousperson’sstyleofwriting,oraparticularexpertinaprofession,likeabusinessanalystexpertorCEO.ThisguidestheLLMtorespondwiththemannerandchoiceofwordsalignedwithyourneeds.你可以指定一个特定的著名人物或某个行业专家的写作风格，如商业分析师或CEO。这将指导大语言模型以一种符合你需求的方式和词汇选择进行回应。

ThisensurestheLLM’sresponseresonateswiththeintendedsentimentoremotionalcontextrequired.Examplesareformal,humorous,empathetic,amongothers.

设置回应的情感调设定适当的语气，确保大语言模型的回应能够与预期的情感或情绪背景相协调。可能的语气包括正式、幽默、富有同情心等。

TailoringtheLLM’sresponsetoanaudience,suchasexpertsinafield,beginners,children,andsoon,ensuresthatitisappropriateandunderstandableinyourrequiredcontext.

识别目标受众针对特定受众定制大语言模型的回应，无论是领域内的专家、初学者还是儿童，都能确保内容在特定上下文中适当且容易理解。

ThisensuresthattheLLMoutputsintheexactformatthatyourequirefordownstreamtasks.Examplesincludealist,aJSON,aprofessionalreport,andsoon.FormostLLMapplicationswhichworkontheLLMresponsesprogrammaticallyfordownstreammanipulations,aJSONoutputformatwouldbeideal.

确定输出格式是为了确保大语言模型按照你的具体需求进行输出，便于执行下游任务。常见的格式包括列表、JSON格式的数据、专业报告等。对于大部分需要程序化处理大语言模型输出的应用来说，JSON格式是理想的选择。

Here’sareal-worldexampleofwhytheCO-STARframeworkisuseful.Let’ssayyouworkasasocialmediamanager,andyouneedhelpdraftingaFacebookposttoadvertiseyourcompany’snewproduct.这里有一个CO-STAR框架为何有用的现实案例。假设你担任社交媒体经理，需要草拟一条Facebook帖子，用以推广公司的新产品。

AquickpromptwithoutCO-STARmightlooksomethinglikethis:未使用CO-STAR的快速提示可能是这样的：

Writeafacebookposttoadvertisemycompany’snewproduct.Mycompany’snameisAlphaandtheproductiscalledBeta,anewultra-fasthairdryer.为我的公司Alpha的新产品Beta——一款新型超快速吹风机，撰写一条Facebook帖子。

Here’stheoutputfromGPT-4:这是GPT-4的回答：

IntroducingtheRevolutionaryBetaHairdryerbyAlpha!

Whileadequate,thisoutputispainfullygeneric,lackingthespecificnuancesandtargetedappealnecessarytoresonatewithyourcompany’saudience.

这一输出虽足够，但显得过于泛化，缺乏必要的细节和针对性吸引力，未能真正触及公司目标受众的心。Here’sanexampleofusingtheCO-STARtemplateinstead,whichremindsyoutoincorporateotheraspectsofyourtaskintoyourprompt,specificallytheSTYLE,TONE,andAUDIENCEaspectsthataremissingfromthepreviousquickprompt:下面是一个应用CO-STAR模板的示例，它提醒我们在制定提示时，要考虑到任务的其它方面，特别是之前快速提示中缺少的风格、语调和受众：

Iwanttoadvertisemycompany’snewproduct.Mycompany’snameisAlphaandtheproductiscalledBeta,whichisanewultra-fasthairdryer.

CreateaFacebookpostforme,whichaimstogetpeopletoclickontheproductlinktopurchaseit.

Followthewritingstyleofsuccessfulcompaniesthatadvertisesimilarproducts,suchasDyson.

Persuasive

Mycompany’saudienceprofileonFacebookistypicallytheoldergeneration.Tailoryourposttotargetwhatthisaudiencetypicallylooksoutforinhairproducts.

TheFacebookpost,keptconciseyetimpactful.

翻译：

我想推广公司的新产品。我的公司名为Alpha，新产品名为Beta，是一款新型超快速吹风机。

帮我创建一条Facebook帖子，目的是吸引人们点击产品链接进行购买。

参照Dyson等成功公司的宣传风格，它们在推广类似产品时的文案风格。

说服性

保持Facebook帖子简洁而深具影响力。

ByapplyingCO-STAR,GPT-4’sresponsebecomessignificantlymoretargetedandeffective:

通过运用CO-STAR框架，GPT-4的响应变得更具针对性和效果：

EmbraceEffortlessElegancewithAlphaBeta—TheUltra-FastHairdryerfortheWiseGeneration.

Rediscoverthejoyofsimple,effectivehaircarewithAlphaBeta.Ourlatestinnovationismorethanjustahairdryer;it’sapromiseofswift,gentle,andreliablehairstylingforthosewhoappreciatethefinerthingsinlife.

MakeeverydayagoodhairdaywithAlphaBeta.Experiencetheblendofsophisticationandsimplicitytoday.

[YourProductLinkHere]

翻译

用AlphaBeta——超快速吹风机，体验轻松的优雅生活，它是为具有洞察力的一代人设计的。

与AlphaBeta一起，重新找回简单而高效的护发乐趣。我们的最新创新不仅是一款吹风机；它还承诺为那些懂得欣赏生活美好事物的人提供快速、柔和且可靠的发型打理。

每天都是与AlphaBeta相伴的美好发型日。立即体验优雅与简约的完美结合。[您的产品链接在此]

TheCO-STARframeworkguidesyoutoprovideallofthecrucialpiecesofinformationaboutyourtasktotheLLMinastructuredmanner,ensuringatailoredandoptimizedresponsetoexactlywhatyouneed.

CO-STAR框架指引您以有组织的方式提供所有关键任务信息，确保响应完全针对您的需求并进行优化。

@ImagegeneratedbyDALL·E3

DelimitersarespecialtokensthathelptheLLMdistinguishwhichpartsofyourpromptitshouldconsiderasasingleunitofmeaning.ThisisimportantbecauseyourentirepromptarrivestotheLLMasasinglelongsequenceoftokens.Delimitersprovidestructuretothissequenceoftokensbyfencingspecificpartsofyourprompttobetreateddifferently.

分隔符是特殊的符号，它们帮助大语言模型(LLM)辨识提示中哪些部分应当被视为一个完整的意义单元。这非常关键，因为你的提示是作为一个长的Token序列一次性传给模型的。通过设置分隔符，可以为这些Token序列提供结构，使特定部分得到不同的处理。

ItisnoteworthythatdelimitersmaynotmakeadifferencetothequalityofanLLM’sresponseforstraightforwardtasks.However,themorecomplexthetask,themoreimpacttheusageofdelimitersforsectioninghasontheLLM’sresponse.

翻译成中文需要注意的是，对于简单的任务，分隔符对大语言模型的回应质量可能无显著影响。但是，任务越复杂，合理使用分隔符进行文本分段对模型的反应影响越明显。

Adelimitercouldbeanysequenceofspecialcharactersthatusuallywouldn’tappeartogether,forexample:

分隔符可以是一系列通常不会组合出现的特殊字符，例如：###、===、>>>

Thenumberandtypeofspecialcharacterschosenisinconsequential,aslongastheyareuniqueenoughfortheLLMtounderstandthemascontentseparatorsinsteadofnormalpunctuation.

选择哪种特殊字符并不重要，关键是这些字符足够独特，使得模型能将其识别为分隔符，而非常规标点符号。

Here’sanexampleofhowyoumightusesuchdelimitersinaprompt:

这里是一个分隔符使用的示例：

Classifythesentimentofeachconversationin<<>>as'Positive’or'Negative’.Givethesentimentclassificationswithoutanyotherpreambletext.

EXAMPLECONVERSATIONS

Agent:Goodmorning,howcanIassistyoutodayCustomer:Thisproductisterrible,nothinglikewhatwasadvertised!Customer:I’mextremelydisappointedandexpectafullrefund.

Agent:Goodmorning,howcanIhelpyoutodayCustomer:Hi,IjustwantedtosaythatI’mreallyimpressedwithyourproduct.Itexceededmyexpectations!

EXAMPLEOUTPUTS

Negative

Positive

<<<

Agent:Hello!Welcometooursupport.HowcanIhelpyoutodayCustomer:Hithere!IjustwantedtoletyouknowIreceivedmyorder,andit’sfantastic!Agent:That’sgreattohear!We’rethrilledyou’rehappywithyourpurchase.IsthereanythingelseIcanassistyouwithCustomer:No,that’sit.Justwantedtogivesomepositivefeedback.Thanksforyourexcellentservice!

Agent:Hello,thankyouforreachingout.HowcanIassistyoutodayCustomer:I’mverydisappointedwithmyrecentpurchase.It’snotwhatIexpectedatall.Agent:I’msorrytohearthat.CouldyoupleaseprovidemoredetailssoIcanhelpCustomer:Theproductisofpoorqualityanditarrivedlate.I’mreallyunhappywiththisexperience.

请在<<>>中对每段对话的情绪进行分类，标为'正面’或'负面’。仅提供情绪分类结果，不需任何引言。

对话示例

Agent:早安，我今天怎么为您服务？Customer[7]:嗨，我只想说我真的很喜欢你们的产品。它超出了我的预期！

输出示例

负面

正面

Agent:您好，感谢您联系我们。今天有什么可以帮助您的？Customer:我对最近的购买非常不满。这完全不是我所期待的。Agent:我很抱歉听到您有这样的体验。您能否提供更多细节，以便我为您提供帮助？Customer:产品质量不佳，而且送达晚了。我对这次购买感到非常不满。

Above,theexamplesaresectionedusingthedelimiter###,withthesectionheadingsEXAMPLECONVERSATIONSandEXAMPLEOUTPUTSincapitalletterstodifferentiatethem.Thepreamblestatesthattheconversationstobeclassifiedaresectionedinside<<>>,andtheseconversationsaresubsequentlygiventotheLLMatthebottomofthepromptwithoutanyexplanatorytext,buttheLLMunderstandsthatthesearetheconversationsitshouldclassifyduetothepresenceofthedelimiters<<>>.

在上述示例中，使用###分隔符来分隔不同的部分，通过大写的章节标题如对话示例和输出示例进行区分。引言部分说明了要对<<>>中的对话进行情绪分类，而这些对话在提示的底部给出，没有任何解释文本，但分隔符的存在让模型明白这些对话需要被分类。

GPT-4的输出正如请求的那样，仅给出情绪分类：

HereistheoutputfromGPT-4,withthesentimentclassificationsgivenwithoutanyotherpreambletextoutputted,likewhatweaskedfor:

AnotherapproachtousingdelimitersishavingthemasXMLtags.XMLtagsaretagsenclosedinanglebrackets,withopeningandclosingtags.Anexampleisand.ThisiseffectiveasLLMshavebeentrainedonalotofwebcontentinXML,andhavelearnedtounderstanditsformatting.

Here’sthesamepromptabove,butstructuredusingXMLtagsasdelimitersinstead:

Classifythesentimentofthefollowingconversationsintooneoftwoclasses,usingtheexamplesgiven.Givethesentimentclassificationswithoutanyotherpreambletext.

ItisbeneficialtousethesamenounfortheXMLtagasthewordsyouhaveusedtodescribethemintheinstructions.Theinstructionswegaveinthepromptabovewere:

Whereweusedthenounsconversations,classes,andexamples.Assuch,theXMLtagsweuseasdelimitersare,,,and.ThisensuresthattheLLMunderstandshowyourinstructionsrelatetotheXMLtagsusedasdelimiters.

Again,thesectioningofyourinstructionsinaclearandstructuredmannerthroughtheuseofdelimitersensuresthatGPT-4respondsexactlyhowyouwantitto:

使用XML标签作为分隔符是一种方法。XML标签是被尖括号包围的，包括开启标签和结束标签。例如，和。这种方法非常有效，因为大语言模型已经接受了大量包含XML格式的网页内容的训练，因此能够理解其结构。

以下是利用XML标签作为分隔符对同一提示进行结构化的例子：

分类以下对话的情感，分为正面和负面两类，根据给出的例子进行分类。请直接给出情感分类结果，不需要添加任何引导性文本。

[Agent]:你好！欢迎来到我们的支持。今天我能怎么帮您？[Customer]:嗨，我只是想让你知道我收到我的订单了，它太棒了！[Agent]:听到这个真好！我们很高兴你对购买感到满意。还有其他我能帮忙的吗？>[Customer]:不，就这些。只是想给一些正面的反馈。谢谢你们的优质服务！>>[Agent]:你好，感谢你的联系。今天我能怎么帮您？>[Customer]:我对我最近的购买非常失望。这完全不是我所期待的。>[Agent]:很遗憾听到这些。您能提供更多细节以便我帮助您吗？>[Customer]:产品质量差，而且到货晚。我对这次经历非常不满。>

==在指令中使用的名词与XML标签的名词一致，如conversations、classes和examples，因此使用的XML标签分别是、、和。这确保了模型能够清晰地理解指令与使用的标签之间的关系。==

通过这种结构化的分隔符使用方式，可以确保GPT-4精确地按照您的期望响应：

利用大语言模型(LLM)的系统提示创建机制

_Beforedivingin,itisimportanttonotethatthissectionisrelevantonlytoLLMsthatpossessaSystemPromptfeature,unliketheothersectionsinthisarticlewhicharerelevantforanyLLM.ThemostnotableLLMwiththisfeatureis,ofcourse,ChatGPT,andthereforewewilluseChatGPTastheillustratingexampleforthissection.

关于系统提示的术语解释

First,let’sironoutterminology:WithregardstoChatGPT,thereexistsaplethoraofresourcesusingthese3termsalmostinterchangeably:“SystemPrompts”,“SystemMessages”,and“CustomInstructions”.Thishasprovedconfusingtomany(includingme!),somuchsothatOpenAIreleasedanarticle[8]explainingtheseterminologies.Here’saquicksummaryofit:

@ImagefromEnterpriseDNABlog[10]

Overall,though,the3termsrefertothesamething,sodon’tlettheterminologyconfuseyou!Movingforward,thissectionwillusetheterm“SystemPrompts”.Nowlet’sdivein!

尽管这三个术语表达的是相同的概念，但不必因术语的使用而感到困扰。下面我们将统一使用“系统提示”这一术语。现在，让我们一探究竟！

SystemPromptsareanadditionalpromptwhereyouprovideinstructionsonhowtheLLMshouldbehave.Itisconsideredadditionalasitisoutsideofyour“normal”prompts(betterknownasUserPrompts)totheLLM.

系统提示是您向大语言模型提供的关于其应如何响应的额外指示。这被视为一种额外的提示，因为它超出了您对大语言模型的常规用户提示。

Withinachat,everytimeyouprovideanewprompt,SystemPromptsactlikeafilterthattheLLMautomaticallyappliesbeforegivingitsresponsetoyournewprompt.ThismeansthattheSystemPromptsaretakenintoaccounteverytimetheLLMrespondswithinthechat.

在对话中，每当您提出一个新的提示时，系统提示就像是一个过滤器，大语言模型会在回应您的新提示之前自动应用这一过滤器。这意味着在对话中每次大语言模型给出回应时，都会考虑到这些系统提示。

Thefirstquestiononyourmindmightbe:WhyshouldIprovideinstructionsinsidetheSystemPromptwhenIcanalsoprovidetheminmyfirstprompttoanewchat,beforefurtherconversationswiththeLLM

TheanswerisbecauseLLMshavealimittotheirconversationalmemory.Inthelattercase,astheconversationcarrieson,theLLMislikelyto“forget”thisfirstpromptyouprovidedtothechat,makingtheseinstructionsobsolete.

Ontheotherhand,wheninstructionsareprovidedintheSystemPrompt,theseSystemPromptinstructionsareautomaticallytakenintoaccounttogetherwitheachnewpromptprovidedtothechat.ThisensuresthattheLLMcontinuestoreceivetheseinstructionsevenastheconversationcarrieson,nomatterhowlongthechatbecomes.

Inconclusion:

==UseSystemPromptstoprovideinstructionsthatyouwanttheLLMtorememberwhenrespondingthroughouttheentirechat.==

InstructionsintheSystemPrompttypicallyincludesthefollowingcategories:

Forexample,aSystemPromptmightlooklikethis:例如，系统提示可能是这样的：

Whereeachportionrelatestothecategoriesasfollows:

BreakingdownaSystemPrompt—Imagebyauthor系统提示解析-作者提供的图像

Nowyoumightbethinking:ThatsoundslikealotofinformationalreadybeinggivenintheSystemPrompt.WhatdoIputinmy“normal”prompts(betterknownasUserPrompts)tothechatthen现在你可能会想：系统提示已经提供了很多信息。那么我应该在我的“正常”提示（也称为用户提示）中放入什么呢？

TheSystemPromptoutlinesthegeneraltaskathand.IntheaboveSystemPromptexample,thetaskhasbeendefinedtoonlyuseaspecificpieceoftextforquestion-answering,andtheLLMisinstructedtorespondintheformat{'Question':'Answer'}.

系统提示已经概括了任务的总体要求。在上述示例中，任务被定义为仅使用特定文本进行问题解答，同时指导LLM按照{'问题'：'答案'}的格式进行回答。

Youwillanswerquestionsusingthistext:[inserttext].YouwillrespondwithaJSONobjectinthisformat:{“Question”:“Answer”}.

Inthiscase,eachUserPrompttothechatwouldsimplybethequestionthatyouwantansweredusingthetext.Forexample,aUserPromptmightbe'Whatisthetextabout'.AndtheLLMwouldrespondwith{'Whatisthetextabout':'Thetextisabout...'}.

Butlet’sgeneralizethistaskexamplefurther.Inpractice,itwouldbemorelikelythatyouhavemultiplepiecesoftextthatyouwanttoaskquestionson,ratherthanjust1.Inthiscase,wecouldeditthefirstlineoftheaboveSystemPromptfrom但我们可以将这种任务进一步推广。通常，与只询问一个文本相比，你可能会有多个文本需要询问。这时，我们可以将系统提示的首句从

Youwillanswerquestionsusingthistext:[inserttext].您需要用这段文本来回答问题：[插入文本]。

to改为

Youwillanswerquestionsusingtheprovidedtext.

您需要使用提供的文本来回答问题。

Now,eachUserPrompttothechatwouldincludeboththetexttoconductquestion-answeringover,andthequestiontobeanswered,suchas:

如此，每个用户提示将包括要问答的文本和问题，例如：

[插入文本]

[插入问题]Here,wealsouseXMLtagsasdelimitersinordertoprovidethe2requiredpiecesofinformationtotheLLMinastructuredmanner.ThenounsusedintheXMLtags,textandquestion,correspondtothenounsusedintheSystemPromptsothattheLLMunderstandshowthetagsrelatetotheSystemPromptinstructions.

Inconclusion,theSystemPromptshouldgivetheoveralltaskinstructions,andeachUserPromptshouldprovidetheexactspecificsthatyouwantthetasktobeexecutedusing.Inthiscase,forexample,theseexactspecificsarethetextandthequestion.

总之，系统提示应提供整体任务指令，而每个用户提示则需要提供执行该任务所需的具体细节。在这个例子中，这些细节就是文本和问题。

Above,guardrailsareaddedthroughafewsentencesintheSystemPrompt.Theseguardrailsarethensetinstoneanddonotchangefortheentirechat.Whatifyouwishtohavedifferentguardrailsinplaceatdifferentpointsoftheconversation

在之前的讨论中，我们通过系统提示来设定规则，这些规则一经设定，将在整个对话中保持不变。但如果你想在对话的不同阶段实施不同的规则，应该怎么做呢？

UnfortunatelyforusersoftheChatGPTuserinterface,thereisnostraightforwardwaytodothisrightnow.However,ifyou’reinteractingwithChatGPTprogrammatically,you’reinluck!TheincreasingfocusonbuildingeffectiveLLMguardrailshasseenthedevelopmentofopen-sourcepackagesthatallowyoutosetupfarmoredetailedanddynamicguardrailsprogrammatically.

AnoteworthyoneisNeMoGuardrails[11]developedbytheNVIDIAteam,whichallowsyoutoconfiguretheexpectedconversationflowbetweenusersandtheLLM,andthussetupdifferentguardrailsatdifferentpointsofthechat,allowingfordynamicguardrailsthatevolveasthechatprogresses.Idefinitelyrecommendcheckingitout!特别推荐的一个是由NVIDIA团队开发的NeMoGuardrails[12]。

这个工具允许你配置用户与LLM之间的预期对话流程，并在对话的不同环节设定不同的规则，实现规则的动态调整。这无疑是探索对话动态管理的一个很好的资源，值得一试！

YoumighthaveheardofOpenAI’sAdvancedDataAnalysispluginwithinChatGPT’sGPT-4thatisavailabletopremium(paid)accounts.ItallowsuserstouploaddatasetstoChatGPTandruncodedirectlyonthedataset,allowingforaccuratedataanalysis.

你可能已经听说过OpenAI在ChatGPT的GPT-4中为付费账户提供的高级数据分析插件。它让用户可以上传数据集到ChatGPT并直接在数据集上执行编码，实现精准的数据分析。

Butdidyouknowthatyoudon’talwaysneedsuchpluginstoanalyzedatasetswellwithLLMsLet’sfirstunderstandthestrengthsandlimitationsofpurelyusingLLMstoanalyzedatasets.

但是，你知道吗？并不总是需要依赖这类插件来有效地使用大语言模型(LLM)分析数据集。我们首先来探讨一下仅利用LLM进行数据分析的优势与限制。

Asyouprobablyalreadyknow,LLMsarelimitedintheirabilitytoperformaccuratemathematicalcalculations,makingthemunsuitablefortasksrequiringprecisequantitativeanalysisondatasets,suchas:

正如你可能已经知道的，LLMs在执行精确的数学计算方面有所限制，这让它们不适合需要精确量化分析的任务，比如：

PerformingsuchquantitativetasksondatasetsiswhyOpenAI’sAdvancedDataAnalysispluginexists,sothatprogramminglanguagesstepintoruncodeforsuchtasksonadataset.正是为了执行这些量化任务，OpenAI推出了高级数据分析插件，以便通过编程语言在数据集上运行代码。

**So,whywouldanyonewanttoanalyzedatasetsusingonlyLLMsandwithoutsuchplugins那么，为什么还有人想仅用LLMs来分析数据集而不用这些插件呢？

LLMsareexcellentatidentifyingpatternsandtrends.Thiscapabilitystemsfromtheirextensivetrainingondiverseandvoluminousdata,enablingthemtodiscernintricatepatternsthatmaynotbeimmediatelyapparent.

LLMs在识别模式和趋势方面表现出色。这得益于它们在庞大且多样化的数据上接受的广泛训练，能够洞察到复杂的模式，这些模式可能不是一眼就能看出来的。

Thismakesthemwell-suitedfortasksbasedonpattern-findingwithindatasets,suchas:这使它们非常适合执行基于模式查找的任务，例如：

Forsuchpattern-basedtasks,usingLLMsalonemayinfactproducebetterresultswithinashortertimeframethanusingcode!Let’sillustratethisfullywithanexample.

We’lluseapopularreal-worldKaggledataset[13]curatedforCustomerPersonalityAnalysis,whereinacompanyseekstosegmentitscustomerbaseinordertounderstanditscustomersbetter.

我们将使用一个流行的实际Kaggle数据集[14]，该数据集专为客户个性分析而设计，帮助公司对客户基础进行细分，从而更好地了解客户。

ForeasiervalidationoftheLLM’sanalysislater,we’llsubsetthisdatasetto50rowsandretainonlythemostrelevantcolumns.Afterwhich,thedatasetforanalysislookslikethis,whereeachrowrepresentsacustomer,andthecolumnsdepictcustomerinformation:

First3rowsofdataset—Imagebyauthor数据集前三行——图片由作者提供

Sayyouworkonthecompany’smarketingteam.Youaretaskedtoutilizethisdatasetofcustomerinformationtoguidemarketingefforts.Thisisa2-steptask:First,usethedatasettogeneratemeaningfulcustomersegments.Next,generateideasonhowtobestmarkettowardseachsegment.Nowthisisapracticalbusinessproblemwherethepattern-finding(forstep1)capabilityofLLMscantrulyexcel.

设想你是公司营销团队的一员，你的任务是利用这份客户信息数据集来指导营销活动。这是一个分两步的任务：首先，利用数据集生成有意义的客户细分；其次，针对每个细分提出最佳的市场营销策略。这是一个实际的商业问题，其中第一步的模式识别能力是LLM可以大显身手的地方。

Let’scraftapromptforthistaskasfollows,using4promptengineeringtechniques(moreontheselater![15]):我们将按以下方式设计任务提示，采用四种提示工程技术（更多详情[16]）：

**_SystemPrompt:Iwantyoutoactasadatascientisttoanalyzedatasets.Donotmakeupinformationthatisnotinthedataset.ForeachanalysisIaskfor,providemewiththeexactanddefinitiveansweranddonotprovidemewithcodeorinstructionstodotheanalysisonotherplatforms.

**Prompt:**#CONTEXT#Isellwine.Ihaveadatasetofinformationonmycustomers:[yearofbirth,maritalstatus,income,numberofchildren,dayssincelastpurchase,amountspent].

#############

Iwantyouusethedatasettoclustermycustomersintogroupsandthengivemeideasonhowtotargetmymarketingeffortstowardseachgroup.Usethisstep-by-stepprocessanddonotusecode:

Foreachclusterfound,2.CLUSTER_INFORMATION:Describetheclusterintermsofthedatasetcolumns.3.CLUSTER_NAME:Interpret[CLUSTER_INFORMATION]toobtainashortnameforthecustomergroupinthiscluster.4.MARKETING_IDEAS:Generateideastomarketmyproducttothiscustomergroup.5.RATIONALE:Explainwhy[MARKETING_IDEAS]isrelevantandeffectiveforthiscustomergroup.

Businessanalyticsreport

Professional,technical

Mybusinesspartners.Convincethemthatyourmarketingstrategyiswellthought-outandfullybackedbydata.

—CustomerGroup:[CLUSTER_NAME]—Profile:[CLUSTER_INFORMATION]—MarketingIdeas:[MARKETING_IDEAS]—Rationale:[RATIONALE]

Ifyouunderstand,askmeformydataset.

系统提示：我希望你扮演数据科学家的角色来分析数据集。不要编造数据集中不存在的信息。对于我提出的每个分析要求，提供确切且确定的答案，不要提供代码或指导在其他平台上进行分析的方法。

提示：

我销售葡萄酒。我手头有一个客户信息数据集：[出生年份，婚姻状况，收入，子女数量，上次购买至今天数，消费金额]。

我希望你利用这个数据集将我的客户分组，并为每个群组制定营销策略。遵循以下分步骤，且不使用代码：

商业分析报告

专业、技术性

我的商业伙伴们。让他们相信你的营销策略是深思熟虑的，并且有充分的数据支持。

<对[CLUSTERS]中的每一个群组>—客户群组：[CLUSTER_NAME]—群组档案：[CLUSTER_INFORMATION]—营销策略：[MARKETING_IDEAS]—理由：[RATIONALE]

<附录>提供一个表格，列出每个群组中的行号，以支持你的分析。表头如下：[[CLUSTER_NAME],行号列表]。

如果你已经明白，请向我索要我的数据集。

BelowisGPT-4’sreply,andweproceedtopassthedatasettoitinaCSVstring.

下面是GPT-4的回复，我们将继续将数据集以CSV字符串的形式传递给它。

GPT-4'sresponse—Imagebyauthor

Followingwhich,GPT-4replieswithitsanalysisinthemarkdownreportformatweaskedfor:

随后，GPT-4按照我们要求的标记符报告格式回复了分析结果：

@GPT-4'sresponse—Imagebyauthor

Forthesakeofbrevity,we’llpick2customergroupsgeneratedbytheLLMforvalidation—say,YoungFamiliesandDiscerningEnthusiasts.

为了简洁，我们选择两个由大语言模型生成的客户群体进行验证——“年轻家庭”和“挑剔的爱好者”。

YoungFamilies

年轻家庭-大语言模型生成的描述：出生于1980年后，已婚或同居，中等偏低的收入，育有孩子，常做小额消费。-此群体包括的数据行：3、4、7、10、16、20-深入查看这些数据行的详细信息，结果显示：

@FulldataforYoungFamilies—Imagebyauthor年轻家庭的完整数据—作者图片

WhichexactlycorrespondtotheprofileidentifiedbytheLLM.Itwasevenabletoclustertherowwithanullvaluewithoutuspreprocessingitbeforehand!

这些数据完美对应大语言模型确定的用户描述。该模型甚至能够识别包含空值的数据行，而无需我们预先处理！

**DiscerningEnthusiasts**-ProfilesynthesizedbyLLM:Wideagerange,Anymaritalstatus,Highincome,Variedchildrenstatus,Highspendonpurchases.

挑剔的爱好者-大语言模型生成的描述：年龄跨度广泛，不限婚姻状况，高收入，孩子情况不一，高消费水平。-此群体包括的数据行：2、5、18、29、34、36-深入查看这些数据行的详细信息，结果显示：

@FulldataforDiscerningEnthusiasts—Imagebyauthor

挑剔的爱好者的完整数据—作者图片

WhichagainalignverywellwiththeprofileidentifiedbytheLLM!

这些数据再次精准匹配大语言模型确定的用户描述

ThisexampleshowcasesLLMs’abilitiesinpattern-finding,interpretinganddistillingmulti-dimensionaldatasetsintomeaningfulinsights,whileensuringthatitsanalysisisdeeplyrootedinthefactualtruthofthedataset.

本例展示了大语言模型在识别模式、解读及简化多维数据集以提炼出有意义的洞见方面的强大能力，确保其分析结果扎根于数据的真实情况。

Forcompleteness,Iattemptedthissametaskwiththesameprompt,butaskedChatGPTtoexecutetheanalysisusingcodeinstead,whichactivateditsAdvancedDataAnalysisplugin.TheideawasfortheplugintoruncodeusingaclusteringalgorithmlikeK-Meansdirectlyonthedatasettoobtaineachcustomergroup,beforesynthesizingtheprofileofeachclustertoprovidemarketingstrategies.

为了全面考虑，我使用同一提示尝试了相同的任务，不过这次我让ChatGPT通过编程方式进行分析，启用了其高级数据分析插件。插件应用K-均值等聚类算法直接对数据集进行处理，以便划分不同的客户群体，并据此制定营销策略。

However,multipleattemptsresultedinthefollowingerrormessageswithnooutputs,despitethedatasetbeingonly50rows:

尽管数据集仅含50行，多次尝试均显示错误信息且未产生任何结果：

@ErrorandnooutputfromAttempt1—Imagebyauthor尝试1的错误和无输出—作者图片

@ErrorandnooutputfromAttempt2—Imagebyauthor尝试2的错误和无输出—作者图片

WiththeAdvancedDataAnalysispluginrightnow,itappearsthatexecutingsimplertasksondatasetssuchascalculatingdescriptivestatisticsorcreatinggraphscanbeeasilyachieved,butmoreadvancedtasksthatrequirecomputingofalgorithmsmaysometimesresultinerrorsandnooutputs,duetocomputationallimitsorotherwise.

当前情况表明，虽然高级数据分析插件能够轻松完成一些简单任务，如统计描述或生成图表，但在执行需要较大计算量的高级任务时，有时可能因为计算限制或其他原因而发生错误，导致无法输出结果。

那么…何时应当用大语言模型(LLM)来分析数据集？

Theanswerisitdependsonthetypeofanalysis.答案因分析的具体类型而异。Fortasksrequiringprecisemathematicalcalculationsorcomplex,rule-basedprocessing,conventionalprogrammingmethodsremainsuperior.

对于需要精确的数学运算或复杂的规则处理的任务，传统的编程方法依然更加适用。

Fortasksbasedonpattern-recognition,itcanbechallengingormoretime-consumingtoexecuteusingconventionalprogrammingandalgorithmicapproaches.LLMs,however,excelatsuchtasks,andcanevenprovideadditionaloutputssuchasannexestobackupitsanalysis,andfullanalysisreportsinmarkdownformatting.

而对于依赖模式识别的任务，传统的编程和算法处理可能更加困难且耗时。大语言模型在这类任务中表现优异，能提供包括分析附件在内的额外输出，并能生成Markdown格式的完整分析报告。

Ultimately,thedecisiontoutilizeLLMshingesonthenatureofthetaskathand,balancingthestrengthsofLLMsinpattern-recognitionagainsttheprecisionandspecificityofferedbytraditionalprogrammingtechniques.

总的来说，是否采用大语言模型取决于任务本身的性质，需要平衡其在模式识别上的强项与传统编程技术提供的精确度和特定性。

Beforethissectionends,let’sgobacktothepromptusedtogeneratethisdatasetanalysisandbreakdownthekeypromptengineeringtechniquesused:

在本节结束前，让我们重新审视用于生成此数据分析的提示，并详细解析关键的提示工程技巧：

Prompt:

Isellwine.Ihaveadatasetofinformationonmycustomers:[yearofbirth,maritalstatus,income,numberofchildren,dayssincelastpurchase,amountspent].

提示:

我经营葡萄酒生意。我手头有一份包含客户信息的数据集：[出生年份，婚姻状况，收入，子女数目，自上次购买至今天数，消费金额]。

我希望你利用这份数据集将我的客户进行分组，随后为我提供针对每个群体的市场营销策略。请按以下步骤操作，且不要编写代码：

对于每个确定的聚类，2.CLUSTER_INFORMATION:描述聚类的特征。3.CLUSTER_NAME:根据[CLUSTER_INFORMATION]解读，为这个客户群体起一个简称。4.MARKETING_IDEAS:提出针对此群体的营销策略。5.RATIONALE:解释这些[MARKETING_IDEAS]为何对此群体有效。

专业，技术性

面向我的商业伙伴。让他们确信你的营销策略是经过深思熟虑的，并完全得到数据支持。

<对于[CLUSTERS]中的每个聚类>—客户群组：[CLUSTER_NAME]—简介：[CLUSTER_INFORMATION]—营销想法：[MARKETING_IDEAS]—理由：[RATIONALE]

<附录>提供一个表格，记录每个聚类的行号，以佐证你的分析。表头如下：[[CLUSTER_NAME],行号列表]。

如果你已经理解，请向我请求我的数据集。

Technique1:Breakingdownacomplextaskintosimplesteps｜技巧1：将复杂任务简化成步骤

LLMsaregreatatperformingsimpletasks,butnotsogreatatcomplexones.Assuch,with>complextaskslikethisone,itisimportanttobreakdownthetaskintosimplestep-by-stepinstructionsfortheLLMtofollow.TheideaistogivetheLLMthestepsthatyouyourselfwouldtaketoexecutethetask.

大语言模型（LLM）擅长处理简单的任务，对于复杂的任务则表现不佳。因此，在面对复杂任务时，把它分解成一步步简单的指令是至关重要的。这种方法的核心思想是，明确告知LLM你自己执行该任务时会采取的每一个步骤。

Inthisexample,thestepsaregivenas:

例如，具体步骤如下：

Usethisstep-by-stepprocessanddonotusecode:

Foreachclusterfound,2.CLUSTER_INFORMATION:Describetheclusterintermsofthedatasetcolumns.3.CLUSTER_NAME:Interpret[CLUSTER_INFORMATION]toobtainashortnameforthe>customergroupinthiscluster.4.MARKETING_IDEAS:Generateideastomarketmyproducttothiscustomergroup.5.RATIONALE:Explainwhy[MARKETING_IDEAS]isrelevantandeffectiveforthiscustomer>group.

请按照这个步骤操作，不要使用编码：

对于每个聚类2.聚类描述（CLUSTER_INFORMATION）：描述聚类的特点。3.聚类命名（CLUSTER_NAME）：根据聚类描述，为这个客户群体起一个简洁的名字。4.营销策略（MARKETING_IDEAS）：为这个客户群体制定营销策略。5.策略解释（RATIONALE）：说明为什么这些营销策略对这个客户群体有效。

AsopposedtosimplygivingtheoveralltasktotheLLMas“Clusterthecustomersintogroupsandthengiveideasonhowtomarkettoeachgroup”.

Withstep-by-stepinstructions,LLMsaresignificantlymorelikelytodeliverthecorrectresults.

这样的分步指导，比起直接要求LLM“对客户进行分组并提出营销策略”的方式，能显著提高其输出的准确性。

Technique2:Referencingintermediateoutputsfromeachstep｜技巧2：标记并引用中间输>出

Whenprovidingthestep-by-stepprocesstotheLLM,wegivetheintermediateoutputfrom>eachstepacapitalizedVARIABLE_NAME,>namelyCLUSTERS,CLUSTER_INFORMATION,CLUSTER_NAME,MARKETING_IDEASandRATIONALE.在提供步骤时，我们会用大写字母标记每个步骤的输出，例如数据聚类（CLUSTERS）、聚类描述（CLUSTER_INFORMATION）、聚类命名（CLUSTER_NAME）、营销策略>>（MARKETING_IDEAS）和策略解释（RATIONALE）。这样做是为了区分指令中的变量名和其他文本，方便后续引用这些中间输出。

Capitalizationisusedtodifferentiatethesevariablenamesfromthebodyofinstructions>given.Theseintermediateoutputscanlaterbereferencedusingsquarebrackets>as[VARIABLE_NAME].

Technique3:FormattingtheLLM’sresponse｜技巧3：优化响应格式

Here,weaskforamarkdownreportformat,whichbeautifiestheLLM’sresponse.Havingvariablenamesfromintermediateoutputsagaincomesinhandyheretodictatethestructureofthereport.此处我们请求一个Markdown格式的报告，以增强响应的可读性和结构性。利用中间步骤的变量名，可以明确报告的构架。

<附录>提供一个表格，记录每个聚类包含的数据行号，以验证分析的准确性。表格标题为：[聚类命名（CLUSTER_NAME）,行号列表]。

Infact,youcouldevensubsequentlyaskChatGPTtoprovidethereportasadownloadablefile,allowingyoutoworkoffofitsresponseinwritingyourfinalreport.此外，你还可以让ChatGPT将报告以可下载文件形式提供，便于你在编写最终报告时参考使用。

@SavingGPT-4'sresponseasafile—Imagebyauthor保存GPT-4的回答为文件

Technique4:Separatingthetaskinstructionsfromthedataset｜技巧4：将任务指令与数据集分离

You’llnoticethatwenevergavethedatasettotheLLMinourfirstprompt.Instead,thepromptgivesonlythetaskinstructionsforthedatasetanalysis,withthisaddedtothebottom:在我们的首个提示中，你会发现我们并没有直接将数据集交给大语言模型（LLM）。反而，提示只给出了数据集分析的任务指令，并在底部添加了这样的话

如果你明白了，请向我请求数据集。

ChatGPTthenrespondedthatitunderstands,andwepassedthedatasettoitasaCSVstringinournextprompt:随后ChatGPT表示它已理解，并在下一个提示中，我们通过CSV字符串的形式将数据集传递给它：

**Butwhyseparatetheinstructionsfromthedataset**但为什么需要将指令与数据集分开处理呢？

NotehoweverthatthisseparationofinstructionsanddatasetcanonlybeachievedwithchatLLMsastheymaintainaconversationalmemory,unlikecompletionLLMswhichdonot.值得注意的是，这种指令与数据集的分离只能在可以维护对话记忆的聊天型大语言模型中实现，而非那些没有这种记忆功能的完成型模型。

Beforethisarticleends,Iwantedtosharesomepersonalreflectionsonthisincrediblejourney.

First,aheartfeltthankyoutoGovTechSingaporefororchestratingsuchanamazingcompetition.Ifyou’reinterestedinthemechanicsofhowGovTechorganizedthisfirst-of-its-kindcompetition—checkoutthisarticle[17]byNicoleLee,theleadorganizerherself!

@Aliveon-stagebattleinthefinalround!

Second,abigshout-outtomyfellowphenomenalcompetitors,whoeachbroughtsomethingspecial,makingthecompetitionasenrichingasitwaschallenging!I’llneverforgetthefinalround,withusbattlingitoutonstageandaliveaudiencecheeringuson—anexperienceI’llalwaysrememberfondly.

其次，我要向那些出色的竞争对手们致以最高的敬意，每个人都展现了特别的才能，让这场比赛既充满挑战又富有成效！我永远不会忘记决赛那一刻，我们在舞台上激烈竞争，现场观众的欢呼声——这是我将一直珍视的记忆。

Forme,thiswasn’tjustacompetition;itwasacelebrationoftalent,creativity,andthespiritoflearning.AndI’mbeyondexcitedtoseewhatcomesnext!

对我而言，这不只是一场比赛；这是一次才华、创造力及学习精神的盛会。我对未来充满期待，并激动于即将到来的一切！

Ihadalotoffunwritingthis,andifyouhadfunreading,Iwouldreallyappreciateifyoutookasecondtoleavesomeclapsandafollow!

THE END

数据科学家如何夺冠新加坡首届GPT

editGPT

PopClip功能超多的Mac划词增强插件工具！选中文本一键翻译/搜索/调用AI等

ChatGPT中文安卓版下载

kewei/ChatGPTNextWebREADME.mdatv2.8ChatGPTNextWebOpenI

ChatGPT全年更新大总结！重看「大模型风向标」进化之路

ChatGPT深夜更新：Mac版支持“读屏编程”，Windows版全员可用了

数据科学家如何夺冠新加坡首届GPT

如何用ChatGPT或Claude创建高颜值的思维导图本文详细介绍了使用ChatGPT和Claude生成

如何使用ChatGPT和PDF提高文档效率

免费降重论文的网站有哪些？这十个网站值得收藏