Just In: OpenAI's Most Powerful Agent Lands in ChatGPT, and the Coding Revolution Is Here

2025-05-17 23:55:37  Source: 互聯網思想

Source: 新智元

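[Overview] OpenAI's most powerful AI coding agent has arrived. Codex is now live, powered by codex-1, a version of o3 optimized for software engineering. It works on multiple tasks in parallel and can finish in half an hour software engineering work that would otherwise take days.

As of today, AI programming enters a new era.

Moments ago, Greg Brockman and a six-person OpenAI team hosted a livestream to launch Codex, a cloud-based AI coding agent.

In Altman's words, the era in which one person can build countless hit apps has arrived.

Codex is powered by a new model, codex-1, a version of o3 tuned specifically for software engineering.

It can safely work on many tasks in parallel inside cloud sandbox environments, and through its GitHub integration it can work directly with your repositories.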

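More than just a tool, it is billed as a "10x engineer" that can, all at once:

Quickly build feature modules
Answer in-depth questions about a codebase
Precisely fix bugs
Submit PRs
Automatically run tests to verify its work

Tasks like these used to take developers hours or even days. Codex now completes them in at most 30 minutes.

Open the ChatGPT sidebar, type a prompt, then click "Code" to assign a task or "Ask" to ask a question about your codebase.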

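Codex was trained with reinforcement learning on real-world coding tasks across a variety of environments, so the code it generates matches human preferences and fits into standard workflows.

In benchmark tests, codex-1 scored 72.1% on SWE-bench, beating both Claude 3.7 and o3-high.

Starting today, Codex is available to ChatGPT Pro, Enterprise, and Team users worldwide, with Plus and Edu users to follow soon.

The arrival of the Codex coding agent may reshape the underlying logic of software development, and it has lit the spark of a programming revolution.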

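Codex Runs Tasks in Parallel: A Supercharger for AI Programming

Back in 2021, OpenAI released the first Codex model, ushering in the era of "vibe coding."

In this style of programming, developers work alongside AI, which makes producing code more intuitive and efficient.

A few weeks ago, OpenAI followed up with Codex CLI, an agent that runs in the local terminal.

But that was only the beginning.

Today OpenAI launched the new Codex agent, taking software engineering to another level.

Here is a look at Codex in action.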

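After connecting his GitHub account, OpenAI researcher Thibault Sottiaux selected an open-source repository, the preparedness repo.

He was then presented with three tasks:

The first is a question: have Codex explain the codebase and describe its overall structure
The second is a code task: find and fix a bug somewhere in the codebase
The third is a question: go through the codebase and proactively suggest tasks Codex could take on

In the rest of the demo, Thibault gave Codex several more tasks, covering typo and grammar fixes, intelligent task delegation, and work across multiple repositories.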

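For the typo task, he deliberately put spelling mistakes in his instruction. Codex not only understood the intent but also found and fixed spelling and grammar problems in the codebase, with impressive attention to detail.

When Thibault stated the goal of keeping the codebase "maintainable and bug-free," Codex went through the repository, found issues such as mutable default values and inconsistent timeout settings, and generated fix tasks on its own.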
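
The "mutable default values" problem mentioned here is a common Python pitfall. The demo does not show the affected code from the preparedness repo, so the following is only a minimal, hypothetical illustration of the bug class and its usual fix; the function names are invented for this sketch.

```python
from typing import List, Optional


def add_tag_buggy(tag: str, tags: List[str] = []) -> List[str]:
    # Bug: the default list is created once, when the function is defined,
    # so every call that omits `tags` shares and mutates the same list.
    tags.append(tag)
    return tags


def add_tag_fixed(tag: str, tags: Optional[List[str]] = None) -> List[str]:
    # Fix: use None as a sentinel and build a fresh list on each call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


# Copy the buggy results so later calls cannot change what we captured.
first_buggy = list(add_tag_buggy("a"))
second_buggy = list(add_tag_buggy("b"))  # state leaks in from the first call

first_fixed = add_tag_fixed("a")
second_fixed = add_tag_fixed("b")  # independent of the first call
```

With the buggy version, the second call returns both tags because the default list persists between calls; the fixed version returns only the tag passed to that call.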

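This kind of "self-delegation" shows the agent at its most capable.

Notably, the Codex agent runs on OpenAI's compute infrastructure, the same battle-tested system used for reinforcement learning.

Each task runs in its own isolated virtual sandbox with a dedicated file system, CPU, memory, and network policy, which keeps it both efficient and secure.

Beyond the preparedness repo, Codex also handled the Codex CLI repository smoothly, showing that it generalizes across projects.

Whether the project is open source or an internal codebase, Codex handles it with ease.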

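Codex was handed a user-reported bug: file names containing special characters caused the diff command to fail.

In working through it, Codex reproduced the problem, wrote a test script, ran the linter, and produced a PR, all within a few minutes.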
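
The article does not describe the actual fix, and Codex CLI itself is not written in Python. As a generic illustration of why special characters in file names can break a diff invocation, the sketch below contrasts a naively interpolated shell command with a quoted one and with an argument list; the helper names and sample file names are hypothetical.

```python
import shlex
from typing import List


def naive_diff_command(old: str, new: str) -> str:
    # Fragile: raw names are pasted into the command line, so a shell would
    # split them on spaces and interpret characters such as $ or parentheses.
    return f"diff -u -- {old} {new}"


def quoted_diff_command(old: str, new: str) -> str:
    # Safer: shlex.quote wraps each name so a POSIX shell sees one argument.
    return f"diff -u -- {shlex.quote(old)} {shlex.quote(new)}"


def diff_argv(old: str, new: str) -> List[str]:
    # Safest: an argument list (e.g. for subprocess.run) never goes through
    # shell parsing, so no quoting is needed at all.
    return ["diff", "-u", "--", old, new]


old_name = "notes (v1).txt"
new_name = "notes $final.txt"

# shlex.split approximates how a POSIX shell would tokenize each command.
naive_tokens = shlex.split(naive_diff_command(old_name, new_name))
quoted_tokens = shlex.split(quoted_diff_command(old_name, new_name))
```

The naive command splits into seven tokens, so diff would see four file operands instead of two, while the quoted command tokenizes to exactly the list that `diff_argv` builds.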

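Thibault put it bluntly: "This could have taken me 30 minutes, or even a few hours, to do myself."

OpenAI researcher Katy Shi also emphasized in the demo that Codex's PRs include a detailed summary that clearly states what was changed and cites the relevant code, with test results visible at a glance.

After the demo, Greg said Codex made him truly "feel the AGI."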

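Aligned with Human Preferences

Four Open-Source Repositories in Practice

A primary goal in training codex-1 was to make its output align closely with human coding preferences and standards.

Compared with OpenAI o3, codex-1 consistently produces cleaner patches that are ready for human review and can be integrated into standard workflows.

To show how concise and efficient Codex's code is, OpenAI provided side-by-side Codex and o3 examples from four open-source repositories: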

astropy

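astropy is an open-source Python library for astronomy.

The first issue, in the astropy/astropy repository, is that separability_matrix in the Modeling module does not correctly compute separability for nested CompoundModels.

The before-and-after comparison shows that Codex produced a very concise change.

By contrast, o3's change is more verbose and even adds some "unnecessary" comments to the source code.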

matplotlib

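Matplotlib is a comprehensive Python library for creating static, animated, and interactive visualizations.

The task here is a bug fix: the window correction in mlab._spectral_helper is incorrect.

Again, Codex's change is the more concise of the two.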

django

Django is a Python-based web framework. The issue here is that expressions containing only a duration do not work on SQLite and MySQL.

Codex's fix is again clean, and unlike o3 it first adds the missing dependency call.

expensify

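Expensify is open-source software for financial collaboration centered on chat.

The issue OpenAI chose is "dd [HOLD for payment 2024-10-14] [$250] LHN - Member chat room names are not updated in the LHN after clearing the cache."

Here too, Codex locates the problem and changes the code more precisely and effectively, while o3 even makes one code change that has no effect.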

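OpenAI's Own Teams Already Use It

OpenAI's technical teams have already started using Codex as part of their daily toolkit.

OpenAI engineers most often use Codex for repetitive, well-scoped tasks that would otherwise break their focus, such as refactoring, renaming, and writing tests.

It is equally useful for scaffolding new features, wiring up components, fixing bugs, and drafting documentation.

Teams are building new habits around Codex: triaging on-call issues, planning tasks at the start of the day, and offloading background work to keep things moving.

By reducing context switching and surfacing forgotten to-dos, Codex helps engineers ship faster and focus on what matters most.

Before the launch, OpenAI worked with a small group of external testers to evaluate how Codex performs across different codebases, development processes, and team environments: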

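Cisco, as an early design partner, is exploring how Codex can help its engineering teams bring ideas to life faster, and it gives OpenAI feedback from evaluating real-world use cases to help improve the model.
Temporal uses Codex to speed up feature development, debugging, and writing and running tests, and to refactor large codebases. Codex also handles complex tasks in the background, helping engineers stay focused and iterate quickly.
Superhuman uses Codex to automate small, repetitive tasks such as improving test coverage and fixing integration failures. It also lets product managers make lightweight code changes without pulling in an engineer (except for code review), which improves pairing efficiency.
Kodiak uses Codex to speed up work on debugging tools, test coverage, and code refactoring as it develops Kodiak Driver, its autonomous driving system. Codex also serves as a reference tool that helps engineers understand unfamiliar parts of the stack by surfacing relevant context and past changes.

Based on experience so far, OpenAI recommends assigning well-scoped tasks to multiple agents at once and experimenting with different task types and prompting styles to explore the model's capabilities more fully.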

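Model System Message

The system message below lets developers understand codex-1's default behavior and tailor it to their own workflows.

For example, the system message instructs Codex to run all tests mentioned in AGENTS.md files, but if you are short on time you can ask Codex to skip them.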

# Instructions
- The user will provide a task.
- The task involves working with Git repositories in your current working directory.
- Wait for all terminal commands to be completed (or terminate them) before finishing.

# Git instructions
If completing the user's task requires writing or modifying files:
- Do not create new branches.
- Use git to commit your changes.
- If pre-commit fails, fix issues and retry.
- Check git status to confirm your commit. You must leave your worktree in a clean state.
- Only committed code will be evaluated.
- Do not modify or amend existing commits.

# AGENTS.md spec
- Containers often contain AGENTS.md files. These files can appear anywhere in the container's filesystem. Typical locations include `/`, `~`, and in various places inside of Git repos.
- These files are a way for humans to give you (the agent) instructions or tips for working within the container.
- Some examples might be: coding conventions, info about how code is organized, or instructions for how to run or test code.
- AGENTS.md files may provide instructions about PR messages (messages attached to a GitHub Pull Request produced by the agent, describing the PR). These instructions should be respected.
- Instructions in AGENTS.md files:
  - The scope of an AGENTS.md file is the entire directory tree rooted at the folder that contains it.
  - For every file you touch in the final patch, you must obey instructions in any AGENTS.md file whose scope includes that file.
  - Instructions about code style, structure, naming, etc. apply only to code within the AGENTS.md file's scope, unless the file states otherwise.
  - More-deeply-nested AGENTS.md files take precedence in the case of conflicting instructions.
  - Direct system/developer/user instructions (as part of a prompt) take precedence over AGENTS.md instructions.
- AGENTS.md files need not live only in Git repos. For example, you may find one in your home directory.
- If the AGENTS.md includes programmatic checks to verify your work, you MUST run all of them and make a best effort to validate that the checks pass AFTER all code changes have been made.
  - This applies even for changes that appear simple, i.e. documentation. You still must run all of the programmatic checks.
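
The scoping and precedence rules in the AGENTS.md spec above can be sketched as a small lookup. This is an illustrative reading of the rules, not how Codex implements them; the function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def applicable_agents_files(touched_file: str, agents_files: List[str]) -> List[str]:
    """Return the AGENTS.md paths whose scope covers touched_file,
    ordered from most deeply nested (highest precedence) to least."""
    target = PurePosixPath(touched_file)
    in_scope = []
    for agents_path in agents_files:
        # The scope of an AGENTS.md file is the directory tree rooted at
        # the folder that contains it.
        scope_root = PurePosixPath(agents_path).parent
        if scope_root in target.parents:
            in_scope.append(agents_path)
    # More deeply nested files win when instructions conflict.
    return sorted(
        in_scope,
        key=lambda path: len(PurePosixPath(path).parent.parts),
        reverse=True,
    )


known_files = ["/repo/AGENTS.md", "/repo/src/AGENTS.md", "/repo/docs/AGENTS.md"]
precedence = applicable_agents_files("/repo/src/pkg/module.py", known_files)
```

For a file under `/repo/src`, the sketch returns `/repo/src/AGENTS.md` ahead of `/repo/AGENTS.md` and skips `/repo/docs/AGENTS.md`. Per the spec, direct system, developer, or user instructions would still override anything found this way.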

# Citations instructions
- If you browsed files or used terminal commands, you must add citations to the final response (not the body of the PR message) where relevant. Citations reference file paths and terminal outputs with the following formats:
  1) `【F:<file_path>†L<line_start>(-L<line_end>)?】`
  - File path citations must start with `F:`. `file_path` is the exact file path of the file relative to the root of the repository that contains the relevant text.
  - `line_start` is the 1-indexed start line number of the relevant output within that file.
  2) `【<chunk_id>†L<line_start>(-L<line_end>)?】`
  - Where `chunk_id` is the chunk_id of the terminal output, `line_start` and `line_end` are the 1-indexed start and end line numbers of the relevant output within that chunk.
- Line ends are optional, and if not provided, line end is the same as line start, so only 1 line is cited.
- Ensure that the line numbers are correct, and that the cited file paths or terminal outputs are directly relevant to the word or clause before the citation.
- Do not cite completely empty lines inside the chunk, only cite lines that have content.
- Only cite from file paths and terminal outputs, DO NOT cite from previous pr diffs and comments, nor cite git hashes as chunk ids.
- Use file path citations that reference any code changes, documentation or files, and use terminal citations only for relevant terminal output.
- Prefer file citations over terminal citations unless the terminal output is directly relevant to the clauses before the citation, i.e. clauses on test results.
  - For PR creation tasks, use file citations when referring to code changes in the summary section of your final response, and terminal citations in the testing section.
  - For question-answering tasks, you should only use terminal citations if you need to programmatically verify an answer (i.e. counting lines of code). Otherwise, use file citations.
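
To make the citation formats concrete, here is a minimal parsing sketch. It assumes the two formats are `【F:<file_path>†L<line_start>(-L<line_end>)?】` and `【<chunk_id>†L<line_start>(-L<line_end>)?】`, with placeholder names taken from the descriptions above; the `†` separator, the assumption that chunk ids contain no colon, and the sample text are all assumptions of this sketch.

```python
import re
from typing import Dict, List, Union

# Assumed format: 【F:<file_path>†L<line_start>(-L<line_end>)?】
FILE_CITATION = re.compile(
    r"【F:(?P<source>[^†】]+)†L(?P<start>\d+)(?:-L(?P<end>\d+))?】"
)
# Assumed format: 【<chunk_id>†L<line_start>(-L<line_end>)?】
# Excluding ":" keeps file citations from matching as terminal citations.
TERMINAL_CITATION = re.compile(
    r"【(?P<source>[^†】:]+)†L(?P<start>\d+)(?:-L(?P<end>\d+))?】"
)

Citation = Dict[str, Union[str, int]]


def parse_citations(text: str) -> List[Citation]:
    """Extract file citations first, then terminal citations."""
    citations: List[Citation] = []
    for kind, pattern in (("file", FILE_CITATION), ("terminal", TERMINAL_CITATION)):
        for match in pattern.finditer(text):
            start = int(match.group("start"))
            # A missing line end means a single line is cited.
            end = int(match.group("end")) if match.group("end") else start
            citations.append(
                {"kind": kind, "source": match.group("source"), "start": start, "end": end}
            )
    return citations


sample = "Fixed the parser 【F:src/parser.py†L10-L12】 and tests pass 【a1b2c3†L4】."
parsed = parse_citations(sample)
```

The single-line case follows the rule that an omitted line end equals the line start.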

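Codex CLI Update

Last month, OpenAI released Codex CLI, a lightweight open-source tool that brings powerful models such as o3 and o4-mini into the local terminal to help developers finish tasks faster.

This time, OpenAI also released a smaller model optimized for Codex CLI: the o4-mini version of codex-1.

It offers low latency, strong instruction following, and solid code-editing ability. It is now the default model in Codex CLI, is also available through the API under the name codex-mini-latest, and will continue to be updated.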
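
As a rough sketch of what calling this model through the API might look like, the snippet below builds a request for the Responses API endpoint using only the standard library. The prompt text and helper names are hypothetical, the request body is kept to the `model` and `input` fields, and nothing is sent unless `send` is called with a real API key.

```python
import json
import urllib.request

RESPONSES_URL = "https://api.openai.com/v1/responses"
MODEL_NAME = "codex-mini-latest"  # model name as given in the article


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    # Minimal request body: the model name plus the input text.
    payload = {"model": MODEL_NAME, "input": prompt}
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    # Performs the network call; requires a valid key and network access.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request has no side effects; the key here is a placeholder.
request = build_request("Write a function that reverses a string.", "sk-placeholder")
```

Keeping request construction separate from sending makes the payload easy to inspect before any usage is billed.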

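Signing in to Codex CLI is also simpler: developers can now log in with their ChatGPT account and select an API organization, and an API key is generated and configured automatically.

To encourage adoption, for 30 days starting today, users who sign in to Codex CLI with a ChatGPT account receive free credits: $5 in API credits for Plus users and $50 for Pro users.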

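Is Codex Expensive?

Over the next few weeks, all users can try Codex with generous access.

After that, OpenAI will introduce rate limits and flexible pricing, with the option to buy additional usage on demand.

For developers, the codex-mini-latest model is available on the Responses API, priced at:

$1.50 per million input tokens
$6.00 per million output tokens
With a 75% prompt caching discount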
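
These prices can be turned into a quick cost estimate. The sketch below assumes the 75% discount applies only to input tokens served from the prompt cache, which the article does not spell out; the function name and the token counts are hypothetical.

```python
INPUT_PRICE_PER_MILLION = 1.50   # USD, from the article
OUTPUT_PRICE_PER_MILLION = 6.00  # USD, from the article
CACHE_DISCOUNT = 0.75            # assumed to apply to cached input tokens only


def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request under the stated assumptions."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_input_tokens
    cached_price = INPUT_PRICE_PER_MILLION * (1 - CACHE_DISCOUNT)
    total = (
        uncached_tokens * INPUT_PRICE_PER_MILLION
        + cached_input_tokens * cached_price
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000
    return round(total, 6)


# 200k input tokens (half of them cached) and 50k output tokens.
example_cost = estimate_cost(200_000, 50_000, cached_input_tokens=100_000)
```

Under these assumptions the example works out to $0.15 for uncached input, $0.0375 for cached input, and $0.30 for output, or $0.4875 in total.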

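Codex is still in research preview. It does not yet support capabilities needed for frontend work, such as image input, and it cannot yet be course-corrected in real time while a task is running.

In addition, delegating a task to the Codex agent takes longer to return a result, so users may need to get used to this asynchronous way of collaborating.

As model capabilities improve, Codex will be able to handle more complex and longer-running development tasks, gradually becoming more like a "remote development partner."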

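What's Next

OpenAI's goal is for developers to focus on the work they do best and hand the rest to AI agents, improving efficiency and productivity.

Codex will support both real-time collaboration and asynchronous task delegation, and the two modes will gradually converge.

Tools like Codex CLI have already become standard for developers who want to code faster, while the asynchronous, multi-agent workflow introduced by Codex in ChatGPT is expected to become the new way engineers produce high-quality code efficiently.

In the future, developers will be able to work with AI inside their IDEs and everyday tools, asking questions, getting suggestions, and delegating complex tasks, all in one unified workflow.

OpenAI plans to make Codex more interactive and flexible: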

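Providing guidance midway through a task
Collaborating with the AI on implementation strategies
Receiving proactive progress updates
Deeper integration with everyday tools (such as GitHub, the CLI, issue trackers, and CI systems) so tasks are easy to assign

Software engineering is becoming one of the first industries to see large AI-driven productivity gains, which will unlock enormous potential for individuals and small teams.

At the same time, OpenAI is working with partners to study how the widespread adoption of agents will affect development workflows, skill development, and the global distribution of talent.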

References:

https://www.youtube.com/watch?v=hhdpnbfH6NU

https://openai.com/index/introducing-codex/
