Deploying a Large Language Model on Colab

Introduction

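This guide shows how to deploy a large language model on Google Colab using Ollama, and how to reach it from a local machine through an ngrok tunnel.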

Steps

Create a new notebook

File -> New notebook in Drive

Change the runtime type

Connect -> Change runtime type
- Runtime type: Python 3
- Hardware accelerator: T4 GPU

Connect -> Connect to a hosted runtime: T4
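
Once the runtime is connected, it can be worth confirming that the GPU is actually visible before installing anything. The following is a minimal sketch, assuming the `nvidia-smi` tool is on the PATH of a GPU runtime; the helper name `gpu_summary` is hypothetical and not part of Colab or Ollama.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU name and memory reported by nvidia-smi, or None if unavailable."""
    # nvidia-smi ships with the NVIDIA driver, so it is missing on CPU-only runtimes.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


summary = gpu_summary()
print(summary if summary else "No GPU detected; check the runtime type.")
```

If the cell prints the fallback message, revisit the hardware accelerator setting above and reconnect.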

Add two code cells to the new notebook, then click the run button on the left side of each cell.

First cell

# Install Ollama
!curl -fsSL https://ollama.ai/install.sh | sh

Second cell

# Install the required Python packages, then import the modules used below
!pip install aiohttp pyngrok

import os
import asyncio

# Point the dynamic linker at the NVIDIA libraries so that Ollama can use the GPU for inference
os.environ.update({'LD_LIBRARY_PATH':'/usr/lib64-nvidia'})

async def run_process(cmd):
  print('>>> starting', *cmd)
  p = await asyncio.subprocess.create_subprocess_exec(
      *cmd,
      stdout=asyncio.subprocess.PIPE,
      stderr=asyncio.subprocess.PIPE,
  )

  async def pipe(lines):
    async for line in lines:
      print(line.strip().decode('utf-8'))

  await asyncio.gather(
      pipe(p.stdout),
      pipe(p.stderr),
  )

# Sign up for an ngrok account, copy your authtoken, and replace <ngrok-authtoken> below
await asyncio.gather(
    run_process(['ngrok', 'config', 'add-authtoken','<ngrok-authtoken>'])
)

await asyncio.gather(
    run_process(['ollama', 'serve']),
    run_process(['ngrok', 'http', '11434', '--host-header', 'localhost:11434'])
)

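When log lines such as ">>> starting ngrok http 11434 --host-header localhost:11434" appear, the deployment is up.
On the ngrok website, open Endpoints to find the public URL. Click URL -> view site; if the page shows "Ollama is running", the tunnel is working.
On your local machine, set the OLLAMA_HOST environment variable to that public URL, for example: https://4xxb-3x-1xx-1x1-1x4.ngrok-free.app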
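
The browser check above can also be done from a script on the local machine. This is a minimal sketch using only the Python standard library; `fetch_text` and `is_ollama_running` are hypothetical helper names, and the URL passed in is whatever public URL ngrok assigned to you.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def is_ollama_running(base_url, fetch=fetch_text):
    """Return True if the server's root page contains Ollama's status text."""
    try:
        body = fetch(base_url.rstrip("/") + "/")
    except (OSError, ValueError):
        # Network failures (URLError is an OSError) and malformed URLs
        # both count as "not reachable".
        return False
    return "Ollama is running" in body
```

Calling `is_ollama_running` with your public URL should return True once both `ollama serve` and the ngrok tunnel are up. The `fetch` parameter exists only so the check can be exercised without a network connection.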

From this point on, ollama commands entered in the local client are executed on Colab.
# Download and start the model
ollama run qwen2:1.5b
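
Besides the CLI, the same remote server can be queried over Ollama's HTTP API. The sketch below assumes the model has already been downloaded by the command above and that the server exposes the `/api/generate` endpoint with a non-streaming JSON reply; `build_generate_request` and `generate` are hypothetical helper names, and the base URL is your own ngrok URL.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks for a single JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=120):
    """Send the prompt to the remote server and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

For example, `generate(public_url, "qwen2:1.5b", "Hello")` would send the prompt to the Colab instance, where `public_url` holds the value you used for OLLAMA_HOST.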