Image Artistic Style Transfer with the VGG16 Neural Network

Basic Principle

Extract image features with VGG16 (or another neural network) and use the Gram matrix to transfer the style of one image onto another.

VGG16

Little introduction needed: runner-up of the 2014 ImageNet classification challenge and winner of the localization challenge. VGG builds deep networks from stacks of small 3x3 convolution kernels and pooling layers, reaching depths of 16 or 19 layers; VGG16 and VGG19 are the best-known variants. The two architectures are very similar, both alternating stacked convolution and pooling layers and finishing with fully connected layers for classification. They differ in depth and parameter count: VGG19 has three more convolutional layers than VGG16 and correspondingly more parameters.

The VGG16/19 implementations can be used directly from Keras, which downloads the pretrained weights automatically:

from keras.applications.vgg16 import VGG16
from keras.applications.vgg19 import VGG19
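
As a quick sketch of that usage (standard keras.applications API; the arguments shown are the library defaults for feature extraction, not from the original post):

from keras.applications.vgg16 import VGG16

# Downloads the ImageNet weights on first use; include_top=False drops the classifier head
feature_extractor = VGG16(weights="imagenet", include_top=False)
feature_extractor.summary()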

Here we pair it with a transformer network and build both in PyTorch:

import torch
from collections import namedtuple
from torchvision import models
import torch.nn as nn
import torch.nn.functional as F


# VGG16 network definition
class VGG16(torch.nn.Module):
    """Vgg16 Net"""
    def __init__(self, requires_grad=False):
        super(VGG16, self).__init__()
        vgg_pretrained_features = models.vgg16(pretrained=True).features
        self.slice1 = torch.nn.Sequential()
        self.slice2 = torch.nn.Sequential()
        self.slice3 = torch.nn.Sequential()
        self.slice4 = torch.nn.Sequential()

        # slice1 ends at relu1_2, slice2 at relu2_2, slice3 at relu3_3, slice4 at relu4_3
        for x in range(4):
            self.slice1.add_module(str(x), vgg_pretrained_features[x])

        for x in range(4, 9):
            self.slice2.add_module(str(x), vgg_pretrained_features[x])

        for x in range(9, 16):
            self.slice3.add_module(str(x), vgg_pretrained_features[x])

        for x in range(16, 23):
            self.slice4.add_module(str(x), vgg_pretrained_features[x])

        if not requires_grad:
            for param in self.parameters():
                param.requires_grad = False

    def forward(self, X):
        h = self.slice1(X)
        h_relu1_2 = h
        h = self.slice2(h)
        h_relu2_2 = h
        h = self.slice3(h)
        h_relu3_3 = h
        h = self.slice4(h)
        h_relu4_3 = h

        vgg_outputs = namedtuple("VggOutputs", ["relu1_2", "relu2_2", "relu3_3", "relu4_3"])
        output = vgg_outputs(h_relu1_2, h_relu2_2, h_relu3_3, h_relu4_3)

        return output


class TransformerNet(torch.nn.Module):
    def __init__(self):
        super(TransformerNet, self).__init__()
        self.model = nn.Sequential(
            ConvBlock(3, 32, kernel_size=9, stride=1),
            ConvBlock(32, 64, kernel_size=3, stride=2),
            ConvBlock(64, 128, kernel_size=3, stride=2),
            ResidualBlock(128),
            ResidualBlock(128),
            ResidualBlock(128),
            ResidualBlock(128),
            ResidualBlock(128),
            ConvBlock(128, 64, kernel_size=3, upsample=True),
            ConvBlock(64, 32, kernel_size=3, upsample=True),
            ConvBlock(32, 3, kernel_size=9, stride=1, normalize=False, relu=False),
        )

    def forward(self, x):
        return self.model(x)


class ResidualBlock(torch.nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.block = nn.Sequential(
            ConvBlock(channels, channels, kernel_size=3, stride=1, normalize=True, relu=True),
            ConvBlock(channels, channels, kernel_size=3, stride=1, normalize=True, relu=False),
        )

    def forward(self, x):
        return self.block(x) + x


class ConvBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, upsample=False, normalize=True, relu=True):
        super(ConvBlock, self).__init__()
        self.upsample = upsample
        self.block = nn.Sequential(
            nn.ReflectionPad2d(kernel_size // 2),
            nn.Conv2d(in_channels, out_channels, kernel_size, stride)
        )
        self.norm = nn.InstanceNorm2d(out_channels, affine=True) if normalize else None
        self.relu = relu

    def forward(self, x):
        if self.upsample:
            x = F.interpolate(x, scale_factor=2)
        x = self.block(x)
        if self.norm is not None:
            x = self.norm(x)
        if self.relu:
            x = F.relu(x)
        return x


"""
Smoke-test the model
"""
if __name__ == '__main__':
    input1 = torch.rand([1, 3, 224, 224])  # a single 224x224 RGB image
    model_x = VGG16()
    print(model_x)
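
A quick sketch of pulling the four activation maps out of the namedtuple (the input size is illustrative; the pretrained weights download on first use):

feats = VGG16()(torch.rand(1, 3, 256, 256))
print(feats.relu2_2.shape)  # torch.Size([1, 128, 128, 128])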

Gram Matrix

The Gram matrix is the matrix of all pairwise inner products of k vectors in n-dimensional Euclidean space; it is a symmetric matrix.

A more intuitive view:

The feature map of the input image has shape [ch, h, w]. After a flatten (spreading the h x w plane into a one-dimensional vector) and a matrix transpose, it can be reshaped into matrices of shape [ch, h*w] and [h*w, ch]. Multiplying the two yields the Gram matrix.

Using the Gram matrix for style transfer:

1. Prepare the target image and the target style image;

2. Use a deep network, starting from white noise, to extract feature vectors from the target image and the style image. Compute Gram matrices from both images' features, take minimizing the difference between the matrices as the optimization objective, and keep adjusting the target image so its style grows ever closer to the reference.

Gram matrix code in torch:

def gram_matrix(y):
    (b, c, h, w) = y.size()
    features = y.view(b, c, w * h)
    features_t = features.transpose(1, 2)
    gram = features.bmm(features_t) / (c * h * w)
    return gram
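
A quick shape and symmetry check of gram_matrix (a minimal sketch; the sizes are arbitrary):

import torch

feat = torch.rand(2, 64, 32, 32)                # [batch, channels, h, w]
g = gram_matrix(feat)
print(g.shape)                                  # torch.Size([2, 64, 64])
print(torch.allclose(g, g.transpose(1, 2)))     # True: the Gram matrix is symmetric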

Training

Prepare the training files and a style image, e.g. 20 random images and Van Gogh's The Starry Night.

utils.py helpers

Configure the training arguments:

import argparse

parser = argparse.ArgumentParser(description="Parser for training")
parser.add_argument("--style", type=str, default="images/styles/the_starry_night.jpg", help="Path to style image")
parser.add_argument("--dataset", type=str, help="Path to training dataset")
parser.add_argument("--epochs", type=int, default=1, help="Number of training epochs")
parser.add_argument("--batch_size", type=int, default=4, help="Batch size for training")
parser.add_argument("--image_size", type=int, default=256, help="Size of training images")
parser.add_argument("--style_size", type=int, help="Size of style image")
parser.add_argument("--lr", type=float, default=1e-3, help="Learning rate")
parser.add_argument("--lambda_img", type=float, default=1e5, help="Weight for image loss")
parser.add_argument("--lambda_style", type=float, default=1e10, help="Weight for style loss")
parser.add_argument("--model_path", type=str, help="Optional path to checkpoint model")
parser.add_argument("--model_checkpoint", type=int, default=1000, help="Batches between model checkpoints")
parser.add_argument("--result_checkpoint", type=int, default=1000, help="Batches between saved image results")

Transform used for style training:

import numpy as np
from torchvision import transforms

# ImageNet statistics; assumed here, since the snippets reference mean/std without defining them
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])


def train_transform(image_size):
    transform = transforms.Compose(
        [
            transforms.Resize(int(image_size * 1.15)),
            transforms.RandomCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize(mean, std),
        ]
    )
    return transform

Transform used for style transfer:

def style_transform(image_size=None):
    resize = [transforms.Resize(image_size)] if image_size else []
    transform = transforms.Compose(resize + [transforms.ToTensor(), transforms.Normalize(mean, std)])
    return transform

Denormalize image tensors using the mean and standard deviation:

def denormalize(tensors):
    for c in range(3):
        tensors[:, c].mul_(std[c]).add_(mean[c])
    return tensors
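
A round-trip sanity check (assuming the mean/std defined above): Normalize maps x to (x - mean) / std, and denormalize inverts it in place:

import torch
from torchvision import transforms

x = torch.rand(2, 3, 4, 4)
normed = torch.stack([transforms.Normalize(mean, std)(img) for img in x])
print(torch.allclose(denormalize(normed), x, atol=1e-6))  # True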

train.py training script

Training configuration

train_args = TrainArgs()
args = train_args.initialize().parse_args()

args.dataset = './dataset'
args.style = './images/styles/the_starry_night.jpg'
args.epochs = 2400  # epochs * (dataset_size / batch_size) should be a multiple of 1000
args.batch_size = 4
args.image_size = 256

Training flow

import glob
import os
import random

import torch
from PIL import Image
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.utils import save_image

style_name = args.style.split("/")[-1].split(".")[0]
os.makedirs(f"images/train/{style_name}_training", exist_ok=True)
os.makedirs("checkpoints", exist_ok=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_dataset = datasets.ImageFolder(args.dataset, train_transform(args.image_size))
dataloader = DataLoader(train_dataset, batch_size=args.batch_size)
transformer = TransformerNet().to(device)
vgg = VGG16(requires_grad=False).to(device)
if args.model_path:
    transformer.load_state_dict(torch.load(args.model_path))
optimizer = Adam(transformer.parameters(), args.lr)
l2_loss = torch.nn.MSELoss().to(device)

# Prepare the style image: one copy per batch element, then its Gram matrices
style = style_transform(args.style_size)(Image.open(args.style))
style = style.repeat(args.batch_size, 1, 1, 1).to(device)
features_style = vgg(style)
gram_style = [gram_matrix(y) for y in features_style]

# Fixed sample images used to visualize training progress
image_samples = []
for path in random.sample(glob.glob(f"{args.dataset}/*/*"), len(train_dataset)):
    image_samples += [style_transform(args.image_size)(Image.open(path).resize((224, 224)))]
image_samples = torch.stack(image_samples)

Launch training

def save_result(sample):
    transformer.eval()
    with torch.no_grad():
        output = transformer(image_samples.to(device))
    image_rgb = denormalize(torch.cat((image_samples.cpu(), output.cpu()), 2))
    save_image(image_rgb, f"images/train/{style_name}_training/{sample}.jpg", nrow=4)
    transformer.train()


def save_model(sample):
    torch.save(transformer.state_dict(), f"checkpoints/{style_name}_{sample}.pth")


for epoch in range(args.epochs):
    for batch_i, (images, _) in enumerate(dataloader):
        batches_done = epoch * len(dataloader) + batch_i + 1
        optimizer.zero_grad()

        images_original = images.to(device)
        images_transformed = transformer(images_original)

        features_original = vgg(images_original)
        features_transformed = vgg(images_transformed)

        # Content loss: match the relu2_2 activations of the original image
        img_loss = args.lambda_img * l2_loss(features_transformed.relu2_2, features_original.relu2_2)

        # Style loss: match Gram matrices at every VGG slice
        style_loss = 0
        for ft_y, gm_s in zip(features_transformed, gram_style):
            gm_y = gram_matrix(ft_y)
            style_loss += l2_loss(gm_y, gm_s[: images.size(0), :, :])
        style_loss *= args.lambda_style

        total_loss = img_loss + style_loss
        total_loss.backward()
        optimizer.step()

        if batches_done % args.result_checkpoint == 0:
            save_result(batches_done)
        if args.model_checkpoint > 0 and batches_done % args.model_checkpoint == 0:
            save_model(batches_done)

Result at iteration 1,000.

Result at iteration 12,000 (2400 epochs * (20 / batch_size)); the effect is obvious.

At this point training is complete and we can generate predictions.

Prediction:

Configure the prediction arguments

predict_args = PredictArgs()
args = predict_args.initialize().parse_args()
args.image_path = './images/input/001.jpg'
args.model_path = './checkpoints/the_starry_night_12000.pth'

Prediction code

os.makedirs("images/output", exist_ok=True)
device = torch.device('cpu')  # or: torch.device("cuda" if torch.cuda.is_available() else "cpu")
transform = style_transform()
transformer = TransformerNet().to(device)
transformer.load_state_dict(torch.load(args.model_path))
transformer.eval()
image_tensor = transform(Image.open(args.image_path)).to(device)
image_tensor = image_tensor.unsqueeze(0)

with torch.no_grad():
    output_image = denormalize(transformer(image_tensor)).cpu()

name = args.image_path.split("/")[-1]
save_image(output_image, f"images/output/output_{name}")

Ideas & References

https://github.com/elleryqueenhomels/fast_neural_style_transfer/tree/master

https://github.com/AaronJny/DeepLearningExamples/tree/master/tf2-neural-style-transfer

https://github.com/Huage001/PaintTransformer

https://github.com/eriklindernoren/Fast-Neural-Style-Transfer/tree/master

https://github.com/NeverGiveU/PaintTransformer-Pytorch-master

https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

paddleDetection Demo

PPHuman

Pedestrian Attribute Recognition

Pedestrian attributes

cfg:

crop_thresh: 0.5
attr_thresh: 0.5
kpt_thresh: 0.2
visual: True
warmup_frame: 50

DET:
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
  batch_size: 1

MOT:
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
  tracker_config: /exp/work/video/PaddleDetection/deploy/pipeline/config/tracker_config.yml
  batch_size: 1
  skip_frame_num: -1  # preferably no more than 3
  enable: True

KPT:
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
  batch_size: 8

ATTR:
  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip
  batch_size: 8
  enable: True

cli:

python deploy/pipeline/pipeline.py --config deploy/pipeline/config/cache/cfg_human.yml --device=gpu --video_file=demo_input/human.mp4 --output_dir=demo_output/


LiveGBS: GB/T 28181 National-Standard Video Streaming Platform

Package download

LiveGBS GB28181 streaming service download: https://www.liveqing.com/docs/download/LiveGBS.html#%E7%89%88%E6%9C%AC%E4%B8%8B%E8%BD%BD

Choose the Windows build of the LiveGBS signaling service and LiveGBS streaming service. The free edition's license lasts 26 days, after which the software services must be renewed manually.

Install LiveGBS GB28181

Unpack the downloaded package and start LiveCMS.exe and LiveSMS.exe. If the default ports are occupied, edit the corresponding livecms.ini and livesms.ini config files; here I changed LiveGBS's default port from 10000 to 10005.

After a successful start, the LiveCMS and LiveSMS icons appear in the system tray.


paddleDetection: Video OCR

PPOCR_V4

Install Baidu's latest ppocr_v4 package. The virtual environment is py39_vio; it cannot share an environment with face recognition (py38_arcface) because their opencv versions conflict.

pip install paddleocr --user -i https://mirror.baidu.com/pypi/simple

Code

Add an --ocr option to cfg_utils.py; set it to True to enable OCR (default False).

parser.add_argument(
    "--ocr",
    type=bool,  # note: with argparse, pass --ocr=True to enable; any non-empty value parses as True
    default=False,
    help="use paddlepaddle-ocr")

pipeline.py

from python.visualize import visualize_box_mask, visualize_attr, visualize_pose, visualize_action, visualize_vehicleplate, visualize_vehiclepress, visualize_lane, visualize_vehicle_retrograde, visualize_ocr
class PipePredictor(object):
    def __init__(self, args, cfg, is_video=True, multi_camera=False):
        self.ocr = args.ocr
def visualize_video(self,
                    image_rgb,
                    result,
                    collector,
                    frame_id,
                    fps,
                    entrance=None,
                    records=None,
                    center_traj=None,
                    do_illegal_parking_recognition=False,
                    illegal_parking_dict=None):
    image = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2BGR)
    mot_res = copy.deepcopy(result.get('mot'))

    if self.ocr:
        lock.acquire()  # take the lock: PaddleOCR is not thread-safe
        ocr_result = ocr.ocr(image, cls=True)[0]
        lock.release()
        ocr_boxes = [line[0] for line in ocr_result]
        ocr_txts = [line[1][0] for line in ocr_result]
        ocr_scores = [line[1][1] for line in ocr_result]

        image = visualize_ocr(image, ocr_boxes, ocr_txts, ocr_scores)
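
The snippet above references an ocr instance and a lock that pipeline.py must define elsewhere; a plausible module-level setup (an assumption, mirroring the standard PaddleOCR API) looks like:

from threading import Lock

from paddleocr import PaddleOCR

lock = Lock()                                   # PaddleOCR is not thread-safe
ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # angle classifier required for cls=True calls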

visualize.py

def visualize_ocr(im, boxes, texts, score):
    if isinstance(im, str):
        im = Image.open(im)
        im = np.ascontiguousarray(np.copy(im))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
    else:
        im = np.ascontiguousarray(np.copy(im))

    # Create a transparent layer and draw the text watermark onto it
    im = Image.fromarray(im)
    im = im.convert('RGBA')
    im_canvas = Image.new('RGBA', im.size, (255, 255, 255, 0))

    for i, res in enumerate(texts):
        if boxes is not None:
            box = boxes[i]
        text = res
        if text == "":
            continue

        text_scale = max(1.0, int(box[2][1] - box[1][1]))

        draw = ImageDraw.Draw(im_canvas)
        draw.text(
            (box[0][0], box[0][1]),
            text,
            font=ImageFont.truetype(font_file, size=int(text_scale)),
            fill=(255, 255, 0, 85))  # the fourth value is the alpha channel
        try:
            draw.rectangle(
                ((box[0][0], box[0][1]), (box[2][0], box[2][1])),
                fill=None,
                outline=(255, 255, 0),
                width=1)
        except ValueError:
            pass

    # Composite the layers
    im = Image.alpha_composite(im, im_canvas)
    im = im.convert('RGB')
    # Restore a contiguous array
    im = np.ascontiguousarray(np.copy(im))
    return im


paddleDetection: Chinese Labels on OpenCV Detection Boxes

Note: OpenCV cannot render Chinese text directly; routing through PIL costs some performance.
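
A minimal sketch of that round trip (the image and font paths are placeholders):

import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

frame = cv2.imread("demo.jpg")                                   # BGR ndarray from OpenCV
pil_im = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
draw = ImageDraw.Draw(pil_im)
font = ImageFont.truetype("SourceHanSansCN-Medium.otf", size=24)
draw.text((10, 10), "行人", font=font, fill=(0, 255, 255))       # PIL can render CJK text
frame = cv2.cvtColor(np.asarray(pil_im), cv2.COLOR_RGB2BGR)      # back to BGR for OpenCV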

Modify the source (visualization)

./deploy/python/visualize.py

Add the font-library import and a font file:

from PIL import Image, ImageDraw, ImageFile, ImageFont

font_file = '/exp/work/video/PaddleDetection/deploy/pipeline/SourceHanSansCN-Medium.otf'

visualize_attr

def visualize_attr(im, results, boxes=None, is_mtmct=False):
    if isinstance(im, str):
        im = Image.open(im)
        im = np.ascontiguousarray(np.copy(im))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
    else:
        im = np.ascontiguousarray(np.copy(im))

    line_inter = im.shape[0] / 40.
    text_scale = max(0.5, im.shape[0] / 100.)
    # Convert the ndarray image to a PIL image
    im = Image.fromarray(im)

    for i, res in enumerate(results):
        print(i, res)
        if boxes is None:
            text_w = 3
            text_h = 1
        elif is_mtmct:
            box = boxes[i]  # multi camera, bbox shape is x,y, w,h
            text_w = int(box[0]) + 3
            text_h = int(box[1])
        else:
            box = boxes[i]  # single camera, bbox shape is 0, 0, x,y, w,h
            text_w = int(box[2]) + 3
            text_h = int(box[3])
        for text in res:
            text_h += int(line_inter)
            text_loc = (text_w, text_h)
            # Draw the text
            draw = ImageDraw.Draw(im)
            draw.text(
                text_loc,
                text,
                font=ImageFont.truetype(font_file, size=int(text_scale)),  # font file and size
                fill=(0, 255, 255))
    # Restore a contiguous array
    im = np.ascontiguousarray(np.copy(im))
    return im


arcface_paddle

Environment

GPU (physical)

  • NVIDIA 3090*2

  • GPU driver 515.43.04

  • CUDA 11.7

  • CUDAtoolkit (cuda_11.7.0_515.43.04_linux)

  • cuDNN (v8.4.1)

  • Multi-GPU training with paddlepaddle requires NCCL support (NCCL v2.12.12, cuda 11.7)

paddlepaddle version

  • paddlepaddle-gpu==2.2.0rc0 (cuda 11.2 in the virtual environment)

Python environment

  • CentOS7.9

  • anaconda3

  • python3.8

Installing insightface under anaconda

Important: pin pillow to 9.5; a newer version makes the insightface install fail (Pillow 10 removed the getsize method, so the affected source has to be changed to getbbox or getlength).

Warning message:

tools/test_recognition.py:627: DeprecationWarning: getsize is deprecated and will be removed in Pillow 10 (2023-07-01). Use getbbox or getlength instead.
tw = font.getsize(text)[0]
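
A sketch of the equivalent Pillow 10 code (the font path is a placeholder):

from PIL import ImageFont

font = ImageFont.truetype("SourceHanSansCN-Medium.otf", size=24)
text = "hello"
bbox = font.getbbox(text)          # (left, top, right, bottom)
tw = bbox[2] - bbox[0]             # replaces: tw = font.getsize(text)[0]
# or, for the advance width: tw = int(font.getlength(text))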

Environment installation

# paddlepaddle
conda install paddlepaddle-gpu==2.2.0rc0 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
# insightface
pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
# insightface/recognition/arcface_paddle/
pip install -r requirement.txt -i https://mirror.baidu.com/pypi/simple
# insightface-paddle
pip install insightface-paddle -i https://mirror.baidu.com/pypi/simple


paddleDetection Prerequisites

Environment

GPU

  • NVIDIA 3090*2

  • GPU driver 515.43.04

  • CUDA 11.7

  • CUDAtoolkit (cuda_11.7.0_515.43.04_linux)

  • cuDNN (v8.4.1)

  • Multi-GPU training with paddlepaddle requires NCCL support (NCCL v2.12.12, cuda 11.7)

paddlepaddle version

  • v2.4.2

paddleDetection version

  • v2.6.0

Python environment

  • CentOS7.9

  • anaconda3

  • python3.8

Plain video processing

When OpenCV parses an h264 video, a frame rate above 65535 causes an error.

Source:

./deploy/pipeline/pipeline.py predict_video

out_path = os.path.join(self.output_dir, video_out_name + ".mp4")
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

Input: cv2.VideoCapture()

Output: cv2.VideoWriter()
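
A minimal read-and-write loop around the snippet above (paths are placeholders; in the real pipeline, detection happens between read and write):

import cv2

cap = cv2.VideoCapture("demo_input/human.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("demo_output/human.mp4",
                         cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(frame)             # frame processing would happen here
cap.release()
writer.release()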

In this GPU setup, processing a 5-minute pedestrian-detection video takes about 20 minutes with the high-accuracy model, and the file grows about 13x (100 MB -> 1.3 GB).


Building a nebula Cluster on CentOS (Express Edition)

Environment preparation

  • OS

CentOS7.9

  • Machines

Three machines: nebula01, nebula02, nebula03

  • Install location

/usr/local

  • nebula version

2.6.1

  • nebula-graph-studio version

3.2.3

Quick start

  • Download the tar.gz packages
cd /usr/local
wget https://oss-cdn.nebula-graph.com.cn/package/2.6.1/nebula-graph-2.6.1.el7.x86_64.tar.gz
wget https://oss-cdn.nebula-graph.com.cn/nebula-graph-studio/3.2.3/nebula-graph-studio-3.2.3.x86_64.tar.gz
  • Extract and rename
tar -zxvf nebula-graph-2.6.1.el7.x86_64.tar.gz && mv nebula-graph-2.6.1.el7.x86_64 nebula
tar -zxvf nebula-graph-studio-3.2.3.x86_64.tar.gz && mv nebula-graph-studio-3.2.3.x86_64 nebula-graph-studio
  • Copy the config files
cd nebula/etc
cp nebula-graphd.conf.default nebula-graphd.conf
cp nebula-metad.conf.default nebula-metad.conf
cp nebula-storaged.conf.default nebula-storaged.conf
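
After copying, each config still has to point at the metad quorum; a sketch of the relevant lines (values assumed for this three-node layout, per the NebulaGraph docs):

--meta_server_addrs=nebula01:9559,nebula02:9559,nebula03:9559
--local_ip=nebula01        # set to the current machine's own host/IP on each node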


Nebula-Spark and Graph Algorithms

Nebula Spark Connector

Download & official docs: https://github.com/vesoft-inc/nebula-spark-connector

Environment

· nebula:2.6.1
· hadoop:2.7
· spark:2.4.7
· pyspark:2.4.7
· python:3.7.16
· nebula-spark-connector:2.6.1

Build and package nebula-spark-connector

$ cd nebula-spark-connector-2.6.1/nebula-spark-connector
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true

On success, the nebula-spark-connector-2.6.1.jar file appears under the nebula-spark-connector/target/ directory:

(base) [root@root target]# ll
total 106792
drwxr-xr-x 3 root root 17 Mar 11 14:14 classes
-rw-r--r-- 1 root root 1 Mar 11 14:14 classes.-497386701.timestamp
-rw-r--r-- 1 root root 1 Mar 11 14:14 classes.timestamp
-rw-r--r-- 1 root root 30701 Mar 11 14:15 jacoco.exec
drwxr-xr-x 2 root root 28 Mar 11 14:15 maven-archiver
-rw-r--r-- 1 root root 108375457 Mar 11 14:16 nebula-spark-connector-2.6.1.jar
-rw-r--r-- 1 root root 583482 Mar 11 14:16 nebula-spark-connector-2.6.1-javadoc.jar
-rw-r--r-- 1 root root 36358 Mar 11 14:16 nebula-spark-connector-2.6.1-sources.jar
-rw-r--r-- 1 root root 315392 Mar 11 14:15 original-nebula-spark-connector-2.6.1.jar
drwxr-xr-x 4 root root 37 Mar 11 14:15 site

Reading NebulaGraph data from PySpark

Read all data under a tag from the Nebula Graph whose metaAddress is "metad0:9559" into a dataframe:

df = spark.read.format(
    "com.vesoft.nebula.connector.NebulaDataSource").option(
    "type", "vertex").option(
    "spaceName", "basketballplayer").option(
    "label", "player").option(
    "returnCols", "name,age").option(
    "metaAddress", "metad0:9559").option(
    "partitionNumber", 1).load()

The dataframe can then be shown like this:

>>> df.show(n=2)
+---------+--------------+---+
|_vertexId| name|age|
+---------+--------------+---+
|player105| Danny Green| 31|
|player109|Tiago Splitter| 34|
+---------+--------------+---+
only showing top 2 rows
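
Reading edges works the same way; a sketch assuming the basketballplayer space's follow edge type (per the connector docs):

df_edge = spark.read.format(
    "com.vesoft.nebula.connector.NebulaDataSource").option(
    "type", "edge").option(
    "spaceName", "basketballplayer").option(
    "label", "follow").option(
    "returnCols", "degree").option(
    "metaAddress", "metad0:9559").option(
    "partitionNumber", 1).load()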


Setting up an ElasticSearch Environment

* A single-node ES environment set up to support nebula full-text search testing

Installation

Download

https://www.elastic.co/cn/downloads/elasticsearch#ga-release (or the elastic Chinese community download center: https://elasticsearch.cn/download/)

  • Choose the Linux version

Install ES

  • Extract
$ tar xf elasticsearch-7.14.2-linux-x86_64.tar.gz
  • Create the es user
$ useradd es && passwd es
  • Rename
$ mv elasticsearch-7.14.2 elasticsearch
  • Give the es user ownership
$ chown -R es:es elasticsearch

Configuration

* You can use the Java runtime bundled with ES via ES_JAVA_HOME
$ vim /etc/profile
# ES_JAVA_HOME
export ES_JAVA_HOME=/data/elasticsearch/jdk/
export PATH=$ES_JAVA_HOME/bin:$PATH
$ source /etc/profile
  • elasticsearch config
$ cd elasticsearch/config
$ vim elasticsearch.yml
node.name: node-1                          ## node name
path.data: /usr/local/elasticsearch/data   ## data path
path.logs: /usr/local/elasticsearch/logs   ## log path
bootstrap.memory_lock: true                ## keep ES from using the swap partition
indices.requests.cache.size: 5%            ## cache settings
indices.queries.cache.size: 10%            ## cache settings
network.host: 192.168.80.128               ## this machine's IP
http.port: 9200                            ## default port
cluster.initial_master_nodes: ["node-1"]   ## master-eligible hostnames/IPs that bootstrap the cluster
http.cors.enabled: true                    ## CORS
http.cors.allow-origin: "*"
  • Raise the current user's soft and hard limits
$ vim /etc/security/limits.conf
es soft nofile 65535
es hard nofile 65537
es soft memlock unlimited
es hard memlock unlimited
  • Raise vm.max_map_count
vim /etc/sysctl.conf
vm.max_map_count=655360
sysctl -p

Start

$ su es
$ cd ../bin
$ ./elasticsearch -d
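
To verify the node came up (host and port from the config above):

$ curl 'http://192.168.80.128:9200/_cluster/health?pretty'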


