
45 docs tagged with "programming-languages"


`DataFrame` data structure

The DataFrame data structure is the heart of the pandas library. It is the primary object used in data analysis and cleaning tasks. Conceptually, a DataFrame is a two-dimensional Series object with an index and multiple columns of content, each column having a label. In other words, a DataFrame is a labeled array with two axes.

`Series` Data Structure

A Series in pandas is a one-dimensional array-like object that can hold various data types, similar to a list in Python, but with additional features. It combines elements of lists and dictionaries, storing items in order and allowing access via labels (index).

Basics

R is a powerful language used for data analysis, statistical computing, and graphical representation. This guide will introduce the basic syntax, data structures, control structures, and functions in R, providing a solid foundation for beginners.

Best Practices for Python Type Conventions

Python is a dynamically typed language, but with the introduction of type hints (PEP 484) and the growing adoption of static type checkers like mypy, type annotations have become essential for writing clear, maintainable code. This guide covers the best practices for using type hints effectively in Python.
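As a minimal sketch of what PEP 484 style annotations look like, the snippet below annotates a small lookup function. The function `find_user` and the `names` data are hypothetical, and the built-in generic syntax `dict[int, str]` assumes Python 3.9 or later.

```python
from typing import Optional


def find_user(users: dict[int, str], user_id: int) -> Optional[str]:
    """Return the name stored for user_id, or None when the id is unknown."""
    return users.get(user_id)


# Hypothetical sample data, annotated so a checker such as mypy can verify it.
names: dict[int, str] = {1: "ada", 2: "grace"}

print(find_user(names, 1))   # ada
print(find_user(names, 99))  # None
```

The annotations do not change runtime behavior; they only give a static type checker something to verify.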

Conditionals

Python allows us to compare values using comparison operators, enabling us to make decisions based on these comparisons. This is fundamental for controlling the flow of a program.

Data Cleaning with Pandas

In this lecture, we covered a basic data cleaning process using pandas. We focused on cleaning a dataset containing a list of US presidents from Wikipedia. The key operations included importing the dataset, cleaning up names, and converting date formats. Below are the detailed steps and methods used.

Data Import in R

In the world of data analysis, most data is not created within R itself but comes from various data collection software, hardware, and channels such as Excel and the internet. This chapter focuses on how to import data into R to begin data analysis. Readers can either systematically go through the chapter or select topics based on their actual needs and time constraints.

Date Functionality

In today's lecture, we explored time series and date functionality in pandas. Manipulating dates and times in pandas is highly flexible, enabling us to conduct advanced analysis such as time series analysis.

Decorators

Python decorators are a very powerful and useful tool that allows programmers to modify the behavior of a function or class. Decorators allow for the extension or modification of the function's behavior without permanently modifying it. Here’s how to use and create decorators in Python:
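A minimal sketch, assuming a hypothetical `count_calls` decorator that records how often the wrapped function runs. The original `greet` function is left unchanged; only the name is rebound to the wrapper.

```python
import functools


def count_calls(func):
    """Wrap func so that each call increments a counter on the wrapper."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def greet(name):
    return f"Hello, {name}!"


greet("Ada")
greet("Grace")
print(greet.calls)  # 2
```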

Dictionaries

Dictionaries in Python are data structures that store data in key-value pairs, allowing efficient data retrieval and modification based on unique keys.

Grouping Data

Grouping data is an essential task for data analysis, allowing us to understand and manipulate data at a group level. Pandas provides the groupby() function to facilitate this, implementing the split-apply-combine pattern. This pattern involves splitting the data into groups, applying a function to each group, and then combining the results.
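The split-apply-combine pattern can be sketched as follows. The `team` and `score` columns are made-up sample data, not taken from the linked guide.

```python
import pandas as pd

# Hypothetical sample data: two teams with a few scores each.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 20, 5, 15, 40],
})

# Split by team, apply the mean to each group, combine into one Series.
mean_by_team = df.groupby("team")["score"].mean()
print(mean_by_team)
```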

Idioms

In Python programming, certain idioms are considered more appropriate and efficient. Pandas, which works almost like a sub-language within Python, has its own idioms, and code that follows them is often called "pandorable." These idioms improve code readability and performance. Here are some key features to make your code pandorable.

index

Introduction to Testing in Python

Indexing `DataFrame`

In pandas, both Series and DataFrame objects can have indices applied to them. An index serves as a row-level label, corresponding to axis zero. Indices can be autogenerated or explicitly set. This guide covers various methods for handling indices in pandas, including setting, resetting, and using multi-level indices.

Lambda Functions

Lambda functions in Python are small, anonymous functions defined by the lambda keyword. They can have any number of arguments but only one expression. The syntax is simple and intended for short functions that are convenient to use inside other functions, particularly those that require a simple function as an argument.
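A short sketch of the typical use, passing a lambda as an argument to another function. The `pairs` list is illustrative sample data.

```python
# Hypothetical (name, count) pairs.
pairs = [("pear", 3), ("apple", 7), ("fig", 1)]

# A lambda as the sort key: order the pairs by their count.
by_count = sorted(pairs, key=lambda pair: pair[1])
print(by_count)  # [('fig', 1), ('pear', 3), ('apple', 7)]

# A lambda passed to map: double every number.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
print(doubled)  # [2, 4, 6]
```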

Lists

Lists in Python are versatile data structures that allow efficient manipulation and storage of a collection of items. Unlike basic data types like integers, floats, Booleans, and strings, lists can hold multiple items, which can be of different data types, all within a single variable.

Loops

Loops are fundamental constructs in programming that allow the execution of a block of code multiple times. They are essential for automating repetitive tasks, making code efficient and concise.

Missing Values

Missing values are common in data cleaning activities and can occur for various reasons. Understanding the nature and handling of missing data is crucial for accurate data analysis.

Object-oriented Programming

Object-Oriented Programming (OOP) is a programming paradigm that uses "objects" to design software. It allows developers to create classes that encapsulate data and functions, promoting code reusability and modularity.

Pivot Tables

A pivot table is a way of summarizing data in a DataFrame for a particular purpose. It makes heavy use of aggregation functions. A pivot table is itself a DataFrame where the rows represent one variable, the columns represent another, and the cells contain some aggregate value. Pivot tables often include marginal values, which are the totals for each column and row. This gives an easy visual representation of the relationship between two variables.
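As a rough illustration, the sketch below builds a pivot table with marginal totals using `pd.pivot_table`. The `region`, `product`, and `sales` columns are invented sample data.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "product": ["x", "y", "x", "y"],
    "sales": [10, 20, 30, 40],
})

# Rows are regions, columns are products, cells hold summed sales.
# margins=True adds an "All" row and column with the totals.
table = pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
)
print(table)
```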

Processing Log Files

Log file analysis is a crucial task in system administration and software development, allowing experts to monitor system activities, debug issues, and extract valuable information. This guide demonstrates how to parse log files using Python to count the occurrences of usernames in CRON job entries.
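One possible way to count usernames in CRON entries is sketched below. The log line layout, the `CRON_USER` pattern, and the sample lines are assumptions for illustration; real syslog formats vary, so the pattern may need adjusting.

```python
import re
from collections import Counter

# Assumed layout: "... CRON[<pid>]: (<username>) CMD (...)"
CRON_USER = re.compile(r"CRON\[\d+\]: \(([^)]+)\)")


def count_cron_users(lines):
    """Count how many CRON entries each username has in the given lines."""
    counts = Counter()
    for line in lines:
        match = CRON_USER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines; the sshd line should be ignored.
sample = [
    "Jul  6 14:01:23 host CRON[29440]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jul  6 14:02:01 host CRON[29441]: (alice) CMD (backup.sh)",
    "Jul  6 14:03:09 host sshd[1201]: Accepted password for bob",
    "Jul  6 14:05:01 host CRON[29450]: (root) CMD (cleanup.sh)",
]

user_counts = count_cron_users(sample)
print(user_counts)
```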

Querying `Series`

A pandas Series can be queried either by index position or by index label. If no index is specified when the Series is created, the position and the label are effectively the same values.
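A small sketch of both query styles, using `iloc` for positions and `loc` for labels. The `grades` data is made up for illustration.

```python
import pandas as pd

# Hypothetical Series with an explicit label index.
grades = pd.Series([90, 75, 82], index=["alice", "bob", "carol"])

by_position = grades.iloc[1]   # second element, by position
by_label = grades.loc["bob"]   # same element, by label
print(by_position, by_label)

# With the default index, labels are 0, 1, 2, so position and label match.
default = pd.Series([90, 75, 82])
print(default.iloc[2] == default.loc[2])
```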

R

Official Site

Reading and Writing Files

Interacting with files is a fundamental aspect of programming, especially in automation and data processing tasks. Python provides robust capabilities to manipulate files and directories, making it a powerful tool for IT specialists and system administrators.

Recursion

Recursion is the repeated application of the same procedure to a smaller problem. It allows complex tasks to be broken down into simpler, more manageable sub-tasks.
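A classic illustration is the factorial, sketched here: each call hands a smaller problem to itself until it reaches a base case.

```python
def factorial(n: int) -> int:
    """Return n! by reducing the problem to (n - 1)! at each step."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        # Base case: stops the recursion.
        return 1
    # Recursive case: the same procedure on a smaller input.
    return n * factorial(n - 1)


print(factorial(5))  # 120
```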

regex

Regular expressions (regexes) are patterns used to match character combinations in strings. They are essential tools for text processing in programming and data science, allowing for:

Scales

Creating a DataFrame with Letter Grades

Strings

Strings are an essential data type in Python, used to represent text data. They are sequences of characters enclosed within single (') or double (") quotes.

SymPy

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.

Unit Tests

The `unittest` module provides developers with a set of tools to construct and run tests on individual components, or units, of code to ensure their correctness. By running unit tests, developers can identify and fix bugs, creating more reliable code.
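A minimal sketch of a test case built with the standard `unittest` module. The `add` function and `TestAdd` class are hypothetical, and the suite is run through a loader and runner here so the snippet also works when pasted into an interactive session.

```python
import unittest


def add(a, b):
    """Hypothetical unit under test."""
    return a + b


class TestAdd(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_strings(self):
        self.assertEqual(add("ab", "cd"), "abcd")


# Load the test methods from TestAdd and run them.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In a standalone test file, the more common pattern is to end the module with `unittest.main()` or to run `python -m unittest` from the command line.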